Merkle-AGI v7-W — Spatial-Temporal Substrate

Author: fox + agent blackops, on the unsandbox / unturf / permacomputer platform

Date: 2026-05-09 (draft v0)

Status: substrate paper for ticket #000013; closure draft.

This document specifies the third commitment substrate in the Merkle-AGI lineage — sister to v7 (logic / math) and arborist v9.8 (language / claim-lattice). v7-W commits derived spatial-temporal world-state: objects, relations, events, places, agent traces, observations. It is the substrate a world-model dreams in.

Read alongside:

  • docs/v7w-frontier-catalog.md — canonical ε-frontier catalog.

  • docs/pi-star.rst — the π* registry where the v7-W kernels will register once Phase 1 lands.

  • The v7 plastic-training spec (sister substrate; multimodal composition rules apply at v7-W ↔ v7 boundaries).

Part 1 — Introduction & motivation

Fox’s framing identifies three substrates needed for ASI:

3 > (language, logic, substrate)
1 = recursive-falsification merkle-agi (logic / math)
2 = language / claim-lattice (arborist v9.8)
3 = spatial-temporal vision / world-models  ← v7-W

Without a world-state substrate, an agent cannot:

  • Maintain persistent identity across time. Cache is keyed on text; physical “where am I” has no commitment.

  • Reason causally about physics. Logic substrate verifies proof steps; world substrate verifies “what happened next.”

  • Ground multi-modal claims. A claim about an object’s location can’t be falsified without a state commitment.

  • Support robotic / embodied use. No way to commit “I observed X at (t, x, y, z) with confidence c.”

  • Audit video / time-series. No π* for temporal signals beyond time-series-quantized@v1 (which handles 1D scalar streams, not multi-object scenes).

v7 §11 multimodal composition handles vision + language structurally, committing the conv kernel, the bridge, and the projection. It does not commit world-state: the abstract scene representation, spatial relations, temporal predicates. Those are derived from the model’s forward pass and never committed as first-class objects.

What this substrate is NOT

  • Not a SLAM stack, not a renderer, not a video codec.

  • Not raw pixels or raw audio.

  • Not the model that produces world-state. (That’s v7’s domain.)

What it IS

The canonical-encoding + commitment layer for derived world-state:

- objects:    { id, class, bbox, pose, confidence }
- relations:  { subject_id, predicate, object_id, time_window }
- events:     { type, t_start, t_end, participants, place }
- places:     { id, frame_of_reference, geometry, parent_place }
- agents:     { id, position_trace, pose_trace, attention_trace }
- observations: { observer_agent, t, frame, claim_about,
                  confidence }

Each tuple-class has a canonical-projection π*_w mapping the free-form input to byte-deterministic canonical bytes. SHA-256 of the canonical bytes is the equivalence-class identity.
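A minimal Python sketch of that projection-then-hash step. It assumes, since this document does not fix a byte format, that canonical bytes are compact, key-sorted UTF-8 JSON over already-quantized integers and strings; `canonical_bytes` and `commit` are hypothetical helper names, not an existing arborist API.

```python
import hashlib
import json


def _reject_floats(value) -> None:
    """Refuse un-gridded data: continuous values must be quantized first (§2.1)."""
    if isinstance(value, float):
        raise TypeError(f"float {value!r} must be quantized before projection")
    if isinstance(value, dict):
        for item in value.values():
            _reject_floats(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _reject_floats(item)


def canonical_bytes(tuple_obj: dict) -> bytes:
    """Hypothetical canonical encoding: compact JSON, keys sorted, UTF-8."""
    _reject_floats(tuple_obj)
    return json.dumps(tuple_obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def commit(tuple_obj: dict) -> str:
    """Equivalence-class identity: SHA-256 over the canonical bytes."""
    return hashlib.sha256(canonical_bytes(tuple_obj)).hexdigest()


# Two surface representations (different key order) of the same object tuple.
a = {"id": "chair-1", "class_label": "chair", "confidence_centi": 73}
b = {"confidence_centi": 73, "class_label": "chair", "id": "chair-1"}
assert commit(a) == commit(b)
```

Key order is only one of the surface differences a real π*_w would have to erase; float formatting is sidestepped here only because floats are refused outright.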

Part 2 — Substrate definition

§2.1 Hard constraints (carried from #000013)

  • Stays inside SQD A1 (canonical encoding), A2 (public quantization), A3 (collision-resistant hash).

  • No new axiom.

  • Every π*_w is defined on quantized integer state, not on continuous tensors. Continuous data (poses, bounding boxes, confidences) gets gridded explicitly; the grid choice is part of the commitment.

§2.2 Hierarchical grid spatial discretization

Recommended: S2-style hierarchical cells for Earth-scale geographic frames; octree levels for object-fixed local frames; quadtree levels for image-plane / 2D floor-plan frames.

The substrate manifest declares its set of grid levels:

{
  "grid": {
    "type": "octree",
    "frame": "object-fixed",
    "origin_committed": "<32-byte SHA-256 of frame definition>",
    "level_min": 0,
    "level_max": 24,
    "extent_meters": [10.0, 10.0, 10.0]
  }
}

Each commitment names which level it’s anchored at:

{
  "kind": "object_observation",
  "grid_level": 18,
  "cell_id": "0xabc12...",
  "...": "..."
}

Why hierarchical: matches how spatial reasoning works (coarse-to-fine), matches established standards (S2 / H3 for geographic indexing, octree for SLAM voxel grids, quadtree for mapping), and keeps π*_w finitely specified with one cell-id per declared level.
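To make “one cell-id per declared level” concrete, here is a Python sketch of octree cell ids for a point inside the manifest’s `extent_meters` box. The Morton (Z-order) bit interleaving and the origin at the box’s minimum corner are assumptions; this document does not specify a cell-id layout, and S2 or H3 cell ids follow their own schemes. All function names are hypothetical.

```python
def cell_indices(point_m, extent_m, level):
    """Per-axis integer cell indices of a point in [0, extent] at `level`."""
    n = 1 << level                       # 2**level cells per axis
    indices = []
    for coord, ext in zip(point_m, extent_m):
        if not 0.0 <= coord <= ext:
            raise ValueError(f"coordinate {coord} outside [0, {ext}]")
        # (coord / ext) * n: scaling by a power of two is exact in binary
        # floats, so the same point yields consistent indices at every level.
        indices.append(min(int(coord / ext * n), n - 1))
    return tuple(indices)


def morton3(ix, iy, iz, level):
    """Interleave index bits, most significant first, as (x, y, z) triples."""
    code = 0
    for bit in range(level - 1, -1, -1):
        triple = (((ix >> bit) & 1) << 2) | (((iy >> bit) & 1) << 1) | ((iz >> bit) & 1)
        code = (code << 3) | triple
    return code


def cell_id(point_m, extent_m, level):
    return morton3(*cell_indices(point_m, extent_m, level), level)


def parent(cid, levels_up=1):
    """Coarse-to-fine: dropping 3 bits per level yields the enclosing cell."""
    return cid >> (3 * levels_up)
```

With this layout, the enclosing cell one level up is the id with its last three bits dropped, so coarse-to-fine containment checks stay integer-only.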

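Trade-off captured by ticket §6: discretization tax. Every spatial claim incurs grid-rounding cost; ε at world-model frontiers must remain tight enough to be useful. Part 4 states the per-kernel ε bounds; empirical validation of the tax remains open (see the discretization-tax item under Closure).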
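The size of that tax follows from the manifest alone: an octree cell’s edge is `extent_meters / 2**level`. The sketch below computes it for the §2.2 manifest (10 m cube, `level_max` 24) and the worked example’s room (10×10×3 m, level 18), assuming, as a hypothetical convention, that a gridded point is represented by its cell center.

```python
import math


def cell_edge_m(extent_m: float, level: int) -> float:
    """Edge length of one octree cell along an axis of the given extent."""
    return extent_m / (1 << level)


def max_center_error_m(extents_m, level) -> float:
    """Worst-case distance from a point to its cell's center: half the cell diagonal."""
    edges = [cell_edge_m(e, level) for e in extents_m]
    return 0.5 * math.sqrt(sum(edge * edge for edge in edges))


for extents, level in (((10.0, 10.0, 10.0), 24), ((10.0, 10.0, 3.0), 18)):
    print(level, [f"{cell_edge_m(e, level) * 1e6:.2f} um" for e in extents],
          f"worst case {max_center_error_m(extents, level) * 1e6:.2f} um")
```

That comes to about 0.6 µm per edge at level 24, and about 38 µm (x, y) and 11 µm (z) at level 18.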

§2.3 Frame canonicalization

Recommendation: B from ticket §2.2 — frame as committed object with explicit transforms.

Every observation declares its frame:

{
  "frame_id": "0xdead123",
  "frame_kind": "object-fixed | global-ECEF | gravity-aligned-local | …",
  "frame_definition": "<canonical bytes>",
  "parent_frame_id": "0xparent…",
  "transform_to_parent": "<pose 4×4 quantized>"
}

The frame_id is SHA-256 of the canonical frame definition. Cross-frame reasoning requires explicit transforms — also committed. Matches v7’s “every causally relevant transformation must be committed” axiom.

Transform commitment shape (4×4 SE(3) homogeneous matrix, rotational components quantized via SO(3) → axis-angle integer encoding):

{
  "kind": "frame_transform",
  "from_frame_id": "...",
  "to_frame_id": "...",
  "axis_angle_quantized": [<int32 x 3>],
  "translation_quantized": [<int32 x 3>],
  "Δ_rot_milli_radians": 1,
  "Δ_trans_micrometers": 100
}

The Δ_rot / Δ_trans values pin the quantization grid; reusing the same Δ across transforms in a manifest is encouraged.
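A Python sketch of filling in that commitment shape, assuming the rotation has already been converted to a rotation vector (unit axis times angle, in radians). `quantize_vec`, `quantize_transform`, and the frame ids are hypothetical; Python’s `round()` breaks ties to even, and which tie rule applies is something a manifest would have to pin.

```python
import math

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def quantize_vec(values, delta):
    """Round each component to the nearest grid step and check the int32 range."""
    out = []
    for v in values:
        q = round(v / delta)   # Python's round(): nearest, ties to even
        if not INT32_MIN <= q <= INT32_MAX:
            raise OverflowError(f"{v} / {delta} does not fit in int32")
        out.append(q)
    return out


def quantize_transform(from_id, to_id, rotation_vector_rad, translation_m,
                       d_rot_mrad=1, d_trans_um=100):
    """Build a frame_transform commitment body; the Δ values pin the grid."""
    d_rot = d_rot_mrad / 1_000          # radians per tick
    d_trans = d_trans_um / 1_000_000    # meters per tick
    return {
        "kind": "frame_transform",
        "from_frame_id": from_id,
        "to_frame_id": to_id,
        "axis_angle_quantized": quantize_vec(rotation_vector_rad, d_rot),
        "translation_quantized": quantize_vec(translation_m, d_trans),
        "Δ_rot_milli_radians": d_rot_mrad,
        "Δ_trans_micrometers": d_trans_um,
    }


# 90 degrees about z, plus a small translation.
t = quantize_transform("0xcam", "0xworld", (0.0, 0.0, math.pi / 2), (1.0, -0.25, 0.0035))
```

Dequantizing any component recovers it to within half a grid step (0.5 mrad or 50 µm here); values that overflow int32 are refused rather than wrapped.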

§2.4 Temporal canonicalization

Recommendation: B (per-substrate clock) for single-agent deployments; C (Lamport / vector clocks) for multi-agent. Substrate manifest declares which mode.

Single-agent manifest:

{
  "clock": {
    "kind": "wall_clock_ms_since_epoch",
    "epoch": "2025-01-01T00:00:00Z",
    "Δ_t_ms": 1
  }
}

Multi-agent manifest:

{
  "clock": {
    "kind": "lamport",
    "agent_id": "0xagent42",
    "vector_dim": 8,
    "tie_break": "agent_id_lexical_order"
  }
}

Mixed deployments use B locally + C across agents — observations record both their wall-clock value AND their Lamport vector position. Cross-substrate joins (v7-W to arborist’s time-series-quantized@v1) need explicit clock transforms, also committed.
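A minimal vector-clock sketch in Python, reading the multi-agent manifest’s `vector_dim` as the number of agents, one clock entry each. The function names are hypothetical, and ordering by (entry sum, agent_id) is one way to realize `tie_break: agent_id_lexical_order`, not a specified rule.

```python
def tick(clock, i):
    """Local event at agent index i: increment that agent's own entry."""
    updated = list(clock)
    updated[i] += 1
    return tuple(updated)


def receive(local, remote, i):
    """On receipt of a remote observation: element-wise max, then tick."""
    merged = tuple(max(a, b) for a, b in zip(local, remote))
    return tick(merged, i)


def happened_before(a, b):
    """a -> b iff a <= b component-wise and a != b."""
    return a != b and all(x <= y for x, y in zip(a, b))


def concurrent(a, b):
    return a != b and not happened_before(a, b) and not happened_before(b, a)


def total_order_key(clock, agent_id):
    """Deterministic linear extension of happens-before.

    The entry sum strictly grows along causal chains, so sorting by it never
    contradicts causality; ties fall back to agent_id lexical order.
    """
    return (sum(clock), agent_id)


a1 = tick((0, 0), 0)            # agent A observes something
b1 = tick((0, 0), 1)            # agent B, independently
b2 = receive(b1, a1, 1)         # B ingests A's observation
events = [(b2, "agent-b"), (a1, "agent-a"), (b1, "agent-b")]
ordered = sorted(events, key=lambda e: total_order_key(*e))
```

Concurrent observations (`a1`, `b1`) are ordered by the tie-break alone, while an observation that causally depends on another (`b2`) always sorts after it.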

§2.5 Probabilistic commitment

Default: A — quantized centi-confidence (0–100 integer).

{
  "claim": "...",
  "confidence_centi": 73
}

Opt-in: B — range commitment for safety-critical deployments (medical, robotics) where confidence intervals matter:

{
  "claim": "...",
  "confidence_lo_centi": 65,
  "confidence_hi_centi": 80
}

The substrate manifest declares which mode is in effect for the deployment. Mixed deployments split per claim-class (object_class_probability uses B; observation_existence uses A).
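A Python sketch of both modes. Going through `decimal.Decimal` on the value’s decimal string keeps rounding free of binary-float artifacts (in floating point, `0.29 * 100` is `28.999999999999996`). The outward rounding in mode B (floor for the low end, ceiling for the high end, so the committed range always contains the estimate) is an assumption, as are the helper names.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_EVEN


def _centi(p, rounding) -> int:
    scaled = Decimal(str(p)) * 100       # exact decimal scaling, no float drift
    if not 0 <= scaled <= 100:
        raise ValueError(f"confidence {p} outside [0, 1]")
    return int(scaled.to_integral_value(rounding=rounding))


def confidence_point(p):
    """Mode A: one centi-confidence integer, nearest value (ties to even)."""
    return {"confidence_centi": _centi(p, ROUND_HALF_EVEN)}


def confidence_range(lo, hi):
    """Mode B: round outward so the committed range contains [lo, hi]."""
    if lo > hi:
        raise ValueError("lo must not exceed hi")
    return {"confidence_lo_centi": _centi(lo, ROUND_FLOOR),
            "confidence_hi_centi": _centi(hi, ROUND_CEILING)}
```

For example, `confidence_range(0.651, 0.799)` commits 65..80, the shape of the mode-B example above.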

§2.6 The five canonical tuple-classes

Each gets its own π*_w canonical projection.

π*_w_object — object instances:

{ id, class_label, frame_id, bbox_quantized, pose_quantized,
  confidence_centi, observed_at_logical_time }

π*_w_relation — directed relations between two objects:

{ subject_id, predicate, object_id, time_window,
  confidence_centi }

The predicate is from a substrate-declared whitelist (contains, adjacent_to, approaches, occludes, supports, …) to keep the canonical bytes deterministic. Adding a new predicate requires a substrate version bump.
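A Python sketch of the whitelist gate, using the predicates declared in the worked example’s manifest as an illustrative whitelist. `canonical_relation` is a hypothetical name; storing the predicate name rather than its whitelist index is a design assumption that keeps canonical bytes stable if the whitelist is reordered.

```python
# Illustrative whitelist: the predicates declared in the worked example's manifest.
PREDICATES = ("adjacent_to", "supports", "occludes", "approaches")


def canonical_relation(subject_id, predicate, object_id, time_window,
                       confidence_centi, whitelist=PREDICATES):
    """Hypothetical π*_w_relation front end: reject anything off-whitelist."""
    if predicate not in whitelist:
        raise ValueError(f"unknown predicate {predicate!r}: "
                         "adding it requires a substrate version bump")
    t_start, t_end = time_window
    if t_start > t_end:
        raise ValueError("time_window must satisfy t_start <= t_end")
    if not 0 <= confidence_centi <= 100:
        raise ValueError("confidence_centi must lie in 0..100")
    # Store the predicate name, not its whitelist index, so reordering the
    # whitelist cannot silently change canonical bytes.
    return {"subject_id": subject_id, "predicate": predicate,
            "object_id": object_id, "time_window": [t_start, t_end],
            "confidence_centi": confidence_centi}
```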

π*_w_event — temporally-extended interactions:

{ type, t_start, t_end, participants, place_id, confidence_centi }

π*_w_place — recognizable locations:

{ id, frame_of_reference, geometry, parent_place_id }

Geometry is hierarchical-cell-set encoded — a list of (level, cell_id) pairs covering the place’s extent.
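Canonical bytes need a canonical cover, because the same region has many valid (level, cell_id) lists: one parent cell or its eight children, duplicates, any ordering. A Python sketch of one normalization, assuming octree cell ids in which a cell’s parent is `cell_id >> 3` (a Morton-style layout this document does not mandate); `normalize_cells` is hypothetical.

```python
def normalize_cells(cells):
    """Canonical cover: merge full sibling sets, drop covered cells, sort."""
    cover = set(cells)                       # removes duplicates
    merged = True
    while merged:
        merged = False
        siblings = {}
        for level, cid in cover:
            if level > 0:
                siblings.setdefault((level - 1, cid >> 3), []).append((level, cid))
        for parent_cell, kids in siblings.items():
            if len(kids) == 8:               # all eight children present
                cover.difference_update(kids)
                cover.add(parent_cell)
                merged = True

    def covered(level, cid):
        """True if any coarser ancestor of this cell is already in the cover."""
        return any((level - up, cid >> (3 * up)) in cover
                   for up in range(1, level + 1))

    return sorted(c for c in cover if not covered(*c))
```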

π*_w_agent_trace — an agent’s path through space-time:

{ id, frame_id, trace: [(t, pose_quantized, attention_target?), …] }

The trace is a temporally-quantized sequence; it reuses time-series-quantized@v1’s sample-array discipline for each of the trace’s spatial dimensions.

Part 3 — Theorems

§3.1 T1-W — State binding

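Statement. For any world-state commitment C(W) produced by π*_w, C(W) uniquely determines the canonical (object, relation, event, place, agent_trace) tuples that π*_w accepted as input, modulo the published grid / frame / clock manifest. “Uniquely” is computational: exhibiting two distinct canonical tuple sets with the same commitment would exhibit a SHA-256 collision.

Proof sketch. π*_w is a deterministic byte-projection on quantized integer state (per §2.1 hard constraint), and its canonical encoding (A1) is injective on canonical tuples, so distinct canonical inputs produce distinct canonical bytes. SHA-256 is collision-resistant under A3, so distinct canonical bytes produce distinct commitments except with negligible probability. Therefore the commitment determines the input modulo the equivalence class π*_w defines (different surface representations that canonicalize identically map to the same commitment).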

The “modulo manifest” qualifier matters: same input under different manifests (different Δ_x, Δ_t, frame definition) can produce different commitments. This is by design — the manifest is part of identity.

§3.2 T2-W — Causal completeness

Statement. A v7-W commitment chain (a sequence of committed observations + transforms over time) is causally complete if and only if every state transition between consecutive observations is justified by a committed transform OR a committed event.

What this rules out. Silent state changes — an object’s pose updating between two observations without a corresponding ego-motion transform OR a corresponding event explaining the update — break causal completeness.

Operational meaning. A v7-W chain that fails T2-W’s check is missing data: either the agent moved without recording its ego-motion (covered by pose_integration ε-frontier), or something happened in the world (covered by a committed event). Either way, the gap is observable as a chain-check failure.
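The chain check can be sketched in Python as below. The record shapes (`Observation`, `Justification`) and the rule that a justification counts if its time span overlaps the gap between two observations are illustrative assumptions; an ego-motion transform is modeled as justifying every object at once.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Observation:
    t: int                    # clock ticks
    object_id: str
    pose_q: tuple             # quantized pose in the observation's frame


@dataclass(frozen=True)
class Justification:
    kind: str                             # "transform" or "event"
    t_start: int
    t_end: int
    object_ids: Optional[FrozenSet[str]]  # None: applies to every object (ego-motion)


def causal_gaps(observations, justifications):
    """List (object_id, t_prev, t_next) for every state change nothing explains."""
    by_object = {}
    for obs in sorted(observations, key=lambda o: (o.object_id, o.t)):
        by_object.setdefault(obs.object_id, []).append(obs)
    gaps = []
    for oid, seq in by_object.items():
        for prev, nxt in zip(seq, seq[1:]):
            if prev.pose_q == nxt.pose_q:
                continue                  # no state change to justify
            explained = any(
                (j.object_ids is None or oid in j.object_ids)
                and j.t_start <= nxt.t and j.t_end >= prev.t   # overlaps the gap
                for j in justifications)
            if not explained:
                gaps.append((oid, prev.t, nxt.t))
    return gaps
```

An empty result means the chain passes this check; each reported triple is a silent state change in the sense of T2-W.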

§3.3 T3-W — Frame-transform soundness

Statement. For any two committed frames F_a, F_b and a committed transform T_{a→b}, applying T_{a→b} to a v7-W commitment in F_a produces a v7-W commitment in F_b that is ε-equivalent to a direct observation in F_b, where ε is bounded by the canonical-projection π*_w’s grid quantization.

Practical implication. Cross-frame reasoning is sound up to the ε floor. Operators trading off precision (coarser grid → faster) accept a measurable error budget per transform.

§3.4 T4-W — ε at affine frontiers

Statement. At each of the four canonical ε-frontiers (pose_integration, observation_update, object_logits, relation_logits; see Part 4), the verifier kernel is affine on appropriately-canonicalized integer state, and the per-step ε bound is the quantization granularity Δ_step.

Why affine. Pose integration under a small time step is a linear operator on the previous pose’s integer-encoded state. The Kalman-style observation update is affine in the residual. Object classification logits are affine pre-softmax; relation logits are bilinear in the (subject, object) feature pair, hence affine in each argument with the other held fixed (§4.4). Each maps cleanly to v7’s affine-frontier discipline.

This is the load-bearing theorem for ε-proof composability across v7 + v7-W. Without it, multimodal pipelines that route through both substrates can’t bound their cumulative ε.

Part 4 — Verifier kernels

The four canonical ε-frontiers, each implemented as a deterministic integer kernel.

§4.1 pose_integration

Input: {prev_pose_quantized, ego_motion_axis_angle_quantized, ego_motion_translation_quantized, Δ_t_ticks}.

Output: next_pose_quantized.

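Operator: SE(3) composition, affine in axis-angle + translation under the small-time-step assumption. Quantization follows the manifest’s Δ_rot / Δ_trans declarations.

ε bound: Δ_step_pose has a rotational component Δ_rot and a translational component Δ_trans + Δ_rot · |translation_max|. The second term is the positional error that a rotation error of up to Δ_rot induces on a translation of length |translation_max| (a chord never exceeds its arc), so every term of the translational bound is in meters.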
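A planar (SE(2)) stand-in for this kernel in Python, with positions in 100 µm ticks and headings in 1 mrad ticks (the §2.3 example values). It composes p' = p + R(θ)·t, θ' = θ + Δθ and re-quantizes the position; the reduction to 2D and the name `integrate_pose_q` are assumptions. `math.cos` and `math.sin` are not guaranteed bit-identical across platforms, so a deterministic kernel would need fixed-point trigonometry, which is not specified here.

```python
import math

DELTA_ROT = 1 / 1_000   # rad per heading tick (1 mrad); positions are 100 µm ticks


def integrate_pose_q(pose_q, motion_q):
    """Compose (x, y, theta) with a body-frame motion (dx, dy, dtheta), all in ticks.

    p' = p + R(theta) t ; theta' = theta + dtheta. Positions are re-quantized
    by rounding; headings add exactly as integers.
    """
    x, y, th = pose_q
    dx, dy, dth = motion_q
    heading = th * DELTA_ROT
    # Caveat: libm cos/sin may differ in the last bit across platforms.
    c, s = math.cos(heading), math.sin(heading)
    return (round(x + c * dx - s * dy), round(y + s * dx + c * dy), th + dth)


straight = integrate_pose_q((0, 0, 0), (10_000, 0, 0))      # 1 m along x
turned = integrate_pose_q((0, 0, 1571), (10_000, 0, 0))     # heading near pi/2
```

In the second call, 1571 ticks sits about 0.204 mrad past π/2. Over a 1 m step, that heading residual displaces x by about 204 µm, which rounds to -2 ticks: rotation quantization, scaled by the translation length, becomes positional error.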

§4.2 observation_update

Input: {prior_state_quantized, observation_quantized, innovation_covariance_quantized}.

Output: posterior_state_quantized.

Operator: standard Kalman update — affine in the innovation z - H·prior. Covariance arithmetic uses bigint accumulator discipline (v7 §5) to avoid analog leakage; final result is re-quantized to the manifest’s grid.

ε bound: Δ_step_observation = innovation_quantization + covariance_quantization.
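A scalar (1D, H = 1) Python sketch of the update, with `fractions.Fraction` standing in for the bigint accumulator: every intermediate is an exact rational, and only the posterior state and variance are re-quantized. The scalar simplification, the argument names, and round-half-to-even are assumptions.

```python
from fractions import Fraction


def kalman_update_q(x_q, p_q, z_q, r_q):
    """Scalar Kalman update (H = 1) on quantized integers.

    x_q: prior state (grid ticks)        p_q: prior variance (ticks**2)
    z_q: observation (grid ticks)        r_q: observation-noise variance (ticks**2)
    """
    innovation = z_q - x_q               # z - H·prior, an exact integer
    s_q = p_q + r_q                      # innovation variance
    if s_q <= 0:
        raise ValueError("innovation variance must be positive")
    gain = Fraction(p_q, s_q)            # exact rational Kalman gain
    x_post = x_q + gain * innovation     # affine in the innovation
    p_post = (1 - gain) * p_q
    # Only the final results are re-quantized (round half to even).
    return round(x_post), round(p_post)
```

A call such as `kalman_update_q(10, 3, 20, 1)` lands exactly on a tie (17.5 ticks); round-half-to-even resolves it to 18, the kind of rule a manifest would need to fix for bit-identical replay.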

§4.3 object_logits

Input: {object_features_quantized, classifier_weights_quantized}.

Output: logits_quantized (pre-softmax integer vector).

Operator: linear projection (matrix-vector multiplication on quantized integers). Affine. Reuses v7’s existing object_logits ε-frontier definition; this entry pins the v7-W view of the same kernel for cross-substrate ε proofs.

ε bound: Δ_step_object = manifest’s per-class quantization threshold.
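On quantized integers the projection is plain integer arithmetic. A Python sketch under a hypothetical name, `object_logits_q`; Python integers are arbitrary-precision, so the accumulation cannot overflow, which here stands in for the bigint discipline.

```python
def object_logits_q(features_q, weights_q, bias_q=None):
    """Integer pre-softmax logits W·f (+ b) over quantized features."""
    if any(len(row) != len(features_q) for row in weights_q):
        raise ValueError("each weight row must match the feature length")
    logits = [sum(w * f for w, f in zip(row, features_q)) for row in weights_q]
    if bias_q is not None:
        logits = [value + b for value, b in zip(logits, bias_q)]
    return logits


W = [[1, 0, 2], [0, 5, -1], [2, 2, 2]]   # 3 classes, 3 features (illustrative)
```

Without the bias the map is linear, so logits of summed features equal summed logits exactly, with no rounding anywhere.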

§4.4 relation_logits

Input: {subject_features_quantized, object_features_quantized, relation_classifier_weights_quantized}.

Output: relation_logits_quantized (pre-softmax over the substrate-declared predicate whitelist).

Operator: bilinear scoring on the (subject, object) feature pair, hence affine in each feature vector with the other held fixed. The bilinear form is committed in its canonicalized tensor-decomposed form (Tucker / CP).

ε bound: Δ_step_relation = predicate-whitelist size factor + feature quantization.
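A Python sketch of CP-factored bilinear scoring: for predicate k, logit_k(s, o) = Σ_r λ[k][r]·(U_r·s)·(V_r·o). The factor names and rank are hypothetical. CP factors are unique at best up to scaling and permutation, so committing them would also require a canonical normalization that this document does not spell out.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relation_logits_q(subj_q, obj_q, U, V, lam):
    """CP-factored bilinear scores over the predicate whitelist.

    logit_k = sum_r lam[k][r] * (U[r]·subj) * (V[r]·obj), all in integers.
    """
    u_s = [dot(u, subj_q) for u in U]     # rank-r projections of the subject
    v_o = [dot(v, obj_q) for v in V]      # rank-r projections of the object
    return [sum(l * a * b for l, a, b in zip(lam_k, u_s, v_o)) for lam_k in lam]


U = [[1, 0], [0, 1]]
V = [[1, 1], [2, 0]]
LAM = [[1, 0], [0, 1], [1, 1]]            # 3 predicates, rank 2
```

Holding the object features fixed, the scores are linear in the subject features (and vice versa), which is the sense in which the kernel is affine.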

Part 5 — Multimodal composition with v7

§5.1 Where v7 ends, v7-W begins

A v7 vision encoder maps pixels → embeddings. The encoder is v7’s domain (commit conv kernels, attention layers, etc.). When the embedding is consumed by a world-model that produces state, the boundary crosses into v7-W.

Concrete example: a YOLO-style detector outputs (bounding boxes, class probabilities) per image. The convolutional + detection heads are v7. The (bbox, class, confidence) tuples per detected object are v7-W π*_w_object inputs.

§5.2 Cross-substrate ε composition

When a multimodal pipeline routes through both substrates, cumulative ε is bounded by the sum of each substrate’s ε bounds, provided no stage amplifies the error it receives (a triangle-inequality argument; by analogy with the data-processing inequality, lossy projections never reduce ε):

ε_total ≤ ε_v7_encoder + ε_v7w_pose_integration +
          ε_v7w_observation_update + …

Each term is a frontier-named ε from the respective frontier catalog. T4-W (above) ensures the v7-W terms are well-defined.

§5.3 Frame-transform anchoring

When a v7 vision encoder operates on images from a moving camera, the camera’s frame is the v7-W output’s frame. The transform from camera-frame → world-frame is a v7-W commitment, not a v7 model parameter. Splitting responsibility this way keeps the v7 encoder generic (image-in, embedding-out) and puts spatial-frame discipline entirely on the v7-W side.

Part 6 — Adversarial corners

§6.1 Frame spoofing

Adversary commits a frame definition with adversarially-chosen geometry that makes innocuous observations look like target events under the canonical projection. Mitigation:

  • Frame definitions cite their physical anchor: a SHA-256 of the surveying / measurement procedure that established the frame.

  • Frames without committed anchors stay at minimum-warrant level (analogous to arborist’s COPYRIGHT_FOOTER tier).

  • Cross-frame transforms inherit warrant from their source frames; a chain of transforms warrants no more strongly than its weakest link.

§6.2 Time skew

Adversary commits observations with manipulated timestamps to fabricate causal dependencies. Mitigation:

  • Single-agent: substrate manifest pins clock skew bound Δ_clock_max; observations outside that window are flagged.

  • Multi-agent: Lamport vector clocks make skew observable as a vector-clock inconsistency rather than a wall-clock rewrite.

  • Cross-substrate: v7-W observation timestamps must agree with arborist’s time-series-quantized@v1 Δ_t for any signal the observation references; mismatches surface as cross-substrate chain breaks.

§6.3 Observation injection

Adversary fabricates observations with high confidence_centi to poison the world-model’s state. Mitigation:

  • Per-observer observation budget committed in the manifest — caps the rate at which a single agent can publish high-confidence claims without supporting evidence.

  • Cross-witness agreement (analogous to arborist’s #000028 multi-witness): an observation backed by independent observers in independent frames warrants more strongly than a solo observation.

  • The formal version is deferred to an adversarial-corner ticket (#000018-W follow-up), modeled on #000018’s threat-model + reduction approach.
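The observation budget above can be sketched in Python as a sliding window per observer. The threshold (90), the window length, and the class name are hypothetical; observations are assumed to arrive in non-decreasing tick order per observer.

```python
from collections import deque


class ObservationBudget:
    """Cap on unsupported high-confidence claims per observer per time window."""

    def __init__(self, max_claims: int, window_ticks: int, threshold: int = 90):
        self.max_claims = max_claims
        self.window_ticks = window_ticks
        self.threshold = threshold          # confidence_centi treated as "high"
        self._recent = {}                   # observer id -> deque of ticks

    def admit(self, observer: str, t: int, confidence_centi: int,
              supported: bool) -> bool:
        """Return False (flag) when the claim would exceed the observer's budget."""
        if supported or confidence_centi < self.threshold:
            return True                     # the budget only limits unsupported claims
        window = self._recent.setdefault(observer, deque())
        while window and window[0] <= t - self.window_ticks:
            window.popleft()                # expire claims older than the window
        if len(window) >= self.max_claims:
            return False
        window.append(t)
        return True
```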

§6.4 Privacy

A world-state commitment substrate is also a surveillance substrate. Frame discipline and Phase-2 ZK (ticket #000016) are more important here than for text/logic. The substrate manifest MUST declare the deployment’s privacy class:

{
  "privacy": {
    "class": "public | aggregated_only | ZK_with_selective_disclosure",
    "...": "..."
  }
}

Public class is the default for openly-published research / mapping deployments. ZK-with-selective-disclosure is required for any deployment where individual agents’ positions or identities should not be inferable from the commitment chain.

Appendix — Worked example: toy SLAM

A small concrete trace demonstrating the substrate end-to-end.

Setup. One agent navigating a 10×10×3 m room. RGB camera + IMU. Detects three persistent objects (a chair, a desk, a door) and produces a path trace.

Manifest (committed once at boot).

{
  "v7w_version": "v0",
  "grid": {"type": "octree", "level_min": 0, "level_max": 18,
           "extent_meters": [10.0, 10.0, 3.0]},
  "frames": {
    "world": {"kind": "gravity-aligned-local",
              "anchor": "<sha256 of survey procedure>"},
    "camera_initial": {"parent": "world",
                       "transform": "<identity>"}
  },
  "clock": {"kind": "wall_clock_ms_since_epoch",
            "Δ_t_ms": 1},
  "predicates": ["adjacent_to", "supports", "occludes",
                 "approaches"],
  "privacy": {"class": "public"}
}

Per-tick observations. At each tick, the agent commits:

  1. π*_w_agent_trace — appends a (t, pose, attention) point.

  2. π*_w_object per detected object — bbox + confidence.

  3. π*_w_relation per detected relation — e.g. “chair adjacent_to desk”, “agent approaches door”.

  4. π*_w_event if a state-change occurred — e.g. “door opens at t=4521”.

Commitment chain (10-tick trace, ~50 observations, ~30 KB compressed). Audit replay walks the chain, verifying each observation_update ε bound against the manifest’s grid declarations and confirming T2-W causal completeness.

ε budget for the worked example.

per-tick total ε:
    pose_integration  ≈ Δ_rot (rotation); Δ_trans + Δ_rot·|t| (translation)
                      ≈ 1 mrad; 100 µm + 1 mrad·|t|, |t| = step length
                                                          = bounded
    observation_update per object ≈ bbox-quantization
                      ≈ 1 cell at level 18                = bounded
    object_logits     ≈ class-probability granularity
                      ≈ 1/100 (centi-confidence)          = bounded
    relation_logits   ≈ predicate-whitelist + feature
                      quantization                        = bounded

ε_total over 10 ticks: bounded sum of the above; well under
    what a downstream consumer (motion planner / safety filter)
    would care about for room-scale navigation.
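A short Python calculation of that budget under explicit assumptions: the manifest’s level-18 grid over the 10×10×3 m room, the 1 mrad / 100 µm transform grid from §2.3, and a hypothetical maximum of 0.1 m of travel per tick, which the example does not state. The pose term is modeled as a rotational part Δ_rot and a translational part Δ_trans + Δ_rot·|t|.

```python
DELTA_ROT = 1e-3            # rad, from the §2.3 transform example
DELTA_TRANS = 100e-6        # m, from the §2.3 transform example
EXTENT_M = (10.0, 10.0, 3.0)
LEVEL = 18
T_MAX = 0.1                 # m of travel per tick: HYPOTHETICAL
TICKS = 10


def per_tick_budget():
    return {
        "pose_rot_rad": DELTA_ROT,
        "pose_trans_m": DELTA_TRANS + DELTA_ROT * T_MAX,
        "observation_cell_edge_m": [e / 2**LEVEL for e in EXTENT_M],
        "object_logits": 1 / 100,        # centi-confidence granularity
    }


def pose_trans_total(ticks, carry_heading_error=False):
    """Translational pose error after `ticks` steps (worst case, linear sums)."""
    if not carry_heading_error:
        # Per-step sum, as in the appendix: each step adds its own bound.
        return ticks * (DELTA_TRANS + DELTA_ROT * T_MAX)
    # Heading error accumulated over k steps (up to k * DELTA_ROT) levers step k.
    return sum(DELTA_TRANS + k * DELTA_ROT * T_MAX for k in range(1, ticks + 1))


budget = per_tick_budget()
print(f"per tick: {budget['pose_trans_m'] * 1e6:.0f} um, {budget['pose_rot_rad'] * 1e3:.0f} mrad")
print(f"{TICKS} ticks: {pose_trans_total(TICKS) * 1e3:.1f} mm per-step sum, "
      f"{pose_trans_total(TICKS, True) * 1e3:.1f} mm with heading drift")
```

Under these assumptions, the per-step sum over 10 ticks is 2 mm of translational error and 10 mrad of heading error. If heading error from earlier ticks is carried forward instead of being re-anchored by observation_update, it levers every later translation, and the translational figure grows to 6.5 mm.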

Out of scope (re-stated from #000013)

  • Running a SLAM stack inside arborist. v7-W defines the commitment substrate; world-model engines (SLAM, Gaussian splatting, predictive video) plug in via adapters that are separate tickets.

  • Cross-modal joint reasoning (text claim + spatial state). Needs the cross-domain π* composition theorem (#000015).

  • Specific sensor adapters (LIDAR, RGB-D, IMU). Each is a separate source-adapter ticket once v7-W lands.

Closure

Closure criterion (#000013 §7): this document plus docs/v7w-frontier-catalog.md plus the arborist/world/__init__.py namespace stub. All three land in the same commit closing the ticket.

Open questions tracked separately:

  • Standards adoption — S2 vs custom octree (left as a recommendation per §2.2; deployment can override via manifest).

  • Privacy implementation — Phase-2 ZK ticket (#000016) covers the cryptographic side once #000013 has a deployment target.

  • Discretization tax — empirical bench of ε at frontier kernels on representative deployments. Captured here as future work.

Implementation tickets that cite this paper land later, one per verifier kernel + sensor adapter.

Status: closure-draft 2026-05-09. Ready for fox + downstream review.


Permacomputer Preamble — License: AGPL-3.0-only

This is free software for the public good of a permacomputer hosted at permacomputer.com, an always-on computer by the people, for the people. Durable, easy to repair, & distributed like tap water for machine learning intelligence.

Our permacomputer is community-owned infrastructure optimized around four values:

  • TRUTH — First principles, math & science, open source code freely distributed.

  • FREEDOM — Voluntary partnerships, freedom from tyranny & corporate control.

  • HARMONY — Minimal waste, self-renewing systems with diverse thriving connections.

  • LOVE — Be yourself without hurting others, cooperation through natural law.

NO WARRANTY. Software is provided “AS IS” without warranty of any kind. Full text: License.