Merkle-AGI v7-W — Spatial-Temporal Substrate
=============================================

:Author: fox + agent blackops, on the unsandbox / unturf / permacomputer platform
:Date: 2026-05-09 (draft v0)
:Status: substrate paper for ticket #000013; closure draft.

This document specifies the **third commitment substrate** in the
Merkle-AGI lineage — sister to v7 (logic / math) and arborist v9.8
(language / claim-lattice). v7-W commits **derived spatial-temporal
world-state**: objects, relations, events, places, agent traces,
observations. It is the substrate a world-model dreams in.

Read alongside:

- ``docs/v7w-frontier-catalog.md`` — canonical ε-frontier catalog.
- ``docs/pi-star.rst`` — the π* registry where the v7-W kernels
  will register once Phase 1 lands.
- The v7 plastic-training spec (sister substrate; multimodal
  composition rules apply at v7-W ↔ v7 boundaries).


Part 1 — Introduction & motivation
----------------------------------

Fox's framing identifies three substrates needed for ASI:

.. code-block:: text

   3 > (language, logic, substrate)
   1 = recursive-falsification merkle-agi (logic / math)
   2 = language / claim-lattice (arborist v9.8)
   3 = spatial-temporal vision / world-models  ← v7-W

Without a world-state substrate, an agent cannot:

- Maintain persistent identity across time. The cache is keyed on text;
  *physical "where am I"* has no commitment.
- Reason causally about physics. The logic substrate verifies proof
  steps; a world substrate verifies "what happened next."
- Ground multimodal claims. A claim about an object's location
  can't be falsified without a state commitment.
- Support robotic / embodied use. No way to commit "I observed X
  at (t, x, y, z) with confidence c."
- Audit video / time-series. No π* for temporal signals beyond
  ``time-series-quantized@v1`` (which handles 1D scalar streams,
  not multi-object scenes).

v7 §11 multimodal composition handles vision + language
*structurally* — commit the conv kernel, the bridge, the
projection. It does **not** commit world-state: the abstract
scene representation, spatial relations, temporal predicates.
Those are derived from the model's forward pass and never
committed as first-class objects.

What this substrate is NOT
~~~~~~~~~~~~~~~~~~~~~~~~~~

- Not a SLAM stack, not a renderer, not a video codec.
- Not raw pixels or raw audio.
- Not the model that produces world-state. (That's v7's domain.)

What it IS
~~~~~~~~~~

The **canonical-encoding + commitment** layer for derived
world-state:

.. code-block:: text

   - objects:    { id, class, bbox, pose, confidence }
   - relations:  { subject_id, predicate, object_id, time_window }
   - events:     { type, t_start, t_end, participants, place }
   - places:     { id, frame_of_reference, geometry, parent_place }
   - agents:     { id, position_trace, pose_trace, attention_trace }
   - observations: { observer_agent, t, frame, claim_about,
                     confidence }

Each tuple-class has a canonical-projection π*_w mapping the
free-form input to byte-deterministic canonical bytes. SHA-256 of
the canonical bytes is the equivalence-class identity.
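
A minimal sketch of the last two steps (canonical bytes, then SHA-256). It assumes a sorted-key, whitespace-free JSON encoding, because the paper does not fix the byte format. ``canonical_bytes`` and ``commitment`` are hypothetical names. A deployed π*_w would pin a fully specified scheme, and RFC 8785's JSON Canonicalization Scheme is one published option.

.. code-block:: python

   import hashlib
   import json


   def _reject_floats(value):
       # π*_w is defined on quantized integer state (§2.1), so a float
       # reaching the encoder means some value skipped quantization.
       if isinstance(value, float):
           raise TypeError("quantize continuous values before committing")
       if isinstance(value, dict):
           for item in value.values():
               _reject_floats(item)
       elif isinstance(value, (list, tuple)):
           for item in value:
               _reject_floats(item)


   def canonical_bytes(record: dict) -> bytes:
       """Hypothetical canonical encoding: sorted keys, no whitespace, ASCII."""
       _reject_floats(record)
       text = json.dumps(record, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=True)
       return text.encode("ascii")


   def commitment(record: dict) -> str:
       """Equivalence-class identity: SHA-256 over the canonical bytes."""
       return hashlib.sha256(canonical_bytes(record)).hexdigest()


   # Two surface representations (different key order), one identity.
   a = {"id": 7, "class_label": "chair", "confidence_centi": 73}
   b = {"confidence_centi": 73, "class_label": "chair", "id": 7}
   assert commitment(a) == commitment(b)

Rejecting floats at encode time is one way to enforce the "quantized integer state" constraint mechanically rather than by convention.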


Part 2 — Substrate definition
-----------------------------

§2.1 Hard constraints (carried from #000013)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- Stays inside SQD A1 (canonical encoding), A2 (public
  quantization), A3 (collision-resistant hash).
- No new axiom.
- Every π*_w is defined on **quantized integer state**, not on
  continuous tensors. Continuous data (poses, bounding boxes,
  confidences) gets gridded explicitly; the grid choice is part
  of the commitment.

§2.2 Hierarchical grid spatial discretization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Recommended: **S2-style hierarchical cells** for Earth-scale
geographic frames; **octree levels** for object-fixed local
frames; **quadtree levels** for image-plane / 2D floor-plan
frames.

The substrate manifest declares its set of grid levels:

.. code-block:: json

   {
     "grid": {
       "type": "octree",
       "frame": "object-fixed",
       "origin_committed": "<32-byte SHA-256 of frame definition>",
       "level_min": 0,
       "level_max": 24,
       "extent_meters": [10.0, 10.0, 10.0]
     }
   }

Each commitment names which level it's anchored at:

.. code-block:: json

   {
     "kind": "object_observation",
     "grid_level": 18,
     "cell_id": "0xabc12...",
     "...": "..."
   }

**Why hierarchical:** matches how spatial reasoning works
(coarse-to-fine), matches established standards (S2 / H3 for
geographic indexing, octree for SLAM voxel grids, quadtree for
mapping), and keeps π*_w finitely specified with one cell-id
per declared level.
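
For the octree case, one way to get "one cell-id per declared level" is a Morton (Z-order) code. Each coordinate is floored onto the level's grid, and the bits are then interleaved so that dropping the last three bits yields the parent cell. The sketch assumes the frame origin sits at a corner of the ``extent_meters`` box. ``octree_cell_id`` is a hypothetical helper, and the paper does not mandate Morton ordering.

.. code-block:: python

   def octree_cell_id(point_m, extent_m, level):
       """Return (level, cell_id) for a point inside an object-fixed octree.

       Assumes the frame origin is a corner of the extent box and that
       cell ids are Morton codes with x in the lowest bit of each triple.
       """
       cells_per_axis = 1 << level
       coords = []
       for p, e in zip(point_m, extent_m):
           if not 0.0 <= p < e:
               raise ValueError("point lies outside the declared extent")
           # Floor onto this level's grid; the clamp guards float edge cases.
           coords.append(min(int(p / e * cells_per_axis), cells_per_axis - 1))
       cell_id = 0
       for bit in range(level):
           for axis, c in enumerate(coords):
               cell_id |= ((c >> bit) & 1) << (3 * bit + axis)
       return level, cell_id


   extent = (10.0, 10.0, 10.0)
   chair = (7.0, 2.0, 9.0)
   print(octree_cell_id(chair, extent, 1))   # (1, 5)
   print(octree_cell_id(chair, extent, 2))   # (2, 44)

The level-2 id shifted right by three bits (44 >> 3 = 5) equals the level-1 id. This is the coarse-to-fine property the hierarchical choice relies on.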

**Trade-off captured by ticket §6:** discretization tax. Every
spatial claim incurs grid-rounding cost; ε at world-model
frontiers must remain tight enough to be useful. Part 4 states
the per-kernel ε bounds; empirical validation is open work (see
Closure).

§2.3 Frame canonicalization
~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommendation: B from ticket §2.2 — frame as committed
object with explicit transforms.**

Every observation declares its frame:

.. code-block:: json

   {
     "frame_id": "0xdead123",
     "frame_kind": "object-fixed | global-ECEF | gravity-aligned-local | …",
     "frame_definition": "<canonical bytes>",
     "parent_frame_id": "0xparent…",
     "transform_to_parent": "<pose 4×4 quantized>"
   }

The ``frame_id`` is SHA-256 of the canonical frame definition.
Cross-frame reasoning requires explicit transforms, which are also
committed. This matches v7's "every causally relevant
transformation must be committed" axiom.

Transform commitment shape: the transform is an SE(3) element
(conceptually a 4×4 homogeneous matrix). It is committed as a
rotation, quantized through the SO(3) → axis-angle integer
encoding, plus a quantized translation:

.. code-block:: text

   {
     "kind": "frame_transform",
     "from_frame_id": "...",
     "to_frame_id": "...",
     "axis_angle_quantized": [<int32 x 3>],
     "translation_quantized": [<int32 x 3>],
     "Δ_rot_milli_radians": 1,
     "Δ_trans_micrometers": 100
   }

The Δ_rot / Δ_trans values pin the quantization grid; reusing
the same Δ across transforms in a manifest is encouraged.
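
The following sketch shows one way to produce the integer payload of a ``frame_transform`` from a float axis-angle vector (radians) and a translation (metres), using the Δ values from the example above. The helper names are hypothetical. Round-to-nearest keeps each component within half a grid step, and the range check mirrors the ``<int32 x 3>`` fields.

.. code-block:: python

   import math

   DELTA_ROT_RAD = 1e-3     # "Δ_rot_milli_radians": 1
   DELTA_TRANS_M = 100e-6   # "Δ_trans_micrometers": 100
   INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


   def _to_int32(value, step):
       q = round(value / step)          # nearest grid point
       if not INT32_MIN <= q <= INT32_MAX:
           raise OverflowError("component does not fit the int32 field")
       return q


   def quantize_transform(axis_angle_rad, translation_m,
                          d_rot=DELTA_ROT_RAD, d_trans=DELTA_TRANS_M):
       """Return (axis_angle_quantized, translation_quantized) int lists."""
       return ([_to_int32(v, d_rot) for v in axis_angle_rad],
               [_to_int32(v, d_trans) for v in translation_m])


   def dequantize_transform(aa_q, tr_q,
                            d_rot=DELTA_ROT_RAD, d_trans=DELTA_TRANS_M):
       """Map the committed integers back to radians and metres."""
       return [q * d_rot for q in aa_q], [q * d_trans for q in tr_q]


   aa_q, tr_q = quantize_transform([0.0, 0.0, math.pi / 2], [1.0, -0.25, 0.0])
   print(aa_q, tr_q)   # [0, 0, 1571] [10000, -2500, 0]

When the same Δ is reused across a manifest, transforms from different sources land on one shared grid. Their integer payloads then compose without re-gridding.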

§2.4 Temporal canonicalization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Recommendation:** B (per-substrate clock) for single-agent
deployments; C (Lamport / vector clocks) for multi-agent.
Substrate manifest declares which mode.

Single-agent manifest:

.. code-block:: json

   {
     "clock": {
       "kind": "wall_clock_ms_since_epoch",
       "epoch": "2025-01-01T00:00:00Z",
       "Δ_t_ms": 1
     }
   }

Multi-agent manifest:

.. code-block:: json

   {
     "clock": {
       "kind": "lamport",
       "agent_id": "0xagent42",
       "vector_dim": 8,
       "tie_break": "agent_id_lexical_order"
     }
   }

Mixed deployments use B locally + C across agents — observations
record both their wall-clock value AND their Lamport vector
position. Cross-substrate joins (v7-W to arborist's
``time-series-quantized@v1``) need explicit clock transforms,
also committed.
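
The next sketch covers the multi-agent mode with vector clocks. It reads the manifest's ``vector_dim`` as the number of agent slots, and the function names and observation layout are assumptions. If observation a happened-before b, every component of a's clock is ≤ b's and at least one is smaller, so a's component sum is strictly smaller. Sorting by (sum, agent_id) therefore respects causality, and ``agent_id_lexical_order`` only decides between causally concurrent observations.

.. code-block:: python

   def vc_tick(clock, self_index):
       """Local event: advance this agent's own component."""
       clock = list(clock)
       clock[self_index] += 1
       return clock


   def vc_receive(local, remote, self_index):
       """Receive: element-wise max with the sender's clock, then tick."""
       merged = [max(a, b) for a, b in zip(local, remote)]
       return vc_tick(merged, self_index)


   def happened_before(a, b):
       """True if clock a causally precedes clock b."""
       return a != b and all(x <= y for x, y in zip(a, b))


   def total_order_key(obs):
       # Causal predecessors have strictly smaller sums; equal sums are
       # concurrent (or identical) and fall back to agent_id lexical order.
       return (sum(obs["vclock"]), obs["agent_id"])

Comparing concurrent pairs by agent_id alone is not transitive once causal pairs are mixed in. Keying on the sum first avoids that, giving one deterministic linear order that is safe to commit.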

§2.5 Probabilistic commitment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Default: A — quantized centi-confidence (0–100 integer).**

.. code-block:: json

   {
     "claim": "...",
     "confidence_centi": 73
   }

**Opt-in: B — range commitment** for safety-critical deployments
(medical, robotics) where confidence intervals matter:

.. code-block:: json

   {
     "claim": "...",
     "confidence_lo_centi": 65,
     "confidence_hi_centi": 80
   }

The substrate manifest declares which mode is in effect for the
deployment. Mixed deployments split modes per claim-class (for
example, ``object_class_probability`` uses B and
``observation_existence`` uses A).
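
A sketch of both confidence modes, assuming the model emits a float probability in [0, 1]. Point mode rounds to the nearest centi-unit. Python's ``round`` breaks exact .5 ties toward even, so a manifest would need to pin the tie rule. Range mode widens outward so the committed interval never claims more certainty than the input. Function names are hypothetical.

.. code-block:: python

   import math


   def to_centi(p: float) -> int:
       """Quantize a probability to the 0..100 integer grid."""
       if not 0.0 <= p <= 1.0:
           raise ValueError("probability must lie in [0, 1]")
       return round(p * 100)


   def point_claim(claim: str, p: float) -> dict:
       """Mode A: a single quantized centi-confidence."""
       return {"claim": claim, "confidence_centi": to_centi(p)}


   def range_claim(claim: str, p_lo: float, p_hi: float) -> dict:
       """Mode B: an interval, rounded outward (floor low end, ceil high end)."""
       if not 0.0 <= p_lo <= p_hi <= 1.0:
           raise ValueError("need 0 <= p_lo <= p_hi <= 1")
       return {"claim": claim,
               "confidence_lo_centi": math.floor(p_lo * 100),
               "confidence_hi_centi": math.ceil(p_hi * 100)}

A float product can land a hair above an integer, in which case the range widens by one extra unit. That error stays on the conservative side.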

§2.6 The five canonical tuple-classes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Each gets its own π*_w canonical projection.

**π*_w_object** — object instances:

.. code-block:: text

   { id, class_label, frame_id, bbox_quantized, pose_quantized,
     confidence_centi, observed_at_logical_time }

**π*_w_relation** — directed relations between two objects:

.. code-block:: text

   { subject_id, predicate, object_id, time_window,
     confidence_centi }

The ``predicate`` is from a substrate-declared whitelist
(``contains``, ``adjacent_to``, ``approaching``, ``occludes``,
``supports``, …) to keep the canonical bytes deterministic.
Adding a new predicate requires a substrate version bump.
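
A sketch of π*_w_relation's input check and encoding. It assumes the whitelist is a tuple of strings and the canonical bytes are sorted-key compact JSON. The whitelist contents and ``relation_bytes`` are hypothetical. Rejecting unknown predicates at encode time is what keeps the byte output deterministic.

.. code-block:: python

   import json

   # Hypothetical whitelist for one substrate version; adding an entry
   # means bumping the version.
   PREDICATES = ("adjacent_to", "approaching", "contains", "occludes", "supports")


   def relation_bytes(subject_id, predicate, object_id, t_start, t_end,
                      confidence_centi, predicates=PREDICATES):
       """Validate a relation tuple and return its canonical bytes."""
       if predicate not in predicates:
           raise ValueError(f"predicate {predicate!r} is not whitelisted")
       if t_end < t_start:
           raise ValueError("time_window must satisfy t_start <= t_end")
       if not 0 <= confidence_centi <= 100:
           raise ValueError("confidence_centi must be in 0..100")
       record = {
           "subject_id": subject_id,
           "predicate": predicate,
           "object_id": object_id,
           "time_window": [t_start, t_end],
           "confidence_centi": confidence_centi,
       }
       return json.dumps(record, sort_keys=True,
                         separators=(",", ":")).encode("ascii")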

**π*_w_event** — temporally-extended interactions:

.. code-block:: text

   { type, t_start, t_end, participants, place_id, confidence_centi }

**π*_w_place** — recognizable locations:

.. code-block:: text

   { id, frame_of_reference, geometry, parent_place_id }

Geometry is hierarchical-cell-set encoded — a list of (level,
cell_id) pairs covering the place's extent.
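
The same extent can be listed in many orders, with a cell repeated, or with a child next to its parent, so the geometry needs a normal form before hashing. The sketch below assumes octree Morton ids, where a cell's parent is ``cell_id >> 3``. That addressing is an assumption, not fixed by this paper. The sketch sorts, deduplicates, and drops any cell already covered by a coarser cell in the set.

.. code-block:: python

   def canonical_geometry(cells, bits_per_level=3):
       """Normalize a list of (level, cell_id) pairs covering a place.

       bits_per_level is 3 for an octree and would be 2 for a quadtree.
       """
       unique = sorted({(int(level), int(cid)) for level, cid in cells})
       present = set(unique)
       kept = []
       for level, cid in unique:
           ancestors = ((level - k, cid >> (bits_per_level * k))
                        for k in range(1, level + 1))
           if not any(a in present for a in ancestors):
               kept.append((level, cid))   # not covered by a coarser cell
       return kept

Merging a complete set of siblings into their parent would be a further normalization step. It is not shown here.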

**π*_w_agent_trace** — an agent's path through space-time:

.. code-block:: text

   { id, frame_id, trace: [(t, pose_quantized, attention_target?), …] }

The trace is a temporally quantized sequence; it reuses
``time-series-quantized@v1``'s sample-array discipline along each
of the trace's spatial dimensions.


Part 3 — Theorems
-----------------

§3.1 T1-W — State binding
~~~~~~~~~~~~~~~~~~~~~~~~~

**Statement.** For any world-state commitment ``C(W)`` produced
by π*_w (the SHA-256 digest of π*_w's canonical bytes), ``C(W)``
uniquely determines, up to SHA-256 collisions, the canonical
(object, relation, event, place, agent_trace) tuples that π*_w
accepted as input, modulo the published grid / frame / clock
manifest.

**Proof sketch.** π*_w is a deterministic byte-projection on
quantized integer state (per the §2.1 hard constraint), so its
output is canonical bytes. SHA-256 is collision-resistant under
A3, so exhibiting two distinct canonical byte strings with the
same ``C(W)`` would require a SHA-256 collision. Therefore the
commitment determines the input modulo the equivalence class
π*_w defines (different surface representations that
canonicalize identically map to the same commitment).

The "modulo manifest" qualifier matters: same input under
different manifests (different Δ_x, Δ_t, frame definition) can
produce different commitments. This is by design — the manifest
is part of identity.

§3.2 T2-W — Causal completeness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Statement.** A v7-W commitment chain (a sequence of
committed observations + transforms over time) is
**causally complete** if and only if every state transition
between consecutive observations is justified by a committed
transform OR a committed event.

**What this rules out.** Silent state changes — an object's
pose updating between two observations without a corresponding
ego-motion transform OR a corresponding event explaining the
update — break causal completeness.

**Operational meaning.** A v7-W chain that fails T2-W's check
is missing data: either the agent moved without recording its
ego-motion (covered by ``pose_integration`` ε-frontier), or
something happened in the world (covered by a committed event).
Either way, the gap is observable as a chain-check failure.
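
A sketch of the T2-W chain check under simplifying assumptions. Each observation carries an integer tick ``t``, an ``object_id``, and a quantized ``pose``. A committed ego-motion transform whose window overlaps the interval explains a pose change for any object. A committed event explains it only for its participants. The field names and the inclusive-overlap rule are assumptions for illustration.

.. code-block:: python

   def _overlaps(item, t_prev, t_next):
       # Inclusive overlap between [t_start, t_end] and [t_prev, t_next].
       return item["t_start"] <= t_next and item["t_end"] >= t_prev


   def unexplained_transitions(observations, transforms, events):
       """Return (object_id, t_prev, t_next) gaps; empty means T2-W holds."""
       by_object = {}
       for obs in sorted(observations, key=lambda o: (o["object_id"], o["t"])):
           by_object.setdefault(obs["object_id"], []).append(obs)

       gaps = []
       for object_id, seq in by_object.items():
           for prev, nxt in zip(seq, seq[1:]):
               if prev["pose"] == nxt["pose"]:
                   continue  # no state change, nothing to justify
               t0, t1 = prev["t"], nxt["t"]
               ego = any(_overlaps(tr, t0, t1) for tr in transforms)
               event = any(_overlaps(ev, t0, t1) and object_id in ev["participants"]
                           for ev in events)
               if not (ego or event):
                   gaps.append((object_id, t0, t1))
       return gaps

For example, a door whose pose changes across a committed "door opens" event passes. A chair that moves with neither an event nor ego-motion in the interval is reported as a gap.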

§3.3 T3-W — Frame-transform soundness
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Statement.** For any two committed frames F_a, F_b and a
committed transform T_{a→b}, applying T_{a→b} to a v7-W
commitment in F_a produces a v7-W commitment in F_b that is
ε-equivalent to a direct observation in F_b, where ε is bounded
by the canonical-projection π*_w's grid quantization.

**Practical implication.** Cross-frame reasoning is sound up to
the ε floor. Operators trading off precision (coarser grid →
faster) accept a measurable error budget per transform.

§3.4 T4-W — ε at affine frontiers
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

**Statement.** At each of the four canonical ε-frontiers
(``pose_integration``, ``observation_update``, ``object_logits``,
``relation_logits``; see Part 4), the verifier kernel is affine
on appropriately-canonicalized integer state, and the per-step
ε bound is the quantization granularity Δ_step.

**Why affine.** Pose integration under a small-time-step assumption
is a linear operator on the previous pose's integer-encoded state.
The Kalman-style observation update is affine in the residual.
Object / relation classification logits are affine pre-softmax.
Each maps cleanly to v7's affine-frontier discipline.

This is the **load-bearing theorem** for ε-proof composability
across v7 + v7-W. Without it, multimodal pipelines that route
through both substrates can't bound their cumulative ε.


Part 4 — Verifier kernels
--------------------------

The four canonical ε-frontiers, each implemented as a
deterministic integer kernel.

§4.1 pose_integration
~~~~~~~~~~~~~~~~~~~~~

Input: ``{prev_pose_quantized, ego_motion_axis_angle_quantized,
ego_motion_translation_quantized, Δ_t_ticks}``.

Output: ``next_pose_quantized``.

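Operator: SE(3) composition. Affine in axis-angle +
translation under a small-time-step assumption. Quantization
follows the manifest's Δ_rot / Δ_trans declarations.

ε bound: ``Δ_step_pose = Δ_trans + Δ_rot · |translation_max|``,
a per-step positional bound with Δ_rot in radians. A rotation
error of Δ_rot displaces the rotated ego-motion translation by at
most ``Δ_rot · |translation_max|``.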
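
A planar (SE(2)) simplification of the kernel shows the integer-in / integer-out shape. Headings are integers in units of Δ_rot, positions are integers in units of Δ_trans, and the ego-motion translation is rotated into the parent frame and re-quantized. The SE(2) reduction and the function name are assumptions; a full SE(3) kernel composes 3-D rotations instead. ``math.cos`` / ``math.sin`` are not guaranteed bit-identical across platforms, so a verifier that must reproduce bytes exactly would need fixed-point or table-driven trigonometry instead.

.. code-block:: python

   import math


   def pose_integration_se2(prev_q, ego_q, d_rot=1e-3):
       """Compose prev ∘ ego on quantized planar poses (theta_q, x_q, y_q).

       theta is in units of d_rot radians; x and y share one translation
       grid (Δ_trans), which cancels out because rotation is linear.
       """
       th_q, x_q, y_q = prev_q
       dth_q, dx_q, dy_q = ego_q
       theta = th_q * d_rot
       c, s = math.cos(theta), math.sin(theta)
       # Rotate the ego translation into the parent frame, then re-quantize;
       # rounding adds at most half a Δ_trans step per axis.
       nx = x_q + round(c * dx_q - s * dy_q)
       ny = y_q + round(s * dx_q + c * dy_q)
       return (th_q + dth_q, nx, ny)


   # Heading ≈ π/2 (1571 mrad): a forward step of 1000 grid units becomes +y.
   print(pose_integration_se2((1571, 0, 0), (5, 1000, 0)))   # (1576, 0, 1000)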

§4.2 observation_update
~~~~~~~~~~~~~~~~~~~~~~~

Input: ``{prior_state_quantized, observation_quantized,
innovation_covariance_quantized}``.

Output: ``posterior_state_quantized``.

Operator: standard Kalman update — affine in the innovation
``z - H·prior``. Covariance arithmetic uses bigint accumulator
discipline (v7 §5) to avoid analog leakage; final result is
re-quantized to the manifest's grid.

ε bound: Δ_step_observation = innovation_quantization +
covariance_quantization.
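
A scalar sketch of the update (state dimension 1, H = 1). It assumes the state, the observation, the prior variance, and the innovation covariance S = P + R arrive as integers on the manifest's grids. Python's ``fractions.Fraction`` stands in for the bigint accumulator: gain, posterior, and posterior variance are computed exactly and rounded once at the end. The rounding rule is round-half-to-even, which a manifest would need to pin. The function name is hypothetical.

.. code-block:: python

   from fractions import Fraction


   def observation_update_1d(prior_q, prior_var_q, z_q, innovation_cov_q):
       """Scalar Kalman update (H = 1) with exact rational intermediates.

       innovation_cov_q is S = P + R on the variance grid, so 0 <= P <= S.
       """
       if innovation_cov_q <= 0 or not 0 <= prior_var_q <= innovation_cov_q:
           raise ValueError("need S > 0 and 0 <= P <= S")
       gain = Fraction(prior_var_q, innovation_cov_q)     # K = P / S
       innovation = z_q - prior_q                         # z - H·prior
       posterior = prior_q + gain * innovation            # affine in the innovation
       posterior_var = prior_var_q - gain * prior_var_q   # (1 - K) · P
       # Re-quantize once, at the end (round() on a Fraction ties to even).
       return round(posterior), round(posterior_var)

With prior 100, P = 4, observation 110, and S = 8, the gain is exactly 1/2, so the posterior is 105 with variance 2 and no intermediate rounding.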

§4.3 object_logits
~~~~~~~~~~~~~~~~~~

Input: ``{object_features_quantized, classifier_weights_quantized}``.

Output: ``logits_quantized`` (pre-softmax integer vector).

Operator: linear projection (matrix-vector multiplication on
quantized integers). Affine. Reuses v7's existing
``object_logits`` ε-frontier definition; this entry pins the
v7-W view of the same kernel for cross-substrate ε proofs.

ε bound: Δ_step_object = manifest's per-class quantization
threshold.
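
A sketch of the integer projection, assuming features and weights are already quantized integers with scales Δ_f and Δ_w. Python integers do not overflow, which plays the role of v7's bigint accumulator, and the logits land on the Δ_f · Δ_w grid. Softmax preserves the ordering of logits, so the predicted class can be checked from the integer logits without evaluating softmax. Names are hypothetical.

.. code-block:: python

   def object_logits(features_q, weights_q, bias_q=None):
       """logits[k] = sum_j W[k][j] * f[j] (+ b[k]), all exact integers.

       bias_q, if given, must already be on the Δ_f · Δ_w grid.
       """
       if any(len(row) != len(features_q) for row in weights_q):
           raise ValueError("weight rows must match the feature length")
       logits = [sum(w * f for w, f in zip(row, features_q)) for row in weights_q]
       if bias_q is not None:
           logits = [z + b for z, b in zip(logits, bias_q)]
       return logits


   def predicted_class(logits):
       # Softmax preserves order, so argmax over logits picks the same class.
       return max(range(len(logits)), key=lambda k: logits[k])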

§4.4 relation_logits
~~~~~~~~~~~~~~~~~~~~

Input: ``{subject_features_quantized, object_features_quantized,
relation_classifier_weights_quantized}``.

Output: ``relation_logits_quantized`` (pre-softmax over the
substrate-declared predicate whitelist).

Operator: bilinear scoring on the (subject, object) feature
pair. Affine after the bilinear factorization is canonicalized
to its tensor-decomposed form (Tucker / CP).

ε bound: Δ_step_relation = predicate-whitelist size factor +
feature quantization.
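
A sketch of bilinear scoring with one integer matrix per whitelisted predicate, score_k = sᵀ W_k o. With the object features held fixed, each score is linear in the subject features, and vice versa; in this sense the kernel is affine in each argument separately. The dense W_k layout is an assumption. A Tucker / CP factorization would replace it with factor matrices.

.. code-block:: python

   def relation_logits(subject_q, object_q, weights_q):
       """One integer logit per predicate: sum_i sum_j s[i] * W_k[i][j] * o[j]."""
       logits = []
       for w_k in weights_q:                  # one matrix per whitelisted predicate
           if len(w_k) != len(subject_q) or any(len(r) != len(object_q) for r in w_k):
               raise ValueError("W_k must be len(subject) x len(object)")
           logits.append(sum(s * w * o
                             for s, row in zip(subject_q, w_k)
                             for w, o in zip(row, object_q)))
       return logits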


Part 5 — Multimodal composition with v7
----------------------------------------

§5.1 Where v7 ends, v7-W begins
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A v7 vision encoder maps pixels → embeddings. The encoder is
v7's domain (commit conv kernels, attention layers, etc.). When
the embedding is consumed by a world-model that produces
**state**, the boundary crosses into v7-W.

Concrete example: a YOLO-style detector outputs (bounding boxes,
class probabilities) per image. The convolutional + detection
heads are v7. The (bbox, class, confidence) tuples per detected
object are v7-W ``π*_w_object`` inputs.

§5.2 Cross-substrate ε composition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a multimodal pipeline routes through both substrates,
cumulative ε is bounded by the sum of each substrate's ε bounds.
The terms add by the triangle inequality, provided no stage
amplifies the error it receives (each stage is non-expansive):

.. code-block:: text

   ε_total = ε_v7_encoder + ε_v7w_pose_integration +
             ε_v7w_observation_update + …

Each term is a frontier-named ε from the respective frontier
catalog. T4-W (above) ensures the v7-W terms are well-defined.

§5.3 Frame-transform anchoring
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When a v7 vision encoder operates on images from a moving
camera, the camera's frame is the v7-W output's frame. The
transform from camera-frame → world-frame is a v7-W commitment,
not a v7 model parameter. Splitting responsibility this way
keeps the v7 encoder generic (image-in, embedding-out) and
puts spatial-frame discipline entirely on the v7-W side.


Part 6 — Adversarial corners
-----------------------------

§6.1 Frame spoofing
~~~~~~~~~~~~~~~~~~~

An adversary commits a frame definition with adversarially chosen
geometry that makes innocuous observations look like target
events under the canonical projection. Mitigation:

- Frame definitions cite their **physical anchor**: a SHA-256
  of the surveying / measurement procedure that established the
  frame.
- Frames without committed anchors stay at minimum-warrant
  level (analogous to arborist's COPYRIGHT_FOOTER tier).
- Cross-frame transforms inherit warrant from their source
  frames; chains route through the weakest link.

§6.2 Time skew
~~~~~~~~~~~~~~

An adversary commits observations with manipulated timestamps to
fabricate causal dependencies. Mitigation:

- Single-agent: substrate manifest pins clock skew bound
  Δ_clock_max; observations outside that window are flagged.
- Multi-agent: Lamport vector clocks make skew observable as a
  vector-clock inconsistency rather than a wall-clock rewrite.
- Cross-substrate: v7-W observation timestamps must agree with
  arborist's ``time-series-quantized@v1`` Δ_t for any signal
  the observation references; mismatches surface as
  cross-substrate chain breaks.
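
A sketch of the single-agent skew check, under two assumptions the paper leaves open. First, the reference is the committing substrate's own clock reading when it receives the observation (``received_t`` here). Second, observations from one agent must not go backwards in time. Field names and flag labels are hypothetical.

.. code-block:: python

   def flag_time_skew(observations, delta_clock_max):
       """Return (observation id, reason) pairs, processing in receipt order."""
       flagged = []
       last_accepted = {}                       # agent_id -> latest accepted t
       for obs in observations:
           agent, claimed = obs["agent_id"], obs["t"]
           if abs(obs["received_t"] - claimed) > delta_clock_max:
               flagged.append((obs["id"], "outside_skew_window"))
           elif claimed < last_accepted.get(agent, claimed):
               flagged.append((obs["id"], "non_monotonic"))
           else:
               last_accepted[agent] = claimed  # only accepted stamps advance

       return flagged

Only accepted timestamps advance the per-agent high-water mark. As a result, a single forged future timestamp cannot cause every later honest observation to be flagged.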

§6.3 Observation injection
~~~~~~~~~~~~~~~~~~~~~~~~~~

An adversary fabricates observations with high confidence_centi to
poison the world-model's state. Mitigation:

- Per-observer **observation budget** committed in the manifest
  — caps the rate at which a single agent can publish high-
  confidence claims without supporting evidence.
- Cross-witness agreement (analogous to arborist's #000028
  multi-witness): an observation backed by independent
  observers in independent frames warrants more strongly than a
  solo observation.
- Adversarial-corner ticket (#000018-W follow-up) for the
  formal version of this — modeled on #000018's threat-model
  + reduction approach.
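
A sketch of one possible observation budget: a per-observer sliding window that caps how many high-confidence claims without supporting evidence are admitted. The threshold (90 centi), the window length, and the ``supported`` flag are assumptions. The paper only says the budget is committed in the manifest.

.. code-block:: python

   from collections import defaultdict, deque


   class ObservationBudget:
       """Sliding-window cap on unsupported high-confidence claims per observer."""

       def __init__(self, max_claims, window_ticks, high_conf_centi=90):
           self.max_claims = max_claims
           self.window_ticks = window_ticks
           self.high_conf_centi = high_conf_centi
           self._recent = defaultdict(deque)   # observer_id -> claim ticks

       def admit(self, observer_id, t, confidence_centi, supported):
           """Return True if the claim fits; assumes t never decreases per observer."""
           if supported or confidence_centi < self.high_conf_centi:
               return True                     # only unsupported high-confidence claims count
           window = self._recent[observer_id]
           while window and window[0] <= t - self.window_ticks:
               window.popleft()                # drop claims older than the window
           if len(window) >= self.max_claims:
               return False
           window.append(t)
           return True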

§6.4 Privacy
~~~~~~~~~~~~

A world-state commitment substrate is also a surveillance
substrate. Frame discipline and Phase-2 ZK (ticket #000016)
matter more here than for text/logic. The substrate manifest
**MUST** declare the deployment's privacy class:

.. code-block:: json

   {
     "privacy": {
       "class": "public | aggregated_only | ZK_with_selective_disclosure",
       "...": "..."
     }
   }

Public class is the default for openly-published research /
mapping deployments. ZK-with-selective-disclosure is required
for any deployment where individual agents' positions or
identities should not be inferable from the commitment chain.


Appendix — Worked example: toy SLAM
-----------------------------------

A small concrete trace demonstrating the substrate end-to-end.

**Setup.** One agent navigating a 10×10×3 m room. RGB camera +
IMU. Detects three persistent objects (a chair, a desk, a door)
and produces a path trace.

**Manifest (committed once at boot).**

.. code-block:: json

   {
     "v7w_version": "v0",
     "grid": {"type": "octree", "level_min": 0, "level_max": 18,
              "extent_meters": [10.0, 10.0, 3.0]},
     "frames": {
       "world": {"kind": "gravity-aligned-local",
                 "anchor": "<sha256 of survey procedure>"},
       "camera_initial": {"parent": "world",
                          "transform": "<identity>"}
     },
     "clock": {"kind": "wall_clock_ms_since_epoch",
               "Δ_t_ms": 1},
     "predicates": ["adjacent_to", "supports", "occludes",
                    "approaches"],
     "privacy": {"class": "public"}
   }

**Per-tick observations.** At each tick, the agent commits:

1. ``π*_w_agent_trace`` — appends a (t, pose, attention) point.
2. ``π*_w_object`` per detected object — bbox + confidence.
3. ``π*_w_relation`` per detected relation — e.g. "chair
   adjacent_to desk", "agent approaches door".
4. ``π*_w_event`` if a state-change occurred — e.g. "door opens
   at t=4521".

**Commitment chain** (10-tick trace, ~50 observations, ~30 KB
compressed bytes). Audit replay walks the chain, verifying each
``observation_update`` ε bound against the manifest's grid
declarations and confirming T2-W causal completeness.

**ε budget for the worked example.**

.. code-block:: text

   per-tick total ε:
       pose_integration  ≈ Δ_rot + Δ_trans
                         ≈ 1 mrad + 100 µm                   = bounded
       observation_update per object ≈ bbox-quantization
                         ≈ 1 cell at level 18                = bounded
       object_logits     ≈ class-probability granularity
                         ≈ 1/100 (centi-confidence)          = bounded
       relation_logits   ≈ predicate-whitelist + feature
                         quantization                        = bounded

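   ε_total over 10 ticks: bounded sum of the above. Under worst-case
       linear accumulation, pose quantization alone contributes at most
       10 × 100 µm = 1 mm of translation and 10 × 1 mrad = 10 mrad of
       rotation, well under what a downstream consumer (motion planner /
       safety filter) would care about for room-scale navigation.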
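
The budget can be sanity-checked with a few lines of arithmetic. The sketch assumes worst-case linear accumulation of per-tick pose quantization error. It takes Δ_rot and Δ_trans from the ε table, since the worked-example manifest does not declare them, and also computes the level-18 cell edge implied by the manifest's extent.

.. code-block:: python

   GRID_LEVEL = 18
   EXTENT_M = (10.0, 10.0, 3.0)     # from the worked-example manifest
   DELTA_ROT_RAD = 1e-3             # 1 mrad, from the ε table
   DELTA_TRANS_M = 100e-6           # 100 µm, from the ε table
   TICKS = 10

   # Edge length of a level-18 octree cell along each axis.
   cell_size_m = [e / 2**GRID_LEVEL for e in EXTENT_M]

   # Worst case: every tick's pose-quantization error adds linearly.
   rot_drift_rad = TICKS * DELTA_ROT_RAD
   trans_drift_m = TICKS * DELTA_TRANS_M

   print(", ".join(f"{c * 1e6:.2f} µm" for c in cell_size_m))   # 38.15 µm, 38.15 µm, 11.44 µm
   print(f"{trans_drift_m * 1e3:.1f} mm, {rot_drift_rad * 1e3:.1f} mrad")   # 1.0 mm, 10.0 mrad
   print(f"{trans_drift_m / cell_size_m[0]:.1f} level-18 cells along x")   # 26.2 level-18 cells along x

Under these assumptions the ten-tick translational drift spans about 26 level-18 cells. The accumulated pose quantization, rather than the spatial grid, is therefore the larger positional term.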


Out of scope (re-stated from #000013)
-------------------------------------

- Running a SLAM stack inside arborist. v7-W defines the
  commitment substrate; world-model engines (SLAM, Gaussian
  splatting, predictive video) plug in via adapters that are
  separate tickets.
- Cross-modal joint reasoning (text claim + spatial state).
  Needs the cross-domain π* composition theorem (#000015).
- Specific sensor adapters (LIDAR, RGB-D, IMU). Each is a
  separate source-adapter ticket once v7-W lands.


Closure
-------

Closure criterion (#000013 §7): this document plus
``docs/v7w-frontier-catalog.md`` plus ``arborist/world/__init__.py``
namespace stub. All three land in the same commit closing the
ticket.

Open questions tracked separately:

- Standards adoption — S2 vs custom octree (left-as-recommendation
  per §2.2; deployment can override via manifest).
- Privacy implementation — Phase-2 ZK ticket (#000016) covers the
  cryptographic side once #000013 has a deployment target.
- Discretization tax — empirical bench of ε at frontier kernels
  on representative deployments. Captured here as future work.

Implementation tickets that cite this paper land later, one per
verifier kernel + sensor adapter.

Status: closure-draft 2026-05-09. Ready for fox + downstream
review.
