Merkle-AGI v7-W — Spatial-Temporal Substrate
- Author: fox + agent blackops, on the unsandbox / unturf / permacomputer platform
- Date: 2026-05-09 (draft v0)
- Status: substrate paper for ticket #000013; closure draft.
This document specifies the third commitment substrate in the Merkle-AGI lineage — sister to v7 (logic / math) and arborist v9.8 (language / claim-lattice). v7-W commits derived spatial-temporal world-state: objects, relations, events, places, agent traces, observations. It is the substrate a world-model dreams in.
Read alongside:
docs/v7w-frontier-catalog.md: the canonical ε-frontier catalog.
docs/pi-star.rst: the π* registry where the v7-W kernels will register once Phase 1 lands.
The v7 plastic-training spec (sister substrate; multimodal composition rules apply at v7-W ↔ v7 boundaries).
Part 1 — Introduction & motivation
Fox’s framing identifies three substrates needed for ASI:
3 > (language, logic, substrate)
1 = recursive-falsification merkle-agi (logic / math)
2 = language / claim-lattice (arborist v9.8)
3 = spatial-temporal vision / world-models ← v7-W
Without a world-state substrate, an agent cannot:
Maintain persistent identity across time. Cache is keyed on text; physical “where am I” has no commitment.
Reason causally about physics. Logic substrate verifies proof steps; world substrate verifies “what happened next.”
Ground multi-modal claims. A claim about an object’s location can’t be falsified without a state commitment.
Support robotic / embodied use. No way to commit “I observed X at (t, x, y, z) with confidence c.”
Audit video / time-series. No π* for temporal signals beyond time-series-quantized@v1 (which handles 1D scalar streams, not multi-object scenes).
v7 §11 multimodal composition handles vision + language structurally: it commits the conv kernel, the bridge, and the projection. It does not commit world-state (the abstract scene representation, spatial relations, temporal predicates). Those are derived from the model’s forward pass and never committed as first-class objects.
What this substrate is NOT
Not a SLAM stack, not a renderer, not a video codec.
Not raw pixels or raw audio.
Not the model that produces world-state. (That’s v7’s domain.)
What it IS
The canonical-encoding + commitment layer for derived world-state:
- objects: { id, class, bbox, pose, confidence }
- relations: { subject_id, predicate, object_id, time_window }
- events: { type, t_start, t_end, participants, place }
- places: { id, frame_of_reference, geometry, parent_place }
- agents: { id, position_trace, pose_trace, attention_trace }
- observations: { observer_agent, t, frame, claim_about,
confidence }
Each tuple-class has a canonical-projection π*_w mapping the free-form input to byte-deterministic canonical bytes. SHA-256 of the canonical bytes is the equivalence-class identity.
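A minimal sketch of the projection-then-hash step follows. It assumes canonical bytes are UTF-8 JSON with sorted keys and no insignificant whitespace. The document does not fix the byte encoding, so that choice and the helper names `canonical_bytes` and `commit` are hypothetical. Floats are rejected to mirror the integer-state rule of §2.1:

```python
import hashlib
import json
from typing import Any


def _reject_floats(value: Any) -> None:
    """Enforce integer state: continuous values must be gridded before projection."""
    if isinstance(value, float):
        raise TypeError("continuous value must be quantized before canonicalization")
    if isinstance(value, dict):
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("keys must be strings")
            _reject_floats(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            _reject_floats(item)


def canonical_bytes(tuple_: dict) -> bytes:
    """Hypothetical π*_w: sorted keys, compact separators, UTF-8."""
    _reject_floats(tuple_)
    return json.dumps(
        tuple_, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def commit(tuple_: dict) -> str:
    """Equivalence-class identity: SHA-256 over the canonical bytes."""
    return hashlib.sha256(canonical_bytes(tuple_)).hexdigest()


a = {"id": "chair-1", "class": "chair", "confidence_centi": 91}
b = {"confidence_centi": 91, "class": "chair", "id": "chair-1"}
print(commit(a) == commit(b))  # True: key order is not part of identity
```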
Part 2 — Substrate definition
§2.1 Hard constraints (carried from #000013)
Stays inside SQD A1 (canonical encoding), A2 (public quantization), A3 (collision-resistant hash).
No new axiom.
Every π*_w is defined on quantized integer state, not on continuous tensors. Continuous data (poses, bounding boxes, confidences) gets gridded explicitly; the grid choice is part of the commitment.
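The explicit gridding can be sketched as below. The sketch assumes values arrive as decimal strings and that ties round to the even grid index. Both the tie rule and the helper names are illustrative assumptions, not part of the spec. Exact `Fraction` arithmetic keeps every verifier on the same integer regardless of platform floating point:

```python
from fractions import Fraction


def quantize(value: str, delta: str) -> int:
    """Nearest grid index k with value ≈ k * delta.

    Fraction arithmetic is exact, so the result does not depend on platform
    floating point; round() on a Fraction sends ties to the even index.
    """
    step = Fraction(delta)
    if step <= 0:
        raise ValueError("grid step must be positive")
    return round(Fraction(value) / step)


def gridded(value: str, delta: str) -> dict:
    # The step travels with the index: the grid choice is part of the commitment.
    return {"delta": delta, "index": quantize(value, delta)}


print(gridded("1.2345", "0.001"))  # {'delta': '0.001', 'index': 1234}
print(gridded("1.2345", "0.01"))   # {'delta': '0.01', 'index': 123}
```

The same measurement under two grids yields two different committed tuples. That is the sense in which the grid choice is part of the commitment.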
§2.2 Hierarchical grid spatial discretization
Recommended: S2-style hierarchical cells for Earth-scale geographic frames; octree levels for object-fixed local frames; quadtree levels for image-plane / 2D floor plans.
The substrate manifest declares its set of grid levels:
{
"grid": {
"type": "octree",
"frame": "object-fixed",
"origin_committed": "<32-byte SHA-256 of frame definition>",
"level_min": 0,
"level_max": 24,
"extent_meters": [10.0, 10.0, 10.0]
}
}
Each commitment names which level it’s anchored at:
{
"kind": "object_observation",
"grid_level": 18,
"cell_id": "0xabc12...",
"...": "..."
}
Why hierarchical: matches how spatial reasoning works (coarse-to-fine), matches established standards (S2 / H3 for geographic indexing, octree for SLAM voxel grids, quadtree for mapping), and keeps π*_w finitely specified with one cell-id per declared level.
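One way to compute an octree cell id is sketched below. It assumes the frame origin sits at one corner of `extent_meters`, and that a cell id is the pair (level, Morton code) with x in the lowest interleaved bit. The manifest only says `cell_id`, so this encoding and all function names are hypothetical. The coarse-to-fine property shows up as a 3-bit shift from a cell to its parent:

```python
import math
from fractions import Fraction


def axis_index(coord: str, extent: str, level: int) -> int:
    """Index along one axis of the level-`level` cell containing coord."""
    c, e = Fraction(coord), Fraction(extent)
    if not 0 <= c <= e:
        raise ValueError("point outside the manifest extent")
    n = 1 << level                             # 2**level cells per axis
    return min(math.floor(c / e * n), n - 1)   # far face belongs to the last cell


def morton3(ix: int, iy: int, iz: int, level: int) -> int:
    """Interleave the low `level` bits of three axis indices (x, y, z order)."""
    code = 0
    for b in range(level):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code


def cell_id(point, extent, level: int) -> tuple[int, int]:
    ix, iy, iz = (axis_index(c, e, level) for c, e in zip(point, extent))
    return level, morton3(ix, iy, iz, level)


def parent(cell: tuple[int, int]) -> tuple[int, int]:
    level, code = cell
    return level - 1, code >> 3                # drop the finest x/y/z bit triple


room = ("10.0", "10.0", "3.0")
print(cell_id(("1.0", "2.5", "0.3"), room, 4))  # (4, 133)
```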
Trade-off captured by ticket §6: the discretization tax. Every spatial claim incurs a grid-rounding cost, and ε at world-model frontiers must stay tight enough to be useful. Part 4 states the per-kernel ε bounds; empirical validation of the tax is future work (see Closure).
§2.3 Frame canonicalization
Recommendation: B from ticket §2.2 — frame as committed object with explicit transforms.
Every observation declares its frame:
{
"frame_id": "0xdead123",
"frame_kind": "object-fixed | global-ECEF | gravity-aligned-local | …",
"frame_definition": "<canonical bytes>",
"parent_frame_id": "0xparent…",
"transform_to_parent": "<pose 4×4 quantized>"
}
The frame_id is SHA-256 of the canonical frame definition.
Cross-frame reasoning requires explicit transforms, which are also committed. This matches v7’s “every causally relevant transformation must be committed” axiom.
Transform commitment shape (4×4 SE(3) homogeneous matrix, rotational components quantized via SO(3) → axis-angle integer encoding):
{
"kind": "frame_transform",
"from_frame_id": "...",
"to_frame_id": "...",
"axis_angle_quantized": [<int32 x 3>],
"translation_quantized": [<int32 x 3>],
"Δ_rot_milli_radians": 1,
"Δ_trans_micrometers": 100
}
The Δ_rot / Δ_trans values pin the quantization grid; reusing the same Δ across transforms in a manifest is encouraged.
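A sketch of building the frame_transform commitment follows. Its inputs are a rotation vector (axis × angle, in radians) and a translation in meters, both given as decimal strings. Converting a rotation matrix to axis-angle is not shown, and the function names and frame ids are placeholders. The defaults reuse the Δ values from the shape above, so transforms built with the same defaults share one grid:

```python
from fractions import Fraction

INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1


def _q(value: str, step: Fraction) -> int:
    k = round(Fraction(value) / step)          # exact; ties go to the even index
    if not INT32_MIN <= k <= INT32_MAX:
        raise OverflowError("component does not fit the int32 encoding")
    return k


def frame_transform(from_id, to_id, axis_angle_rad, translation_m,
                    d_rot_mrad=1, d_trans_um=100) -> dict:
    rot_step = Fraction(d_rot_mrad, 1000)           # radians per grid step
    trans_step = Fraction(d_trans_um, 1_000_000)    # meters per grid step
    return {
        "kind": "frame_transform",
        "from_frame_id": from_id,
        "to_frame_id": to_id,
        "axis_angle_quantized": [_q(a, rot_step) for a in axis_angle_rad],
        "translation_quantized": [_q(t, trans_step) for t in translation_m],
        "Δ_rot_milli_radians": d_rot_mrad,
        "Δ_trans_micrometers": d_trans_um,
    }


t = frame_transform("0xcam", "0xworld", ("0", "0", "1.5708"), ("0.25", "-1.0", "0.0123"))
print(t["axis_angle_quantized"], t["translation_quantized"])  # [0, 0, 1571] [2500, -10000, 123]
```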
§2.4 Temporal canonicalization
Recommendation: B (per-substrate clock) for single-agent deployments; C (Lamport / vector clocks) for multi-agent. Substrate manifest declares which mode.
Single-agent manifest:
{
"clock": {
"kind": "wall_clock_ms_since_epoch",
"epoch": "2025-01-01T00:00:00Z",
"Δ_t_ms": 1
}
}
Multi-agent manifest:
{
"clock": {
"kind": "lamport",
"agent_id": "0xagent42",
"vector_dim": 8,
"tie_break": "agent_id_lexical_order"
}
}
Mixed deployments use B locally and C across agents: observations record both their wall-clock value AND their Lamport vector position. Cross-substrate joins (v7-W to arborist’s time-series-quantized@v1) need explicit clock transforms, which are also committed.
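The multi-agent manifest names a Lamport clock with `vector_dim` entries, which reads as a vector clock. A minimal sketch under that reading follows, assuming each agent owns one fixed slot in the vector; all names are hypothetical. A fabricated dependency shows up as a claimed happened-before relation that the vectors do not support. Replay order breaks ties between concurrent stamps by agent_id, per the manifest's `tie_break`:

```python
from dataclasses import dataclass

Vector = tuple[int, ...]


@dataclass(frozen=True)
class Stamp:
    agent_id: str
    vector: Vector          # one counter per agent slot (vector_dim entries)


def tick(v: Vector, slot: int) -> Vector:
    """Local event: the agent increments its own slot."""
    out = list(v)
    out[slot] += 1
    return tuple(out)


def receive(local: Vector, remote: Vector, slot: int) -> Vector:
    """Message receipt: element-wise max with the sender, then a local tick."""
    return tick(tuple(max(x, y) for x, y in zip(local, remote)), slot)


def happened_before(a: Vector, b: Vector) -> bool:
    return all(x <= y for x, y in zip(a, b)) and a != b


def replay_key(s: Stamp):
    # The counter sum is a linear extension of happened-before;
    # concurrent stamps fall back to agent_id lexical order.
    return (sum(s.vector), s.agent_id, s.vector)


zero = (0, 0, 0)
a = tick(zero, 0)            # agent in slot 0 observes something
b = receive(zero, a, 1)      # agent in slot 1 hears about it
c = tick(zero, 2)            # agent in slot 2 acts independently
print(happened_before(a, b), happened_before(a, c), happened_before(c, a))  # True False False
```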
§2.5 Probabilistic commitment
Default: A — quantized centi-confidence (0–100 integer).
{
"claim": "...",
"confidence_centi": 73
}
Opt-in: B — range commitment for safety-critical deployments (medical, robotics) where confidence intervals matter:
{
"claim": "...",
"confidence_lo_centi": 65,
"confidence_hi_centi": 80
}
The substrate manifest declares which mode is in effect for the deployment. Mixed deployments split modes per claim class (object_class_probability uses B; observation_existence uses A).
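Both modes can be sketched as follows, assuming probabilities arrive as decimal strings. Two choices here are illustrative, not rules from the spec: mode A rounds to the nearest centi, and mode B widens outward by flooring the lower end and taking the ceiling of the upper end. Widening keeps the committed interval a superset of the measured one:

```python
import math
from fractions import Fraction


def _prob(p: str) -> Fraction:
    f = Fraction(p)
    if not 0 <= f <= 1:
        raise ValueError("probability must lie in [0, 1]")
    return f


def centi_point(p: str) -> dict:
    """Mode A: nearest integer in 0..100 (ties to even)."""
    return {"confidence_centi": round(_prob(p) * 100)}


def centi_range(p_lo: str, p_hi: str) -> dict:
    """Mode B: round outward so no part of [p_lo, p_hi] is dropped."""
    lo, hi = _prob(p_lo), _prob(p_hi)
    if lo > hi:
        raise ValueError("lower bound exceeds upper bound")
    return {"confidence_lo_centi": math.floor(lo * 100),
            "confidence_hi_centi": math.ceil(hi * 100)}


print(centi_point("0.734"))           # {'confidence_centi': 73}
print(centi_range("0.652", "0.798"))  # {'confidence_lo_centi': 65, 'confidence_hi_centi': 80}
```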
§2.6 The five canonical tuple-classes
Each gets its own π*_w canonical projection.
π*_w_object — object instances:
{ id, class_label, frame_id, bbox_quantized, pose_quantized,
confidence_centi, observed_at_logical_time }
π*_w_relation — directed relations between two objects:
{ subject_id, predicate, object_id, time_window,
confidence_centi }
The predicate comes from a substrate-declared whitelist (contains, adjacent_to, approaching, occludes, supports, …) to keep the canonical bytes deterministic. Adding a new predicate requires a substrate version bump.
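Whitelist enforcement inside π*_w_relation can be sketched as below. The whitelist here holds only the five predicates named in the text; a real list is whatever the manifest declares. The `Relation` type and `canonical_relation` helper are hypothetical:

```python
from dataclasses import dataclass

# Illustrative whitelist: only the predicates named in the text. A deployment
# reads its full list from the substrate manifest.
PREDICATE_WHITELIST = ("contains", "adjacent_to", "approaching", "occludes", "supports")


@dataclass(frozen=True)
class Relation:
    subject_id: str
    predicate: str
    object_id: str
    time_window: tuple[int, int]   # (t_start, t_end) in clock ticks
    confidence_centi: int


def canonical_relation(r: Relation, whitelist=PREDICATE_WHITELIST) -> dict:
    """Validate a relation and return the integer-only dict that π*_w would encode."""
    if r.predicate not in whitelist:
        raise ValueError(f"predicate {r.predicate!r} not declared; needs a version bump")
    t0, t1 = r.time_window
    if t0 > t1:
        raise ValueError("time_window must satisfy t_start <= t_end")
    if not 0 <= r.confidence_centi <= 100:
        raise ValueError("confidence_centi outside 0..100")
    return {
        "subject_id": r.subject_id,
        "predicate": r.predicate,
        "object_id": r.object_id,
        "time_window": [t0, t1],
        "confidence_centi": r.confidence_centi,
    }


print(canonical_relation(Relation("chair-1", "adjacent_to", "desk-1", (10, 20), 88)))
```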
π*_w_event — temporally-extended interactions:
{ type, t_start, t_end, participants, place_id, confidence_centi }
π*_w_place — recognizable locations:
{ id, frame_of_reference, geometry, parent_place_id }
Geometry is hierarchical-cell-set encoded — a list of (level, cell_id) pairs covering the place’s extent.
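A list of (level, cell_id) pairs hashes deterministically only if it has one normal form. A sketch of one such normal form follows. It assumes octree cell ids are Morton codes, so a cell's parent is `cell_id >> 3` and its eight children are `(cell_id << 3) | k`. That encoding is an assumption, since the document does not fix it. Complete sibling groups merge into their parent, cells inside a listed ancestor are dropped, and the result is sorted:

```python
def normalize_geometry(cells):
    """Hypothetical canonical covering: merge full sibling sets, drop covered cells, sort."""
    s = set()
    for level, cid in cells:
        if level < 0 or not 0 <= cid < 8 ** level:
            raise ValueError(f"invalid cell ({level}, {cid})")
        s.add((level, cid))

    changed = True
    while changed:                                  # merge bottom-up to a fixpoint
        changed = False
        kids_by_parent = {}
        for level, cid in s:
            if level > 0:
                kids_by_parent.setdefault((level - 1, cid >> 3), set()).add(cid & 7)
        for (plevel, pcid), kids in kids_by_parent.items():
            if len(kids) == 8:                      # all eight children present
                s -= {(plevel + 1, (pcid << 3) | k) for k in range(8)}
                s.add((plevel, pcid))
                changed = True

    def covered(level, cid):
        return any((level - d, cid >> (3 * d)) in s for d in range(1, level + 1))

    return sorted(c for c in s if not covered(*c))


print(normalize_geometry([(2, 9), (1, 0), (2, 3)]))  # [(1, 0), (2, 9)]
```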
π*_w_agent_trace — an agent’s path through space-time:
{ id, frame_id, trace: [(t, pose_quantized, attention_target?), …] }
The trace is a temporally-quantized sequence; it reuses time-series-quantized@v1’s sample-array discipline for each of the trace’s spatial dimensions.
Part 3 — Theorems
§3.1 T1-W — State binding
Statement. For any world-state commitment C(W) produced by π*_w, the SHA-256 preimage of C(W) uniquely determines the canonical (object, relation, event, place, agent_trace) tuples that π*_w accepted as input, modulo the published grid / frame / clock manifest.
Proof sketch. π*_w is a deterministic byte-projection on quantized integer state (per the §2.1 hard constraint). Its output is canonical bytes, and SHA-256 is collision-resistant under A3. Therefore the commitment determines the input modulo the equivalence class π*_w defines (different surface representations that canonicalize identically map to the same commitment). The binding is computational: it holds against any adversary that cannot find SHA-256 collisions.
The “modulo manifest” qualifier matters: same input under different manifests (different Δ_x, Δ_t, frame definition) can produce different commitments. This is by design — the manifest is part of identity.
§3.2 T2-W — Causal completeness
Statement. A v7-W commitment chain (a sequence of committed observations + transforms over time) is causally complete if and only if every state transition between consecutive observations is justified by a committed transform OR a committed event.
What this rules out. Silent state changes — an object’s pose updating between two observations without a corresponding ego-motion transform OR a corresponding event explaining the update — break causal completeness.
Operational meaning. A v7-W chain that fails T2-W’s check is missing data. Either the agent moved without recording its ego-motion (covered by the pose_integration ε-frontier), or something happened in the world that should have been committed as an event. Either way, the gap is observable as a chain-check failure.
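A minimal chain check for T2-W is sketched below, under three assumptions, all hypothetical like the names:

- Each committed ego-motion transform carries the tick at which it was recorded.
- A transition between ticks t_prev and t_next is justified by a transform recorded in (t_prev, t_next], or by an event naming the object whose window overlaps [t_prev, t_next].
- Poses are compared as quantized tuples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    t: int
    object_id: str
    pose_q: tuple[int, ...]


@dataclass(frozen=True)
class Transform:
    t: int                  # tick at which an ego-motion transform was committed


@dataclass(frozen=True)
class Event:
    t_start: int
    t_end: int
    participants: frozenset


def causal_gaps(observations, transforms, events):
    """Return (object_id, t_prev, t_next) for every pose change nothing explains."""
    by_object = {}
    for ob in sorted(observations, key=lambda o: (o.object_id, o.t)):
        by_object.setdefault(ob.object_id, []).append(ob)

    gaps = []
    for oid, obs in by_object.items():
        for prev, nxt in zip(obs, obs[1:]):
            if prev.pose_q == nxt.pose_q:
                continue                        # no change, nothing to justify
            moved = any(prev.t < tr.t <= nxt.t for tr in transforms)
            acted = any(oid in ev.participants
                        and ev.t_start <= nxt.t and ev.t_end >= prev.t
                        for ev in events)
            if not (moved or acted):
                gaps.append((oid, prev.t, nxt.t))
    return gaps


obs = [Observation(0, "chair-1", (0, 0, 0)), Observation(10, "chair-1", (5, 0, 0))]
print(causal_gaps(obs, [], []))                                      # [('chair-1', 0, 10)]
print(causal_gaps(obs, [], [Event(3, 4, frozenset({"chair-1"}))]))  # []
```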
§3.3 T3-W — Frame-transform soundness
Statement. For any two committed frames F_a, F_b and a committed transform T_{a→b}, applying T_{a→b} to a v7-W commitment in F_a produces a v7-W commitment in F_b that is ε-equivalent to a direct observation in F_b, where ε is bounded by the canonical-projection π*_w’s grid quantization.
Practical implication. Cross-frame reasoning is sound up to the ε floor. Operators trading off precision (coarser grid → faster) accept a measurable error budget per transform.
§3.4 T4-W — ε at affine frontiers
Statement. At each of the four canonical ε-frontiers (pose_integration, observation_update, object_logits, relation_logits; see Part 4), the verifier kernel is affine on appropriately-canonicalized integer state, and the per-step ε bound is the quantization granularity Δ_step.
Why affine. Under a small time step, pose integration is a linear operator on the previous pose’s integer-encoded state. The Kalman-style observation update is affine in the residual. Object / relation classification logits are affine pre-softmax. Each maps cleanly to v7’s affine-frontier discipline.
This is the load-bearing theorem for ε-proof composability across v7 + v7-W. Without it, multimodal pipelines that route through both substrates can’t bound their cumulative ε.
Part 4 — Verifier kernels
The four canonical ε-frontiers, each implemented as a deterministic integer kernel.
§4.1 pose_integration
Input: {prev_pose_quantized, ego_motion_axis_angle_quantized, ego_motion_translation_quantized, Δ_t_ticks}.
Output: next_pose_quantized.
Operator: SE(3) composition, affine in axis-angle + translation under the small-time-step assumption. Quantization follows the manifest’s Δ_rot / Δ_trans declarations.
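ε bound: Δ_step_pose = Δ_rot · |translation_max| + Δ_trans. A rotation error of Δ_rot displaces a translated point by at most about Δ_rot · |translation_max| (small-angle), and translation quantization adds up to Δ_trans, so both terms are lengths.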
§4.2 observation_update
Input: {prior_state_quantized, observation_quantized, innovation_covariance_quantized}.
Output: posterior_state_quantized.
Operator: standard Kalman update, affine in the innovation z - H·prior. Covariance arithmetic uses the bigint accumulator discipline (v7 §5) to avoid analog leakage; the final result is re-quantized to the manifest’s grid.
ε bound: Δ_step_observation = innovation_quantization + covariance_quantization.
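A scalar sketch of the update follows. It assumes H = 1, state and observation in grid steps, variances in squared grid steps, and round-half-to-even re-quantization. Python's exact `Fraction` (built on unbounded ints) stands in for the bigint accumulator discipline. The function name is hypothetical, and a real kernel would be multi-dimensional:

```python
from fractions import Fraction


def observation_update(x_prior: int, p_prior: int, z: int, r: int) -> tuple[int, int]:
    """1-D Kalman update on integer grid state (H = 1).

    x_prior, z: positions in grid steps; p_prior, r: variances in grid steps².
    Intermediate values stay exact; only the outputs are re-quantized.
    """
    if p_prior < 0 or r < 0 or p_prior + r == 0:
        raise ValueError("variances must be non-negative with a positive sum")
    innovation = z - x_prior                   # z - H·prior
    gain = Fraction(p_prior, p_prior + r)      # K = P / (P + R)
    x_post = x_prior + gain * innovation       # affine in the innovation
    p_post = (1 - gain) * p_prior
    return round(x_post), round(p_post)


print(observation_update(100, 400, 130, 100))  # (124, 80)
```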
§4.3 object_logits
Input: {object_features_quantized, classifier_weights_quantized}.
Output: logits_quantized (pre-softmax integer vector).
Operator: linear projection (matrix-vector multiplication on quantized integers), which is affine. Reuses v7’s existing object_logits ε-frontier definition; this entry pins the v7-W view of the same kernel for cross-substrate ε proofs.
ε bound: Δ_step_object = manifest’s per-class quantization threshold.
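The kernel reduces to an integer matrix-vector product, sketched below. The per-class bias term is an assumption, since the input list above names only features and weights. Python ints are unbounded, so the accumulation cannot overflow:

```python
def object_logits(features: list[int], weights: list[list[int]],
                  bias: list[int]) -> list[int]:
    """logits[k] = Σ_j weights[k][j] · features[j] + bias[k], exact integers."""
    if len(bias) != len(weights) or any(len(row) != len(features) for row in weights):
        raise ValueError("shape mismatch")
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


print(object_logits([3, -1, 2], [[1, 0, 2], [0, 5, -1]], [0, 10]))  # [7, 3]
```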
§4.4 relation_logits
Input: {subject_features_quantized, object_features_quantized, relation_classifier_weights_quantized}.
Output: relation_logits_quantized (pre-softmax over the substrate-declared predicate whitelist).
Operator: bilinear scoring on the (subject, object) feature pair. Once the bilinear weights are canonicalized to their tensor-decomposed form (Tucker / CP), the score is affine in each feature vector with the other held fixed.
ε bound: Δ_step_relation = predicate-whitelist size factor + feature quantization.
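A sketch of the CP (rank-R) form follows; CP is one of the two decompositions the text names, and Tucker is not shown. It assumes the weights for predicate k factor as W_k = Σ_r λ[k][r] · U[r] V[r]ᵀ with integer factors. Each predicate's score is then built from two rank-R integer projections. A dense reference confirms that the factorized and unfactorized scores agree exactly. All names are hypothetical:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def relation_logits_cp(subj, obj, U, V, lam):
    """score[k] = Σ_r lam[k][r] · (U[r]·subj) · (V[r]·obj), all integers."""
    ps = [dot(u, subj) for u in U]   # rank-R projection of the subject features
    po = [dot(v, obj) for v in V]    # rank-R projection of the object features
    return [sum(l * a * b for l, a, b in zip(row, ps, po)) for row in lam]


def cp_to_dense(U, V, lam):
    """Expand the CP factors into one dense matrix W_k[i][j] per predicate."""
    return [[[sum(l * U[r][i] * V[r][j] for r, l in enumerate(row))
              for j in range(len(V[0]))] for i in range(len(U[0]))] for row in lam]


def relation_logits_dense(subj, obj, W):
    """Reference scoring: score[k] = subjᵀ W_k obj."""
    return [sum(subj[i] * W_k[i][j] * obj[j]
                for i in range(len(subj)) for j in range(len(obj))) for W_k in W]


subj, obj = [1, 2], [3, -1]
U, V = [[1, 0], [1, 1]], [[0, 1], [2, 1]]
lam = [[1, 0], [2, -1]]              # 2 predicates, rank 2
print(relation_logits_cp(subj, obj, U, V, lam))  # [-1, -17]
```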
Part 5 — Multimodal composition with v7
§5.1 Where v7 ends, v7-W begins
A v7 vision encoder maps pixels → embeddings. The encoder is v7’s domain (commit conv kernels, attention layers, etc.). When the embedding is consumed by a world-model that produces state, the boundary crosses into v7-W.
Concrete example: a YOLO-style detector outputs (bounding boxes, class probabilities) per image. The convolutional + detection heads are v7. The (bbox, class, confidence) tuples per detected object are v7-W π*_w_object inputs.
§5.2 Cross-substrate ε composition
When a multimodal pipeline routes through both substrates, the cumulative ε is bounded by the sum of each substrate’s ε bounds. The sum follows from the triangle inequality, provided no downstream stage amplifies upstream error; lossy projections never reduce ε:
ε_total = ε_v7_encoder + ε_v7w_pose_integration +
ε_v7w_observation_update + …
Each term is a frontier-named ε from the respective frontier catalog. T4-W (above) ensures the v7-W terms are well-defined.
§5.3 Frame-transform anchoring
When a v7 vision encoder operates on images from a moving camera, the camera’s frame is the v7-W output’s frame. The transform from camera-frame → world-frame is a v7-W commitment, not a v7 model parameter. Splitting responsibility this way keeps the v7 encoder generic (image-in, embedding-out) and puts spatial-frame discipline entirely on the v7-W side.
Part 6 — Adversarial corners
§6.1 Frame spoofing
Adversary commits a frame definition with adversarially-chosen geometry that makes innocuous observations look like target events under the canonical projection. Mitigation:
Frame definitions cite their physical anchor: a SHA-256 of the surveying / measurement procedure that established the frame.
Frames without committed anchors stay at minimum-warrant level (analogous to arborist’s COPYRIGHT_FOOTER tier).
Cross-frame transforms inherit warrant from their source frames; chains route through the weakest link.
§6.2 Time skew
Adversary commits observations with manipulated timestamps to fabricate causal dependencies. Mitigation:
Single-agent: substrate manifest pins clock skew bound Δ_clock_max; observations outside that window are flagged.
Multi-agent: Lamport vector clocks make skew observable as a vector-clock inconsistency rather than a wall-clock rewrite.
Cross-substrate: v7-W observation timestamps must agree with arborist’s time-series-quantized@v1 Δ_t for any signal the observation references; mismatches surface as cross-substrate chain breaks.
§6.3 Observation injection
Adversary fabricates observations with high confidence_centi to poison the world-model’s state. Mitigation:
Per-observer observation budget committed in the manifest: caps the rate at which a single agent can publish high-confidence claims without supporting evidence.
Cross-witness agreement (analogous to arborist’s #000028 multi-witness): an observation backed by independent observers in independent frames warrants more strongly than a solo observation.
The formal version of this mitigation is left to an adversarial-corner follow-up ticket (#000018-W), modeled on #000018’s threat-model + reduction approach.
§6.4 Privacy
A world-state commitment substrate is also a surveillance substrate. Frame discipline and Phase-2 ZK (ticket #000016) matter more here than for text / logic. The substrate manifest MUST declare the deployment’s privacy class:
{
"privacy": {
"class": "public | aggregated_only | ZK_with_selective_disclosure",
"...": "..."
}
}
Public class is the default for openly-published research / mapping deployments. ZK-with-selective-disclosure is required for any deployment where individual agents’ positions or identities should not be inferable from the commitment chain.
Appendix — Worked example: toy SLAM
A small concrete trace demonstrating the substrate end-to-end.
Setup. One agent navigating a 10×10×3 m room. RGB camera + IMU. Detects three persistent objects (a chair, a desk, a door) and produces a path trace.
Manifest (committed once at boot).
{
"v7w_version": "v0",
"grid": {"type": "octree", "level_min": 0, "level_max": 18,
"extent_meters": [10.0, 10.0, 3.0]},
"frames": {
"world": {"kind": "gravity-aligned-local",
"anchor": "<sha256 of survey procedure>"},
"camera_initial": {"parent": "world",
"transform": "<identity>"}
},
"clock": {"kind": "wall_clock_ms_since_epoch",
"Δ_t_ms": 1},
"predicates": ["adjacent_to", "supports", "occludes",
"approaches"],
"privacy": {"class": "public"}
}
Per-tick observations. At each tick, the agent commits:
π*_w_agent_trace: appends a (t, pose, attention) point.
π*_w_object per detected object: bbox + confidence.
π*_w_relation per detected relation, e.g. “chair adjacent_to desk”, “agent approaches door”.
π*_w_event if a state change occurred, e.g. “door opens at t=4521”.
Commitment chain (10-tick trace, ~50 observations, ~30 KB compressed). Audit replay walks the chain, verifying each observation_update ε bound against the manifest’s grid declarations and confirming T2-W causal completeness.
ε budget for the worked example.
per-tick total ε:
pose_integration ≈ Δ_rot + Δ_trans
≈ 1 mrad + 100 µm = bounded
observation_update per object ≈ bbox-quantization
≈ 1 cell at level 18 = bounded
object_logits ≈ class-probability granularity
≈ 1/100 (centi-confidence) = bounded
relation_logits ≈ predicate-whitelist + feature
quantization = bounded
ε_total over 10 ticks: bounded sum of the above; well under
what a downstream consumer (motion planner / safety filter)
would care about for room-scale navigation.
Out of scope (re-stated from #000013)
Running a SLAM stack inside arborist. v7-W defines the commitment substrate; world-model engines (SLAM, Gaussian splatting, predictive video) plug in via adapters that are separate tickets.
Cross-modal joint reasoning (text claim + spatial state). Needs the cross-domain π* composition theorem (#000015).
Specific sensor adapters (LIDAR, RGB-D, IMU). Each is a separate source-adapter ticket once v7-W lands.
Closure
Closure criterion (#000013 §7): this document plus docs/v7w-frontier-catalog.md plus the arborist/world/__init__.py namespace stub. All three land in the same commit that closes the ticket.
Open questions tracked separately:
Standards adoption — S2 vs custom octree (left-as-recommendation per §2.2; deployment can override via manifest).
Privacy implementation — Phase-2 ZK ticket (#000016) covers the cryptographic side once #000013 has a deployment target.
Discretization tax — empirical bench of ε at frontier kernels on representative deployments. Captured here as future work.
Implementation tickets that cite this paper land later, one per verifier kernel + sensor adapter.
Status: closure-draft 2026-05-09. Ready for fox + downstream review.
Permacomputer Preamble — License: AGPL-3.0-only
This is free software for the public good of a permacomputer hosted at permacomputer.com, an always-on computer by the people, for the people. Durable, easy to repair, & distributed like tap water for machine learning intelligence.
Our permacomputer is community-owned infrastructure optimized around four values:
TRUTH — First principles, math & science, open source code freely distributed.
FREEDOM — Voluntary partnerships, freedom from tyranny & corporate control.
HARMONY — Minimal waste, self-renewing systems with diverse thriving connections.
LOVE — Be yourself without hurting others, cooperation through natural law.
NO WARRANTY. Software is provided “AS IS” without warranty of any kind. Full text: License.