π* domain library

Arborist’s canonical-projection registry is the substrate that implements the invariant projection π*: Σ* → 𝓘 ∪ {⊥} from SQD whitepaper §3. Every modality that the bench surface or audit chain touches has (or reserves) a registered π* that maps surface bytes to canonical bytes; the SHA-256 of the canonical bytes is the equivalence-class identity.
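The identity rule can be illustrated with a minimal sketch. The whitespace-collapsing canonicalizer below is a hypothetical stand-in, not one of the registered π*’s, and `None` is used here to model ⊥; only the use of SHA-256 over the canonical bytes comes from the text above.

```python
import hashlib
from typing import Optional


def toy_pi_star(surface: bytes) -> Optional[bytes]:
    """Hypothetical projection: collapse whitespace runs. None models the ⊥ outcome."""
    try:
        text = surface.decode("utf-8")
    except UnicodeDecodeError:
        return None  # input outside the domain
    return " ".join(text.split()).encode("utf-8")


def class_identity(surface: bytes) -> Optional[str]:
    """Equivalence-class identity: SHA-256 hex digest of the canonical bytes."""
    canonical = toy_pi_star(surface)
    if canonical is None:
        return None
    return hashlib.sha256(canonical).hexdigest()


a = class_identity(b"hello   world\n")
b = class_identity(b"hello world")
c = class_identity(b"hello there")
print(a == b, a == c)
```

Two surface forms that project to the same canonical bytes share a digest, while a different canonical form gets a different one.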

Registry overview

Lookup is by name@version key. Sixteen concrete π*’s ship today; the registry has no remaining reserved stubs.

| Key | Domain | Status |
|---|---|---|
| wikitext-base@v1 | text | Wikitext → plain prose |
| claim-lattice@v1 | text | Claim lines → JSON parsed-claim list |
| code-py-ast@v1 | code | Python source → canonical AST S-expression |
| arithmetic@v1 | arithmetic | Expression → exact rational num/den (SQD §14.1) |
| logic-kernel@v1 | logic | Boolean expression → canonical CNF (SQD §14.3) |
| time-series-quantized@v1 | time-series | JSON sample array → quantized integer vector (SQD §13.5) |
| tabular-pinned@v1 | tabular | JSON-rows → pinned-schema canonical bytes |
| algebra-symbolic@v1 | symbolic-algebra | SymPy expand + srepr (#000030 Phase 1) |
| algebra-symbolic-simplified@v1 | symbolic-algebra | SymPy simplify + srepr; collapses trig identities |
| calculus-derivative@v1 | calculus | sp.diff re-canonicalized through algebra-symbolic |
| calculus-integral@v1 | calculus | sp.integrate + unevaluated-Integral sentinel |
| calculus-limit@v1 | calculus | sp.limit + ±∞/complex-infinity sentinels |
| calculus-series@v1 | calculus | Truncated Taylor / Maclaurin (drops O(x**n)) |
| linear-algebra@v1 | linear-algebra | RREF / det / eigenvalues / inverse via {op, matrix} |
| function-sampled@v1 | function-sampled | SymPy expr → quantized integer vector (bridge to time-series) |
| combinatorics@v1 | combinatorics | Pure-integer counting kernel (#000032; fails closed on any result that is not a non-negative integer) |

The authoritative count comes from arborist.pi_star.registry.REGISTRY itself; if this table drifts from len(REGISTRY), the table is wrong, not the registry. Each entry is behaviorally immutable: any byte-affecting change to a registered name@version requires a new version key, never an in-place patch, because otherwise previously persisted canonical-cache rows become semantically unstable. The full versioning rule is in docs/spec-methodology.md §1.1.
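The immutability rule can be sketched as a registry that refuses to overwrite an existing key. This is an illustration under stated assumptions: the `REGISTRY` dict and `register` function below are local stand-ins with a made-up signature, not arborist’s own objects.

```python
from typing import Callable, Dict

# Local stand-in; NOT arborist.pi_star.registry.REGISTRY.
REGISTRY: Dict[str, Callable[[bytes], bytes]] = {}


def register(name: str, version: int, fn: Callable[[bytes], bytes]) -> str:
    """Register fn under name@v<version>; an existing key is never replaced."""
    key = f"{name}@v{version}"
    if key in REGISTRY:
        raise ValueError(f"{key} is already registered; publish a new version instead")
    REGISTRY[key] = fn
    return key


register("upper", 1, lambda raw: raw.upper())
# A byte-affecting change ships under a new version key; v1 keeps its behavior.
register("upper", 2, lambda raw: raw.strip().upper())

try:
    register("upper", 1, lambda raw: raw.lower())  # attempted in-place patch
    patched_in_place = True
except ValueError:
    patched_in_place = False

print(sorted(REGISTRY), patched_in_place)
```

Because v1 is left untouched, any cache row keyed by the v1 digest still means what it meant when it was written.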

The math π*’s (algebra-symbolic / calculus-* / linear-algebra / function-sampled) gate on the optional [math] extra: pip install 'arborist[math]' installs SymPy. A fresh checkout without the extra still passes the test suite — the modules self-skip registration when import sympy fails.
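A common way to implement this kind of self-skip is to guard the import and register nothing when it fails. The sketch below assumes a hypothetical `REGISTERED` list in place of the real registry; the SymPy calls (`sympify`, `expand`, `srepr`) are real, but the way they are wired together here is only a guess at the shape of an algebra-symbolic module.

```python
from typing import Callable, List, Tuple

try:
    import sympy as sp  # provided by the optional [math] extra
except ImportError:
    sp = None

# Hypothetical stand-in for the real registry.
REGISTERED: List[Tuple[str, Callable[[bytes], bytes]]] = []


def register_math_projections() -> bool:
    """Register SymPy-backed entries only when SymPy imported successfully."""
    if sp is None:
        return False  # self-skip: nothing registered, nothing raised

    def canonicalize(raw: bytes) -> bytes:
        expr = sp.sympify(raw.decode("utf-8"))
        return sp.srepr(sp.expand(expr)).encode("utf-8")

    REGISTERED.append(("algebra-symbolic@v1", canonicalize))
    return True


math_available = register_math_projections()
print(math_available, len(REGISTERED))
```

With this pattern, tests for the math entries can be skipped on `math_available` rather than failing on a missing dependency.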

Cross-modality discipline

Every bench fixture and audit-bound canonicalization names its PiStar via:

  • carrier — a Phase-1 whitelist enforced by bench.batteries.base.PHASE_1_CARRIERS

  • pi_star_ref — the registry key

Unsupported carriers fail or skip explicitly with reason="unsupported_carrier" — never silently accepted. Hidden-channel work is defensive only (detection / flagging, never generation or concealment).
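The explicit-skip behavior might look like the following sketch. The whitelist contents, the fixture dict layout, and the `check_fixture` helper are all assumptions made for illustration; only the `carrier` and `pi_star_ref` field names and the `unsupported_carrier` reason string come from the text.

```python
from typing import Dict

# Hypothetical contents; the real whitelist lives in bench.batteries.base.
PHASE_1_CARRIERS = frozenset({"text", "code", "arithmetic", "logic"})


def check_fixture(fixture: Dict[str, str]) -> Dict[str, str]:
    """Accept a fixture only if its carrier is whitelisted; otherwise skip with a reason."""
    carrier = fixture.get("carrier")
    if carrier not in PHASE_1_CARRIERS:
        # Never silently accepted: the skip is recorded with an explicit reason.
        return {"status": "skipped", "reason": "unsupported_carrier"}
    return {"status": "accepted", "pi_star_ref": fixture["pi_star_ref"]}


ok = check_fixture({"carrier": "arithmetic", "pi_star_ref": "arithmetic@v1"})
bad = check_fixture({"carrier": "audio", "pi_star_ref": "audio-pcm@v1"})
print(ok, bad)
```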

The discipline gives every benchmark, every audit row, and every selection score a stable answer to “what canonicalizer was used” that survives schema migrations and is reproducible from the canonical bytes alone.

Math π*’s: SQD §14

Two of the most recently graduated π*’s implement the SQD whitepaper’s math substrate:

arithmetic@v1 — closed-form rational arithmetic. Solves the canonical SQD test exactly:

```python
from arborist.pi_star import get
ps = get("arithmetic@v1")

ps.canonicalize(b"0.1+0.2")   # → b"3/10"
ps.canonicalize(b"0.3")       # → b"3/10"
ps.canonicalize(b"1+2")       # → b"3/1"
ps.canonicalize(b"6/4")       # → b"3/2"   (lowest terms)
```

No floating-point drift — Decimal(str(0.1)) gives exact 1/10, then fractions.Fraction arithmetic stays in ℚ. Identifiers, function calls, division by zero, and non-integer exponents raise PiStarError.
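The Decimal-then-Fraction idea can be sketched with the standard library alone. This is not arithmetic@v1’s implementation: `ToyPiStarError` is a hypothetical stand-in for PiStarError, and parsing through `ast` is an assumption that limits the sketch to literals a Python float can round-trip.

```python
import ast
from decimal import Decimal
from fractions import Fraction


class ToyPiStarError(ValueError):
    """Hypothetical stand-in for the library's PiStarError."""


def _eval(node: ast.AST) -> Fraction:
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        # Decimal(str(0.1)) is exactly 1/10; Fraction(0.1) would carry binary error.
        return Fraction(Decimal(str(node.value)))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, (ast.UAdd, ast.USub)):
        value = _eval(node.operand)
        return -value if isinstance(node.op, ast.USub) else value
    if isinstance(node, ast.BinOp):
        left, right = _eval(node.left), _eval(node.right)
        if isinstance(node.op, ast.Add):
            return left + right
        if isinstance(node.op, ast.Sub):
            return left - right
        if isinstance(node.op, ast.Mult):
            return left * right
        if isinstance(node.op, ast.Div):
            if right == 0:
                raise ToyPiStarError("division by zero")
            return left / right
        if isinstance(node.op, ast.Pow):
            if right.denominator != 1:
                raise ToyPiStarError("non-integer exponent")
            if left == 0 and right < 0:
                raise ToyPiStarError("division by zero")
            return left ** int(right)
    # Names, calls, and every other node type are rejected.
    raise ToyPiStarError(f"unsupported syntax: {type(node).__name__}")


def canonicalize(raw: bytes) -> bytes:
    """Project an arithmetic expression to b'num/den' in lowest terms."""
    try:
        tree = ast.parse(raw.decode("utf-8").strip(), mode="eval")
    except (SyntaxError, UnicodeDecodeError) as exc:
        raise ToyPiStarError(str(exc)) from exc
    value = _eval(tree.body)
    return f"{value.numerator}/{value.denominator}".encode("ascii")


print(canonicalize(b"0.1+0.2"), canonicalize(b"6/4"))
```

`Fraction` normalizes to lowest terms on construction, which is why `6/4` and `3/2` land in the same equivalence class without an explicit reduction step.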

logic-kernel@v1 — propositional Boolean expression → Conjunctive Normal Form (CNF):

```python
from arborist.pi_star import get
ps = get("logic-kernel@v1")

ps.canonicalize(b"A AND B")              # → b"A AND B"
ps.canonicalize(b"B AND A")              # → b"A AND B"   (commutativity)
ps.canonicalize(b"A IMPL B")             # → b"(NOT A OR B)"
ps.canonicalize(b"(NOT B) IMPL (NOT A)") # → b"(NOT A OR B)"   (contrapositive)
ps.canonicalize(b"NOT NOT A")            # → b"A"   (double negation)
ps.canonicalize(b"A OR NOT A")           # → b"TRUE"   (tautology)
```

Atom cap: 8 (CNF expansion is exponential; cap keeps canonicalization deterministic in bounded time).

Equivalences preserved: commutativity, associativity, IMPL/IFF/XOR rewrites, De Morgan, double negation, distribution, idempotence, tautology collapse, contrapositive.
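With at most 8 atoms there are at most 2^8 = 256 truth assignments, so any claimed equivalence can be cross-checked by brute force. The sketch below is such an independent check, not the kernel’s CNF algorithm; formulas are written as plain Python callables, and the `equivalent` helper is hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Formula = Callable[[Dict[str, bool]], bool]


def equivalent(f: Formula, g: Formula, atoms: Sequence[str]) -> bool:
    """True if f and g agree on every truth assignment over atoms."""
    if len(atoms) > 8:
        raise ValueError("atom cap exceeded")  # mirrors the cap stated above
    for values in product([False, True], repeat=len(atoms)):
        env = dict(zip(atoms, values))
        if f(env) != g(env):
            return False
    return True


# A IMPL B, written as NOT A OR B
impl: Formula = lambda e: (not e["A"]) or e["B"]
# (NOT B) IMPL (NOT A), written as NOT (NOT B) OR NOT A
contrapositive: Formula = lambda e: (not (not e["B"])) or (not e["A"])
# De Morgan: NOT (A AND B) versus NOT A OR NOT B
nand: Formula = lambda e: not (e["A"] and e["B"])
de_morgan: Formula = lambda e: (not e["A"]) or (not e["B"])

print(equivalent(impl, contrapositive, ["A", "B"]))
print(equivalent(nand, de_morgan, ["A", "B"]))
```

A check like this is a natural oracle for tests: two inputs should canonicalize to the same bytes exactly when their truth tables agree.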

CLI surfaces

Two ways to reach a registered π* from the shell — direct or auto-routed.

Direct call: arborist canon <key> "<input>" runs the named π* against the given input and prints the canonical bytes. No shards, no LLM, no audit chain: pure projection.

```console
$ arborist canon arithmetic@v1 "0.1 + 0.2"
3/10
$ arborist canon logic-kernel@v1 "(NOT B) IMPL (NOT A)"
(NOT A OR B)
$ arborist canon time-series-quantized@v1 \
    '{"dt":1,"dv":0.1,"samples":[[0,1.0],[1,2.0]]}'
dt=1;dv=0.1;n=2;t0=0:10|20

# registry contents:
$ arborist canon --list
arithmetic@v1                arithmetic
claim-lattice@v1             text
code-py-ast@v1               code
logic-kernel@v1              logic
tabular-pinned@v1            tabular
time-series-quantized@v1     time-series
wikitext-base@v1             text-or-wikitext

# JSON envelope w/ canonical-bytes SHA-256:
$ arborist canon --json arithmetic@v1 "0.1 + 0.2"
{"pi_star_ref":"arithmetic@v1","input":"0.1 + 0.2",
 "canonical":"3/10","canonical_sha256":"088ae5cc..."}
```

Auto-routed via arborist query — pure-arithmetic and pure-propositional questions short-circuit the RAG path and answer directly through arithmetic@v1 / logic-kernel@v1. The result gets a synthetic audit_mode = CANONICAL_PROJECTION and renders as CANONICAL · via <pi_star_ref>:

```console
$ arborist query "0.1 + 0.2"
0.1 + 0.2
  CANONICAL · via arithmetic@v1   0.0s   (projected)

3/10

$ arborist query "A IMPL B"
A IMPL B
  CANONICAL · via logic-kernel@v1   0.0s   (projected)

(NOT A OR B)
```

The sniff is conservative — pure-arithmetic shape (digits + ops, no letters) or pure-propositional shape (uppercase atoms + reserved keywords only). Anything natural-language (“what is 0.1 + 0.2?”) falls through to the existing RAG path. Pass --no-canonical-preflight to force the RAG path even on matching input — useful when bench-comparing the canonical answer against the LLM’s reply. Disable per-call via policy["canonical_projection_preflight"] = False.
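A conservative shape test of this kind could be sketched as below. The regular expressions, the keyword set, and the extra requirement that at least one operator or keyword be present are assumptions for illustration; the actual preflight rules may differ.

```python
import re
from typing import Optional

# Assumed shapes: arithmetic is digits, dots, operators, parentheses, whitespace.
ARITH_SHAPE = re.compile(r"[0-9\s.+\-*/()]+")
ATOM_SHAPE = re.compile(r"[A-Z][A-Z0-9_]*")
KEYWORDS = frozenset({"AND", "OR", "NOT", "IMPL", "IFF", "XOR"})


def sniff(question: str) -> Optional[str]:
    """Return a registry key for pure-shape input, or None to fall through to RAG."""
    text = question.strip()
    if (ARITH_SHAPE.fullmatch(text)
            and re.search(r"[0-9]", text)
            and re.search(r"[+\-*/]", text)):
        return "arithmetic@v1"
    tokens = re.findall(r"[()]|[^\s()]+", text)
    words = [t for t in tokens if t not in ("(", ")")]
    if (words
            and all(ATOM_SHAPE.fullmatch(w) for w in words)
            and any(w in KEYWORDS for w in words)       # at least one connective
            and any(w not in KEYWORDS for w in words)):  # at least one atom
        return "logic-kernel@v1"
    return None  # anything natural-language falls through


for q in ("0.1 + 0.2", "A IMPL B", "what is 0.1 + 0.2?"):
    print(repr(q), "->", sniff(q))
```

Erring toward `None` is the safe direction here: a missed shortcut only costs a RAG round trip, while a false match would answer a natural-language question with a projection.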

Composition algebra

Two π*’s can be composed into a third via arborist.pi_star.compose():

```python
from arborist.pi_star import compose

chain = compose("wikitext-base@v1", "claim-lattice@v1")
# Auto-registers as "wikitext-base-then-claim-lattice@v1"
chain.canonicalize(b"- The release date was July 3, 1985. [E1]")
```

Each composition is itself a registered π* with its own key. See docs/pi-star-composition.md for the algebra (type compatibility, determinism preservation, equivalence-class preservation, projective vs invertible compositions).
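At its core, composition is function composition over bytes plus a derived key. The sketch below uses a hypothetical `ToyPiStar` class and two toy projections; the `-then-` naming follows the example key above, and assigning version 1 to the composite is an assumption.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToyPiStar:
    """Hypothetical stand-in for a registered projection."""
    name: str
    version: int
    canonicalize: Callable[[bytes], bytes]

    @property
    def key(self) -> str:
        return f"{self.name}@v{self.version}"


def toy_compose(first: ToyPiStar, second: ToyPiStar) -> ToyPiStar:
    """Apply first, then second; the composite gets its own key."""
    def chained(raw: bytes) -> bytes:
        return second.canonicalize(first.canonicalize(raw))
    return ToyPiStar(f"{first.name}-then-{second.name}", 1, chained)


strip_bold = ToyPiStar("strip-bold", 1, lambda raw: raw.replace(b"'''", b""))
lowercase = ToyPiStar("lowercase", 1, lambda raw: raw.lower())

chain = toy_compose(strip_bold, lowercase)
print(chain.key, chain.canonicalize(b"'''Hello''' World"))
```

If both stages are deterministic, the composite is deterministic as well, since it only feeds one stage’s output into the next.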

Authoring a new π*

  1. Add arborist/pi_star/<name>.py with a frozen dataclass declaring name, version, domain, and a canonicalize(self, raw: bytes) -> bytes method.

  2. Call arborist.pi_star.register() at module import time.

  3. Import the new module in arborist/pi_star/__init__.py so the registration fires at package load.

  4. Write tests covering: idempotency on the canonical form (or document projective behavior), equivalence classes preserved, equivalence classes kept distinct, error paths (bad input rejected explicitly).

  5. Add bench fixtures under bench/fixtures/5s/{syntax,semantics}-<name>-v1.jsonl exercising the new carrier through the existing 5S Syntax and Semantics runners.

  6. Add the carrier name to bench.batteries.base.PHASE_1_CARRIERS.

  7. Add a make bench-5s-<name> Makefile target.
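Steps 1 and 4 above can be sketched as follows. The `sorted-lines` projection is invented for illustration and is not a registered π*; the registration call is omitted because its signature is not shown here, and `ValueError` stands in for whatever error type the library expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SortedLinesPiStar:
    """Hypothetical projection: trimmed, de-duplicated, sorted non-empty lines."""
    name: str = "sorted-lines"
    version: str = "v1"
    domain: str = "text"

    def canonicalize(self, raw: bytes) -> bytes:
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError("input is not valid UTF-8") from exc
        lines = {line.strip() for line in text.splitlines() if line.strip()}
        return "\n".join(sorted(lines)).encode("utf-8")


ps = SortedLinesPiStar()
canonical = ps.canonicalize(b"b\n a \n\nb\n")

# Step 4 checks, written as plain assertions:
assert ps.canonicalize(canonical) == canonical                  # idempotent on canonical form
assert ps.canonicalize(b"a\nb") == ps.canonicalize(b"b\na")     # equivalence class preserved
assert ps.canonicalize(b"a\nb") != ps.canonicalize(b"a\nc")     # distinct classes kept distinct
try:
    ps.canonicalize(b"\xff")
    rejected = False
except ValueError:
    rejected = True                                             # bad input rejected explicitly
print(canonical, rejected)
```

The frozen dataclass keeps name, version, and domain from being mutated after construction, which fits the rule that a registered key never changes behavior.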

See docs/spec-methodology.md for the full discipline.


Permacomputer Preamble — License: AGPL-3.0-only

This is free software for the public good of a permacomputer hosted at permacomputer.com, an always-on computer by the people, for the people. Durable, easy to repair, & distributed like tap water for machine learning intelligence.

Our permacomputer is community-owned infrastructure optimized around four values:

  • TRUTH — First principles, math & science, open source code freely distributed.

  • FREEDOM — Voluntary partnerships, freedom from tyranny & corporate control.

  • HARMONY — Minimal waste, self-renewing systems with diverse thriving connections.

  • LOVE — Be yourself without hurting others, cooperation through natural law.

NO WARRANTY. Software is provided “AS IS” without warranty of any kind. Full text: License.