π* domain library
Arborist’s canonical-projection registry — the substrate that
implements SQD whitepaper §3’s invariant projection
π*: Σ* → 𝓘 ∪ {⊥}. Every modality the bench surface or audit
chain touches has (or reserves) a registered π* that maps surface
bytes to canonical bytes; the SHA-256 of the canonical bytes is
the equivalence-class identity.
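As a minimal sketch of the identity rule above, using only the standard library: the helper name `equivalence_class_id` is hypothetical and not part of Arborist, and the canonical bytes are taken from the arithmetic examples later on this page.

```python
import hashlib


def equivalence_class_id(canonical: bytes) -> str:
    """Return the hex SHA-256 digest of already-canonical bytes."""
    return hashlib.sha256(canonical).hexdigest()


# Two surface forms that canonicalize to the same bytes share one identity.
canonical_a = b"3/10"  # canonical form of b"0.1+0.2" under arithmetic@v1
canonical_b = b"3/10"  # canonical form of b"0.3" under arithmetic@v1
same_class = equivalence_class_id(canonical_a) == equivalence_class_id(canonical_b)
print(same_class)
```

The digest depends only on the canonical bytes, so any two inputs that project to the same bytes are treated as the same equivalence class regardless of how they were written on the surface.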
Registry overview
Lookup is by name@version key. Sixteen concrete π*’s ship
today; the registry has no remaining reserved stubs.
| Key | Domain | Status |
|---|---|---|
| | text | Wikitext → plain prose |
| | text | Claim lines → JSON parsed-claim list |
| | code | Python source → canonical AST S-expression |
| | arithmetic | Expression → exact rational |
| | logic | Boolean expression → canonical CNF (SQD §14.3) |
| | time-series | JSON sample array → quantized integer vector (SQD §13.5) |
| | tabular | JSON-rows → pinned-schema canonical bytes |
| | symbolic-algebra | SymPy |
| | symbolic-algebra | SymPy |
| | calculus | |
| | calculus | |
| | calculus | |
| | calculus | Truncated Taylor / Maclaurin (drops |
| | linear-algebra | RREF / det / eigenvalues / inverse via |
| | function-sampled | SymPy expr → quantized integer vector (bridge to time-series) |
| | combinatorics | Pure-integer counting kernel (#000032; fails closed on non-non-negative-integer results) |
The authoritative count comes from
arborist.pi_star.registry.REGISTRY itself; if this table
drifts from len(REGISTRY), the table is wrong, not the registry.
Each entry is behaviorally immutable — any byte-affecting
change to a registered name@version requires a new version
key, never an in-place patch (otherwise prior persisted canonical-
cache rows become semantically unstable). Full versioning rule
in docs/spec-methodology.md §1.1.
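The immutability rule can be illustrated with a toy registry that refuses to overwrite an existing name@version key. This is only a sketch: `VersionedRegistry` and the `demo@v1` / `demo@v2` keys are hypothetical and do not reflect how arborist.pi_star.registry is actually implemented.

```python
from typing import Callable, Dict

Canonicalizer = Callable[[bytes], bytes]


class VersionedRegistry:
    """Toy registry: a name@version key can be registered exactly once."""

    def __init__(self) -> None:
        self._entries: Dict[str, Canonicalizer] = {}

    def register(self, key: str, fn: Canonicalizer) -> None:
        name, sep, version = key.partition("@")
        if not (name and sep and version):
            raise ValueError(f"expected name@version, got {key!r}")
        if key in self._entries:
            # An in-place patch would change the meaning of cached rows.
            raise KeyError(f"{key!r} is already registered; publish a new version")
        self._entries[key] = fn

    def get(self, key: str) -> Canonicalizer:
        return self._entries[key]


registry = VersionedRegistry()
registry.register("demo@v1", lambda raw: raw.strip())
# A byte-affecting change ships under a new key; demo@v1 stays untouched.
registry.register("demo@v2", lambda raw: raw.strip().lower())
```

Under this scheme, rows cached under `demo@v1` keep their meaning after `demo@v2` ships, because the older entry can never be replaced.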
The math π*’s (algebra-symbolic / calculus-* / linear-algebra /
function-sampled) gate on the optional [math] extra:
pip install 'arborist[math]' installs SymPy. A fresh checkout
without the extra still passes the test suite — the modules
self-skip registration when import sympy fails.
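A common way to get this self-skipping behavior is to guard the import and register nothing when it fails. The sketch below assumes a plain dict as the registry; `register_math_entries` and the `demo-symbolic@v1` key are hypothetical names, not Arborist's.

```python
from typing import Callable, Dict


def register_math_entries(registry: Dict[str, Callable[[bytes], bytes]]) -> bool:
    """Register SymPy-backed entries only when SymPy can be imported."""
    try:
        import sympy  # provided by the optional [math] extra
    except ImportError:
        return False  # skip quietly; the rest of the registry still works

    def canonicalize(raw: bytes) -> bytes:
        expr = sympy.sympify(raw.decode("utf-8"))
        return sympy.srepr(sympy.simplify(expr)).encode("utf-8")

    registry["demo-symbolic@v1"] = canonicalize
    return True


demo_registry: Dict[str, Callable[[bytes], bytes]] = {}
math_available = register_math_entries(demo_registry)
print(math_available, sorted(demo_registry))
```

Tests for the math entries can then key off the same condition and skip themselves when the extra is absent.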
Cross-modality discipline
Every bench fixture and audit-bound canonicalization names its
PiStar via:
- carrier: a Phase-1 whitelist enforced by bench.batteries.base.PHASE_1_CARRIERS
- pi_star_ref: the registry key
Unsupported carriers fail or skip explicitly with
reason="unsupported_carrier" — never silently accepted.
Hidden-channel work is defensive only (detection / flagging,
never generation or concealment).
The discipline gives every benchmark, every audit row, and every selection score a stable answer to “what canonicalizer was used” that survives schema migrations and is reproducible from the canonical bytes alone.
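The explicit-rejection rule might look like the following sketch. The whitelist contents, the `FixtureCheck` type, the `check_fixture` helper, and the `missing_pi_star_ref` reason are all assumptions made for illustration; only the field names `carrier` and `pi_star_ref` and the reason string `unsupported_carrier` come from this page.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for bench.batteries.base.PHASE_1_CARRIERS.
PHASE_1_CARRIERS = frozenset({"text", "code"})


@dataclass(frozen=True)
class FixtureCheck:
    accepted: bool
    reason: Optional[str] = None


def check_fixture(fixture: dict) -> FixtureCheck:
    """Accept a fixture only if it names a whitelisted carrier and a registry key."""
    if fixture.get("carrier") not in PHASE_1_CARRIERS:
        # Never silently accepted: the caller sees why it was rejected.
        return FixtureCheck(False, "unsupported_carrier")
    if not fixture.get("pi_star_ref"):
        return FixtureCheck(False, "missing_pi_star_ref")  # hypothetical reason
    return FixtureCheck(True)


ok = check_fixture({"carrier": "text", "pi_star_ref": "wikitext-base@v1"})
rejected = check_fixture({"carrier": "audio", "pi_star_ref": "wikitext-base@v1"})
print(ok, rejected)
```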
Math π*’s — SQD §14
Two of the most recently graduated π*’s implement the SQD whitepaper’s math substrate:
arithmetic@v1 — closed-form rational arithmetic. Solves the canonical SQD test exactly:
```python
from arborist.pi_star import get

ps = get("arithmetic@v1")
ps.canonicalize(b"0.1+0.2")  # → b"3/10"
ps.canonicalize(b"0.3")      # → b"3/10"
ps.canonicalize(b"1+2")      # → b"3/1"
ps.canonicalize(b"6/4")      # → b"3/2" (lowest terms)
```
No floating-point drift — Decimal(str(0.1)) gives exact
1/10, then fractions.Fraction arithmetic stays in ℚ.
Identifiers, function calls, division by zero, and non-integer
exponents raise PiStarError.
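The Decimal-then-Fraction step can be reproduced with the standard library alone. This sketch shows only the exact-rational idea, not the arithmetic@v1 parser; `to_rational` and `render` are hypothetical helper names.

```python
from decimal import Decimal
from fractions import Fraction


def to_rational(literal: str) -> Fraction:
    """Parse a decimal literal exactly; no binary float is ever created."""
    return Fraction(Decimal(literal))


def render(value: Fraction) -> str:
    """Render as numerator/denominator; Fraction is always in lowest terms."""
    return f"{value.numerator}/{value.denominator}"


total = to_rational("0.1") + to_rational("0.2")
print(render(total))                # 3/10
print(render(to_rational("0.3")))   # 3/10
print(render(Fraction(6, 4)))       # 3/2
print(0.1 + 0.2 == 0.3)             # False: binary floats drift
```

Because both sides reduce to the same `Fraction`, the two surface forms render to identical bytes, which is what lets them share one equivalence class.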
logic-kernel@v1 — propositional Boolean expression → Conjunctive Normal Form (CNF):
```python
from arborist.pi_star import get

ps = get("logic-kernel@v1")
ps.canonicalize(b"A AND B")               # → b"A AND B"
ps.canonicalize(b"B AND A")               # → b"A AND B" (commutativity)
ps.canonicalize(b"A IMPL B")              # → b"(NOT A OR B)"
ps.canonicalize(b"(NOT B) IMPL (NOT A)")  # → b"(NOT A OR B)" (contrapositive)
ps.canonicalize(b"NOT NOT A")             # → b"A" (double negation)
ps.canonicalize(b"A OR NOT A")            # → b"TRUE" (tautology)
```
Atom cap: 8 (CNF expansion is exponential; cap keeps canonicalization deterministic in bounded time).
Equivalences preserved: commutativity, associativity, IMPL/IFF/XOR rewrites, De Morgan, double negation, distribution, idempotence, tautology collapse, contrapositive.
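One way to sanity-check that two formulas belong to the same equivalence class is to compare their truth tables. The sketch below is independent of logic-kernel@v1 and does not produce CNF; the `truth_table` helper is hypothetical. It also shows why a small atom cap keeps the work bounded: the table has 2^n rows.

```python
from itertools import product


def truth_table(formula, atoms):
    """Evaluate formula on every assignment of the named atoms, in a fixed order."""
    rows = product((False, True), repeat=len(atoms))
    return tuple(formula(**dict(zip(atoms, values))) for values in rows)


# A IMPL B, written as (NOT A) OR B
implication = lambda A, B: (not A) or B
# (NOT B) IMPL (NOT A), written as (NOT (NOT B)) OR (NOT A)
contrapositive = lambda A, B: (not (not B)) or (not A)

equivalent = truth_table(implication, "AB") == truth_table(contrapositive, "AB")
print(equivalent)   # True
print(2 ** 8)       # 256 rows at the 8-atom cap
```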
CLI surfaces
Two ways to reach a registered π* from the shell — direct or auto-routed.
Direct call — arborist canon <key> "<input>" runs the named
π* against the given input and prints the canonical bytes. No
shards, no LLM, no audit chain. Pure projection.
```console
$ arborist canon arithmetic@v1 "0.1 + 0.2"
3/10
$ arborist canon logic-kernel@v1 "(NOT B) IMPL (NOT A)"
(NOT A OR B)
$ arborist canon time-series-quantized@v1 \
    '{"dt":1,"dv":0.1,"samples":[[0,1.0],[1,2.0]]}'
dt=1;dv=0.1;n=2;t0=0:10|20
# registry contents:
$ arborist canon --list
arithmetic@v1 arithmetic
claim-lattice@v1 text
code-py-ast@v1 code
logic-kernel@v1 logic
tabular-pinned@v1 tabular
time-series-quantized@v1 time-series
wikitext-base@v1 text-or-wikitext
# JSON envelope w/ canonical-bytes SHA-256:
$ arborist canon --json arithmetic@v1 "0.1 + 0.2"
{"pi_star_ref":"arithmetic@v1","input":"0.1 + 0.2",
"canonical":"3/10","canonical_sha256":"088ae5cc..."}
```
Auto-routed via arborist query — pure-arithmetic and
pure-propositional questions short-circuit the RAG path and answer
directly through arithmetic@v1 / logic-kernel@v1. The result
gets a synthetic audit_mode = CANONICAL_PROJECTION and renders as
CANONICAL · via <pi_star_ref>:
```console
$ arborist query "0.1 + 0.2"
0.1 + 0.2
CANONICAL · via arithmetic@v1 0.0s (projected)
3/10
$ arborist query "A IMPL B"
A IMPL B
CANONICAL · via logic-kernel@v1 0.0s (projected)
(NOT A OR B)
```
The sniff is conservative — pure-arithmetic shape (digits + ops, no
letters) or pure-propositional shape (uppercase atoms + reserved
keywords only). Anything natural-language (“what is 0.1 + 0.2?”)
falls through to the existing RAG path. Pass
--no-canonical-preflight to force the RAG path even on
matching input, which is useful when bench-comparing the canonical
answer against the LLM’s reply. To disable the preflight for a single
call, set policy["canonical_projection_preflight"] = False.
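A conservative shape sniff along the lines described above could be sketched as follows. The regular expressions, the `sniff` function, and the exact operator set are assumptions; the real preflight may accept or reject different inputs.

```python
import re
from typing import Optional

# Digits, decimal points, whitespace, arithmetic operators, parentheses. No letters.
_ARITHMETIC = re.compile(r"[0-9.\s+\-*/^()]+")
# Uppercase words only; this covers both atoms and keywords such as AND or IMPL.
_LOGIC_WORD = re.compile(r"[A-Z][A-Z0-9_]*")


def sniff(query: str) -> Optional[str]:
    """Return a pi_star_ref for pure-shape input, or None to fall through to RAG."""
    text = query.strip()
    if not text:
        return None
    if _ARITHMETIC.fullmatch(text) and any(ch.isdigit() for ch in text):
        return "arithmetic@v1"
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    words = [tok for tok in tokens if tok not in ("(", ")")]
    if words and all(_LOGIC_WORD.fullmatch(word) for word in words):
        return "logic-kernel@v1"
    return None


print(sniff("0.1 + 0.2"))            # arithmetic@v1
print(sniff("(NOT B) IMPL (NOT A)")) # logic-kernel@v1
print(sniff("what is 0.1 + 0.2?"))   # None
```

A sniff this loose would also route a lone uppercase word to the logic path, so a real preflight would likely need stricter checks before committing to a projection.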
Composition algebra
Two π*’s can be composed into a third via
arborist.pi_star.compose():
```python
from arborist.pi_star import compose

chain = compose("wikitext-base@v1", "claim-lattice@v1")
# Auto-registers as "wikitext-base-then-claim-lattice@v1"
chain.canonicalize(b"- The release date was July 3, 1985. [E1]")
```
Each composition is itself a registered π* with its own key. See
docs/pi-star-composition.md for the algebra (type
compatibility, determinism preservation, equivalence-class
preservation, projective vs invertible compositions).
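The naming pattern and the chaining itself can be mimicked with plain functions. In this sketch the two stand-in canonicalizers, the `DEMO_REGISTRY` dict, and `compose_demo` are hypothetical, and the rule that the composed key is always `@v1` is an assumption drawn from the single example above.

```python
from typing import Callable, Dict

Canonicalizer = Callable[[bytes], bytes]

# Hypothetical stand-ins; these are not Arborist's real π*'s.
DEMO_REGISTRY: Dict[str, Canonicalizer] = {
    "strip-markup@v1": lambda raw: raw.replace(b"'''", b""),
    "collapse-space@v1": lambda raw: b" ".join(raw.split()),
}


def compose_demo(first_key: str, second_key: str,
                 registry: Dict[str, Canonicalizer] = DEMO_REGISTRY) -> str:
    """Chain two registered canonicalizers and register the result under a new key."""
    first, second = registry[first_key], registry[second_key]
    first_name = first_key.partition("@")[0]
    second_name = second_key.partition("@")[0]
    key = f"{first_name}-then-{second_name}@v1"

    def chained(raw: bytes) -> bytes:
        # The output of the first projection is the input of the second.
        return second(first(raw))

    registry[key] = chained
    return key


composed_key = compose_demo("strip-markup@v1", "collapse-space@v1")
print(composed_key)
print(DEMO_REGISTRY[composed_key](b"'''Bold'''   text"))
```

Since both stages are deterministic functions of their input bytes, the chained function is deterministic as well, which is the property the composition algebra relies on.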