π* domain library
=================

Arborist's canonical-projection registry — the substrate that
implements SQD whitepaper §3's invariant projection
``π*: Σ* → 𝓘 ∪ {⊥}``. Every modality the bench surface or audit
chain touches has (or reserves) a registered π* that maps surface
bytes to canonical bytes; the SHA-256 of the canonical bytes is
the equivalence-class identity.
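
As a minimal sketch of the identity rule (not Arborist's implementation):
``toy_canonicalize`` and ``class_id`` are hypothetical names, and the
whitespace-collapsing projection is only a stand-in for a registered π*.
Two surface forms share an equivalence class exactly when their canonical
bytes hash to the same SHA-256 digest.

.. code-block:: python

   import hashlib


   def toy_canonicalize(raw: bytes) -> bytes:
       """Hypothetical stand-in for a registered projection: collapse whitespace."""
       return b" ".join(raw.split())


   def class_id(canonical: bytes) -> str:
       """Equivalence-class identity: hex SHA-256 of the canonical bytes."""
       return hashlib.sha256(canonical).hexdigest()


   a = class_id(toy_canonicalize(b"hello   world\n"))
   b = class_id(toy_canonicalize(b"hello world"))
   c = class_id(toy_canonicalize(b"hello worlds"))
   print(a == b)  # True: same canonical bytes, same class
   print(a == c)  # False: different canonical bytes, different class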

Registry overview
-----------------

Lookup is by ``name@version`` key. **Sixteen** concrete π*'s ship
today; the registry has no remaining reserved stubs.

================================== ================== =====================================================
Key                                Domain             Projection
================================== ================== =====================================================
``wikitext-base@v1``               text               Wikitext → plain prose
``claim-lattice@v1``               text               Claim lines → JSON parsed-claim list
``code-py-ast@v1``                 code               Python source → canonical AST S-expression
``arithmetic@v1``                  arithmetic         Expression → exact rational ``num/den`` (SQD §14.1)
``logic-kernel@v1``                logic              Boolean expression → canonical CNF (SQD §14.3)
``time-series-quantized@v1``       time-series        JSON sample array → quantized integer vector (SQD §13.5)
``tabular-pinned@v1``              tabular            JSON-rows → pinned-schema canonical bytes
``algebra-symbolic@v1``            symbolic-algebra   SymPy ``expand + srepr`` (#000030 Phase 1)
``algebra-symbolic-simplified@v1`` symbolic-algebra   SymPy ``simplify + srepr`` — collapses trig identities
``calculus-derivative@v1``         calculus           ``sp.diff`` re-canonicalized through algebra-symbolic
``calculus-integral@v1``           calculus           ``sp.integrate`` + unevaluated-Integral sentinel
``calculus-limit@v1``              calculus           ``sp.limit`` + ±∞/complex-infinity sentinels
``calculus-series@v1``             calculus           Truncated Taylor / Maclaurin (drops ``O(x**n)``)
``linear-algebra@v1``              linear-algebra     RREF / det / eigenvalues / inverse via ``{op, matrix}``
``function-sampled@v1``            function-sampled   SymPy expr → quantized integer vector (bridge to time-series)
``combinatorics@v1``               combinatorics      Pure-integer counting kernel (#000032; fails closed unless the result is a non-negative integer)
================================== ================== =====================================================

Authoritative count comes from
:data:`arborist.pi_star.registry.REGISTRY` itself; if this table
drifts from ``len(REGISTRY)`` the table is wrong, not the registry.
Each entry is **behaviorally immutable**: any byte-affecting
change to a registered ``name@version`` requires a new version
key, never an in-place patch. Otherwise previously persisted
canonical-cache rows become semantically unstable. The full
versioning rule is in ``docs/spec-methodology.md`` §1.1.
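
The versioning rule can be sketched with a toy registry that refuses to
overwrite an existing key. Everything here (``TOY_REGISTRY``,
``toy_register``) is hypothetical and is not the API of
``arborist.pi_star.registry``; it only illustrates why a byte-affecting
change ships under a new ``name@version`` key.

.. code-block:: python

   from typing import Callable, Dict

   Canonicalizer = Callable[[bytes], bytes]

   # Hypothetical stand-in for a name@version -> projection mapping.
   TOY_REGISTRY: Dict[str, Canonicalizer] = {}


   def toy_register(name: str, version: str, fn: Canonicalizer) -> str:
       """Register fn under name@version; never patch an existing key in place."""
       key = f"{name}@{version}"
       if key in TOY_REGISTRY:
           raise ValueError(f"{key} is already registered; bump the version instead")
       TOY_REGISTRY[key] = fn
       return key


   toy_register("strip", "v1", lambda raw: raw.strip())
   # A byte-affecting change (also lowercasing) gets a new key.
   toy_register("strip", "v2", lambda raw: raw.strip().lower())

   rejected = ""
   try:
       toy_register("strip", "v1", lambda raw: raw.strip().lower())
   except ValueError as exc:
       rejected = str(exc)

   print(sorted(TOY_REGISTRY))  # ['strip@v1', 'strip@v2']

Rows cached under ``strip@v1`` keep their meaning because the ``v1``
behavior never changes.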

The math π*'s (algebra-symbolic / calculus-* / linear-algebra /
function-sampled) gate on the optional ``[math]`` extra:
``pip install 'arborist[math]'`` installs SymPy. A fresh checkout
without the extra still passes the test suite — the modules
self-skip registration when ``import sympy`` fails.
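
One common way to get this self-skip behavior is an import guard around
registration. The sketch below is an assumption about the pattern, not a
copy of Arborist's modules; ``register_math_projections`` and the
``toy-algebra@v1`` key are hypothetical.

.. code-block:: python

   def register_math_projections(registry: dict) -> bool:
       """Hypothetical hook: register math entries only when SymPy imports."""
       try:
           import sympy
       except ImportError:
           return False  # self-skip: the registry is left untouched

       def expand_srepr(raw: bytes) -> bytes:
           # expand + srepr, in the spirit of the algebra-symbolic row above
           expr = sympy.expand(sympy.sympify(raw.decode("utf-8")))
           return sympy.srepr(expr).encode("utf-8")

       registry["toy-algebra@v1"] = expand_srepr
       return True


   registry: dict = {}
   registered = register_math_projections(registry)
   print(registered, sorted(registry))

Without SymPy the call returns ``False`` and nothing is registered, so
tests that depend on the math entries can skip rather than error.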

Cross-modality discipline
-------------------------

Every bench fixture and audit-bound canonicalization names its
:class:`PiStar` via:

- ``carrier`` — a Phase-1 whitelist enforced by
  :data:`bench.batteries.base.PHASE_1_CARRIERS`
- ``pi_star_ref`` — the registry key

Unsupported carriers fail or skip explicitly with
``reason="unsupported_carrier"``; they are never silently accepted.
Hidden-channel work is defensive only (detection / flagging,
never generation or concealment).

The discipline gives every benchmark, every audit row, and every
selection score a stable answer to "what canonicalizer was used"
that survives schema migrations and is reproducible from the
canonical bytes alone.
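
A minimal sketch of the explicit-skip rule, assuming a fixture is a plain
dict with ``carrier`` and ``pi_star_ref`` fields. ``TOY_CARRIERS`` and
``check_fixture`` are hypothetical; the real whitelist is
``bench.batteries.base.PHASE_1_CARRIERS``, and the result shape shown here
is an illustration only.

.. code-block:: python

   # Hypothetical whitelist; stands in for PHASE_1_CARRIERS.
   TOY_CARRIERS = frozenset({"text", "code", "arithmetic"})


   def check_fixture(fixture: dict) -> dict:
       """Accept whitelisted carriers; skip everything else with a reason."""
       carrier = fixture.get("carrier")
       if carrier not in TOY_CARRIERS:
           # Never silently accepted: the skip is recorded explicitly.
           return {"status": "skipped", "reason": "unsupported_carrier"}
       return {"status": "ok", "pi_star_ref": fixture["pi_star_ref"]}


   ok = check_fixture({"carrier": "arithmetic", "pi_star_ref": "arithmetic@v1"})
   skipped = check_fixture({"carrier": "audio", "pi_star_ref": "audio@v1"})
   print(ok)
   print(skipped)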

Math π*'s — SQD §14
-------------------

Two of the most recently graduated π*'s implement the SQD
whitepaper's math substrate:

**arithmetic@v1** — closed-form rational arithmetic. Solves the
canonical SQD test exactly:

.. code-block:: python

   from arborist.pi_star import get
   ps = get("arithmetic@v1")

   ps.canonicalize(b"0.1+0.2")   # → b"3/10"
   ps.canonicalize(b"0.3")       # → b"3/10"
   ps.canonicalize(b"1+2")       # → b"3/1"
   ps.canonicalize(b"6/4")       # → b"3/2"   (lowest terms)

No floating-point drift — ``Decimal(str(0.1))`` gives exact
``1/10``, then ``fractions.Fraction`` arithmetic stays in ℚ.
Identifiers, function calls, division by zero, and non-integer
exponents raise :class:`PiStarError`.
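
The exactness argument can be checked with the standard library alone.
This is a sketch of the idea, not the ``arithmetic@v1`` parser; ``exact``
and ``render`` are hypothetical helpers.

.. code-block:: python

   from decimal import Decimal
   from fractions import Fraction


   def exact(literal: str) -> Fraction:
       """Parse a decimal literal as an exact rational (no binary float step)."""
       return Fraction(Decimal(literal))


   def render(q: Fraction) -> bytes:
       """Render as num/den; Fraction already keeps lowest terms."""
       return f"{q.numerator}/{q.denominator}".encode("ascii")


   print(render(exact("0.1") + exact("0.2")))  # b'3/10'
   print(render(exact("0.3")))                 # b'3/10'
   print(render(exact("1") + exact("2")))      # b'3/1'
   print(render(Fraction(6, 4)))               # b'3/2'
   print(0.1 + 0.2 == 0.3)                     # False: binary floats drift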

**logic-kernel@v1** — propositional Boolean expression →
Conjunctive Normal Form (CNF):

.. code-block:: python

   ps = get("logic-kernel@v1")

   ps.canonicalize(b"A AND B")              # → b"A AND B"
   ps.canonicalize(b"B AND A")              # → b"A AND B"   (commutativity)
   ps.canonicalize(b"A IMPL B")             # → b"(NOT A OR B)"
   ps.canonicalize(b"(NOT B) IMPL (NOT A)") # → b"(NOT A OR B)"   (contrapositive)
   ps.canonicalize(b"NOT NOT A")            # → b"A"   (double negation)
   ps.canonicalize(b"A OR NOT A")           # → b"TRUE"   (tautology)

Atom cap: 8. CNF expansion is exponential in the worst case, so the
cap keeps canonicalization time bounded.

Equivalences preserved:
commutativity, associativity, IMPL/IFF/XOR rewrites, De Morgan,
double negation, distribution, idempotence, tautology collapse,
contrapositive.
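
Why these inputs may share one canonical form can be checked
independently with truth tables. The sketch below does not build CNF and
is not the ``logic-kernel@v1`` algorithm; ``impl`` and ``truth_table``
are hypothetical helpers that only confirm the formulas are logically
equivalent.

.. code-block:: python

   from itertools import product


   def impl(p: bool, q: bool) -> bool:
       """Material implication: p IMPL q."""
       return (not p) or q


   def truth_table(fn, n_atoms: int) -> tuple:
       """Evaluate fn on every assignment of n_atoms Boolean atoms."""
       return tuple(fn(*vals) for vals in product((False, True), repeat=n_atoms))


   direct = truth_table(lambda a, b: impl(a, b), 2)
   contrapositive = truth_table(lambda a, b: impl(not b, not a), 2)
   clause = truth_table(lambda a, b: (not a) or b, 2)
   tautology = truth_table(lambda a: a or (not a), 1)

   print(direct == contrapositive == clause)  # True
   print(all(tautology))                      # True
   print(2 ** 8)                              # 256 assignments for 8 atoms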

CLI surfaces
------------

Two ways to reach a registered π* from the shell — direct or
auto-routed.

**Direct call** — ``arborist canon <key> "<input>"`` runs the named
π* against the given input and prints the canonical bytes. No
shards, no LLM, no audit chain. Pure projection.

.. code-block:: console

   $ arborist canon arithmetic@v1 "0.1 + 0.2"
   3/10
   $ arborist canon logic-kernel@v1 "(NOT B) IMPL (NOT A)"
   (NOT A OR B)
   $ arborist canon time-series-quantized@v1 \
       '{"dt":1,"dv":0.1,"samples":[[0,1.0],[1,2.0]]}'
   dt=1;dv=0.1;n=2;t0=0:10|20

   # registry contents:
   $ arborist canon --list
   arithmetic@v1                arithmetic
   claim-lattice@v1             text
   code-py-ast@v1               code
   logic-kernel@v1              logic
   tabular-pinned@v1            tabular
   time-series-quantized@v1     time-series
   wikitext-base@v1             text-or-wikitext

   # JSON envelope w/ canonical-bytes SHA-256:
   $ arborist canon --json arithmetic@v1 "0.1 + 0.2"
   {"pi_star_ref":"arithmetic@v1","input":"0.1 + 0.2",
    "canonical":"3/10","canonical_sha256":"088ae5cc..."}

**Auto-routed via** ``arborist query`` — pure-arithmetic and
pure-propositional questions short-circuit the RAG path and answer
directly through ``arithmetic@v1`` / ``logic-kernel@v1``. The result
gets a synthetic ``audit_mode = CANONICAL_PROJECTION`` and renders as
``CANONICAL · via <pi_star_ref>``:

.. code-block:: console

   $ arborist query "0.1 + 0.2"
   0.1 + 0.2
     CANONICAL · via arithmetic@v1   0.0s   (projected)

   3/10

   $ arborist query "A IMPL B"
   A IMPL B
     CANONICAL · via logic-kernel@v1   0.0s   (projected)

   (NOT A OR B)

The sniff is conservative: it matches only a pure-arithmetic shape
(digits + ops, no letters) or a pure-propositional shape (uppercase
atoms + reserved keywords only). Anything natural-language
("what is 0.1 + 0.2?") falls through to the existing RAG path. Pass
``--no-canonical-preflight`` to force the RAG path even on
matching input, which is useful when bench-comparing the canonical
answer against the LLM's reply. To disable the preflight per call, set
``policy["canonical_projection_preflight"] = False``.
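
A shape check of this kind might look like the sketch below. The regular
expressions, the keyword set, and the ``sniff`` function are all
assumptions made for illustration; the actual preflight may accept a
different grammar.

.. code-block:: python

   import re
   from typing import Optional

   # Hypothetical shapes, approximating the description above.
   ARITH_SHAPE = re.compile(r"[0-9\s.+\-*/()]+")
   LOGIC_WORD = re.compile(r"[A-Z][A-Z0-9_]*")
   KEYWORDS = {"AND", "OR", "NOT", "IMPL", "IFF", "XOR", "TRUE"}


   def sniff(question: str) -> Optional[str]:
       """Return a registry key for pure-shaped input, else None (RAG path)."""
       text = question.strip()
       # Pure arithmetic: digits and operators only, with at least one digit.
       if ARITH_SHAPE.fullmatch(text) and any(ch.isdigit() for ch in text):
           return "arithmetic@v1"
       # Pure propositional: uppercase words and parentheses only.
       tokens = text.replace("(", " ").replace(")", " ").split()
       words_ok = bool(tokens) and all(LOGIC_WORD.fullmatch(t) for t in tokens)
       has_atom = any(t not in KEYWORDS for t in tokens)
       if words_ok and has_atom:
           return "logic-kernel@v1"
       return None


   print(sniff("0.1 + 0.2"))           # arithmetic@v1
   print(sniff("A IMPL B"))            # logic-kernel@v1
   print(sniff("what is 0.1 + 0.2?"))  # None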

Composition algebra
-------------------

Two π*'s can be composed into a third via
:func:`arborist.pi_star.compose`:

.. code-block:: python

   from arborist.pi_star import compose

   chain = compose("wikitext-base@v1", "claim-lattice@v1")
   # Auto-registers as "wikitext-base-then-claim-lattice@v1"
   chain.canonicalize(b"- The release date was July 3, 1985. [E1]")

Each composition is itself a registered π* with its own key. See
:file:`docs/pi-star-composition.md` for the algebra (type
compatibility, determinism preservation, equivalence-class
preservation, projective vs invertible compositions).
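
Conceptually, a composition is function chaining plus a derived key. The
sketch below is not :func:`arborist.pi_star.compose`; ``toy_compose`` is
hypothetical, and the ``@v1`` suffix on the derived key is an assumption
modeled on the example key above.

.. code-block:: python

   from typing import Callable, Tuple

   Canonicalizer = Callable[[bytes], bytes]


   def toy_compose(first_key: str, first: Canonicalizer,
                   second_key: str, second: Canonicalizer
                   ) -> Tuple[str, Canonicalizer]:
       """Return (derived key, function applying first, then second)."""
       first_name = first_key.split("@", 1)[0]
       second_name = second_key.split("@", 1)[0]
       key = f"{first_name}-then-{second_name}@v1"

       def chained(raw: bytes) -> bytes:
           return second(first(raw))

       return key, chained


   key, chain = toy_compose(
       "collapse-ws@v1", lambda raw: b" ".join(raw.split()),
       "lowercase@v1", lambda raw: raw.lower(),
   )
   print(key)                      # collapse-ws-then-lowercase@v1
   print(chain(b"  Hello   World"))  # b'hello world'

Because both stages are deterministic functions of their input bytes, the
chained function is deterministic as well.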

Authoring a new π*
------------------

1. Add ``arborist/pi_star/<name>.py`` with a frozen dataclass
   declaring ``name``, ``version``, ``domain``, and a
   ``canonicalize(self, raw: bytes) -> bytes`` method.
2. Call :func:`arborist.pi_star.register` at module import time.
3. Import the new module in ``arborist/pi_star/__init__.py`` so
   the registration fires at package load.
4. Write tests covering: idempotency on the canonical form (or
   document projective behavior), equivalence classes preserved,
   equivalence classes kept distinct, error paths (bad input
   rejected explicitly).
5. Add bench fixtures under ``bench/fixtures/5s/{syntax,semantics}-<name>-v1.jsonl``
   exercising the new carrier through the existing 5S Syntax and
   Semantics runners.
6. Add the carrier name to
   :data:`bench.batteries.base.PHASE_1_CARRIERS`.
7. Add a ``make bench-5s-<name>`` Makefile target.

See :file:`docs/spec-methodology.md` for the full discipline.
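
Steps 1 and 4 can be sketched as follows. ``SortedLinesPiStar`` and
``ToyPiStarError`` are hypothetical; the sketch omits the
:func:`arborist.pi_star.register` call so that it runs standalone, and
the real base types may differ.

.. code-block:: python

   from dataclasses import dataclass


   class ToyPiStarError(ValueError):
       """Hypothetical stand-in for arborist's PiStarError."""


   @dataclass(frozen=True)
   class SortedLinesPiStar:
       name: str = "sorted-lines"
       version: str = "v1"
       domain: str = "text"

       def canonicalize(self, raw: bytes) -> bytes:
           """Strip, deduplicate, and sort non-empty lines."""
           try:
               text = raw.decode("utf-8")
           except UnicodeDecodeError as exc:
               # Bad input is rejected explicitly, never passed through.
               raise ToyPiStarError("input is not valid UTF-8") from exc
           lines = sorted({ln.strip() for ln in text.splitlines() if ln.strip()})
           return "\n".join(lines).encode("utf-8")


   ps = SortedLinesPiStar()
   canon = ps.canonicalize(b"b\n a \n\nb\n")
   assert canon == b"a\nb"
   assert ps.canonicalize(canon) == canon          # idempotent on canonical form
   assert ps.canonicalize(b"a\nb") == canon        # equivalent inputs collapse
   assert ps.canonicalize(b"a\nc") != canon        # distinct classes stay distinct

The four assertions mirror the test checklist in step 4; an error-path
test would additionally feed invalid UTF-8 and expect the exception.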
