Makefile reference
==================

Every arborist workflow lives behind a ``make`` target. This page is
auto-generated from the project ``Makefile``'s ``## description``
annotations at Sphinx build time, so it stays in sync with the source.

Run ``make help`` locally for a flat alphabetized listing.
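
The ``## description`` convention mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the project's extension code: the regular expression, the ``parse_targets`` name, and the sample Makefile text are all assumptions made for the example.

.. code-block:: python

   import re

   # Assumed annotation shape: "target: [prerequisites] ## description".
   ANNOTATED = re.compile(
       r"^([A-Za-z0-9][A-Za-z0-9_.-]*)\s*:[^#=]*##\s*(.+?)\s*$"
   )


   def parse_targets(makefile_text):
       """Return {target: description} for every annotated rule line."""
       targets = {}
       for line in makefile_text.splitlines():
           match = ANNOTATED.match(line)
           if match:
               targets[match.group(1)] = match.group(2)
       return targets


   # Illustrative sample only; the prerequisite and the variable are made up.
   SAMPLE = """\
   bootstrap: ## create venv and install editable package
   fetch-cur: bootstrap ## download cur table dump (~82 MB)
   VERIFY_N ?= 5
   \techo done  ## recipe line, not a rule
   """

   for name, description in sorted(parse_targets(SAMPLE).items()):
       print("make " + name.ljust(12) + description)

Variable assignments and tab-indented recipe lines are skipped because they never match the ``target:`` prefix, so only annotated rules reach the tables.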

.. note::

   Targets are grouped below by **workflow phase**, in the order a
   new operator typically runs them. The first row of each table is
   the most common entry point for that phase.

Setup
-----

One-time installation. Creates the venv and, optionally, installs the crawler's heavy extras. Re-running is a no-op when everything is up to date.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make bootstrap``
     - create venv and install editable package
   * - ``make bootstrap-crawler``
     - install [crawler] extras into the venv
   * - ``make all``
     - bootstrap → fetch cur → ingest cur → verify → stats

Fetch
-----

Download corpus snapshots into ``data/``. Idempotent — ``curl`` skips files already present.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make fetch``
     - download all 3 files (cur, old.1, old.2) and concatenate the old parts
   * - ``make fetch-cur``
     - download cur table dump (~82 MB)
   * - ``make fetch-old``
     - download old (revision history) parts and concatenate (~893 MB)
   * - ``make fetch-xml``
     - download Phase IV XML cur dump (default: enwiki 20101011, 6.2 GB)
   * - ``make fetch-abstract``
     - download Phase IV abstract.xml (default: enwiki 20101011, ~3 GB)
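
The skip-if-present behaviour described for this phase can be sketched as follows. The Makefile itself relies on ``curl``; this Python analogue only illustrates the idea, and ``fetch_if_missing``, the ``.part`` suffix, and the placeholder URL are assumptions made for the example.

.. code-block:: python

   import os
   import tempfile
   import urllib.request


   def fetch_if_missing(url, dest, download=urllib.request.urlretrieve):
       """Download ``url`` to ``dest`` unless ``dest`` already exists."""
       if os.path.exists(dest):
           return False  # already present: nothing to do
       os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
       partial = dest + ".part"
       download(url, partial)
       os.replace(partial, dest)  # only complete downloads get the final name
       return True


   calls = []


   def fake_download(url, path):
       # Stand-in for the network call so the sketch runs offline.
       calls.append(url)
       with open(path, "wb") as handle:
           handle.write(b"dump")


   with tempfile.TemporaryDirectory() as data_dir:
       dest = os.path.join(data_dir, "example.bin")
       url = "https://example.invalid/example.bin"  # placeholder URL
       first = fetch_if_missing(url, dest, fake_download)
       second = fetch_if_missing(url, dest, fake_download)

   print(first, second, len(calls))

Writing to a temporary name first means an interrupted download is not mistaken for a finished file on the next run.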

Ingest
------

Parse a source into Merkle-committed shards. ``-attached`` variants are the canonical sharded path (one SQLite per shard, no WAL contention). Single-DB variants are for experiments.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make ingest``
     - default ingest = cur (use ingest-old or \*-parallel for full)
   * - ``make ingest-cur``
     - ingest INGEST_LIMIT cur articles
   * - ``make ingest-cur-attached``
     - sharded ingest, no WAL contention (Phase 2)
   * - ``make ingest-old``
     - ingest INGEST_LIMIT old (history) revisions
   * - ``make ingest-old-attached``
     - sharded ingest of old history (Phase 2)
   * - ``make ingest-xml``
     - ingest INGEST_LIMIT pages from $(WP_XML)
   * - ``make ingest-xml-history``
     - ingest every revision (multi-revision mode); set WP_XML to a pages-meta-history file
   * - ``make ingest-xml-attached``
     - sharded XML ingest, one process per shard
   * - ``make ingest-abstract``
     - ingest INGEST_LIMIT abstract docs from $(WP_ABSTRACT)
   * - ``make ingest-grok-attached``
     - ingest Grok conversations into $(GROK_SHARD)
   * - ``make ingest-grok-media-attached``
     - ingest Grok media prompts into $(GROK_SHARD)
   * - ``make ingest-self``
     - ingest this repo's HEAD into a dedicated shard
   * - ``make ingest-self-providence``
     - promote STRICT live providence records into the document corpus [KG_SECONDS=3600]
   * - ``make ingest-git``
     - ingest GIT_REPO=<path> into its own shard
   * - ``make ingest-hg``
     - ingest HG_REPO=<path> (mercurial) into its own shard
   * - ``make crawl-ingest``
     - crawl URL=https://x.com [DEPTH=2 MAX=0 FAST=1] into $(SHARDS_DIR)/crawl_<domain>.db
   * - ``make recrawl-check``
     - conditional HEAD per ingested doc [DOMAIN=x.com LIMIT=100 CRAWL_SHARD=path]

Distill
-------

Surface → core distillation. Cores are Merkle-bound back to their source chunks via inclusion proofs.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make distill-shards-parallel``
     - one distill process per shard (parallel)
   * - ``make distill-shards-tfidf-parallel``
     - TF-IDF cores per shard, in parallel
   * - ``make backfill-concepts``
     - backfill all concept extractors in parallel across shards
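
For readers unfamiliar with inclusion proofs, the following is a minimal sketch of a binary Merkle tree with proof generation and verification. It is not arborist's tree layout: the use of SHA-256, the ``0x00``/``0x01`` domain-separation prefixes, the self-pairing of odd nodes, and all function names are assumptions chosen for the illustration. It expects at least one chunk.

.. code-block:: python

   import hashlib


   def _h(data):
       return hashlib.sha256(data).digest()


   def _padded(level):
       # An odd-length level pairs its last node with itself.
       return level + [level[-1]] if len(level) % 2 else level


   def build_levels(chunks):
       """Return every tree level, leaves first and the root level last."""
       level = [_h(b"\x00" + chunk) for chunk in chunks]
       levels = [level]
       while len(level) > 1:
           nodes = _padded(level)
           level = [
               _h(b"\x01" + nodes[i] + nodes[i + 1])
               for i in range(0, len(nodes), 2)
           ]
           levels.append(level)
       return levels


   def inclusion_proof(levels, index):
       """Collect (sibling_hash, sibling_is_left) pairs from leaf to root."""
       proof = []
       for level in levels[:-1]:
           nodes = _padded(level)
           proof.append((nodes[index ^ 1], index % 2 == 1))
           index //= 2
       return proof


   def verify(chunk, proof, root):
       """Recompute the root from one chunk and its proof."""
       node = _h(b"\x00" + chunk)
       for sibling, sibling_is_left in proof:
           pair = sibling + node if sibling_is_left else node + sibling
           node = _h(b"\x01" + pair)
       return node == root


   chunks = [b"alpha", b"beta", b"gamma"]
   levels = build_levels(chunks)
   root = levels[-1][0]
   proof = inclusion_proof(levels, 2)
   print(verify(b"gamma", proof, root))  # original chunk
   print(verify(b"gammA", proof, root))  # tampered chunk

A verifier needs only the chunk, the proof, and the committed root; it never has to see the other chunks.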

Query
-----

Ask the corpus a question. ``query-dry`` skips the LLM call and returns the assembled context — useful for prompt iteration.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make query``
     - ask the corpus a question [JSON=1 BURN=1 REPAIR=1 REPROMPTS=N K="extra retrieval keywords" ANSWER_MODE=claim_lattice|claim_lattice_pointer|quote BROAD=1 REJECT_BROAD=1 ALLOW_BROAD=1 WITNESS=1 XLANG=1 XLANG_MT=1]; JSON by default
   * - ``make query-dry``
     - like 'make query' but skip the LLM call (dry-run) [JSON=1 BURN=1 ANSWER_MODE=... BROAD=1 REJECT_BROAD=1 ALLOW_BROAD=1 XLANG=1 XLANG_MT=1]
   * - ``make search``
     - keyword search; override SEARCH_Q (or pass Q=...)
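
The difference between a live query and a dry run can be sketched as below. This is a hypothetical outline, not arborist's query pipeline: ``answer``, the prompt layout, and the stub retriever and model are all invented for the example.

.. code-block:: python

   def answer(question, retrieve, call_llm, dry_run=False):
       """Assemble retrieval context, then optionally hand it to the model."""
       passages = retrieve(question)
       context = "\n\n".join(passages)
       prompt = "Context:\n" + context + "\n\nQuestion: " + question
       if dry_run:
           # Return what would have been sent, without spending a model call.
           return {"prompt": prompt, "answer": None}
       return {"prompt": prompt, "answer": call_llm(prompt)}


   llm_calls = []


   def fake_retrieve(question):
       # Stub retriever standing in for the corpus lookup.
       return ["passage one", "passage two"]


   def fake_llm(prompt):
       # Stub model that records each prompt it receives.
       llm_calls.append(prompt)
       return "stub answer"


   dry = answer("who?", fake_retrieve, fake_llm, dry_run=True)
   live = answer("who?", fake_retrieve, fake_llm)
   print(dry["answer"], live["answer"], len(llm_calls))

Because both paths build the same prompt, a dry run shows exactly what a live query would send.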

Verify and inspect
------------------

Round-trip Merkle proofs, check audit-chain integrity, and run sidecar diagnostics on cached answers.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make verify``
     - round-trip Merkle proofs for VERIFY_N random documents
   * - ``make verify-shards``
     - cross-shard Merkle round-trip on a random sample
   * - ``make chain-check``
     - audit-chain integrity count for $(DB) (0 = intact: linear chain, one genesis)
   * - ``make chain-check-shards``
     - audit-chain integrity count for every \*.db in $(SHARDS_DIR)
   * - ``make analyze-shards``
     - cross-shard compression spectrum + audit integrity
   * - ``make stats``
     - counts: documents, chunks, edges, audit chain
   * - ``make stats-shards``
     - cross-shard stats via UNION views over $(SHARDS_DIR)
   * - ``make activity``
     - recent Q&A + freshly cached docs (agent timeline)
   * - ``make inspect``
     - sidecar diagnose unverified spans for a cache_key: make inspect KEY=hex [JSON=1]
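
The "0 = intact: linear chain, one genesis" convention used by ``chain-check`` can be illustrated with a small counter. The schema here is hypothetical: events are modelled as ``(event_hash, prev_hash)`` pairs with unique hashes, the genesis event has ``prev_hash`` set to ``None``, and the way defects are tallied is an assumption, not the project's actual query.

.. code-block:: python

   def chain_violations(events):
       """Count events that break a single linear chain; 0 means intact."""
       children = {}
       for event_hash, prev_hash in events:
           children.setdefault(prev_hash, []).append(event_hash)
       geneses = children.get(None, [])

       # Follow the first child at each step, starting from the first genesis.
       on_chain = set()
       cursor = geneses[0] if geneses else None
       while cursor is not None and cursor not in on_chain:
           on_chain.add(cursor)
           successors = children.get(cursor, [])
           cursor = successors[0] if successors else None

       # Extra geneses, fork branches, dangling links and cycles are all
       # events the walk never reaches.
       all_hashes = {event_hash for event_hash, _ in events}
       return len(all_hashes - on_chain) + (0 if geneses else 1)


   intact = [("a", None), ("b", "a"), ("c", "b")]
   forked = [("a", None), ("b", "a"), ("c", "a")]
   print(chain_violations(intact), chain_violations(forked))

Any non-zero result means at least one event sits off the main chain and deserves a closer look.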

Operations on cached records
----------------------------

Mark a record falsified (audit-preserving) or burn it from the database (refuses if it has children unless ``FORCE=1``).

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make falsify``
     - mark a cached answer wrong: make falsify KEY=hex REASON='why'
   * - ``make burn``
     - delete a leaf with no children. providence: KEY=<cache_key>; document/core: KIND=document|core ROOT=<hex>. REASON='why' [FORCE=1]
   * - ``make burn-kindergarten``
     - burn providence rows younger than SECONDS [SECONDS=3600 FORCE=1 DRY_RUN=1 REASON='why']
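
The contrast between falsifying and burning, and the refusal to burn a record that still has children, can be sketched with an in-memory model. Everything here is hypothetical: the record layout, ``BurnRefused``, and the ``force`` parameter (standing in for ``FORCE=1``) are assumptions, not arborist's storage layer.

.. code-block:: python

   class BurnRefused(Exception):
       """Raised when a record still has children and force is not set."""


   def falsify(records, key, reason):
       """Keep the record but attach the reason it was judged wrong."""
       records[key]["falsified"] = reason


   def burn(records, key, force=False):
       """Delete the record, refusing if other records name it as parent."""
       children = [k for k, rec in records.items() if rec["parent"] == key]
       if children and not force:
           raise BurnRefused(f"{key} has {len(children)} child record(s)")
       del records[key]


   records = {
       "aa": {"parent": None, "falsified": None},
       "bb": {"parent": "aa", "falsified": None},
   }
   falsify(records, "bb", "cites the wrong revision")
   try:
       burn(records, "aa")  # refused: "bb" still points at "aa"
   except BurnRefused as exc:
       print("refused:", exc)
   burn(records, "bb")  # leaf, deleted without force
   burn(records, "aa")  # now a leaf as well

Falsifying leaves the record in place for auditing, whereas burning removes it, which is why only burning needs the children guard.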

Tests and benches
-----------------

The default test suite excludes opt-in crawler tests. ``bench-qa`` runs the full QA-quality sweep; ``bench-qa-smoke`` is the 5-question fast loop.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make test``
     - run pytest suite (excludes opt-in crawler tests)
   * - ``make test-crawler``
     - run only the lifted crawler tests
   * - ``make test-live``
     - live QA quality tests against Hermes (gated; -n auto parallel)
   * - ``make bench``
     - benchmark serial vs parallel-shared vs attached at $(BENCH_DOCS) docs
   * - ``make bench-qa``
     - QA-quality sweep: questions × modes × N samples [BENCH_QA_N=3 BENCH_QA_LIMIT=N BENCH_QA_MODES=... BENCH_QA_CONCURRENCY=4]
   * - ``make bench-qa-smoke``
     - quick 5-question smoke (all anchor classes; ~30s)
   * - ``make bench-emergent``
     - blue-moon emergent stress test (3-word triangulation; N=10)
   * - ``make bench-emergent-pending``
     - print log entries awaiting teacher review

Docs
----

Render diagrams with Graphviz and build the Sphinx API reference. Read the Docs (RTD) rebuilds on push; these targets are for local previews.

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make docs``
     - render docs/diagrams/\*.dot -> .png + .svg via graphviz
   * - ``make docs-api``
     - generate Sphinx API reference from docstrings (output: docs/_source/_build/html/)
   * - ``make docs-api-clean``
     - remove Sphinx build artifacts

Clean
-----

Every ``clean`` target is reversible by re-running the matching ``bootstrap`` / ``fetch`` / ``ingest`` target. ``clean-data`` deletes the largest payload (downloaded dumps).

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make clean``
     - remove venv + caches (keeps fetched data and db)
   * - ``make clean-db``
     - drop the arborist db (keeps fetched data and venv)
   * - ``make clean-data``
     - remove fetched dumps
   * - ``make help``
     - show this help

Uncategorized
-------------

Targets not yet placed in a workflow phase. If you see one here, add it to ``PHASES`` in ``docs/_source/_ext/makefile_targets.py``.
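
How a target ends up in this section can be sketched as follows. The real shape of ``PHASES`` is not shown on this page, so the mapping below (phase title to an ordered list of target names) and the ``group_by_phase`` helper are assumptions made for illustration only.

.. code-block:: python

   # Hypothetical shape: phase title -> target names in display order.
   PHASES = {
       "Setup": ["bootstrap", "bootstrap-crawler", "all"],
       "Clean": ["clean", "clean-db", "clean-data", "help"],
   }


   def group_by_phase(targets, phases):
       """Split {target: description} into per-phase rows plus leftovers."""
       grouped = {}
       placed = set()
       for phase, names in phases.items():
           rows = [(name, targets[name]) for name in names if name in targets]
           grouped[phase] = rows
           placed.update(name for name, _ in rows)
       # Anything not named in a phase falls through, sorted alphabetically.
       leftovers = sorted(set(targets) - placed)
       grouped["Uncategorized"] = [(name, targets[name]) for name in leftovers]
       return grouped


   targets = {
       "bootstrap": "create venv and install editable package",
       "clean": "remove venv + caches (keeps fetched data and db)",
       "bench-5r": "5R battery",
       "bench-5f": "5F battery",
   }
   grouped = group_by_phase(targets, PHASES)
   for phase, rows in grouped.items():
       print(phase, [name for name, _ in rows])

Under this assumed shape, moving a target out of the leftovers is a one-line change: add its name to the list for the phase it belongs to.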

.. list-table::
   :widths: 30 70
   :header-rows: 1

   * - Target
     - Description
   * - ``make bench-5f``
     - 5F battery (Function+Finetuning+Falsification+Formulate+Feedback Loop)
   * - ``make bench-5f-falsification-hard``
     - #000046 — HARD Falsification near-miss pack (rate < 1.0 at HEAD by design)
   * - ``make bench-5f-falsification-live``
     - 5F Falsification via real arborist.qa.verify.verify_quotes (Phase 1b.2)
   * - ``make bench-5f-feedback-loop-live``
     - 5F Feedback Loop via live arborist memory + audit chain (Phase 1b.2)
   * - ``make bench-5f-finetuning-live``
     - 5F Finetuning via real selfmodel store/claims_for round-trip (Phase 1b.2)
   * - ``make bench-5f-finetuning-shardchain``
     - #000025 §10.11 — Finetuning over the two latest chain snapshots (run bench-5f-selfmodel-snapshot >=2x first)
   * - ``make bench-5f-formulate-hard``
     - #000046 Phase 2 / #000048 step 2.4 — HARD Formulate clause-segmentation pack (12/12 after step 2.4)
   * - ``make bench-5f-formulate-live``
     - 5F Formulate via live arborist.qa.parse_claims (Phase 1b.2)
   * - ``make bench-5f-function-live``
     - 5F Function via live arborist.qa.parse_claims (Phase 1b.2)
   * - ``make bench-5f-harvest``
     - harvest live-shard falsification proposals → 5F fixture pack
   * - ``make bench-5f-live``
     - all 5F Phase-1b.2 live wire-ups
   * - ``make bench-5f-selfmodel-snapshot``
     - #000025 §10.11 — append one SelfModel snapshot (rates as claims) to the chain shard
   * - ``make bench-5f-threshold-calibration``
     - #000025 §10.14 — 5S/5T/5F → ForkScore threshold calibration report
   * - ``make bench-5r``
     - 5R battery (React+Rearrange+Restore+Replicate+Resonate)
   * - ``make bench-5r-live``
     - all 5R Phase-1b.2 live wire-ups
   * - ``make bench-5r-react-live``
     - 5R React via real audit_events on a temp shard (Phase 1b.2)
   * - ``make bench-5r-restore-live``
     - 5R Restore via real audit chain query (Phase 1b.2)
   * - ``make bench-5s``
     - 5S battery (Syntax+Semantics+Syllogism+Synthesis+Semiotics)
   * - ``make bench-5s-algebra``
     - 5S algebra-symbolic π\* (ticket #000030 Phase 1; SymPy expand+srepr canonicalizer)
   * - ``make bench-5s-arithmetic``
     - 5S arithmetic π\* (SQD §14.1; rational arithmetic canonicalizer)
   * - ``make bench-5s-calculus-limit``
     - 5S calculus-limit π\* (#000030 Phase 4)
   * - ``make bench-5s-calculus-series``
     - 5S calculus-series π\* (#000030 Phase 5)
   * - ``make bench-5s-code``
     - 5S code-carrier (Syntax + Semantics through code-py-ast@v1)
   * - ``make bench-5s-combinatorics``
     - 5S combinatorics π\* (ticket #000032; pure-integer counting kernel)
   * - ``make bench-5s-function-sampled``
     - 5S function-sampled π\* (#000030 Phase 7; SymPy → time-series bridge)
   * - ``make bench-5s-linear-algebra``
     - 5S linear-algebra π\* (#000030 Phase 6)
   * - ``make bench-5s-logic-kernel``
     - 5S logic-kernel π\* (SQD §14.3; CNF canonicalizer)
   * - ``make bench-5s-math``
     - complete math π\* surface (arithmetic + logic-kernel)
   * - ``make bench-5s-tabular``
     - 5S tabular-pinned π\* (#000030/Phase tabular; declared-schema 2D structured-data canonicalizer)
   * - ``make bench-5s-time-series``
     - 5S time-series-quantized π\* (SQD §13.5; quantized integer-vector canonicalizer)
   * - ``make bench-5s5t``
     - 5S + 5T (Phase-1b vocabulary)
   * - ``make bench-5s5t5f``
     - 5S + 5T + 5F (operational triad)
   * - ``make bench-5t``
     - 5T battery (Transfer Learning+Triangulation+Truthtables+Transitivity+Time)
   * - ``make bench-5t-legacy``
     - 5T legacy SQD-name (transfer-v1) for Phase-1a digest stability
   * - ``make bench-fork-baseline``
     - pin current bench-suite output as ForkScore parent
   * - ``make bench-fork-baseline-hard``
     - #000046 — pin the below-ceiling hard-Falsification rate as a ForkScore parent
   * - ``make bench-fork-score``
     - #000012 Phase 1b — score current bench output vs $(FORK_PARENT)
   * - ``make bench-jaggedness``
     - #000060: deterministic retrieval jaggedness across surface perturbations [JAGGED_LIMIT / _K / _CLASSES ...]
   * - ``make bench-nli-backends``
     - #000049 §7 #28 — torch vs onnx-int8 vs tinygrad agreement+latency A/B
   * - ``make bench-nli-shadow``
     - #000049 Phase 2 — NLI shadow would-demote sweep (INPUT="f1.jsonl f2.jsonl" optional)
   * - ``make bench-qa-progressive-and``
     - retrieval-side fixture exercising progressive-AND + DF filter [BENCH_PROGRESSIVE_N=3]
   * - ``make bench-real-shard``
     - #000026 Phase 2 — real-shard workload baseline (latency, audit, primary-source use)
   * - ``make bench-suite``
     - complete Dav1DPrometheus suite (5S + 5T + 5F + 5R)
   * - ``make bench-witness-divergence``
     - extract LLM-divergence events as 5F Falsification fixtures
   * - ``make bench-witness-sweep``
     - fire witness mode on canonical-question corpus against real shards
   * - ``make bootstrap-math``
     - install [math] extras (sympy) into the venv
   * - ``make bootstrap-nli``
     - install [nli] extras + warm the pinned NLI checkpoint
   * - ``make bootstrap-nli-only``
     - #000049 §7 #22 — minimal venv + [nli] only (GPU/CPU compute-node setup; no [dev])
   * - ``make control-ab``
     - #000057 v1: Hermes-solo vs Arborist, blinded Opus judge [N=12 FIXTURE=...]
   * - ``make control-sweep``
     - #000057: model×framing control sweep [CONTROL_SWEEP_N / _WORKERS / _RESUME=jsonl ...]
   * - ``make crawl-textbooks``
     - BFS-crawl every textbook with a ``crawl_url`` field, one shard per host (in $(CRAWL_SHARDS_DIR))
   * - ``make crawl-textbooks-stats``
     - docs-per-shard summary across textbook crawl shards
   * - ``make demo-plot``
     - function-sampled@v1 demo: make demo-plot Q='sin(x)' [PNG=/tmp/out.png]
   * - ``make docs-one-pager``
     - render docs/_build/arborist-one-pager.pdf
   * - ``make docs-pagers``
     - render both 1-pager and 2-pager PDFs
   * - ``make docs-pagers-clean``
     - remove generated pager PDFs
   * - ``make docs-two-pager``
     - render docs/_build/arborist-two-pager.pdf
   * - ``make export-nli-onnx``
     - #000049 §3 — export the pinned shadow-NLI checkpoint to ONNX (int8)
   * - ``make fetch-textbooks``
     - fetch + ingest manifested textbooks → $(TEXTBOOK_DB)
   * - ``make judge-self-test``
     - verify the #000057 external judge on known-verdict triples (gate before control-ab)
   * - ``make monitor-access``
     - proxy access-log real-client-IP tally + SVG: make monitor-access LOG=path
   * - ``make monitor-graph``
     - render stored load samples to SVG [MONITOR_HOURS=6]
   * - ``make monitor-poll``
     - poll LLM endpoint /metrics (+GPU power/util) into SQLite [MONITOR_INTERVAL=10 MONITOR_GPU=name=host]
   * - ``make prometheus-sweep-dryrun``
     - #000037 Phase 3 sleep-sweep dry-run → markdown report
   * - ``make prometheus-trigger-probe``
     - #000037 §12 measured-pressure probe → markdown report
   * - ``make rapl-access``
     - [root] make Intel RAPL energy_uj readable for watt_bench CPU power
   * - ``make rapl-access-revoke``
     - [root] remove the RAPL read-access udev rule
   * - ``make test-ci``
     - CI gate: full suite minus crawler + wikipedia ingest
   * - ``make textbook``
     - ingest one textbook by id: make textbook ID=bogart-ctgd-2017
   * - ``make textbook-aristotle-posterior``
     - Aristotle Posterior Analytics (PD, Wikisource)
   * - ``make textbook-aristotle-prior``
     - Aristotle Prior Analytics (PD, Wikisource)
   * - ``make textbook-bogart``
     - Bogart Combinatorics Through Guided Discovery (GFDL)
   * - ``make textbook-boole``
     - Boole Laws of Thought (PD, PG TeX)
   * - ``make textbook-cantor``
     - Cantor Contributions to Transfinite Numbers (PD, Jourdain transl. 1915, Wikisource)
   * - ``make textbook-dedekind``
     - Dedekind Essays on the Theory of Numbers 1901 (PD, PG #21016 TeX)
   * - ``make textbook-demorgan``
     - De Morgan First Notions of Logic 1839 (PD, PG #67017)
   * - ``make textbook-grinstead-snell``
     - Grinstead-Snell Introduction to Probability (GFDL, LibreTexts mirror)
   * - ``make textbook-hilbert``
     - Hilbert Foundations of Geometry (PD, PG TeX)
   * - ``make textbook-judson``
     - Judson Abstract Algebra: Theory and Applications (GFDL)
   * - ``make textbook-keller-trotter``
     - Keller-Trotter Applied Combinatorics (CC-BY-SA, slow ~25min crawl-delay)
   * - ``make textbook-laplace``
     - Laplace Philosophical Essay on Probabilities 1814 / 1902 (PD, PG #58881)
   * - ``make textbook-levin``
     - Levin Discrete Mathematics (CC-BY-SA)
   * - ``make textbook-list``
     - list ingestable textbook ids (entries with urls or crawl_url)
   * - ``make textbook-morin``
     - Morin Open Data Structures (CC-BY)
   * - ``make textbook-newton``
     - Newton Principia (PD, Wikisource)
   * - ``make textbook-peano``
     - Peano Arithmetices Principia 1889 (CC-BY-SA, Verheyen+Nahas LaTeX from GitHub)
   * - ``make textbook-plfa``
     - PLFA - Programming Language Foundations in Agda (CC-BY-4.0)
   * - ``make textbook-pm``
     - Whitehead-Russell Principia Mathematica Vol 1 (PD, PG #78050 — preface+intro only)
   * - ``make textbook-russell-imp``
     - Russell Introduction to Mathematical Philosophy 1919 (PD, PG #41654)
   * - ``make textbook-russell-pom``
     - Russell Principles of Mathematics 1903 (PD content + CC-BY-SA-4.0 typesetting, Klement HTML)
   * - ``make textbook-sf-lf``
     - Software Foundations Vol 1 Logical Foundations (MIT, Pierce et al.)
   * - ``make textbooks-base-knowledge``
     - ingest the four 2026-05-09 base-knowledge additions (pillars I/II/III/IX)
   * - ``make textbooks-stats``
     - stats of the textbook shard
   * - ``make textbooks-summary``
     - list manifest entries with license + URL counts
   * - ``make textbooks-tex``
     - ingest every manifest entry that declares a tex_url (Hilbert + Boole)
   * - ``make textbooks-urls``
     - emit one textbook URL per line to stdout
   * - ``make textbooks-verify``
     - sample-Merkle-verify the textbook shard
