Makefile reference
Every arborist workflow lives behind a make target. This page is
auto-generated from the project Makefile’s ## description
annotations at Sphinx build time, so it stays in sync with the source.
Run make help locally for a flat alphabetized listing.
Note
Targets are grouped below by workflow phase, in the order a new operator typically runs them. The first row of each table is the most common entry point for that phase.
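The annotation scheme mentioned at the top of this page can be pictured with a short sketch. It assumes the common "target: prerequisites ## description" convention on a single line; the function name parse_targets and the sample Makefile text are hypothetical, and the project's actual Sphinx extension may parse the file differently.

```python
import re

# Assumed convention: "target: prerequisites ## description" on one line.
# The (?!=) lookahead skips variable assignments such as "VENV := .venv".
TARGET_RE = re.compile(r"^([A-Za-z0-9_.-]+)\s*:(?!=)[^#]*##\s*(.+?)\s*$")


def parse_targets(makefile_text):
    """Return {target: description} for every annotated rule line."""
    targets = {}
    for line in makefile_text.splitlines():
        match = TARGET_RE.match(line)
        if match:
            targets[match.group(1)] = match.group(2)
    return targets


# Hypothetical Makefile fragment, used only to exercise the parser.
SAMPLE = """\
VENV := .venv
help: ## show this help
query-dry: $(VENV) ## skip the LLM call (dry-run)
internal-step:
\techo "no annotation, so not listed"
"""

if __name__ == "__main__":
    # A flat alphabetized listing, in the spirit of "make help".
    for name, text in sorted(parse_targets(SAMPLE).items()):
        print(f"{name:<12} {text}")
```

Rules without a ## annotation are simply skipped, which is why an unannotated helper target would not appear in a listing built this way.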
Setup
One-time installation. Creates the venv and (optionally) the crawler’s heavy extras. Re-running is a no-op when up to date.
| Target | Description |
|---|---|
| | create venv and install editable package |
| | install [crawler] extras into the venv |
| | bootstrap → fetch cur → ingest cur → verify → stats |
Fetch
Download corpus snapshots into data/. Downloads are idempotent: curl skips files already present.
| Target | Description |
|---|---|
| | download all 3 files (cur + old.1 + old.2 + concat) |
| | download cur table dump (~82 MB) |
| | download old (revision history) parts and concatenate (~893 MB) |
| | download Phase IV XML cur dump (default: enwiki 20101011, 6.2 GB) |
| | download Phase IV abstract.xml (default: enwiki 20101011, ~3 GB) |
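The skip-if-present behaviour that makes the fetch targets idempotent can be sketched as follows. This is an illustration only: the Makefile itself relies on curl, while here fetch_if_missing, fake_download, the example URL, and the file name are all hypothetical stand-ins so the sketch runs offline.

```python
import tempfile
from pathlib import Path


def fetch_if_missing(url, dest, download):
    """Call download(url, dest) only when dest does not exist yet.

    Returns True if a download happened, False if it was skipped.
    """
    dest = Path(dest)
    if dest.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    download(url, dest)
    return True


def fake_download(url, dest):
    # Stand-in for curl: writes a small placeholder payload.
    Path(dest).write_text(f"payload from {url}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "data" / "cur-dump.bin"  # hypothetical file name
        url = "https://example.invalid/cur"           # hypothetical URL
        print(fetch_if_missing(url, target, fake_download))  # first run downloads
        print(fetch_if_missing(url, target, fake_download))  # second run skips
```

A presence check like this does not detect a truncated download, so a partially fetched file would need to be removed before re-running.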
Ingest
Parse a source into Merkle-committed shards. -attached variants are the canonical sharded path (one SQLite per shard, no WAL contention). Single-DB variants are for experiments.
| Target | Description |
|---|---|
| | default ingest = cur (use ingest-old or *-parallel for full) |
| | ingest INGEST_LIMIT cur articles |
| | sharded ingest, no WAL contention (Phase 2) |
| | ingest INGEST_LIMIT old (history) revisions |
| | sharded ingest of old history (Phase 2) |
| | ingest INGEST_LIMIT pages from $(WP_XML) |
| | ingest every revision (multi-revision mode); set WP_XML to a pages-meta-history file |
| | sharded XML ingest, one process per shard |
| | ingest INGEST_LIMIT abstract docs from $(WP_ABSTRACT) |
| | ingest Grok conversations into $(GROK_SHARD) |
| | ingest Grok media prompts into $(GROK_SHARD) |
| | ingest this repo’s HEAD into a dedicated shard |
| | promote STRICT live providence records into the document corpus [KG_SECONDS=3600] |
| | ingest GIT_REPO=<path> into its own shard |
| | ingest HG_REPO=<path> (mercurial) into its own shard |
| | crawl URL=https://x.com [DEPTH=2 MAX=0 FAST=1] into $(SHARDS_DIR)/crawl_<domain>.db |
| | conditional HEAD per ingested doc [DOMAIN=x.com LIMIT=100 CRAWL_SHARD=path] |
Distill
Surface → core distillation. Cores are Merkle-bound back to their source chunks via inclusion proofs.
| Target | Description |
|---|---|
| | one distill process per shard (parallel) |
| | TF-IDF cores per shard, in parallel |
| | backfill all concept extractors in parallel across shards |
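For readers unfamiliar with inclusion proofs, the idea of binding a core back to its source chunks can be sketched with a generic binary Merkle tree. This is not arborist's actual scheme: SHA-256, the duplicate-last-node rule for odd levels, and every function name below are assumptions made for illustration.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def merkle_levels(leaves):
    """Build all tree levels bottom-up; odd levels pair the last node with itself."""
    level = [_h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def inclusion_proof(levels, index):
    """Collect (sibling_hash, sibling_is_left) pairs from the leaf up to the root."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1  # neighbour within the same pair
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify(leaf, proof, root):
    """Recompute the root from a leaf and its proof, then compare."""
    node = _h(leaf)
    for sibling, sibling_is_left in proof:
        node = _h(sibling + node) if sibling_is_left else _h(node + sibling)
    return node == root


if __name__ == "__main__":
    chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]  # hypothetical source chunks
    levels = merkle_levels(chunks)
    root = levels[-1][0]
    proof = inclusion_proof(levels, 2)
    print(verify(chunks[2], proof, root))    # genuine chunk
    print(verify(b"tampered", proof, root))  # altered chunk
```

The proof holds one hash per tree level, so checking a single chunk against a committed root needs only a logarithmic number of hashes rather than the whole shard.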
Query
Ask the corpus a question. query-dry skips the LLM call and returns the assembled context, which is useful for prompt iteration.
| Target | Description |
|---|---|
| | ask the corpus a question [JSON=1 BURN=1 REPAIR=1 REPROMPTS=N K="extra retrieval keywords" ANSWER_MODE=claim_lattice\|claim_lattice_pointer\|quote BROAD=1 REJECT_BROAD=1 ALLOW_BROAD=1 WITNESS=1 XLANG=1 XLANG_MT=1]; JSON by default |
| | like 'make query' but skip the LLM call (dry-run) [JSON=1 BURN=1 ANSWER_MODE=… BROAD=1 REJECT_BROAD=1 ALLOW_BROAD=1 XLANG=1 XLANG_MT=1] |
| | keyword search; override SEARCH_Q (or pass Q=…) |
Verify and inspect
Round-trip Merkle proofs, audit chain integrity, sidecar diagnostics on cached answers.
| Target | Description |
|---|---|
| | round-trip Merkle proofs for VERIFY_N random documents |
| | cross-shard Merkle round-trip on a random sample |
| | audit-chain integrity count for $(DB) (0 = intact: linear chain, one genesis) |
| | audit-chain integrity count for every *.db in $(SHARDS_DIR) |
| | cross-shard compression spectrum + audit integrity |
| | counts: documents, chunks, edges, audit chain |
| | cross-shard stats via UNION views over $(SHARDS_DIR) |
| | recent Q&A + freshly cached docs (agent timeline) |
| | sidecar diagnose unverified spans for a cache_key: make inspect KEY=hex [JSON=1] |
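The table above says an audit-chain integrity count of 0 means a linear chain with one genesis. One way such a count could be computed is sketched below. The source does not state the real counting rule, so the (event_id, prev_id) record shape, the violation categories, and the function name are assumptions; the only property the sketch aims for is that it returns 0 exactly when the chain is linear with a single genesis.

```python
from collections import Counter


def integrity_count(events):
    """Count violations of 'linear chain, one genesis'.

    events: iterable of (event_id, prev_id) with unique ids;
    prev_id is None for a genesis event.
    """
    events = list(events)
    ids = {event_id for event_id, _ in events}
    genesis = [event_id for event_id, prev in events if prev is None]

    # Events pointing at a predecessor that does not exist.
    dangling = sum(1 for _, prev in events if prev is not None and prev not in ids)

    # Predecessors claimed by more than one successor (forks).
    successors = Counter(prev for _, prev in events if prev is not None)
    forks = sum(n - 1 for n in successors.values() if n > 1)

    # Walk forward from the first genesis; anything not reached is off-chain.
    next_of = {prev: event_id for event_id, prev in events if prev is not None}
    reached = set()
    node = genesis[0] if genesis else None
    while node is not None and node not in reached:
        reached.add(node)
        node = next_of.get(node)
    unreachable = len(ids) - len(reached)

    return abs(len(genesis) - 1) + dangling + forks + unreachable


if __name__ == "__main__":
    intact = [("g", None), ("a", "g"), ("b", "a")]
    forked = [("g", None), ("a", "g"), ("b", "g")]
    print(integrity_count(intact))      # 0 for a linear chain
    print(integrity_count(forked) > 0)  # a fork is reported as non-zero
```

A single violation can be counted under more than one category here, so the number is a health signal rather than an exact tally of broken records.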
Operations on cached records
Mark a record falsified (audit-preserving) or burn it from the database (refuses if it has children unless FORCE=1).
| Target | Description |
|---|---|
| | mark a cached answer wrong: make falsify KEY=hex REASON='why' |
| | delete a leaf with no children. providence: KEY=<cache_key>; document/core: KIND=document\|core ROOT=<hex>. REASON='why' [FORCE=1] |
| | bust providence rows < SECONDS old [SECONDS=3600 FORCE=1 DRY_RUN=1 REASON='why'] |
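The children guard on burning can be illustrated with a toy in-memory store. This is a sketch of the refusal rule only: the {key: parent_key} mapping, the HasChildrenError name, and what happens to children under force are assumptions, since the source states just that burning refuses when children exist unless FORCE=1.

```python
class HasChildrenError(Exception):
    """Raised when a record with children is burned without force."""


def burn(records, key, reason, force=False):
    """Delete records[key] unless other records name it as their parent.

    records: {key: parent_key or None}. Returns the keys of any children
    left behind when force=True (this sketch does not delete them).
    """
    if not reason:
        raise ValueError("a reason is required")
    if key not in records:
        raise KeyError(key)
    children = [k for k, parent in records.items() if parent == key]
    if children and not force:
        raise HasChildrenError(f"{key} has {len(children)} child record(s)")
    del records[key]
    return children


if __name__ == "__main__":
    store = {"root": None, "leaf": "root"}  # hypothetical records
    try:
        burn(store, "root", reason="stale")
    except HasChildrenError as exc:
        print("refused:", exc)
    burn(store, "leaf", reason="stale")   # leaf has no children
    burn(store, "root", reason="stale")   # now allowed
    print(store)
```

Burning leaves first, as in the usage above, avoids needing force at all.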
Tests and benches
Default test suite excludes opt-in crawler tests. bench-qa runs the full QA-quality sweep; bench-qa-smoke is the 5-question fast loop.
| Target | Description |
|---|---|
| | run pytest suite (excludes opt-in crawler tests) |
| | run only the lifted crawler tests |
| | live QA quality tests against Hermes (gated; -n auto parallel) |
| | benchmark serial vs parallel-shared vs attached at $(BENCH_DOCS) docs |
| | QA-quality sweep: questions × modes × N samples [BENCH_QA_N=3 BENCH_QA_LIMIT=N BENCH_QA_MODES=… BENCH_QA_CONCURRENCY=4] |
| | quick 5-question smoke (all anchor classes; ~30s) |
| | blue-moon emergent stress test (3-word triangulation; N=10) |
| | print log entries awaiting teacher review |
Docs
Render diagrams (graphviz) and build the Sphinx API reference. RTD rebuilds on push; these targets are for local previews.
| Target | Description |
|---|---|
| | render docs/diagrams/*.dot -> .png + .svg via graphviz |
| | generate Sphinx API reference from docstrings (output: docs/_source/_build/html/) |
| | remove Sphinx build artifacts |
Clean
Each clean target is reversible by re-running the matching bootstrap / fetch / ingest target. clean-data deletes the largest payload (downloaded dumps).
| Target | Description |
|---|---|
| | remove venv + caches (keeps fetched data and db) |
| | drop the arborist db (keeps fetched data and venv) |
| | remove fetched dumps |
| | show this help |
Uncategorized
Targets not yet placed in a workflow phase. If you see one here, add it to PHASES in docs/_source/_ext/makefile_targets.py.
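The fallback bucket described here can be sketched as a simple grouping step. The real shape of PHASES in makefile_targets.py is not shown on this page, so the dict-of-lists layout, the sample entries, and the group_by_phase helper are hypothetical.

```python
# Hypothetical layout: phase title -> ordered list of target names.
PHASES = {
    "Setup": ["bootstrap"],
    "Query": ["query", "query-dry"],
}


def group_by_phase(targets, phases):
    """Group {target: description} by phase; unplaced targets fall through.

    Targets listed in a phase keep the phase's order. Anything not listed
    in any phase is collected, alphabetized, under "Uncategorized".
    """
    placed = {name for names in phases.values() for name in names}
    grouped = {
        phase: [name for name in names if name in targets]
        for phase, names in phases.items()
    }
    leftover = sorted(name for name in targets if name not in placed)
    if leftover:
        grouped["Uncategorized"] = leftover
    return grouped


if __name__ == "__main__":
    parsed = {  # hypothetical parse result
        "bootstrap": "create venv and install editable package",
        "query": "ask the corpus a question",
        "query-dry": "skip the LLM call (dry-run)",
        "bench-5s": "5S battery",
    }
    for phase, names in group_by_phase(parsed, PHASES).items():
        print(phase, names)
```

Under this layout, moving a target out of the fallback bucket is a one-line change: append its name to the list for the phase it belongs to.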
| Target | Description |
|---|---|
| | 5F battery (Function+Finetuning+Falsification+Formulate+Feedback Loop) |
| | #000046: HARD Falsification near-miss pack (rate < 1.0 at HEAD by design) |
| | 5F Falsification via real arborist.qa.verify.verify_quotes (Phase 1b.2) |
| | 5F Feedback Loop via live arborist memory + audit chain (Phase 1b.2) |
| | 5F Finetuning via real selfmodel store/claims_for round-trip (Phase 1b.2) |
| | #000025 §10.11: Finetuning over the two latest chain snapshots (run bench-5f-selfmodel-snapshot >=2x first) |
| | #000046 Phase 2 / #000048 step 2.4: HARD Formulate clause-segmentation pack (12/12 after step 2.4) |
| | 5F Formulate via live arborist.qa.parse_claims (Phase 1b.2) |
| | 5F Function via live arborist.qa.parse_claims (Phase 1b.2) |
| | harvest live-shard falsification proposals → 5F fixture pack |
| | all 5F Phase-1b.2 live wire-ups |
| | #000025 §10.11: append one SelfModel snapshot (rates as claims) to the chain shard |
| | #000025 §10.14: 5S/5T/5F → ForkScore threshold calibration report |
| | 5R battery (React+Rearrange+Restore+Replicate+Resonate) |
| | all 5R Phase-1b.2 live wire-ups |
| | 5R React via real audit_events on a temp shard (Phase 1b.2) |
| | 5R Restore via real audit chain query (Phase 1b.2) |
| | 5S battery (Syntax+Semantics+Syllogism+Synthesis+Semiotics) |
| | 5S algebra-symbolic π* (ticket #000030 Phase 1; SymPy expand+srepr canonicalizer) |
| | 5S arithmetic π* (SQD §14.1; rational arithmetic canonicalizer) |
| | 5S calculus-limit π* (#000030 Phase 4) |
| | 5S calculus-series π* (#000030 Phase 5) |
| | 5S code-carrier (Syntax + Semantics through code-py-ast@v1) |
| | 5S combinatorics π* (ticket #000032; pure-integer counting kernel) |
| | 5S function-sampled π* (#000030 Phase 7; SymPy → time-series bridge) |
| | 5S linear-algebra π* (#000030 Phase 6) |
| | 5S logic-kernel π* (SQD §14.3; CNF canonicalizer) |
| | complete math π* surface (arithmetic + logic-kernel) |
| | 5S tabular-pinned π* (#000030/Phase tabular; declared-schema 2D structured-data canonicalizer) |
| | 5S time-series-quantized π* (SQD §13.5; quantized integer-vector canonicalizer) |
| | 5S + 5T (Phase-1b vocabulary) |
| | 5S + 5T + 5F (operational triad) |
| | 5T battery (Transfer Learning+Triangulation+Truthtables+Transitivity+Time) |
| | 5T legacy SQD-name (transfer-v1) for Phase-1a digest stability |
| | pin current bench-suite output as ForkScore parent |
| | #000046: pin the below-ceiling hard-Falsification rate as a ForkScore parent |
| | #000012 Phase 1b: score current bench output vs $(FORK_PARENT) |
| | #000060: deterministic retrieval jaggedness across surface perturbations [JAGGED_LIMIT / _K / _CLASSES …] |
| | #000049 §7 #28: torch vs onnx-int8 vs tinygrad agreement+latency A/B |
| | #000049 Phase 2: NLI shadow would-demote sweep (INPUT="f1.jsonl f2.jsonl" optional) |
| | retrieval-side fixture exercising progressive-AND + DF filter [BENCH_PROGRESSIVE_N=3] |
| | #000026 Phase 2: real-shard workload baseline (latency, audit, primary-source use) |
| | complete Dav1DPrometheus suite (5S + 5T + 5F + 5R) |
| | extract LLM-divergence events as 5F Falsification fixtures |
| | fire witness mode on canonical-question corpus against real shards |
| | install [math] extras (sympy) into the venv |
| | install [nli] extras + warm the pinned NLI checkpoint |
| | #000049 §7 #22: minimal venv + [nli] only (GPU/CPU compute-node setup; no [dev]) |
| | #000057 v1: Hermes-solo vs Arborist, blinded Opus judge [N=12 FIXTURE=…] |
| | #000057: model×framing control sweep [CONTROL_SWEEP_N / _WORKERS / _RESUME=jsonl …] |
| | BFS-crawl every textbook with a crawl_url field, one shard per host (in $(CRAWL_SHARDS_DIR)) |
| | docs-per-shard summary across textbook crawl shards |
| | function-sampled@v1 demo: make demo-plot Q='sin(x)' [PNG=/tmp/out.png] |
| | render docs/_build/arborist-one-pager.pdf |
| | render both 1-pager and 2-pager PDFs |
| | remove generated pager PDFs |
| | render docs/_build/arborist-two-pager.pdf |
| | #000049 §3: export the pinned shadow-NLI checkpoint to ONNX (int8) |
| | fetch + ingest manifested textbooks → $(TEXTBOOK_DB) |
| | verify the #000057 external judge on known-verdict triples (gate before control-ab) |
| | proxy access-log real-client-IP tally + SVG: make monitor-access LOG=path |
| | render stored load samples to SVG [MONITOR_HOURS=6] |
| | poll LLM endpoint /metrics (+GPU power/util) into SQLite [MONITOR_INTERVAL=10 MONITOR_GPU=name=host] |
| | #000037 Phase 3 sleep-sweep dry-run → markdown report |
| | #000037 §12 measured-pressure probe → markdown report |
| | [root] make Intel RAPL energy_uj readable for watt_bench CPU power |
| | [root] remove the RAPL read-access udev rule |
| | CI gate: full suite minus crawler + wikipedia ingest |
| | ingest one textbook by id: make textbook ID=bogart-ctgd-2017 |
| | Aristotle Posterior Analytics (PD, Wikisource) |
| | Aristotle Prior Analytics (PD, Wikisource) |
| | Bogart Combinatorics Through Guided Discovery (GFDL) |
| | Boole Laws of Thought (PD, PG TeX) |
| | Cantor Contributions to Transfinite Numbers (PD, Jourdain transl. 1915, Wikisource) |
| | Dedekind Essays on the Theory of Numbers 1901 (PD, PG #21016 TeX) |
| | De Morgan First Notions of Logic 1839 (PD, PG #67017) |
| | Grinstead-Snell Introduction to Probability (GFDL, LibreTexts mirror) |
| | Hilbert Foundations of Geometry (PD, PG TeX) |
| | Judson Abstract Algebra: Theory and Applications (GFDL) |
| | Keller-Trotter Applied Combinatorics (CC-BY-SA, slow ~25min crawl-delay) |
| | Laplace Philosophical Essay on Probabilities 1814 / 1902 (PD, PG #58881) |
| | Levin Discrete Mathematics (CC-BY-SA) |
| | list ingestable textbook ids (entries with urls or crawl_url) |
| | Morin Open Data Structures (CC-BY) |
| | Newton Principia (PD, Wikisource) |
| | Peano Arithmetices Principia 1889 (CC-BY-SA, Verheyen+Nahas LaTeX from GitHub) |
| | PLFA - Programming Language Foundations in Agda (CC-BY-4.0) |
| | Whitehead-Russell Principia Mathematica Vol 1 (PD, PG #78050; preface+intro only) |
| | Russell Introduction to Mathematical Philosophy 1919 (PD, PG #41654) |
| | Russell Principles of Mathematics 1903 (PD content + CC-BY-SA-4.0 typesetting, Klement HTML) |
| | Software Foundations Vol 1 Logical Foundations (MIT, Pierce et al.) |
| | ingest the four 2026-05-09 base-knowledge additions (pillars I/II/III/IX) |
| | stats of the textbook shard |
| | list manifest entries with license + URL counts |
| | ingest every manifest entry that declares a tex_url (Hilbert + Boole) |
| | emit one textbook URL per line to stdout |
| | sample-Merkle-verify the textbook shard |
Permacomputer Preamble — License: AGPL-3.0-only
This is free software for the public good of a permacomputer hosted at permacomputer.com, an always-on computer by the people, for the people. Durable, easy to repair, & distributed like tap water for machine learning intelligence.
Our permacomputer is community-owned infrastructure optimized around four values:
TRUTH — First principles, math & science, open source code freely distributed.
FREEDOM — Voluntary partnerships, freedom from tyranny & corporate control.
HARMONY — Minimal waste, self-renewing systems with diverse thriving connections.
LOVE — Be yourself without hurting others, cooperation through natural law.
NO WARRANTY. Software is provided “AS IS” without warranty of any kind. Full text: License.