Quickstart#
Two end-to-end paths. Pick whichever corpus you want first; both share the same query, verify, falsify, and inspect surfaces.
Install#
Arborist needs Python 3.10+, GNU make, curl, and bzip2. SQLite
3.35+ ships with CPython.
git clone https://git.unturf.com/engineering/unturf/arborist.git
cd arborist
make bootstrap # one-time: venv + dev extras
make bootstrap creates .venv/, installs the package in editable
mode with [dev,html] extras, and exposes arborist at
.venv/bin/arborist. No system-wide install. Re-running is a no-op
if the venv is up to date.
After bootstrap, every workflow lives behind a make target. Run
make help (or see the auto-generated Makefile reference reference)
to list them.
Path A — Wikipedia 2003 (canonical bootstrap dataset)#
make fetch-cur # download 2003-05-16 snapshot (~82 MB)
make ingest-cur-attached # ~3 min: 128k articles, 4 parallel shards
make distill-shards-parallel # surface → core (first-sentence)
make distill-shards-tfidf-parallel # core → keyword sets for retrieval
make query Q="What is anarcho-capitalism?"
A Hermes-3 inference runs against the local corpus, picks 4–8 source
articles by Merkle root, and returns an answer plus a verifier label
that names what the lexical verifier could confirm. The four-rung
ladder for claim-lattice modes is POINTER-LINKED →
ANCHOR-WARRANTED → EVIDENCE-WARRANTED → UNGROUNDED
(with -PARTIAL suffix on HYBRID). Repeat the same question and a
cache hit replays in ~100 ms.
Path B — Crawl any live website and query it#
make bootstrap-crawler # one-time: install [crawler] extras
make crawl-ingest URL=https://russell.ballestrini.net DEPTH=2 # BFS + ingest
make query Q="who is Russell Ballestrini?" # cross-shard; picks up new shard automatically
The crawl shard is named after the seed hostname
(crawl_russell_ballestrini_net.db) under ~/.arborist/shards/.
FAST=1 enables aggressive crawling for your own sites; MAX=N
caps discovery; DEPTH=N bounds BFS. Robots Disallow is always
honored. After ingest, make recrawl-check DOMAIN=... does a
conditional-HEAD freshness probe per page.
After the answer#
make inspect KEY=<cache_key> # sidecar: classify each unverified span
make falsify KEY=<cache_key> REASON='…' # mark wrong, keep history
make burn KEY=<cache_key> REASON='…' # delete (kindergarten only — refuses if children exist)
The query path#
Every query runs through the same pipeline:
Search — FTS5 (body) + SQL
LIKE(title) +JOINover derivations (TF-IDF core keywords) across every shard.Concept overlay — per-shard
concept_relationsSQLite table widens retrieval via synonyms, narrows via rivalries unless the query uses comparative phrasing.Context assembly — top-K sources concatenated up to a 60 KB budget. Wikitext stripped to plain prose.
LLM — Hermes-3 with strict attribution rules.
Verifier — every claim runs through a layered lexical check; result rolls up into the v9.8 trichotomy (
audit_mode ∈ STRICT / HYBRID / UNGROUNDED) at the schema layer AND a four-rung display ladder at render time.Cache — the v9.8 8-dim
cache_keykeys the answer in the shard. Cache hits replay in ~100 ms.
LLM endpoint defaults to https://hermes.ai.unturf.com/v1 (Hermes-3
Llama-3.1-8B, 82K context, no auth). Override:
export ARBORIST_LLM_ENDPOINT="https://your-vllm.example/v1"
export ARBORIST_LLM_MODEL="meta-llama/Llama-3.1-70B-Instruct"
export ARBORIST_LLM_API_KEY="..."
Where to next#
Makefile reference — every
maketarget with a one-line descriptionCLI: Command-line interface — direct
arboristCLI referenceQ&A Pipeline: question → answer → verify → cache — Q&A pipeline internals (verifier, evidence, DAG)
Substrate: Core data structures — Merkle tree + document primitives
License — full AGPL + Permacomputer Preamble
Permacomputer Preamble — License: AGPL-3.0-only
This is free software for the public good of a permacomputer hosted at permacomputer.com, an always-on computer by the people, for the people. Durable, easy to repair, & distributed like tap water for machine learning intelligence.
Our permacomputer is community-owned infrastructure optimized around four values:
TRUTH — First principles, math & science, open source code freely distributed.
FREEDOM — Voluntary partnerships, freedom from tyranny & corporate control.
HARMONY — Minimal waste, self-renewing systems with diverse thriving connections.
LOVE — Be yourself without hurting others, cooperation through natural law.
NO WARRANTY. Software is provided “AS IS” without warranty of any kind. Full text: License.