Quickstart

Two end-to-end paths. Pick whichever corpus you want first; both share the same query, verify, falsify, and inspect surfaces.

Install

Arborist needs Python 3.10+, GNU make, curl, and bzip2. SQLite 3.35+ ships with CPython.
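
Before cloning, the prerequisites listed above can be checked with a short script. This is a minimal sketch, not part of Arborist: the `preflight` function and its constants are hypothetical names, and the tool check only confirms that a `make` binary is on PATH, not that it is GNU make.

```python
import shutil
import sqlite3
import sys

MIN_PYTHON = (3, 10)   # from the requirement "Python 3.10+"
MIN_SQLITE = (3, 35)   # from the requirement "SQLite 3.35+"
REQUIRED_TOOLS = ("make", "curl", "bzip2")


def preflight() -> dict[str, bool]:
    """Report which of the documented prerequisites look satisfied."""
    results = {
        "python": sys.version_info[:2] >= MIN_PYTHON,
        # sqlite_version_info is the version of the SQLite library that
        # the running interpreter's sqlite3 module is linked against.
        "sqlite": sqlite3.sqlite_version_info[:2] >= MIN_SQLITE,
    }
    for tool in REQUIRED_TOOLS:
        # shutil.which only proves the binary is on PATH; it does not
        # verify that "make" is GNU make.
        results[tool] = shutil.which(tool) is not None
    return results


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name:8} {'ok' if ok else 'MISSING'}")
```

Any entry reported as MISSING should be resolved before running make bootstrap.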

git clone https://git.unturf.com/engineering/unturf/arborist.git
cd arborist
make bootstrap                          # one-time: venv + dev extras

make bootstrap creates .venv/, installs the package in editable mode with [dev,html] extras, and exposes arborist at .venv/bin/arborist. No system-wide install. Re-running is a no-op if the venv is up to date.

After bootstrap, every workflow lives behind a make target. Run make help (or see the auto-generated Makefile reference) to list them.

Path A: Wikipedia 2003 (canonical bootstrap dataset)

make fetch-cur                          # download 2003-05-16 snapshot (~82 MB)
make ingest-cur-attached                # ~3 min: 128k articles, 4 parallel shards
make distill-shards-parallel            # surface → core (first-sentence)
make distill-shards-tfidf-parallel      # core → keyword sets for retrieval
make query Q="What is anarcho-capitalism?"

A Hermes-3 inference runs against the local corpus, picks 4–8 source articles by Merkle root, and returns an answer plus a verifier label that names what the lexical verifier could confirm. The four-rung ladder for claim-lattice modes is POINTER-LINKED, ANCHOR-WARRANTED, EVIDENCE-WARRANTED, UNGROUNDED (with a -PARTIAL suffix on HYBRID). Repeat the same question and a cache hit replays in ~100 ms.

Path B: Crawl any live website and query it

make bootstrap-crawler                                              # one-time: install [crawler] extras
make crawl-ingest URL=https://russell.ballestrini.net DEPTH=2       # BFS + ingest
make query Q="who is Russell Ballestrini?"                          # cross-shard; picks up new shard automatically

The crawl shard is named after the seed hostname (crawl_russell_ballestrini_net.db) and is stored under ~/.arborist/shards/. FAST=1 enables aggressive crawling for your own sites; MAX=N caps discovery; DEPTH=N bounds the BFS depth. robots.txt Disallow rules are always honored. After ingest, make recrawl-check DOMAIN=... runs a conditional-HEAD freshness probe on each page.
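
To locate the shard file a crawl will produce, the naming convention above can be approximated in a few lines. This is a sketch under assumptions: only the russell.ballestrini.net example is documented, so the rule used here (lowercase the hostname, turn every run of non-alphanumeric characters into one underscore) is a guess for hostnames containing hyphens or ports, and `crawl_shard_path` is a hypothetical helper, not an Arborist API.

```python
import re
from pathlib import Path
from urllib.parse import urlsplit

# Shard directory as documented above.
SHARD_DIR = Path.home() / ".arborist" / "shards"


def crawl_shard_path(seed_url: str) -> Path:
    """Guess the shard path for a crawl seeded at seed_url."""
    host = urlsplit(seed_url).hostname
    if not host:
        raise ValueError(f"no hostname in {seed_url!r}")
    # Assumption: every run of non-alphanumeric characters becomes "_".
    slug = re.sub(r"[^a-z0-9]+", "_", host.lower()).strip("_")
    return SHARD_DIR / f"crawl_{slug}.db"


if __name__ == "__main__":
    print(crawl_shard_path("https://russell.ballestrini.net"))
```

If the printed path does not exist after make crawl-ingest, list ~/.arborist/shards/ to see the name Arborist actually chose.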

After the answer

make inspect KEY=<cache_key>             # sidecar: classify each unverified span
make falsify KEY=<cache_key> REASON='…'  # mark wrong, keep history
make burn    KEY=<cache_key> REASON='…'  # delete (kindergarten only — refuses if children exist)

The query path

[Figure: Query pipeline]

Every query runs through the same pipeline:

  1. Search — FTS5 (body) + SQL LIKE (title) + JOIN over derivations (TF-IDF core keywords) across every shard.

  2. Concept overlay — per-shard concept_relations SQLite table widens retrieval via synonyms, narrows via rivalries unless the query uses comparative phrasing.

  3. Context assembly — top-K sources concatenated up to a 60 KB budget. Wikitext stripped to plain prose.

  4. LLM — Hermes-3 with strict attribution rules.

  5. Verifier — every claim runs through a layered lexical check; result rolls up into the v9.8 trichotomy (audit_mode STRICT / HYBRID / UNGROUNDED) at the schema layer AND a four-rung display ladder at render time.

  6. Cache — the v9.8 8-dim cache_key keys the answer in the shard. Cache hits replay in ~100 ms.
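
Step 3 of the pipeline above (context assembly) can be illustrated with a small budget-bounded concatenation. This is a sketch, not Arborist's implementation: it assumes "60 KB" means 60 × 1024 bytes of UTF-8, that sources arrive already ranked and stripped to plain prose, and that a source which does not fit is dropped whole rather than truncated. The `assemble_context` name and the "## title" separator are hypothetical.

```python
from typing import Iterable

# Assumption: "60 KB" is read as 60 * 1024 bytes of UTF-8 text.
CONTEXT_BUDGET_BYTES = 60 * 1024


def assemble_context(
    sources: Iterable[tuple[str, str]],
    budget: int = CONTEXT_BUDGET_BYTES,
) -> str:
    """Concatenate ranked (title, text) pairs until the byte budget is hit."""
    parts: list[str] = []
    used = 0
    for title, text in sources:
        block = f"## {title}\n{text.strip()}\n\n"
        size = len(block.encode("utf-8"))
        if used + size > budget:
            # Assumption: stop at the first source that does not fit
            # instead of truncating it mid-sentence.
            break
        parts.append(block)
        used += size
    return "".join(parts)


if __name__ == "__main__":
    ranked = [("Anarcho-capitalism", "First article text."),
              ("Libertarianism", "Second article text.")]
    print(assemble_context(ranked))
```

Measuring the budget in encoded bytes rather than characters keeps the limit meaningful for non-ASCII article text.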

The LLM endpoint defaults to https://hermes.ai.unturf.com/v1 (Hermes-3 Llama-3.1-8B, 82K context, no auth). To point Arborist at a different endpoint, set:

export ARBORIST_LLM_ENDPOINT="https://your-vllm.example/v1"
export ARBORIST_LLM_MODEL="meta-llama/Llama-3.1-70B-Instruct"
export ARBORIST_LLM_API_KEY="..."
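
The override behavior can be pictured as a lookup that falls back to the documented default when a variable is unset. This is a sketch of that pattern only: the three variable names and the default endpoint come from this page, while `LLMConfig`, `load_llm_config`, and the choice to treat an unset model or key as None are assumptions, since the default model identifier is not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Default endpoint as documented above.
DEFAULT_ENDPOINT = "https://hermes.ai.unturf.com/v1"


@dataclass(frozen=True)
class LLMConfig:
    endpoint: str
    model: Optional[str]    # None: let the caller apply its own default
    api_key: Optional[str]  # None: the default endpoint needs no auth


def load_llm_config(env: Mapping[str, str] = os.environ) -> LLMConfig:
    """Resolve LLM settings from the environment, with documented defaults."""
    return LLMConfig(
        endpoint=env.get("ARBORIST_LLM_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/"),
        # Empty strings are treated the same as unset variables.
        model=env.get("ARBORIST_LLM_MODEL") or None,
        api_key=env.get("ARBORIST_LLM_API_KEY") or None,
    )


if __name__ == "__main__":
    print(load_llm_config())
```

Passing a plain dict instead of os.environ makes the resolution easy to check without touching the real environment.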


Permacomputer Preamble — License: AGPL-3.0-only

This is free software for the public good of a permacomputer hosted at permacomputer.com, an always-on computer by the people, for the people. Durable, easy to repair, & distributed like tap water for machine learning intelligence.

Our permacomputer is community-owned infrastructure optimized around four values:

  • TRUTH — First principles, math & science, open source code freely distributed.

  • FREEDOM — Voluntary partnerships, freedom from tyranny & corporate control.

  • HARMONY — Minimal waste, self-renewing systems with diverse thriving connections.

  • LOVE — Be yourself without hurting others, cooperation through natural law.

NO WARRANTY. Software is provided “AS IS” without warranty of any kind. Full text: License.