Quickstart
==========

Two end-to-end paths. Pick whichever corpus you want first; both share
the same query, verify, falsify, and inspect surfaces.

Install
-------

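Arborist needs Python 3.10+, GNU make, ``curl``, and ``bzip2``. It also
needs SQLite 3.35+ with FTS5, which CPython's ``sqlite3`` module usually
provides. The exact version depends on how your Python was built (on
Linux it is typically the system SQLite library).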
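
A preflight check along these lines can catch a missing prerequisite
before ``make bootstrap``. It is a sketch, not something Arborist ships.
``version_ge`` is a hypothetical helper, ``sort -V`` assumes GNU or a
recent BSD ``sort``, and the FTS5 probe is included because query
search uses FTS5.

.. code-block:: sh

   #!/bin/sh
   # Preflight sketch (hypothetical, not shipped with Arborist).

   # version_ge A B: succeed when version A >= version B (relies on `sort -V`).
   version_ge() {
       [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
   }

   for tool in python3 make curl bzip2; do
       command -v "$tool" >/dev/null 2>&1 || echo "missing: $tool"
   done

   # GNU make names itself in --version output; BSD make rejects the flag.
   make --version 2>/dev/null | grep -q 'GNU Make' || echo "make is not GNU make"

   py=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])' 2>/dev/null)
   version_ge "$py" 3.10 || echo "need Python 3.10+, found: ${py:-none}"

   # sqlite3.sqlite_version reports the SQLite library linked into this Python.
   sq=$(python3 -c 'import sqlite3; print(sqlite3.sqlite_version)' 2>/dev/null)
   version_ge "$sq" 3.35 || echo "need SQLite 3.35+, found: ${sq:-none}"

   # Creating an FTS5 virtual table fails if FTS5 was not compiled in.
   python3 -c 'import sqlite3; sqlite3.connect(":memory:").execute("CREATE VIRTUAL TABLE t USING fts5(x)")' \
       >/dev/null 2>&1 || echo "SQLite lacks FTS5"

No output means every check passed.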

.. code-block:: sh

   git clone https://git.unturf.com/engineering/unturf/arborist.git
   cd arborist
   make bootstrap                          # one-time: venv + dev extras

``make bootstrap`` creates ``.venv/``, installs the package in editable
mode with ``[dev,html]`` extras, and exposes ``arborist`` at
``.venv/bin/arborist``. No system-wide install. Re-running is a no-op
if the venv is up to date.

After bootstrap, every workflow lives behind a ``make`` target. Run
``make help`` (or see the auto-generated :doc:`api/makefile` reference)
to list them.

Path A — Wikipedia 2003 (canonical bootstrap dataset)
------------------------------------------------------

.. code-block:: sh

   make fetch-cur                          # download 2003-05-16 snapshot (~82 MB)
   make ingest-cur-attached                # ~3 min: 128k articles, 4 parallel shards
   make distill-shards-parallel            # surface → core (first-sentence)
   make distill-shards-tfidf-parallel      # core → keyword sets for retrieval
   make query Q="What is anarcho-capitalism?"

Each query retrieves 4–8 source articles from the local corpus (each
identified by its Merkle root), passes them to Hermes-3, and returns an
answer plus a verifier label naming what the lexical verifier could
confirm. In claim-lattice modes that label is one rung of a four-rung
ladder, from most to least grounded: ``POINTER-LINKED`` →
``ANCHOR-WARRANTED`` → ``EVIDENCE-WARRANTED`` → ``UNGROUNDED``.
Answers in the ``HYBRID`` audit mode carry a ``-PARTIAL`` suffix. Ask
the same question again and the answer replays from cache in ~100 ms.
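
Scripts that consume these labels may want to compare rungs, for
example to keep only answers at or above ``EVIDENCE-WARRANTED``. A
hypothetical sketch follows. It assumes labels are exactly the four rung
names above with an optional ``-PARTIAL`` suffix, treats the first rung
as the strongest, and simply ignores ``-PARTIAL`` when ranking.
``rung_rank`` and ``at_least`` are not Arborist commands.

.. code-block:: sh

   # rung_rank LABEL: print 1 (strongest) .. 4 (weakest) for a verifier label.
   # Hypothetical helper; assumes the four rung names from the ladder above.
   rung_rank() {
       base=${1%-PARTIAL}              # drop the HYBRID -PARTIAL suffix, if any
       case $base in
           POINTER-LINKED)     echo 1 ;;
           ANCHOR-WARRANTED)   echo 2 ;;
           EVIDENCE-WARRANTED) echo 3 ;;
           UNGROUNDED)         echo 4 ;;
           *)                  return 1 ;;   # unknown label
       esac
   }

   # at_least LABEL MIN_LABEL: succeed when LABEL is as strong as MIN_LABEL or stronger.
   at_least() {
       have=$(rung_rank "$1") || return 1
       want=$(rung_rank "$2") || return 1
       [ "$have" -le "$want" ]
   }

For example, ``at_least ANCHOR-WARRANTED-PARTIAL EVIDENCE-WARRANTED``
succeeds, while ``at_least UNGROUNDED EVIDENCE-WARRANTED`` fails.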

Path B — Crawl any live website and query it
---------------------------------------------

.. code-block:: sh

   make bootstrap-crawler                                              # one-time: install [crawler] extras
   make crawl-ingest URL=https://russell.ballestrini.net DEPTH=2       # BFS + ingest
   make query Q="who is Russell Ballestrini?"                          # cross-shard; picks up new shard automatically

The crawl writes a shard named after the seed hostname, here
``~/.arborist/shards/crawl_russell_ballestrini_net.db``. ``FAST=1``
enables aggressive crawling; reserve it for sites you own. ``MAX=N`` caps
how many pages are discovered, and ``DEPTH=N`` bounds the BFS depth.
``robots.txt`` ``Disallow`` rules are always honored. After ingest,
``make recrawl-check DOMAIN=...`` sends a conditional ``HEAD`` request
for each page to check whether it has changed.
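
Both behaviors can be approximated by hand. ``shard_path`` guesses the
shard file for a seed URL, assuming Arborist lowercases the hostname and
turns dots into underscores (the example above only confirms the dot
rule). ``probe_fresh`` shows one way to run a conditional ``HEAD`` probe
with ``curl``, assuming the server honors ``If-Modified-Since``;
Arborist's own check may also use ETags. Both functions are
hypothetical helpers, not part of Arborist.

.. code-block:: sh

   # shard_path URL: guess the shard file a crawl seeded at URL would write.
   shard_path() {
       host=${1#*://}           # drop scheme
       host=${host%%/*}         # drop path
       host=${host%%[?#]*}      # drop query or fragment with no path before it
       host=${host##*@}         # drop user:pass@, if present
       host=${host%%:*}         # drop :port
       host=$(printf '%s' "$host" | tr 'A-Z.' 'a-z_')   # lowercase; dots -> underscores
       printf '%s/.arborist/shards/crawl_%s.db\n' "$HOME" "$host"
   }

   # freshness STATUS: interpret the status code of a conditional request.
   freshness() {
       case $1 in
           304) echo unchanged ;;
           200) echo changed ;;     # full response: page changed, or server ignores the condition
           *)   echo "unknown ($1)" ;;
       esac
   }

   # probe_fresh URL HTTP_DATE: conditional HEAD; 304 means unchanged since HTTP_DATE.
   probe_fresh() {
       code=$(curl -s -o /dev/null -I --max-time 10 -w '%{http_code}' \
           -H "If-Modified-Since: $2" "$1") || code=000
       freshness "$code"
   }

For example, ``shard_path https://russell.ballestrini.net`` prints the
shard path shown above, and
``probe_fresh https://russell.ballestrini.net/ "Wed, 01 May 2024 00:00:00 GMT"``
prints ``unchanged`` on a 304 or ``changed`` on a 200.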

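After the answer
----------------

Each answer is stored in its shard under a ``cache_key``. Pass that key
as ``KEY`` to follow up on it:

.. code-block:: sh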

   make inspect KEY=<cache_key>             # sidecar: classify each unverified span
   make falsify KEY=<cache_key> REASON='…'  # mark wrong, keep history
   make burn    KEY=<cache_key> REASON='…'  # delete (kindergarten only — refuses if children exist)

The query path
--------------

.. figure:: diagrams/query-pipeline.svg
   :alt: Query pipeline
   :width: 100%

Every query runs through the same pipeline:

1. **Search** — FTS5 (body) + SQL ``LIKE`` (title) + ``JOIN`` over
   derivations (TF-IDF core keywords) across every shard.
2. **Concept overlay** — per-shard ``concept_relations`` SQLite table
   widens retrieval via synonyms, narrows via rivalries unless the
   query uses comparative phrasing.
3. **Context assembly** — top-K sources concatenated up to a 60 KB
   budget. Wikitext stripped to plain prose.
4. **LLM** — Hermes-3 with strict attribution rules.
5. **Verifier** — every claim runs through a layered lexical check;
   the results roll up into the v9.8 trichotomy
   (``audit_mode`` ∈ ``STRICT`` / ``HYBRID`` / ``UNGROUNDED``) at the
   schema layer, and into the four-rung display ladder at render time.
6. **Cache** — the v9.8 8-dim ``cache_key`` keys the answer in the
   shard. Cache hits replay in ~100 ms.
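
Step 3 is essentially greedy packing under a byte budget. The sketch
below shows that idea on plain-text files already ranked best-first and
already stripped of wikitext. It is not Arborist's implementation: the
one-file-per-source layout and ``assemble_context`` are hypothetical,
60 KB is read as 61440 bytes, and the loop stops at the first source
that would overflow rather than truncating it.

.. code-block:: sh

   # assemble_context BUDGET FILE...: concatenate ranked sources while they fit.
   # Total output (sources plus one separator newline each) never exceeds BUDGET bytes.
   assemble_context() {
       budget=$1; shift
       used=0
       for src in "$@"; do
           size=$(wc -c < "$src")
           size=$((size + 1))                    # +1 for the separator newline
           if [ $((used + size)) -gt "$budget" ]; then
               break                             # next source would overflow
           fi
           cat "$src"
           echo
           used=$((used + size))
       done
   }

Usage: ``assemble_context 61440 top1.txt top2.txt top3.txt > context.txt``.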

The LLM endpoint defaults to ``https://hermes.ai.unturf.com/v1``
(Hermes-3 Llama-3.1-8B, 82K-token context, no auth). To use another
endpoint or model, override these variables:

.. code-block:: sh

   export ARBORIST_LLM_ENDPOINT="https://your-vllm.example/v1"
   export ARBORIST_LLM_MODEL="meta-llama/Llama-3.1-70B-Instruct"
   export ARBORIST_LLM_API_KEY="..."
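
Before querying through an override, it can help to confirm the
endpoint answers. This sketch assumes the endpoint is OpenAI-compatible
(as vLLM's server is) and therefore serves ``GET /models`` under the
``/v1`` base. ``llm_models_url`` and ``check_llm`` are hypothetical
helpers, and the ``Authorization`` header is sent only when
``ARBORIST_LLM_API_KEY`` is set.

.. code-block:: sh

   # llm_models_url: the /models URL under the configured base (default from above).
   llm_models_url() {
       base=${ARBORIST_LLM_ENDPOINT:-https://hermes.ai.unturf.com/v1}
       printf '%s/models\n' "${base%/}"    # tolerate a trailing slash on the base
   }

   # check_llm: list served models; add a bearer token only when a key is configured.
   check_llm() {
       if [ -n "${ARBORIST_LLM_API_KEY:-}" ]; then
           curl -fsS -H "Authorization: Bearer $ARBORIST_LLM_API_KEY" "$(llm_models_url)"
       else
           curl -fsS "$(llm_models_url)"
       fi
   }

If the server is OpenAI-compatible, ``check_llm`` prints a JSON list of
served models, which should include ``ARBORIST_LLM_MODEL``.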

Where to next
-------------

* :doc:`api/makefile` — every ``make`` target with a one-line description
* :doc:`api/cli` — direct ``arborist`` CLI reference
* :doc:`api/qa` — Q&A pipeline internals (verifier, evidence, DAG)
* :doc:`api/substrate` — Merkle tree + document primitives
* :doc:`license` — full AGPL + Permacomputer Preamble
