v8 ForkScore
============

The Merkle-AGI v8 selection protocol's **scoring half**. Pure
function over a (parent, child) bench-result pair → a single
scalar with ACCEPT / MARGINAL / REJECT verdict. Phase 1a of
ticket ``#000012``.

The complementary canonicalization half (validator state machine,
acceptance protocol, slashing, fork-choice rule) is reserved for
the v8 paper itself; Phase 1a ships only the function ForkScore
without the consensus surface around it.

Formula
-------

.. code-block:: text

   ForkScore =
       α · Δ5S
     + β · Δ5T
     + γ · Δ5F                            (incl. efficiency-aware bonus)
     + δ · SelfModelCalibrationGain
     + ε · AuditCompleteness
     + ζ · ValidatorDiversity
     - η · RegressionPenalty
     - θ · CapitalCostPenalty
     - ι · SecurityRiskPenalty            (reserved; Phase 1a = 0)
     - κ · ComplexityPenalty              (reserved; Phase 1a = 0)
     - λ · MemoryInvalidationPenalty

Each term consumes the metrics every 5S/5T/5F sub-battery emits in
its :class:`BatteryResult.metrics` dict. The Δ-rate per battery is
the mean of per-sub-battery rate deltas (child — parent).

Δ5F additionally consumes the inf-aware efficiency aggregations
landed under the 2026-05-08 ``fbd99a8`` review:
``adaptation_efficiency_mean_finite``,
``adaptation_efficiency_infinite_count``,
``adaptation_efficiency_neg_infinite_count``,
``feedback_efficiency_mean_finite``,
``feedback_efficiency_infinite_count``.

Verdict thresholds
------------------

============================  ==================  ============
Score / flags                 Verdict             CLI exit
============================  ==================  ============
``score >= SIGNAL_FLOOR``     **ACCEPT**          ``0``
``[0, SIGNAL_FLOOR)``         **MARGINAL**        ``0``
``score < 0``                 **REJECT**          ``1``
hard-regression flag          **REJECT**          ``1``
``NEG_INF_REGRESSION`` flag   **REJECT**          ``1``
============================  ==================  ============

``SIGNAL_FLOOR`` defaults to ``0.05`` (5pp; matches
:file:`docs/bench-maxing.md`'s noise floor).

Hard-regression flag fires if any single sub-battery rate drops by
≥ ``HARD_REGRESSION_FLOOR`` (default 5pp), regardless of net score.
``NEG_INF_REGRESSION`` flag fires when
``*_efficiency_neg_infinite_count`` increases parent → child:
*free regression* is unsafe regardless of other gains.

Default weights
---------------

.. code-block:: python

   WeightSet(
       alpha=1.0,    # Δ5S
       beta=1.0,     # Δ5T
       gamma=1.0,    # Δ5F
       delta=0.5,    # SelfModelCalibrationGain
       epsilon=0.3,  # AuditCompleteness
       zeta=0.0,     # ValidatorDiversity (off in single-validator)
       eta=2.0,      # RegressionPenalty (heavy by design)
       theta=0.5,    # CapitalCostPenalty
       iota=0.0,     # SecurityRiskPenalty (reserved)
       kappa=0.0,    # ComplexityPenalty (reserved)
       lambda_=0.5,  # MemoryInvalidationPenalty
   )

Override via JSON file passed to ``--weights``:

.. code-block:: json

   {
     "alpha": 1.5,
     "eta": 5.0,
     "lambda": 1.0
   }

The JSON key ``"lambda"`` round-trips into ``WeightSet.lambda_``
because ``lambda`` is a Python reserved word.

CLI
---

.. code-block:: bash

   arborist substrate score \
     --parent  parent-bench.json \
     --child   child-bench.json \
     [--weights weights.json] \
     [--capital-delta N] \
     [--memory-invalidation-count N] \
     [--audit-completeness 0..1] \
     [--selfmodel-calibration-gain N] \
     [--out path/to/fork_score_report.json]

``parent-bench.json`` and ``child-bench.json`` are the JSON output
of ``bench.batteries.runner --all``.

``--out`` mirrors stdout to a file; CI / mesh peers / downstream
graders ingest the artifact without parsing pipe output.

Output: :class:`arborist.substrate.fork_score.ScoredFork` with
``score``, ``verdict``, per-term ``breakdown``, ``flags`` list,
and the active ``weights`` echoed back.

Make harness (Phase 1b)
~~~~~~~~~~~~~~~~~~~~~~~

.. code-block:: bash

   make bench-fork-baseline   # one-shot: pin current bench-suite output
                              # as the ForkScore parent (writes
                              # bench/results/baseline-suite.json)
   <hack hack hack>
   make bench-fork-score      # rerun bench, score child vs pinned parent,
                              # write bench/results/fork_score_report.json.
                              # Exits 1 on REJECT verdict — CI-gateable.

Override defaults via env-vars: ``FORK_PARENT``, ``FORK_CHILD``,
``FORK_REPORT``.

What's NOT in Phase 1a
----------------------

- Validator state machine (bonding / signing / slashing).
- Acceptance protocol (proposal / quorum / finalization).
- Challenge protocol (audit-replay disagreement).
- Fork-choice rule (which of two competing finalizations wins).
- Mesh wire format extensions for validator gossip.
- Stake mechanics + economic incentives.
- Cross-validator ZK proof exchange.

These are commissioned by the v8 paper itself
(``docs/merkle-agi-v8-consensus.rst``, still open under #000012).
Phase 1a's scoring function is the substrate the paper consumes;
landing it now lets the paper cite measured values instead of
stipulated ones.
