v8 ForkScore#

The Merkle-AGI v8 selection protocol’s scoring half. Pure function over a (parent, child) bench-result pair → a single scalar with ACCEPT / MARGINAL / REJECT verdict. Phase 1a of ticket #000012.

The complementary canonicalization half (validator state machine, acceptance protocol, slashing, fork-choice rule) is reserved for the v8 paper itself; Phase 1a ships only the function ForkScore without the consensus surface around it.

Formula#

ForkScore =
    α · Δ5S
  + β · Δ5T
  + γ · Δ5F                            (incl. efficiency-aware bonus)
  + δ · SelfModelCalibrationGain
  + ε · AuditCompleteness
  + ζ · ValidatorDiversity
  - η · RegressionPenalty
  - θ · CapitalCostPenalty
  - ι · SecurityRiskPenalty            (reserved; Phase 1a = 0)
  - κ · ComplexityPenalty              (reserved; Phase 1a = 0)
  - λ · MemoryInvalidationPenalty

Each term consumes the metrics every 5S/5T/5F sub-battery emits in its BatteryResult.metrics dict. The Δ-rate per battery is the mean of per-sub-battery rate deltas (child — parent).

Δ5F additionally consumes the inf-aware efficiency aggregations landed under the 2026-05-08 fbd99a8 review: adaptation_efficiency_mean_finite, adaptation_efficiency_infinite_count, adaptation_efficiency_neg_infinite_count, feedback_efficiency_mean_finite, feedback_efficiency_infinite_count.

Verdict thresholds#

Score / flags	Verdict	CLI exit
`score >= SIGNAL_FLOOR`	ACCEPT	`0`
`[0, SIGNAL_FLOOR)`	MARGINAL	`0`
`score < 0`	REJECT	`1`
hard-regression flag	REJECT	`1`
`NEG_INF_REGRESSION` flag	REJECT	`1`

SIGNAL_FLOOR defaults to 0.05 (5pp; matches docs/bench-maxing.md’s noise floor).

Hard-regression flag fires if any single sub-battery rate drops by ≥ HARD_REGRESSION_FLOOR (default 5pp), regardless of net score. NEG_INF_REGRESSION flag fires when *_efficiency_neg_infinite_count increases parent → child: free regression is unsafe regardless of other gains.

Default weights#

WeightSet(
    alpha=1.0,    # Δ5S
    beta=1.0,     # Δ5T
    gamma=1.0,    # Δ5F
    delta=0.5,    # SelfModelCalibrationGain
    epsilon=0.3,  # AuditCompleteness
    zeta=0.0,     # ValidatorDiversity (off in single-validator)
    eta=2.0,      # RegressionPenalty (heavy by design)
    theta=0.5,    # CapitalCostPenalty
    iota=0.0,     # SecurityRiskPenalty (reserved)
    kappa=0.0,    # ComplexityPenalty (reserved)
    lambda_=0.5,  # MemoryInvalidationPenalty
)

Override via JSON file passed to --weights:

{
  "alpha": 1.5,
  "eta": 5.0,
  "lambda": 1.0
}

The JSON key "lambda" round-trips into WeightSet.lambda_ because lambda is a Python reserved word.

CLI#

arborist substrate score \
  --parent  parent-bench.json \
  --child   child-bench.json \
  [--weights weights.json] \
  [--capital-delta N] \
  [--memory-invalidation-count N] \
  [--audit-completeness 0..1] \
  [--selfmodel-calibration-gain N] \
  [--out path/to/fork_score_report.json]

parent-bench.json and child-bench.json are the JSON output of bench.batteries.runner --all.

--out mirrors stdout to a file; CI / mesh peers / downstream graders ingest the artifact without parsing pipe output.

Output: arborist.substrate.fork_score.ScoredFork with score, verdict, per-term breakdown, flags list, and the active weights echoed back.

Make harness (Phase 1b)#

make bench-fork-baseline   # one-shot: pin current bench-suite output
                           # as the ForkScore parent (writes
                           # bench/results/baseline-suite.json)
<hack hack hack>
make bench-fork-score      # rerun bench, score child vs pinned parent,
                           # write bench/results/fork_score_report.json.
                           # Exits 1 on REJECT verdict — CI-gateable.

Override defaults via env-vars: FORK_PARENT, FORK_CHILD, FORK_REPORT.

What’s NOT in Phase 1a#

Validator state machine (bonding / signing / slashing).
Acceptance protocol (proposal / quorum / finalization).
Challenge protocol (audit-replay disagreement).
Fork-choice rule (which of two competing finalizations wins).
Mesh wire format extensions for validator gossip.
Stake mechanics + economic incentives.
Cross-validator ZK proof exchange.

These are commissioned by the v8 paper itself (docs/merkle-agi-v8-consensus.rst, still open under #000012). Phase 1a’s scoring function is the substrate the paper consumes; landing it now lets the paper cite measured values instead of stipulated ones.

Permacomputer Preamble — License: AGPL-3.0-only

This is free software for the public good of a permacomputer hosted at permacomputer.com, an always-on computer by the people, for the people. Durable, easy to repair, & distributed like tap water for machine learning intelligence.

Our permacomputer is community-owned infrastructure optimized around four values:

TRUTH — First principles, math & science, open source code freely distributed.
FREEDOM — Voluntary partnerships, freedom from tyranny & corporate control.
HARMONY — Minimal waste, self-renewing systems with diverse thriving connections.
LOVE — Be yourself without hurting others, cooperation through natural law.

NO WARRANTY. Software is provided “AS IS” without warranty of any kind. Full text: License.

v8 ForkScore

Contents