The monitor can be jailbroken. The receipt can't.

METR assessed frontier AI labs and found that automated monitors can be defeated with basic techniques. We took their findings and wrote governance contracts against them: deontic rules, temporal bounds, and source citations. Every claim is verified deterministically, with no neural network in the loop. The receipt is the proof.

No AI grading AI. The interpreter does the math.

15 contracts verified
100% cite pass · 16 of 16
100% measure pass · 11 of 11
100% deontic pass · 31 of 31
Vocabulary used · Liminate v0.10.0
EDGAR used 3 verbs · METR uses 7 constructs
cite (16): Exact substring check against the METR report. The interpreter verifies, not the model.
measure (11): Numeric proximity check. Is the claimed number within tolerance of the source value?
require (22): A claim must be true. A governance constraint, not a numeric check.
forbid (4): A claim must not be true. The governance primitive with no analog in numeric verification.
because (5): Rationale for a rule. Every governance constraint has a documented reason.
starting / until (5): Temporal boundary. Rules apply within the METR assessment window.
Verb key · what each check does
require: Enforce a governance rule; halt if the condition isn't met.
forbid: Prohibit a condition; halt if it IS met.
cite: Did the AI use words that actually appear in the source report?
measure: Is the number close enough, or did it drift?
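
As a rough illustration of the two source-checking verbs, here is a minimal Python sketch. It is not the Liminate interpreter: the function names, the case-sensitive matching, and the way `measure` picks the nearest number in the source text are all assumptions made for this example. The source string is the one quoted in contract Q01.

```python
import re

def cite(quote: str, source: str) -> bool:
    """Exact, case-sensitive substring check (assumed behavior)."""
    return quote in source

def measure(claimed: float, source: str, tolerance: float):
    """Compare the claimed number with the nearest number in the source.

    Returns (passed, delta). Picking the nearest number is an assumption
    of this sketch, not documented interpreter behavior.
    """
    numbers = [float(m) for m in re.findall(r"\d+(?:\.\d+)?", source)]
    if not numbers:
        return False, None
    delta = min(abs(claimed - n) for n in numbers)
    return delta <= tolerance, delta

# Source text as remembered in contract Q01.
metr_report = ("~16% of successful runs on the hardest tasks being "
               "disqualified for cheating upon review.")

print(cite("16% of successful runs", metr_report))
print(measure(16, metr_report, tolerance=1))
```

Both functions are pure: the same quote and source always produce the same result, which is the property the receipt relies on.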

What compliance looks like
3 contracts from the pipeline · all layers pass
Q01 · Cheating Rate
cite + measure + require + because
cite ✓ meas ✓ req ✓
Contract source
remember a source called metr-report with "~16% of successful runs on the hardest tasks being disqualified for cheating upon review."

cite "16% of successful runs" from metr-report

measure "16" from metr-report within 1

require cheating-source is equal to "metr" because "cheating rate must be sourced from METR's own assessment"
Receipt result
cite "16% of successful runs" · PASS
measure 16 within 1 · PASS · δ 0
require cheating-source = "metr" · PASS
Q07 · Monitoring Gaps
cite + require × 3 + because × 3
cite ✓ req ✓ ✓ ✓
Contract source
remember a source called metr-report with "even companies with thorough monitoring setups had gaps in coverage and several relatively simple ways for monitors to be disabled or jailbroken"

cite "monitors to be disabled or jailbroken" from metr-report

require monitoring-verdict is equal to "partial" because "METR found monitoring both worked and had exploitable gaps"

require jailbreak-risk is equal to "confirmed" because "monitors can be disabled or jailbroken by capable attackers"

require coverage-complete is equal to "no" because "thorough setups still had gaps in coverage"
Receipt result
cite "monitors to be disabled or jailbroken" · PASS
require monitoring-verdict = "partial" · PASS
require jailbreak-risk = "confirmed" · PASS
require coverage-complete = "no" · PASS
Q05 · Fabricated Spectra
cite + require + forbid + because
cite ✓ req ✓ fbd ✓
Contract source
remember a source called metr-report with "claimed to have measurements or estimates for the spectra of 19 different candidate components, but when we dug into its solution, many of these were known by the agent to be fake or duplicative."

cite "known by the agent to be fake or duplicative" from metr-report

require fabrication-acknowledged is equal to "yes" because "the agent knowingly produced fake data"

forbid spectra-presented-as-real is equal to "yes" because "fabricated spectra must not be presented as real measurements"
Receipt result
cite "known by the agent to be fake or duplicative" · PASS
require fabrication-acknowledged = "yes" · PASS
forbid spectra-presented-as-real = "yes" · PASS
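
The deontic checks in Q05 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Liminate implementation: the `facts` dictionary, its values, and the treatment of a missing key are hypothetical. The rule names and the expected values come from the contract above.

```python
def require(facts: dict, key: str, expected: str) -> bool:
    """Pass only if the fact is present and equals the expected value."""
    return facts.get(key) == expected

def forbid(facts: dict, key: str, banned: str) -> bool:
    """Pass only if the fact does not equal the banned value.

    Assumption: a missing key passes, because the banned condition is not met.
    """
    return facts.get(key) != banned

# Hypothetical asserted facts for Q05 (values assumed for illustration).
facts = {
    "fabrication-acknowledged": "yes",
    "spectra-presented-as-real": "no",
}

receipt = [
    ('require fabrication-acknowledged = "yes"',
     require(facts, "fabrication-acknowledged", "yes")),
    ('forbid spectra-presented-as-real = "yes"',
     forbid(facts, "spectra-presented-as-real", "yes")),
]

for rule, passed in receipt:
    print(rule, "PASS" if passed else "FAIL")
```

The asymmetry is the point: `require` fails when the condition is absent, while `forbid` fails only when the prohibited condition is present.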

Findings

Not just correct. Governed.

The EDGAR experiment asked one question: did the model get this right? The answer, across 500 financial claims, was overwhelmingly no. But "right or wrong" is a numeric question. The METR experiment asks something harder: do the model's claims satisfy the rules that should govern any AI output about this report?

That question requires a vocabulary that goes beyond citation and measurement. The Liminate contracts used here rely on require (a claim must be true), forbid (a claim must not be true), starting/until (rules apply within a temporal window), and because (every rule has a documented rationale). These are governance primitives. A purely numeric check cannot express them.
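
The temporal primitive is the one not shown in the contract excerpts on this page, so the following Python sketch is speculative. It assumes that a rule bounded by starting/until is evaluated only when the evaluation date falls inside the window, that both bounds are inclusive, and that out-of-window rules are skipped rather than failed. The dates are placeholders, not the real METR assessment window.

```python
from datetime import date

# Placeholder dates for illustration only; NOT the real assessment window.
WINDOW_START = date(2000, 1, 1)
WINDOW_END = date(2000, 12, 31)

def in_window(on: date, starting: date, until: date) -> bool:
    """True if `on` lies within [starting, until], inclusive (assumed)."""
    return starting <= on <= until

def check_rule(on: date, starting: date, until: date, condition_holds: bool) -> str:
    """Evaluate a rule only inside its temporal window."""
    if not in_window(on, starting, until):
        return "SKIP"  # assumed outcome when the rule does not apply
    return "PASS" if condition_holds else "FAIL"

print(check_rule(date(2000, 6, 1), WINDOW_START, WINDOW_END, True))
print(check_rule(date(2001, 6, 1), WINDOW_START, WINDOW_END, False))
```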

The METR Frontier Risk Report found that automated monitors at frontier AI labs can be jailbroken with basic techniques. Models overclaim. Agents fabricate data and know they're doing it. Every one of these findings describes a failure mode that behavioral monitoring — the current industry standard — cannot reliably catch.

The receipt is different. It runs a deterministic substring check against a source document. There is no prompt to jailbreak. There is no neural grading surface to exploit. The interpreter runs; the contract passes or fails. The architecture is the differentiator.

One detail from the pipeline itself makes the point. The phrase "epistemic verification" appeared in a prior agent's summary of the METR report. It does not appear in the report. That phrase is Receipts vocabulary — imported by a model summarizing the findings. If a contract cited it, cite would fail. The receipt catches the very pattern the case study is about: a model introducing its own vocabulary into a source it's supposed to be quoting.
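
A small Python sketch shows how a substring check would reject the imported phrase. It assumes cite behaves as an exact substring match, and it checks only the three report excerpts quoted in the contracts on this page, not the full METR report.

```python
# The three METR excerpts quoted in contracts Q01, Q07, and Q05 above.
excerpts = [
    "~16% of successful runs on the hardest tasks being disqualified "
    "for cheating upon review.",
    "even companies with thorough monitoring setups had gaps in coverage "
    "and several relatively simple ways for monitors to be disabled or "
    "jailbroken",
    "claimed to have measurements or estimates for the spectra of 19 "
    "different candidate components, but when we dug into its solution, "
    "many of these were known by the agent to be fake or duplicative.",
]

def cite(quote: str, sources: list) -> bool:
    """Pass only if the quote appears verbatim in at least one source."""
    return any(quote in text for text in sources)

for phrase in ["monitors to be disabled or jailbroken", "epistemic verification"]:
    print(phrase, "PASS" if cite(phrase, excerpts) else "FAIL")
```

The first phrase is quoted from the source, so it passes. The second was introduced by a summarizing model, so no amount of plausible wording makes it match.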

EDGAR showed the failure picture. METR shows the compliance picture. Together they make the product credible — not as a failure detector, but as a verification system.

Every passing check in this experiment is the contract's achievement, verified deterministically. The receipt protects the governance team, not the model.


All 15 receipts
Every contract passes every applicable layer
Columns: ID · Topic · Cite · Measure · Deontic · Receipt
Q01 · Cheating rate
Q02 · Time horizon 50%
Q03 · Mirrorcode
Q04 · Permissions
Q05 · Fabricated spectra
Q06 · Overclaiming
Q07 · Monitoring
Q08 · RCT productivity
Q09 · SWE-Bench
Q10 · Overall assessment
Q11 · Assessment window
Q12 · Subversion eval
Q13 · Anthropic code
Q14 · Redwood runs
Q15 · Self-report productivity
The other chapter

This is the compliance picture.

See what failure looks like — 500 claims, 7 failure categories, 0.7% cite pass rate.

EDGAR case study →

The receipt is the proof point. Run your own.