Three experiments · Three verification pictures · One receipt system

Three experiments. Three questions. One receipt system.

The EDGAR experiment asked whether AI gets financial data right. The METR experiment asked whether AI claims satisfy the governance rules that apply to them. The DailyMed experiment asked whether AI knows the safety rules for FDA-approved drugs: not just the numbers, but the warnings that protect patients. Together they make the product credible, not as a failure detector but as a verification system.

No AI grading AI. The interpreter does the math.

SEC EDGAR × 50 Companies

The failure picture

Liminate v0.7.0 · cite / measure / check
500 claims checked · 0.7% cite pass · 7 failure categories

The model wasn't hallucinating. It was misbinding.

Read the EDGAR case study →

METR Frontier Risk Report × 15 Contracts

The compliance picture

Liminate v0.10.0 · require / forbid / cite / measure
15 contracts verified · 100% all three layers · 7 vocabulary constructs

The monitor can be jailbroken. The receipt can't.

Read the METR case study →

DailyMed FDA Drug Labels × 15 Drugs

The safety picture

Liminate v0.10.0 · require / forbid / cite / measure
45 contracts verified · 52.7% overall pass · 6.5% deontic pass

The model knows the numbers. It doesn't know the rules.

Read the DailyMed case study →

EDGAR is the failure picture. METR is the compliance picture. DailyMed is the safety picture. A system that only catches failure is a failure detector. A system that also confirms compliance — and exposes where the safety rules go unmet — is a verification system. The receipt tells you which one you're dealing with. Every receipt is the same thing: the interpreter checking the AI's work by hand, against the source, with no shortcuts. The receipt protects the person doing the work, not the system that produced it.

The receipt is the proof point. Run your own.