SEC EDGAR × 50 Companies × 500 Claims × 3 Verification Layers

The model wasn't hallucinating. It was misbinding.

We asked an AI to summarize the financials of 50 S&P 500 companies from their most recent 10-K filings. Then we checked every number against the SEC's own XBRL data with a deterministic interpreter. No neural network in the loop. 500 claims. Seven failure categories. The receipt is the proof.

No AI grading AI. The interpreter does the math.

500 claims checked across 50 companies

0.7% passed exact text citation

33 silent accounting swaps detected

3 claims reversed profit vs. loss

Three-layer verification: the spread is the signal

cite 0.7% · measure 34.7% · check 5.1%

Where 500 claims landed

Category	Claims	Share
True Fabrication	190	38.0%
Wrong Year	148	29.6%
Stale Truth	58	11.6%
Concept Substitution	33	6.6%
Drift	12	2.4%
Precision	8	1.6%
No Source	51	10.2%

Verb key: what each check does

cite: Did the AI use words that actually appear in the filing?
measure: Is the number close enough to the XBRL value, or did it drift?
check: Did the AI cite the right fiscal year, or a different one?

3 of 25 verbs shown. Full vocabulary at liminate.dev.
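The three verbs can be read as three independent predicates over a single claim. The sketch below is a minimal, hypothetical rendering in Python: the `Claim` type, the function signatures, and the 5% tolerance are assumptions for illustration, not the interpreter's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    text: str         # the sentence the model wrote
    value: float      # the number extracted from that sentence
    fiscal_year: int  # the fiscal year the claim is bound to


def cite(claim: Claim, filing_text: str) -> bool:
    """Pass only if the claim's wording appears verbatim in the filing text."""
    return claim.text in filing_text


def measure(claim: Claim, xbrl_value: float, tolerance: float = 0.05) -> bool:
    """Pass if the claimed number is within a relative tolerance of the XBRL value.

    The 5% default is an assumption for illustration, not the study's setting.
    """
    if xbrl_value == 0:
        return claim.value == 0
    return abs(claim.value - xbrl_value) / abs(xbrl_value) <= tolerance


def check(claim: Claim, filing_fiscal_year: int) -> bool:
    """Pass if the claim is bound to the filing's fiscal year."""
    return claim.fiscal_year == filing_fiscal_year


# Hypothetical claim, scored against a hypothetical FY2025 filing.
claim = Claim("Revenues were $80.7 billion", 80.7e9, 2024)
print(cite(claim, "Total revenues were $80,738 million"))  # paraphrased, so cite fails
print(measure(claim, 80.738e9))  # within 5%, so measure passes
print(check(claim, 2025))        # bound to FY2024, so check fails
```

Because each layer is scored on its own, one claim can pass measure while failing cite and check, which is why the three pass rates can diverge so sharply.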

What the receipts show: 5 claims from the matrix, spanning sign, temporal, and concept misbinding

✗ AI claimed: Ford basic EPS of $1.50
10-K filing says: FY2025 EarningsPerShareBasic of −$2.06
Layers: cite ✗ · measure ✗ · check ✗
sign misbinding · directionally wrong · claimed profit, filing shows loss

✗ AI claimed: UnitedHealth net income of $14.4 billion
10-K filing says: FY2025 $22.4B, but FY2021 was $13.8B
Layers: cite ✗ · measure ✗ · check ✗
stale truth · real number from FY2021 presented as current · 4 years stale

✗ AI claimed: Goldman Sachs cash equivalents of $241.0 billion
10-K filing says: unrestricted cash $182.1B · restricted-inclusive $241.8B
Layers: cite ✗ · measure ✗ · check ✗ · concept ↔
concept substitution · silently included restricted cash · $59B definitional swing

Restricted cash is money the firm can't freely spend. Including it in "cash equivalents" overstates available liquidity by $59 billion.

✗ AI claimed: Alphabet revenue of $350.0 billion
10-K filing says: $350.0B exactly, but under RevenueFromContractWithCustomerExcludingAssessedTax, a different tag from Revenues
Layers: cite ✗ · measure ✗ · check ✗ · concept ↔
concept substitution · right to the dollar, wrong XBRL concept · no numeric check would flag this

ASC 606 contract revenue (RevenueFromContractWithCustomerExcludingAssessedTax) and the general Revenues element are distinct XBRL concepts. The number is identical; only the accounting definition differs.

✗ AI claimed: Microsoft values for 5 of 10 financial metrics
10-K filing says: all 5 match a prior fiscal year exactly
Layers: cite ✗ · measure ✗ · check ✗ · stale ×5
temporal misbinding · 5 stale truths, 0 fabrications · the model remembered last year's Microsoft, not this year's

Concept substitutions

33 silent accounting swaps, caught deterministically.

The value was right. The concept was wrong. No numeric check would flag these — the number lands within tolerance of a neighboring XBRL concept, not the one that was asked for.

Company	Claimed Metric	Matched XBRL Concept	Value
GS	cash_equivalents	CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents	$241.8B
WFC	net_income	NetIncomeLossAvailableToCommonStockholdersBasic	$20.3B
C	net_income	NetIncomeLossAvailableToCommonStockholdersBasic	$13.0B
BLK	eps_basic	EarningsPerShareDiluted	$42.01
JNJ	long_term_debt	LongTermDebtNoncurrent	$30.7B
RTX	revenue	Revenues	$80.7B
RTX	stockholders_equity	StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest	$74.2B
RTX	operating_income	NetIncomeLoss	$6.7B
MSFT	cash_equivalents	CashCashEquivalentsAndShortTermInvestments	$75.5B
GOOGL	revenue	RevenueFromContractWithCustomerExcludingAssessedTax	$350.0B
GOOGL	long_term_debt	DebtInstrumentCarryingAmount	$12.0B
GOOGL	cash_equivalents	CashCashEquivalentsAndShortTermInvestments	$95.7B
META	eps_diluted	EarningsPerShareBasic	$24.61
SPGI	operating_income	IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest	$6.2B
ABBV	long_term_debt	LongTermDebtAndCapitalLeaseObligations	$60.3B
ABBV	cash_equivalents	CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents	$12.8B
ELV	eps_diluted	EarningsPerShareBasic	$25.81
ELV	cash_equivalents	CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents	$7.4B
NOC	long_term_debt	LongTermDebtNoncurrent	$14.7B
INTC	long_term_debt	LongTermDebtNoncurrent	$46.3B
AVGO	long_term_debt	DebtInstrumentCarryingAmount	$67.1B
DIS	long_term_debt	DebtInstrumentCarryingAmount	$41.3B
DIS	shares_outstanding	WeightedAverageNumberOfDilutedSharesOutstanding	1.83B shares
SBUX	long_term_debt	LongTermDebtNoncurrent	$14.3B
SBUX	operating_income	IncomeLossFromContinuingOperationsBeforeIncomeTaxesExtraordinaryItemsNoncontrollingInterest	$5.4B
KO	operating_income	NetIncomeLoss	$13.1B
GE	shares_outstanding	WeightedAverageNumberOfDilutedSharesOutstanding	1.10B shares
PG	net_income	NetIncomeLossAvailableToCommonStockholdersBasic	$15.7B
XOM	stockholders_equity	StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest	$212.5B
IBM	net_income	ProfitLoss	$5.7B
IBM	stockholders_equity	StockholdersEquityIncludingPortionAttributableToNoncontrollingInterest	$22.6B
IBM	long_term_debt	DebtInstrumentCarryingAmount	$56.1B
IBM	cash_equivalents	CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents	$13.1B
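One deterministic way to catch these swaps is to compare the claimed value against every concept reported for the same fiscal year, not only the requested one. The sketch below assumes facts have already been flattened into a concept-to-value dict (for example, from the us-gaap section of SEC's XBRL company facts); the function names, the 5% tolerance, and the choice of CashAndCashEquivalentsAtCarryingValue as the requested concept for Goldman Sachs are all assumptions for illustration.

```python
from typing import Dict, Optional, Tuple


def relative_distance(claimed: float, actual: float) -> float:
    """Relative error of a claimed value against a reference value."""
    return abs(claimed - actual) / abs(actual)


def find_concept_substitution(
    claimed: float,
    requested_concept: str,
    facts: Dict[str, float],
    tolerance: float = 0.05,
) -> Optional[Tuple[str, float]]:
    """Return (neighbouring concept, distance) if the claim matches another concept.

    `facts` maps XBRL concept names to their values for the filing's fiscal year.
    Returns None when the claim already matches the requested concept, or when
    no other concept lands within `tolerance`.
    """
    requested = facts.get(requested_concept)
    if requested and relative_distance(claimed, requested) <= tolerance:
        return None  # right number for the concept that was asked for
    best: Optional[Tuple[str, float]] = None
    for concept, value in facts.items():
        if concept == requested_concept or value == 0:
            continue
        distance = relative_distance(claimed, value)
        if distance <= tolerance and (best is None or distance < best[1]):
            best = (concept, distance)
    return best


# Goldman Sachs figures from the receipt above; the requested concept name is assumed.
gs_facts = {
    "CashAndCashEquivalentsAtCarryingValue": 182.1e9,
    "CashCashEquivalentsRestrictedCashAndRestrictedCashEquivalents": 241.8e9,
}
hit = find_concept_substitution(241.0e9, "CashAndCashEquivalentsAtCarryingValue", gs_facts)
swing = 241.0e9 - gs_facts["CashAndCashEquivalentsAtCarryingValue"]
print(hit)                   # the restricted-inclusive concept, about 0.33% away
print(f"{swing / 1e9:.1f}")  # definitional swing in billions of dollars
```

The claim sits about 0.33% from the restricted-inclusive concept but roughly $59 billion away from unrestricted cash, so a tolerance check against the wrong concept passes while the concept lookup flags it.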

Severity × prominence

Where the failures hit hardest.

Severe means a numeric error over 20%. Directionally wrong means the sign flipped: a loss reported as a profit, or the reverse. The pattern holds across every industry tier.
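Those two definitions translate directly into a rule. In this hypothetical sketch the sign test runs before the 20% threshold, an ordering assumed here because it appears consistent with the Ford row in the company table (3 directionally wrong claims, only 1 severe). The function name and the "minor" fallback label are illustrative, not part of the study's taxonomy.

```python
def severity(claimed: float, actual: float, severe_threshold: float = 0.20) -> str:
    """Classify a numeric claim against the filed value.

    Direction is checked first, so a sign flip is reported as
    'directionally wrong' rather than 'severe'. 'minor' is a placeholder
    label for everything else (assumption).
    """
    if claimed * actual < 0:  # opposite signs: loss reported as profit, or the reverse
        return "directionally wrong"
    if actual == 0:
        return "minor" if claimed == 0 else "severe"
    if abs(claimed - actual) / abs(actual) > severe_threshold:
        return "severe"
    return "minor"


print(severity(1.50, -2.06))       # Ford basic EPS
print(severity(14.4e9, 22.4e9))    # UnitedHealth net income vs. FY2025
print(severity(241.0e9, 241.8e9))  # Goldman Sachs vs. the restricted-inclusive concept
```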

By industry tier · 50 companies

Counts of severity flags and failure classifications per tier.

Tier	Companies	Severe	Dir. Wrong	Stale Truth	Concept Sub	True Fab	Drift	Precision
T1 - Banks & Finance	15	24	0	17	8	49	0	1
T2 - Big Tech	10	29	0	16	11	44	0	0
T3 - High Coverage	10	28	0	6	2	43	8	2
T4 - Consumer / Industrial	15	25	3	19	12	54	4	5

Highest-severity companies	Severe	Dir. Wrong	Stale Truth	Concept Sub	True Fab
NVDA	7	0	1	0	6
AMD	7	0	0	0	7
AVGO	6	0	0	1	5
GOOGL	5	0	1	3	4
AMZN	5	0	1	0	6
NFLX	5	0	2	0	5
GE	5	0	1	1	5
DIS	5	0	0	2	5
F	1	3	1	0	3
UNH	4	0	1	0	5

Ten companies shown, ranked by severe + directionally wrong claims. The full 50-company matrix lives in the per-company receipts.

Findings

Not hallucination. Misbinding.

The word "hallucination" implies the model invented something from nothing. The data says otherwise. Of 500 financial claims checked against SEC EDGAR XBRL filings, 205 were real values attached to the wrong fiscal year. Another 33 were real values pulled from the wrong accounting concept. Three reversed the direction of a financial result — claiming profit where the filing shows loss.

Only the receipt system can tell you the difference. A single accuracy score would flatten all of this into one number. The three-layer receipt classifies every failure:

Layer	What it checks	Result
cite	Is the text verbatim from the filing?	0.7% pass
measure	Is the number within tolerance?	34.7% pass
check	Is it the right fiscal year?	5.1% pass

The spread between the layers is the signal. cite at 0.7% means the model almost never reproduces the filing's exact text; it paraphrases and rounds. measure at 34.7% means about a third of the numbers land within tolerance of the real value. check at 5.1% means roughly 95% of claims were not bound to the correct fiscal year, making it the most widespread failure across all 50 companies.
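Computing the spread amounts to scoring each layer independently over the same set of claims. This toy sketch drops no-source claims from the denominator, following the exclusion described under the receipt-only findings; whether the published 0.7% / 34.7% / 5.1% figures use exactly this denominator is not stated, and the `Receipt` type and the sample records are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Receipt:
    cite: bool
    measure: bool
    check: bool
    has_source: bool = True  # False when the requested XBRL fact does not exist


def layer_pass_rates(receipts: List[Receipt]) -> Dict[str, float]:
    """Pass rate per layer, scored independently over claims that have a source."""
    scored = [r for r in receipts if r.has_source]
    if not scored:
        return {"cite": 0.0, "measure": 0.0, "check": 0.0}
    n = len(scored)
    return {
        "cite": sum(r.cite for r in scored) / n,
        "measure": sum(r.measure for r in scored) / n,
        "check": sum(r.check for r in scored) / n,
    }


receipts = [
    Receipt(cite=False, measure=True, check=False),   # passes measure only
    Receipt(cite=False, measure=True, check=True),    # passes measure and check
    Receipt(cite=False, measure=False, check=False),  # fails every layer
    Receipt(cite=False, measure=False, check=False, has_source=False),  # excluded
]
print(layer_pass_rates(receipts))
```

A single blended accuracy score would average these three rates away; keeping them separate is what exposes the gap between "close number" and "right year".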

But the taxonomy goes deeper. The old "fabrication" label hid several distinct patterns: 58 stale truths (real numbers from a prior year, on average 1.4 years old), 33 concept substitutions (numbers from a neighboring XBRL concept, such as restricted cash counted as "cash equivalents"), and, even among the 190 remaining true fabrications, 152 that landed within 20% of a real value (average distance: 18.9%). The model wasn't generating random numbers. It was retrieving real values and misbinding them.
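Reclassifying a "fabrication" as a stale truth only requires the requested concept's history across fiscal years. The sketch below is hypothetical: the function names and the 5% tolerance are assumptions, and it uses only the two UnitedHealth values shown in the receipt above. Note that $14.4B and $13.8B differ by about 4%, so whether that claim counts as a FY2021 match depends on the tolerance chosen.

```python
from typing import Dict, Iterable, Optional


def stale_years(
    claimed: float,
    history: Dict[int, float],
    current_fy: int,
    tolerance: float = 0.05,
) -> Optional[int]:
    """Years of staleness if the claim matches a prior fiscal year, else None.

    `history` maps fiscal year -> filed value of the requested concept.
    """
    current = history.get(current_fy)
    if current and abs(claimed - current) / abs(current) <= tolerance:
        return None  # correct for the current year: not stale
    best_fy, best_distance = None, tolerance
    for fy, value in history.items():
        if fy >= current_fy or value == 0:
            continue
        distance = abs(claimed - value) / abs(value)
        if distance <= best_distance:  # keep the closest prior-year match
            best_fy, best_distance = fy, distance
    return None if best_fy is None else current_fy - best_fy


def nearest_distance(claimed: float, candidates: Iterable[float]) -> float:
    """Relative distance to the closest known value (ValueError if none are non-zero)."""
    return min(abs(claimed - v) / abs(v) for v in candidates if v != 0)


# UnitedHealth net income from the receipt above (only two years shown).
unh = {2025: 22.4e9, 2021: 13.8e9}
print(stale_years(14.4e9, unh, 2025))
print(round(nearest_distance(14.4e9, unh.values()), 3))
```

Claims that match no year of any concept fall through to true fabrication, and `nearest_distance` is the kind of measure behind the "within 20% of a real value" finding.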

The most dangerous failures were the most plausible ones. Goldman Sachs' $241 billion "cash equivalents" was actually restricted-cash-inclusive — off by $59 billion in economic meaning, but only 0.3% from the wrong concept's actual value. Alphabet's $350 billion revenue matched a different revenue XBRL tag to the dollar. No numeric check, no tolerance window, no human review would catch these. Only a receipt that checks the concept, not just the number.

This is what a receipt system does that nothing else can: it doesn't just tell you a claim is wrong. It tells you how it's wrong — and that classification is the difference between a rounding error and a $59 billion misstatement of liquidity.

Every failure in this matrix is the model's mistake, verified against the SEC's own XBRL data. The receipt protects the analyst who uses AI, not the vendor who sells it.

Receipt-only findings

These findings could only be produced by a deterministic receipt system with access to structured source data.

58 claims reclassified from "fabrication" to stale truths: real numbers from the wrong year. Average staleness: 1.4 years. Oldest: UnitedHealth's FY2021 net income presented as FY2025, four years stale.

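33 silent accounting swaps detected: values from neighboring XBRL concepts presented as the requested metric. Tied for most common: cash equivalents silently including restricted cash (4 times, as often as long-term debt swapped for DebtInstrumentCarryingAmount or for LongTermDebtNoncurrent).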

51 claims excluded from failure rates because the source data didn't exist in XBRL, an adjustment no accuracy benchmark makes.

3 claims reversed profit/loss direction: the model said gain where the filing says loss. All three were Ford. Basic EPS claimed: $1.50; actual: −$2.06.

152 of 190 "true fabrication" claims had a closest source match within 20%, suggesting retrieval from plausible-but-wrong sources rather than random generation. Average distance: 18.9%.

0 scale confusions (millions for billions) detected across all 500 claims. The model always got the order of magnitude right, even when everything else was wrong.
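A scale confusion is cheap to test deterministically: the claim misses the filed value but lands within tolerance after multiplying or dividing by 1,000. The sketch below is hypothetical; the function name and the 5% tolerance are assumptions, not the receipt system's actual rule.

```python
def is_scale_confusion(claimed: float, actual: float, tolerance: float = 0.05) -> bool:
    """True if the claim matches the filed value only after a x1,000 rescale.

    Catches millions reported as billions and vice versa. A claim that already
    matches without rescaling, or that has the opposite sign, is not flagged.
    """
    if claimed == 0 or actual == 0 or claimed * actual < 0:
        return False
    if abs(claimed - actual) <= tolerance * abs(actual):
        return False  # already a match: no scale problem
    return any(
        abs(claimed * factor - actual) <= tolerance * abs(actual)
        for factor in (1e3, 1e-3)
    )


print(is_scale_confusion(80.7e6, 80.7e9))  # "$80.7 million" for an $80.7B value
print(is_scale_confusion(60e9, 80.7e9))    # wrong, but not by a factor of 1,000
```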

See it yourself