CLAUDE LAB
API & SDK / 2026-06-17 / Advanced

When the Claude API Extracts the Wrong Value With Full Confidence: Designing the Verification Layer

When you extract invoices or contracts with the Claude API, the scariest failure isn't an exception. It's plausible-but-wrong JSON. Here is how I build a verification layer in TypeScript that catches silent extraction errors with schema checks, arithmetic reconciliation, and dual-extraction agreement.

Tags: Claude API, document processing, structured extraction, verification, TypeScript, production


A few weeks into running an automated invoice intake pipeline, someone asked me why the accounting numbers didn't add up even though the error logs were spotless. The cause was easy to find. On one low-resolution scan, Claude had misread a digit in the total amount, reported confidence: 0.96, and sailed through without throwing anything.

As an indie developer running several services in parallel, I assumed at first that exception handling alone could keep this class of error out. It can't. The most dangerous thing in structured extraction isn't the API going down or the JSON breaking. It is a plausible but wrong value flowing downstream without raising a single exception. JSON.parse succeeds, Zod passes, the logs stay green, and only the ledger quietly drifts. This article is a field note on how to build the verification layer that catches those silent errors before they reach production. It isn't about assembling the extraction pipeline itself. It's about the judgment that comes after extraction: whether the result can be trusted at all.

Don't treat confidence as a signal

The first assumption to drop is that the model's confidence is a usable quality metric. An LLM's self-reported confidence has almost no correlation with whether the output is correct. Even a badly misread invoice can come back with a decisively high number. Confidence is the model's self-assessment of its own output, not the result of checking against external truth.

So I never use confidence as a gate, only for prioritization. If a value comes back with low confidence, it goes near the front of the human review queue; that's all. The accept/reject decision is made by verification that lives outside the model. There are three sources of verification: schema (is it structurally valid?), arithmetic reconciliation (do the numbers agree internally?), and dual-extraction agreement (does an independent pass produce the same result?). Let's go through them.
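As a rough illustration of "prioritization, not a gate", the sketch below routes results using only an external verification flag and uses confidence purely to order the review queue. The names ExtractionResult, checksPassed, route, and buildReviewQueue are hypothetical and do not come from the article's code. Sending only failed checks to review is also an assumption.

```typescript
// Hypothetical shapes for illustration; none of these names come from the article.
interface ExtractionResult {
  id: string;
  confidence: number; // the model's self-reported score, 0..1
  checksPassed: boolean; // outcome of verification that lives outside the model
}

type Accepted = { kind: "accept"; id: string };
type Review = { kind: "review"; id: string; priority: number };
type Route = Accepted | Review;

// The gate looks only at external verification.
// Confidence never accepts or rejects anything.
export function route(result: ExtractionResult): Route {
  if (result.checksPassed) {
    return { kind: "accept", id: result.id };
  }
  // Lower confidence means higher priority in the human review queue.
  return { kind: "review", id: result.id, priority: 1 - result.confidence };
}

// Returns the ids awaiting human review, most urgent first.
export function buildReviewQueue(results: ExtractionResult[]): string[] {
  return results
    .map(route)
    .filter((r): r is Review => r.kind === "review")
    .sort((a, b) => b.priority - a.priority)
    .map((r) => r.id);
}
```

With this shape, the misread invoice from the opening anecdote (confidence 0.96, checks failed) still lands in review. Its high score only moves it toward the back of the queue.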

Stage one: schema guards shape, nothing more

Zod validation is the first wall, but you have to be precise about what it can actually defend. A schema can only reject structural anomalies: a wrong type, a missing required field, a value outside an enum. It can guarantee that total is a number, but it can say nothing about whether that number is right.

Even so, weaving extraction-specific constraints into the schema drops a fair share of errors at the first stage. Force dates into ISO format, make amounts non-negative, restrict currency to a three-letter ISO 4217 code. By dropping every "impossible shape" here, the arithmetic stage downstream can focus purely on substantive errors.

// src/schema.ts
import { z } from "zod";
 
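const isoDate = z
  .string()
  .regex(/^\d{4}-\d{2}-\d{2}$/, "Extract as YYYY-MM-DD")
  // Round-trip check: some engines roll an impossible day such as 02-31 into the next month.
  .refine((s) => {
    const d = new Date(`${s}T00:00:00Z`);
    return !Number.isNaN(d.getTime()) && d.toISOString().slice(0, 10) === s;
  }, "Must be a real date");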
 
const money = z.number().finite().nonnegative();
 
export const InvoiceSchema = z.object({
  invoiceNumber: z.string().min(1).optional(),
  issueDate: isoDate.optional(),
  dueDate: isoDate.optional(),
  vendor: z.object({ name: z.string().min(1), taxId: z.string().optional() }),
  lineItems: z
    .array(
      z.object({
        description: z.string().min(1),
        quantity: z.number().positive().optional(),
        unitPrice: money.optional(),
        amount: money,
      })
    )
    .min(1, "An invoice with zero line items is treated as a failed extraction"),
  subtotal: money.optional(),
  tax: money.optional(),
  total: money,
  currency: z.string().regex(/^[A-Z]{3}$/), // three uppercase letters, the shape of an ISO 4217 code
});
 
export type Invoice = z.infer<typeof InvoiceSchema>;

The quiet win is .min(1) on lineItems. An invoice with no line items essentially never exists, so if zero comes back you can declare the extraction failed. When you design the schema as a "detector of business-impossible shapes" rather than a "data-correctness check," the first net tightens considerably.
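The schema can confirm that total is a number but not that it is the right number. That gap is what the arithmetic reconciliation stage named earlier is meant to close. What follows is a minimal sketch of such a check, not the article's implementation. The names InvoiceAmounts, toCents, and reconcile, the two-decimal currency, and the one-cent default tolerance are all assumptions made for illustration.

```typescript
// Local mirror of the fields the arithmetic checks need (hypothetical subset of Invoice).
interface InvoiceAmounts {
  lineItems: { quantity?: number; unitPrice?: number; amount: number }[];
  subtotal?: number;
  tax?: number;
  total: number;
}

// Compare in integer minor units to avoid floating-point drift.
// Assumes a two-decimal currency; zero-decimal currencies would need a different scale.
const toCents = (n: number): number => Math.round(n * 100);

// Returns a list of arithmetic inconsistencies; an empty list means the numbers agree.
export function reconcile(inv: InvoiceAmounts, toleranceCents = 1): string[] {
  const problems: string[] = [];
  const off = (aCents: number, bCents: number): boolean =>
    Math.abs(aCents - bCents) > toleranceCents;

  // Per line: quantity * unitPrice should match amount when both are present.
  inv.lineItems.forEach((item, i) => {
    if (item.quantity !== undefined && item.unitPrice !== undefined) {
      if (off(toCents(item.quantity * item.unitPrice), toCents(item.amount))) {
        problems.push(`lineItems[${i}]: quantity * unitPrice != amount`);
      }
    }
  });

  // Across lines: the amounts should add up to the subtotal when one was extracted.
  const lineSumCents = inv.lineItems.reduce((sum, item) => sum + toCents(item.amount), 0);
  if (inv.subtotal !== undefined && off(lineSumCents, toCents(inv.subtotal))) {
    problems.push("sum(lineItems.amount) != subtotal");
  }

  // Totals: subtotal + tax should equal total. Fall back to the line sum if subtotal is missing.
  const baseCents = inv.subtotal !== undefined ? toCents(inv.subtotal) : lineSumCents;
  if (inv.tax !== undefined && off(baseCents + toCents(inv.tax), toCents(inv.total))) {
    problems.push("subtotal + tax != total");
  }

  return problems;
}
```

A single misread digit in total passes every schema rule but fails the last check here, which is exactly the kind of error the opening anecdote describes. Any non-empty result would then be routed to human review instead of being accepted.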


WHAT YOU'LL LEARN
A verification layer that ignores the model's self-reported confidence and instead catches bad extractions mechanically through schema, arithmetic reconciliation, and dual-extraction agreement
Checksum validation that uses arithmetic constraints like subtotal + tax = total to reject plausible-looking errors, plus how to draw the line for routing to human review
A staged design that extracts with Sonnet first and only re-arbitrates disagreements with Opus, balanced against prompt caching and batch processing for realistic cost allocation
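The dual-extraction agreement named in the list above can be pictured as a field-by-field comparison of two independent passes, where only the disagreeing paths are escalated. The sketch below is an assumption about how such a comparison might look. The Json type and the disagreements function are hypothetical. The two extraction calls and the arbitration step are deliberately left out.

```typescript
// Hypothetical JSON-like type for two already-parsed extraction results.
type Json = string | number | boolean | null | undefined | Json[] | { [key: string]: Json };

const isObject = (v: Json): v is { [key: string]: Json } =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Returns the paths where two extraction passes disagree; empty means full agreement.
export function disagreements(a: Json, b: Json, path = "$"): string[] {
  const out: string[] = [];
  if (Array.isArray(a) && Array.isArray(b)) {
    // A different number of items is reported once instead of item by item.
    if (a.length !== b.length) return [`${path}.length`];
    for (let i = 0; i < a.length; i++) {
      out.push(...disagreements(a[i], b[i], `${path}[${i}]`));
    }
    return out;
  }
  if (isObject(a) && isObject(b)) {
    // Union of keys, so a field present in only one pass counts as a disagreement.
    const keys = Array.from(new Set([...Object.keys(a), ...Object.keys(b)])).sort();
    for (const key of keys) {
      out.push(...disagreements(a[key], b[key], `${path}.${key}`));
    }
    return out;
  }
  // Leaves are compared strictly; no normalization is attempted in this sketch.
  return a === b ? [] : [path];
}
```

In a staged design, an empty result would let the first pass stand, and a non-empty one would send only the listed paths to the arbitration pass. Strict equality is a simplification: real vendor names or descriptions would likely need normalization before comparison.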
