CLAUDE LAB
Claude Code · 2026-07-01 · Advanced

Don't Accept an Agent's Numbers and Citations As-Is — A Verification Gate Built on a Dedicated Auditor Subagent

A design that has a separate subagent verify every number and citation in an agent-generated summary before the summary is accepted, with working TypeScript for deterministic recomputation and fail-closed source matching.

Claude Code · subagents · verification · unattended automation · Claude Agent SDK


When Claude Science was announced, the part that stayed with me wasn't the number of new skills. It was the multi-stage shape: a coordinating agent calls specialist agents, and then a dedicated agent verifies the citations and calculations. Treating verification as an independent role, separate from generation, felt like the important idea.

As an indie developer, I run automated jobs for several sites on my own. One morning a generated metrics summary said "+18% week over week," but when I added up the raw numbers by hand, the real figure was +8%. Models produce plausible numbers and plausible-looking citations with unsettling fluency, and when a summary quietly drifts from the underlying data, no one notices as long as they read only the summary. Since that morning I've distrusted any setup that trusts an artifact's numbers and citations inside the same flow that produced them.

This article builds the missing piece with working code: a pre-acceptance gate that separates generation from verification, recomputes numbers deterministically, matches citations against the source text, and rejects the entire artifact if even one claim fails.
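The citation-matching part of such a gate needs no model at all. Here is a minimal sketch under my own assumptions: sources are held in memory in a Map keyed by sourceId, and "normalization" means Unicode NFKC, straight quotes in place of curly ones, and collapsed whitespace. The names checkCitation and normalizeText are hypothetical. A missing source or an empty quote fails; neither is ever waved through.

```typescript
// Hypothetical citation check: is the quoted text really present in the cited source?
type CitationCheck = { ok: boolean; reason?: string };

// Remove cosmetic differences that should not decide a match.
function normalizeText(text: string): string {
  return text
    .normalize("NFKC")
    .replace(/[\u2018\u2019]/g, "'") // curly single quotes -> straight
    .replace(/[\u201C\u201D]/g, '"') // curly double quotes -> straight
    .replace(/\s+/g, " ") // collapse runs of whitespace, including newlines
    .trim();
}

function checkCitation(
  quote: string,
  sourceId: string,
  sources: Map<string, string>,
): CitationCheck {
  const source = sources.get(sourceId);
  if (source === undefined) {
    return { ok: false, reason: `unknown source: ${sourceId}` }; // fail closed
  }
  const needle = normalizeText(quote);
  if (needle.length === 0) {
    return { ok: false, reason: "empty quote" }; // nothing to verify counts as a failure
  }
  return normalizeText(source).includes(needle)
    ? { ok: true }
    : { ok: false, reason: "quote not found in source" };
}

// Usage with an illustrative in-memory source.
const demoSources = new Map<string, string>([
  ["weekly-report", "Visits grew  8%\nweek over week."],
]);
console.log(checkCitation("Visits grew 8% week over week.", "weekly-report", demoSources).ok); // true
console.log(checkCitation("Visits grew 18%", "weekly-report", demoSources).ok); // false
console.log(checkCitation("Visits grew 8%", "other-report", demoSources).ok); // false
```

Matching stays exact and case-sensitive after normalization on purpose: fuzzy matching would let a paraphrase pass as a quote, which is precisely the failure the gate exists to catch.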

Why you must not verify "as a continuation of generation"

The failure happens when you ask the generating model itself, "Is this correct?" Self-checking inside the same context and the same train of thought leads the model to treat the numbers it just produced as a correct premise, so it overlooks the drift. It's like proofreading your own draft, alone, right after writing it.
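One way to make that separation structural rather than a matter of discipline is to give the auditor's input no slot for the generator's reasoning at all. The sketch below is my own assumption, not a Claude Agent SDK API: buildAuditorInput is a hypothetical helper that takes only the summary, the claim ledger, and the primary sources, and returns a fresh task payload you would hand to a separate subagent. The field names and instruction wording are placeholders.

```typescript
// Hypothetical payload for a dedicated auditor subagent.
// There is deliberately no field for the generator's transcript or reasoning.
type SourceDoc = { id: string; text: string };

type AuditorInput = {
  instructions: string;
  summary: string;
  claimsJson: string; // the claim ledger, serialized
  sources: SourceDoc[]; // primary data and cited sources only
};

function buildAuditorInput(
  summary: string,
  claims: unknown[],
  sources: SourceDoc[],
): AuditorInput {
  return {
    instructions: [
      "You are an auditor. You did not write this summary.",
      "Check every claim against the sources provided, and nothing else.",
      "If a claim cannot be confirmed from these sources, mark it as failing.",
    ].join("\n"),
    summary,
    claimsJson: JSON.stringify(claims),
    // Copy each entry so the payload does not share objects with the caller.
    sources: sources.map((s) => ({ id: s.id, text: s.text })),
  };
}

// Usage: only the artifact and its primary sources go in.
const auditorInput = buildAuditorInput(
  "Visits rose 8% week over week.",
  [{ id: "n1", kind: "number", metric: "visits_wow_pct", value: 8 }],
  [{ id: "analytics-export", text: "visits_last_week=1000\nvisits_this_week=1080" }],
);
console.log(auditorInput.instructions);
```

Because the type has no transcript field, forwarding the generation context would require changing the type itself, which makes the leak visible in review.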

The value of an independent verifier comes down to three things.

First, context isolation. The verifier receives only the artifact plus the primary data and sources; it inherits none of the reasoning from generation. Second, deterministic judgment. Numbers are recomputed from raw data by a function, not re-confirmed by a language model. Third, fail-closed behavior. A claim that cannot be verified is treated as failing, not as "probably fine," and a single claim that fails stops the whole artifact.
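The fail-closed rule is where code most often goes quietly wrong, usually by letting an exception or an "unknown" result fall through as a pass. A minimal sketch with hypothetical names (Outcome, runFailClosed, gate): each check reports pass, fail, or unverifiable; a thrown error becomes unverifiable; and the artifact is accepted only if every claim passes. Rejecting an empty ledger is my own assumption.

```typescript
// Outcome of checking one claim. Only "pass" counts toward acceptance.
type Outcome = "pass" | "fail" | "unverifiable";

type ClaimResult = { id: string; outcome: Outcome; detail?: string };

type GateDecision =
  | { accepted: true; results: ClaimResult[] }
  | { accepted: false; results: ClaimResult[]; rejectedBy: ClaimResult[] };

type ClaimCheck = { id: string; check: () => Outcome };

// An exception means "could not verify", never "probably fine".
function runFailClosed({ id, check }: ClaimCheck): ClaimResult {
  try {
    return { id, outcome: check() };
  } catch (err) {
    return { id, outcome: "unverifiable", detail: String(err) };
  }
}

// All-or-nothing: one non-"pass" result rejects the entire artifact.
function gate(checks: ClaimCheck[]): GateDecision {
  const results = checks.map(runFailClosed);
  if (results.length === 0) {
    // Assumption: an empty ledger means nothing was verified, so reject.
    const empty: ClaimResult = { id: "(ledger)", outcome: "unverifiable", detail: "empty claim ledger" };
    return { accepted: false, results, rejectedBy: [empty] };
  }
  const rejectedBy = results.filter((r) => r.outcome !== "pass");
  if (rejectedBy.length > 0) {
    return { accepted: false, results, rejectedBy };
  }
  return { accepted: true, results };
}

// Usage: two claims pass, one check throws, so the whole artifact is rejected.
const decision = gate([
  { id: "n1", check: () => "pass" },
  { id: "c1", check: () => "pass" },
  { id: "n2", check: () => { throw new Error("metric not in registry"); } },
]);
console.log(decision.accepted); // false
```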

Extract claims at the right granularity (a claim ledger)

You cannot verify a free-form summary directly. First, extract the smallest verifiable units — numeric claims and citation claims — as structured data. The generating agent must emit this "claim ledger" alongside the summary, every time.
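Because the ledger comes back as model output, it deserves the same suspicion as the summary. Here is a minimal sketch of a strict parser, assuming the ledger is emitted as a JSON array whose fields match the claim types defined for this article (repeated locally so the sketch runs on its own); parseLedger, readString, and the error messages are hypothetical. It rejects the whole ledger on the first malformed or duplicate entry instead of skipping it, since a skipped claim is an unchecked claim.

```typescript
// Local copies of the ledger types so this sketch runs on its own.
type NumericClaim = {
  id: string; kind: "number"; statement: string;
  metric: string; value: number; tolerance?: number;
};
type CitationClaim = {
  id: string; kind: "citation"; statement: string;
  quote: string; sourceId: string;
};
type Claim = NumericClaim | CitationClaim;

type ParseResult = { ok: true; claims: Claim[] } | { ok: false; error: string };

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Returns a non-empty string field, or undefined if absent or of the wrong type.
function readString(o: Record<string, unknown>, key: string): string | undefined {
  const v = o[key];
  return typeof v === "string" && v.length > 0 ? v : undefined;
}

// Hypothetical strict parser: any defect rejects the whole ledger.
function parseLedger(raw: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, error: "ledger is not valid JSON" };
  }
  if (!Array.isArray(data)) return { ok: false, error: "ledger must be a JSON array" };

  const claims: Claim[] = [];
  const seen = new Set<string>();
  for (let i = 0; i < data.length; i++) {
    const item: unknown = data[i];
    if (!isRecord(item)) return { ok: false, error: `entry ${i} is not an object` };
    const id = readString(item, "id");
    const statement = readString(item, "statement");
    if (!id || !statement) return { ok: false, error: `entry ${i} lacks id or statement` };
    if (seen.has(id)) return { ok: false, error: `duplicate claim id: ${id}` };
    seen.add(id);

    if (item.kind === "number") {
      const metric = readString(item, "metric");
      const value = item.value;
      if (!metric || typeof value !== "number" || !Number.isFinite(value)) {
        return { ok: false, error: `numeric claim ${id} is malformed` };
      }
      let tolerance: number | undefined;
      if (item.tolerance !== undefined) {
        const t = item.tolerance;
        if (typeof t !== "number" || !(t >= 0)) {
          return { ok: false, error: `numeric claim ${id} has an invalid tolerance` };
        }
        tolerance = t;
      }
      claims.push({ id, kind: "number", statement, metric, value, tolerance });
    } else if (item.kind === "citation") {
      const quote = readString(item, "quote");
      const sourceId = readString(item, "sourceId");
      if (!quote || !sourceId) return { ok: false, error: `citation claim ${id} is malformed` };
      claims.push({ id, kind: "citation", statement, quote, sourceId });
    } else {
      return { ok: false, error: `claim ${id} has an unknown kind` };
    }
  }
  return { ok: true, claims };
}

// Usage: a value emitted as a string is rejected, not coerced.
const parsed = parseLedger('[{"id":"n1","kind":"number","statement":"Visits rose 18% WoW","metric":"visits_wow_pct","value":"18"}]');
console.log(parsed.ok ? "accepted" : parsed.error); // numeric claim n1 is malformed
```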

```typescript
// claims.ts: types for verifiable claims
export type NumericClaim = {
  id: string;
  kind: "number";
  statement: string; // human-readable claim (the spot in the summary)
  metric: string;    // metric key used for recomputation
  value: number;     // the value the model claimed
  tolerance?: number; // relative tolerance (default if omitted)
};

export type CitationClaim = {
  id: string;
  kind: "citation";
  statement: string;
  quote: string;     // string that must exist in the source
  sourceId: string;  // identifier of the source text to match against
};

export type Claim = NumericClaim | CitationClaim;

export type Artifact = {
  summary: string;
  claims: Claim[];
};
```

The key point is that the value field stores what the model claimed, nothing more. The verifier does not trust it; it later compares it against a figure it computes itself. The ledger isn't an appendix to the artifact: it is the input to verification.
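Here is a minimal sketch of that comparison. It assumes raw data arrives as named numeric series and that each metric key maps to a pure recompute function in a registry; the registry, the metric names (visits_total, visits_wow_pct), the sample numbers, and the 0.5% default tolerance are hypothetical placeholders. The tolerance is relative to the recomputed figure, and an unknown metric, a missing series, or a recompute error all count as failures.

```typescript
// Minimal copy of the numeric claim type so this sketch runs on its own.
type NumericClaim = {
  id: string;
  kind: "number";
  statement: string;
  metric: string;
  value: number;      // what the model claimed
  tolerance?: number; // relative, e.g. 0.005 means 0.5%
};

type RawData = Record<string, number[]>;
type NumericResult = { id: string; ok: boolean; expected?: number; reason?: string };

// Hypothetical default when a claim does not specify a tolerance.
const DEFAULT_TOLERANCE = 0.005;

// Fail closed on missing input instead of silently treating it as zero.
function series(raw: RawData, key: string): number[] {
  const xs = raw[key];
  if (xs === undefined) throw new Error(`missing raw series: ${key}`);
  return xs;
}

const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);

// Hypothetical registry: every metric the generator may cite needs a recompute function here.
const registry = new Map<string, (raw: RawData) => number>([
  ["visits_total", (raw) => sum(series(raw, "visits_this_week"))],
  ["visits_wow_pct", (raw) => {
    const prev = sum(series(raw, "visits_last_week"));
    const curr = sum(series(raw, "visits_this_week"));
    if (prev === 0) throw new Error("previous week total is zero");
    return ((curr - prev) * 100) / prev;
  }],
]);

function verifyNumeric(claim: NumericClaim, raw: RawData): NumericResult {
  const recompute = registry.get(claim.metric);
  if (recompute === undefined) {
    return { id: claim.id, ok: false, reason: `unknown metric: ${claim.metric}` };
  }
  let expected: number;
  try {
    expected = recompute(raw);
  } catch (err) {
    return { id: claim.id, ok: false, reason: `cannot recompute: ${String(err)}` };
  }
  if (!Number.isFinite(expected) || !Number.isFinite(claim.value)) {
    return { id: claim.id, ok: false, expected, reason: "non-finite value" };
  }
  // Relative tolerance measured against the recomputed value (an expected 0 needs an exact 0).
  const allowed = (claim.tolerance ?? DEFAULT_TOLERANCE) * Math.abs(expected);
  const diff = Math.abs(claim.value - expected);
  return diff <= allowed
    ? { id: claim.id, ok: true, expected }
    : { id: claim.id, ok: false, expected, reason: `off by ${diff}, allowed ${allowed}` };
}

// Usage with illustrative numbers (not real metrics).
const raw: RawData = {
  visits_last_week: [140, 150, 145, 135, 150, 140, 140], // total 1000
  visits_this_week: [150, 160, 155, 150, 160, 150, 155], // total 1080
};
const claim: NumericClaim = {
  id: "n1", kind: "number", statement: "Visits rose 18% week over week",
  metric: "visits_wow_pct", value: 18,
};
console.log(verifyNumeric(claim, raw).ok); // false (recomputed value is 8)
```

The registry is a Map rather than a plain object so that a metric key such as "toString" cannot resolve to an inherited function and slip past the unknown-metric check.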
