CLAUDE LABJP
SANDBOX — Claude Managed Agents can now run in your own sandbox and connect to private MCP servers (self-hosted beta, MCP tunnels in preview)PLATFORM — The Claude Developer Platform adds new code execution, web search, and web fetch tools, exposing a 90-second per-cell limitCONTEXT — response_inclusion trims consumed result blocks to save context in agentic workflowsMCP — Enterprise-managed MCP connectors (Okta) continue: zero-touch access across Claude, Claude Code, and Cowork (Team/Enterprise beta)CODE — Claude Code adds /cd, a post-session hook, and a safe mode while tightening MCP policy enforcementMODEL — Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; Fable 5 is available from Claude CodeSANDBOX — Claude Managed Agents can now run in your own sandbox and connect to private MCP servers (self-hosted beta, MCP tunnels in preview)PLATFORM — The Claude Developer Platform adds new code execution, web search, and web fetch tools, exposing a 90-second per-cell limitCONTEXT — response_inclusion trims consumed result blocks to save context in agentic workflowsMCP — Enterprise-managed MCP connectors (Okta) continue: zero-touch access across Claude, Claude Code, and Cowork (Team/Enterprise beta)CODE — Claude Code adds /cd, a post-session hook, and a safe mode while tightening MCP policy enforcementMODEL — Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; Fable 5 is available from Claude Code
Articles/API & SDK
API & SDK/2026-06-19Advanced

Grounding Claude on Your Own Knowledge Base with search_result Blocks

How to stop your own RAG setup from losing track of which article it cited, using Claude's search_result content block and structured citations — with real numbers from running it across four sites.

Claude API81CitationsRAG3search_resultindie developer14

Premium Article

This starts with an internal agent I built on the Claude API to pick related articles and fact-check claims across the content I publish. I run four technical blogs as an indie developer, and on the Japanese side alone I have more than 600 articles on hand. The usual self-hosted RAG shape: concatenate the search hits into a prompt and ask, "Which of these articles supports this claim?"

The first version worked well enough. But a habit surfaced over time that I could not ignore. Claude would answer "this is described in article X" — and sometimes article X was not among the candidates I had passed in. I also could not trace which part of the concatenated text it had actually read. The provenance dissolved into prose and could not be verified after the fact. For an internal tool, that lack of honesty still bothered me.

The mechanism that forces "which source, which passage" to come back as structured data is Claude's Citations feature, specifically the search_result content block. Here are the notes from swapping it into production and running it for a few weeks, including the potholes along the way.

Why prose citations can't be verified

When you concatenate body text yourself, Claude sees one long string. Headers like ## Article A convey a semantic boundary, but they are not machine-readable reference handles. So the grounding comes back as natural language — "according to Article A" — and you need post-processing to map that back to the source with regular expressions.

That post-processing was brittle. When Claude paraphrased a title slightly, or merged several articles into a vague "based on these," matching failed. In my setup the mechanical match rate plateaued around 70%. The remaining 30% had to be checked by hand, which halved the point of automating it.

The root cause is simply that I was not passing reference IDs. If you don't send them, you can't get them back. The search_result block exists precisely to pass documents in as search results that carry an ID.

Anatomy of the search_result block

search_result is a dedicated block you can place in a message's content array. A single result is expressed as three parts: a source identifier, a title, and an array of text fragments.

search_result_block = {
    "type": "search_result",
    "source": "https://claudelab.net/articles/api-sdk/claude-api-prompt-caching-monthly-cost-half-guide",
    "title": "Halving monthly cost with prompt caching",
    "content": [
        {"type": "text", "text": "Putting a 5-minute TTL cache breakpoint at the end of system..."},
        {"type": "text", "text": "A 1-hour TTL suits large static context. Pricing is..."},
    ],
    "citations": {"enabled": True},
}

The key point is that content is an array of text blocks, not a string. Claude returns the index of which entry it relied on, so how you split the fragments becomes the granularity of your citations. Splitting per paragraph makes it easy to link back to "this paragraph of the article" later. Note too that only blocks with citations.enabled set to true are eligible to be cited.

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN
Make Claude attach which document and which passage it relied on as structured data, every time, via search_result blocks
Move off hand-rolled context concatenation and cut request tokens by ~40% while reducing missed citations
Wire the citations array back into article links, plus three search_result gotchas I hit in production
Secure payment via Stripe · Cancel anytime

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

or
Unlock all articles with Membership →
Share

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.

  • Copy-paste ready implementation code
  • New advanced guides published daily
  • $5/mo or $10 for lifetime access
View Membership →

Related Articles

API & SDK2026-05-27
Tail Latency in Scheduled Claude API Workloads: A Three-Layer Guardrail Against Retry Storms
After running six sites in parallel through scheduled Claude API tasks for several months, 14 days of logs revealed three distinct p95/p99 patterns and a retry storm I had been creating from my own client. This is the guardrail design I landed on — jitter, budget, circuit breaker — with the before/after numbers.
API & SDK2026-05-05
Building an Internal Document Search Agent with Claude API — Hybrid RAG, Role-Based Access Control, and Audit Logging in Production
Build a production-grade internal document search agent using Claude API and Python. Covers hybrid RAG (pgvector + BM25), department-level RBAC via PostgreSQL RLS, and compliance-ready audit logging — with working code for each component.
API & SDK2026-04-06
Building a Persistent Memory Agent with Claude API, pgvector, and Redis: A Complete
A complete guide to building production-ready persistent memory for Claude API agents using PostgreSQL + pgvector + Redis. Learn vector search, layered memory architecture, session management, and GDPR-compliant data handling.
📚RECOMMENDED BOOKS
Build a Large Language Model (From Scratch)
Sebastian Raschka
LLM Dev
Prompt Engineering for LLMs
Berryman & Ziegler
Prompting
AI Engineering
Chip Huyen
AI Eng
* Contains affiliate links
See all →