●SANDBOX — Claude Managed Agents can now run in your own sandbox and connect to private MCP servers (self-hosted beta, MCP tunnels in preview)●PLATFORM — The Claude Developer Platform adds new code execution, web search, and web fetch tools, exposing a 90-second per-cell limit●CONTEXT — response_inclusion trims consumed result blocks to save context in agentic workflows●MCP — Enterprise-managed MCP connectors (Okta) continue: zero-touch access across Claude, Claude Code, and Cowork (Team/Enterprise beta)●CODE — Claude Code adds /cd, a post-session hook, and a safe mode while tightening MCP policy enforcement●MODEL — Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; Fable 5 is available from Claude Code●SANDBOX — Claude Managed Agents can now run in your own sandbox and connect to private MCP servers (self-hosted beta, MCP tunnels in preview)●PLATFORM — The Claude Developer Platform adds new code execution, web search, and web fetch tools, exposing a 90-second per-cell limit●CONTEXT — response_inclusion trims consumed result blocks to save context in agentic workflows●MCP — Enterprise-managed MCP connectors (Okta) continue: zero-touch access across Claude, Claude Code, and Cowork (Team/Enterprise beta)●CODE — Claude Code adds /cd, a post-session hook, and a safe mode while tightening MCP policy enforcement●MODEL — Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; Fable 5 is available from Claude Code
Grounding Claude on Your Own Knowledge Base with search_result Blocks
How to stop your own RAG setup from losing track of which article it cited, using Claude's search_result content block and structured citations — with real numbers from running it across four sites.
This starts with an internal agent I built on the Claude API to pick related articles and fact-check claims across the content I publish. I run four technical blogs as an indie developer, and on the Japanese side alone I have more than 600 articles on hand. The usual self-hosted RAG shape: concatenate the search hits into a prompt and ask, "Which of these articles supports this claim?"
The first version worked well enough. But a habit surfaced over time that I could not ignore. Claude would answer "this is described in article X" — and sometimes article X was not among the candidates I had passed in. I also could not trace which part of the concatenated text it had actually read. The provenance dissolved into prose and could not be verified after the fact. For an internal tool, that lack of honesty still bothered me.
The mechanism that forces "which source, which passage" to come back as structured data is Claude's Citations feature, specifically the search_result content block. Here are the notes from swapping it into production and running it for a few weeks, including the potholes along the way.
Why prose citations can't be verified
When you concatenate body text yourself, Claude sees one long string. Headers like ## Article A convey a semantic boundary, but they are not machine-readable reference handles. So the grounding comes back as natural language — "according to Article A" — and you need post-processing to map that back to the source with regular expressions.
That post-processing was brittle. When Claude paraphrased a title slightly, or merged several articles into a vague "based on these," matching failed. In my setup the mechanical match rate plateaued around 70%. The remaining 30% had to be checked by hand, which halved the point of automating it.
The root cause is simply that I was not passing reference IDs. If you don't send them, you can't get them back. The search_result block exists precisely to pass documents in as search results that carry an ID.
Anatomy of the search_result block
search_result is a dedicated block you can place in a message's content array. A single result is expressed as three parts: a source identifier, a title, and an array of text fragments.
search_result_block = { "type": "search_result", "source": "https://claudelab.net/articles/api-sdk/claude-api-prompt-caching-monthly-cost-half-guide", "title": "Halving monthly cost with prompt caching", "content": [ {"type": "text", "text": "Putting a 5-minute TTL cache breakpoint at the end of system..."}, {"type": "text", "text": "A 1-hour TTL suits large static context. Pricing is..."}, ], "citations": {"enabled": True},}
The key point is that content is an array of text blocks, not a string. Claude returns the index of which entry it relied on, so how you split the fragments becomes the granularity of your citations. Splitting per paragraph makes it easy to link back to "this paragraph of the article" later. Note too that only blocks with citations.enabled set to true are eligible to be cited.
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦Make Claude attach which document and which passage it relied on as structured data, every time, via search_result blocks
✦Move off hand-rolled context concatenation and cut request tokens by ~40% while reducing missed citations
✦Wire the citations array back into article links, plus three search_result gotchas I hit in production
Secure payment via Stripe · Cancel anytime
✦
Unlock This Article
Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.
Minimal implementation: pass results straight through
Take the top few hits from your own full-text search (in my case a simple inverted index over article metadata) and repackage them as search_result blocks for the Messages API.
import anthropicclient = anthropic.Anthropic()def to_search_results(hits): blocks = [] for h in hits: blocks.append({ "type": "search_result", "source": h["url"], "title": h["title"], "content": [{"type": "text", "text": p} for p in h["paragraphs"]], "citations": {"enabled": True}, }) return blockshits = my_search("how to choose prompt cache TTLs", top_k=5)resp = client.messages.create( model="claude-sonnet-4-6", max_tokens=1024, messages=[{ "role": "user", "content": [ *to_search_results(hits), {"type": "text", "text": "Using only the sources above, summarize TTL choices in three points."}, ], }],)
Each text block in the response carries a citations array with cited_text, source, and the index of the fragment it referenced, all structured. No more guessing article names out of prose.
Before / After: moving off hand-rolled concatenation
Here is the change side by side. First, the old hand-rolled approach.
# Before: concatenate body text into one stringcontext = ""for i, h in enumerate(hits): context += f"## Source {i+1}: {h['title']}\n{h['full_text']}\n\n"prompt = f"""Answer using the sources below, and cite as [Source N] at the end.{context}Question: {question}"""# -> citations are prose; matching relies on regex and breaks on paraphrase
Now replaced with search_result.
# After: pass as structured search resultsresp = client.messages.create( model="claude-sonnet-4-6", max_tokens=1024, messages=[{ "role": "user", "content": [*to_search_results(hits), {"type": "text", "text": question}], }],)# citations come back as structured datafor block in resp.content: if block.type != "text": continue for c in getattr(block, "citations", None) or []: print(c.source, "->", c.cited_text[:40])
Two practical effects. First, the instruction telling the model to write citations in prose became unnecessary, and I could pass only the relevant paragraphs from search instead of full_text. That dropped per-request input tokens by about 40% on my workload. Second, the match rate: because results come back keyed by ID, post-processing matching is structurally unnecessary, and the 30% I had been checking by hand essentially vanished.
Wiring citations back into article links
The biggest upside of structured citations is how cleanly they drop into a UI. Insert linked footnotes at the relevant spots in the answer text.
def render_with_footnotes(content_blocks): html, notes = [], [] for block in content_blocks: if block.type != "text": continue cites = getattr(block, "citations", None) or [] if not cites: html.append(escape(block.text)) continue marks = [] for c in cites: idx = len(notes) + 1 notes.append((idx, c.source, c.title)) marks.append(f'<sup><a href="#fn{idx}">[{idx}]</a></sup>') html.append(escape(block.text) + "".join(marks)) foot = "".join( f'<li id="fn{i}"><a href="{escape(src)}">{escape(t)}</a></li>' for i, src, t in notes ) return "".join(html) + f"<ol>{foot}</ol>"
Put the article's canonical URL in source and each footnote links straight to the right article. Since I split content per paragraph, adding an anchor to the URL to jump to the exact paragraph slotted in without trouble. From the reader's side, "where did this answer come from" is now traceable at a glance, and the internal tool's trustworthiness rose noticeably.
Returning search_result from a tool
If your search is split out as a separate tool (or an MCP server), you can return search_result blocks as the contents of a tool_result. The agent calls search itself and the results arrive with citations attached.
# pack search_result into the tool_use responsetool_result = { "type": "tool_result", "tool_use_id": tu.id, "content": to_search_results(my_search(tu.input["query"], top_k=5)),}messages.append({"role": "user", "content": [tool_result]})
In this shape, citations stay consistent even across multi-step loops of search, grounded answer, and deeper follow-up. I wired this into my fact-checking pipeline and run it on a schedule from Claude Code. The search itself stays sealed inside the tool, and only ID-bearing results reach the API, so the separation of concerns lands cleanly too.
Measured results in production
A rough before/after comparison from a few weeks of running it. These are measurements from an internal workload, so treat them as reference figures, but the trend held steady.
Metric
Hand-rolled (Before)
search_result (After)
Mechanical citation match rate
~70%
~100% (returned by ID)
Avg input tokens per request
baseline
~40% reduction
Avg latency to answer
~2.7s
~2.1s
Manual source verification
~30% of all answers
exceptions only (a few %)
Latency fell as a side effect of narrowing the passed text from full bodies to relevant paragraphs. Cost and latency dropped together while provenance became traceable — a rare, clean win.
Three gotchas I hit in production
The migration was not flawless. Here are the actual errors and mismatches, with fixes.
Forget citations.enabled and the citations come back empty. If you are passing blocks but the citations array is always empty, suspect this first. I made the result builder enforce the flag and throw immediately when it is missing.
Coarse fragments produce coarse citations. Put one long block in content and the citation points at "the whole paragraph," dropping footnote precision. Splitting per paragraph — and breaking long paragraphs closer to sentence level — kept the cited range usefully narrow.
A search miss cannot be rescued by citations. Obvious in hindsight: search_result shows grounding from the documents you passed. If the relevant article never made it into top_k, Claude honestly answers that the sources contain no basis. When citations come back empty repeatedly, suspect your own search recall before the API. In production I monitor it alongside search hit counts.
If you try this next
If you still have a self-hosted RAG concatenating context by hand, I'd recommend swapping in search_result somewhere low-stakes first — a read-only verification agent, say. Just having citations return by ID lets a whole chunk of post-processing code disappear, and you feel the verifiability of answers step up. For the fine details, Anthropic's Citations documentation is the accurate primary source.
Having my own tools correctly cite my own writing back to me is a small thing, but it's the kind of quiet reassurance that compounds when you keep running this for the long haul. Thanks for reading.
Share
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.