●BILLING - 1 day to the Jun 15 change: Agent SDK, headless runs, GitHub Actions, and third-party agents move to separate monthly credits ($20/$100/$200) metered at full API rates, no rollover
●FABLE5 - Claude Fable 5, a Mythos-class model billed as Anthropic's most capable generally available release, is usable in Claude Code v2.1.170+ (launched Jun 9)
●SUBAGENTS - Claude Code sub-agents can now spawn their own sub-agents, with smarter model and region handling
●ENTERPRISE - Custom roles gain admin permissions, letting members reach billing and privacy settings without Owner access
●PLUGINS - New plugin search plus better Chrome, VSCode, and terminal workflows; session, memory, and permission bugs fixed
●UI - New setting disables mouse-wheel scroll acceleration in fullscreen; the /model picker now shows model families correctly
Letting Claude Read Live Pages: Implementing web_fetch Without the Footguns
How to pull the actual text of official pages and PDFs straight into Claude's context with the web_fetch tool. Covers the URL-validation rule that trips everyone up, the settings that keep token costs down, and why fetch errors arrive as HTTP 200 — based on what I hit running it in production.
I asked a scheduled job to "summarize the latest changes in Claude Code in three lines," and what came back was a tidy summary of features from several months ago. The model answers from its training-time knowledge, so that was expected behavior. The real problem was that the stale summary nearly slipped into an article draft before anyone noticed.
Official changelog pages change almost weekly. If you let Claude read that page itself, the summary is always grounded in the current text. That is exactly what the web_fetch tool does. Unlike web search, it takes a URL you specify and pulls the full page (and PDFs) straight into context — a server-side tool that does the retrieval for you.
As an indie developer running several sites under Dolice Labs on autopilot, I added a "read the official source directly" step to the pipeline, and factual errors in the generated text dropped noticeably. Here is what actually tripped me up while wiring it in.
web_fetch and web_search play different roles
People often conflate web_fetch with web_search. web_search is for "run a query and gather candidates from across the web"; web_fetch is for "open one page you already know about and read all of it."
The rule of thumb is simple: if a URL is already in the context, fetch it; if you do not yet know the URL, search first. Claude chooses to fetch when:
A URL appears in a user message or a previous tool result
The user names a specific resource (an article, a README, a pricing page) and web_search is also enabled, so it can locate the page first
By contrast, an open-ended question like "what are best practices for REST API design?" will not trigger a fetch, because it does not point at a specific page. Knowing this boundary makes it much easier to debug the two classic mismatches: "I gave it a URL but it won't fetch" and "it keeps running a search instead."
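The rule of thumb above can be sketched as a small routing helper that runs before the API call. This is only an illustration: `pick_tools` and the loose URL pattern are hypothetical names of mine, not part of the SDK, and the tool type strings are the ones used elsewhere in this article.

```python
import re

# Loose match for http(s) URLs; an assumption for illustration, not a full URL parser.
URL_PATTERN = re.compile(r"https?://\S+")


def pick_tools(prompt: str) -> list[dict]:
    """Return fetch alone when a URL is present, otherwise search plus fetch."""
    fetch = {"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 3}
    if URL_PATTERN.search(prompt):
        # The URL is already in the conversation, so fetch alone is enough.
        return [fetch]
    # No URL yet: let the model search first, then fetch what the search surfaces.
    search = {"type": "web_search_20250305", "name": "web_search", "max_uses": 3}
    return [search, fetch]


print([t["name"] for t in pick_tools("Summarize https://example.com/changelog")])
print([t["name"] for t in pick_tools("Find the latest Claude Code release notes")])
```

Deciding this in your own code, rather than always enabling both tools, makes the "it keeps running a search instead" mismatch easier to rule out.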
Get it running with the smallest call first
The minimal setup in the Python SDK is surprisingly short. You add a single web_fetch entry to the tools array, and the model handles retrieval and reading on its side.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Summarize the key points of this page in three bullets: "
            "https://platform.claude.com/docs/en/agents-and-tools/tool-use/web-fetch-tool",
        }
    ],
    tools=[
        {
            "type": "web_fetch_20250910",
            "name": "web_fetch",
            "max_uses": 3,  # cap the number of fetches in this request
        }
    ],
)

# Print only the text blocks; tool-use and tool-result blocks are skipped.
for block in resp.content:
    if block.type == "text":
        print(block.text)
Because the feature is in beta, some environments require the beta header web-fetch-2025-09-10. With a recent SDK, passing the tool definition alone may just work, but if you get an error saying the tool type is not supported, the header is the first thing to check.
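If you do need the header, one way to keep it in a single place is to build the request arguments in a helper. This is a sketch under assumptions: `fetch_request_kwargs` is a hypothetical helper of mine, and the `betas` key assumes you send the request through the SDK's beta namespace.

```python
WEB_FETCH_BETA = "web-fetch-2025-09-10"  # beta header value named in the text above


def fetch_request_kwargs(url: str, use_beta: bool = True) -> dict:
    """Build keyword arguments for a messages call that enables web_fetch."""
    kwargs = {
        "model": "claude-opus-4-8",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": f"Summarize this page: {url}"}],
        "tools": [{"type": "web_fetch_20250910", "name": "web_fetch", "max_uses": 3}],
    }
    if use_beta:
        # Assumption: the call goes through the beta namespace, which accepts `betas`.
        kwargs["betas"] = [WEB_FETCH_BETA]
    return kwargs


kwargs = fetch_request_kwargs("https://example.com/docs")
print(kwargs["betas"])
```

With the Python SDK I would expect to pass this as `client.beta.messages.create(**kwargs)`, and to fall back to `client.messages.create(**fetch_request_kwargs(url, use_beta=False))` when the header is not needed. Check the official documentation for the current header value before relying on it.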
Keep in mind that web_fetch is a server tool. Unlike a client tool, where you receive a tool_use, run it yourself, and return the result, the retrieval happens entirely on Anthropic's side. You write no HTTP request. In exchange, the responsibility for which URLs get read stays firmly on your side, as we'll see next.
For PDFs, source.type becomes base64 and media_type becomes application/pdf, so you get base64 data instead of plain text. For summarization you can ignore the distinction, but if you want to store or parse the fetched body yourself, remember to branch between text and base64. retrieved_at matters more than it looks: web_fetch sits behind a cache, so a stale timestamp is your signal that "the latest" was actually a slightly older version.
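The text/base64 branch and the staleness check can be sketched as follows. The field names (`type`, `media_type`, `data`, `retrieved_at`) follow the description above; I am assuming the source has already been converted to a plain dict and that `retrieved_at` is an ISO 8601 timestamp with a timezone. `decode_source` and `is_stale` are hypothetical helper names.

```python
import base64
from datetime import datetime, timedelta, timezone


def decode_source(source: dict) -> tuple[str, bytes]:
    """Return (media_type, raw_bytes) for a fetched document source."""
    if source["type"] == "base64":
        # PDFs arrive base64-encoded; decode before storing or parsing.
        return source["media_type"], base64.b64decode(source["data"])
    # Text sources carry the body directly as a string.
    return source["media_type"], source["data"].encode("utf-8")


def is_stale(retrieved_at: str, max_age: timedelta, now: datetime) -> bool:
    """True when the retrieved_at timestamp is older than max_age."""
    # Normalise a trailing "Z" so older Python versions can parse it.
    fetched = datetime.fromisoformat(retrieved_at.replace("Z", "+00:00"))
    return now - fetched > max_age


now = datetime.now(timezone.utc)
print(is_stale("2025-01-01T00:00:00Z", timedelta(hours=6), now))
```

How old is too old depends on the page; the six-hour window here is just a placeholder.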
The biggest trap: the URL must already be in the conversation
This is the one detail that is easy to miss on a quick read of the docs. web_fetch cannot retrieve a URL that has never appeared in the conversation context.
Concretely, the URLs it can fetch are limited to:
URLs written in a user message
URLs in client-side tool results
URLs that came out of a previous web_search or web_fetch
In other words, a URL that Claude assembles by guessing from the prompt ("this is probably the link") gets rejected. This is by design, to prevent data exfiltration — it stops an attacker from getting a URL with secret data baked into the query string fetched.
My first version got stuck right here. I had Claude build a category-page URL from an article slug and tried to fetch it. The result was url_not_allowed. The fix is simple: always write the URL you want read explicitly into the user message yourself. If you need to start from a search, pair it with web_search and let it fetch a URL that appears in those results.
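A minimal sketch of that fix: build the URL in your own code and make sure it appears verbatim in the user message. `with_explicit_url`, the slug, and the example.com address are all hypothetical.

```python
def with_explicit_url(instruction: str, url: str) -> str:
    """Return a user message that is guaranteed to contain the URL verbatim."""
    if url in instruction:
        return instruction
    # Append the URL so it appears in a user turn, one of the allowed sources.
    return f"{instruction}\n\nSource URL: {url}"


slug = "web-fetch-tool"  # hypothetical article slug
url = f"https://example.com/docs/{slug}"  # assembled by my code, not guessed by the model
print(with_explicit_url("Summarize this page in three bullets.", url))
```

The point of the design is that the model never has to assemble the address itself, so URL validation has nothing to reject.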
# Reuses the `client` created in the first example.
resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "Find the latest Claude Code release notes and lay out only the "
            "single newest change, with citations.",
        }
    ],
    tools=[
        # Search supplies candidate URLs; fetch may then read any URL from those results.
        {"type": "web_search_20250305", "name": "web_search", "max_uses": 3},
        {
            "type": "web_fetch_20250910",
            "name": "web_fetch",
            "max_uses": 5,
            "citations": {"enabled": True},
        },
    ],
)
This "search, pick a candidate, fetch the full text, answer with citations" flow fits automation where you do not have the URL in hand. If you can name the page, fetch alone; if you do not yet know where it lives, combine with search.
Cap tokens and attack surface at the same time
Using web_fetch carries no extra charge by itself. The cost is purely the tokens of the fetched body once it lands in context. The flip side is that reading a large page unguarded eats tokens fast. As a rough guide, a 100 kB documentation page reaches about 25,000 tokens.
This is where max_content_tokens earns its keep. If the fetched body exceeds the limit, it gets truncated, so for summarization I set it around 8,000. For a 25,000-token page that compresses the intake to roughly 32%, and when you run dozens of monthly scheduled jobs, that one line visibly changes the bill.
Two more parameters worth pairing with it are allowed_domains and max_uses. The former pins retrieval to domains you trust; the latter caps how many fetches a single request can make.
One caveat: you cannot set allowed_domains and blocked_domains in the same request — it is one or the other. For automation with a known set of trusted sources, I recommend the allow-list approach. Starting from an explicit allow list feels far less accident-prone than trying to plug holes after the fact with blocked_domains.
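Putting the three settings together, a tool definition might be assembled like this. `build_fetch_tool` and `intake_ratio` are hypothetical helpers; the parameter names are the ones discussed above, and the local check for the allow/block conflict is my own guard, not something the SDK does for you.

```python
def build_fetch_tool(max_content_tokens=8000, max_uses=3,
                     allowed_domains=None, blocked_domains=None) -> dict:
    """Build a web_fetch tool definition with cost and domain limits."""
    if allowed_domains and blocked_domains:
        # The two lists cannot be combined in one request, so fail early.
        raise ValueError("set allowed_domains or blocked_domains, not both")
    tool = {
        "type": "web_fetch_20250910",
        "name": "web_fetch",
        "max_uses": max_uses,
        "max_content_tokens": max_content_tokens,
    }
    if allowed_domains:
        tool["allowed_domains"] = list(allowed_domains)
    if blocked_domains:
        tool["blocked_domains"] = list(blocked_domains)
    return tool


def intake_ratio(page_tokens: int, cap: int) -> float:
    """Fraction of the page that still enters context after truncation."""
    return min(cap, page_tokens) / page_tokens


print(build_fetch_tool(allowed_domains=["platform.claude.com"]))
print(f"{intake_ratio(25_000, 8_000):.0%}")  # the 25,000-token page from the text
```

The resulting dict goes into the `tools` list of the request in place of the bare definition used earlier.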
Errors arrive as HTTP 200
Another trap is how errors come back. A failed fetch is not an HTTP error status; it arrives inside a 200 response body as a web_fetch_tool_error. If you only wrote an exception handler, you will never notice the failure.
def extract_fetch_errors(resp):
    """Collect web_fetch error codes from a response that came back as HTTP 200."""
    errors = []
    for block in resp.content:
        if getattr(block, "type", None) != "web_fetch_tool_result":
            continue
        inner = block.content
        if getattr(inner, "type", None) == "web_fetch_tool_error":
            errors.append(inner.error_code)
    return errors


codes = extract_fetch_errors(resp)
if codes:
    # In automation, log it here and optionally retry against another domain
    print("web_fetch failed:", codes)
The possible error_code values are worth knowing: url_too_long (URL over 250 characters), url_not_allowed (blocked by domain rules or URL validation), url_not_accessible (the target returned an HTTP error), too_many_requests (rate limited), unsupported_content_type (anything other than text and PDF), max_uses_exceeded, and the internal unavailable.
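In automation it helps to decide up front what each code means for the job. The grouping below is my own judgment, not something the documentation prescribes, and `next_action` is a hypothetical helper.

```python
# Assumed grouping of the error codes listed above; adjust to your own pipeline.
RETRY_LATER = {"too_many_requests", "unavailable"}
FIX_REQUEST = {"url_too_long", "url_not_allowed", "max_uses_exceeded"}
CHANGE_SOURCE = {"url_not_accessible", "unsupported_content_type"}


def next_action(error_code: str) -> str:
    """Map a web_fetch error code to a coarse follow-up action."""
    if error_code in RETRY_LATER:
        return "retry_later"
    if error_code in FIX_REQUEST:
        return "fix_request"
    if error_code in CHANGE_SOURCE:
        return "try_another_source"
    # Unknown codes are logged rather than retried blindly.
    return "log_and_skip"


for code in ("too_many_requests", "url_not_allowed", "unsupported_content_type"):
    print(code, "->", next_action(code))
```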
In practice the ones I hit most were unsupported_content_type and "the body came back nearly empty." web_fetch does not support pages rendered dynamically with JavaScript. Point it at a client-rendered SPA and you get the skeleton with thin content. If you already know the target is a dynamic page, it is more realistic to choose a different retrieval path than to fight web_fetch.
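The "nearly empty body" case produces no error code, so it needs its own check. A crude heuristic, with a threshold that is purely an assumption of mine, is to count words in the fetched text:

```python
def looks_thin(body: str, min_words: int = 150) -> bool:
    """Heuristic: flag a fetched body that is too short to be a real article."""
    # A client-rendered page often yields only a title and a loading placeholder.
    return len(body.split()) < min_words


print(looks_thin("Loading..."))
print(looks_thin("word " * 200))
```

When this fires for a URL you expected to be substantial, treat it as a hint that the page is rendered client-side and route it to a different retrieval path.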
For large documents, reach for dynamic filtering
When you need only a specific section out of a long PDF or a huge document, the newer tool version web_fetch_20260209 helps. Here Claude writes code to filter the fetched body before it loads into context, keeping only what you need and discarding the rest, which holds token use down while preserving accuracy.
# Reuses the `client` created in the first example.
resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[
        {
            "role": "user",
            "content": "From this long PDF, extract only the passages about pricing: "
            "https://example.com/whitepaper.pdf",
        }
    ],
    tools=[
        {"type": "web_fetch_20260209", "name": "web_fetch"},
        # Dynamic filtering needs the code execution tool enabled alongside it.
        {"type": "code_execution_20250825", "name": "code_execution"},
    ],
)
Dynamic filtering requires the code execution tool to be enabled, and note that supported models are limited (newer generations such as Fable 5 and Opus 4.8). Conversely, if the page you fetch is small to begin with, the benefit is thin and the setup more complex. My own split is: a page of a few kB goes through the classic web_fetch_20250910, while a research-paper PDF in the hundreds-of-thousands-of-tokens range goes through web_fetch_20260209. Choosing by how much content you pull in, rather than by how new the tool version is, makes both the results and the cost easier to predict.
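That split can be written down as a size-based switch. The sketch assumes you already know the approximate size of the target (for example from an earlier run), uses the rough 100 kB to 25,000 tokens guide from earlier in the article, and picks a 50,000-token threshold that is entirely hypothetical.

```python
BYTES_PER_TOKEN = 4  # from the rough guide: 100 kB is about 25,000 tokens
FILTER_THRESHOLD_TOKENS = 50_000  # hypothetical cut-over point; tune for your workload


def estimate_tokens(content_bytes: int) -> int:
    """Rough token estimate from a byte count."""
    return content_bytes // BYTES_PER_TOKEN


def tools_for_size(content_bytes: int) -> list[dict]:
    """Pick the fetch tool version by how much content is expected."""
    if estimate_tokens(content_bytes) >= FILTER_THRESHOLD_TOKENS:
        # Large document: dynamic filtering, which needs code execution enabled.
        return [
            {"type": "web_fetch_20260209", "name": "web_fetch"},
            {"type": "code_execution_20250825", "name": "code_execution"},
        ]
    # Small page: the classic version with a plain truncation cap.
    return [{"type": "web_fetch_20250910", "name": "web_fetch", "max_content_tokens": 8000}]


print(estimate_tokens(100_000))
print([t["type"] for t in tools_for_size(2_000_000)])
```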
One next step
Pin a single trusted documentation page with allowed_domains, cap it with max_content_tokens set to 8,000, and have that minimal setup summarize it. Log both retrieved_at and any error codes there, and the production behavior becomes far easier to read once you ship it. Details of the spec do shift, so confirm the tool version and beta header in the official Anthropic documentation each time.
Reading primary sources directly is an unglamorous step, but it moved the needle on factual accuracy more than almost anything else. I hope it helps if you're wrestling with the same accuracy problem in your own generation.