Putting a Ceiling on the pause_turn Loop: Running Long Server Tools Safely Unattended
A production design for continuing pause_turn safely in unattended runs that involve long server tools such as web_search and code execution. Covers branching pause_turn, tool_use, end_turn, and max_tokens in one loop, capping continuations and wall-clock time, and accumulating usage across paused segments.
As an indie developer, I batch-generate articles for several sites overnight, and one morning a few of them were missing the fresh information they were supposed to have searched for. Nothing in the error log. When I opened a saved response, stop_reason was pause_turn — and my generation loop had happily stopped there. web_search hadn't finished in a single round trip; it had returned a "pause," and my loop read that pause as completion.
You don't see pause_turn often, because short prompts never produce it. But the moment you involve a long server tool — web_search, web_fetch, or code execution — it can show up. And it arrives as a normal, successful response, not an exception, so as long as you swallow it you'll never notice. In unattended runs, that's exactly where silent truncation hides.
pause_turn Is a Third State, Neither Error Nor Done
If you treat every stop_reason as "the reason the response ended," pause_turn will trip you up every time. The starting point is to split the values into "finished" and "still going."
| stop_reason | State | What you must do next |
|---|---|---|
| end_turn | Done normally | Nothing. The output is final |
| max_tokens | Cut off mid-output | Decide: continue, or record as incomplete |
| tool_use | Continue (client side) | Append a tool_result and re-request |
| pause_turn | Continue (server side) | Append the response as-is and re-request |
| refusal | Safety refusal | Don't retry; handle it by design |
tool_use and pause_turn look similar, but what you append differs. With tool_use you run the tool yourself and add a new user message containing the tool_result. With pause_turn, the partial output from the server-side tool is already inside the assistant response, so you append the response blocks as-is, with no extra input, and the turn keeps going. The basic branching itself is laid out in implementation patterns for not dropping stop_reason, so if you aren't even checking max_tokens yet, start there. This piece focuses on what comes after: how to design for pause_turn when you're running long tools unattended.
Reproduce It First — Which Tools Produce pause_turn
Before defending against it blindly, it helps to make your own code emit a pause_turn once. Enable a server-side tool and ask something that likely needs several searches.
```python
import anthropic

client = anthropic.Anthropic(api_key="YOUR_API_KEY")

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    tools=[{"type": "web_search_20260318", "name": "web_search"}],
    messages=[{
        "role": "user",
        "content": "Summarize three June 2026 Claude Developer Platform updates with sources",
    }],
)

print(resp.stop_reason)  # may be pause_turn
print([b.type for b in resp.content])
# e.g. ['text', 'server_tool_use', 'web_search_tool_result', 'text']
```
The thing to internalize: even on pause_turn, content already holds partial blocks — text, server_tool_use, web_search_tool_result. It is not an empty response. That's precisely why swallowing it leaves you with a half-finished body presented as final. My own first mistake was exactly this: mistaking the in-progress text for the finished product.
In unattended runs, instead of special-casing pause_turn, it's sturdier to handle the three "continue" values (tool_use, pause_turn, max_tokens) consistently in one loop. The skeleton looks like this.
```python
def run_turn(client, model, messages, tools, *, max_tokens=4096):
    while True:
        resp = client.messages.create(
            model=model,
            max_tokens=max_tokens,
            tools=tools,
            messages=messages,
        )
        reason = resp.stop_reason

        if reason == "pause_turn":
            # Server-side tool mid-flight. Append the response and continue.
            messages.append({"role": "assistant", "content": resp.content})
            continue

        if reason == "tool_use":
            # Client-side tool. Append results as a tool_result message.
            messages.append({"role": "assistant", "content": resp.content})
            tool_results = execute_client_tools(resp.content)  # your impl
            messages.append({"role": "user", "content": tool_results})
            continue

        if reason == "max_tokens":
            # Continue, or record as incomplete. Use-case dependent.
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({"role": "user", "content": "Continue the previous output"})
            continue

        # end_turn / refusal: leave the loop and return the final response
        return resp
```
The key is that pause_turn and tool_use append in different shapes. pause_turn appends only the assistant response; tool_use adds a user-role tool_result on top of that. Get this backwards and one path double-counts tool results while the other calls the tool forever.
The Easiest Thing to Break Is How You Re-append
The most common failure when continuing a pause_turn is transforming content before appending it. If you try to be clever and "save only the text blocks," you break the server_tool_use / web_search_tool_result pairing, and the next request returns a 400. The rule is: during continuation, don't touch resp.content at all — append the whole thing. Do your formatting and extraction once you have the final response (end_turn).
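To make the "extract only at the end" rule concrete, here is a minimal sketch. The helper name extract_final_text is hypothetical, and it assumes response objects expose stop_reason and a content list whose text blocks carry type == "text" and a text attribute, as in the examples above. It only reads the blocks, so the same content can still be appended unchanged if the turn has to continue.

```python
from types import SimpleNamespace


def extract_final_text(resp):
    """Return the joined text blocks, or None if the turn is not finished.

    Hypothetical helper: it never mutates resp.content, so the response can
    still be appended to messages as-is when stop_reason is pause_turn.
    """
    if resp.stop_reason != "end_turn":
        return None  # in progress, cut off, or refused: do not format yet
    return "".join(b.text for b in resp.content if b.type == "text")


# Stand-in for a paused response, built by hand for illustration only.
paused = SimpleNamespace(
    stop_reason="pause_turn",
    content=[SimpleNamespace(type="text", text="partial draft")],
)
print(extract_final_text(paused))  # None
```

Returning None rather than the partial text is a deliberate choice here: it makes it hard to save an in-progress body as if it were the finished article.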
Extended thinking adds one more wrinkle. thinking blocks carry a signature, and that too must be re-appended unmodified. Drop the signature or edit the contents and the reasoning continuity breaks, so the continuation is rejected. If you decide "while in pause_turn, only read and append — never edit," you prevent this whole class of bug at once. For the server tools themselves, I lean on the structure in handling live context with web_fetch.
Unattended Runs Need a Cap and a Time Budget
This is where code for interactive use and code for unattended batches diverge. while True: is fine when a human is watching, but in an overnight batch you won't notice when a continuation runs longer than expected. A server tool can search or execute many times within a single "turn," so for complex prompts pause_turn may repeat over and over. Without a cap, a turn that won't stop quietly piles up cost.
I always add a two-stage guard on continuation count and elapsed time.
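As a rough sketch of what such a guard could look like (not necessarily the exact code I run): max_continuations and the aborted status are the names discussed in this section, while run_turn_guarded, max_seconds, the "ok" status, the reason field, and the injected create and clock callables are assumptions made so the example runs without a network call. For brevity it only continues pause_turn; any other stop_reason goes back to the caller to branch on.

```python
import time


def run_turn_guarded(create, messages, *, max_continuations=12,
                     max_seconds=300.0, clock=time.monotonic):
    """Continue pause_turn until it ends, a cap is hit, or time runs out.

    `create` is any callable that takes the message list and returns a
    response with .stop_reason and .content, for example a lambda wrapping
    client.messages.create(...).
    """
    started = clock()
    continuations = 0
    while True:
        resp = create(messages)
        if resp.stop_reason != "pause_turn":
            # Not a server-side pause: hand back to the caller's branching.
            return {"status": "ok", "response": resp,
                    "continuations": continuations}
        # Guard 1: continuation count.
        if continuations >= max_continuations:
            return {"status": "aborted", "reason": "max_continuations",
                    "response": resp, "continuations": continuations}
        # Guard 2: elapsed wall-clock time since the turn started.
        if clock() - started >= max_seconds:
            return {"status": "aborted", "reason": "time_budget",
                    "response": resp, "continuations": continuations}
        # Append the paused response unmodified and re-request.
        messages.append({"role": "assistant", "content": resp.content})
        continuations += 1
```

In real use, create would be something like `lambda msgs: client.messages.create(model=model, max_tokens=4096, tools=tools, messages=msgs)`. Note that the time check only runs between requests, so a per-request timeout on the client is still worth setting separately.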
The numbers depend on your workload. In my article generation, two or three searches resolve in at most five or six continuations, so max_continuations=12 sits there as a safety valve that catches "clearly something is wrong." I return status: aborted instead of raising, because in unattended runs I'd rather keep going when one item is incomplete. Not letting a single runaway turn take down the whole overnight batch is the priority.
How to Count Usage Across Segments
Continuing a pause_turn runs messages.create multiple times, so cost is the sum across segments. Read only the final response's usage and you miss every intermediate search and generation. In unattended runs, getting this wrong means your estimate and your bill slowly drift apart.
The reliable approach is to add up usage on each iteration of the loop.
```python
def accumulate_usage(total, usage):
    total["input_tokens"] += usage.input_tokens
    total["output_tokens"] += usage.output_tokens
    # Server tool invocations are billed separately, so record them too
    su = getattr(usage, "server_tool_use", None)
    if su and getattr(su, "web_search_requests", None):
        total["web_search_requests"] += su.web_search_requests
    return total
```
And whether or not you aborted, always write one line per turn capturing how it ended. Mistaking pause_turn for end_turn is the kind of bug you catch the next morning as long as you're observing it at all. My log carries, at minimum, the final stop_reason, the continuation count, summed input/output, and search count. Just being able to pick out articles where the continuation count spiked overnight lets me stop a quality slip early. For separating out windows where 529 overload creeps in, it helps to read this alongside resilience patterns for production apps under 529 overload.
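A one-line-per-turn record along those lines might be built like this. It is a sketch: the function name turn_log_line, the item_id and status fields, and the JSON format are assumptions. The remaining fields are the minimum set listed above, and totals is assumed to be a dict with the same keys that accumulate_usage fills in.

```python
import json


def turn_log_line(item_id, stop_reason, continuations, totals, status="ok"):
    """Build one JSON log line describing how a turn ended."""
    record = {
        "item_id": item_id,                # hypothetical: which article this was
        "status": status,                  # e.g. "ok" or "aborted"
        "stop_reason": stop_reason,        # final stop_reason of the turn
        "continuations": continuations,    # how many times the turn was continued
        "input_tokens": totals["input_tokens"],
        "output_tokens": totals["output_tokens"],
        "web_search_requests": totals["web_search_requests"],
    }
    return json.dumps(record, sort_keys=True)


totals = {"input_tokens": 5200, "output_tokens": 1800, "web_search_requests": 3}
print(turn_log_line("article-017", "end_turn", 4, totals))
```

With one JSON object per line, picking out the turns where the continuation count spiked is a simple filter on the continuations field the next morning.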
In Streaming, pause_turn Arrives Last
When you stream with a "thinking" indicator in the UI, stop_reason arrives in the final message_delta of the stream — and pause_turn is no exception. If your display state treats "done = end_turn only," you get the failure where pause_turn is never continued and the screen looks frozen.
```python
with client.messages.stream(
    model=model, max_tokens=4096, tools=tools, messages=messages
) as stream:
    for event in stream:
        if event.type == "text":
            render(event.text)  # stream to the UI
    final = stream.get_final_message()

# Only show "finished" on end_turn
if final.stop_reason == "pause_turn":
    messages.append({"role": "assistant", "content": final.content})
    # Re-open the stream to continue (share the same caps as above)
```
Streaming or not, keep the continuation decision in one place. Write display and continuation separately and you'll inevitably create an asymmetry where only one of them drops pause_turn. For the finer event ordering, I keep details in diagnosing why a stream cuts off mid-response.
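One way to keep that decision in a single place is a small lookup that both the streaming and non-streaming paths call. This is a sketch under assumptions: the action names and next_action are hypothetical, and raising on an unknown value is my choice for illustration so that a new stop_reason is never silently treated as finished.

```python
CONTINUE_SERVER = "continue_server"  # pause_turn: append response, re-request
CONTINUE_CLIENT = "continue_client"  # tool_use: append response plus tool_result
INCOMPLETE = "incomplete"            # max_tokens: continue or record as cut off
FINISHED = "finished"                # end_turn: safe to show "done" in the UI
REFUSED = "refused"                  # refusal: do not retry

_ACTIONS = {
    "pause_turn": CONTINUE_SERVER,
    "tool_use": CONTINUE_CLIENT,
    "max_tokens": INCOMPLETE,
    "end_turn": FINISHED,
    "refusal": REFUSED,
}


def next_action(stop_reason):
    """Map a stop_reason to the single action every code path should take."""
    try:
        return _ACTIONS[stop_reason]
    except KeyError:
        # Unknown values are surfaced instead of being treated as done.
        raise ValueError(f"unhandled stop_reason: {stop_reason!r}")


print(next_action("pause_turn"))  # continue_server
```

The UI then shows "finished" only when next_action returns FINISHED, and the continuation code re-requests only on the two continue actions, so neither side can drop pause_turn on its own.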
If You Add One Thing Today, Count pause_turn
Trying to land the continuation loop, the guards, and usage accumulation all at once is heavy. The first step I'd suggest is adding one line to your current code: log the final stop_reason. That alone makes it visible, in tomorrow's check, which turns in your unattended batch are quietly stopping on pause_turn. Once the reality is visible, you can add the loop and the caps in order — there's no rush. For what it's worth, I added the guards two days after I added the counting. Visibility first, then design: the long way around turns out to be the shortcut.
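That first step can be as small as the sketch below. The logger name, the module-level counter, and the function name record_stop_reason are all assumptions for illustration; the only thing it relies on is the final stop_reason string of each turn.

```python
import logging
from collections import Counter

logger = logging.getLogger("batch")  # hypothetical logger name
stop_reason_counts = Counter()


def record_stop_reason(stop_reason):
    """Log the final stop_reason of one turn and keep a running tally."""
    logger.info("final stop_reason=%s", stop_reason)
    stop_reason_counts[stop_reason] += 1
    return stop_reason_counts


# Example: three turns from an overnight batch, one of which stopped on a pause.
for reason in ["end_turn", "pause_turn", "end_turn"]:
    record_stop_reason(reason)
print(dict(stop_reason_counts))  # {'end_turn': 2, 'pause_turn': 1}
```

Any nonzero pause_turn count in the morning summary is a turn that stopped before the server tool finished.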