API & SDK · 2026-06-22 · Advanced

Putting a Ceiling on the pause_turn Loop: Running Long Server Tools Safely Unattended

A production design for continuing pause_turn safely in unattended runs, where long server tools like web_search and code execution are involved. Covers branching all four stop_reason values in one loop, capping continuations and wall-clock time, and accumulating usage across paused segments.

As an indie developer, I batch-generate articles for several sites overnight, and one morning a few of them were missing the fresh information they were supposed to have searched for. Nothing in the error log. When I opened a saved response, stop_reason was pause_turn — and my generation loop had happily stopped there. web_search hadn't finished in a single round trip; it had returned a "pause," and my loop read that pause as completion.

You rarely see pause_turn, because short prompts almost never produce it. But once a long server tool is involved (web_search, web_fetch, or code execution), it can show up. It arrives as a normal, successful response rather than an exception, so if your loop swallows it you will never notice. In unattended runs, that is exactly where silent truncation hides.

pause_turn Is a Third State, Neither Error Nor Done

If you treat every stop_reason as "the reason the response ended," pause_turn will trip you up every time. The starting point is to split the values into "finished" and "still going."

stop_reason	State	What you must do next
end_turn	Done normally	Nothing. The output is final
max_tokens	Cut off mid-output	Decide: continue, or record as incomplete
tool_use	Continue (client side)	Append a tool_result and re-request
pause_turn	Continue (server side)	Append the response as-is and re-request
refusal	Safety refusal	Don't retry; handle it by design

tool_use and pause_turn look similar, but what you append differs. With tool_use you run the tool yourself and add a new user message containing the tool_result. With pause_turn, the partial output from the server-side tool is already inside the assistant response, so you append the response blocks as-is, with no extra input, and the turn keeps going. The basic branching itself is laid out in implementation patterns for not dropping stop_reason, so if you aren't even checking max_tokens yet, start there. This piece focuses on what comes after: how to design for pause_turn when you're running long tools unattended.
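The finished / still-going split from the table can be sketched as a small lookup. This is a minimal illustration, not part of the SDK: the state labels and the helper names `classify_stop_reason` and `is_final` are hypothetical. The one design choice worth noting is that an unrecognized stop_reason is reported as "unknown" rather than assumed to be done.

```python
# Hypothetical helper names and state labels; only the stop_reason
# strings themselves come from the table above.
STOP_REASON_STATE = {
    "end_turn": "done",
    "max_tokens": "cut_off",
    "tool_use": "continue_client",
    "pause_turn": "continue_server",
    "refusal": "refused",
}


def classify_stop_reason(stop_reason):
    """Map a stop_reason to a coarse state; unknown values are flagged."""
    return STOP_REASON_STATE.get(stop_reason, "unknown")


def is_final(stop_reason):
    """True only for states where no further request is expected."""
    return classify_stop_reason(stop_reason) in ("done", "refused")
```

Anything for which `is_final` returns False should either be continued or recorded as incomplete, never stored as a finished output.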

Reproduce It First — Which Tools Produce pause_turn

Before defending against it blindly, it helps to make your own code emit a pause_turn once. Enable a server-side tool and ask something that likely needs several searches.

import anthropic
 
client = anthropic.Anthropic(api_key="YOUR_API_KEY")
 
resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    tools=[{"type": "web_search_20260318", "name": "web_search"}],
    messages=[{
        "role": "user",
        "content": "Summarize three June 2026 Claude Developer Platform updates with sources",
    }],
)
print(resp.stop_reason)        # may be pause_turn
print([b.type for b in resp.content])
# e.g. ['text', 'server_tool_use', 'web_search_tool_result', 'text']

The thing to internalize: even on pause_turn, content already holds partial blocks — text, server_tool_use, web_search_tool_result. It is not an empty response. That's precisely why swallowing it leaves you with a half-finished body presented as final. My own first mistake was exactly this: mistaking the in-progress text for the finished product.

Branch All Four stop_reason Values in One Loop

In unattended runs, instead of special-casing pause_turn, it's sturdier to handle the three "continue" values (tool_use, pause_turn, max_tokens) consistently in one loop. The skeleton looks like this.

def run_turn(client, model, messages, tools, *, max_tokens=4096):
    while True:
        resp = client.messages.create(
            model=model, max_tokens=max_tokens, tools=tools, messages=messages,
        )
        reason = resp.stop_reason
 
        if reason == "pause_turn":
            # Server-side tool mid-flight. Append the response and continue.
            messages.append({"role": "assistant", "content": resp.content})
            continue
 
        if reason == "tool_use":
            # Client-side tool. Append results as a tool_result message.
            messages.append({"role": "assistant", "content": resp.content})
            tool_results = execute_client_tools(resp.content)  # your impl
            messages.append({"role": "user", "content": tool_results})
            continue
 
        if reason == "max_tokens":
            # Continue, or record as incomplete. Use-case dependent.
            messages.append({"role": "assistant", "content": resp.content})
            messages.append({"role": "user", "content": "Continue the previous output"})
            continue
 
        # end_turn / refusal break the loop
        return resp

The key is that pause_turn and tool_use append in different shapes. pause_turn appends only the assistant response; tool_use adds a user-role tool_result on top of that. Get this backwards and both paths break: a tool_result appended after pause_turn has no client-side tool_use to answer, and a tool_use left without its tool_result leaves the conversation in a state the API will typically reject.

The Easiest Thing to Break Is How You Re-append

The most common failure when continuing a pause_turn is transforming content before appending it. If you try to be clever and "save only the text blocks," you break the server_tool_use / web_search_tool_result pairing, and the next request returns a 400. The rule is: during continuation, don't touch resp.content at all — append the whole thing. Do your formatting and extraction once you have the final response (end_turn).

Extended thinking adds one more wrinkle. thinking blocks carry a signature, and that too must be re-appended unmodified. Drop the signature or edit the contents and the reasoning continuity breaks, so the continuation is rejected. If you decide "while in pause_turn, only read and append — never edit," you prevent this whole class of bug at once. For the server tools themselves, I lean on the structure in handling live context with web_fetch.
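One way to enforce "only read and append during continuation" is to put extraction behind a gate that refuses anything that did not end with end_turn. The sketch below is an assumption-laden illustration: `extract_final_text` is a hypothetical helper, and the stub objects only mimic the `stop_reason`, `content`, `type`, and `text` attributes used in the examples above.

```python
from types import SimpleNamespace


def extract_final_text(resp):
    """Join the text blocks of a response, but only if it ended with end_turn.

    Returns None for any other stop_reason so that in-progress text
    cannot be mistaken for the finished product.
    """
    if resp.stop_reason != "end_turn":
        return None
    return "".join(block.text for block in resp.content if block.type == "text")


# Stub responses standing in for real API objects (illustration only).
paused = SimpleNamespace(
    stop_reason="pause_turn",
    content=[SimpleNamespace(type="text", text="Searching...")],
)
finished = SimpleNamespace(
    stop_reason="end_turn",
    content=[
        SimpleNamespace(type="text", text="Three updates: "),
        SimpleNamespace(type="server_tool_use"),
        SimpleNamespace(type="text", text="A, B, C."),
    ],
)

paused_text = extract_final_text(paused)      # None: not final yet
final_text = extract_final_text(finished)     # text blocks joined in order
```

The function never modifies `resp.content`, so the same object can still be appended to `messages` untouched if the turn needs to continue.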

Unattended Runs Need a Cap and a Time Budget

This is where code for interactive use and code for unattended batches diverge. while True: is fine when a human is watching, but in an overnight batch you won't notice when a continuation runs longer than expected. A server tool can search or execute many times within a single "turn," so for complex prompts pause_turn may repeat over and over. Without a cap, a turn that won't stop quietly piles up cost.

I always add a two-stage guard on continuation count and elapsed time.

import time
 
def run_turn_guarded(client, model, messages, tools, *,
                     max_tokens=4096, max_continuations=12, wall_budget_s=180):
    start = time.monotonic()
    continuations = 0
 
    while True:
        resp = client.messages.create(
            model=model, max_tokens=max_tokens, tools=tools, messages=messages,
        )
        reason = resp.stop_reason
 
        if reason in ("pause_turn", "tool_use", "max_tokens"):
            continuations += 1
            elapsed = time.monotonic() - start
            if continuations > max_continuations or elapsed > wall_budget_s:
                # Likely runaway. Abort and surface as "incomplete".
                return {"status": "aborted", "reason": reason,
                        "continuations": continuations, "elapsed": elapsed,
                        "partial": resp}
 
            messages.append({"role": "assistant", "content": resp.content})
            if reason == "tool_use":
                messages.append({"role": "user",
                                 "content": execute_client_tools(resp.content)})
            elif reason == "max_tokens":
                messages.append({"role": "user", "content": "Continue the previous output"})
            continue
 
        return {"status": "ok", "reason": reason, "result": resp}

The numbers depend on your workload. In my article generation, two or three searches resolve in at most five or six continuations, so max_continuations=12 sits there as a safety valve that catches "clearly something is wrong." I return status: aborted instead of raising, because in unattended runs I'd rather keep going when one item is incomplete. Not letting a single runaway turn take down the whole overnight batch is the priority.
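The "keep going when one item is incomplete" policy can be sketched as a small batch driver. It assumes each item is processed by a callable that returns a dict with a "status" key, as run_turn_guarded does. `run_batch` and `run_one` are hypothetical names for illustration.

```python
def run_batch(items, run_one):
    """Process every item; set aside aborted ones instead of stopping the batch.

    run_one(item) is assumed to return a dict containing a "status" key
    whose value is "ok" or "aborted".
    """
    done, incomplete = [], []
    for item in items:
        outcome = run_one(item)
        if outcome["status"] == "ok":
            done.append((item, outcome))
        else:
            # Keep the outcome so the morning check can see why it stopped.
            incomplete.append((item, outcome))
    return done, incomplete


# Usage with a fake runner: the second item simulates a runaway turn.
def fake_run_one(item):
    if item == "site-b":
        return {"status": "aborted", "reason": "pause_turn", "continuations": 13}
    return {"status": "ok", "reason": "end_turn"}


done, incomplete = run_batch(["site-a", "site-b", "site-c"], fake_run_one)
```

Here the aborted item ends up in `incomplete` with its reason and continuation count, while the other two items still complete.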

How to Count Usage Across Segments

Continuing a pause_turn runs messages.create multiple times, so cost is the sum across segments. Read only the final response's usage and you miss every intermediate search and generation. In unattended runs, getting this wrong means your estimate and your bill slowly drift apart.

The reliable approach is to add up usage on each iteration of the loop.

def accumulate_usage(total, usage):
    # total must start as:
    # {"input_tokens": 0, "output_tokens": 0, "web_search_requests": 0}
    total["input_tokens"] += usage.input_tokens
    total["output_tokens"] += usage.output_tokens
    # Server tool invocations are billed separately, so record them too
    su = getattr(usage, "server_tool_use", None)
    if su and getattr(su, "web_search_requests", None):
        total["web_search_requests"] += su.web_search_requests
    return total

And whether or not you aborted, always write one line per turn capturing how it ended. Mistaking pause_turn for end_turn is the kind of bug you catch the next morning as long as you're observing it at all. My log carries, at minimum, the final stop_reason, the continuation count, summed input/output, and search count. Just being able to pick out articles where the continuation count spiked overnight lets me stop a quality slip early. For separating out windows where 529 overload creeps in, it helps to read this alongside resilience patterns for production apps under 529 overload.
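The one-line-per-turn log could look something like the sketch below. The field names and the helpers `new_usage_total` and `turn_log_line` are hypothetical; the only assumption carried over from the text is that the line records the final stop_reason, the continuation count, summed input/output tokens, and the search count.

```python
import json


def new_usage_total():
    """Zeroed totals to sum usage into across the segments of one turn."""
    return {"input_tokens": 0, "output_tokens": 0, "web_search_requests": 0}


def turn_log_line(final_stop_reason, continuations, total):
    """Serialize one turn's outcome as a single JSON line."""
    record = {
        "stop_reason": final_stop_reason,
        "continuations": continuations,
        # Anything other than end_turn is flagged so it stands out in a grep.
        "complete": final_stop_reason == "end_turn",
        **total,
    }
    return json.dumps(record, sort_keys=True)


# Example: a turn that took three segments and two searches in total.
total = new_usage_total()
total["input_tokens"] += 1200 + 2100 + 2600
total["output_tokens"] += 300 + 250 + 900
total["web_search_requests"] += 2
line = turn_log_line("end_turn", 2, total)
```

Because every line is valid JSON with the same keys, filtering for `"complete": false` or sorting by `continuations` takes one pass over the log.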

In Streaming, pause_turn Arrives Last

When you stream with a "thinking" indicator in the UI, stop_reason arrives in the final message_delta of the stream — and pause_turn is no exception. If your display state treats "done = end_turn only," you get the failure where pause_turn is never continued and the screen looks frozen.

with client.messages.stream(model=model, max_tokens=4096,
                            tools=tools, messages=messages) as stream:
    for event in stream:
        if event.type == "text":
            render(event.text)        # stream to the UI
    final = stream.get_final_message()
 
# Only show "finished" on end_turn
if final.stop_reason == "pause_turn":
    messages.append({"role": "assistant", "content": final.content})
    # Re-open the stream to continue (share the same caps as above)

Streaming or not, keep the continuation decision in one place. Write display and continuation separately and you'll inevitably create an asymmetry where only one of them drops pause_turn. For the finer event ordering, I keep details in diagnosing why a stream cuts off mid-response.
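Keeping the decision in one place might look like the following sketch: a single function that both the streaming and non-streaming paths call after each response. The name `next_action` and its return labels are hypothetical. The defaults mirror the max_continuations=12 and wall_budget_s=180 values used earlier, and `now` is injectable only to make the function easy to test.

```python
import time

CONTINUE_REASONS = ("pause_turn", "tool_use", "max_tokens")


def next_action(stop_reason, continuations, started_at, *,
                max_continuations=12, wall_budget_s=180, now=None):
    """Decide what the caller should do after a response.

    continuations is the number of continuations already performed.
    Returns "continue", "abort", or "stop".
    """
    if stop_reason not in CONTINUE_REASONS:
        return "stop"  # end_turn, refusal, or anything unexpected
    current = time.monotonic() if now is None else now
    if continuations >= max_continuations:
        return "abort"  # the next continuation would exceed the cap
    if current - started_at > wall_budget_s:
        return "abort"  # wall-clock budget exhausted
    return "continue"
```

The UI can then show "finished" only when the action is "stop" and the stop_reason is end_turn, while the batch loop treats "abort" as an incomplete item.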

If You Add One Thing Today, Count pause_turn

Trying to land the continuation loop, the guards, and usage accumulation all at once is heavy. The first step I'd suggest is adding one line to your current code: log the final stop_reason. That alone makes it visible, in tomorrow's check, which turns in your unattended batch are quietly stopping on pause_turn. Once the reality is visible, you can add the loop and the caps in order — there's no rush. For what it's worth, I added the guards two days after I added the counting. Visibility first, then design: the long way around turns out to be the shortcut.
