CLAUDE LABJP
CODE — Claude Code ships a broad quality and reliability update with /rewind, stronger MCP resilience, and steadier OAuth handlingCODE — CPU and memory use drops during streaming and long sessions, keeping always-on automation stableADMIN — New org model restrictions let administrators control which models are availableMCP — Structured output, remote MCP, and session resume all get more reliableMODEL — Claude Fable 5 is generally available, with a 1M-token context window, always-on adaptive thinking, and 128K outputLINEUP — Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; pick the right one per taskCODE — Claude Code ships a broad quality and reliability update with /rewind, stronger MCP resilience, and steadier OAuth handlingCODE — CPU and memory use drops during streaming and long sessions, keeping always-on automation stableADMIN — New org model restrictions let administrators control which models are availableMCP — Structured output, remote MCP, and session resume all get more reliableMODEL — Claude Fable 5 is generally available, with a 1M-token context window, always-on adaptive thinking, and 128K outputLINEUP — Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; pick the right one per task
Articles/API & SDK
API & SDK/2026-06-27Advanced

When Claude API Streaming Stops Without an Error: Detecting Silent Stalls and Resuming Mid-Stream

How to catch the 'silent stall' where Claude API streaming stops with no exception at all, using a content-level watchdog that times the gap between tokens, plus a resume path that carries received text forward as an assistant prefill, and a four-layer timeout budget for long-running automation.

streaming19api38python22production105reliability10sse2

Premium Article

The hard streaming failures are not the disconnects that raise an exception. They are the stops that say nothing at all: deltas simply stop arriving. Your try/except catches nothing. There is no stack trace in the logs. The process is alive, the socket is open — and yet the response ends partway through. I missed this "silent stall" for a while.

While running unattended article generation across several of my sites as an indie developer, a handful of outputs were always saved with the last few paragraphs missing. No alert ever fired. What made the cause hard to find was that the code looked completely normal: I was iterating stream.text_stream to the end with a for loop, but the loop was exiting early. This piece is the setup I settled on for detecting that silent stall and continuing from where it stopped. I'll take the usual advice about longer timeouts and retries as given, and focus on what was needed beyond it.

Why ReadTimeout doesn't fire on a silent stall

Most write-ups stop at "increase the read timeout." But on a silent stall, that timeout often never fires — because the SDK's read timeout measures the interval between bytes arriving on the socket, not the interval between meaningful content.

A Server-Sent Events path frequently has a reverse proxy or load balancer emitting comment lines (harmless lines like : ping) at fixed intervals, and Anthropic's own ping events may keep arriving too. The result: not a single content delta has come, yet something keeps landing on the socket. The socket looks alive, so the read timeout keeps resetting and never fires. The connection is healthy; the content is dead. That is what a silent stall actually is.

So the granularity you watch is wrong by default. What you need to guard is not "are bytes arriving" but "is meaningful content (a text_delta) making progress." That is what the watchdog below measures.

An inter-token watchdog

The idea is simple: record the timestamp of the last text_delta, and if the gap exceeds a threshold, abort it yourself. Think of it as a content-layer guard added alongside the SDK's socket-layer timeout, not a replacement for it.

import time
import threading
import anthropic
 
class StreamStalled(Exception):
    """No meaningful delta arrived within the window (silent stall)."""
 
def stream_with_watchdog(
    client: anthropic.Anthropic,
    messages: list,
    model: str = "claude-sonnet-4-6",
    max_tokens: int = 8192,
    stall_seconds: float = 25.0,   # tolerated gap between text deltas
):
    """
    Watch the arrival gap between text deltas; raise StreamStalled past stall_seconds.
    Buffer received text so the caller can carry it forward on a stall.
    """
    buf: list[str] = []
    last_delta = {"t": time.monotonic()}
    stop = threading.Event()
    closer = {"fn": lambda: None}
 
    def watchdog():
        while not stop.wait(1.0):
            if time.monotonic() - last_delta["t"] > stall_seconds:
                stop.set()
                closer["fn"]()   # close the underlying connection to break the for-loop
                return
 
    wd = threading.Thread(target=watchdog, daemon=True)
 
    with client.messages.stream(
        model=model, max_tokens=max_tokens, messages=messages,
    ) as stream:
        closer["fn"] = stream.close   # let the watchdog close the connection
        wd.start()
        try:
            for text in stream.text_stream:
                last_delta["t"] = time.monotonic()
                buf.append(text)
                yield text
        finally:
            stop.set()
 
    received = "".join(buf)
    if stop.is_set() and time.monotonic() - last_delta["t"] > stall_seconds:
        raise StreamStalled(received)

Two implementation points matter. First, the watchdog must be able to call stream.close(). The for loop is blocking, so unless you close the connection from the outside, it will sit there waiting on a silence forever. Second, always keep received up to the moment of the stop — without it, "continue from where it stopped" is impossible.

Tune stall_seconds to your path. In my environment the first token can take around 20 seconds (especially with models that think longer), so I keep a separate, longer grace period for the first token (the first_token_seconds below) and cap the gap at 25 seconds only once text has started flowing. If the body has been moving and then goes silent for 25 seconds, I treat that as a path-side stop and abort, because recovering is faster than waiting.

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN
A watchdog that measures the gap between text deltas — not socket reads — so it catches the silent stalls that ReadTimeout never fires on
A resume path that feeds received text back as an assistant prefill so generation continues from where it stopped instead of restarting, plus how to trim the overlap
A four-layer timeout budget (connect / first-token / inter-token / total) and how to set each threshold from measured p95–p99 rather than a guess
Secure payment via Stripe · Cancel anytime

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

or
Unlock all articles with Membership →
Share

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.

  • Copy-paste ready implementation code
  • New advanced guides published daily
  • $5/mo or $10 for lifetime access
View Membership →

Related Articles

API & SDK2026-03-31
Claude API Streaming × Real-Time Chat UI: Production Implementation Guide
A practical guide to running Claude API streaming with Server-Sent Events in Next.js App Router at production grade, with measured latency, recovery patterns, and Cloudflare Workers edge-relay details from real indie operation
API & SDK2026-05-06
Claude API × Python in Practice: Building an AI Assistant with Tool Calling and Streaming
A practical guide to combining Claude API's Tool Use and Streaming in Python. Build a working AI assistant with real tool execution, complete source code included, plus a breakdown of the tricky parts that trip up most developers.
API & SDK2026-04-26
Building a Scalable Real-Time AI Chat Server with Claude API × WebSocket × Redis Pub/Sub — Node.js Production Architecture, Multi-User Management, and Cost Control
Production implementation of a real-time AI chat server using Claude API, WebSocket, and Redis Pub/Sub. Covers SSE vs WebSocket trade-offs, scalable Node.js connection management, JWT auth, and per-user cost control.
📚RECOMMENDED BOOKS
Build a Large Language Model (From Scratch)
Sebastian Raschka
LLM Dev
Prompt Engineering for LLMs
Berryman & Ziegler
Prompting
AI Engineering
Chip Huyen
AI Eng
* Contains affiliate links
See all →