CLAUDE LAB
API & SDK · 2026-05-30 · Advanced

Continuing past max_tokens in the Claude API without duplicated text or broken code fences

Detect stop_reason: max_tokens, continue the generation with an assistant prefill, and stitch the parts back together without duplicated seams or broken code fences. A production-tested continuation pattern in TypeScript.

Tags: Claude API · stop_reason · long-form generation · TypeScript · production


While batch-generating release notes in several languages for my six wallpaper apps, I noticed the output ended mid-sentence — yet the script happily moved on to the next step as if nothing was wrong. The cause was almost embarrassingly simple: I was concatenating content[0].text without ever checking stop_reason. I have been shipping apps solo since 2014, and tuning store pages for AdMob revenue is something I have done hundreds of times, but a generation that "cuts off in the middle" behaves differently from an ordinary bug. No error, no exception — it just breaks quietly. This article is about how to detect that silent failure and stitch the continuation back together safely, including the traps I actually hit in production.

What is happening the moment a long generation gets cut off

Every Claude API response carries a stop_reason field. When generation finishes naturally it is end_turn; when it pauses for a tool call it is tool_use; when one of your custom stop sequences matches it is stop_sequence; and when the model hits the max_tokens ceiling it is max_tokens. Ask for anything long and it is common for the model to hit that ceiling mid-thought, with plenty still left to write.
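To make those values concrete, here is a minimal sketch of reading a response so that truncation can never pass silently. It assumes only the documented Messages API response shape (a `content` array of blocks, where text blocks have `type: "text"`, plus `stop_reason`); `ResponseLike` and `readResponse` are hypothetical names for illustration. It also joins every text block instead of trusting `content[0].text`.

```typescript
// Only the fields this check reads; the official SDK ships richer types.
interface ContentBlock {
  type: string;
  text?: string;
}

interface ResponseLike {
  content: ContentBlock[];
  stop_reason: string | null;
}

interface ReadResult {
  text: string;
  truncated: boolean; // true only when generation hit the max_tokens ceiling
  stopReason: string | null;
}

// Join every text block (not just content[0]) and surface truncation explicitly.
function readResponse(res: ResponseLike): ReadResult {
  const text = res.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
  return {
    text,
    truncated: res.stop_reason === "max_tokens",
    stopReason: res.stop_reason,
  };
}
```

A truncated result still arrives as a normal 200 response with real text in it; the `truncated` flag is the only thing that tells you the answer is unfinished.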

The trap is that the content you get back still contains "everything written so far." The response comes back as a clean 200 with real text in it, so unless you inspect stop_reason there is no way to tell a finished answer from a truncated one. That is exactly what bit me in release-note generation: English and Japanese fit fine, but the third language hit max_tokens partway through a heading, and the half-written string flowed straight into my published store copy.

You might think raising max_tokens fixes it. That is only half true. Every model has a maximum output length, and some tasks need more than even that maximum allows. Worse, setting max_tokens to the ceiling on every request removes your early warning: a generation that rambles keeps running for a long time before anything comes back, which hurts latency. The practical answer is a continuation design: generate with a sensible max_tokens, and when a response stops on max_tokens, have the model write the rest and stitch the parts together.
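The continuation design can be sketched as a loop: send the prompt, and while the reply stops on `max_tokens`, resend the same user turn plus everything written so far as a final assistant message (the prefill), then append whatever comes back. This is a minimal sketch, assuming the model you call accepts an assistant prefill. To keep it runnable offline, the API call is injected as a `callModel` function; that signature, `generateWithContinuation`, and the default `maxRounds` of 5 are hypothetical choices, and in production you would back `callModel` with the SDK's `messages.create`. Trailing whitespace is trimmed from the prefill because the Messages API rejects a final assistant turn that ends in whitespace.

```typescript
type Role = "user" | "assistant";
interface Message {
  role: Role;
  content: string;
}

// One model call, reduced to what the loop needs (hypothetical shape).
interface Turn {
  text: string;
  stopReason: string | null;
}
type CallModel = (messages: Message[]) => Promise<Turn>;

interface ContinuationResult {
  text: string;
  rounds: number;
  stopReason: string | null; // stop_reason of the last round
  complete: boolean; // false only when we gave up while still on max_tokens
}

async function generateWithContinuation(
  prompt: string,
  callModel: CallModel,
  maxRounds = 5, // hard cap so a model that never finishes cannot loop forever
): Promise<ContinuationResult> {
  let text = "";
  let stopReason: string | null = null;
  for (let round = 1; round <= maxRounds; round++) {
    const messages: Message[] = [{ role: "user", content: prompt }];
    // Prefill: resend what we have as the final assistant turn so the model
    // resumes from exactly that point. Trailing whitespace is trimmed first.
    const prefix = text.trimEnd();
    if (prefix.length > 0) {
      text = prefix;
      messages.push({ role: "assistant", content: prefix });
    }
    const turn = await callModel(messages);
    text += turn.text;
    stopReason = turn.stopReason;
    if (stopReason !== "max_tokens") {
      return { text, rounds: round, stopReason, complete: true };
    }
  }
  return { text, rounds: maxRounds, stopReason, complete: false };
}
```

This loop only joins the parts. Trimming duplicated seams and repairing code fences are separate steps applied to each continuation before it is appended.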

Three ways it breaks when you ignore stop_reason

A sloppy continuation makes the output worse, not better. The three failure modes I actually ran into were these.

The first is leaving truncation in place. If you treat a single response as complete without checking stop_reason, text that ended mid-sentence or mid-code flows downstream. A mechanical pipeline never notices an unnaturally clipped ending.
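A cheap defence against this first failure is to make the pipeline fail loudly instead of publishing. A minimal sketch, with `TruncatedOutputError` and `assertComplete` as hypothetical names: any text whose stop reason is `max_tokens` is rejected, and the partial text travels with the error so it can be continued or inspected.

```typescript
// Hypothetical error type so downstream steps can tell truncation apart from other failures.
class TruncatedOutputError extends Error {
  readonly partialText: string;
  constructor(partialText: string) {
    super(`Generation stopped at max_tokens after ${partialText.length} characters`);
    this.name = "TruncatedOutputError";
    this.partialText = partialText;
  }
}

// Pass text downstream only if the model did not stop on the max_tokens ceiling.
function assertComplete(text: string, stopReason: string | null): string {
  if (stopReason === "max_tokens") {
    throw new TruncatedOutputError(text);
  }
  return text;
}
```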

The second is duplicated seams. When you ask the model to "continue from before," it tends, out of politeness, to re-summarize the previous paragraph or rewrite the same heading before moving on. Concatenate that naively and the same sentence appears twice.
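One way to remove duplicated seams is to find the longest stretch at the start of the continuation that exactly repeats the end of the text you already have, and cut it. This is a minimal sketch, assuming repeats show up as an exact suffix/prefix match within the last 200 characters (the window size this article's overlap check uses); `trimOverlap` and the `minOverlap` threshold of 20 characters are my own illustrative choices. It only catches verbatim repetition, not a paraphrased re-summary.

```typescript
// Drop the start of `next` if it repeats the end of `prev`.
// Assumption: duplicates are exact suffix/prefix matches inside the last `window` chars.
function trimOverlap(prev: string, next: string, window = 200, minOverlap = 20): string {
  const tail = prev.slice(-window);
  const maxLen = Math.min(tail.length, next.length);
  // Try the longest candidate first so the whole repeated span is cut, not a fragment.
  for (let len = maxLen; len >= minOverlap; len--) {
    if (tail.endsWith(next.slice(0, len))) {
      return next.slice(len);
    }
  }
  // Short matches (a word, a space) are usually coincidence, so keep `next` as-is.
  return next;
}
```

The `minOverlap` floor is the important design choice: without it, a continuation that legitimately starts with a common word the previous part happened to end on would lose that word.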

The third is broken structure. If max_tokens lands inside a code block, the opening code fence never gets its closing fence. Markdown fences do not nest, so when the continuation opens a fresh fence, the parser pairs the fences in the wrong order and everything after that point can end up inside a code block. I only discovered this after seeing a config example in my release notes vanish into one giant grey box.
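A fence-balance check can be sketched by walking the fence lines and tracking whether a code block is still open. This is a simplified reading of the CommonMark fence rules (backtick fences only, and the rule that a closing fence must be at least as long as its opener is ignored); `isFenceOpen`, `stitchFence`, and `closeDanglingFence` are hypothetical names. The useful fact is that a fence line carrying an info string, such as a language name, can open a block but never close one, so a continuation that starts with one is re-opening the block rather than finishing it. A bare fence is ambiguous, so the sketch falls back on parity: if the continuation's own fences are balanced, its first fence must be an opener.

```typescript
// Build the fence marker at runtime so this snippet never contains a literal triple backtick.
const FENCE = "`".repeat(3);
// A backtick fence line: up to 3 spaces, 3+ backticks, then an optional info string.
const FENCE_LINE = /^ {0,3}`{3,}(.*)$/;

function fenceLines(text: string): string[] {
  return text.split("\n").filter((line) => FENCE_LINE.test(line));
}

// Report whether a code block is still open at the end of `text`.
// Approximation: any fence opens a block, only a bare fence closes one.
function isFenceOpen(text: string): boolean {
  let open = false;
  for (const line of fenceLines(text)) {
    const info = (FENCE_LINE.exec(line)?.[1] ?? "").trim();
    if (!open) open = true;
    else if (info === "") open = false;
  }
  return open;
}

// If `prev` stops inside a code block and `next` starts by re-opening one,
// drop that opening line so the old block is closed by `next`'s own closer.
function stitchFence(prev: string, next: string): string {
  if (!isFenceOpen(prev)) return next;
  const body = next.replace(/^\n+/, "");
  const cut = body.indexOf("\n");
  const firstLine = cut === -1 ? body : body.slice(0, cut);
  const m = FENCE_LINE.exec(firstLine);
  if (!m) return next;
  const reopens = (m[1] ?? "").trim() !== "" || fenceLines(body).length % 2 === 0;
  if (!reopens) return next; // a bare fence that really closes the open block
  return cut === -1 ? "" : body.slice(cut + 1);
}

// Last resort before publishing: close a fence that is still dangling.
function closeDanglingFence(text: string): string {
  return isFenceOpen(text) ? `${text.replace(/\n*$/, "")}\n${FENCE}\n` : text;
}
```

Run `stitchFence` on each continuation before appending it, and `closeDanglingFence` once at the end, so even a run that gave up mid-block publishes balanced Markdown.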

These are not independent problems: one continuation design prevents all three. Let me build it up step by step.


What you'll learn

  • A minimal continuation loop (about 40 lines of TypeScript) that detects stop_reason: max_tokens and keeps going
  • An overlap-detection trimming function that removes duplicated seams using a 200-character window
  • A fence-balance check that stops an unbalanced code fence from collapsing your whole document into a code block
  • Guard rails (a round cap and an estimated-USD budget gate) to stop runaway loops and runaway cost
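The guard rails can be sketched as a small object that counts rounds and keeps a running cost estimate from each response's `usage.input_tokens` and `usage.output_tokens`. Every continuation round resends the prompt plus everything generated so far as input, so input cost grows with each round; that is why a budget gate matters as much as a round cap. `ContinuationGuard` and its methods are hypothetical names, the per-million-token prices are placeholders you pass in (check your model's current rates), and the estimate ignores prompt-caching discounts.

```typescript
// Token counts as reported in each Messages API response's `usage` object.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Placeholder prices in USD per million tokens; supply your model's real rates.
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

class ContinuationGuard {
  private rounds = 0;
  private spentUsd = 0;
  private readonly maxRounds: number;
  private readonly budgetUsd: number;
  private readonly pricing: Pricing;

  constructor(maxRounds: number, budgetUsd: number, pricing: Pricing) {
    this.maxRounds = maxRounds;
    this.budgetUsd = budgetUsd;
    this.pricing = pricing;
  }

  // Record one API call and add its estimated cost.
  record(usage: Usage): void {
    this.rounds += 1;
    this.spentUsd +=
      (usage.input_tokens * this.pricing.inputPerMTok +
        usage.output_tokens * this.pricing.outputPerMTok) / 1e6;
  }

  // Allow another round only while both the round cap and the budget hold.
  canContinue(): boolean {
    return this.rounds < this.maxRounds && this.spentUsd < this.budgetUsd;
  }

  get estimatedUsd(): number {
    return this.spentUsd;
  }
}
```

Call `record` after every response and check `canContinue()` before sending the next continuation. Stopping on either limit leaves a truncated draft, which should be flagged for review rather than published.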