CLAUDE LABJP
CORPS — Anthropic unveils Claude Corps (Jun 11), a $150M national fellowship placing 1,000 early-career workers inside US nonprofits; the first cohort starts in OctoberSUBAGENTS — Claude Code sub-agents can now spawn their own sub-agents, up to 5 levels deep — multi-stage delegation workflows out of the boxWORKFLOWS — Dynamic workflows arrive in research preview across CLI, Desktop, and VS Code for codebase-wide bug hunts and large migrations (Max/Team/Enterprise)BILLING — 2 days to the Jun 15 change: Agent SDK, headless runs, and GitHub Actions move to monthly credits ($20/$100/$200); Sonnet 4 and Opus 4 retire from the API the same dayFABLE5 — Fable 5 remains included free on Pro, Max, Team, and Enterprise through Jun 22CODE80 — IPO coverage reports Claude now writes over 80% of its own code, up from under 10% in February 2025CORPS — Anthropic unveils Claude Corps (Jun 11), a $150M national fellowship placing 1,000 early-career workers inside US nonprofits; the first cohort starts in OctoberSUBAGENTS — Claude Code sub-agents can now spawn their own sub-agents, up to 5 levels deep — multi-stage delegation workflows out of the boxWORKFLOWS — Dynamic workflows arrive in research preview across CLI, Desktop, and VS Code for codebase-wide bug hunts and large migrations (Max/Team/Enterprise)BILLING — 2 days to the Jun 15 change: Agent SDK, headless runs, and GitHub Actions move to monthly credits ($20/$100/$200); Sonnet 4 and Opus 4 retire from the API the same dayFABLE5 — Fable 5 remains included free on Pro, Max, Team, and Enterprise through Jun 22CODE80 — IPO coverage reports Claude now writes over 80% of its own code, up from under 10% in February 2025
Articles/API & SDK
API & SDK/2026-06-13Intermediate

Retiring the 'Please Continue' Prompt — Single-Pass Long-Form Generation with Claude Fable 5's 128k Output

Every month my report generation hit the output cap and I had to ask Claude to 'continue from here.' Claude Fable 5's 128k output let me retire that workflow: a streaming implementation, a resume-after-disconnect pattern, and a measured cost comparison against chunked generation.

claude-fable-52128k-outputstreaming16long-form-generationcost-optimization19

Premium Article

At the start of every month, I generate operations reports for the four apps I run as an indie developer — crash trends, store review replies, AdMob revenue movement. Stitched into one document, a month's report runs close to 40,000 Japanese characters.

At that length, the recurring headache was the output cap. Generation would stop mid-document, I would send a "please continue from here" follow-up, then read across the seam and tidy it up. A heading level would drift, a caveat paragraph would appear twice, the tone would shift slightly in the back half. That stitching work cost me about twenty minutes every month.

Claude Fable 5, released on June 9, supports up to 128,000 output tokens. It is also included at no extra cost on the major plans until June 22, so this seemed like the right week to find out whether my report generation could become a single pass. The short version: the seam-fixing work went to zero, and the measured cost came in about 26% lower than the chunked approach. The longer version involves a streaming-first implementation and a few traps worth knowing about, which is what this article covers.

What Chunked Generation Was Actually Costing Me

Let me be honest about the old setup first. I used to set max_tokens to 16,384 and generate the report in four passes. From the second request onward, I would include everything generated so far and ask Claude to continue.

Three problems kept surfacing:

  1. Seam quality was unreliable. Asking for a continuation sometimes produced a preamble — "Understood, continuing the report" — or a partial restatement of the previous paragraph before the actual continuation. Not every time, but across four stitches per month, something almost always leaked in.
  2. Structure drifted. In later requests, Claude only sees the document as pasted input rather than as something it is currently writing. Granularity that was ### in the first half would sometimes come back as ## in the second.
  3. Input tokens accumulated. Pass two re-sends pass one's output; pass three re-sends passes one and two. The more chunks, the more times the same text gets billed as input.

None of this was fatal. But spending twenty minutes a month inspecting seams in an automated pipeline felt like the automation had quietly stopped paying for itself.

Fable 5's Output Spec — 128k Tokens and Always-On Adaptive Thinking

Claude Fable 5 is the generally available model in the new Mythos class positioned above Opus: a 1M-token context window and up to 128,000 output tokens. The API model string is claude-fable-5, priced at $10 per million input tokens and $50 per million output tokens. If you remember the 1M context window as a Sonnet beta that was later retired — I wrote about that in Migrating from the Claude 1M Context Window Beta: Everything You Need to Do Before April 30, 2026 — Fable 5 brings it back as a standard capability.

Two spec details matter before you write any code.

Adaptive thinking shows up in output_tokens

Fable 5 thinks adaptively on every request, scaling its reasoning to task difficulty. In my logs, usage.output_tokens consistently came in 10–20% higher than what I estimated from the saved document alone; I read that gap as the thinking overhead landing on the output side. The practical consequence: estimate costs from measured usage, not from the visible text length.

Long outputs assume streaming

A 128k-class output takes several minutes to generate. If you architect around a single non-streaming request, you will collide with HTTP timeouts, and the Anthropic SDK itself nudges long-running requests toward streaming. Designing for streaming from the start is the realistic path.

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN
Replace chunked 'please continue' long-form generation with one streamed 128k-token pass, using working Python you can adapt today
Recover from mid-stream disconnects with partial saves and an assistant-prefill resume pattern that avoids re-introductions
Estimate your post-June-23 usage-credit burn from real token counts — the single pass measured about 26% cheaper than chunking
Secure payment via Stripe · Cancel anytime

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

or
Unlock all articles with Membership →
Share

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.

  • Copy-paste ready implementation code
  • New advanced guides published daily
  • $5/mo or $10 for lifetime access
View Membership →

Related Articles

API & SDK2026-05-02
Cancelling Claude API Streams the Right Way: AbortController, Token Billing, and Connection Hygiene
How to cancel Claude API streams with AbortController, what gets billed when you stop mid-stream, and the production gotchas — Node.js + Python.
API & SDK2026-05-29
Splitting Claude API prompt cache into 5m and 1h tiers — separate TTLs cut cost and stabilize ops
Anthropic's cache_control supports two TTLs: 5 minutes and 1 hour. Splitting them into a two-tier layout — 1h for static system/tools, 5m for variable few-shot — meaningfully changed both my costs and my on-call life. Here's the design with the numbers I observed.
API & SDK2026-05-28
Why JSON.parse Fails on Claude API Streaming tool_use Arguments — and How to Fix It
When you stream a Claude API response with tool_use, calling JSON.parse on each input_json_delta throws SyntaxError. Here is the correct way to assemble partial_json fragments, plus disconnect handling.
📚RECOMMENDED BOOKS
Build a Large Language Model (From Scratch)
Sebastian Raschka
LLM Dev
Prompt Engineering for LLMs
Berryman & Ziegler
Prompting
AI Engineering
Chip Huyen
AI Eng
* Contains affiliate links
See all →