Claude Code / 2026-07-05 / Advanced

When to Use Claude Code's Native 1M Context — and When Not To: A Cost-Based Rule

With Sonnet 5 as the default, Claude Code now handles a native 1M-token context. A big window is convenient, but every token you park in it is billed again each turn. Should you load the whole repo, or feed slices? Here is an estimable token model and a decision rule that gives a concrete answer per situation, with working code and the traps to avoid.

I handed a large repository to Claude Code, felt reassured that it would "read all of it," and half an hour later opened the usage screen. I froze.

That single session had consumed several times my usual token count. The work itself finished correctly, but the job did not need a window that size. Had I fed it only the slices it needed, the same result would have cost far less.

On June 30, 2026, Claude Sonnet 5 became the default across all plans, and Claude Code gained a native one-million-token context. The old 1M window was a beta limited to specific models. Now it is available by default, which is exactly why the instinct to "load everything because the window is wide" quietly burns money.

This article turns "when to use a big window and when not to" into something you decide by arithmetic rather than by feel, with code you can run against your own price sheet and repo.

A big window changes cost, not speed

Let me clear up one misconception first. Widening the context does not make the model smarter, nor necessarily faster. What changes is what you can show it at once and what you pay every turn to do so.

As a conversation progresses, Claude Code repeatedly resends the prior exchange as input. Whatever sits in the window is billed as input tokens on every response. That is the crux. Keep a 10,000-token file resident across 20 round trips, and (outside of any cache) those 10,000 tokens can be billed roughly 20 times.

So the cost of a big window scales as "amount loaded × times you touch it." For a one-shot read it is noise; for long exploration or iterative refactoring, that multiplication is what bites.
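To make that multiplication concrete, here is a minimal sketch that counts billed input tokens over a session. It assumes no caching and that every turn resends the resident content plus the exchange accumulated so far. The function name and parameters are illustrative, not part of Claude Code.

```python
def billed_input_tokens(resident: int, added_per_turn: int, round_trips: int) -> int:
    """Total input tokens billed over a session, ignoring any prompt cache.

    resident:       tokens parked in the window for the whole session
    added_per_turn: tokens each round trip adds to the history (question + response)
    round_trips:    number of round trips in the session
    """
    total = 0
    history = 0
    for _ in range(round_trips):
        # Each turn resends the resident content and the history so far.
        total += resident + history
        history += added_per_turn
    return total


# The 10,000-token file kept resident across 20 round trips:
print(billed_input_tokens(10_000, 0, 20))    # 200000
# Same session, with the conversation itself growing by 500 tokens per turn:
print(billed_input_tokens(10_000, 500, 20))  # 295000
```

The first call reproduces the "billed roughly 20 times" figure. The second shows that the growing history adds its own cost on top of the resident content.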

The estimate: "everything resident" vs "sliced"

Let us put the decision in symbols.

Symbol          Meaning
T_ctx           Tokens kept resident in the window (e.g. the whole repo)
T_q             Tokens per instruction / question
T_out           Tokens per response
N               Round trips in the session
p_in / p_out    Input / output unit price (per million tokens)
s_in / s_out    Long-context surcharge multipliers (e.g. input ×2)
c               Prompt cache hit rate (0 to 1)

When you keep everything resident, the input cost is dominated by resending T_ctx every turn. The cached fraction c is billed at the cheaper cache-read rate, so the effective input cost is roughly:

input_cost(resident) ≈ N × T_ctx × p_in × s_in × (1 - c + c × r_cache)

Here r_cache is the ratio of cache-read price to normal input price (around 0.1 in many setups, i.e. about one tenth). The formula makes it visible: the higher c is, the cheaper resident becomes.
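As a rough sketch, the resident formula translates directly into Python. The numbers in the example call (price, surcharge, window size) are placeholders chosen for illustration, not quoted prices; substitute the values from your own price sheet.

```python
def resident_input_cost(
    n: int,
    t_ctx: int,
    p_in: float,
    s_in: float = 1.0,
    c: float = 0.0,
    r_cache: float = 0.1,
) -> float:
    """Approximate input cost of keeping t_ctx tokens resident for n round trips.

    p_in is the input price per million tokens, s_in the long-context
    surcharge multiplier, c the cache hit rate (0 to 1), and r_cache the
    cache-read price as a fraction of the normal input price.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("cache hit rate c must be between 0 and 1")
    # Uncached share pays full price; cached share pays the cache-read rate.
    effective_rate = (1.0 - c) + c * r_cache
    return n * t_ctx * (p_in / 1_000_000) * s_in * effective_rate


# Placeholder inputs: 20 round trips, 400K resident tokens, 2.0 per million, x2 surcharge.
no_cache = resident_input_cost(20, 400_000, p_in=2.0, s_in=2.0, c=0.0)
warm_cache = resident_input_cost(20, 400_000, p_in=2.0, s_in=2.0, c=0.9)
print(f"no cache:   {no_cache:.2f}")    # 32.00
print(f"90% cached: {warm_cache:.2f}")  # 6.08
```

With these placeholder inputs, a 90% hit rate cuts the resident cost to about a fifth, which is the effect of the (1 - c + c × r_cache) term.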

When you slice and feed only what each turn needs, you do not resend T_ctx; you send only the file fragment T_slice(i):

input_cost(sliced) ≈ Σ_i ( T_slice(i) × p_in × s_in' )

s_in' can be the no-surcharge multiplier (1.0) if the total you load stays under the long-context tier. That is where slicing earns its keep. Pricing commonly changes the surcharge based on whether you cross the 200K-token tier, so slicing under the tier lowers the multiplier itself.
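The sliced side can be sketched the same way. This version assumes the surcharge is decided per request by comparing that request's input size against a tier limit. The 200K limit and ×2 multiplier are parameters taken from the examples above, and the price is a placeholder; check how your own price sheet defines the tier.

```python
from typing import Iterable


def sliced_input_cost(
    slice_tokens: Iterable[int],
    p_in: float,
    tier_limit: int = 200_000,
    surcharge: float = 2.0,
) -> float:
    """Approximate input cost when each turn sends only its own slice.

    slice_tokens holds T_slice(i) for each turn. A slice is charged the
    surcharge multiplier only if it exceeds tier_limit (an assumed rule).
    """
    total = 0.0
    for tokens in slice_tokens:
        multiplier = surcharge if tokens > tier_limit else 1.0
        total += tokens * (p_in / 1_000_000) * multiplier
    return total


# Placeholder inputs: 20 turns of 15K tokens each, all under the tier.
print(f"{sliced_input_cost([15_000] * 20, p_in=2.0):.2f}")  # 0.60
# A single oversized slice crosses the tier and pays the surcharge.
print(f"{sliced_input_cost([250_000], p_in=2.0):.2f}")      # 1.00
```

Keeping every slice under the tier is what lets s_in' stay at 1.0 in this model.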

Written out it looks obvious, but only one point matters in practice: resident cost spikes precisely when N is large, c is low, and T_ctx crosses the long-context tier.
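A small sweep shows how those three factors interact. This is a sketch under stated assumptions: uniform slices, a per-request tier check, and placeholder constants for price, tier, surcharge, and sizes. All names here are hypothetical.

```python
P_IN = 2.0          # input price per million tokens (placeholder)
R_CACHE = 0.1       # cache-read price as a fraction of input price (assumed)
TIER = 200_000      # long-context tier boundary (assumed)
SURCHARGE = 2.0     # multiplier above the tier (assumed)
T_CTX = 400_000     # tokens kept resident
T_SLICE = 15_000    # tokens sent per turn when slicing


def extra_cost(n: int, c: float) -> float:
    """Resident input cost minus sliced input cost for n round trips."""
    s_in = SURCHARGE if T_CTX > TIER else 1.0
    s_in_sliced = SURCHARGE if T_SLICE > TIER else 1.0
    resident = n * T_CTX * (P_IN / 1_000_000) * s_in * (1.0 - c + c * R_CACHE)
    sliced = n * T_SLICE * (P_IN / 1_000_000) * s_in_sliced
    return resident - sliced


# Sweep round trips and cache hit rate; print the premium paid for staying resident.
for n in (1, 5, 20, 50):
    row = "  ".join(f"c={c:.1f}: {extra_cost(n, c):7.2f}" for c in (0.0, 0.5, 0.9))
    print(f"N={n:>2}  {row}")
```

Under these assumptions the premium grows linearly with N and shrinks as c rises. Setting T_CTX below TIER removes the surcharge and narrows the gap further.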
