Claude.ai · 2026-06-15 · Advanced

When a Long-Running Agent's Context Quietly Decays — Budgeting and Compaction

An agent that runs all night gets sloppier by morning. The cause is dilution from accumulated context. Here is how to treat context as a budget, measure its decay, and keep it healthy with compaction — with working code and field notes.

By Morning, the Judgment Had Gotten Sloppy

The first odd thing I noticed about an agent I run overnight was that its commit messages suddenly turned curt by morning. At midnight the output traced the context carefully; at six in the morning, on the same kind of task, it returned terse replies that seemed to ignore the first half of the instructions. I hadn't changed the model, and I hadn't changed the prompt. The only thing that had changed was the volume of context piled up in the session.

This is not unusual. A prompt that works perfectly in a one-off conversation stops behaving as expected when you let it run for a long time. Most of the time the cause is not the model's capability but the design of the context. Particularly in agents that call tools dozens of times, past tool results and conversation history quietly accumulate and dilute the instructions you actually want to land.

This article is about keeping a long-running agent's context healthy. We'll do three things, with working code: treat context as a budget to allocate, measure its decay, and fire compaction at a threshold. It is context engineering in the broad sense, but the focus is narrow: the specific way context rots in agents that keep running.

Why Context Rots — Accumulation, Dilution, Position Effects

What happens in a long session breaks down roughly into three things.

The first is accumulation. Every time the agent calls a tool, the input and output stay in the history. A single file read might add a few thousand tokens and a search result ten thousand; over dozens of turns, the share of the context occupied by your actual instructions keeps shrinking.
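
To make the shrinking share concrete, here is a minimal sketch that tracks what fraction of the context your instructions occupy as tool results pile up. The `Message` type, the `instruction_share` helper, and the 4-characters-per-token estimate are all assumptions made for illustration; a real setup would use the tokenizer or token-counting endpoint of whatever API it calls.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "system", "user", "assistant", or "tool"
    text: str


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def instruction_share(messages: list[Message]) -> float:
    """Fraction of the estimated context taken up by system/user instructions."""
    total = sum(estimate_tokens(m.text) for m in messages)
    instructions = sum(
        estimate_tokens(m.text) for m in messages if m.role in ("system", "user")
    )
    return instructions / total if total else 0.0


# Simulated session: one instruction block, then repeated tool results.
history = [Message("system", "x" * 8_000)]          # ~2,000 tokens of instructions
shares = []
for turn in range(1, 41):
    history.append(Message("tool", "y" * 16_000))   # ~4,000 tokens per tool result
    if turn in (1, 10, 40):
        shares.append((turn, round(instruction_share(history), 3)))

print(shares)  # -> [(1, 0.333), (10, 0.048), (40, 0.012)]
```

With these made-up sizes, the instructions fall from a third of the context after one tool call to roughly one percent after forty, without a single word of the prompt changing.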

The second is dilution. The context window may be wide, but the model's effective attention (what it can strongly reference at once) is not infinite. When an important constraint is buried under a mass of intermediate logs, its relative weight drops. In my own observations as an indie developer running several agents autonomously, responses that broke the constraints stated at the top became visibly more frequent once history crossed roughly 150K tokens, even with the same system prompt.
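
One way to check an observation like this against your own agent is to log, for each reply, the history size and whether the reply broke a top-level constraint, then compare violation rates across size buckets. The sketch below assumes you already have a check of your own that produces the `violated` flag; the function name, the bucket size, and the sample records are hypothetical and are not measurements.

```python
from collections import defaultdict


def violation_rate_by_bucket(records, bucket_size=50_000):
    """Group (history_tokens, violated) records into history-size buckets.

    Returns {bucket_start: violation_rate}, ordered by bucket_start.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket_start -> [violations, total]
    for history_tokens, violated in records:
        bucket = (history_tokens // bucket_size) * bucket_size
        counts[bucket][0] += int(violated)
        counts[bucket][1] += 1
    return {b: v / n for b, (v, n) in sorted(counts.items())}


# Made-up log entries for illustration only:
# (history size in tokens, did the reply break a top-level constraint?)
sample = [
    (20_000, False), (40_000, False), (60_000, False), (90_000, True),
    (110_000, False), (140_000, False), (160_000, True), (180_000, True),
]

rates = violation_rate_by_bucket(sample)
for bucket_start, rate in rates.items():
    print(f"{bucket_start:>7}+ tokens: {rate:.0%} of replies broke a constraint")
```

If the rate climbs with bucket size in your real logs, that gives you a number to pick a compaction threshold from instead of a hunch.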

The third is the position effect. In long contexts, information placed in the middle tends to be overlooked. This is the "lost in the middle" phenomenon, and an important fact tucked into the center of a long tool log carries far less weight than you would expect.
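
A common mitigation, offered here as an assumption and not as something this article prescribes, is to pull the facts that matter out of the middle and restate them near the end of the context, just before the latest request. The sketch below shows only the ordering; `assemble_context` and its parameters are hypothetical names, and whether this helps for a given model is something to verify with your own runs.

```python
def assemble_context(system_prompt, history, pinned_facts, latest_request):
    """Return text blocks ordered so important material sits at the edges.

    Order: system prompt first, bulky history in the middle, then a short
    restatement of pinned facts, and the latest request last.
    """
    blocks = [system_prompt]
    blocks.extend(history)
    if pinned_facts:
        reminder = "Key facts to keep in mind:\n" + "\n".join(
            f"- {fact}" for fact in pinned_facts
        )
        blocks.append(reminder)
    blocks.append(latest_request)
    return blocks


# Illustrative inputs only.
blocks = assemble_context(
    system_prompt="You are a release agent. Never push to main directly.",
    history=["<long tool log 1>", "<long tool log 2>", "<long tool log 3>"],
    pinned_facts=["The staging database is read-only until 09:00."],
    latest_request="Prepare the release notes for today's build.",
)
for block in blocks:
    print(block.splitlines()[0])
```

The history itself is left untouched; the fact is duplicated at the end rather than moved, so nothing is lost if the model does look at the middle.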

The conclusion is simple. With context, more is not always better. It needs to be treated as a finite resource whose allocation you design.
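
As a minimal sketch of what treating context as a finite resource can look like, the class below holds a single token budget and compacts older history once usage crosses a threshold. Everything here is an assumption made for illustration: the `ContextManager` name, the flat budget, the 0.8 threshold, the 4-characters-per-token estimate, and especially `naive_summarize`, which only truncates entries where a real agent would more likely ask a model to write the summary.

```python
from dataclasses import dataclass, field


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


@dataclass
class ContextManager:
    budget_tokens: int        # total tokens the history is allowed to use
    threshold: float = 0.8    # compact once usage exceeds this fraction
    keep_recent: int = 4      # most recent entries kept verbatim
    history: list[str] = field(default_factory=list)

    def used_tokens(self) -> int:
        return sum(estimate_tokens(entry) for entry in self.history)

    def needs_compaction(self) -> bool:
        return self.used_tokens() > self.budget_tokens * self.threshold

    def add(self, entry: str, summarize) -> None:
        """Append an entry, then fold older entries into one summary if needed."""
        self.history.append(entry)
        if self.needs_compaction() and len(self.history) > self.keep_recent:
            old = self.history[:-self.keep_recent]
            recent = self.history[-self.keep_recent:]
            self.history = [summarize(old)] + recent


def naive_summarize(entries: list[str]) -> str:
    """Placeholder summarizer: keeps the first 200 characters of each entry."""
    return "SUMMARY:\n" + "\n".join(e[:200] for e in entries)


ctx = ContextManager(budget_tokens=10_000)
for i in range(20):
    ctx.add(f"tool result {i}: " + "z" * 4_000, naive_summarize)

print(len(ctx.history), ctx.used_tokens(), ctx.needs_compaction())
```

Note that this compacts at most once per `add`, so a single oversized entry can still push usage past the threshold; a sturdier version would also cap the size of individual tool results before they enter the history.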

WHAT YOU'LL LEARN
How accumulated tool results and history dilute context, plus three signals to catch it numerically
An implementation pattern that allocates a context budget across four layers and fires compaction at a threshold (Python, working code)
Where to place prompt-cache breakpoints, and how to think about token budgets after the 2026-06-15 billing change