CLAUDE LAB
API & SDK · 2026-06-16 · Advanced

Taming Token Bloat in Long-Running Agents with Context Editing and the Memory Tool

For long-running agents whose input tokens balloon as tool results pile up, here is how to pair context editing with the memory tool and measure the savings with count_tokens, including a working backend implementation.

Tags: Claude API, context editing, memory tool, agents, token optimization


As an indie developer, I once let a research agent in my automated publishing pipeline run uninterrupted, and somewhere around the twentieth tool call the input tokens crossed 70,000 and the per-turn cost stopped being something I could ignore. When I looked inside, most of the weight came from finished web_search results sitting untouched in the conversation history. The longer an agent runs, the more those stale tool results crowd the context. That is not unique to my setup; it is a common tax on any tool-heavy workflow.

The Claude API gives you two complementary levers for this. Context editing removes old tool results on the server side, and the memory tool offloads what you want to keep into files. Because their jobs are opposite, the practical move is to use them together rather than picking one. This article builds both up from a minimal setup to a production-ready one, and ends by measuring the savings with count_tokens before any real spend, with working backend code along the way.

Context editing "removes," the memory tool "retains" — opposite jobs

The first thing worth untangling is that these two features solve genuinely different problems.

Context editing (clear_tool_uses_20250919) automatically deletes old tool results on the server side once the history crosses a threshold. Each cleared block is replaced with a placeholder, so Claude knows a result used to be there and is now gone. In other words, search results or file reads you will not revisit get dropped without you having to trim them by hand.
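
To make the placeholder behavior concrete, here is a small local simulation. It only illustrates the idea and is not the server's algorithm. The function name clear_old_tool_results, the PLACEHOLDER wording, and the keep parameter are all assumptions made for this sketch, and it only looks at client-side tool_result blocks.

```python
import copy

# Hypothetical wording; the placeholder text the API actually inserts may differ.
PLACEHOLDER = "[tool result cleared]"


def clear_old_tool_results(messages: list, keep: int = 3) -> list:
    """Return a copy of `messages` with all but the newest `keep` tool results blanked."""
    out = copy.deepcopy(messages)  # never mutate the caller's history

    # Collect every tool_result block in conversation order.
    results = [
        block
        for msg in out
        if isinstance(msg.get("content"), list)
        for block in msg["content"]
        if isinstance(block, dict) and block.get("type") == "tool_result"
    ]

    # keep <= 0 clears everything; otherwise the newest `keep` results survive.
    stale = results if keep <= 0 else results[:-keep]
    for block in stale:
        block["content"] = PLACEHOLDER
    return out
```

The tool_result block stays in place with its tool_use_id intact, so the conversation structure remains valid and only the bulky payload is gone.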

The memory tool (memory_20250818), by contrast, is a client-side tool that Claude uses to write and read files itself. It saves what it learns into a /memories directory and can read it back on demand, even after context editing has cleared the history or a new session has started.
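
Because the memory tool runs on the client, your code has to execute the file operations Claude asks for. The following is a minimal sketch of such a handler, not a complete backend. It covers only the create and view commands, and the input field names (command, path, file_text) are written here as assumptions about the tool's input shape that you should check against the current documentation. MemoryStore is a hypothetical class name.

```python
from pathlib import Path


class MemoryStore:
    """File-backed handler for a subset of memory tool commands (sketch)."""

    def __init__(self, root: str):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _resolve(self, virtual_path: str) -> Path:
        # Map "/memories/notes.md" onto <root>/notes.md and refuse anything else.
        if virtual_path != "/memories" and not virtual_path.startswith("/memories/"):
            raise ValueError(f"path must live under /memories: {virtual_path!r}")
        relative = virtual_path[len("/memories"):].lstrip("/")
        target = (self.root / relative).resolve()
        # After resolving "..", the target must still sit inside the root.
        if target != self.root and self.root not in target.parents:
            raise ValueError(f"path escapes the memory directory: {virtual_path!r}")
        return target

    def handle(self, tool_input: dict) -> str:
        """Run one command and return the text to send back as the tool result."""
        command = tool_input.get("command")
        path = tool_input.get("path", "")
        try:
            target = self._resolve(path)
        except ValueError as exc:
            return f"Error: {exc}"

        if command == "create":
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(tool_input["file_text"], encoding="utf-8")
            return f"File created at {path}"
        if command == "view":
            if target.is_dir():
                return "\n".join(sorted(p.name for p in target.iterdir()))
            if target.is_file():
                return target.read_text(encoding="utf-8")
            return f"Error: {path} does not exist"
        return f"Error: unsupported command {command!r}"
```

Returning an error string instead of raising lets you pass the failure straight back to Claude as the tool result, so it can correct the path and retry.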

That contrast is the whole point. With context editing alone, an important finding buried in a cleared tool result disappears along with it. So you write just the "I will need this later" facts into memory, and you get both: a light history and persistent knowledge. Both are enabled with the beta header context-management-2025-06-27.

Enabling clear_tool_uses in its simplest form

Start with context editing on its own, in the plainest shape. You pass a single edit in context_management.

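import anthropic

# Reads ANTHROPIC_API_KEY from the environment.
client = anthropic.Anthropic()

# Context editing is a beta feature, so the call goes through client.beta
# and names the beta explicitly.
response = client.beta.messages.create(
    model="claude-opus-4-8",
    max_tokens=4096,
    messages=[{"role": "user", "content": "Research recent work on AI agents and summarize it"}],
    tools=[{"type": "web_search_20250305", "name": "web_search"}],
    betas=["context-management-2025-06-27"],
    # One edit with no options: clear old tool results using the default settings.
    context_management={"edits": [{"type": "clear_tool_uses_20250919"}]},
)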

Whether clearing actually ran is reported in context_management.applied_edits. If you skip checking this and just assume "it must be working," you can easily miss that it never fired at all because the threshold was never reached.

applied = getattr(response, "context_management", None)
if applied and applied.applied_edits:
    for edit in applied.applied_edits:
        print(f"cleared tool uses: {edit.cleared_tool_uses} / "
              f"tokens saved: {edit.cleared_input_tokens}")
else:
    print("nothing cleared this turn (below threshold)")
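
A single-turn check is easy to forget once the agent loops, so it can help to accumulate the same fields across turns. The sketch below assumes each response exposes context_management.applied_edits with the cleared_tool_uses and cleared_input_tokens fields used above. ClearingLog is a hypothetical helper name.

```python
class ClearingLog:
    """Accumulates context-editing stats across the turns of one agent run."""

    def __init__(self):
        self.turns = 0
        self.turns_with_clearing = 0
        self.cleared_tool_uses = 0
        self.cleared_input_tokens = 0

    def record(self, response) -> None:
        """Call once per API response, whether or not clearing fired."""
        self.turns += 1
        cm = getattr(response, "context_management", None)
        edits = getattr(cm, "applied_edits", None) or []
        if edits:
            self.turns_with_clearing += 1
        for edit in edits:
            self.cleared_tool_uses += edit.cleared_tool_uses
            self.cleared_input_tokens += edit.cleared_input_tokens

    def summary(self) -> str:
        return (
            f"{self.turns_with_clearing}/{self.turns} turns cleared, "
            f"{self.cleared_tool_uses} tool uses, "
            f"{self.cleared_input_tokens} input tokens removed"
        )
```

A summary that reads 0 out of N turns at the end of a long run is the signal that the threshold was never reached.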

The minimal setup omits trigger, so clearing runs at the default threshold. On short conversations it never fires, which is expected behavior rather than a bug. Watch the counts in your logs first, then move on to tuning.
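
As a starting point for that tuning, here is a sketch that builds the request arguments with an explicit threshold and with the memory tool declared next to web search. The field shapes for trigger, keep, clear_at_least, and exclude_tools reflect my understanding of the documentation and should be verified before use. The numeric defaults are illustrative placeholders rather than recommendations, and build_request_kwargs is a hypothetical helper name.

```python
def build_request_kwargs(
    messages: list,
    trigger_tokens: int = 30000,   # illustrative: start clearing above this many input tokens
    keep_tool_uses: int = 3,       # illustrative: newest tool uses left untouched
    clear_at_least: int = 5000,    # illustrative: minimum tokens to free per clearing pass
) -> dict:
    """Build keyword arguments for client.beta.messages.create (sketch)."""
    return {
        "model": "claude-opus-4-8",
        "max_tokens": 4096,
        "messages": messages,
        "tools": [
            {"type": "web_search_20250305", "name": "web_search"},
            {"type": "memory_20250818", "name": "memory"},
        ],
        "betas": ["context-management-2025-06-27"],
        "context_management": {
            "edits": [
                {
                    "type": "clear_tool_uses_20250919",
                    "trigger": {"type": "input_tokens", "value": trigger_tokens},
                    "keep": {"type": "tool_uses", "value": keep_tool_uses},
                    "clear_at_least": {"type": "input_tokens", "value": clear_at_least},
                    # Assumption: memory reads and writes should survive clearing.
                    "exclude_tools": ["memory"],
                }
            ]
        },
    }
```

You would pass the result as client.beta.messages.create(**build_request_kwargs(messages)). Keeping the configuration in one function makes it easy to log the exact values next to the applied_edits counts when comparing runs.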

WHAT YOU'LL LEARN
How to actually choose trigger / keep / clear_at_least / exclude_tools for clear_tool_uses_20250919
A working memory-tool backend with path-traversal protection built in
Measuring the exact token savings with count_tokens before you ship
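
For the measurement item in the list above, one rough approach is to count input tokens for two versions of the same conversation and compare them. This sketch assumes client is an anthropic.Anthropic() instance and relies only on client.messages.count_tokens(...) returning an object with input_tokens. The helper name measure_savings and the idea of passing a hand-trimmed trimmed_history are assumptions for illustration.

```python
from typing import Any


def measure_savings(client: Any, model: str, full_history: list, trimmed_history: list) -> dict:
    """Compare input-token counts for the full and the trimmed conversation."""
    before = client.messages.count_tokens(model=model, messages=full_history).input_tokens
    after = client.messages.count_tokens(model=model, messages=trimmed_history).input_tokens
    saved = before - after
    # Guard against an empty history so the percentage never divides by zero.
    saved_pct = round(100 * saved / before, 1) if before else 0.0
    return {"before": before, "after": after, "saved": saved, "saved_pct": saved_pct}
```

Because counting tokens does not generate a response, this lets you estimate the effect of clearing on a recorded history before running the agent for real.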