CLAUDE LAB
API & SDK / 2026-06-29 / Advanced

When Context Editing Made My Agent Re-run the Same Search — Field Notes on Clear Boundaries and Cache Invalidation

After turning on Context Editing to auto-clear tool results, the agent forgot what it had just read, re-ran the same tool, and the cache rebuilt every turn so costs went up. Field notes on instrumenting the silent regression and setting trigger, keep, and clear_at_least from measured data.


The "It Got Lighter but Slower" Feeling

I had been fighting context bloat in a long-running agent, so I added a single line of Context Editing — clear_tool_uses_20250919. The token graph dropped, just as expected. But responses didn't feel faster, and the end-of-month token bill had actually gone up.

Lining up the logs, I saw the cause: the agent kept forgetting content it had searched for and read moments earlier, then calling the same web_search again. Only the placeholder for the cleared result remained, so Claude concluded "something was here but it's gone" and went back to fetch the missing information. The tokens I had trimmed by clearing were being clawed right back by fresh tool results.

I'm an indie developer running content generation and monitoring across several sites unattended, and for that kind of setup this "the numbers improved but the real outcome got worse" state is the worst kind to debug. Nothing throws an error. Cost and quality just quietly erode. These notes are about dragging that silent regression into view with instrumentation, and tuning Context Editing until enabling it no longer costs you money.

Suspect First That the Clear Boundary Doesn't Match the Meaning Boundary

clear_tool_uses_20250919 clears the oldest tool results first. The trouble is that the line between "results Claude no longer needs" and "results still informing its next decision" can't be measured in tokens. If keep (the number of tool uses retained) is too small, results you still want to reference get wiped.

You can tell whether you're in a re-run loop just by plotting two series over time: the count of duplicate calls with the same tool and arguments, and the cache hit rate before and after each clear event. If duplicates rise and the cache is being rebuilt frequently, your clearing is too aggressive.
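
The two series described above can be computed from logs you likely already have. What follows is a minimal sketch, not the author's instrumentation: `call_fingerprint`, `duplicate_call_count`, and `cache_hit_rate` are hypothetical names introduced here. The hit-rate definition (cache reads divided by cache reads plus cache writes) is one reasonable choice rather than an official metric. The sketch also assumes each request's usage was logged as a dict with `cache_read` and `cache_write` keys.

```python
import hashlib
import json
from collections import Counter


def call_fingerprint(name, tool_input):
    """Stable digest of a tool call: same tool and same arguments give the same hash."""
    # sort_keys makes the digest independent of argument order in the dict.
    canonical = json.dumps({"name": name, "input": tool_input}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def duplicate_call_count(tool_calls):
    """Count calls that repeat an earlier (name, input) pair.

    tool_calls is a list of (name, input_dict) tuples in call order.
    """
    seen = Counter(call_fingerprint(name, args) for name, args in tool_calls)
    return sum(count - 1 for count in seen.values() if count > 1)


def cache_hit_rate(usage_rows):
    """Share of cache traffic that was a read (assumed definition, see lead-in).

    usage_rows is a list of dicts with "cache_read" and "cache_write" token counts.
    """
    read = sum(row["cache_read"] for row in usage_rows)
    write = sum(row["cache_write"] for row in usage_rows)
    total = read + write
    return read / total if total else 0.0
```

Computing `duplicate_call_count` per session and `cache_hit_rate` over a window of requests before and after each clear event gives the two lines to plot. Hashing the canonical JSON rather than comparing raw strings means a search with the same arguments in a different key order still counts as a duplicate.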

# Pull the actual clear and cache numbers out of a Context Editing response.
# Goal: log "how much the clear saved" and "what the broken cache cost" on the same line.
import json

import anthropic

client = anthropic.Anthropic(api_key="YOUR_API_KEY")

def call_with_context_editing(messages, tools):
    resp = client.beta.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4096,
        messages=messages,
        tools=tools,
        betas=["context-management-2025-06-27"],
        context_management={
            "edits": [{
                "type": "clear_tool_uses_20250919",
                # Start clearing once the prompt exceeds 30,000 input tokens.
                "trigger": {"type": "input_tokens", "value": 30000},
                # Always keep the 3 most recent tool use/result pairs.
                "keep": {"type": "tool_uses", "value": 3},
                # Skip the clear unless it would free at least 5,000 tokens.
                "clear_at_least": {"type": "input_tokens", "value": 5000},
            }]
        },
    )
    u = resp.usage
    # Applied edits are reported on the response object itself, not inside usage.
    applied = getattr(resp, "context_management", None)
    print(json.dumps({
        "input_tokens": u.input_tokens,
        # "or 0" guards against the SDK returning None for these fields.
        "cache_read": getattr(u, "cache_read_input_tokens", 0) or 0,
        "cache_write": getattr(u, "cache_creation_input_tokens", 0) or 0,
        "context_edits": str(applied),
    }))
    return resp

If cache_read_input_tokens stays near zero while cache_creation_input_tokens keeps climbing, the cache prefix is breaking on every clear. That is exactly what "lighter but more expensive" looks like.
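
To put a rough number on "lighter but more expensive", the sketch below compares what one clear saves against what the broken cache prefix costs. It is a simplified model under stated assumptions, not a billing formula. The function names are hypothetical. The default multipliers (cache writes at 1.25x the base input price, cache reads at 0.10x) are assumed from the 5-minute cache pricing and should be checked against current rates. It also assumes everything after the first cleared result has to be re-written to the cache once, and it ignores the extra tokens from any re-run tool calls.

```python
def clear_net_savings(cleared_tokens, surviving_tokens, turns,
                      cache_write_mult=1.25, cache_read_mult=0.10):
    """Net savings of one clear, in base-input-token equivalents.

    cleared_tokens:   tokens removed by the clear
    surviving_tokens: tokens after the clear point that must be re-cached
    turns:            requests from the clear (inclusive) until the next cache break
    A positive result means the clear paid for itself over those turns.
    """
    # Without the clear, the cleared tokens would be cache-read on every turn.
    saved = cleared_tokens * cache_read_mult * turns
    # With the clear, the surviving tokens are written once instead of read once.
    penalty = surviving_tokens * (cache_write_mult - cache_read_mult)
    return saved - penalty


def break_even_turns(cleared_tokens, surviving_tokens,
                     cache_write_mult=1.25, cache_read_mult=0.10):
    """Number of turns a clear must survive before it stops losing money."""
    if cleared_tokens <= 0:
        return float("inf")
    penalty = surviving_tokens * (cache_write_mult - cache_read_mult)
    return penalty / (cleared_tokens * cache_read_mult)
```

Under these default multipliers the break-even works out to 11.5 times the ratio of surviving to cleared tokens. A clear that frees little while invalidating a large prefix therefore needs many quiet turns to pay off, and any duplicate tool calls it triggers push the real figure further into the red.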
