●MODEL — Claude Sonnet 5 is now the default model across all plans, the most agentic Sonnet yet●PRICE — Sonnet 5 launches at $2/$10 per million tokens, available through August 31●CODE — Claude Code adopts Sonnet 5 by default with a native 1M-token context window●GATEWAY — A self-hosted Claude apps gateway arrives for Amazon Bedrock and Google Cloud (SSO, policy, cost)●CHROME — Claude in Chrome is now generally available with background notifications and draft PR handoff●ENTERPRISE — Enterprise gains richer admin analytics, model-level entitlements, and spend alerts●MODEL — Claude Sonnet 5 is now the default model across all plans, the most agentic Sonnet yet●PRICE — Sonnet 5 launches at $2/$10 per million tokens, available through August 31●CODE — Claude Code adopts Sonnet 5 by default with a native 1M-token context window●GATEWAY — A self-hosted Claude apps gateway arrives for Amazon Bedrock and Google Cloud (SSO, policy, cost)●CHROME — Claude in Chrome is now generally available with background notifications and draft PR handoff●ENTERPRISE — Enterprise gains richer admin analytics, model-level entitlements, and spend alerts
claude-opus-4-7 Fast Mode Retires on 7/24 — Guard It at the Capability-Pair Level
Opus 4.7 fast mode is being retired on 2026-07-24. The model ID stays valid while only the speed parameter starts failing, so model-ID audits miss it. Here's a capability-pair preflight with automatic migration.
The nastiest failures in an unattended pipeline are the ones where the model is alive but only some of your requests break. The Opus 4.7 fast mode retirement scheduled for 2026-07-24 surfaces exactly this way. The claude-opus-4-7 model ID stays valid afterward, but any request that passes speed: "fast" starts returning an error.
As an indie developer running daily auto-publishing across several sites, I've caught model-ID retirements with an audit script before, yet I nearly missed one that was really a change to a speed option. Most migration checklists only ask "is the model ID I use still alive?" — they don't ask the layer below that: "is that speed option still valid for that model?"
Here we treat model and speed as a single "capability pair" and build a preflight that verifies the pair with a real probe before the main work begins. It's a capability-pair guard that keeps an unattended pipeline running past 7/24.
What actually changes — the model stays, only the speed option retires
Let's pin down the facts. What retires is Opus 4.7's fast mode, not the model itself.
Element
Before 7/24
After 7/24
claude-opus-4-7 (no speed)
Available
Available (unchanged)
claude-opus-4-7 + speed: "fast"
Available
Error
claude-opus-4-8 + speed: "fast"
Available
Available (migration target)
So any code calling Opus 4.7 withoutspeed is untouched. Only calls that explicitly set speed: "fast" are affected. The batches most likely to break quietly are exactly the ones that lean on fast mode for low latency.
That is why this slips through. An audit that asks "am I using a retired model ID?" marks claude-opus-4-7 as valid and waves it through. The speed option sits outside the audit's field of view.
Why a model-ID audit can't see it
Most migration audits look like this:
# A typical model-ID audit — it misses this retirement entirelyRETIRED_MODEL_IDS = { "claude-3-opus-20240229", "claude-3-5-sonnet-20240620", # ... list of retired model IDs}def audit_model_id(model: str) -> None: """Check that the model ID in use isn't on the retired list.""" if model in RETIRED_MODEL_IDS: raise RuntimeError(f"Using a retired model: {model}") # claude-opus-4-7 is not retired, so this passes
This audit sees the world only in units of "model ID." claude-opus-4-7 is alive, so it passes. But what actually fails is the combination (claude-opus-4-7, fast). The unit of analysis is off by one level.
The fix is to make the check one notch finer — from "model ID" to "model × speed." Everything below builds on that capability pair.
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦You'll be able to catch the 'why does only this one job fail?' state before the deadline, even while your model-ID audit reports everything green
✦You'll implement a preflight that treats model-plus-speed as a single capability pair, so an unattended pipeline doesn't stall at the 7/24 boundary
✦You'll switch to a setup that auto-migrates to Opus 4.8 fast mode on failure and measures the cost and latency delta afterward
Secure payment via Stripe · Cancel anytime
✦
Unlock This Article
Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.
First, collect the allowed combinations and the deadline-bearing ones in a single place. The key is to store the deadline as a fact with an expiry date. Scatter dates through your code and you'll be grepping in a panic on the morning of 7/24.
from dataclasses import dataclassfrom datetime import datefrom typing import Optional@dataclass(frozen=True)class Capability: """A model + speed combination (a capability pair) and its retirement date.""" model: str speed: Optional[str] # None means no speed option retires_on: Optional[date] # None means no scheduled retirement successor: Optional["Capability"] = None # migration target# Registry of capability pairs. Facts (dates, targets) live in one place.OPUS_48_FAST = Capability("claude-opus-4-8", "fast", None)CAPABILITIES = { ("claude-opus-4-7", "fast"): Capability( model="claude-opus-4-7", speed="fast", retires_on=date(2026, 7, 24), successor=OPUS_48_FAST, ), ("claude-opus-4-7", None): Capability("claude-opus-4-7", None, None), ("claude-opus-4-8", "fast"): OPUS_48_FAST,}def lookup(model: str, speed: Optional[str]) -> Optional[Capability]: return CAPABILITIES.get((model, speed))
With a registry, "the pair I use now — when does it retire, and to what?" is a single lookup. The date-comparison logic stays in one place, so when a similar retirement hits another model you can extend the same skeleton.
Warn by deadline before you ever call
With the registry in place, you can decide statically before any real call. Run this first thing, right after an unattended pipeline starts.
from datetime import datedef check_deadline(model: str, speed: Optional[str], today: Optional[date] = None, warn_within_days: int = 14) -> None: """Check a pair's retirement date. Past due raises; approaching warns.""" today = today or date.today() cap = lookup(model, speed) if cap is None: # Not in the registry = unknown combination. Stop explicitly when unattended. raise RuntimeError(f"Unregistered capability pair: model={model}, speed={speed}") if cap.retires_on is None: return # no scheduled retirement if today >= cap.retires_on: raise RuntimeError( f"{model} + speed={speed} retired on {cap.retires_on}. " f"Migrate to: {cap.successor.model} + speed={cap.successor.speed}" ) remaining = (cap.retires_on - today).days if remaining <= warn_within_days: print(f"⚠️ {model} + speed={speed} retires in {remaining} days " f"({cap.retires_on}). Target: {cap.successor.model}")# Example (assume it runs on 2026-07-20)check_deadline("claude-opus-4-7", "fast", today=date(2026, 7, 20))# → ⚠️ claude-opus-4-7 + speed=fast retires in 4 days (2026-07-24). Target: claude-opus-4-8
This prevents the "oh, it's already past 7/24" surprise. But a static deadline check alone isn't enough. A date is only an announcement; an early cutover or a same-day rollout drift is possible. So we confirm with a real probe.
Confirm with a real preflight probe
The surest signal is to send one tiny request using the exact same capability pair as the main work. If it succeeds, that combination is alive today, in this environment. Not trusting the announced date but judging by the actual response is the whole point of a preflight.
import anthropicclient = anthropic.Anthropic()def probe_capability(model: str, speed: Optional[str]) -> bool: """Send a minimal request with the same pair as the main work to test it.""" kwargs = { "model": model, "max_tokens": 1, # minimize the cost "messages": [{"role": "user", "content": "ok"}], } if speed is not None: kwargs["speed"] = speed try: client.messages.create(**kwargs) return True except anthropic.BadRequestError: # An invalid speed option lands here (4xx) return False except anthropic.APIStatusError as e: # Treat transient 5xx as "unknown" and let the caller retry raise RuntimeError(f"Probe inconclusive (possible transient failure): {e}") from e# A max_tokens=1 probe is orders of magnitude cheaper than one real run
Separating 4xx errors (BadRequestError) from transient 5xx failures is the crux. A retired speed option comes back as the former, so you can conclude "migrate." Mistake the latter for retirement and a mere network blip triggers an unnecessary migration. Keep the inconclusive cases inconclusive and defer to your outer retry layer.
Self-heal by migrating on detection
Combine the deadline check with the real probe, and when the pair has retired, switch to the registry's successor. In unattended operation, always log the switch so you can trace it later.
def resolve_capability(model: str, speed: Optional[str]) -> tuple[str, Optional[str]]: """Resolve the pair to use. Return the successor if it has retired.""" cap = lookup(model, speed) if cap is None: raise RuntimeError(f"Unregistered capability pair: {model}/{speed}") # Judge by the static deadline first (faster and free versus a probe) if cap.retires_on and date.today() >= cap.retires_on: if cap.successor is None: raise RuntimeError(f"{model}/{speed} retired but no successor defined") print(f"↪ Past deadline, migrating: {model}/{speed} → " f"{cap.successor.model}/{cap.successor.speed}") return cap.successor.model, cap.successor.speed # Even before the deadline, a failing probe means an early retirement if not probe_capability(cap.model, cap.speed): if cap.successor is None: raise RuntimeError(f"{model}/{speed} retired; no successor defined") print(f"↪ Probe failed, migrating: {model}/{speed} → " f"{cap.successor.model}/{cap.successor.speed}") return cap.successor.model, cap.successor.speed return cap.model, cap.speed# Resolve once at the pipeline entranceMODEL, SPEED = resolve_capability("claude-opus-4-7", "fast")# The rest of the work uses MODEL / SPEED
Built this way, the pipeline slides over to Opus 4.8 fast mode across 7/24 without a human in the loop. And if a same-day rollout drifts so the probe still passes just after the date, it keeps using the original pair. Judging by both the announcement and the measurement is what makes unattended operation calm.
When even the successor is unavailable, or the probe keeps returning inconclusive, how you should behave depends on the nature of the job.
Situation
Recommendation
Why
Billing / payment-adjacent work
Fail-closed (stop)
Stopping and alerting beats running on the wrong model
Re-runnable work like draft generation
Drop the speed option, continue
You can still produce output at standard speed without fast
Pre-deadline, transient probe failure
Defer to the outer retry
Don't mistake a network blip for retirement
For re-runnable draft-generation jobs, I use the soft failure: if fast retires, drop the speed option, continue at standard speed, and just leave a migration log. Latency goes up, but that hurts less than stalling the whole pipeline. The whole decision collapses to one question: is this job safer stopped than finished on the wrong footing?
The delta to check after migrating
After moving to Opus 4.8 fast mode, don't stop at "it works" — take one real measurement of cost and latency. Speed behavior and token pricing can shift across model generations, and that can nudge your budget-alert thresholds off.
import timedef measure(model: str, speed: Optional[str], prompt: str, n: int = 5) -> dict: """A light measurement of latency and output tokens for a before/after compare.""" latencies, out_tokens = [], [] for _ in range(n): kwargs = {"model": model, "max_tokens": 256, "messages": [{"role": "user", "content": prompt}]} if speed is not None: kwargs["speed"] = speed t0 = time.perf_counter() resp = client.messages.create(**kwargs) latencies.append(time.perf_counter() - t0) out_tokens.append(resp.usage.output_tokens) latencies.sort() return { "p50_latency_s": round(latencies[n // 2], 3), "avg_output_tokens": round(sum(out_tokens) / n, 1), }# Run the same prompt before and after to decide whether to retune thresholds
One measurement lets you catch second-order issues — "I switched to fast but it's not as fast as I expected," "the unit price changed and my monthly forecast is off" — before they creep along quietly. In unattended operation, not skipping the post-migration observation matters more than the migration itself.
Common pitfalls
Scattering speed across env vars and job definitions: if every job carries its own speed="fast", a retirement forces a cross-cutting fix. Concentrate capability resolution in one entrance function.
Probing with a different pair than the main work: probe at standard speed, judge it "alive," then break on fast in the real work. Always probe with the exact (model, speed) the main work uses.
Mistaking transient failures for retirement: fold 5xx or timeouts into "retired" and every network blip triggers a needless migration. Base retirement only on 4xx.
Hard-stopping the moment the deadline passes: rollout drift can leave the old pair working for a while after the date. Use the deadline to warn, and let the real probe make the final stop decision — you avoid both misses and over-stopping.
Skipping post-migration measurement: "it works" isn't "it's optimal." Measuring latency and unit price once after migrating prevents false budget alerts.
Your next move
Grep your running code for speed="fast". If even one hit uses claude-opus-4-7, start by dropping this article's resolve_capability into your pipeline entrance. Rather than waiting for the deadline, guard at the level of the capability pair — that's the surest way I know to reach the morning of 7/24 without incident.
Share
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.