●MODEL — Claude Sonnet 5 is now the default model across all plans, the most agentic Sonnet yet●PRICE — Sonnet 5 launches at $2/$10 per million tokens, available through August 31●CODE — Claude Code adopts Sonnet 5 by default with a native 1M-token context window●GATEWAY — A self-hosted Claude apps gateway arrives for Amazon Bedrock and Google Cloud (SSO, policy, cost)●CHROME — Claude in Chrome is now generally available with background notifications and draft PR handoff●ENTERPRISE — Enterprise gains richer admin analytics, model-level entitlements, and spend alerts
Don't Let the Opus 4.7 Fast Mode Retirement (July 24) Kill Your Unattended Jobs
claude-opus-4-7 fast mode retires on 2026-07-24, and speed: fast starts throwing errors. Here's how to keep unattended pipelines from breaking silently: mechanically detect where fast mode is used, add a fail-closed runtime guard, and migrate to 4.8 with working code.
On a Saturday morning, going back through the auto-posting logs for the sites I run as an indie developer, I realized one scheduled job had been running for months with speed: "fast" hardcoded into it. It's code nobody touches. And that's exactly why it will fall over the moment the parameter's contract changes, with no one watching.
On 2026-07-24, fast mode for claude-opus-4-7 retires. After that date, any request that passes speed: "fast" to claude-opus-4-7 will error out. The model ID itself stays alive, so this won't show up on a model-retirement checklist. And yet one specific parameter combination quietly expires. The more your work runs unattended, the more vulnerable it is to this kind of change: the model survives, but the setting dies.
Below, I'll walk through the steps for getting past July 24 without incident, with code you can actually run: finding the usages, deciding where to migrate, guarding at runtime, and measuring after the move, as one continuous flow.
What Happens on July 24 — speed: "fast" Becomes an Error
First, let's get the shape of the change exactly right. What retires isn't the model; it's the fast mode speed setting on claude-opus-4-7. Per Anthropic's notice, after July 24 passing speed: "fast" to claude-opus-4-7 will error, and if you want fast mode you'll need to move to Opus 4.8's fast mode.
Here's the impact at a glance.
| Call pattern | Behavior after July 24 |
|---|---|
| model=claude-opus-4-7 + speed=fast | Error (migration required) |
| model=claude-opus-4-7 (no speed) | Works normally |
| model=claude-opus-4-8 + speed=fast | Runs in fast mode |
The awkward part is that in most codebases speed is a write-once-and-forget parameter. You add it to a step where you want to shave latency, then never revisit it. In an interactive app you'd notice the exception the instant it fires, but in a nightly batch or a weekly scheduled job, failed runs just pile up in a log nobody reads. I've had unattended jobs stall silently more than once, and each time the real cost was how late I noticed.
So the goal of this migration isn't simply swapping claude-opus-4-7 for claude-opus-4-8. It's two things: know where you use fast mode without relying on human memory, and make the system fail safe at runtime even if a spot slips through.
Take Inventory First — Detect Fast Mode Usage Mechanically
Relying on memory or a single grep is bound to miss something, because speed may arrive through a variable or hide in the default argument of a shared wrapper function. So instead of a plain string search, we set up a check that surfaces the places where Opus 4.7 and fast appear in the same call.
Start with a lightweight first-pass sweep across the whole repo.
```shell
# First-pass screen: lines that mention opus-4-7, or where speed and fast sit close together.
# It can't see values passed through variables, so treat it as "narrowing the field."
grep -rniE "opus-4-7|speed.{0,20}fast|fast.{0,20}speed" \
  --include="*.py" --include="*.ts" --include="*.js" \
  --include="*.json" --include="*.yaml" --include="*.yml" . \
  | grep -viE "node_modules|/dist/|/build/"
```
Once the first pass narrows things down, judge each call precisely. Here we target Python and use the AST to pick out calls that pass speed="fast" together with either a model string containing opus-4-7 or a model value the scanner can't read statically (a variable, or no model argument at all, as in a wrapper call). Reading the syntax tree instead of relying on string proximity cuts false positives.
```python
# fast_mode_scan.py - detect the co-occurrence of Opus 4.7 + speed="fast" via the AST
import ast
import pathlib
import sys


def literal(node):
    """Return the string value if the node is a constant string, else None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


def scan_file(path: pathlib.Path):
    """Scan one file; return (line, model, speed) for each risky call."""
    hits = []
    try:
        tree = ast.parse(path.read_text(encoding="utf-8"))
    except (SyntaxError, UnicodeDecodeError, ValueError):
        return hits  # leave unparseable files to the first-pass screen
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        model_val, speed_val = None, None
        for kw in node.keywords:
            if kw.arg == "model":
                model_val = literal(kw.value)
            elif kw.arg == "speed":
                speed_val = literal(kw.value)
        # Even if model comes via a variable, flag it when speed="fast" is present.
        risky_model = model_val is not None and "opus-4-7" in model_val
        risky_speed = speed_val == "fast"
        if risky_speed and (risky_model or model_val is None):
            hits.append((node.lineno, model_val or "<variable>", speed_val))
    return hits


def main(root="."):
    total = 0
    for path in pathlib.Path(root).rglob("*.py"):
        if any(p in path.parts for p in ("node_modules", "dist", "build", ".venv")):
            continue
        for lineno, model, speed in scan_file(path):
            print(f"{path}:{lineno} model={model} speed={speed}")
            total += 1
    print(f"\n{total} location(s) total")
    # Wire into CI: exit non-zero on any hit so it can't be ignored.
    return 1 if total else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1] if len(sys.argv) > 1 else "."))
```
A run might look like this:
```text
services/summarizer.py:88 model=claude-opus-4-7 speed=fast
jobs/nightly_digest.py:41 model=<variable> speed=fast

2 location(s) total
```
For a hit reported as model=<variable>, like the second one, check the variable by hand. If a shared wrapper's default holds claude-opus-4-7, that single line may actually fan out to several jobs. I wire this kind of scan into CI and fail the build on any hit. Just turning the inventory from "a chore I do when I remember" into "a check that runs every time" takes a lot of the anxiety out of migration gaps.
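For the model=<variable> hits, a small helper can at least list the module-level string constants that mention opus-4-7, which is where a shared default model name often lives. This is a minimal sketch under my own assumptions: find_opus47_constants is a hypothetical name, it only inspects top-level NAME = "..." assignments in one file, and it does not follow imports, environment variables, or config files, so the hand check stays in the loop.

```python
# resolve_model_constants.py - hypothetical helper: list top-level string constants
# that mention opus-4-7, to speed up tracing "<variable>" hits by hand.
import ast


def find_opus47_constants(source: str) -> dict[str, str]:
    """Return {name: value} for top-level NAME = "..." assignments containing opus-4-7."""
    found: dict[str, str] = {}
    tree = ast.parse(source)
    for node in tree.body:  # top level only; nested scopes are left to manual review
        if isinstance(node, ast.Assign):
            targets, value = node.targets, node.value
        elif isinstance(node, ast.AnnAssign) and node.value is not None:
            targets, value = [node.target], node.value
        else:
            continue
        if not (isinstance(value, ast.Constant) and isinstance(value.value, str)):
            continue
        if "opus-4-7" not in value.value:
            continue
        for target in targets:  # handles chained assignments like A = B = "..."
            if isinstance(target, ast.Name):
                found[target.id] = value.value
    return found


if __name__ == "__main__":
    sample = "\n".join([
        'DEFAULT_MODEL = "claude-opus-4-7"',
        'FALLBACK: str = "claude-opus-4-8"',
        "def run(client):",
        '    return client.messages.create(model=DEFAULT_MODEL, speed="fast")',
    ])
    print(find_opus47_constants(sample))  # {'DEFAULT_MODEL': 'claude-opus-4-7'}
```

Run it over the files that produced <variable> hits, then search for the returned names to see how many call sites each constant feeds.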
Decide Where to Migrate — Opus 4.8 Fast Mode, or Drop speed
Once you've found the spots, decide the destination per location. The reflex is to swap everything to 4.8 fast mode, but it's worth pausing. This is also a chance to ask whether that step needs fast mode at all.
| Option | Good fit for | Watch out for |
|---|---|---|
| Migrate to Opus 4.8 + speed=fast | Interactive UI, spots where low latency drives perceived quality | Verify 4.8 fast mode pricing and behavior by measuring |
| Drop speed, run at normal speed | Nightly batches, weekly rollups where wait time doesn't matter | Removing fast can shift output tendencies; do a regression check |
| Split by step | Pipelines mixing interactive and batch stages | Centralize config; leave no hardcoded values behind |
In my own setup, most unattended batches favor quiet stability over raw speed, so I took the chance to drop the speed setting entirely in many places. The split is simple: reserve fast mode for interactive moments where perceived speed actually matters. Allocating speed and effort per step pairs well with the approach in the article on tuning Claude's output cost and latency by assigning the effort parameter per stage; taking inventory of both together keeps your decisions consistent.
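One way to follow the "centralize config" advice is to keep every step's model and speed in a single mapping and have call sites look them up by step name. The sketch below uses hypothetical step names (chat_reply, nightly_digest, weekly_rollup) and a hypothetical options_for helper; the only point is that speed appears in exactly one file, so the next retirement becomes a one-line change.

```python
# step_config.py - a sketch: every step's model/speed choice lives in one place.
# Step names and model choices are hypothetical examples, not recommendations.
from types import MappingProxyType

# Read-only at the top level; options_for hands out copies of the inner dicts.
STEP_OPTIONS = MappingProxyType({
    # Interactive step where perceived latency matters: fast mode on Opus 4.8.
    "chat_reply": {"model": "claude-opus-4-8", "speed": "fast"},
    # Unattended batches: no speed key at all, so they run at normal speed.
    "nightly_digest": {"model": "claude-opus-4-8"},
    "weekly_rollup": {"model": "claude-opus-4-8"},
})


def options_for(step: str) -> dict:
    """Return a copy of a step's options; an unregistered step fails loudly."""
    try:
        return dict(STEP_OPTIONS[step])
    except KeyError:
        raise KeyError(f"no model settings registered for step {step!r}") from None
```

A call site then reads client.messages.create(max_tokens=1024, messages=msgs, **options_for("nightly_digest")), and changing a step's speed no longer means hunting for call sites.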
Guard Fail-Closed at Runtime — a Capability Probe
However carefully you do the migration, misses never drop to zero. So we add one layer to the code that fails safe when it meets an expired setting. Here "safe" doesn't mean swallowing the exception and silently carrying on. It means detecting the combination explicitly, dropping to an alternate setting, and recording that fact in the log.
```python
# safe_speed.py - a guard that neutralizes retired speed settings at runtime
from __future__ import annotations

import datetime as dt
import logging

logger = logging.getLogger("model_guard")

# Opus 4.7 fast mode retirement date, compared against the UTC date
FAST_MODE_SUNSET = dt.date(2026, 7, 24)


def resolve_call_options(model: str, speed: str | None,
                         today: dt.date | None = None) -> dict:
    """Normalize model and speed into a safe combination.

    - On/after the sunset, if opus-4-7 + fast arrives, drop fast and continue.
    - Log every fallback; never degrade silently.
    """
    # Judge the date in UTC, not in the job host's local time zone.
    today = today or dt.datetime.now(dt.timezone.utc).date()
    opts = {"model": model}
    wants_fast = speed == "fast"
    is_legacy_fast = "opus-4-7" in model and wants_fast
    if is_legacy_fast and today >= FAST_MODE_SUNSET:
        # Expired combination: drop fast and continue at normal speed.
        logger.warning(
            "opus-4-7 fast mode retired on %s. Dropping speed and continuing "
            "(the caller's migration is incomplete).",
            FAST_MODE_SUNSET.isoformat(),
        )
        return opts  # no speed attached
    if speed is not None:
        opts["speed"] = speed
    return opts


if __name__ == "__main__":
    # Usage (kept under __main__ so importing this module has no side effects)
    from anthropic import Anthropic

    client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    opts = resolve_call_options("claude-opus-4-7", "fast")
    # On/after the sunset: opts == {"model": "claude-opus-4-7"}, with speed stripped
    response = client.messages.create(
        max_tokens=1024,
        messages=[{"role": "user", "content": "Hello"}],  # placeholder prompt
        **opts,
    )
```
Note that stripping fast here is a stopgap to keep jobs from stopping, not the correct end state. Each logger.warning signals an incomplete migration, so keep those logs under monitoring and formally migrate the flagged spots later.
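To keep those warnings from scrolling past unread, one option is to attach a handler to the model_guard logger that collects them for the duration of a job and turns them into a summary and an exit code at the end. This is a minimal sketch using only the standard logging module; GuardWarningCounter, attach_counter, and summarize are hypothetical names, and whether a job should exit non-zero after finishing its work is a judgment call for your scheduler.

```python
# guard_monitor.py - a sketch: collect "model_guard" warnings during a job run
# so an unattended job surfaces them instead of leaving them in an unread log.
import logging


class GuardWarningCounter(logging.Handler):
    """Collects WARNING-and-above messages emitted by one logger."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def attach_counter(logger_name: str = "model_guard") -> GuardWarningCounter:
    """Attach a fresh counter to the named logger and return it."""
    counter = GuardWarningCounter()
    logging.getLogger(logger_name).addHandler(counter)
    return counter


def summarize(counter: GuardWarningCounter) -> int:
    """Print a summary and return an exit code: 1 if the guard fired, else 0."""
    if not counter.messages:
        return 0
    print(f"{len(counter.messages)} guard warning(s): migration incomplete")
    for msg in counter.messages:
        print("  -", msg)
    return 1
```

Call counter = attach_counter() at the start of a job and sys.exit(summarize(counter)) at the very end; the scheduler then flags the run without the guard ever interrupting the work itself.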
A Migration Shim for Unattended Pipelines
For an interactive app, the single guard above is enough. But when several scheduled jobs share one client, calling resolve_call_options at every call site isn't realistic. So put a thin wrapper around the client so that every call passes through the guard automatically.
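```python
# guarded_client.py - a wrapper that guards messages.create across the board
from anthropic import Anthropic

from safe_speed import resolve_call_options  # reuse the guard above


class GuardedMessages:
    def __init__(self, inner):
        self._inner = inner

    def create(self, *, model, speed=None, **kwargs):
        opts = resolve_call_options(model, speed)
        return self._inner.create(**opts, **kwargs)

    def __getattr__(self, name):
        # Any other method on messages passes through unguarded.
        return getattr(self._inner, name)


class GuardedClient:
    """Keep the same .messages.create interface as the existing client."""

    def __init__(self, inner):
        self._inner = inner
        self.messages = GuardedMessages(inner.messages)

    def __getattr__(self, name):
        # Delegate every other attribute to the wrapped client.
        return getattr(self._inner, name)


# Existing code only needs to swap the client
client = GuardedClient(Anthropic())
```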
With this shape, even if some jobs haven't been migrated, you at least avoid a simultaneous outage after July 24. Each job keeps running at normal speed, and the warning logs tell you which spots still need work. What's scary in unattended operation isn't the failure itself; it's not noticing the failure. Convert a stoppage from a single error into a monitorable stream of warnings. That bit of extra effort pays off later.
As broader insurance against a model becoming unavailable without warning, the state-machine design in the article on designing a router for the day a model becomes unavailable without warning is worth a look. Treating the fast mode retirement as one instance within that larger frame keeps your response from being ad hoc.
What to Check After Migrating — Measure Latency and Cost
Don't assume "I moved to 4.8 fast mode, so speed is the same." Both the model and the mode changed, so both the feel and the bill can change. Every time I migrate, I record latency and output token counts across a handful of representative calls, then line up before and after.
```python
# measure.py - a simple before/after comparison on the same prompt
import statistics
import time


def measure(client, model, speed, prompt, runs=5):
    latencies, out_tokens = [], []
    for _ in range(runs):
        opts = {"model": model}
        if speed:
            opts["speed"] = speed
        start = time.perf_counter()
        r = client.messages.create(
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
            **opts,
        )
        latencies.append(time.perf_counter() - start)
        out_tokens.append(r.usage.output_tokens)
    return {
        "p50_sec": round(statistics.median(latencies), 2),
        "avg_out_tokens": round(statistics.mean(out_tokens)),
    }


# Measure the same prompt before and after and lay them side by side
prompt = "Summarize the following changelog in three lines: ..."
# The "before" run only works while 4.7 fast mode is still available (before July 24).
# before = measure(old_client, "claude-opus-4-7", "fast", prompt)
# after = measure(new_client, "claude-opus-4-8", "fast", prompt)
```
What to look at is the median latency (p50) and the trend in output tokens. If output tokens grew, your actual bill rises even at the same per-token price. If normal-speed latency already meets the requirement, dropping speed can be cheaper. Rather than judging on unit price or assumptions alone ("fast mode is faster, so it must cost more"), one round of measurement on your own prompt keeps you from paying for speed you don't need.
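The "same price, bigger bill" effect is plain arithmetic. The sketch below computes per-call cost from token counts and per-million-token prices; the prices in the example are placeholders chosen to keep the numbers round, not Anthropic's actual rates for any model or mode, so substitute the currently published figures.

```python
# cost_compare.py - back-of-the-envelope per-call cost from token counts.
# Prices are $ per 1M tokens and are placeholders you must replace with real rates.


def cost_per_call(input_tokens: int, output_tokens: int,
                  input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Dollar cost of one call given token counts and $/1M-token prices."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


def relative_change(before: float, after: float) -> float:
    """Fractional change between two costs, e.g. 0.15 means +15%."""
    return (after - before) / before


if __name__ == "__main__":
    # Same placeholder prices before and after; only the output length changes.
    before = cost_per_call(2_000, 400, 10.0, 50.0)
    after = cost_per_call(2_000, 520, 10.0, 50.0)
    print(f"${before:.3f} -> ${after:.3f} ({relative_change(before, after):+.0%})")
```

With 2,000 input tokens and placeholder prices of $10/$50 per million, output growing from 400 to 520 tokens (+30%) moves the per-call cost from $0.040 to $0.046, about +15%, with no change in unit price.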
Common Pitfalls
The migration itself is less likely to trip you than what surrounds it, so here are just the two I actually hit.
One is a speed="fast" hiding in a shared wrapper's default argument. No individual call writes speed, yet the wrapper's def create_message(..., speed="fast") applies it uniformly. Grep the call sites alone and you find nothing; you only learn about it when everything falls over on the retirement date. Checking the defaults too is the reliable move.
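Because call-site scans miss this case by construction, it's worth a second pass over function definitions. A minimal sketch with Python's ast module: find_fast_defaults (a hypothetical name) reports any def or async def whose speed parameter defaults to the literal "fast", covering positional and keyword-only parameters. It won't catch defaults set through functools.partial, config dicts, or environment variables.

```python
# default_scan.py - a sketch that flags speed="fast" hidden in function defaults,
# which call-site scans never see.
import ast


def _str_const(node):
    """Return the string value of a constant node, else None."""
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        return node.value
    return None


def find_fast_defaults(source: str) -> list[tuple[int, str]]:
    """Return sorted (lineno, function name) pairs for defs whose speed defaults to "fast"."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        args = node.args
        # Positional defaults line up with the *last* len(defaults) positional params.
        positional = args.posonlyargs + args.args
        pairs = list(zip(positional[len(positional) - len(args.defaults):], args.defaults))
        # Keyword-only defaults are parallel to kwonlyargs; None means "no default".
        pairs += [(a, d) for a, d in zip(args.kwonlyargs, args.kw_defaults) if d is not None]
        for arg, default in pairs:
            if arg.arg == "speed" and _str_const(default) == "fast":
                hits.append((node.lineno, node.name))
    return sorted(hits)
```

The same loop could be folded into scan_file in fast_mode_scan.py so that one CI check covers both call sites and defaults.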
The other is judging the retirement date in local time. Near the date boundary, if you don't evaluate conservatively in UTC, your job's location and the server-side switch can drift apart and you get behavior a day off from what you expected. That's exactly why the guard above treats the sunset as a date constant and confines the boundary to one place.
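A concrete case shows how far apart the two readings can be. This sketch assumes, as the guard above intends, that the switch follows the UTC calendar date; past_sunset is a hypothetical helper that insists on a timezone-aware datetime so the host's local zone can't leak in.

```python
# sunset_utc.py - judge the sunset boundary in UTC, never the job host's local date.
import datetime as dt

FAST_MODE_SUNSET = dt.date(2026, 7, 24)  # same constant as in safe_speed.py


def past_sunset(now: dt.datetime) -> bool:
    """True if the timezone-aware `now` falls on or after the sunset date in UTC."""
    if now.tzinfo is None:
        raise ValueError("pass an aware datetime, e.g. dt.datetime.now(dt.timezone.utc)")
    return now.astimezone(dt.timezone.utc).date() >= FAST_MODE_SUNSET


if __name__ == "__main__":
    utc_minus_7 = dt.timezone(dt.timedelta(hours=-7))
    evening = dt.datetime(2026, 7, 23, 20, 0, tzinfo=utc_minus_7)
    print(evening.date() >= FAST_MODE_SUNSET)  # False: the local calendar says "not yet"
    print(past_sunset(evening))                # True: it is already 03:00 UTC on July 24
```

A job scheduled for 20:00 local time on July 23 in a UTC-7 zone would, by its own calendar, still think fast mode is available, while in UTC it is already past the boundary.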
Wrapping Up
Run this article's fast_mode_scan.py against your repo once, right now. Zero hits and you can meet July 24 with peace of mind; any hits and you have the exact list of spots to migrate. Detect first, then add one fail-closed guard, in that order, and you'll cross it without scrambling.
The more your work runs unattended, the more exposed it is to this kind of quiet expiry. I can't claim to have every job fully accounted for myself. Even so, shifting bit by bit toward systems that rely on checks rather than memory has cut down the number of mornings I open a log and go pale. If you run things the same way, I hope this helps.
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.