●BILLING — The Jun 15 change is now live: Agent SDK, headless runs, GitHub Actions, and third-party agents leave subscription limits for separate monthly credits ($20/$100/$200) metered at full API rates, no rollover●RETIRED — As of today, Sonnet 4 and Opus 4 are retired from the API; scripts referencing older models should switch to the latest generation such as Opus 4.8●EXPORT — Claude Fable 5 and Mythos 5 are suspended for all foreign nationals under a US export-control directive (Jun 12); Anthropic calls it a misunderstanding and is working to restore access●SAFE — Only the two new Mythos-class models are affected; every other model including Opus 4.8 keeps running normally●SUBAGENTS — Claude Code sub-agents can now spawn their own sub-agents (up to 5 levels), and Dynamic workflows arrived in research preview●INCIDENT — A Jun 5 outage raised error rates across claude.ai, the API, Claude Code, and Cowork, a reminder to design retries and fallbacks into automated runs
When a Model Disappears Without Warning: A State Machine for Retirement, Withdrawal, and Overload
A model can become unusable in hours for reasons that have nothing to do with a technical outage. This guide models three distinct flavors of 'unavailable'—retirement, withdrawal, and transient overload—as one availability state machine, with a router that keeps automated pipelines running. Working TypeScript and Python included.
One morning, before kicking off the automated publishing across my four sites, I opened the changelog and the official announcements as usual. One of the models I had been using for smoke tests the day before had been suspended within hours for reasons that were not a technical outage—withdrawn for all foreign-national users. There was no retirement email. The API had not started returning 429. The model name that answered yesterday simply stopped being accepted today.
What I run is a content-generation pipeline; no lives or payments depend on it. Even so, realizing that a headless job running overnight assumed one specific model name suddenly weighed on me. I did have a fallback, but that branch only anticipated 429 and 529, that is, transient overload. Permanent retirement and a sudden, externally driven withdrawal were lumped together inside the same fallback().
What I learned that day is that "unavailable" wears several different faces, and pouring them all into one exception handler leads you to make the wrong recovery decision. Transient overload returns in minutes. Hammering a withdrawn model every few minutes only piles up wasted failures. A retired model never comes back no matter how long you wait. This article designs an availability state machine that holds these three as distinct states, and a router built on top of it that changes behavior per state.
"Unavailable" Has Three Faces
Let me classify, by nature, the kinds of "unavailable" that stop automation. I settled on three.
The first is retirement. The vendor announces it in advance, and past a certain date the model is permanently no longer accepted. Today the older claude-sonnet-4 and claude-opus-4 retired from the API. This is predictable and the successor is known. Waiting won't help, so the right move is to switch to the successor the moment you detect it.
The second is withdrawal: not a technical outage, but a short-notice suspension driven by external factors such as policy, legal, or security decisions. There is no announced date, and recovery carries the uncertainty of "maybe it will come back eventually." The case I opened with was exactly this. Unlike retirement, no successor is waiting, so you need to decide whether to shift sideways to another logical role or temporarily scale that job down.
The third is overload: transient unavailability that recovers on its own within minutes to hours, such as 429 Too Many Requests or 529 Overloaded. Switch permanently here and you stay needlessly downgraded from the stronger model you actually wanted. The correct response is exponential backoff, then a return to the original once it recovers.
Handle all three in a single catch, and you get misjudgments: applying overload-style backoff endlessly to a withdrawn model, or permanently downgrading an overloaded model as if it were retired. That is why the states are held separately.
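For the overload case, "exponential backoff, then return to the original" can be sketched as below. This is a minimal version assuming "full jitter" (a random delay up to a doubling, capped ceiling); `backoffDelayMs`, `withBackoff`, and the default numbers are illustrative names and values, not taken from any SDK. The caller decides what counts as transient, so a retired or withdrawn model is never retried.

```typescript
// Sketch: capped exponential backoff with full jitter for transient overload.
interface BackoffOptions {
  baseMs: number;      // delay scale for the first retry
  capMs: number;       // upper bound on any single delay
  maxAttempts: number; // total tries, including the first
}

const DEFAULT_BACKOFF: BackoffOptions = { baseMs: 1_000, capMs: 60_000, maxAttempts: 5 };

// Full jitter: a uniform delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  opts: BackoffOptions = DEFAULT_BACKOFF,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(opts.capMs, opts.baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Retry `fn` only while `isTransient` says the failure is overload; any other
// failure (retired, withdrawn, unexpected) is rethrown immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  isTransient: (err: unknown) => boolean,
  opts: BackoffOptions = DEFAULT_BACKOFF,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!isTransient(err) || attempt + 1 >= opts.maxAttempts) throw err;
      await sleep(backoffDelayMs(attempt, opts));
    }
  }
}
```

The official Anthropic SDKs already retry 429 and 5xx responses a small number of times before throwing (configurable via `maxRetries` in TypeScript, `max_retries` in Python), so an outer loop like this stacks on top of those retries; keeping `maxAttempts` modest avoids multiplying the wait.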
Decouple Logical Roles From Physical Model IDs
Before designing the state machine, I set up the registry that everything rests on. As long as business code knows model strings directly, every withdrawal or retirement means rewriting dozens of call sites. As an indie developer, I learned this the hard way early on.
So the only thing the code references is a logical role (fast / balanced / deep), and the mapping from role to physical model ID, along with each model's availability state, is consolidated in one place.
```typescript
// model-registry.ts
export type Role = "fast" | "balanced" | "deep";
export type Availability = "available" | "overloaded" | "withdrawn" | "retired";

export interface ModelEntry {
  id: string;                 // physical model ID
  inputPer1M: number;         // input token price (USD / 1M tokens)
  outputPer1M: number;        // output token price (USD / 1M tokens)
  availability: Availability; // operator-maintained state for this model
  understudy?: string;        // in-role fallback ID for withdrawal/retirement
}

// Each role holds a priority-ordered candidate list; the head is first choice.
export const REGISTRY: Record<Role, ModelEntry[]> = {
  deep: [
    { id: "claude-opus-4-8", inputPer1M: 5.0, outputPer1M: 25.0, availability: "available" },
    { id: "claude-sonnet-4-6", inputPer1M: 3.0, outputPer1M: 15.0, availability: "available" },
  ],
  balanced: [
    { id: "claude-sonnet-4-6", inputPer1M: 3.0, outputPer1M: 15.0, availability: "available" },
    { id: "claude-haiku-4-5", inputPer1M: 1.0, outputPer1M: 5.0, availability: "available" },
  ],
  fast: [
    { id: "claude-haiku-4-5", inputPer1M: 1.0, outputPer1M: 5.0, availability: "available" },
  ],
};
```
Business code requests a model by logical role, like route("deep"), and never sees the physical ID. To pull a withdrawn model from every job, you change its availability in REGISTRY and touch no call sites. That single-file containment is the core of the design.
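One wrinkle: in the example registry, `claude-sonnet-4-6` appears under both `deep` and `balanced`, so pulling it means updating every entry with that ID. A small helper keeps that to a single call. `setAvailability` is a hypothetical addition, not part of the original file, and the types are redeclared so the sketch stands alone (in practice you would import them from `model-registry.ts`).

```typescript
// Hypothetical helper: set the availability of a model ID in every role that
// lists it, returning how many entries actually changed.
type Role = "fast" | "balanced" | "deep";
type Availability = "available" | "overloaded" | "withdrawn" | "retired";

interface ModelEntry {
  id: string;
  inputPer1M: number;
  outputPer1M: number;
  availability: Availability;
  understudy?: string;
}

function setAvailability(
  registry: Record<Role, ModelEntry[]>,
  id: string,
  state: Availability,
): number {
  let changed = 0;
  for (const entries of Object.values(registry)) {
    for (const entry of entries) {
      if (entry.id === id && entry.availability !== state) {
        entry.availability = state;
        changed++;
      }
    }
  }
  return changed;
}
```

Returning the change count makes the operator action auditable: a result of 0 means the registry already reflected that state.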
Define the Availability State Machine
Next, make explicit how each model's availability state transitions. Leave it vague and you end up deciding "when do we revert?" by gut feel during operations, and you lose reproducibility.
I defined the transitions as follows.
available → overloaded: on a run of consecutive 429 or 529 responses (three in a row in the code below). Record a timestamp and mark the model as a candidate to return to available automatically after a cooldown.
overloaded → available: after the cooldown (default 10 minutes), once the first successful response is confirmed.
available → retired: past the announced retirement date, or on a permanent not-found error. Never auto-reverted.
available → withdrawn: on behavior indicating withdrawal (a model that succeeded until moments ago starts getting rejected en masse with validation or permission errors). With no successor defined, scale down to the next in-role candidate. Reverted only by an explicit operator action.
The key is that retired and withdrawn are not auto-recovered. Only overload naturally heals over time; the other two won't return unless the outside world changes. If a machine reverts on its own, it keeps slamming a withdrawn model and mass-produces failures.
```typescript
// availability-machine.ts
import type { Availability } from "./model-registry";

const OVERLOAD_COOLDOWN_MS = 10 * 60 * 1000; // 10 minutes
const OVERLOAD_THRESHOLD = 3; // consecutive 429/529 results before "overloaded"

interface Health {
  state: Availability;
  overloadedAt?: number;  // when the model entered "overloaded"
  consecutive429: number; // counts 529 as well as 429
}

const health = new Map<string, Health>(); // key: model ID

export function noteResult(
  id: string,
  outcome: "ok" | "overloaded" | "retired" | "withdrawn",
): void {
  const h: Health = health.get(id) ?? { state: "available", consecutive429: 0 };
  if (outcome === "ok") {
    h.state = "available";
    h.consecutive429 = 0;
    h.overloadedAt = undefined;
  } else if (outcome === "overloaded") {
    h.consecutive429 += 1;
    if (h.consecutive429 >= OVERLOAD_THRESHOLD) {
      h.state = "overloaded";
      h.overloadedAt = Date.now();
    }
  } else if (outcome === "retired") {
    h.state = "retired"; // no auto-recovery
  } else {
    h.state = "withdrawn"; // operator action only
  }
  health.set(id, h);
}

export function effectiveState(id: string): Availability {
  const h = health.get(id);
  if (!h) return "available";
  // Only overload heals over time: after the cooldown the model is offered
  // again, and the next success (noteResult "ok") confirms the recovery.
  if (
    h.state === "overloaded" &&
    h.overloadedAt !== undefined &&
    Date.now() - h.overloadedAt > OVERLOAD_COOLDOWN_MS
  ) {
    return "available";
  }
  return h.state;
}
```
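The transition rules listed above can also be kept as data, so "who may move a model from state A to state B?" has exactly one answer in code. In this sketch `canTransition` and the `machine`/`operator` split are hypothetical names; the rows mirror the four listed transitions plus the operator-only revert for withdrawal, and anything not listed (including every way out of `retired`) is refused.

```typescript
// Sketch: the availability transitions as a lookup table.
type Availability = "available" | "overloaded" | "withdrawn" | "retired";
type Actor = "machine" | "operator";

// key: "from->to", value: the least-privileged actor allowed to make the move
const TRANSITIONS: Partial<Record<string, Actor>> = {
  "available->overloaded": "machine", // run of consecutive 429/529
  "overloaded->available": "machine", // cooldown elapsed, then a success
  "available->retired": "machine",    // retirement date passed or permanent not-found
  "available->withdrawn": "machine",  // sudden rejections on a model that just worked
  "withdrawn->available": "operator", // explicit operator action only
};

function canTransition(from: Availability, to: Availability, actor: Actor): boolean {
  if (from === to) return true; // staying in place is always allowed
  const minimumActor = TRANSITIONS[`${from}->${to}`];
  if (minimumActor === undefined) return false; // unlisted, e.g. anything out of "retired"
  // An operator may perform any listed transition; the machine only its own rows.
  return minimumActor === "machine" || actor === "operator";
}
```

A function like `noteResult` could consult this table before writing a new state, which turns "never auto-reverted" from a comment into a checkable rule.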
Implement the Router
With the registry and health tracking in place, the router becomes a thin layer that takes a logical role and returns the first usable candidate. The distinctions among withdrawal, retirement, and overload are absorbed into effectiveState, so the router itself needs no per-state branching.
```typescript
// router.ts
import Anthropic from "@anthropic-ai/sdk";
import { REGISTRY, type Role } from "./model-registry";
import { noteResult, effectiveState } from "./availability-machine";

// Reads ANTHROPIC_API_KEY from the environment. The SDK itself already retries
// 429 and 5xx (including 529) a few times before throwing.
const client = new Anthropic();

type Failure = "overloaded" | "retired" | "withdrawn" | "other";

// First candidate that the operator has not pulled (registry field), that the
// health machine considers usable, and that this call has not tried yet.
function pickModel(role: Role, tried: Set<string>): string | undefined {
  return REGISTRY[role].find(
    (m) =>
      m.availability === "available" &&
      effectiveState(m.id) === "available" &&
      !tried.has(m.id),
  )?.id;
}

function classifyError(err: unknown): Failure {
  if (!(err instanceof Anthropic.APIError)) return "other";
  if (err.status === 429 || err.status === 529) return "overloaded";
  // A permanent not-found (404) is treated as retirement.
  if (err.status === 404) return "retired";
  // A sudden 403 on a model that succeeded until moments ago hints at withdrawal.
  if (err.status === 403) return "withdrawn";
  return "other";
}

export async function route(
  role: Role,
  params: Omit<Anthropic.MessageCreateParamsNonStreaming, "model">,
) {
  const tried = new Set<string>();
  let lastErr: unknown = new Error(`no available model for role ${role}`);
  for (let model = pickModel(role, tried); model; model = pickModel(role, tried)) {
    tried.add(model);
    try {
      const res = await client.messages.create({ ...params, model });
      noteResult(model, "ok");
      return { res, model }; // `model` is the one that actually answered
    } catch (err) {
      lastErr = err;
      const kind = classifyError(err);
      if (kind === "other") throw err; // unexpected: rethrow
      noteResult(model, kind);
      // Even an "overloaded" model (which may recover after the cooldown)
      // is skipped for this call so the next candidate can answer now.
    }
  }
  throw lastErr;
}
```
What pays off here is classifying the error kind up front with classifyError. A 403 or permission error can come from a misconfiguration, but when it starts suddenly on a model that succeeded until moments ago, it is worth treating as a withdrawal signal. If false positives worry you, add a guard: only flag withdrawn for a model that has a successful call within the last N minutes.
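The false-positive guard suggested above can be sketched as a timestamp of each model's last success plus a window check. The names `noteSuccess` and `judgePermissionError` and the 30-minute default window are assumptions; choose N from your own traffic pattern.

```typescript
// Sketch: only promote a 403 to "withdrawn" when the model answered recently.
const lastOkAt = new Map<string, number>(); // model ID -> epoch ms of last success

function noteSuccess(id: string, now: number = Date.now()): void {
  lastOkAt.set(id, now);
}

type PermissionVerdict = "withdrawn" | "misconfigured";

// A model that succeeded within `windowMs` and is now refused looks like a
// withdrawal; one with no recent success looks more like a key or config issue.
function judgePermissionError(
  id: string,
  windowMs: number = 30 * 60 * 1000,
  now: number = Date.now(),
): PermissionVerdict {
  const ok = lastOkAt.get(id);
  return ok !== undefined && now - ok <= windowMs ? "withdrawn" : "misconfigured";
}
```

In the router, a `withdrawn` classification would call `judgePermissionError(model)` first and rethrow as an ordinary error when the verdict is `misconfigured`, with `noteSuccess(model)` recorded next to `noteResult(model, "ok")`.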
Detect Withdrawal Early With a Daily Preflight
Before the nightly batch runs, place a preflight that fires a single tiny call to each role's head candidate to catch withdrawal and retirement early. It is far cheaper and safer than discovering the problem mid-way through a heavy job.
```python
# preflight.py: one lightweight probe per role
import anthropic

client = anthropic.Anthropic()  # ANTHROPIC_API_KEY from env

PROBE_BY_ROLE = {
    "deep": "claude-opus-4-8",
    "balanced": "claude-sonnet-4-6",
    "fast": "claude-haiku-4-5",
}


def probe(model_id: str) -> str:
    try:
        client.messages.create(
            model=model_id,
            max_tokens=1,
            messages=[{"role": "user", "content": "ok"}],
        )
        return "available"
    except anthropic.APIStatusError as e:
        if e.status_code in (429, 529):
            return "overloaded"  # transient; batch may proceed
        if e.status_code == 404:
            return "retired"  # switch to successor required
        if e.status_code == 403:
            return "withdrawn"  # scale-down decision required
        raise


if __name__ == "__main__":
    blocking = []
    for role, model_id in PROBE_BY_ROLE.items():
        state = probe(model_id)
        print(f"{role:9s} {model_id:20s} -> {state}")
        if state in ("retired", "withdrawn"):
            blocking.append((role, model_id, state))
    if blocking:
        for role, model_id, state in blocking:
            print(f"::alert:: role={role} model={model_id} state={state}")
```
A max_tokens=1 probe costs a trivial amount per call. In my own operation, one run a day across three roles came to a few yen a month. Slipping it in right before the nightly batch nearly eliminated the "discover the withdrawal mid-way through a heavy job" accident.
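The probe above only prints states. One way to turn its output into decisions is a small planner that maps each probe result to an action, using the registry's `understudy` field as the known successor for a retired model and leaving withdrawals to a person. This sketch assumes each printed line has been parsed into a `ProbeResult`; all names here are hypothetical.

```typescript
// Sketch: map preflight probe results to actions without auto-applying withdrawals.
type ProbeState = "available" | "overloaded" | "retired" | "withdrawn";

interface ProbeResult {
  role: string;
  modelId: string;
  state: ProbeState;
  understudy?: string; // successor configured in the registry, if any
}

type Action =
  | { kind: "proceed"; role: string }
  | { kind: "switch-to-successor"; role: string; from: string; to: string }
  | { kind: "notify-operator"; role: string; modelId: string; reason: string };

function planStep(r: ProbeResult): Action {
  if (r.state === "retired") {
    // Retirement is permanent and predictable: switch if a successor is known.
    return r.understudy
      ? { kind: "switch-to-successor", role: r.role, from: r.modelId, to: r.understudy }
      : { kind: "notify-operator", role: r.role, modelId: r.modelId, reason: "retired, no successor configured" };
  }
  if (r.state === "withdrawn") {
    // Advisory only: a person confirms before the registry changes.
    return { kind: "notify-operator", role: r.role, modelId: r.modelId, reason: "possible withdrawal" };
  }
  // available, or overloaded (transient): let the batch run.
  return { kind: "proceed", role: r.role };
}

function planFromProbe(results: ProbeResult[]): Action[] {
  return results.map(planStep);
}
```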
Don't Let Fallback Corrupt Your Cost Records
An easy thing to miss is cost accounting when a fallback changes the model. When the deep role downgrades from claude-opus-4-8 to claude-sonnet-4-6, the per-token price changes for the same token count. Keep charging at the first-choice price without recording the downgrade and your end-of-month totals quietly drift.
The router returns the model ID that actually succeeded, so always account using that return value.
```typescript
// cost.ts
import { REGISTRY } from "./model-registry";

// Price by the model actually used, not the role's first choice.
export function recordCost(
  usedModelId: string,
  inputTokens: number,
  outputTokens: number,
): number {
  const entry = Object.values(REGISTRY)
    .flat()
    .find((m) => m.id === usedModelId);
  if (!entry) throw new Error(`unknown model: ${usedModelId}`);
  const usd =
    (inputTokens / 1_000_000) * entry.inputPer1M +
    (outputTokens / 1_000_000) * entry.outputPer1M;
  return usd;
}
```
Downgrades are usually cheaper, but a configuration where the fast role climbs to balanced under overload can raise the unit price instead. Anchoring accounting on the actually-used model keeps re-pricing correct in either direction. In a period like this one, when the billing scheme itself is changing, accounting by actual use is your most direct view of real cost.
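To see the drift concretely, the sketch below prices the same usage rows twice: once at the role's first choice and once at the model the router actually returned. Prices copy the example registry; `UsageRow`, `monthlyTotals`, and the two rows are made up for illustration.

```typescript
// Sketch: first-choice pricing vs. actual-model pricing over a set of calls.
interface Price { inputPer1M: number; outputPer1M: number }

const PRICES: Record<string, Price> = {
  "claude-opus-4-8": { inputPer1M: 5.0, outputPer1M: 25.0 },
  "claude-sonnet-4-6": { inputPer1M: 3.0, outputPer1M: 15.0 },
  "claude-haiku-4-5": { inputPer1M: 1.0, outputPer1M: 5.0 },
};

interface UsageRow {
  firstChoice: string; // head of the role's candidate list
  usedModel: string;   // model ID returned by the router
  inputTokens: number;
  outputTokens: number;
}

function priceOf(modelId: string, inTok: number, outTok: number): number {
  const p = PRICES[modelId];
  if (!p) throw new Error(`unknown model: ${modelId}`);
  return (inTok / 1_000_000) * p.inputPer1M + (outTok / 1_000_000) * p.outputPer1M;
}

function monthlyTotals(rows: UsageRow[]): { naiveUsd: number; actualUsd: number } {
  let naiveUsd = 0;  // what you'd record if you always assumed the first choice
  let actualUsd = 0; // what the calls really cost
  for (const r of rows) {
    naiveUsd += priceOf(r.firstChoice, r.inputTokens, r.outputTokens);
    actualUsd += priceOf(r.usedModel, r.inputTokens, r.outputTokens);
  }
  return { naiveUsd, actualUsd };
}
```

With one deep job (1M input, 200K output tokens) downgraded from Opus to Sonnet and one fast job (1M input tokens) that climbed from Haiku to Sonnet, first-choice pricing reports $11 while the real total is $9: the downgrade is $4 cheaper than recorded and the climb $2 more expensive.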
Lessons and Pitfalls From Production
After running this setup for a few weeks, here is what I saw, candidly.
First, treating automatic withdrawal detection as advisory only turned out to be the practical stance. Declaring a 403 to be a withdrawal leaves too much room for false positives. I eventually settled on: when the preflight probe reports withdrawn, notify a human, and let the operator confirm the registry state change. The only things I let the machine do automatically are overload scale-down and recovery. The line between what a machine may decide in production and what a person should decide is best drawn by the cost of a false positive.
Second, always keep one in-role alternative that is not from the same lineage. In an event like today's withdrawal, where a whole family becomes unusable at once, escaping to another model in the same family is no guarantee of safety. Place a generation that is lower in capability but reliably present as the deep role's scale-down target, so the worst case is still "scale down and finish."
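This rule can be checked mechanically. The sketch flags roles whose candidates all come from one lineage, approximating "lineage" as the family word in the ID (`claude-<family>-<version>`). That parsing rule and the function names are assumptions; a real check might read an explicit family field from the registry instead.

```typescript
// Sketch: lint the registry for roles with no cross-lineage escape hatch.
type Role = "fast" | "balanced" | "deep";

// Assumed heuristic: the second dash-separated token is the family.
function lineageOf(modelId: string): string {
  const parts = modelId.split("-");
  return parts.length > 1 ? parts[1] : modelId;
}

function rolesWithoutEscapeHatch(registry: Record<Role, { id: string }[]>): Role[] {
  return (Object.keys(registry) as Role[]).filter((role) => {
    const families = new Set(registry[role].map((m) => lineageOf(m.id)));
    return families.size < 2; // every candidate shares one lineage (or none exist)
  });
}
```

Against the example registry this flags only `fast`, which has a single candidate; `deep` and `balanced` each mix two families.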
Third, set the overloaded cooldown too short and you get oscillation—repeatedly reverting to the first choice during a peak only to be rejected again. I started at 10 minutes and tuned it against the measured duration of 529 spells. I recommend measuring how long overload actually lasts before settling on a value, rather than hard-coding a guess.
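A sketch of measuring instead of guessing: collect how long each overload spell lasted (for example, from entering `overloaded` to the next success) and take a high quantile as the cooldown. `suggestCooldownMs`, the nearest-rank 90th percentile, the 1-minute floor, and the 1-hour ceiling are all assumptions to tune.

```typescript
// Sketch: derive the overload cooldown from measured spell durations.
function suggestCooldownMs(
  spellDurationsMs: number[],
  quantile = 0.9,
  floorMs = 60_000,
  ceilMs = 60 * 60_000,
): number {
  // No measurements yet: keep the 10-minute starting value.
  if (spellDurationsMs.length === 0) return 10 * 60_000;
  const sorted = spellDurationsMs.slice().sort((a, b) => a - b);
  // Nearest-rank quantile: the smallest value covering `quantile` of spells.
  const rank = Math.min(
    sorted.length - 1,
    Math.max(0, Math.ceil(quantile * sorted.length) - 1),
  );
  return Math.min(ceilMs, Math.max(floorMs, sorted[rank]));
}
```

Using a high quantile rather than the mean biases toward waiting out most spells, which is what prevents the oscillation described above.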
Finally, the essence of this design comes down to clearly holding "things that return if you wait" and "things that won't return until the outside world changes" as distinct states in code. When a day like today arrives, being able to change one line in the registry—rather than scrambling to rewrite code—is what keeps automation running. For someone operating several systems alone, that quiet assurance is worth a great deal.
As a next step, run a quick grep over your own pipeline and count how many places hard-code the model string you use today. That number is exactly how many edits a generation change or a withdrawal will cost you.
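If you would rather run that count from your own tooling, here is a TypeScript equivalent of the grep. The regex assumes model IDs start with `claude-`, and `countHardcodedModels` is a hypothetical name; pass in file contents excluding `model-registry.ts`, so anything it finds is a call site a withdrawal would force you to edit.

```typescript
// Sketch: count hard-coded model IDs across source files.
// Assumed pattern: "claude-" followed by lowercase letters, digits, and dashes,
// ending on a letter or digit (so trailing punctuation is not captured).
const MODEL_ID = /\bclaude-[a-z0-9-]*[a-z0-9]/g;

function countHardcodedModels(files: Record<string, string>): Map<string, number> {
  const counts = new Map<string, number>();
  for (const text of Object.values(files)) {
    for (const id of text.match(MODEL_ID) ?? []) {
      counts.set(id, (counts.get(id) ?? 0) + 1);
    }
  }
  return counts;
}
```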