CLAUDE LAB
BILLING: The Jun 15 change is now live. Agent SDK, headless runs, GitHub Actions, and third-party agents leave subscription limits for separate monthly credits ($20/$100/$200) metered at full API rates, with no rollover.
RETIRED: As of today, Sonnet 4 and Opus 4 are retired from the API; scripts referencing older models should switch to the latest generation such as Opus 4.8.
EXPORT: Claude Fable 5 and Mythos 5 are suspended for all foreign nationals under a US export-control directive (Jun 12); Anthropic calls it a misunderstanding and is working to restore access.
SAFE: Only the two new Mythos-class models are affected; every other model including Opus 4.8 keeps running normally.
SUBAGENTS: Claude Code sub-agents can now spawn their own sub-agents (up to 5 levels), and Dynamic workflows arrived in research preview.
INCIDENT: A Jun 5 outage raised error rates across claude.ai, the API, Claude Code, and Cowork, a reminder to design retries and fallbacks into automated runs.
API & SDK / 2026-06-15 / Advanced

When a Model Disappears Without Warning: A State Machine for Retirement, Withdrawal, and Overload

A model can become unusable in hours for reasons that have nothing to do with a technical outage. This guide models three distinct flavors of 'unavailable'—retirement, withdrawal, and transient overload—as one availability state machine, with a router that keeps automated pipelines running. Working TypeScript and Python included.

Tags: claude-api, architecture, resilience, model-migration, production, fallback


One morning, before kicking off the automated publishing across my four sites, I opened the changelog and the official announcements as usual. One of the models I had been using for smoke tests the day before had been suspended within hours for reasons that were not a technical outage—withdrawn for all foreign-national users. There was no retirement email. The API had not started returning 429. The model name that answered yesterday simply stopped being accepted today.

What I run is a content-generation pipeline. No lives or payments depend on it. Even so, the fact that a headless job running overnight assumed a specific model name suddenly felt heavy. I did have a fallback, but that branch only anticipated 429 and 529—transient overload. Permanent retirement and a sudden, externally driven withdrawal were lumped together inside the same fallback().

What I learned that day is that "unavailable" wears several different faces, and pouring them all into one exception handler leads you to make the wrong recovery decision. Transient overload returns in minutes. Hammering a withdrawn model every few minutes only piles up wasted failures. A retired model never comes back no matter how long you wait. This article designs an availability state machine that holds these three as distinct states, and a router built on top of it that changes behavior per state.

"Unavailable" Has Three Faces

Let me classify, by nature, the kinds of "unavailable" that stop automation. I settled on three.

The first is retirement. The vendor announces it in advance, and past a certain date the model is permanently no longer accepted. Today the older claude-sonnet-4 and claude-opus-4 were retired from the API. This is predictable and the successor is known. Waiting won't help, so the right move is to switch to the successor the moment you detect the retirement.

The second is withdrawal. This is not a technical outage but a short-notice suspension driven by external factors: policy, legal, or security decisions. There is no announced date, and recovery carries the uncertainty of "maybe it will come back eventually." The case I opened with belongs here. Unlike retirement, no successor is waiting, so you need to decide whether to shift sideways to another logical role or temporarily scale that job down.

The third is overload. Transient unavailability that recovers on its own within minutes to hours, like 429 Too Many Requests or 529 Overloaded. If you perform a permanent switch here, you stay needlessly downgraded from the higher model you actually wanted. The correct response is exponential backoff, then return to the original once it recovers.
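A minimal sketch of that backoff-and-return behavior follows. The names backoffDelayMs and withBackoff, the 1-second base, the 60-second cap, and the five-attempt limit are illustrative assumptions, not values from the pipeline described here. Jitter is left out to keep the delays easy to read; a production version would normally add it.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped.
// The base and cap defaults are assumptions chosen for illustration.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Retries the SAME model while the failure looks transient, so the job
// returns to its first-choice model as soon as the overload clears.
// `isOverload` is a caller-supplied predicate (for example: status 429 or 529).
async function withBackoff<T>(
  call: () => Promise<T>,
  isOverload: (err: unknown) => boolean,
  maxAttempts = 5,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      const lastAttempt = attempt + 1 >= maxAttempts;
      // Non-transient errors and exhausted budgets go back to the caller,
      // which decides whether this is really a withdrawal or a retirement.
      if (!isOverload(err) || lastAttempt) throw err;
      await new Promise<void>((resolve) =>
        setTimeout(resolve, backoffDelayMs(attempt)),
      );
    }
  }
}
```

Because the wrapper never swaps the model ID, it cannot cause a permanent downgrade; it only waits and retries.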

Handle all three in a single catch, and you get misjudgments: applying overload-style backoff endlessly to a withdrawn model, or permanently downgrading an overloaded model as if it were retired. That is why the states are held separately.
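To make the separation concrete, here is a rough sketch of classifying a failure into one of the three states and looking up the matching recovery. Only the 429 and 529 codes come from the discussion above. Treating 403 or 404 as "the model is no longer accepted", the ANNOUNCED_RETIREMENTS table with its dates, and every function and type name are assumptions made for illustration.

```typescript
type Availability = "available" | "overloaded" | "withdrawn" | "retired";
type Recovery =
  | "proceed"
  | "backoff-and-retry"
  | "shift-role-or-scale-down"
  | "switch-to-successor";

// One recovery per state, so no state can borrow another state's handling.
const RECOVERY: Record<Availability, Recovery> = {
  available: "proceed",
  overloaded: "backoff-and-retry",
  withdrawn: "shift-role-or-scale-down",
  retired: "switch-to-successor",
};

// Hypothetical local table of vendor-announced retirement dates (ISO strings).
const ANNOUNCED_RETIREMENTS: Record<string, string> = {
  "claude-sonnet-4": "2026-06-15",
  "claude-opus-4": "2026-06-15",
};

interface CallFailure {
  status: number; // HTTP status of the failed call
  modelId: string;
}

// Returns the state implied by the failure, or null when the error says
// nothing about model availability (for example a 500 or a bad request).
function classifyFailure(f: CallFailure, today: string): Availability | null {
  if (f.status === 429 || f.status === 529) return "overloaded";
  // Assumption: a model that is no longer accepted answers with 403 or 404.
  if (f.status !== 403 && f.status !== 404) return null;
  const retiredOn = ANNOUNCED_RETIREMENTS[f.modelId];
  // ISO dates compare correctly as plain strings.
  if (retiredOn !== undefined && today >= retiredOn) return "retired";
  // Rejected with no announced retirement on file: treat as withdrawn.
  return "withdrawn";
}
```

The useful property is that a rejected model with no retirement on file lands in "withdrawn" and therefore never enters the backoff path.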

Decouple Logical Roles From Physical Model IDs

Before designing the state machine, I set up the registry that everything rests on. As long as business code knows model strings directly, every withdrawal or retirement means rewriting dozens of call sites. As an indie developer, I learned this the hard way early on.

So the only thing the code references is a logical role (fast / balanced / deep), and the mapping from role to physical model ID, along with each model's availability state, is consolidated in one place.

// model-registry.ts
export type Role = "fast" | "balanced" | "deep";
export type Availability = "available" | "overloaded" | "withdrawn" | "retired";
 
interface ModelEntry {
  id: string;            // physical model ID
  inputPer1M: number;    // input token price (USD / 1M tokens)
  outputPer1M: number;   // output token price
  availability: Availability;
  understudy?: string;   // in-role fallback ID for withdrawal/retirement
}
 
// Each role holds a priority-ordered candidate list; the head is first choice.
export const REGISTRY: Record<Role, ModelEntry[]> = {
  deep: [
    { id: "claude-opus-4-8", inputPer1M: 5.0, outputPer1M: 25.0, availability: "available" },
    { id: "claude-sonnet-4-6", inputPer1M: 3.0, outputPer1M: 15.0, availability: "available" },
  ],
  balanced: [
    { id: "claude-sonnet-4-6", inputPer1M: 3.0, outputPer1M: 15.0, availability: "available" },
    { id: "claude-haiku-4-5", inputPer1M: 1.0, outputPer1M: 5.0, availability: "available" },
  ],
  fast: [
    { id: "claude-haiku-4-5", inputPer1M: 1.0, outputPer1M: 5.0, availability: "available" },
  ],
};

Business code requests a model by logical role, like route("deep"), and never knows the physical ID. To remove a withdrawn model from every job, you rewrite the availability of one entry in REGISTRY. That single-file containment is the core of the design.
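One way route could be written is sketched below. It is not the router from this article, only an illustration of the lookup. It uses a trimmed local copy of the registry (prices omitted) so that it runs on its own, and the setAvailability helper and the rule "prefer an overloaded model over failing" are assumptions.

```typescript
type Role = "fast" | "balanced" | "deep";
type Availability = "available" | "overloaded" | "withdrawn" | "retired";

interface Entry {
  id: string;
  availability: Availability;
}

// Trimmed copy of the registry: same roles and IDs, prices left out.
const registry: Record<Role, Entry[]> = {
  deep: [
    { id: "claude-opus-4-8", availability: "available" },
    { id: "claude-sonnet-4-6", availability: "available" },
  ],
  balanced: [
    { id: "claude-sonnet-4-6", availability: "available" },
    { id: "claude-haiku-4-5", availability: "available" },
  ],
  fast: [{ id: "claude-haiku-4-5", availability: "available" }],
};

// Returns the physical ID for a role: the first available candidate,
// else the first overloaded one (it may recover), else an error.
// Withdrawn and retired entries are never returned.
function route(role: Role): string {
  const candidates = registry[role];
  const ready = candidates.find((e) => e.availability === "available");
  if (ready) return ready.id;
  const busy = candidates.find((e) => e.availability === "overloaded");
  if (busy) return busy.id;
  throw new Error(`no usable model for role "${role}"`);
}

// Hypothetical helper: flips the state of one model everywhere it appears.
function setAvailability(modelId: string, state: Availability): void {
  for (const role of Object.keys(registry) as Role[]) {
    for (const entry of registry[role]) {
      if (entry.id === modelId) entry.availability = state;
    }
  }
}
```

With this shape, calling setAvailability("claude-opus-4-8", "withdrawn") makes the next route("deep") fall through to the second candidate without any call site changing.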

WHAT YOU'LL LEARN
A transition table that treats 'retirement' (announced permanent removal), 'withdrawal' (short-notice suspension), and 'overload' (minutes-to-hours of 429/529) as separate states, plus a router whose behavior changes per state.
A ModelRegistry that maps logical roles (fast / balanced / deep) to physical model IDs in one file, so a withdrawn model can be removed from every job with a one-line state change (TypeScript and Python).
A daily preflight probe that detects withdrawal with a single tiny request, and a cost-accounting routine that re-prices tokens on fallback so your monthly totals never quietly drift.
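As a small illustration of the re-pricing idea in the last item, the sketch below prices each request with the rates of the model that actually served it, not the model the role was expected to use. The prices are the per-million-token figures from the registry listing; the costUsd name, the Usage shape, and the assumption that token counts are read from each response are hypothetical.

```typescript
interface Pricing {
  inputPer1M: number; // USD per 1M input tokens
  outputPer1M: number; // USD per 1M output tokens
}

interface Usage {
  inputTokens: number;
  outputTokens: number;
}

// Prices copied from the registry entries shown earlier.
const PRICES: Record<string, Pricing> = {
  "claude-opus-4-8": { inputPer1M: 5.0, outputPer1M: 25.0 },
  "claude-sonnet-4-6": { inputPer1M: 3.0, outputPer1M: 15.0 },
  "claude-haiku-4-5": { inputPer1M: 1.0, outputPer1M: 5.0 },
};

// Cost of one request, priced by the model that served it.
// Refusing unknown IDs keeps unpriced traffic from silently counting as zero.
function costUsd(servedBy: string, usage: Usage): number {
  const price = PRICES[servedBy];
  if (price === undefined) {
    throw new Error(`no price on file for ${servedBy}`);
  }
  const total =
    usage.inputTokens * price.inputPer1M +
    usage.outputTokens * price.outputPer1M;
  return total / 1e6;
}
```

The same usage costs a different amount depending on which candidate answered, which is why the served model ID needs to be recorded next to the token counts.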