API & SDK · 2026-06-13 · Advanced

Auditing pinned model IDs before claude-sonnet-4 and claude-opus-4 retire from the API

On June 15, claude-sonnet-4 and claude-opus-4 retire from the API. Here is how to find every pinned model ID before then, measure output parity, and cut over safely with an alias layer and a fallback.

api-sdk · migration · model-deprecation · production


When you read "claude-sonnet-4 and claude-opus-4 retire from the API on June 15," the first thing to check is where those model IDs are hard-coded in your own code. The trouble is never in the obvious places. It is the nightly batch you wrote six months ago and never reopened, the default value of an environment variable, or the model: "claude-opus-4" buried inside a vendored wrapper you integrated long ago. On retirement day, that one spot quietly starts returning model_not_found.

As an indie developer, I have stepped on these forgotten constants more than once. The more reliable a piece of code is, the less you reread it. So when the retirement notice lands and you reach for a full-text search, it finds only the strings you thought to type, and the spots that matter are the ones you did not think of. This walkthrough removes those misses mechanically and carries the migration all the way through to a verified, staged cutover.

grep alone misses things — audit mechanically first

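A plain grep -r "claude-opus-4" is a start, but it is both noisy and narrow. The retiring IDs are prefixes of their successors, so the same search also matches claude-opus-4-8, and every line you have already fixed keeps showing up as a hit. It also finds only the names you thought to type. The spots that actually bite are IDs embedded as defaults like os.environ.get("MODEL", "claude-opus-4"), IDs sitting as plain strings in config.json or .env, IDs written with a date suffix such as claude-opus-4-20250514, and older IDs you forgot you had.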

So instead of searching for what is leaving, keep an allowlist of what is current and surface every other claude-*. Name the two retiring models explicitly, but also flag any unfamiliar ID as "confirm."

#!/usr/bin/env python3
"""Surface retiring or unknown model IDs left in a production codebase."""
import re
import sys
from pathlib import Path
 
# Model IDs current as of 2026-06 (anything else is treated as "confirm").
# Always verify the exact, latest IDs in the official docs.
ALLOWED = {
    "claude-opus-4-8",
    "claude-sonnet-4-6",
    "claude-haiku-4-5",
}
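# Explicit targets retiring from the API on 6/15, plus their "-0" aliases.
RETIRING = {"claude-sonnet-4", "claude-opus-4",
            "claude-sonnet-4-0", "claude-opus-4-0"}
 
# Anything that looks like a model ID: claude-opus-4-8, older number-first
# IDs such as claude-3-5-sonnet-20241022, and -YYYYMMDD date suffixes.
MODEL_RE = re.compile(r"claude-(?:[a-z]+-)?[0-9][a-z0-9-]*")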
SCAN_EXT = {".py", ".ts", ".tsx", ".js", ".mjs", ".json",
            ".yaml", ".yml", ".env", ".toml", ".sh"}
 
 
def normalize(model_id: str) -> str:
    # Drop the trailing date suffix so we compare by "generation".
    return re.sub(r"-\d{8}$", "", model_id)
 
 
def scan(root: str):
    hits = []
    for path in Path(root).rglob("*"):
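        # Path(".env").suffix is "" (a leading dot is not a suffix), so match
        # dotenv files such as .env and .env.production by name instead.
        is_env = path.name == ".env" or path.name.startswith(".env.")
        if path.is_dir() or not (path.suffix in SCAN_EXT or is_env):
            continue
        if "node_modules" in path.parts or ".git" in path.parts:
            continue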
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue
        for lineno, line in enumerate(text.splitlines(), 1):
            for raw in MODEL_RE.findall(line):
                base = normalize(raw)
                if base in RETIRING:
                    hits.append((str(path), lineno, raw, "RETIRING 6/15"))
                elif base not in ALLOWED:
                    hits.append((str(path), lineno, raw, "unknown - confirm"))
    return hits
 
 
if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    found = scan(root)
    for path, lineno, model, tag in found:
        print(f"[{tag}] {path}:{lineno}  {model}")
    print(f"\n{len(found)} model IDs need attention", file=sys.stderr)
    sys.exit(1 if found else 0)

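Run it at the repository root with python3 scan_models.py ., and any line tagged [RETIRING 6/15] is a spot that will error on the day. Because it exits with status 1 whenever it finds anything, the same script can gate CI. It only reads the extensions listed in SCAN_EXT, so crontab entries, Dockerfiles, and other extensionless files need a separate look. The allowlist approach also means that when a different model retires later, you update the allowlist and reuse the same scanner.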

One caveat: a clean repository is not the whole story. Old IDs can still reach the API from places the scan never sees, such as a server still running an older build or a job that lives in another repo, and saved dashboard queries that filter on the old ID will quietly go empty. Aggregate the last 30 days of request logs and print the distribution of model IDs actually being called; that catches the case where you thought you had removed an ID from code but some path still sends it.
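A minimal sketch of that log check, assuming (hypothetically) that your request logs are JSON Lines with an ISO 8601 timestamp field and a model field. Rename the fields to whatever your logging actually records. The date-suffix rule is the same one the scanner's normalize() uses.

```python
import json
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# The two retiring IDs; the -YYYYMMDD rule matches the scanner's normalize().
RETIRING = {"claude-sonnet-4", "claude-opus-4"}


def model_distribution(lines, days=30, now=None):
    """Count model IDs in JSONL request-log lines from the last `days` days.

    Hypothetical log shape, one JSON object per line:
    {"timestamp": "2026-06-01T12:00:00+00:00", "model": "claude-opus-4", ...}
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        try:
            record = json.loads(line)
            ts = datetime.fromisoformat(record["timestamp"])
        except (ValueError, KeyError, TypeError):
            continue  # skip malformed lines instead of aborting the audit
        if ts.tzinfo is None:
            ts = ts.replace(tzinfo=timezone.utc)  # assume naive stamps are UTC
        if ts >= cutoff:
            counts[str(record.get("model", "<missing>"))] += 1
    return counts


def still_called(counts):
    """Retiring IDs, with or without a -YYYYMMDD suffix, that traffic still reaches."""
    return {m: n for m, n in counts.items()
            if re.sub(r"-\d{8}$", "", m) in RETIRING}
```

Pass it an open log file (with open("requests.jsonl") as f: counts = model_distribution(f)), print counts.most_common(), and treat anything still_called() returns as traffic that will fail on the day.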

Why "sed it all at once" is dangerous

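Once you have the hits, it is tempting to bulk-replace claude-opus-4 with claude-opus-4-8. Even mechanically that goes wrong: a naive s/claude-opus-4/claude-opus-4-8/g also rewrites every claude-opus-4-8 already in the file into claude-opus-4-8-8. The bigger problem is that replacing a model ID is not a string edit; it is a behavior change. A newer generation can shift output token volume for the same prompt, shift latency, and subtly shift formatting habits (bulleted versus prose, how code blocks are fenced).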

This is where one design decision pays off: do not scatter IDs through your code. If client.messages.create(model="...") appears in dozens of places, every replacement means reviewing dozens of spots — and you will miss exactly one. After getting burned by that one missed spot, I started confining model IDs to a single layer. The concrete implementation comes later.


A parity harness that measures output before retirement

Whether the replacement is safe is something you can measure without waiting for retirement. Take a handful of prompts you actually run in production, send the same input to both the old and the new model, and line up the output, token counts, and latency side by side.

import json
import time
from anthropic import Anthropic
 
client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
 
OLD = "claude-opus-4"      # retiring
NEW = "claude-opus-4-8"    # target (confirm the latest ID in the docs)
 
# Drop in 5-10 prompts you actually send in production.
PROMPTS = [
    "Classify this ticket as high/normal/low and give a one-sentence reason: ...",
    "Point out any bug in this diff, or return NONE if there is none: ...",
]
 
 
def run(model: str, prompt: str) -> dict:
    t0 = time.perf_counter()  # monotonic clock, suited to measuring durations
    msg = client.messages.create(
        model=model,
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    text = "".join(b.text for b in msg.content if b.type == "text")
    return {
        "text": text,
        "out_tokens": msg.usage.output_tokens,
        "latency_ms": round((time.perf_counter() - t0) * 1000),
    }
 
 
for prompt in PROMPTS:
    a = run(OLD, prompt)
    b = run(NEW, prompt)
    print(json.dumps({
        "prompt": prompt[:40],
        "identical": a["text"].strip() == b["text"].strip(),
        "out_tokens": [a["out_tokens"], b["out_tokens"]],
        "latency_ms": [a["latency_ms"], b["latency_ms"]],
    }))

If identical comes back mostly false, do not panic. Word-for-word identical output is the exception: at the API's default temperature of 1.0, even the same model rarely repeats itself exactly. What matters is whether anything downstream breaks. If you json.loads the output, check that the new model's output still passes schema validation. If formatting shifts, any regex that slices the output will drift. If a classification task expects only a high/normal/low label, check that the new model does not add a preamble.
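Those downstream checks can be scripted against the text fields the harness already collects. This is a sketch under stated assumptions: passes_json only verifies that the output parses and carries the keys you name (not a full schema), and passes_label assumes the label should be the first word of the reply. One formatting shift it would catch is otherwise valid JSON arriving behind a short preamble or inside a fenced code block, which json.loads rejects.

```python
import json
import re

# Assumption: the label must be the first word of the reply (no preamble).
LABEL_RE = re.compile(r"^(high|normal|low)\b", re.IGNORECASE)


def passes_json(text, required_keys=()):
    """Does the raw output survive json.loads plus a minimal key check?"""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False  # e.g. a preamble, or JSON wrapped in a fenced code block
    return isinstance(data, dict) and all(k in data for k in required_keys)


def passes_label(text):
    """Does the output open with a bare high/normal/low label?"""
    return LABEL_RE.match(text.strip()) is not None


def pass_rate(outputs, check):
    """Fraction of outputs that a downstream check accepts (0.0 if none)."""
    outputs = list(outputs)
    return sum(1 for o in outputs if check(o)) / len(outputs) if outputs else 0.0
```

Run the same check over the OLD and NEW outputs and compare the two pass rates. A drop on the NEW side points at the parser (or the prompt), which is exactly what you want to learn before retirement.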

In these comparisons I have repeatedly found that the output itself improved, while a brittle downstream parser could not absorb the new formatting and broke. The thing to fix is usually the parser, not the model. Knowing this before retirement turns the cutover from "deploy and pray" into "deploy what you already verified."

An alias layer that makes the next retirement a one-liner

Once verified, do not sprinkle the target ID across the code. Collapse it into a single layer that resolves a logical name to a concrete ID. Naming by role is the trick.

// models.ts — an alias layer that confines model IDs to one place
export type ModelRole = "reasoning" | "balanced" | "fast";
 
const RESOLVE: Record<ModelRole, string> = {
  reasoning: "claude-opus-4-8",
  balanced: "claude-sonnet-4-6",
  fast: "claude-haiku-4-5",
};
 
export function modelFor(role: ModelRole): string {
  return RESOLVE[role];
}

Callers never know the concrete ID.

import { modelFor } from "./models";
 
const msg = await client.messages.create({
  model: modelFor("reasoning"),
  max_tokens: 1024,
  messages,
});

Whichever model retires next, the only code change is one line in RESOLVE (the parity harness still runs before that line ships), and caller code stays untouched. That is the practical payoff of an anti-corruption layer (a boundary that keeps external concerns from leaking into your code): you move from full-text-searching on every retirement notice to editing one line in models.ts.
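One way to keep the layer from eroding is a CI check that fails whenever a model ID string shows up outside it. The sketch assumes the alias module is named models.ts, as above, and scans only a few common source extensions; ALIAS_FILES and CODE_EXT are illustrative choices, not a convention from any tool. The regex matches the same ID shapes as the scanner.

```python
import re
from pathlib import Path

# Hypothetical: files allowed to contain raw model IDs. Adjust to your layout.
ALIAS_FILES = {"models.ts"}
CODE_EXT = {".py", ".ts", ".tsx", ".js", ".mjs"}
MODEL_RE = re.compile(r"claude-(?:[a-z]+-)?[0-9][a-z0-9-]*")


def leaked_ids(root):
    """(path, line number, id) for every model ID written outside the alias layer."""
    leaks = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in CODE_EXT:
            continue
        if path.name in ALIAS_FILES or "node_modules" in path.parts:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), 1):
            for model_id in MODEL_RE.findall(line):
                leaks.append((str(path), lineno, model_id))
    return leaks
```

In CI, print each leak and fail with sys.exit(1 if leaked_ids(".") else 0). A fallback map that deliberately holds retired IDs, like the one in the next section, belongs in ALIAS_FILES as well.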

A safety net after retirement — do not swallow model_not_found

Even with an alias layer, a slow deploy or a cached copy of old code can produce model_not_found for a few minutes right after retirement. Leave that unguarded and those few minutes become an outage.

import Anthropic from "@anthropic-ai/sdk";
import { metrics } from "./metrics"; // hypothetical: your StatsD/OpenTelemetry wrapper
 
const client = new Anthropic();
 
// Temporary landing spot if a retired ID is somehow still called.
const FALLBACK: Record<string, string> = {
  "claude-opus-4": "claude-opus-4-8",
  "claude-sonnet-4": "claude-sonnet-4-6",
  // Dated snapshot IDs need their own entries.
  "claude-opus-4-20250514": "claude-opus-4-8",
  "claude-sonnet-4-20250514": "claude-sonnet-4-6",
};
 
export async function createWithFallback(
  params: Anthropic.MessageCreateParamsNonStreaming,
) {
  try {
    return await client.messages.create(params);
  } catch (err) {
    const next = FALLBACK[params.model];
    if (err instanceof Anthropic.NotFoundError && next) {
      console.warn(`model ${params.model} unavailable -> falling back to ${next}`);
      metrics.increment("model_fallback", { from: params.model, to: next });
      return client.messages.create({ ...params, model: next });
    }
    throw err;
  }
}

But a fallback is one step from a cover-up. Always emit the console.warn and the metric. If the fallback keeps firing, a code path that still references a retired ID has survived; absorb that silently and the stale reference lingers indefinitely. The right mindset: the safety net exists to tell you when the migration is truly done, and once the metric stays at zero you can delete it.

Do not flip it all at once — watch a canary

Finally, stage the cutover itself. You do need everything on the new model by the deadline, but flipping 100% at once means any unexpected behavior gap hits every user.

In practice I always go in this order. First, point only my own and internal traffic at the new model for a day and watch output quality and downstream error rate. If that is clean, widen to about 10% and put three numbers side by side on a dashboard: average output_tokens, downstream parser error rate, and p95 latency. Only after confirming no regression in the numbers do I move the rest.

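import { createHash } from "node:crypto";
 
// Stably route ~10% to the new model: SHA-256 of the user ID -> bucket 0..99.
function useNewModel(userId: string, percent = 10): boolean {
  return createHash("sha256").update(userId).digest().readUInt32BE(0) % 100 < percent;
}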
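The three-number comparison can be computed from per-request records for each arm. A sketch, assuming each record (a hypothetical shape) carries out_tokens, latency_ms, and a parse_ok flag set by your downstream parser. The tolerances below (10% relative for tokens and p95 latency, 0.5 percentage points absolute for parser errors) are arbitrary starting points rather than recommended values, and p95 uses the nearest-rank definition without interpolation.

```python
from statistics import mean


def p95(values):
    """Nearest-rank 95th percentile: the ceil(0.95 * n)-th smallest value."""
    ordered = sorted(values)
    rank = max(1, -(-95 * len(ordered) // 100))  # integer ceil(95n / 100)
    return ordered[rank - 1]


def arm_summary(records):
    """The three dashboard numbers for one arm (records must be non-empty).

    Hypothetical record shape: {"out_tokens": 212, "latency_ms": 840, "parse_ok": True}
    """
    return {
        "avg_output_tokens": mean(r["out_tokens"] for r in records),
        "parser_error_rate": sum(not r["parse_ok"] for r in records) / len(records),
        "p95_latency_ms": p95([r["latency_ms"] for r in records]),
    }


def regressions(control, canary, rel_tol=0.10, err_tol=0.005):
    """Names of metrics on which the canary arm looks worse than control.

    Token volume and p95 latency may grow by up to rel_tol; the parser error
    rate may rise by at most err_tol in absolute terms, since a relative
    threshold is meaningless against a 0% baseline. Both defaults are arbitrary.
    """
    base, new = arm_summary(control), arm_summary(canary)
    worse = [k for k in ("avg_output_tokens", "p95_latency_ms")
             if new[k] > base[k] * (1 + rel_tol)]
    if new["parser_error_rate"] > base["parser_error_rate"] + err_tol:
        worse.append("parser_error_rate")
    return worse
```

An empty list from regressions() is the signal to widen the rollout; anything else names the number to investigate before moving more traffic.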

The nice thing about a canary is that rollback is "change the number from 10 back to 0." Get this in place with room to spare before retirement, and on the day you just set the percentage to 100 and the migration is done.

If you are racing the same migration, I hope this gives you a usable order of operations. Start by running the scanner above once at your production repo root. If even one [RETIRING 6/15] shows up, that is where model_not_found will land on the day.

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.