API & SDK / 2026-06-16 / Advanced

Confirm Your Model Actually Responds Before a Scheduled Run Begins

A model you configured can be gone before your nightly job even wakes up. Tell retirement, withdrawal, and regional restriction apart with a single startup probe, then rewrite the run config to an eligible model — with complete, working TypeScript.


When Claude Fable 5 and Mythos 5 were suspended for foreign-national users on June 12, my first reaction was not technical worry but frustration at a broken plan. The models had shipped only three days earlier, on June 9. The very next run on my verification schedule died the moment it started, with a flat "this model is unavailable." Then, on June 15, claude-sonnet-4 and claude-opus-4 were retired from the API. Any job that hard-coded those IDs in a config file would have called a model that no longer existed the instant it woke up.

For unattended jobs, the awkward part is not that a model is "down." It is that something usable yesterday is gone before the job even starts. The request does not fail mid-flight — the execution plan has lost its premise. This article is about checking that premise once, at the entrance to the run, and self-healing the config before any real work begins.

Why per-request fallback alone is not enough for unattended runs

Claude Code's fallbackModel, and the usual "if it fails, try the next model" retry you write against the API, work well interactively. A person watching the screen notices a mid-stream switch and can react.

Unattended scheduled jobs, like the auto-publishing runs I operate across four sites as an indie developer, are different. If you only discover at item three of a batch that the model was retired, items one and two may already have been routed to a different model through retries. Output quality becomes a mix, and you can no longer tell which artifact came from which model without combing the logs. Per-request fallback is a "notice partway and partially recover" mechanism. It does not guarantee that the run starts and stays on the right model.

What an unattended run actually needs is not per-request insurance but an entrance check: before any real work, settle the question of whether this model truly responds, for this account, at this moment.

What a startup preflight actually fixes: partial completion and cost

Call this entrance check a startup preflight. It targets two problems.

The first is partial completion. If you fix the model before the batch's real work, every item in that run is processed by the same model. No mid-run swap, no patchy quality.

The second is cost predictability. Retirement and withdrawal, unlike a 429 rate limit or a 529 Overloaded, do not clear up on the timescale of a retry: a retired model is gone for good, and a withdrawn one may return, but not within the seconds a backoff waits. Throwing repeated retries at an expensive model during the real work and then giving up costs far more than firing a single max_tokens: 1 probe at startup to pick an eligible model. After the June 15 billing change moved headless execution and the Agent SDK onto a separate monthly credit pool with no rollover, killing that wasted spend at startup translates directly into the headroom you have left at month-end.


"Unavailable" comes in three shapes

Before implementing a preflight, you have to separate the kinds of "unavailable" the API returns. In what I have observed, the error shape differs clearly by cause.

Retired: the model ID no longer exists. Like claude-sonnet-4, it returns a 404 not_found_error style "model not found" from a cutover date onward. This is permanent; waiting does nothing.
Withdrawn: the model exists but is temporarily pulled. Like the Fable 5 suspension, it can return a body such as model is currently unavailable. It may come back later.
Restricted by region or eligibility: access is denied because of where the account sits or its eligibility. It returns as a 403 permission_error, and the same model may succeed from a different account or region — an asymmetry worth naming.

The distinction matters because it changes the fallback decision. Retirement means you should permanently update an alias layer; withdrawal means route to the next choice temporarily and wait for recovery; restriction means the model should not be in the candidate list at all.
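The mapping from cause to fallback decision described above can be written down as a small lookup. This is a minimal sketch: the action names are hypothetical labels invented for illustration, not part of any SDK, and the `transient` case is included only so the function covers every classification used later in the article.

```typescript
// The three "unavailable" shapes from the text, plus transient faults.
type Ineligibility = "retired" | "withdrawn" | "restricted" | "transient";

// Hypothetical action names; they are not part of any SDK.
type FallbackAction =
  | "update-alias-permanently"
  | "route-to-next-temporarily"
  | "drop-from-candidates"
  | "retry-same-model";

// Exhaustive switch: under strict type checking, adding a new Ineligibility
// member fails to compile until a matching action is chosen here.
function actionFor(reason: Ineligibility): FallbackAction {
  switch (reason) {
    case "retired":
      return "update-alias-permanently";
    case "withdrawn":
      return "route-to-next-temporarily";
    case "restricted":
      return "drop-from-candidates";
    case "transient":
      return "retry-same-model";
  }
}
```

Keeping the decision in one function means a change of policy (say, treating withdrawal as permanent after some number of days) touches a single place.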

Resolve an eligible model with one cheap probe

The core is a function that takes a priority-ordered list of candidate models, fires a tiny probe at each from the top, and returns the first that responds. We use the Anthropic TypeScript SDK.

import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
 
type Ineligibility = "retired" | "withdrawn" | "restricted" | "transient";
 
interface ProbeResult {
  model: string;
  ok: boolean;
  reason?: Ineligibility;
  detail?: string;
}
 
// Sort an API error into one of the three "unavailable" shapes, or a transient fault
function classify(err: unknown): Ineligibility {
  const status = (err as { status?: number })?.status;
  const message = String((err as { message?: string })?.message ?? "").toLowerCase();
 
  if (status === 404 || message.includes("not found")) return "retired";
  if (status === 403 || message.includes("permission")) return "restricted";
  if (message.includes("currently unavailable") || message.includes("not available"))
    return "withdrawn";
  // 429 / 529 / network blips are treated as transient: they may recover if you wait
  return "transient";
}
 
// Hit a single model with max_tokens:1 to check whether it responds
async function probe(model: string): Promise<ProbeResult> {
  try {
    await client.messages.create({
      model,
      max_tokens: 1,
      messages: [{ role: "user", content: "ping" }],
    });
    return { model, ok: true };
  } catch (err) {
    const reason = classify(err);
    return { model, ok: false, reason, detail: String((err as Error)?.message ?? "") };
  }
}

classify() is its own function so the rules live in one place. Error wording can change over time, so I make the status code the primary signal and keep the text as a secondary hint.
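If you want "status first, text second" to be strict, one option is to branch on the status code before looking at the message at all. The sketch below is a hypothetical variant, named `classifyStatusFirst` here to keep it distinct from the article's classify(). It assumes errors carry an optional numeric `status` and a string `message`, and the sample errors you would feed it are hand-built stand-ins rather than real SDK error objects.

```typescript
type Ineligibility = "retired" | "withdrawn" | "restricted" | "transient";

// Hypothetical variant of classify(): the status code decides first,
// and message text is consulted only when the status is not conclusive.
function classifyStatusFirst(err: unknown): Ineligibility {
  const e = (err ?? {}) as { status?: number; message?: string };
  const message = String(e.message ?? "").toLowerCase();

  // Conclusive status codes win regardless of wording.
  if (e.status === 404) return "retired";
  if (e.status === 403) return "restricted";
  if (e.status === 429 || e.status === 529) return "transient";

  // Text hints, used only as a fallback.
  if (message.includes("currently unavailable") || message.includes("not available")) {
    return "withdrawn";
  }
  if (message.includes("not found")) return "retired";
  if (message.includes("permission")) return "restricted";

  // Anything unrecognized is treated as possibly recoverable.
  return "transient";
}
```

The practical difference shows up on ambiguous wording: a 403 whose message happens to contain "not found" is classified as restricted here, because the status is trusted over the text.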

Next, the resolver that walks the candidate list. Only a transient fault gets one short retry on the same model; retirement, withdrawal, and restriction move straight to the next candidate, because waiting is pointless.

const sleep = (ms: number) => new Promise((r) => setTimeout(r, ms));
 
interface Resolution {
  chosen: string | null;
  attempts: ProbeResult[];
}
 
// candidates is "most preferred first" — pass the list already filtered by capability
async function resolveEligibleModel(candidates: string[]): Promise<Resolution> {
  const attempts: ProbeResult[] = [];
 
  for (const model of candidates) {
    let result = await probe(model);
 
    // Only a transient fault waits 2s and re-checks once (retired/withdrawn/restricted do not wait)
    if (!result.ok && result.reason === "transient") {
      await sleep(2000);
      result = await probe(model);
    }
 
    attempts.push(result);
    if (result.ok) {
      return { chosen: model, attempts };
    }
  }
 
  return { chosen: null, attempts };
}

The candidate ordering drives quality. "Most preferred first" should mean "best capability match first," not raw preference. If single-pass 1M-context generation is the requirement, put only models that meet it into candidates. Silently falling back to a cheaper model that "runs but misses the requirement" is the nastiest failure in unattended work.
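One way to enforce that rule is to build the candidate list from a capability table instead of typing it by hand. The following is a sketch under assumptions: `CandidateSpec`, `Requirement`, and `eligibleCandidates` are hypothetical names, and the model IDs and context-window figures are placeholders, not published specs for any real model.

```typescript
// Hypothetical capability record; the fields and values are illustrative,
// not published specs for any real model.
interface CandidateSpec {
  id: string;
  contextWindow: number; // tokens
  priority: number; // lower = more preferred
}

interface Requirement {
  minContextWindow: number;
}

// Keep only models that meet the requirement, then order by preference.
// filter() returns a new array, so the caller's list is not mutated by sort().
function eligibleCandidates(specs: CandidateSpec[], req: Requirement): string[] {
  return specs
    .filter((s) => s.contextWindow >= req.minContextWindow)
    .sort((a, b) => a.priority - b.priority)
    .map((s) => s.id);
}

const specs: CandidateSpec[] = [
  { id: "example-small", contextWindow: 200_000, priority: 3 },
  { id: "example-large", contextWindow: 1_000_000, priority: 1 },
  { id: "example-medium", contextWindow: 1_000_000, priority: 2 },
];

// Only the two models that meet the 1M requirement survive, best first.
const capableCandidates = eligibleCandidates(specs, { minContextWindow: 1_000_000 });
```

The resulting array is what you would hand to resolveEligibleModel(), so a model that cannot meet the requirement never gets probed in the first place.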

Apply the resolution to the run config and log exactly one line

The model the probe settles on is fixed as config for the duration of that run. The real-work code never knows a model ID directly; it just receives the one resolved value.

interface RunConfig {
  model: string;
  resolvedAt: string;
  fallbackFrom: string[]; // skipped candidates, for audit
}
 
async function buildRunConfig(candidates: string[]): Promise<RunConfig> {
  const { chosen, attempts } = await resolveEligibleModel(candidates);
 
  // Exactly one structured log line. For unattended runs, being able to trace later is the lifeline.
  console.log(
    JSON.stringify({
      event: "model_preflight",
      ts: new Date().toISOString(),
      chosen,
      skipped: attempts
        .filter((a) => !a.ok)
        .map((a) => ({ model: a.model, reason: a.reason })),
    })
  );
 
  if (!chosen) {
    // If every candidate is dead, stop here. Safer than starting on a wrong model.
    throw new Error(
      `preflight failed: no eligible model among [${candidates.join(", ")}]`
    );
  }
 
  return {
    model: chosen,
    resolvedAt: new Date().toISOString(),
    fallbackFrom: attempts.filter((a) => !a.ok).map((a) => a.model),
  };
}
 
// The caller does only this. The real work never thinks about model IDs.
const config = await buildRunConfig([
  "claude-opus-4-8",
  "claude-haiku-4-5-20251001",
]);
// Pass config.model to every subsequent messages.create

Keeping the log to a single JSON line is deliberate. I later pull these with grep '"event":"model_preflight"' and line up which run settled on which model. Throwing when every candidate is dead is intentional too: producing nothing recovers faster than starting on a retired model and mass-producing empty artifacts. The more unattended a job is, the more it benefits from stopping cleanly when it should stop.
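When grep is not enough, for example when you want to tabulate results across runs, the same lines can be parsed in code. This is a minimal sketch: `parsePreflightLines` is a hypothetical helper, the event shape mirrors the log line emitted by buildRunConfig() above, and the sample log text is made up for illustration.

```typescript
// Shape of the single log line emitted by buildRunConfig() in this article.
interface PreflightEvent {
  event: "model_preflight";
  ts: string;
  chosen: string | null;
  skipped: { model: string; reason?: string }[];
}

// Extract preflight events from raw log text, ignoring non-JSON lines
// and JSON lines that belong to other events.
function parsePreflightLines(log: string): PreflightEvent[] {
  const events: PreflightEvent[] = [];
  for (const line of log.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue;
    try {
      const parsed = JSON.parse(trimmed) as Partial<PreflightEvent>;
      if (parsed.event === "model_preflight" && Array.isArray(parsed.skipped)) {
        events.push(parsed as PreflightEvent);
      }
    } catch {
      // Not valid JSON: skip the line rather than fail the whole scan.
    }
  }
  return events;
}

// Sample log text is made up for illustration.
const sampleLog = [
  "job started",
  '{"event":"model_preflight","ts":"2026-06-16T00:00:00.000Z","chosen":"model-b","skipped":[{"model":"model-a","reason":"retired"}]}',
  '{"event":"other","ts":"2026-06-16T00:00:01.000Z"}',
].join("\n");

const preflightEvents = parsePreflightLines(sampleLog);
```

Because each run emits exactly one such line, the length of the returned array is also a quick count of how many runs reached the preflight stage.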

Probe cost, and how far you may cache the result

A probe is cheap but not free. Even max_tokens: 1 bills input tokens and a minimal output. In practice, against a batch that spends hundreds to thousands of tokens per run, the startup probe lands around ten tokens. Its share of the whole batch is rounding error, and it is orders of magnitude smaller than hurling repeated expensive retries at a retired model.

Caching needs a line drawn. My policy is to fix the probe result only for the duration of that run and never reuse it across runs. As the suspension of Fable 5 three days after its launch showed, availability can shift on short notice. "It passed last time" does not mean it passes now. Conversely, there is no need to probe repeatedly within one run. Resolve once at startup, then fix it for that run. That granularity is the right compromise between cost and freshness, in my view.

For jobs that fire frequently (say every five minutes), where probe cost starts to matter, a middle path is to give the probe result a short TTL (a few minutes) shared within the same process. But the longer the TTL, the longer the job can stay blind to a vanished model, so for unattended runs it is safer to keep it short.
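A short-TTL cache of that kind could look roughly like the sketch below. `makeTtlCache` and `resolveWithTtl` are hypothetical helpers, not part of the code above. The `resolve` callback stands in for a real preflight call, and the clock is injectable so the expiry logic can be checked without waiting.

```typescript
// Hypothetical in-process cache for a resolved model ID.
// `now` is injectable so the expiry logic can be exercised without waiting.
function makeTtlCache(ttlMs: number, now: () => number = Date.now) {
  let entry: { model: string; expiresAt: number } | null = null;

  return {
    // Returns the cached model, or null when nothing is cached or it expired.
    get(): string | null {
      if (entry && now() < entry.expiresAt) return entry.model;
      entry = null;
      return null;
    },
    // Stores a freshly resolved model with an expiry ttlMs from now.
    set(model: string): void {
      entry = { model, expiresAt: now() + ttlMs };
    },
  };
}

// Resolve through the cache; `resolve` stands in for a real preflight call.
async function resolveWithTtl(
  cache: ReturnType<typeof makeTtlCache>,
  resolve: () => Promise<string>
): Promise<string> {
  const cached = cache.get();
  if (cached !== null) return cached;
  const model = await resolve();
  cache.set(model);
  return model;
}
```

The cache lives in process memory only, so a restarted job always probes again, which matches the policy of never carrying a result across runs.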

One next step

Pick one of your scheduled jobs and change it so it calls buildRunConfig() once, right before the real work. Concentrate model IDs at the config entrance and pass only the one resolved value into the real work. With that separation in place, whichever model retires or is withdrawn next, you fix exactly one spot — the candidate list. The longer something runs unattended, the more deciding its failure mode in advance becomes your best source of calm.
