A Three-Tier fallbackModel Setup for Claude Code — Keeping Unattended Runs Alive Through Overload Mornings

How I run Claude Code with a three-tier fallbackModel chain so overnight batches survive overload errors: logging which model actually ran, measuring quality drift on fallback days, and pairing it with deny rules.

claude-code¹²⁹ fallbackmodel automation⁹⁸ scheduled-jobs² reliability¹⁷

✦ Premium Article

One morning this June, my Crashlytics triage report arrived empty. As an indie developer I rely on a headless Claude Code run (claude -p) at 6 a.m. to classify the previous day's crash logs for my apps. That day, the log showed three overloaded_error responses (HTTP 529) in a row — the retry limit had been exhausted and the job had simply given up. It was a blunt reminder of how fragile a retry-only design really is. Unaddressed crashes feed directly into lower AdMob revenue and worse store reviews, which makes this triage the one batch in my operation I least want to lose.

When I tallied the previous 30 days of logs, the morning batch had come up empty on 4 of them — roughly a 13% loss rate. When you are sitting at the keyboard, "wait a bit and try again" solves this. Unattended, no amount of waiting wins against an overload that lasts longer than your retry window. So I moved to a three-tier setup using the fallbackModel setting that recently landed in Claude Code, switching models instead of merely retrying one. This article is a record of that design and what I learned actually operating it.

Where a retry loop alone stopped working

My pre-migration script was the usual exponential backoff:

#!/bin/bash
# Before: retries against the same model only. A long overload kills every attempt.
PROMPT_FILE="$HOME/ops/crashlytics_triage_prompt.md"
for i in 1 2 3; do
  if claude -p "$(cat "$PROMPT_FILE")" > /tmp/triage_result.md 2>/tmp/triage_err.log; then
    exit 0
  fi
  sleep $(( i * 60 ))  # 1 min -> 2 min -> 3 min
done
echo "triage failed after 3 attempts" >&2
exit 1

The flaw is that every retry queues up against the same congestion on the same model. A 529 is server-side overload; some mornings it clears in minutes, other mornings it lingers for half an hour or more. Looking back at the four failed days, all three retries had been spent within six minutes of the first failure — well inside the congestion window, every single time.

What changes with a fallbackModel array

Claude Code's fallbackModel accepts an array of up to three models in settings.json. When the primary cannot respond — overload, model unavailable — execution switches to the next model in line and keeps going. If retrying is rejoining the same queue, falling back is walking over to a different counter that happens to be open. For unattended runs, that difference is decisive.

// After: .claude/settings.json — switch models under overload instead of re-queueing
{
  "model": "claude-fable-5",
  "fallbackModel": ["claude-opus-4-8", "claude-sonnet-4-6"]
}

That is the entire change — the calling script stays untouched. In the 30 days since migrating, the fallback chain fired on 3 mornings, and the number of empty-report days dropped to zero. You cannot prevent 529s, but you can prevent them from costing you results.

One caveat from real operation: fallbackModel responds to overload and availability errors, not to timeouts. With an always-on adaptive-thinking model as the primary, even simple-looking tasks can think longer than expected, which makes wall-clock time harder to predict. I widened my batch-level timeout to 1.5x its previous value. Fallback and timeout are separate insurance policies and need to be designed separately.

✦

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN

✦A working Fable 5 → Opus 4.8 → Sonnet 4.6 fallbackModel chain and the criteria behind each slot

✦A bash implementation that extracts the executing model from the stream-json init message and logs it to CSV

✦A measured 90% classification-agreement rate on a Sonnet 4.6 fallback day, and the two operational rules it produced

Secure payment via Stripe · Cancel anytime

✦

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

Unlock all articles with Membership →

Ordering the three tiers — what goes where

I ordered the chain like a staircase you descend one quality step at a time. My three criteria:

Tier 1 (primary): the model whose quality you actually want day to day. I am currently trialing Claude Fable 5 during its introductory window, with the explicit understanding that usage credits apply after June 22 and I may settle back on Opus 4.8 depending on cost
Tier 2: a model whose output style is close enough to the primary that prompts transfer as-is. Opus 4.8 tracks my triage classification rubric reliably, so it sits here
Tier 3: a light model you can honestly describe as "clearly better than no result at all." Sonnet 4.6 is fast and gets the broad classifications right

The configuration I would avoid is putting a lightweight model directly in tier 2 to minimize cost. The chain only fires a few times in tens of days, so the cost delta is noise at monthly scale — but a quality gap comes back to you as rework the very next morning. Precisely because activation is rare, spending quality on tier 2 is the trade that pays.

If you want the same philosophy at the API layer, the anti-corruption-layer approach in An Anti-Corruption Layer for Claude API Models — Keeping Generation Changes Out of Your Business Logic applies directly. The CLI's fallbackModel is, to my mind, that whole design compressed into one line of configuration.

Logging which model actually ran — extracting it from stream-json

The first thing that bothered me after enabling fallback was not knowing who wrote this morning's report. Fallback fires silently; without a record you can neither audit quality drift nor attribute cost.

I switched the output format to stream-json and now pull the model name from the session-init system message, appending it to a CSV:

#!/bin/bash
# Record the executing model; flag the report when it is not the primary
PROMPT_FILE="$HOME/ops/crashlytics_triage_prompt.md"
LOG="/tmp/triage_stream.jsonl"
PRIMARY="claude-fable-5"
 
claude -p "$(cat "$PROMPT_FILE")" \
  --output-format stream-json --verbose > "$LOG"
 
MODEL=$(jq -r 'select(.type == "system" and .subtype == "init") | .model' "$LOG" | head -1)
RESULT=$(jq -r 'select(.type == "result") | .result' "$LOG")
 
echo "$(date +%F),${MODEL}" >> "$HOME/ops/triage_model_history.csv"
 
if [ "$MODEL" != "$PRIMARY" ]; then
  # On fallback days, annotate the report itself so tomorrow-me cannot miss it
  printf "> Note: today's run used %s\n\n%s" "$MODEL" "$RESULT" > /tmp/triage_result.md
else
  printf '%s' "$RESULT" > /tmp/triage_result.md
fi

The key decision: treat a fallback event as an annotation on the report, not as an error. Fallback-day output is exactly what I want to inspect later, so the flag belongs in the place I read, not only in a CSV nobody opens. For unattended operation, that placement proved more reliable than any alerting.

Preparing for quality drift on fallback days

After one morning dropped all the way to tier 3 (Sonnet 4.6), I re-ran the same day's 52 crash logs through Fable 5 and compared classifications. 47 of 52 matched — about 90% agreement. All five disagreements were low-frequency crashes with multiple interacting causes, where the lighter model leaned conservative and routed them to "needs manual review."

From that measurement I set exactly two operational rules:

On fallback days, I manually re-check only the items classified as high priority — never the full report
If fallback fires more than three times in a month, I revisit either the primary model choice or the run's time slot

Preparing for quality drift is not about perfect verification; it is about deciding where trust ends and your own eyes begin. A measured 90% agreement rate turned out to be exactly enough evidence to draw that line with confidence.

Pairing it with glob deny rules — safety for runs that no longer stop

Once fallback guarantees your run keeps going, the fact that it keeps going becomes its own risk. Whatever model ends up executing, the areas it must never touch should be pinned down with permissions deny rules — which recently gained glob support, so patterns can be expressed compactly:

// .claude/settings.json — lines that no model, primary or fallback, may cross
{
  "permissions": {
    "deny": [
      "Write(~/ops/secrets/**)",
      "Bash(rm -rf*)",
      "Read(**/*.env)"
    ]
  }
}

I have come to see fallbackModel and deny rules as a matched pair: one provides the flexibility to keep executing, the other the rigidity of what stays protected no matter what executes. For permission design in unattended contexts more broadly, see Designing Claude Code Skills That Actually Run Unattended — Three Patterns to Avoid Permission-Dialog Stalls.

How the June 15 credit migration interacts with fallback

Headless claude -p moves to API-rate-based monthly credits on June 15. The easy thing to miss in a fallback design is that each tier consumes credits at a different weight. The chain fires rarely, yes — but a configuration that keeps a heavy primary running through a credit-starved final week risks stopping for budget reasons before overload ever becomes the problem.

I now check remaining monthly credits weekly and am experimenting with swapping the primary down one tier during the last week of each month. For the broader cost-side redesign of which pipeline stages deserve delegation at all, Reallocating My Automation Pipeline Ahead of the June 15 Billing Change covers the reasoning in detail.

What to do next

There are only three steps:

Count the failed days in your unattended-run logs over the past 30 days. Even one lost morning a month makes the migration worthwhile
Add the fallbackModel array to settings.json, ordering three models as a deliberate quality staircase
Wire up the stream-json model logging, and on the first morning the chain fires, measure the quality drift once

Those three steps get you an unattended setup that no longer dreads an overloaded morning.

If your own morning batches have been quietly coming up empty, I hope this record saves you the troubleshooting time it cost me.

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.