Three Days of Running Claude Fable 5 Side by Side with Opus 4.8 — Settling My Model Split Before June 15

On the morning of June 9, as I read the release notes announcing Claude Fable 5, my eyes drifted to June 15 on the calendar. That is the day billing for the Agent SDK and headless runs moves to monthly credits aligned with API rates. Evaluating a new model and locking down a cost structure: two deadlines landing in the same week.

Fortunately, Fable 5 is included at no extra cost on Pro, Max, and other major plans until June 22. So the plan wrote itself: use the free window to run it side by side with Opus 4.8 on the maintenance tasks I handle every day as an indie developer, and settle the split before June 15. Starting June 10, I spent three days deliberately handing the same work to both models.

The setup — run the same prompt twice, changing only the model

I used three tasks that had genuinely come up in maintaining my wallpaper apps:

Refactoring the image-cache layer (Swift, a module of roughly 1,200 lines)
Clustering about 40 Crashlytics crash logs by root cause
A design review with the entire module handed over at once (roughly 90k tokens in total)

This was not a formal benchmark. But if you run the same prompt twice and change only --model, the conditions are close enough for a practical side-by-side comparison.

# Hand the same task to two models and compare the structure of the output
claude -p --model claude-fable-5   < task_refactor.md > out_fable.md
claude -p --model claude-opus-4-8  < task_refactor.md > out_opus.md
diff <(grep '^##' out_fable.md) <(grep '^##' out_opus.md)
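Covering all three tasks only takes a loop around the two commands above. A minimal bash sketch, assuming the prompts sit in the current directory as files named task_<name>.md (a hypothetical layout). The helper names run_pair and run_all, and the CLAUDE_BIN override that lets you dry-run with a stub, are my own and not part of the CLI:

```bash
# Run every task prompt through both models, changing only --model.
# Assumptions: task files are named task_<name>.md (hypothetical layout);
# CLAUDE_BIN may point at a stub for a dry run (defaults to the real CLI).
CLAUDE_BIN="${CLAUDE_BIN:-claude}"
MODELS=(claude-fable-5 claude-opus-4-8)

run_pair() {
  local task_file="$1"
  local name="${task_file#task_}"   # task_refactor.md -> refactor.md
  name="${name%.md}"                # refactor.md -> refactor
  local model
  for model in "${MODELS[@]}"; do
    # Same prompt on stdin; only the model flag differs between runs.
    "$CLAUDE_BIN" -p --model "$model" < "$task_file" > "out_${name}_${model}.md"
  done
}

run_all() {
  local f
  for f in task_*.md; do
    [ -e "$f" ] || continue         # no matches: the glob stays literal
    run_pair "$f"
  done
}
```

After run_all, diffing the ## headings of each out_<name>_*.md pair the same way as above gives one comparison per task.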

One detail worth noting: these claude -p headless runs are exactly what moves off the subscription cap and onto monthly credits on June 15. The experiment doubled as a rehearsal for asking, "what will this kind of work cost after the change?"

The clear difference showed up when I stopped splitting the input

Until now, when asking for a design review, I would hand over the module in three or four chunks and finish with a consolidation pass: "now point out contradictions across everything you have seen." I did that not because the context would overflow, but because the model's grip on the early chunks always seemed to loosen by the time it reached the later ones.

With Fable 5, I passed the entire module — about 90k tokens — in one go. What stood out in the review it returned were the cross-file consistency findings. One file defines the cache eviction policy; the prefetching code in another file quietly assumes that eviction never happens to its entries. That is exactly the kind of contradiction my chunked reviews had failed to surface, and Fable 5 caught it in a single request.

The step of deciding how to split the input simply disappears. For me, that is the practical meaning of Fable 5's 1M-token context window.
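Handing over the whole module in one request mostly comes down to concatenating its files with clear boundaries, so cross-file findings can say where each piece lives. A minimal bash sketch, assuming a module directory of .swift files; the function names are hypothetical, and the "about 4 characters per token" figure is only a rough rule of thumb for a sanity check, not the model's tokenizer:

```bash
# Concatenate every Swift file in a module into one prompt file, with a
# header per file so findings can reference specific files.
# usage: build_review_prompt <module_dir> <instructions_file> <out_file>
build_review_prompt() {
  local dir="$1" instructions="$2" out="$3"
  {
    cat "$instructions"
    # Sorted so the file order is stable between runs.
    find "$dir" -type f -name '*.swift' | LC_ALL=C sort | while IFS= read -r f; do
      printf '\n===== FILE: %s =====\n' "$f"
      cat "$f"
    done
  } > "$out"
}

# Rough size check: ~4 characters per token is a heuristic only.
# usage: estimate_tokens <file>
estimate_tokens() {
  local chars
  chars=$(wc -c < "$1")
  echo $(( chars / 4 ))
}
```

For example, `build_review_prompt Sources/ImageCache review_instructions.md review_prompt.md` followed by `claude -p --model claude-fable-5 < review_prompt.md > review.md` turns the chunked requests into a single one; both paths here are placeholders.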

The cost, though, needs to be worked out up front. At API rates of $10 per MTok of input and $50 per MTok of output, a 90k-token read costs about $0.90 in input alone, and each follow-up turn in the same conversation sends that context again. This is not a model for chatty back-and-forth. Ask for one deep read and take the findings home; treating it as a "close reading on commission" seems right.
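The arithmetic is simple enough to keep as a helper when budgeting runs. A minimal sketch using the $10/$50 per MTok prices quoted above; it ignores discounts such as prompt caching, and the function name is my own:

```bash
# Estimated API cost in USD for one request at the list prices above.
# usage: estimate_cost <input_tokens> <output_tokens>
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" 'BEGIN {
    in_price = 10    # USD per million input tokens
    out_price = 50   # USD per million output tokens
    printf "%.2f\n", in_tok / 1e6 * in_price + out_tok / 1e6 * out_price
  }'
}
```

`estimate_cost 90000 0` prints 0.90, matching the figure above. If the same review also returned, say, 8k tokens of findings, the total would be 0.90 + 0.40 = 1.30.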

On single-shot fixes, the difference mostly vanished

By contrast, on the image-cache refactor and the 40-log crash clustering, I struggled to find any practical difference between the two outputs. Both refactoring diffs were good enough to apply as-is. The two crash groupings agreed on 38 of the 40 logs; only two were classified differently.

Going back to the price list, I noticed an interesting alignment. Opus 4.8's fast mode now costs a third of what it did before and runs 2.5 times faster, which puts it at the same $10/$50 per MTok as Fable 5. The same budget buys either speed or depth. For high-frequency work such as daily batches and one-off fixes, where both models delivered equivalent results, the side that keeps the wait short wins. That was my conclusion after three days.

Operational notes — the fallback and the wobble in thinking time

Fable 5 ships with a safety mechanism that blocks responses in high-risk domains such as cybersecurity, biology, and chemistry and falls back automatically to Opus 4.8 (reportedly in under 5% of sessions). I never saw it trigger during the three days, but if you wire Fable 5 into automation, I would design the logs to record which model produced each output. When you investigate quality variance later, a silently swapped model is the first variable you want to rule out.
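A minimal sketch of that logging, assuming a JSON Lines file as the log format. It records the model that was requested for each run, along with the exit code and output path. I did not verify whether the CLI reports which model actually served a fallback session, so the sketch keeps the raw `--output-format json` response for later inspection rather than parsing a field I cannot vouch for. The function name run_logged and the runs.jsonl file are hypothetical:

```bash
# Append one JSON Lines record per headless run: start time, task,
# requested model, output path, and exit code.
# Assumes file and model names contain no quotes or backslashes.
# usage: run_logged <model> <task_file> <out_file> [log_file]
CLAUDE_BIN="${CLAUDE_BIN:-claude}"

run_logged() {
  local model="$1" task="$2" out="$3" log="${4:-runs.jsonl}"
  local started rc=0
  started=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  # Keep the full JSON response; inspect it later for the serving model.
  "$CLAUDE_BIN" -p --model "$model" --output-format json < "$task" > "$out" || rc=$?
  printf '{"started":"%s","task":"%s","requested_model":"%s","output":"%s","exit_code":%d}\n' \
    "$started" "$task" "$model" "$out" "$rc" >> "$log"
  return "$rc"
}
```

When a result looks off, the log line tells you which model you asked for, and the saved response is the place to check what actually answered.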

The other observation: always-on adaptive thinking makes response times wobble. Even a prompt that looks trivial can make the model decide to think harder, which makes the total duration of a batch harder to predict. If you put Fable 5 on a schedule, give your timeouts more headroom than before.
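One way to build in that headroom is to derive each job's limit from a baseline you already know and a multiplier, instead of hard-coding a new number. A minimal sketch using GNU coreutils `timeout` (installed as `gtimeout` by Homebrew's coreutils on macOS); the function name and any multiplier you pick are illustrative, not measured values:

```bash
# Run a command with a time limit scaled up from a known baseline.
# usage: run_with_headroom <baseline_seconds> <factor> <command> [args...]
# Returns the command's own exit status, or 124 if the limit was hit.
run_with_headroom() {
  local baseline="$1" factor="$2"
  shift 2
  local limit
  limit=$(awk -v b="$baseline" -v f="$factor" 'BEGIN { print b * f }')
  timeout "$limit" "$@"
}
```

For example, a nightly job that used to finish within 600 seconds could run as `run_with_headroom 600 2 claude -p --model claude-fable-5 < task.md > out.md`. The factor of 2 is a guess, to be tuned against your own run logs.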

For overload resilience, Claude Code's fallbackModel setting in .claude/settings.json lets you specify a chain of fallbacks, tried in order under overload:

{
  "model": "claude-fable-5",
  "fallbackModel": ["claude-opus-4-8", "claude-sonnet-4-6"]
}

To keep model-generation changes out of your business logic, the anti-corruption layer I described in "Designing a Model Abstraction Layer for the Claude API" applies directly here. Collect every hard-coded model name in one place, and a week like this one comes down to a one-line config change.
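Collecting the names starts with finding them. A quick grep audit, assuming your identifiers follow the claude-<family>-<version> pattern used in this post; the regex and the function name are illustrative and may need widening for other naming schemes:

```bash
# List every hard-coded Claude model identifier as file:line:match,
# so the names can be moved into a single config location.
# usage: find_model_names [dir]   (defaults to the current directory)
find_model_names() {
  grep -rnoE --exclude-dir=.git \
    'claude-(fable|opus|sonnet|haiku)-[0-9]+(-[0-9]+)*' "${1:-.}"
}
```

Every line it prints is a place that would otherwise need editing by hand the next time a model generation changes.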

How I split the two models after June 15

Three days of parallel runs settled my dividing line:

Daily one-off fixes, log clustering, routine batches → Opus 4.8 (fast mode where latency matters)
The few-times-a-month "hand over the whole module" design reviews and cross-cutting audits → Fable 5, with a fixed budget of runs
Revisit schedule timeouts and fallback settings with adaptive thinking in mind
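Written down as code, the split becomes one lookup that scripts call instead of naming models themselves. A minimal sketch; the task categories and the function name are my own labels for the list above, and fast mode is left out because how you select it depends on your setup:

```bash
# Map a task category to a model ID, so model names live in one place.
# usage: pick_model <review|audit|fix|logs|batch>
MODEL_DEEP="claude-fable-5"     # whole-module reviews, cross-cutting audits
MODEL_DAILY="claude-opus-4-8"   # one-off fixes, log clustering, routine batches

pick_model() {
  case "$1" in
    review|audit)   echo "$MODEL_DEEP" ;;
    fix|logs|batch) echo "$MODEL_DAILY" ;;
    *) echo "pick_model: unknown task kind '$1'" >&2; return 1 ;;
  esac
}
```

A scheduled job then runs `claude -p --model "$(pick_model review)" < task_review.md > review.md`, and when the next generation ships, only the two variables change.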

The free window closes on June 22. If you have not tried it yet, pick one of your everyday tasks and run the same prompt twice, changing only --model. Where the difference appears, and where it does not, will show up in the shape of your own work rather than in secondhand impressions.

For my part, one experiment remains: large single-pass generations that use the 128k-token output limit. I plan to run one more round before the window closes. If you are weighing the same decision during the introduction period, I hope these notes are of some use.