●MODEL — Claude Sonnet 5 becomes the default across all plans, with stronger planning, tool use, and autonomy●PRICE — Sonnet 5 launches at $2 input / $10 output per million tokens through August 31●MODEL — Sonnet 5 nears Opus 4.8 performance at a lower price for always-on agents●CODE — Claude Code adopts Sonnet 5 as default with a native 1M-token context window●CODE — Claude Code adds sandbox credential blocking and org-level model restrictions●CLOUD — Claude is generally available in Microsoft Foundry on Azure with Azure-native access
Which Model Ran Last Night's Unattended Session? Building Model Attribution and Default-Drift Detection After the Sonnet 5 Switch
Claude Code's default model switched to Sonnet 5, and unpinned headless runs changed models silently. Here is a working design for extracting the actual model from run output, appending an atomic run record, and deciding per task lineage whether to pin or follow the default.
On July 2, Claude Code's default model switched to Claude Sonnet 5. In interactive use you notice immediately — the model name is right there on screen. The sessions that worry me are the unattended ones launched without --model. They raise no errors. The logs look completely normal. And yet a different model wrote last night's code than the night before.
As an indie developer who runs article pipelines for several sites on scheduled Claude Code sessions here at Dolice Labs, the first thing I checked that morning was which task lineages had been riding the default. Then came the uncomfortable discovery: my run logs had never recorded the model, so I couldn't strictly prove which model my July 1 runs had used. If output quality shifted, I had no evidence to attribute it to the model or to my own prompt changes. This article is the design I built out of that morning: per-run model attribution plus default-drift detection.
Why a Default-Model Change Is the Nastiest Failure Mode in Unattended Runs
Most failures in unattended runs announce themselves: a non-zero exit code, a missing artifact, a stack trace in the log. A default-model change, by contrast, simply succeeds. Exit code 0, artifacts generated, everything green. What changes are the slow-burn properties: tone, structure, latency, unit cost. To be clear, I welcome this particular switch: Sonnet 5 ships with intro pricing ($2 per million input tokens and $10 per million output tokens through August 31, 2026, then $3/$15) and stronger planning and tool use. The problem isn't the model. The problem is being unable to state, from your own records, when each task switched and to what. Without that, root-cause analysis is permanently broken.
Take Inventory of Every Place a Model Gets Decided
Before adding any tooling, map where the model is actually being chosen in your environment. Claude Code accepts the setting through several channels, and if you don't know which one wins for each task, even good records will mislead you.
| Channel | Example | Scope | Unattended-run caveat |
| --- | --- | --- | --- |
| CLI flag | claude -p --model claude-sonnet-5 | That invocation only | Easiest to audit: it's visible in the launch command |
| model in settings.json | "model": "claude-opus-4-8" | All sessions in that project | Scattered across repos, easy to miss during inventory |
| Environment variable | ANTHROPIC_MODEL | Every run in that shell environment | Hides inside cron or runner config; hardest to spot |
| Unspecified (default) | (none) | Every session with none of the above | When the default moves, all of these move together, as they did this week |
I wrote one line per task lineage stating which channel decides its model. Of my nine lineages, six were already pinned via flag or settings.json; three were riding the default — and those three had been running on Sonnet 5 since the morning of July 2 without telling me.
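That inventory can also be produced mechanically. The following is a minimal sketch, assuming you can hand it each lineage's launch arguments, the parsed project settings, and the environment the runner actually sees. `LineageLaunch`, `channelsInPlay`, and the lineage ID `site-b-daily` are hypothetical names, and the sketch deliberately lists every channel that is set rather than asserting which one takes precedence.

```typescript
// inventory.ts: list every channel that sets a model for one task lineage.
// Hypothetical helper; it reports what is configured, not which channel wins.

type Channel = "cli-flag" | "settings.json" | "env" | "default";

interface LineageLaunch {
  lineage: string;                          // task lineage ID
  args: string[];                           // launch arguments, e.g. ["-p", "--model", "claude-sonnet-5"]
  settings: { model?: string };             // parsed settings.json for the project
  env: Record<string, string | undefined>;  // environment seen by cron or the runner
}

interface ChannelHit {
  channel: Channel;
  model: string | null; // null means no model was specified anywhere
}

export function channelsInPlay(l: LineageLaunch): ChannelHit[] {
  const hits: ChannelHit[] = [];

  // CLI flag given as two arguments: "--model <id>"
  const i = l.args.indexOf("--model");
  const flagValue = i >= 0 ? l.args[i + 1] : undefined;
  if (typeof flagValue === "string") {
    hits.push({ channel: "cli-flag", model: flagValue });
  }

  // Project-level setting
  if (l.settings.model) {
    hits.push({ channel: "settings.json", model: l.settings.model });
  }

  // Environment variable, often buried in cron or runner config
  const envModel = l.env["ANTHROPIC_MODEL"];
  if (envModel) {
    hits.push({ channel: "env", model: envModel });
  }

  // Nothing set anywhere: this lineage rides the default
  if (hits.length === 0) {
    hits.push({ channel: "default", model: null });
  }
  return hits;
}

// Usage with one pinned and one unpinned lineage (IDs are illustrative)
const pinnedHits = channelsInPlay({
  lineage: "site-a-premium-fri",
  args: ["-p", "--model", "claude-sonnet-5"],
  settings: {},
  env: {},
});
const ridingHits = channelsInPlay({
  lineage: "site-b-daily",
  args: ["-p"],
  settings: {},
  env: {},
});
console.log(pinnedHits, ridingHits);
```

A lineage that returns more than one hit is worth a second look: two channels are set, and the record of which model actually ran is the only reliable way to learn which one took effect.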
WHAT YOU'LL LEARN
✦ If you couldn't tell whether a batch job's output changed because of the model or because of your prompt, you'll be able to answer in minutes with a per-run model trail
✦ You'll take home working TypeScript that defensively extracts model IDs from both the headless JSON result and the transcript, appends records atomically, and judges drift
✦ You'll be able to decide pin-versus-follow for each task lineage using three concrete axes: the intro-pricing deadline, deprecation ownership, and behavioral stability requirements
Extracting the Actual Model From Run Output — Defensively, From Two Sources
Next, record the model that actually ran, on every execution. The result object from headless runs (claude -p --output-format json) carries usage information, but its exact field layout can shift between versions. I deliberately avoid betting on a single field and probe candidates in order. The session transcript (JSONL) is my second source, since assistant events carry the model ID.
// extract-models.ts: defensively extract model IDs from run output
// Source 1: the result object from `claude -p --output-format json`
// Source 2: the session transcript JSONL (assistant events)
import { readFileSync } from "node:fs";

export function modelsFromResultJson(raw: string): string[] {
  const found = new Set<string>();
  try {
    const r = JSON.parse(raw);
    // Guard first: JSON.parse can legitimately return null or a primitive
    if (r === null || typeof r !== "object") return [];
    // Candidate 1: per-model usage breakdown (keys are model IDs)
    if (typeof r.modelUsage === "object" && r.modelUsage !== null) {
      for (const k of Object.keys(r.modelUsage)) found.add(k);
    }
    // Candidate 2: a flat `model` field
    if (typeof r.model === "string") found.add(r.model);
  } catch {
    // Not JSON: defer to the transcript, but never swallow silently
    console.error("Failed to parse result JSON; falling back to transcript");
  }
  return [...found];
}

export function modelsFromTranscript(path: string): string[] {
  const found = new Set<string>();
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      const e = JSON.parse(line);
      const m = e?.message?.model; // model ID on assistant events
      if (typeof m === "string" && m.startsWith("claude-")) found.add(m);
    } catch {
      /* skip corrupt lines (partial-write protection) */
    }
  }
  return [...found];
}

// Expected output:
//   from result JSON → ["claude-sonnet-5"]
//   from transcript  → ["claude-sonnet-5", "claude-haiku-4-5"]  // subagents can mix in a second model
Two implementation details matter here. First, the return type is an array, not a string. Sessions that spawn subagents or background tasks can legitimately involve more than one model, and a "single primary model" assumption silently drops that information. Second, when the two sources disagree, I record both rather than throwing. The extractor's job is evidence, not arbitration — a human can reconcile later.
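The "record both, arbitrate later" rule can be sketched as a small merge step. This is a minimal illustration under assumptions: `reconcile` and `Observation` are hypothetical names, and the `disagree` flag is an addition of this sketch rather than a field the article's record defines.

```typescript
// reconcile.ts: merge the two extractor outputs without arbitrating between them.
// Hypothetical helper; the `source` values mirror the "result" | "transcript" | "both" labels.

type Source = "result" | "transcript" | "both";

interface Observation {
  models: string[];       // union of everything observed, sorted for stable diffs
  source: Source | null;  // null when neither source yielded a model
  disagree: boolean;      // true when both sources reported but their sets differ
}

export function reconcile(fromResult: string[], fromTranscript: string[]): Observation {
  const a = new Set(fromResult);
  const b = new Set(fromTranscript);

  // Keep every model either source saw; nothing is discarded
  const models = Array.from(new Set(fromResult.concat(fromTranscript))).sort();

  let source: Source | null = null;
  if (a.size > 0 && b.size > 0) source = "both";
  else if (a.size > 0) source = "result";
  else if (b.size > 0) source = "transcript";

  // Two sets are equal when they have the same size and every member of one is in the other
  const sameSets = a.size === b.size && Array.from(a).every((m) => b.has(m));
  const disagree = source === "both" && !sameSets;

  return { models, source, disagree };
}

// Usage: the transcript saw a subagent model that the result object did not report
const merged = reconcile(["claude-sonnet-5"], ["claude-sonnet-5", "claude-haiku-4-5"]);
console.log(merged);
```

In that usage example the merged list holds both model IDs, `source` is "both", and `disagree` is true, which is the signal a human can reconcile later.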
An Append-Only Run Record That Survives Crashes
Each extracted model list gets appended, one line per run, to a per-lineage record file. Unattended environments must assume the process can die mid-write, so I write to a temp file and rename it into place. Readers never see a half-written line.
// run-ledger.ts: atomic append to the run record
import { readFileSync, writeFileSync, renameSync, existsSync } from "node:fs";

export interface RunRecord {
  ts: string;        // ISO 8601, normalized to one timezone
  lineage: string;   // task lineage ID, e.g. "site-a-premium-fri"
  models: string[];  // model IDs actually observed
  source: "result" | "transcript" | "both";
  exitCode: number;
}

// Assumes one writer per ledger file: this is read-modify-write, so two
// concurrent writers to the same path could each drop the other's line.
export function appendRun(path: string, rec: RunRecord): void {
  const prev = existsSync(path) ? readFileSync(path, "utf8") : "";
  const next = prev + JSON.stringify(rec) + "\n";
  const tmp = `${path}.${process.pid}.tmp`; // PID-suffixed temp name avoids collisions
  writeFileSync(tmp, next, "utf8");
  renameSync(tmp, path); // rename within one filesystem is atomic
}

// Expected output (one line):
// {"ts":"2026-07-02T04:15:09+09:00","lineage":"site-a-premium-fri",
//  "models":["claude-sonnet-5"],"source":"both","exitCode":0}
The essential habit: write the record on success, not just on failure. A failure-only log captures nothing about events that succeed while changing — which is exactly what a default switch is.
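One way to make that habit hard to forget is to put the write in a `finally` block, so it runs whatever the task does. The sketch below assumes two injected functions, both hypothetical: `runTask`, which would launch the headless session and return what was observed, and `write`, which would wrap the ledger append. The record shape is simplified for the illustration.

```typescript
// record-always.ts: write one record per run regardless of outcome.
// `runTask` and `write` are hypothetical injection points, not part of any CLI or library.

interface RunOutcome {
  exitCode: number;
  models: string[];
}

interface LedgerLine extends RunOutcome {
  ts: string;      // UTC timestamp from toISOString()
  lineage: string;
}

export function runAndRecord(
  lineage: string,
  runTask: () => RunOutcome,
  write: (line: LedgerLine) => void,
): RunOutcome {
  // Assumed placeholder outcome, used only if runTask throws before returning
  let outcome: RunOutcome = { exitCode: 1, models: [] };
  try {
    outcome = runTask();
    return outcome;
  } finally {
    // Runs on success, on a non-zero exit code, and when runTask throws
    write({ ts: new Date().toISOString(), lineage, ...outcome });
  }
}

// Usage: a successful run still produces a ledger line
const lines: LedgerLine[] = [];
runAndRecord(
  "site-a-premium-fri",
  () => ({ exitCode: 0, models: ["claude-sonnet-5"] }),
  (line) => { lines.push(line); },
);
console.log(lines.length); // 1
```

Injecting `write` keeps the wrapper testable without touching the filesystem, and it means the success path and the failure path cannot diverge in what they record.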
Matching Observations Against Expectations — Pin Versus Follow
With records accumulating, each run compares its observation against a per-lineage policy. I kept the policy space to exactly two options: pin (this lineage must run on this model; anything else is a fault) and follow-default (ride the default, but tell me the moment it moves). Verdicts are three-valued — ok, drift, unknown — and unknown (no model observed at all) fails closed into an investigation queue.
Treating drift differently per policy is the operational core. A drift under pin means broken config or a stray environment variable, so that lineage's artifacts get quarantined before publication. A drift under follow-default is expected life, so nothing halts — but the notification always fires and that day's artifacts get one extra review pass. On July 2, all three of my unpinned lineages emitted exactly that one-line notice, and "Default moved: → claude-sonnet-5" turned what would have been days of confused diffing into a calm observation exercise.
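The policy check described above can be sketched as a single pure function. This is an illustration under stated assumptions, not a definitive implementation: `Policy`, `Judgement`, and `judge` are hypothetical names, `alsoAllowed` is an assumed escape hatch for subagent models, follow-default detects a move by comparing against the previous run's models, and `claude-previous-default` is a placeholder ID rather than a real model.

```typescript
// judge.ts: compare what a run observed against its lineage policy.
// Hypothetical names throughout; a sketch under the assumptions in the lead-in.

type Policy =
  | { kind: "pin"; model: string; alsoAllowed?: string[] }
  | { kind: "follow-default" };

type Verdict = "ok" | "drift" | "unknown";
type Action = "none" | "notify-and-review" | "quarantine" | "investigate";

interface Judgement {
  verdict: Verdict;
  action: Action;
  note: string;
}

// Order-independent, duplicate-free key for comparing model sets
const key = (models: string[]): string => Array.from(new Set(models)).sort().join(",");

export function judge(policy: Policy, observed: string[], previous: string[]): Judgement {
  // No model observed at all: fail closed into the investigation queue
  if (observed.length === 0) {
    return { verdict: "unknown", action: "investigate", note: "No model observed" };
  }

  if (policy.kind === "pin") {
    const allowed = new Set([policy.model].concat(policy.alsoAllowed ?? []));
    const stray = observed.filter((m) => !allowed.has(m));
    const pinnedMissing = observed.indexOf(policy.model) === -1;
    if (stray.length > 0 || pinnedMissing) {
      return {
        verdict: "drift",
        action: "quarantine", // hold artifacts back before publication
        note: `Pin violated: expected ${policy.model}, saw ${key(observed)}`,
      };
    }
    return { verdict: "ok", action: "none", note: "Pinned model confirmed" };
  }

  // follow-default: the first recorded run only establishes a baseline
  if (previous.length === 0) {
    return { verdict: "ok", action: "none", note: `Baseline: ${key(observed)}` };
  }
  if (key(previous) !== key(observed)) {
    return {
      verdict: "drift",
      action: "notify-and-review", // nothing halts, but a human hears about it
      note: `Default moved: ${key(previous)} → ${key(observed)}`,
    };
  }
  return { verdict: "ok", action: "none", note: "Default unchanged" };
}

// Usage: an unpinned lineage on the morning the default changes
const notice = judge({ kind: "follow-default" }, ["claude-sonnet-5"], ["claude-previous-default"]);
console.log(notice.note);
```

Keeping the verdict separate from the action makes the per-policy difference explicit: the same "drift" verdict maps to quarantine under pin and to a notification plus review under follow-default.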
What a Week of Observation Showed
I deliberately left the three lineages on follow-default for a week and compared against the pre-switch records. These are my local observations, not benchmarks, but the direction was unambiguous.
| Metric | Before the switch (2-week average) | After Sonnet 5 became default (measured from 7/2) | Read |
| --- | --- | --- | --- |
| Wall time per run | ~6m 10s | ~4m 50s | Roughly 20% faster; fewer tool-call round trips |
| Artifact length variance | ~±12% | ~±19% | More structural freedom, more spread |
| Quality-gate rejections | ~2 per week | 1 per week | Fewer dropped instructions |
On balance a clear improvement — but the length variance nearly tripped a downstream quality-gate threshold. Without the model trail, I would have spent days combing prompt diffs for a cause that wasn't there. One recorded line replaced that entire investigation.
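For readers who want to reproduce this kind of comparison from their own records, here is the arithmetic as a short sketch. Both function names are hypothetical, and the definition of "spread" (largest deviation from the mean, as a percentage) is an assumption of this sketch; the figures in the table may have been computed differently.

```typescript
// observe.ts: arithmetic for before/after comparisons.
// Assumption: "spread" means the largest deviation from the mean, in percent.

export function relativeChangePct(before: number, after: number): number {
  // Negative means the value went down; rounded to one decimal place
  return Math.round(((after - before) / before) * 1000) / 10;
}

export function spreadPct(values: number[]): number {
  if (values.length === 0) return 0;
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  if (mean === 0) return 0;
  const maxDev = Math.max(...values.map((v) => Math.abs(v - mean)));
  return (100 * maxDev) / mean;
}

// Wall time from the table: 6m 10s = 370 s, 4m 50s = 290 s
const wallTimeChange = relativeChangePct(370, 290);
console.log(wallTimeChange); // -21.6
```

The wall-time figures work out to a 21.6% reduction, which matches the "roughly 20% faster" reading in the table.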
Pin or Follow? The Three Axes I Used to Decide
Finally, decide per lineage whether to switch to pinning. I used three axes.
Price is time-limited. Sonnet 5's intro pricing ends August 31; from September 1 it returns to $3 input / $15 output per million tokens. Make permanent decisions at standard pricing and treat the intro window as an observation bonus, not a baseline.
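To see what that means per run, here is a small worked calculation. The prices are the ones quoted above; the token counts are made-up placeholders, so substitute figures from your own usage data. `runCostUSD` is a hypothetical helper name.

```typescript
// pricing.ts: cost of one run under intro and standard Sonnet 5 pricing.
// Token counts below are illustrative placeholders, not measurements.

interface Price {
  inputPerM: number;   // USD per million input tokens
  outputPerM: number;  // USD per million output tokens
}

const INTRO: Price = { inputPerM: 2, outputPerM: 10 };     // through August 31
const STANDARD: Price = { inputPerM: 3, outputPerM: 15 };  // from September 1

export function runCostUSD(inputTokens: number, outputTokens: number, p: Price): number {
  return (inputTokens * p.inputPerM + outputTokens * p.outputPerM) / 1000000;
}

// Hypothetical run: 400,000 input tokens and 30,000 output tokens
const introCost = runCostUSD(400000, 30000, INTRO);       // 1.1
const standardCost = runCostUSD(400000, 30000, STANDARD); // 1.65
console.log(introCost, standardCost);
```

Because both rates rise by the same factor (3/2 and 15/10), any run costs 1.5 times as much at standard pricing as during the intro window, whatever its input-to-output ratio. Budgeting at the standard rate is therefore a single multiplication.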
Match stability requirements to the lineage. Lineages that generate published artifacts lean pin; lineages whose output passes through human review lean follow. And since behavior can shift even under an unchanged model ID, pairing this with the startup canary from When the Same Model Name Starts Behaving Differently — Catching Drift with a Boot-Time Canary closes the remaining gap.
My outcome: the two lineages that write published artifacts are now pinned explicitly to claude-sonnet-5 — pinning to the same model as the default still matters, because it moves the decision from the platform's hands into mine — and the third stays on follow-default with notifications. If you're on a team account, note that organization-level model restrictions are rolling out as well; add "admin restriction" as a fourth channel in your inventory table.
The First Step
Start tonight: from your very next run, record the model name — one line, from either the result JSON or the transcript. The three modules above can wait for the weekend; the evidence trail can't be backfilled. This won't be the last default-model change, and the position you want to be in next time is the quiet one: "every lineage, we can say exactly which model it ran on."