Which Model Ran Last Night's Unattended Session? Building Model Attribution and Default-Drift Detection After the Sonnet 5 Switch

Claude Code's default model switched to Sonnet 5, and unpinned headless runs changed models silently. Here is a working design for extracting the actual model from run output, appending an atomic run record, and deciding per task lineage whether to pin or follow the default.



On July 2, Claude Code's default model switched to Claude Sonnet 5. In interactive use you notice immediately — the model name is right there on screen. The sessions that worry me are the unattended ones launched without --model. They raise no errors. The logs look completely normal. And yet a different model wrote last night's code than the night before.

I'm an indie developer who runs article pipelines for several sites on scheduled Claude Code sessions here at Dolice Labs. The first thing I checked that morning was which task lineages had been riding the default. Then came the uncomfortable discovery: my run logs had never recorded the model, so I couldn't strictly prove which model my July 1 runs had used. If output quality shifted, I had no evidence to attribute it to the model rather than to my own prompt changes. This article is the design I built out of that morning: per-run model attribution plus default-drift detection.

Why a Default-Model Change Is the Nastiest Failure Mode in Unattended Runs

Model retirements and permission errors fail loudly. You can catch them with retries and fallbacks, and there are established patterns for it; I covered mine in "Designing a Three-Tier fallbackModel Setup for Claude Code."

A default-model change, by contrast, succeeds loudly. Exit code 0, artifacts generated, everything green. What changes are the slow-burn properties: tone, structure, latency, unit cost. To be clear, I welcome this particular switch — Sonnet 5 ships with intro pricing ($2 per million input tokens and $10 per million output tokens through August 31, 2026, then $3/$15) and stronger planning and tool use. The problem isn't the model. The problem is being unable to state, from your own records, when each task switched and to what. Without that, root-cause analysis is permanently broken.

Take Inventory of Every Place a Model Gets Decided

Before adding any tooling, map where the model is actually being chosen in your environment. Claude Code accepts the setting through several channels, and if you don't know which one wins for each task, even good records will mislead you.

Channel	Example	Scope	Unattended-run caveat
CLI flag	`claude -p --model claude-sonnet-5`	That invocation only	Easiest to audit — it's visible in the launch command
`model` in settings.json	`"model": "claude-opus-4-8"`	All sessions in that project	Scattered across repos, easy to miss during inventory
Environment variable	`ANTHROPIC_MODEL`	Every run in that shell environment	Hides inside cron or runner config; hardest to spot
Unspecified (default)	—	Every session with none of the above	When the default moves, all of these move together — as they did this week

I wrote one line per task lineage stating which channel decides its model. Of my nine lineages, six were already pinned via flag or settings.json; three were riding the default — and those three had been running on Sonnet 5 since the morning of July 2 without telling me.
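
The same inventory can be produced mechanically. What follows is a minimal sketch, not the tooling used for the inventory above: `channelsFor` and the `LineageConfig` shape are hypothetical names introduced for illustration. It only reports which channels are set for one lineage and deliberately does not encode which one wins; confirm the precedence against the Claude Code documentation for your version.

```typescript
// model-channels.ts: report which model channels are set for one task lineage.
// Hypothetical helper; the input shape is an assumption made for illustration.
type Channel = "cli-flag" | "settings.json" | "env" | "default";

interface LineageConfig {
  argv: string[];                          // launch command, split into arguments
  settingsModel?: string;                  // `model` from the project's settings.json, if any
  env: Record<string, string | undefined>; // environment the runner passes to the session
}

function channelsFor(cfg: LineageConfig): { channel: Channel; model: string | null }[] {
  const hits: { channel: Channel; model: string | null }[] = [];

  // Channel 1: `--model <id>` in the launch command.
  const i = cfg.argv.indexOf("--model");
  const flagValue = i === -1 ? undefined : cfg.argv[i + 1];
  if (typeof flagValue === "string") hits.push({ channel: "cli-flag", model: flagValue });

  // Channel 2: `model` in settings.json.
  if (cfg.settingsModel) hits.push({ channel: "settings.json", model: cfg.settingsModel });

  // Channel 3: the ANTHROPIC_MODEL environment variable.
  const envModel = cfg.env["ANTHROPIC_MODEL"];
  if (envModel) hits.push({ channel: "env", model: envModel });

  // Nothing set: this lineage rides the default and moves when the default moves.
  if (hits.length === 0) hits.push({ channel: "default", model: null });
  return hits;
}
```

A lineage that returns more than one hit has competing settings, which is worth resolving before trusting any record. A lineage that returns only `default` is one of the runs that changes silently.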


WHAT YOU'LL LEARN

✦If you couldn't tell whether a batch job's output changed because of the model or because of your prompt, you'll be able to answer in minutes with a per-run model trail

✦You'll take home working TypeScript that defensively extracts model IDs from both the headless JSON result and the transcript, appends records atomically, and judges drift

✦You'll be able to decide pin-versus-follow for each task lineage using three concrete axes: the intro-pricing deadline, deprecation ownership, and behavioral stability requirements


Extracting the Actual Model From Run Output — Defensively, From Two Sources

Next, record the model that actually ran, on every execution. The result object from headless runs (claude -p --output-format json) carries usage information, but its exact field layout can shift between versions. I deliberately avoid betting on a single field and probe candidates in order. The session transcript (JSONL) is my second source, since assistant events carry the model ID.

// extract-models.ts — defensively extract model IDs from run output
// Source 1: the result object from `claude -p --output-format json`
// Source 2: the session transcript JSONL (assistant events)
import { readFileSync } from "node:fs";
 
export function modelsFromResultJson(raw: string): string[] {
  const found = new Set<string>();
  try {
    const r = JSON.parse(raw);
    // Candidate 1: per-model usage breakdown (keys are model IDs)
    if (r && typeof r.modelUsage === "object" && r.modelUsage !== null) {
      for (const k of Object.keys(r.modelUsage)) found.add(k);
    }
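    // Candidate 2: a flat `model` field (guard against a literal `null` payload,
    // which would otherwise throw and be misreported as a parse failure)
    if (r && typeof r.model === "string") found.add(r.model);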
  } catch {
    // Not JSON — defer to the transcript, but never swallow silently
    console.error("Failed to parse result JSON; falling back to transcript");
  }
  return [...found];
}
 
export function modelsFromTranscript(path: string): string[] {
  const found = new Set<string>();
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (!line.trim()) continue;
    try {
      const e = JSON.parse(line);
      const m = e?.message?.model; // model ID on assistant events
      if (typeof m === "string" && m.startsWith("claude-")) found.add(m);
    } catch { /* skip corrupt lines (partial-write protection) */ }
  }
  return [...found];
}
 
// Expected output:
//   from result JSON → ["claude-sonnet-5"]
//   from transcript  → ["claude-sonnet-5", "claude-haiku-4-5"]  // subagents can mix in a second model

Two implementation details matter here. First, the return type is an array, not a string. Sessions that spawn subagents or background tasks can legitimately involve more than one model, and a "single primary model" assumption silently drops that information. Second, when the two sources disagree, I record both rather than throwing. The extractor's job is evidence, not arbitration — a human can reconcile later.
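
The "record both, don't arbitrate" rule can be sketched as a small merge step. This is an illustration under stated assumptions: `mergeObservations` and the `disagree` flag are hypothetical additions, and the function assumes each input list is already de-duplicated, as the extractors above return them.

```typescript
// merge-observations.ts: combine both sources without arbitrating between them.
// Hypothetical helper; `disagree` is an extra field assumed for illustration.
type Source = "result" | "transcript" | "both";

interface Observation {
  models: string[];  // union of both sources, result-JSON entries first
  source: Source;    // which source(s) produced at least one model ID
  disagree: boolean; // true when both sources reported but their sets differ
}

function mergeObservations(fromResult: string[], fromTranscript: string[]): Observation | null {
  // Nothing observed at all: let the caller treat this as "unknown".
  if (fromResult.length === 0 && fromTranscript.length === 0) return null;

  const models = Array.from(new Set(fromResult.concat(fromTranscript)));
  const source: Source =
    fromResult.length === 0 ? "transcript" : fromTranscript.length === 0 ? "result" : "both";

  // Set equality, valid because both inputs are assumed de-duplicated.
  const same =
    fromResult.length === fromTranscript.length &&
    fromResult.every((m) => fromTranscript.includes(m));

  return { models, source, disagree: source === "both" && !same };
}
```

Keeping the disagreement as a flag rather than an exception means a subagent that shows up only in the transcript is preserved as evidence instead of aborting the run.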

An Append-Only Run Record That Survives Crashes

Each extracted model list gets appended, one line per run, to a per-lineage record file. Unattended environments must assume the process can die mid-write, so I write to a temp file and rename it into place. Readers never see a half-written line.

// run-ledger.ts — atomic append to the run record
import { readFileSync, writeFileSync, renameSync, existsSync } from "node:fs";
 
export interface RunRecord {
  ts: string;          // ISO 8601, normalized to one timezone
  lineage: string;     // task lineage ID, e.g. "site-a-premium-fri"
  models: string[];    // model IDs actually observed
  source: "result" | "transcript" | "both";
  exitCode: number;
}
 
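// Rewrites the whole file via temp + rename. Readers never see a partial line,
// but this is a read-modify-write: two runs appending to the same lineage file
// at the same moment can lose one record, so keep one writer per file.
export function appendRun(path: string, rec: RunRecord): void {
  const prev = existsSync(path) ? readFileSync(path, "utf8") : "";
  const next = prev + JSON.stringify(rec) + "\n";
  const tmp = `${path}.${process.pid}.tmp`; // PID-suffixed temp name avoids collisions
  writeFileSync(tmp, next, "utf8");
  renameSync(tmp, path); // rename within one filesystem is atomic
}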
 
// Expected output (one line):
// {"ts":"2026-07-02T04:15:09+09:00","lineage":"site-a-premium-fri",
//  "models":["claude-sonnet-5"],"source":"both","exitCode":0}

The essential habit: write the record on success, not just on failure. A failure-only log captures nothing about events that succeed while changing — which is exactly what a default switch is.
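
The record is only useful if something reads it back. A minimal sketch of that read path, assuming one JSON record per line with a `models` array as in the record shape above; `lastObserved` is a hypothetical name, and it takes the ledger text rather than a path so it can be checked without touching disk.

```typescript
// ledger-read.ts: find the most recently observed models in a ledger's text.
// Hypothetical helper; assumes one JSON record per line with a `models` array.
function lastObserved(ledgerText: string): string[] | null {
  const lines = ledgerText.split("\n");
  // Walk backwards so the newest usable record wins.
  for (let i = lines.length - 1; i >= 0; i--) {
    const line = lines[i].trim();
    if (!line) continue;
    try {
      const rec = JSON.parse(line);
      if (rec && Array.isArray(rec.models) && rec.models.length > 0) {
        // Keep only string entries in case a record is malformed.
        return rec.models.filter((m: unknown): m is string => typeof m === "string");
      }
    } catch {
      // Skip a corrupt line rather than failing the whole read.
    }
  }
  return null; // no usable record yet
}
```

The return value is the natural input for a "previous run" comparison: `null` means there is no baseline yet, which is different from a baseline that matches.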

Matching Observations Against Expectations — Pin Versus Follow

With records accumulating, each run compares its observation against a per-lineage policy. I kept the policy space to exactly two options: pin (this lineage must run on this model; anything else is a fault) and follow-default (ride the default, but tell me the moment it moves). Verdicts are three-valued — ok, drift, unknown — and unknown (no model observed at all) fails closed into an investigation queue.

// drift-check.ts — compare policy against observed models
export type Policy =
  | { kind: "pin"; model: string }
  | { kind: "follow-default" };
 
export type Verdict = "ok" | "drift" | "unknown";
 
export function judge(
  policy: Policy,
  observed: string[],
  previous: string[] | null
): { verdict: Verdict; note: string } {
  if (observed.length === 0) {
    return { verdict: "unknown", note: "No model observed — suspect extractor breakage or an output-format change" };
  }
  if (policy.kind === "pin") {
    return observed.includes(policy.model)
      ? { verdict: "ok", note: `Pinned as expected: ${policy.model}` }
      : { verdict: "drift", note: `Pin ${policy.model} violated; observed ${observed.join(",")}` };
  }
  // follow-default: alert (but don't halt) when the primary model moves
  if (previous && previous[0] && observed[0] !== previous[0]) {
    return { verdict: "drift", note: `Default moved: ${previous[0]} → ${observed[0]}` };
  }
  return { verdict: "ok", note: "Following default" };
}
 
// Expected behavior:
//   pin claude-opus-4-8, observed ["claude-sonnet-5"] → drift (alert + quarantine today's artifacts)
//   follow-default, previous ["claude-opus-4-x"] → observed ["claude-sonnet-5"] → drift (notify only)

Treating drift differently per policy is the operational core. A drift under pin means broken config or a stray environment variable, so that lineage's artifacts get quarantined before publication. A drift under follow-default is expected life, so nothing halts — but the notification always fires and that day's artifacts get one extra review pass. On July 2, all three of my unpinned lineages emitted exactly that one-line notice, and "Default moved: → claude-sonnet-5" turned what would have been days of confused diffing into a calm observation exercise.
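
The per-policy handling described above can be written down as a small dispatch table. This is a sketch, assuming the same `Policy` and `Verdict` shapes as the drift check; the `Action` names and `actionFor` are hypothetical labels for the behaviors in the prose, not an existing API.

```typescript
// drift-action.ts: map a (policy, verdict) pair to an operational response.
// The types are repeated here so the sketch is self-contained.
type Policy = { kind: "pin"; model: string } | { kind: "follow-default" };
type Verdict = "ok" | "drift" | "unknown";

// Hypothetical action labels:
//   proceed           -> nothing to do
//   notify-and-review -> send the notice, add one extra review pass, do not halt
//   quarantine        -> hold today's artifacts back from publication
//   investigate       -> fail closed into the investigation queue
type Action = "proceed" | "notify-and-review" | "quarantine" | "investigate";

function actionFor(policy: Policy, verdict: Verdict): Action {
  if (verdict === "unknown") return "investigate"; // no model observed: fail closed
  if (verdict === "ok") return "proceed";
  // verdict === "drift": severity depends on what the lineage promised.
  return policy.kind === "pin" ? "quarantine" : "notify-and-review";
}
```

Keeping this mapping in one function makes the asymmetry explicit: the same verdict halts a pinned lineage and only annotates a following one.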

What a Week of Observation Showed

I deliberately left the three lineages on follow-default for a week and compared against the pre-switch records. These are my local observations, not benchmarks, but the direction was unambiguous.

Metric	Before the switch (2-week average)	After Sonnet 5 became default (measured from 7/2)	Read
Wall time per run	~6m 10s	~4m 50s	Roughly 20% faster; fewer tool-call round trips
Artifact length variance	~±12%	~±19%	More structural freedom, more spread
Quality-gate rejections	~2 per week	1 per week	Fewer dropped instructions

On balance a clear improvement — but the length variance nearly tripped a downstream quality-gate threshold. Without the model trail, I would have spent days combing prompt diffs for a cause that wasn't there. One recorded line replaced that entire investigation.

Pin or Follow? The Three Axes I Used to Decide

Finally, decide per lineage whether to switch to pinning. I used three axes.

Price is time-limited. Sonnet 5's intro pricing ends August 31, 2026; from September 1 the standard rate of $3 input / $15 output per million tokens applies. Make permanent decisions at standard pricing and treat the intro window as an observation bonus, not a baseline.
Pinning transfers deprecation ownership to you. The moment you pin, tracking that model ID's lifecycle becomes your job. Pinning a lineage that has no tracking in place is a slow-motion outage. The allowlist approach from Stopping Auto-Upgrades From Draining Your Credits — Pinning Execution Models with enforceAvailableModels transfers directly.
Match stability requirements to the lineage. Lineages that generate published artifacts lean pin; lineages whose output passes through human review lean follow. And since behavior can shift even under an unchanged model ID, pairing this with the startup canary from When the Same Model Name Starts Behaving Differently — Catching Drift with a Boot-Time Canary closes the remaining gap.
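
To make the price axis concrete, here is a small worked calculation using the two rates quoted in this article. The token counts are a hypothetical workload chosen for round numbers, not measurements from my runs, and `runCostUsd` is an illustrative helper.

```typescript
// price-axis.ts: cost of one run at intro versus standard pricing.
interface Price {
  inputPerMTok: number;  // USD per million input tokens
  outputPerMTok: number; // USD per million output tokens
}

const INTRO: Price = { inputPerMTok: 2, outputPerMTok: 10 };    // rates quoted in this article
const STANDARD: Price = { inputPerMTok: 3, outputPerMTok: 15 }; // rates quoted in this article

function runCostUsd(inputTokens: number, outputTokens: number, p: Price): number {
  return (inputTokens / 1e6) * p.inputPerMTok + (outputTokens / 1e6) * p.outputPerMTok;
}

// Hypothetical workload: 1,000,000 input tokens and 200,000 output tokens per run.
const introCost = runCostUsd(1_000_000, 200_000, INTRO);       // 2 + 2 = 4 USD
const standardCost = runCostUsd(1_000_000, 200_000, STANDARD); // 3 + 3 = 6 USD
console.log(introCost, standardCost, standardCost / introCost);
```

Because both rates rise by the same factor (3/2 and 15/10), a run costs 1.5 times as much at standard pricing regardless of its input-to-output mix. Budget a lineage at that multiple before deciding it can stay on the model permanently.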

My outcome: the two lineages that write published artifacts are now pinned explicitly to claude-sonnet-5 — pinning to the same model as the default still matters, because it moves the decision from the platform's hands into mine — and the third stays on follow-default with notifications. If you're on a team account, note that organization-level model restrictions are rolling out as well; add "admin restriction" as a fourth channel in your inventory table.

The First Step

Start tonight: from your very next run, record the model name — one line, from either the result JSON or the transcript. The three modules above can wait for the weekend; the evidence trail can't be backfilled. This won't be the last default-model change, and the position you want to be in next time is the quiet one: "every lineage, we can say exactly which model it ran on."
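
For that first night, something this small is enough. A minimal sketch, assuming you already have the model IDs in hand from one of the two sources; `trailLine`, `recordModels`, and the log path are hypothetical. It uses a plain append rather than the temp-and-rename approach, so a crash mid-write can leave a partial last line, and any reader should skip lines that fail to parse, as the transcript reader in this article does.

```typescript
// model-trail.ts: the smallest useful evidence trail, one JSON line per run.
import { appendFileSync } from "node:fs";

// Build one trail line. Pure, so it can be checked without touching disk.
function trailLine(lineage: string, models: string[], ts: string): string {
  return JSON.stringify({ ts, lineage, models }) + "\n";
}

// Append the line to a log file chosen by the caller (hypothetical path).
// toISOString() always emits UTC, which keeps every record in one timezone.
function recordModels(path: string, lineage: string, models: string[]): void {
  appendFileSync(path, trailLine(lineage, models, new Date().toISOString()), "utf8");
}
```

Call `recordModels` at the end of every run, on success as well as on failure, and migrate the file into the fuller record format later.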
