API & SDK / 2026-07-03 / Advanced

A 40% Lower Price Doesn't Mean a 40% Lower Bill — Measuring the Opus 4.8 to Sonnet 5 Migration by Cost per Completed Task

At standard rates Sonnet 5 lists about 40% cheaper than Opus 4.8 (about 60% cheaper during the intro window), yet extra tool turns can flip the math. Working TypeScript for consumption vectors, a paired-run harness, and break-even turn counts.



On July 2, Claude Sonnet 5 became the default model across plans, with introductory pricing of $2 per million input tokens and $10 per million output tokens. Next to Opus 4.8 at $5/$25, that is about 60% cheaper during the intro window and roughly 40% cheaper once the standard $3/$15 rates apply. I switched the overnight batches for the blogs I run that same evening and opened the next morning's cost ledger expecting a satisfying drop.

The drop was about 18%. On a model that costs 60% less per token.

Cross-referencing the usage logs told the story: on my tool-loop tasks, the median turn count had risen from 5 to 7, and those two extra turns inflated input tokens far more than intuition suggests. If you judge a migration by the price table alone, this effect stays invisible until the invoice arrives.

This piece builds a different yardstick — cost per completed task — as one continuous design: recording consumption vectors, running old and new models side by side, and solving for the break-even turn count. It is a small mechanism, the kind an indie developer can bolt on in an afternoon, but it changes the quality of the migration decision noticeably.

Per-task cost is a dot product, not a price

What you pay per task is the dot product of a price vector and a consumption vector.

Component	Price side	Consumption side
Input	$/MTok (input)	Total input tokens sent until the task completed
Output	$/MTok (output)	Total generated tokens
Cache reads	Much cheaper read rate	Input tokens served from cache
Retries	—	Every component spent on failed attempts

Swapping models swaps the price vector instantly — but it changes the consumption vector too. Sonnet 5 is positioned as the most agentic Sonnet yet, with stronger planning and tool use, and in practice it does not call tools the same number of times or produce the same output length as Opus 4.8 on identical tasks. Some task families consume less, some consume more. Which means the sign of your savings cannot, even in principle, be read off the price table.
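As a minimal sketch of that dot product, the function below multiplies each consumption component by its price. The `PriceVector` and `Consumption` shapes and the `costUSD` name are illustrative assumptions, and the cache rates are left as parameters because this article does not quote them.

```typescript
// Hypothetical shapes for illustration; the field names are assumptions.
interface PriceVector {
  inPerMTok: number;         // $ per million non-cached input tokens
  outPerMTok: number;        // $ per million output tokens
  cacheReadPerMTok: number;  // $ per million tokens read from cache
  cacheWritePerMTok: number; // $ per million tokens written to cache
}

interface Consumption {
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

// Per-task cost = price vector · consumption vector.
// tokens × $/MTok yields micro-dollars, hence the final division by 1e6.
function costUSD(p: PriceVector, c: Consumption): number {
  const microDollars =
    c.inputTokens * p.inPerMTok +
    c.outputTokens * p.outPerMTok +
    c.cacheReadTokens * p.cacheReadPerMTok +
    c.cacheWriteTokens * p.cacheWritePerMTok;
  return microDollars / 1e6;
}

// Example with no cache traffic, so the cache rates are simply set to 0 here.
const opusList: PriceVector = { inPerMTok: 5, outPerMTok: 25, cacheReadPerMTok: 0, cacheWritePerMTok: 0 };
const sonnetIntro: PriceVector = { inPerMTok: 2, outPerMTok: 10, cacheReadPerMTok: 0, cacheWritePerMTok: 0 };
const sample: Consumption = { inputTokens: 10000, outputTokens: 1000, cacheReadTokens: 0, cacheWriteTokens: 0 };

console.log(costUSD(opusList, sample).toFixed(3));    // prints 0.075
console.log(costUSD(sonnetIntro, sample).toFixed(3)); // prints 0.030
```

Retries need no price row of their own: tokens spent on failed attempts are summed into the same consumption vector before the multiplication. With an identical consumption vector the 60% intro discount passes straight through; the saving only moves when the consumption side moves.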

Turn count inflates input tokens quadratically

Each turn of a tool loop resends the whole conversation as input. With S for the system prompt plus initial context and d for the history added per round trip (tool_result plus the previous assistant output), the total input for an n-turn task is approximately:

total input ≈ n×S + d×(0 + 1 + ... + (n-1)) = n×S + d×n(n-1)/2

The second term grows with the square of n. Here are real dollars for a shape close to my link-checking agent — S = 3,000, d = 1,200 (an 800-token tool_result plus 400 tokens of prior output), 400 output tokens per turn:

Model and price	4 turns	6 turns	Change vs. Opus 4.8 at 4 turns (at 4 turns / at 6 turns)
Opus 4.8 ($5/$25)	$0.136	$0.240	baseline / +76%
Sonnet 5 intro ($2/$10)	$0.054	$0.096	-60% / -29%
Sonnet 5 standard ($3/$15)	$0.082	$0.144	-40% / +6%

At the same 4 turns, the discount tracks the price sheet exactly: 60% and 40%. Add two turns after the migration, though, and the intro-price saving shrinks to 29% — and at standard pricing, effective September 1, the task costs 6% more than it did on Opus 4.8. "We moved to the 40% cheaper model and the bill went up" is ordinary arithmetic for this task shape. Prompt caching softens the quadratic slope, but caches are scoped per model, so you cannot count on hits right after a switch — the dynamics I covered in the prompt-cache rewarm design for the Opus 4.8 to Sonnet 5 cutover.
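The closed form for total input can be sanity-checked by simulating the loop turn by turn. This is a small sketch using the S and d values from the table; the function names `simulatedInput` and `closedFormInput` are introduced here for illustration.

```typescript
// Turn-by-turn simulation of a tool loop's input size, compared with the
// closed form n*S + d*n*(n-1)/2. S and d follow the article's definitions.
function simulatedInput(S: number, d: number, turns: number): number {
  let total = 0;
  let history = 0; // accumulated history tokens resent on the current turn
  for (let t = 0; t < turns; t++) {
    total += S + history; // each call resends the base plus all history so far
    history += d;         // the next call carries one more round trip
  }
  return total;
}

function closedFormInput(S: number, d: number, turns: number): number {
  return turns * S + (d * turns * (turns - 1)) / 2;
}

for (const n of [4, 6]) {
  console.log(n, simulatedInput(3000, 1200, n), closedFormInput(3000, 1200, n));
}
// prints: 4 19200 19200
//         6 36000 36000
```

Going from 4 to 6 turns is 50% more API calls but 87.5% more input tokens (36,000 versus 19,200), which is where the gap between the price sheet and the invoice comes from.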

A minimal layer for recording consumption vectors

Everything you need is already in response.usage; the only missing piece is aggregation per task. Accumulate each turn's usage under a task ID:

// consumption.ts: accumulate a per-task consumption vector
export interface ConsumptionVector {
  taskId: string;
  model: string;
  turns: number;            // API calls, including tool round trips
  inputTokens: number;      // non-cached input
  outputTokens: number;
  cacheReadTokens: number;  // cache_read_input_tokens
  cacheWriteTokens: number; // cache_creation_input_tokens
  retries: number;          // failed attempts that were re-run
  completed: boolean;       // passed the success gate
  usageUnknown: boolean;    // true if any response arrived without usage
}

// The usage fields read from each API response.
export interface TurnUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

export function emptyVector(taskId: string, model: string): ConsumptionVector {
  return { taskId, model, turns: 0, inputTokens: 0, outputTokens: 0,
           cacheReadTokens: 0, cacheWriteTokens: 0, retries: 0,
           completed: false, usageUnknown: false };
}

// Add one turn's usage. If a response is missing usage, still count the turn
// but mark the task as "consumption unknown" so downstream aggregation can
// exclude it.
export function addTurn(v: ConsumptionVector, usage?: TurnUsage | null): ConsumptionVector {
  if (!usage) return { ...v, turns: v.turns + 1, usageUnknown: true };
  return {
    ...v,
    turns: v.turns + 1,
    inputTokens: v.inputTokens + usage.input_tokens,
    outputTokens: v.outputTokens + usage.output_tokens,
    cacheReadTokens: v.cacheReadTokens + (usage.cache_read_input_tokens ?? 0),
    cacheWriteTokens: v.cacheWriteTokens + (usage.cache_creation_input_tokens ?? 0),
  };
}

One operational habit worth adopting: retries accrue to the same taskId, not to a new one. A failed attempt is money you spent to get that task done. Book retries separately and you end up with the upside-down conclusion that failure-prone models are cheap.

A paired-run harness for old and new models

Next, run the identical task spec on both models and collect consumption vectors in pairs. The essential piece is a success gate — a predicate that decides whether the task actually completed. Fast and cheap but unfinished is a cost, not a result.

// paired-run.ts — side-by-side measurement across models
import { ConsumptionVector, emptyVector, addTurn } from "./consumption";
 
interface TaskSpec {
  id: string;
  run: (model: string, onUsage: (u: any) => void) => Promise<unknown>;
  succeeded: (result: unknown) => boolean; // success gate
  maxRetries: number;
}
 
export async function measureOnce(
  spec: TaskSpec, model: string
): Promise<ConsumptionVector> {
  let v = emptyVector(spec.id, model);
  for (let attempt = 0; attempt <= spec.maxRetries; attempt++) {
    if (attempt > 0) v = { ...v, retries: v.retries + 1 };
    try {
      const result = await spec.run(model, (usage) => { v = addTurn(v, usage); });
      if (spec.succeeded(result)) return { ...v, completed: true };
    } catch { /* count the failure as a retry and keep going */ }
  }
  return v; // returns with completed: false; aggregate it separately
}
 
// Run the same spec N times per model, interleaved to avoid time-of-day bias
export async function pairedRun(
  spec: TaskSpec, models: [string, string], n: number
): Promise<ConsumptionVector[]> {
  const out: ConsumptionVector[] = [];
  for (let i = 0; i < n; i++) {
    for (const m of models) out.push(await measureOnce(spec, m));
  }
  return out;
}

Ten runs per task family is enough to see the shape. I sampled from the real overnight jobs behind the Dolice Labs sites rather than writing synthetic benchmarks — measuring your own production tasks is the whole point. Skip that and you get the classic outcome: winning the benchmark and losing the invoice.
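To see that shape, the paired vectors need to be reduced to a per-model summary. The sketch below is one way to do it; `RunRecord` mirrors only the fields it uses, and the names `summarize`, `median`, and `ModelSummary` are assumptions made for this example.

```typescript
// Minimal per-model summary of paired-run results (illustrative names).
interface RunRecord {
  model: string;
  turns: number;
  inputTokens: number;
  completed: boolean;
}

interface ModelSummary {
  runs: number;
  completed: number;
  medianTurns: number;       // over completed runs only
  medianInputTokens: number; // over completed runs only
}

function median(xs: number[]): number {
  if (xs.length === 0) return NaN;
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

function summarize(records: RunRecord[]): Map<string, ModelSummary> {
  // Group the records by model name.
  const byModel = new Map<string, RunRecord[]>();
  for (const r of records) {
    const bucket = byModel.get(r.model) ?? [];
    bucket.push(r);
    byModel.set(r.model, bucket);
  }
  // Reduce each group to counts and medians.
  const out = new Map<string, ModelSummary>();
  byModel.forEach((rs, model) => {
    const done = rs.filter((r) => r.completed);
    out.set(model, {
      runs: rs.length,
      completed: done.length,
      medianTurns: median(done.map((r) => r.turns)),
      medianInputTokens: median(done.map((r) => r.inputTokens)),
    });
  });
  return out;
}
```

Medians are taken over completed runs only, so a model that gives up early does not pull its turn count down and look cheaper than it is.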

Solve for the break-even turn count in advance

While the paired runs accumulate, you can also draw the theoretical defense line. "How many turns can the new model afford before it stops being cheaper?" depends only on the task shape (S, d, output per turn) and both price vectors:

// break-even.ts — how many extra turns a migration can tolerate
interface Price { inPerMTok: number; outPerMTok: number; }
interface Shape { base: number; growthPerTurn: number; outPerTurn: number; }
 
function taskCost(p: Price, s: Shape, turns: number): number {
  const input = turns * s.base + s.growthPerTurn * (turns * (turns - 1)) / 2;
  const output = turns * s.outPerTurn;
  return (input * p.inPerMTok + output * p.outPerMTok) / 1e6;
}
 
// Find the turn count where the new model matches the old model's cost
export function breakEvenTurns(
  oldP: Price, newP: Price, s: Shape, baselineTurns: number
): number {
  const ceiling = taskCost(oldP, s, baselineTurns);
  let lo = baselineTurns, hi = baselineTurns * 4;
  for (let i = 0; i < 40; i++) {
    const mid = (lo + hi) / 2;
    // Bisection: still under the old cost means the new model can afford more turns.
    if (taskCost(newP, s, mid) < ceiling) lo = mid;
    else hi = mid;
  }
  return lo;
}
 
const shape = { base: 3000, growthPerTurn: 1200, outPerTurn: 400 };
const opus = { inPerMTok: 5, outPerMTok: 25 };
console.log(breakEvenTurns(opus, { inPerMTok: 2, outPerMTok: 10 }, shape, 4)); // ≈ 7.6
console.log(breakEvenTurns(opus, { inPerMTok: 3, outPerMTok: 15 }, shape, 4)); // ≈ 5.8

For this shape, the tolerance is about 7.6 turns while intro pricing lasts and about 5.8 turns at standard pricing. Put differently: any task family whose median reaches 6 turns at standard rates loses money on this migration despite the 40% price cut. Intro pricing ends August 31, so every permanent decision should also be computed at standard rates, the same discipline as in the effective-dated cost forecast for the Sonnet 5 intro-price expiry.
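Because the cost model is a quadratic in the turn count, the break-even point can also be written in closed form, which makes a handy cross-check on a numeric search. The sketch below assumes the same input model (n×S + d×n(n-1)/2); `modelCost` and `breakEvenClosedForm` are names introduced here, not part of the listing above.

```typescript
interface Price { inPerMTok: number; outPerMTok: number; }
interface Shape { base: number; growthPerTurn: number; outPerTurn: number; }

// Cost in $ of an n-turn task under the n*S + d*n*(n-1)/2 input model.
function modelCost(p: Price, s: Shape, n: number): number {
  const input = n * s.base + (s.growthPerTurn * n * (n - 1)) / 2;
  return (input * p.inPerMTok + n * s.outPerTurn * p.outPerMTok) / 1e6;
}

// New-model cost in micro-dollars is a*n^2 + b*n. Setting it equal to the old
// model's cost at baselineTurns and taking the positive root gives break-even.
function breakEvenClosedForm(
  oldP: Price, newP: Price, s: Shape, baselineTurns: number
): number {
  const target = modelCost(oldP, s, baselineTurns) * 1e6; // micro-dollars
  const a = (newP.inPerMTok * s.growthPerTurn) / 2;
  const b = newP.inPerMTok * (s.base - s.growthPerTurn / 2)
          + newP.outPerMTok * s.outPerTurn;
  if (a === 0) return target / b; // no history growth: cost is linear in n
  return (-b + Math.sqrt(b * b + 4 * a * target)) / (2 * a);
}

const linkCheckShape: Shape = { base: 3000, growthPerTurn: 1200, outPerTurn: 400 };
const opusPrice: Price = { inPerMTok: 5, outPerMTok: 25 };
console.log(breakEvenClosedForm(opusPrice, { inPerMTok: 2, outPerMTok: 10 }, linkCheckShape, 4).toFixed(2)); // prints 7.59
console.log(breakEvenClosedForm(opusPrice, { inPerMTok: 3, outPerMTok: 15 }, linkCheckShape, 4).toFixed(2)); // prints 5.77
```

Turn counts are integers in practice, so the usable budget is the floor of each value: 7 turns at intro pricing and 5 at standard pricing for this shape.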

A week of paired runs: the answer split by task family

Here is what a week of side-by-side runs produced across three families. Costs are per completed task, retries included, incomplete runs excluded and tracked separately.

Task family	Median turns (Opus → Sonnet 5)	Per-task cost change	Decision
Draft generation (single-shot, no tools)	1 → 1	-59%	Migrate immediately
Internal link verification (tool loop)	5 → 7	-18%	Migrate after prompt rework
Tag classification (short-output batch)	1 → 1 (output -22%)	-63%	Migrate immediately

The single-shot families dropped almost exactly by the price ratio: when the consumption vector holds still, the discount passes straight through. The link checker was the interesting one — Sonnet 5 tended to decompose the verification into finer steps, adding turns. Writing the ceiling into the spec ("verify up to three links per response, five round trips maximum") brought the median back to 5 turns and improved the per-task cost to -41%. My main takeaway: as the model gets better at planning, you have to state your loop budget explicitly, or it will spend the extra diligence on your dime.

Pitfalls to defuse before you trust the numbers

The aggregation design trips people up more than the code does. In the order I hit them:

Never average incomplete tasks in. Consumption with completed: false belongs to a separate metric (wasted-shot rate). Mixing it in makes failure-prone models look cheap.
Keep cache components out of the input column. Caches are cold right after a switch, so the first week's effective cost reads high versus steady state. Holding cacheReadTokens separately lets you recompute once things warm up.
Align max_tokens and retry policy across both models. Otherwise you are measuring configuration differences, not model differences.
Do not let a price change straddle your measurement window. With the intro price ending August 31, a window that crosses that date mixes consumption changes with price changes. Cut windows where the price is constant.
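The first pitfall is the easiest to encode. Below is a minimal sketch of an aggregation that keeps incomplete runs out of the average and reports them as their own metric; `CostedRun`, `FamilyReport`, and `report` are hypothetical names, and each run's cost is assumed to have been priced already.

```typescript
// CostedRun is a hypothetical record: one task's consumption already priced in $.
interface CostedRun { completed: boolean; costUSD: number; }

interface FamilyReport {
  meanCostPerCompletedTask: number; // completed runs only, their retries included
  wastedShotRate: number;           // share of runs that never passed the success gate
  wastedSpendUSD: number;           // money spent on those runs, reported separately
}

function report(runs: CostedRun[]): FamilyReport {
  const done = runs.filter((r) => r.completed);
  const failed = runs.filter((r) => !r.completed);
  const sum = (rs: CostedRun[]): number => rs.reduce((acc, r) => acc + r.costUSD, 0);
  return {
    meanCostPerCompletedTask: done.length > 0 ? sum(done) / done.length : NaN,
    wastedShotRate: runs.length > 0 ? failed.length / runs.length : NaN,
    wastedSpendUSD: sum(failed),
  };
}
```

Computing one report per model, per task family, and per constant-price window keeps the comparison aligned with the other three pitfalls as well.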

Wrap-up — the first step is a turn-count histogram

Before any migration decision, pull the median and distribution of turn counts per task family from the usage logs you already have. That single query separates the families where the sticker discount applies as-is (one or two turns) from the ones where the consumption profile can flip the sign (five turns and up). The paired-run harness can come after. Since adopting this order, a model migration decision takes me half a day instead of a week of second-guessing.
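As a rough sketch of that first query, the function below builds a turn-count histogram per task family from per-call log rows. The `CallLog` shape is an assumption (one row per API call, tagged with a task ID and a family); adapt the field names to whatever your usage log actually records.

```typescript
// Hypothetical log row: one entry per API call. Field names are assumptions.
interface CallLog { family: string; taskId: string; }

// Returns family -> (turn count -> number of tasks with that many turns).
function turnHistogram(rows: CallLog[]): Map<string, Map<number, number>> {
  // First pass: count API calls per task.
  const turnsPerTask = new Map<string, { family: string; turns: number }>();
  for (const row of rows) {
    const entry = turnsPerTask.get(row.taskId) ?? { family: row.family, turns: 0 };
    entry.turns += 1;
    turnsPerTask.set(row.taskId, entry);
  }
  // Second pass: bucket tasks by turn count within each family.
  const hist = new Map<string, Map<number, number>>();
  turnsPerTask.forEach(({ family, turns }) => {
    const h = hist.get(family) ?? new Map<number, number>();
    h.set(turns, (h.get(turns) ?? 0) + 1);
    hist.set(family, h);
  });
  return hist;
}
```

Families whose mass sits at one or two turns are the pass-through cases; families with a long tail at five turns and beyond are the ones worth a paired run.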
