●MODEL — Claude Opus 4.8 and Haiku 4.5 arrive in the Messages API for coding and agentic work●CODE — Claude Code adds /rewind to resume before /clear, with steadier MCP reliability and OAuth retries●CODE — CPU use during streaming drops about 37%, improving stability on long-running sessions●CLOUD — Claude is generally available in Microsoft Foundry on Azure with Azure-native access●SECURITY — Static API keys can now be replaced with WIF short-lived, scoped credentials●POLICY — The US government clears Anthropic to release Mythos 5 to about 100 firms and agencies●MODEL — Claude Opus 4.8 and Haiku 4.5 arrive in the Messages API for coding and agentic work●CODE — Claude Code adds /rewind to resume before /clear, with steadier MCP reliability and OAuth retries●CODE — CPU use during streaming drops about 37%, improving stability on long-running sessions●CLOUD — Claude is generally available in Microsoft Foundry on Azure with Azure-native access●SECURITY — Static API keys can now be replaced with WIF short-lived, scoped credentials●POLICY — The US government clears Anthropic to release Mythos 5 to about 100 firms and agencies
A Fail-Closed Model Pricing Registry So New Models Don't Quietly Break Your Cost Math
When Opus 4.8 and Haiku 4.5 landed in the Messages API, rates scattered across my code silently skewed the cost rollup. Here is how to centralize per-model rates and fail closed on unknown models, with complete working code.
The morning after Opus 4.8 and Haiku 4.5 arrived in the Messages API, my overnight cost rollup was quietly wrong. Not a single error was logged. The day-over-day chart just looked oddly low.
The cause was simple. Requests to the new model IDs succeeded and were billed correctly, but my aggregation script's rate table had no entry for those IDs, so that traffic slid through priced at zero. A crash would have been kinder; the worst failure is the one that under-reports in silence.
As an indie developer running automated posting across several sites, a new model is something I want to try and something that comes to break my accounting at the same time. Here is how to stop that particular failure for good, with code you can run.
Why Scattered Rates Break
When I first wrote this, I had if (model === "...") rate = ... inlined in every place that needed a cost: the aggregation script, the monthly billing reconcile, and the dashboard. Three copies.
That shape causes no trouble at all while the model set is fixed. It breaks at exactly one moment: when a new model ID appears. And breaking at that moment is the worst possible timing, because you usually try a new model precisely when you are watching costs closely.
The problems with scattered rates fall into three buckets.
Symptom
What happens
How hidden
Unknown model priced at 0
New-model traffic aggregates as zero cost
Very high — no exception is raised
Partial update
One of three copies keeps the old rate
Totals disagree by location
Cache ignored
Cache read/write priced at the input rate
Error grows the more cache you use
What they share is that no exception ever fires. That is exactly why only design can prevent them.
Centralize Rates in One Place
First, make a single source of truth for rates. The key idea is to store a structure keyed by model ID rather than hardcoding headline prices. The rates themselves change; the structure stays stable.
// pricing.ts — the source of truth. No rate is written anywhere else.// Values are USD per million tokens. Always fill from the official page.export interface ModelRate { input: number; // input tokens output: number; // output tokens cacheWrite5m: number; // 5-minute cache write (~1.25x input) cacheWrite1h: number; // 1-hour cache write (~2x input) cacheRead: number; // cache read (~0.1x input)}// Fill these from the current official pricing page before relying on them.// The point is to never paste "numbers from memory" here, since rates move.const RATES: Record<string, ModelRate> = { "claude-opus-4-8": rate(15.0, 75.0), "claude-sonnet-4-6": rate(3.0, 15.0), "claude-haiku-4-5": rate(/* input */ 0, /* output */ 0), // TODO: official values};// Derive cache rates from input/output via fixed multipliers.// The multipliers (5m 1.25x, 1h 2x, read 0.1x) stay stable as prices change.function rate(input: number, output: number): ModelRate { return { input, output, cacheWrite5m: input * 1.25, cacheWrite1h: input * 2.0, cacheRead: input * 0.1, };}export { RATES };
The rate() helper is deliberate. Get the input price right and the cache write/read prices fall out automatically. I once updated the input price and forgot the cache prices, so leaning on derivation gives me one less thing to forget.
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦A model-id-keyed pricing registry plus a complete TypeScript cost calculator that reads the usage block, cache tokens included
✦A fail-closed design that refuses to score unknown models as zero, and a CI test that turns a new model launch into a red build instead of an incident
✦Cost attribution that accounts for 5-minute and 1-hour prompt cache multipliers, with the calls that paid off in my own automated posting setup
Secure payment via Stripe · Cancel anytime
✦
Unlock This Article
Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.
The response usage carries the exact token breakdown your bill is based on. Run it through the registry and you get measured cost, not a guess.
// cost.ts — turn usage plus rates into the cost of one requestimport { RATES, ModelRate } from "./pricing";interface Usage { input_tokens: number; output_tokens: number; cache_creation_input_tokens?: number; // cache writes cache_read_input_tokens?: number; // cache reads}const PER = 1_000_000; // divide to per-tokenexport function costOf(model: string, usage: Usage, cacheTtl: "5m" | "1h" = "5m"): number { const r: ModelRate | undefined = RATES[model]; if (!r) { // The crux: do not score unknown models as 0 (implemented next). throw new UnknownModelError(model); } const write = cacheTtl === "1h" ? r.cacheWrite1h : r.cacheWrite5m; return ( (usage.input_tokens * r.input + usage.output_tokens * r.output + (usage.cache_creation_input_tokens ?? 0) * write + (usage.cache_read_input_tokens ?? 0) * r.cacheRead) / PER );}
Pricing cache_creation_input_tokens and cache_read_input_tokens at the input rate inflates the error the more caching you do. Reads are about a tenth of input, so splitting them out cleanly was enough to visibly tighten my rollup.
Don't Let Unknown Model IDs Pass at Zero
This was the heart of the first incident: make an unknown model an exception, not zero dollars. Fail closed. Push the cost to the safe side — the side you can detect, not the side that hides.
export class UnknownModelError extends Error { constructor(public readonly model: string) { super(`Unpriced model: ${model} — add it to pricing.ts`); this.name = "UnknownModelError"; }}// How the aggregation loop absorbs it: one unknown model must not halt// the whole job, but it must always surface (never be swallowed).export function aggregate(records: { model: string; usage: Usage }[]) { let total = 0; const unpriced = new Set<string>(); for (const rec of records) { try { total += costOf(rec.model, rec.usage); } catch (e) { if (e instanceof UnknownModelError) { unpriced.add(e.model); // not passed through as 0 — visible later continue; } throw e; } } return { total, unpriced: [...unpriced] };}
What I care about here is landing between "halt" and "swallow." Crashing an overnight job over a single unknown model is too much. But passing it through at zero takes you right back to the original incident. Returning unpriced alongside the total keeps the state I want: the numbers come out, and the gaps come out with them.
A Test That Turns a New Model Into a Red Build
Once design prevents the failure, build the state where the next new model gets noticed. Collect model IDs from real logs and keep one light test that fails if any of them is missing from the registry.
// pricing.test.ts — every model seen in real logs must be pricedimport { RATES } from "./pricing";// Model IDs pulled from recent request logs (auto-refreshed in ops)import { observedModels } from "./fixtures/observed-models";test("every observed model exists in the pricing registry", () => { const missing = observedModels.filter((m) => !(m in RATES)); expect(missing).toEqual([]); // one unpriced model turns CI red});test("rates are positive and output is at least input", () => { for (const [model, r] of Object.entries(RATES)) { expect(r.input, model).toBeGreaterThan(0); expect(r.output, model).toBeGreaterThanOrEqual(r.input); }});
The second test looks trivial but earns its keep. It is how I caught the haiku-4-5 row I had left at 0 under a TODO. The plain invariant "output should not be cheaper than input" reliably catches copy-paste misses during a rate update.
How Far It Drifts, in Numbers
Abstractions get deprioritized, so here is the order of magnitude. Say one overnight job uses 500k input and 80k output tokens, and 30% of that runs on a newly added, unpriced model. If the unpriced portion slides through at zero, that run reports roughly 30% under its true cost.
In my case, what hurt was less the dollar figure than the broken day-over-day comparison. On a day I tried a new model, usage actually went up, yet the chart showed it going down. When the sign of the change flips, people stop noticing the anomaly. Once I failed closed and surfaced unpriced models, a non-empty unpriced became the alert itself, and I could go straight to the cause instead of second-guessing the numbers.
Small Calls That Paid Off
A few decisions around pricing don't show up in any number but made operations easier.
One is keeping aliases (moving names like claude-3-5-haiku-latest) out of aggregation. The billed response returns a resolved, concrete model ID, so I made that the source of truth. The rule is simple: reconcile against the ID the response declares, not what you asked for at request time.
The other is treating a rate update as a code change with history. Rates move for external reasons, and if the change leaves no record, a later recomputation of past cost will not reconcile. I commit pricing.ts edits with an effective-date comment and compute historical traffic at the rate in force then.
Decision
Reason
Incident avoided
Reconcile on the response's real model ID
Aliases move their target
Old/new model mix-ups
Commit rate changes with an effective date
Rates move for external reasons
Mismatched historical recompute
Fail closed plus surface it
Pass-through zero is the real danger
Silent under-reporting
Your Next Step
If your rates are inlined all over the code today, start by creating a single pricing.ts and centralizing on the response's model field as the key. Just routing everything through one costOf that throws on unknown models means that the next morning a model appears, CI goes red and tells you, instead of the chart drifting in silence.
I am still fixing this as I run it, but the one principle that always holds for cost math is this: if it has to break, let it break visibly rather than quietly. Thank you for reading.
Share
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.