API & SDK / 2026-06-22 / Advanced

When Your Claude API Cost Math Doesn't Match the Bill: Accounting for the Four Token Buckets

Turn on prompt caching and your homegrown cost tally drifts from the console bill. Here is how to weight the four token buckets the usage object returns and build a ledger you can reconcile.


As an indie developer, I run a daily digest across several of my own apps on Claude, and one month my own cost tally was off from the console bill by roughly ten percent.

I traced the logs. Request counts were right. Token counts were right. It still didn't add up.

There was exactly one cause. I was computing cost from input_tokens + output_tokens only. The moment I enabled prompt caching, that simple formula broke silently.

Cached tokens are not in input_tokens

This was my first wrong assumption.

The usage object splits input tokens by how they were handled. Tokens written to or read from the cache are not included in input_tokens; they land in separate fields.

# What usage actually looks like with caching on
usage = {
    "input_tokens": 412,                  # only uncached, regular input
    "cache_creation_input_tokens": 18500, # writes to cache (expensive)
    "cache_read_input_tokens": 17800,     # reads from cache (cheap)
    "output_tokens": 1240,
}

So if you cache a long system prompt, its body never shows up in input_tokens at all. Pricing off input_tokens alone misses the tens of thousands of tokens sitting in the cache.

In my case the shared digest prompt is about 18,000 tokens. It gets written as cache_creation on every cold start, then read back as cache_read on later calls. The naive formula ignored both.
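To see how much the naive tally misses, the sketch below sums each bucket across one cold start and a few warm calls. The usage dicts are made-up illustrative values shaped like the example above (an 18,000-token shared prompt), and tally_buckets is a hypothetical helper, not part of any SDK.

```python
from collections import Counter

# The four usage fields discussed above.
BUCKETS = (
    "input_tokens",
    "cache_creation_input_tokens",
    "cache_read_input_tokens",
    "output_tokens",
)


def tally_buckets(usages):
    """Sum each token bucket across a list of usage dicts.

    Missing or None fields count as zero.
    """
    totals = Counter({name: 0 for name in BUCKETS})
    for usage in usages:
        for name in BUCKETS:
            totals[name] += usage.get(name) or 0
    return dict(totals)


# Illustrative values only: one cold start that writes the shared prompt,
# then three warm calls that read it back.
cold = {"input_tokens": 400, "cache_creation_input_tokens": 18000,
        "cache_read_input_tokens": 0, "output_tokens": 1200}
warm = {"input_tokens": 400, "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 18000, "output_tokens": 1200}

totals = tally_buckets([cold, warm, warm, warm])

naive_input = totals["input_tokens"]
all_input = (totals["input_tokens"]
             + totals["cache_creation_input_tokens"]
             + totals["cache_read_input_tokens"])

print(naive_input)              # 1600
print(all_input)                # 73600
print(all_input - naive_input)  # 72000 input tokens the naive formula never sees
```

Across just four calls, the naive input count covers 1,600 of 73,600 input tokens. Everything else sits in the two cache fields.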

Each bucket bills at a different rate

The key to matching the books is understanding that the four buckets do not share one rate.

Cache reads and writes bill as a multiple of the base input rate. The multipliers are stable; even when prices change, the ratios rarely do.

| Bucket | usage field | Multiplier on base input rate |
| --- | --- | --- |
| Regular input | input_tokens | 1.0× |
| Cache write (5-min TTL) | cache_creation_input_tokens | 1.25× |
| Cache write (1-hour TTL) | cache_creation_input_tokens | 2.0× |
| Cache read | cache_read_input_tokens | 0.1× |
| Output | output_tokens | output rate (separate) |

Reads are a tenth of base input. Writes are 1.25× to 2×. Treat them uniformly and the calls where caching is working drift the most. Over-price the reads and you over-count; price the writes at base and you under-count.
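A minimal sketch of that weighting, applied to the example usage from earlier. The names call_cost, naive_cost, and write_ttl are my own, and the per-million-token rates are placeholders rather than real prices; substitute the current rates for your model. The sketch also assumes every cache write in a call uses a single TTL, so a call that mixes 5-minute and 1-hour writes would need a per-TTL breakdown instead.

```python
from decimal import Decimal

# Multipliers on the base input rate, as listed in the table above.
CACHE_WRITE_MULTIPLIER = {"5m": Decimal("1.25"), "1h": Decimal("2.0")}
CACHE_READ_MULTIPLIER = Decimal("0.1")
PER_MTOK = Decimal(1_000_000)


def call_cost(usage, input_rate, output_rate, write_ttl="5m"):
    """Cost of one call; rates are per million tokens.

    Assumes all cache writes in this call share a single TTL.
    """
    weighted_input = (
        Decimal(usage.get("input_tokens") or 0)
        + Decimal(usage.get("cache_creation_input_tokens") or 0)
        * CACHE_WRITE_MULTIPLIER[write_ttl]
        + Decimal(usage.get("cache_read_input_tokens") or 0)
        * CACHE_READ_MULTIPLIER
    )
    output = Decimal(usage.get("output_tokens") or 0)
    return (weighted_input * input_rate + output * output_rate) / PER_MTOK


def naive_cost(usage, input_rate, output_rate):
    """The formula that drifts: input_tokens + output_tokens only."""
    return (
        Decimal(usage["input_tokens"]) * input_rate
        + Decimal(usage["output_tokens"]) * output_rate
    ) / PER_MTOK


# Placeholder rates for illustration only, NOT real prices.
INPUT_RATE = Decimal("3")    # per million input tokens
OUTPUT_RATE = Decimal("15")  # per million output tokens

usage = {
    "input_tokens": 412,
    "cache_creation_input_tokens": 18500,
    "cache_read_input_tokens": 17800,
    "output_tokens": 1240,
}

print("weighted (5m):", call_cost(usage, INPUT_RATE, OUTPUT_RATE))
print("weighted (1h):", call_cost(usage, INPUT_RATE, OUTPUT_RATE, write_ttl="1h"))
print("naive:        ", naive_cost(usage, INPUT_RATE, OUTPUT_RATE))
```

With these placeholder rates the weighted cost of the example call is about 0.0946 with a 5-minute write and about 0.1362 with a 1-hour write, against about 0.0198 from the naive formula. Decimal is used instead of float so that per-call rows sum without rounding noise when you total a month.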
