CLAUDE LAB
API & SDK · 2026-06-17 · Advanced

Making the Numbers Add Up in a Multi-Tenant Claude API SaaS — Field Notes on Isolation and Cost Attribution

The first thing that breaks when you make a Claude API SaaS multi-tenant is the month-end reconciliation. Here are field notes on a single metering chokepoint, atomic counters, reconciling against Anthropic's bill, and proving tenant isolation with adversarial tests — with production TypeScript.



Run several products at once and you see the same moment every time: a Claude API SaaS graduates from "it works" to "I can bill customers for it." The first thing that breaks at that moment isn't a feature or latency. It's the month-end math.

Your Anthropic invoice says $312, but the per-tenant amounts you've accumulated sum to $270, and you can no longer tell whose usage the missing $42 belongs to. This "the totals don't match" state quietly undermines your entire pricing model: if you don't know your cost of goods, you can't set a price that turns a profit.
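The month-end check itself is simple arithmetic. A minimal sketch, assuming per-tenant totals are already stored in integer cents; the reconcile function, its field names, and the 3% tolerance are illustrative choices, not a standard:

```typescript
// Hypothetical month-end check: compare what you metered per tenant against
// the provider invoice. Amounts are integer cents, so summing adds no
// floating-point drift of its own.
interface TenantTotal {
  tenantId: string;
  cents: number;
}

interface ReconciliationResult {
  attributedCents: number;   // sum of everything pinned on a tenant
  unattributedCents: number; // invoice minus attributed; negative means double counting
  unattributedRatio: number; // the gap as a share of the invoice
  withinTolerance: boolean;
}

function reconcile(
  invoiceCents: number,
  totals: TenantTotal[],
  tolerance = 0.03, // illustrative threshold: flag gaps larger than 3% of the invoice
): ReconciliationResult {
  const attributedCents = totals.reduce((sum, t) => sum + t.cents, 0);
  const unattributedCents = invoiceCents - attributedCents;
  // No invoice but metered usage is always out of tolerance.
  const unattributedRatio =
    invoiceCents === 0
      ? attributedCents === 0 ? 0 : Number.POSITIVE_INFINITY
      : unattributedCents / invoiceCents;
  return {
    attributedCents,
    unattributedCents,
    unattributedRatio,
    withinTolerance: Math.abs(unattributedRatio) <= tolerance,
  };
}

// The scenario above: a $312 invoice against $270 of metered usage.
const result = reconcile(31_200, [
  { tenantId: 'tenant-a', cents: 18_000 },
  { tenantId: 'tenant-b', cents: 9_000 },
]);
console.log(result.unattributedCents, result.unattributedRatio.toFixed(3), result.withinTolerance);
// → 4200 0.135 false
```

A negative gap matters as much as a positive one: it means some usage was attributed twice.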

These are field notes on implementing tenant isolation and cost attribution at the level of "reconcilable," not "roughly right" — with production TypeScript and the operational calls you'll be forced to make along the way. I assume Next.js + PostgreSQL + Redis (or Cloudflare KV), but the reasoning is stack-agnostic.

Why "roughly right" eventually collapses

For an ordinary web API, cost behaves like a fixed server bill, and a request log is enough to see who hit what. The Claude API is different: cost is driven by tokens, not request counts, and the cost of a single request swings from cents to dollars depending on prompt length, model, and how well caching lands.

Approximate that variability as "request count × average price" and you will systematically underbill your heavy users. The tenant sending long prompts has the most expensive requests, and averaging erases exactly that. You end up undercounting the customer you should be charging most.
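A small sketch makes the distortion concrete. The two tenants, their token counts, and the per-token rates below are hypothetical placeholders, not real pricing or real traffic; the point is only the comparison between summing each request's own token cost and charging a blended average.

```typescript
// Illustrative rates in micro-dollars per token
// (3 means $3 per million input tokens, 15 means $15 per million output tokens).
const INPUT_RATE = 3;
const OUTPUT_RATE = 15;

interface RequestUsage {
  inputTokens: number;
  outputTokens: number;
}

function costMicroUsd(u: RequestUsage): number {
  return u.inputTokens * INPUT_RATE + u.outputTokens * OUTPUT_RATE;
}

// Same request count, very different prompt sizes.
const requestsByTenant: Record<string, RequestUsage[]> = {
  light: Array.from({ length: 100 }, () => ({ inputTokens: 2_000, outputTokens: 600 })),
  heavy: Array.from({ length: 100 }, () => ({ inputTokens: 50_000, outputTokens: 1_000 })),
};

// Actual cost: sum the real token counts of each tenant's own requests.
const actualCost: Record<string, number> = {};
let totalCost = 0;
let totalRequests = 0;
for (const [tenantId, requests] of Object.entries(requestsByTenant)) {
  actualCost[tenantId] = requests.reduce((sum, u) => sum + costMicroUsd(u), 0);
  totalCost += actualCost[tenantId];
  totalRequests += requests.length;
}

// "Request count × average price": every request is charged the blended mean.
const averagePerRequest = totalCost / totalRequests;
const dollars = (microUsd: number) => (microUsd / 1_000_000).toFixed(2);
for (const [tenantId, requests] of Object.entries(requestsByTenant)) {
  const averaged = requests.length * averagePerRequest;
  console.log(`${tenantId}: actual $${dollars(actualCost[tenantId])}, averaged $${dollars(averaged)}`);
}
// → light: actual $1.50, averaged $9.00
// → heavy: actual $16.50, averaged $9.00
```

Both tenants send 100 requests, so averaging charges them the same: the light tenant is overbilled by $7.50 and the heavy one underbilled by exactly as much.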

So the starting point is simple: take the usage object included in every Claude API response and attribute it, on the spot, to the tenant that caused the request. Never fill gaps with estimates; record only the real numbers the response gives you. Whether you can hold that line determines whether month-end reconciliation is even possible.
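Holding that line mostly means copying the response's usage fields verbatim and refusing to guess when one is missing. A minimal sketch, assuming a hypothetical in-memory UsageLedger that stands in for a database table keyed by message ID; it is an illustration, not the recordUsage the client code imports.

```typescript
// Subset of the `usage` object on a Messages API response. The cache fields
// may be null or absent when prompt caching isn't in play.
interface ApiUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

interface UsageEntry {
  tenantId: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

type TokenTotals = Omit<UsageEntry, 'tenantId' | 'model'>;

// Hypothetical ledger; a real one would be a table with a unique key on the message ID.
class UsageLedger {
  private readonly entries = new Map<string, UsageEntry>();

  // Stores exactly what the response reported. Returns false when this message
  // was already recorded (e.g. a retried write), so a retry never double-counts.
  record(tenantId: string, messageId: string, model: string, usage: ApiUsage): boolean {
    if (!Number.isInteger(usage.input_tokens) || !Number.isInteger(usage.output_tokens)) {
      // Refuse to estimate: a missing count is an error to surface, not a gap to fill.
      throw new Error(`usage for ${messageId} is missing token counts`);
    }
    if (this.entries.has(messageId)) return false;
    this.entries.set(messageId, {
      tenantId,
      model,
      inputTokens: usage.input_tokens,
      outputTokens: usage.output_tokens,
      cacheReadTokens: usage.cache_read_input_tokens ?? 0,
      cacheWriteTokens: usage.cache_creation_input_tokens ?? 0,
    });
    return true;
  }

  totalsFor(tenantId: string): TokenTotals {
    const totals: TokenTotals = { inputTokens: 0, outputTokens: 0, cacheReadTokens: 0, cacheWriteTokens: 0 };
    this.entries.forEach((e) => {
      if (e.tenantId !== tenantId) return;
      totals.inputTokens += e.inputTokens;
      totals.outputTokens += e.outputTokens;
      totals.cacheReadTokens += e.cacheReadTokens;
      totals.cacheWriteTokens += e.cacheWriteTokens;
    });
    return totals;
  }
}

const ledger = new UsageLedger();
ledger.record('tenant-a', 'msg_001', 'example-model', {
  input_tokens: 1_200,
  output_tokens: 300,
  cache_read_input_tokens: 8_000,
  cache_creation_input_tokens: null,
});
// A retried write for the same message is ignored.
const again = ledger.record('tenant-a', 'msg_001', 'example-model', { input_tokens: 1_200, output_tokens: 300 });
console.log(again, ledger.totalsFor('tenant-a'));
```

Keying on the message ID turns a retried write into a no-op rather than a double charge.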

Make metering inseparable from the API call

The single biggest reason the books stop balancing is forgotten metering. If every new feature calls anthropic.messages.create directly, each one needs metering bolted on, and eventually one won't get it. The unmetered path's cost vanishes silently and reappears as a discrepancy at month-end.
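Once every call is supposed to go through one module (lib/claude-client.ts in the code below), the forgotten-metering failure can also be caught mechanically. A minimal sketch of a CI guard, assuming a hypothetical findDirectSdkImports helper that receives file paths and contents and flags any file other than the client module that imports @anthropic-ai/sdk:

```typescript
// Hypothetical CI guard: only the designated client module may import the
// Anthropic SDK. Anything else is a potential unmetered path.
const ALLOWED_SDK_IMPORTERS = new Set(['lib/claude-client.ts']);

// Matches static imports, require() and dynamic import() of the SDK or its subpaths.
// Type-only imports are flagged too; loosen this if that is too strict for you.
const SDK_IMPORT = /(?:from\s+|require\(\s*|import\(\s*)['"]@anthropic-ai\/sdk(?:\/[^'"]*)?['"]/;

function findDirectSdkImports(files: Record<string, string>): string[] {
  return Object.keys(files)
    .filter((path) => !ALLOWED_SDK_IMPORTERS.has(path) && SDK_IMPORT.test(files[path]))
    .sort();
}

const offenders = findDirectSdkImports({
  'lib/claude-client.ts': "import Anthropic from '@anthropic-ai/sdk';",
  'app/api/summarize/route.ts': "import Anthropic from '@anthropic-ai/sdk';",
  'app/api/chat/route.ts': "import { callClaudeForTenant } from '@/lib/claude-client';",
});
console.log(offenders); // → [ 'app/api/summarize/route.ts' ]
```

In CI you would read the real source files (for example with node:fs) and fail the job when the list is non-empty. ESLint's built-in no-restricted-imports rule, with an override for the client file, is an alternative way to enforce the same boundary.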

The fix is to funnel every path to Claude through one function that takes a tenant context, and to make that function the only way anything in the app can reach Claude.

```typescript
// lib/claude-client.ts
import Anthropic from '@anthropic-ai/sdk';
import type { TenantContext } from '@/types/tenant';
import { checkRateLimit } from '@/lib/rate-limiter';
import { recordUsage } from '@/lib/usage-tracker';
import { reportCriticalError } from '@/lib/alerts'; // assumed location of the alerting helper

const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });

export interface ClaudeRequestParams {
  model: string;
  messages: Anthropic.MessageParam[];
  system?: string;
  maxTokens?: number;
}

export async function callClaudeForTenant(
  tenant: TenantContext,
  params: ClaudeRequestParams,
): Promise<Anthropic.Message> {
  // 1. Rate limit first. Reject here and you spend zero tokens.
  if (!(await checkRateLimit(tenant.tenantId, tenant.requestsPerMinuteLimit))) {
    throw new Error(`RATE_LIMIT_EXCEEDED:${tenant.tenantId}`);
  }

  const message = await anthropic.messages.create({
    model: params.model,
    max_tokens: params.maxTokens ?? 4096,
    system: params.system,
    messages: params.messages,
  });

  // 2. Don't await metering: a failure must not block the response, but it must not be swallowed either.
  // On serverless runtimes, un-awaited work may be cut off once the response is sent;
  // use your platform's post-response hook if it has one.
  recordUsage(tenant.tenantId, {
    inputTokens: message.usage.input_tokens,
    outputTokens: message.usage.output_tokens,
    cacheReadTokens: message.usage.cache_read_input_tokens ?? 0,
    cacheWriteTokens: message.usage.cache_creation_input_tokens ?? 0,
    model: params.model,
    requestId: message.id, // unique per response, so it doubles as an idempotency key
  }).catch((err) => {
    // End this with console.error alone and nobody will notice the metering gap.
    reportCriticalError('usage_tracking_failed', { tenantId: tenant.tenantId, err });
  });

  return message;
}
```

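Two things matter here. First, pull cache_read_input_tokens and cache_creation_input_tokens out of usage. Cache reads are billed at a tenth of the base input price, while cache writes carry a premium (1.25× for the default five-minute cache). The input_tokens field counts only the uncached part of the prompt, so ignoring the cache fields drops those tokens from the bill entirely, and folding them into input charges them at the wrong rate. Either way, you'll systematically misprice tenants who lean on caching.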
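To see how much the distinction matters, here is a sketch that prices each usage field separately. The rate card is a hypothetical placeholder (a $3/$15 per million-token base with 0.1× reads and 1.25× writes), expressed in nano-dollars per token so every product stays an integer; real rates differ per model and should come from Anthropic's pricing page.

```typescript
// Hypothetical rate card in nano-dollars (1e-9 USD) per token.
interface RateCard {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

interface MeteredUsage {
  inputTokens: number; // uncached input only; the cache fields are counted separately
  outputTokens: number;
  cacheReadTokens: number;
  cacheWriteTokens: number;
}

const EXAMPLE_RATES: RateCard = { input: 3_000, output: 15_000, cacheRead: 300, cacheWrite: 3_750 };

function costNanoUsd(u: MeteredUsage, r: RateCard): number {
  return (
    u.inputTokens * r.input +
    u.outputTokens * r.output +
    u.cacheReadTokens * r.cacheRead +
    u.cacheWriteTokens * r.cacheWrite
  );
}

// A tenant whose long system prompt is almost always served from cache.
const usage: MeteredUsage = { inputTokens: 2_000, outputTokens: 500, cacheReadTokens: 50_000, cacheWriteTokens: 0 };

const correct = costNanoUsd(usage, EXAMPLE_RATES);
// Mistake 1: fold cache reads into plain input and bill them at the full rate.
const lumped = costNanoUsd(
  { ...usage, inputTokens: usage.inputTokens + usage.cacheReadTokens, cacheReadTokens: 0 },
  EXAMPLE_RATES,
);
// Mistake 2: ignore the cache fields entirely.
const dropped = costNanoUsd({ ...usage, cacheReadTokens: 0, cacheWriteTokens: 0 }, EXAMPLE_RATES);
console.log(correct, lumped, dropped); // → 28500000 163500000 13500000
```

For this cache-heavy tenant, lumping overstates the cost by more than 5×, and dropping the cache fields understates it by more than half.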

Second, never let a metering failure end at console.error. Running four sites on autopilot as an indie developer, I've lived through a metering gap that was "in the logs" but that nobody was watching. Metering failures are quieter than feature bugs, and they map directly to money. They belong on a channel a human can't miss: Sentry, Slack, whatever you actually read.
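One way to make that concrete is to route the reportCriticalError helper the client code calls through a single injected delivery function. This is a hypothetical sketch: createCriticalReporter and the AlertSink type are illustrative names, and the real sink would be a Sentry capture, a Slack webhook POST, or whatever channel you actually watch.

```typescript
// Hypothetical alerting helper. The sink delivers to a channel a human reads;
// it is injected so it can be swapped per environment or faked in tests.
type AlertSink = (text: string) => Promise<void>;

interface CriticalContext {
  tenantId: string;
  err: unknown;
}

function createCriticalReporter(sink: AlertSink) {
  return async function reportCriticalError(event: string, context: CriticalContext): Promise<void> {
    const reason = context.err instanceof Error ? context.err.message : String(context.err);
    const text = `[CRITICAL] ${event} tenant=${context.tenantId} reason=${reason}`;
    try {
      await sink(text);
    } catch (sinkErr) {
      // Last resort only: if the alert channel itself is down, at least leave a local trace.
      console.error('alert delivery failed', sinkErr, text);
    }
  };
}

// An in-memory sink standing in for a real channel.
const delivered: string[] = [];
const reportCriticalError = createCriticalReporter(async (text) => {
  delivered.push(text);
});
```

Because the reporter never rejects, calling it from a .catch handler cannot itself produce an unhandled rejection.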
