API & SDK / 2026-06-24 / Advanced

When Stripe's Bill and Your Own Ledger Drift Apart: Field Notes on Metered Billing for Claude API

Usage-based billing for Claude API looks clean until month-end, when Stripe's total and your own usage ledger quietly disagree. Here are the field-tested patterns for idempotent meter events, reconciliation jobs, and pricing-change-proof credits.

A Claude-API-backed service with usage-based billing looks healthy for the first few weeks. The trouble surfaces at month-end: the amount Stripe finalizes and the "usage this month" figure on your own dashboard disagree — slightly, but reliably. A few yen sometimes, a few hundred for heavy users.

This drift is less a bug than a structural fact: you are writing to two separate ledgers (Stripe's meter and your own usage counter) independently. Rather than trying to drive the difference to zero, these notes are about keeping it detectable and explainable. This isn't a setup walkthrough — it's what actually mattered after running billing in production for about a year.

First: stop using `createUsageRecord`

Until recently, Stripe metered billing meant calling subscriptionItems.createUsageRecord() against a Subscription Item. It still works during the migration window, but new builds should use Billing Meters. The difference looks small and matters a lot operationally: a meter event says "for this customer, increment this metric by this amount" without you having to resolve a Subscription Item ID at all.

// lib/claude-metering.ts
import Anthropic from "@anthropic-ai/sdk";
import Stripe from "stripe";
 
const anthropic = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
 
// Legacy: stripe.subscriptionItems.createUsageRecord(itemId, {...})  <- don't use for new work
// Modern: send an event to a meter, keyed by customer ID
async function sendMeterEvent(params: {
  stripeCustomerId: string;
  credits: number;
  identifier: string; // idempotency key (below)
}) {
  await stripe.billing.meterEvents.create({
    event_name: "claude_credits",
    identifier: params.identifier,
    payload: {
      stripe_customer_id: params.stripeCustomerId,
      value: String(params.credits),
    },
  });
}

event_name must match the meter you created in the Stripe dashboard. Configure the meter to sum value over the period, and each event accumulates straight into the month's usage.
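
If you would rather create the meter through the API than the dashboard, the configuration could look roughly like this. It is a minimal sketch, assuming the field names in Stripe's Billing Meters API reference as I understand them; `buildMeterParams` is a hypothetical helper (not part of the Stripe SDK) and the display name is a placeholder.

```typescript
// Hypothetical helper: builds the parameters for stripe.billing.meters.create().
// Field names follow Stripe's Billing Meters API reference as I understand it;
// verify them against the current docs before relying on this.
type MeterParams = {
  display_name: string;
  event_name: string;
  default_aggregation: { formula: "sum" | "count" };
  customer_mapping: { type: "by_id"; event_payload_key: string };
  value_settings: { event_payload_key: string };
};

function buildMeterParams(eventName: string): MeterParams {
  return {
    display_name: "Claude credits", // placeholder label
    event_name: eventName, // must equal the event_name sent with each meter event
    default_aggregation: { formula: "sum" }, // add up `value` over the period
    customer_mapping: { type: "by_id", event_payload_key: "stripe_customer_id" },
    value_settings: { event_payload_key: "value" },
  };
}

// Usage (requires a configured Stripe client):
//   await stripe.billing.meters.create(buildMeterParams("claude_credits"));
```

The two payload keys here are the same ones `sendMeterEvent` writes, which is what ties each event to a customer and a value.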

Don't send tokens — send "credits"

This is the single biggest lever for keeping operations sane. Claude's per-token price differs between input and output, and between models. As of June 2026, prices span roughly an order of magnitude from Sonnet through Opus 4.8 to the higher-tier Fable 5. If you send raw token counts to the meter, every price change or new model changes what an already-recorded token is worth, and your billing logic breaks.

So normalize tokens into your own credit unit before sending. Absorb per-model price differences into a conversion table, and hand Stripe nothing but a credit count. When a new model lands, you add one row to the table and move on.

// One place to hold per-model rates, in yen per 1,000 tokens (input and output priced separately).
// Values are approximate as of June 2026; always confirm real prices on the official pricing page.
const MODEL_RATES: Record<string, { inPer1k: number; outPer1k: number }> = {
  "claude-sonnet-4-6": { inPer1k: 0.3, outPer1k: 1.5 },
  "claude-opus-4-8": { inPer1k: 1.5, outPer1k: 7.5 },
  "claude-fable-5": { inPer1k: 3.0, outPer1k: 15.0 },
};
 
// 1 credit = your smallest internal billing unit. Here 1 credit = 0.1 yen, and usage is rounded to whole credits.
const YEN_PER_CREDIT = 0.1;
 
function tokensToCredits(model: string, inTok: number, outTok: number): number {
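  // Unknown models fall back to the Sonnet rate, which underbills for pricier models; log or alert on this path.
  const r = MODEL_RATES[model] ?? MODEL_RATES["claude-sonnet-4-6"];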
  const yen = (inTok / 1000) * r.inPer1k + (outTok / 1000) * r.outPer1k;
  // Round up; we don't eat the remainder, and guarantee at least 1 credit per request.
  return Math.max(1, Math.ceil(yen / YEN_PER_CREDIT));
}

Why round to integers? Even where a meter accepts fractional values (check Stripe's current documentation on this), floating-point rounding differences between Stripe and your own DB make reconciliation ambiguous: you can no longer tell a real discrepancy from a rounding artifact. Anchoring to integer credits means any drift shows up as a count mismatch, which is far easier to trace.
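
The same reasoning applies inside the conversion itself: a float quotient can land a hair above a whole number, and Math.ceil then adds a credit. One way around that is to hold the rates as integer credits per 1,000 tokens and round up with integer arithmetic. This is a sketch under that assumption, not the function above: `RATES_CREDITS_PER_1K` and `tokensToCreditsInt` are hypothetical names, and the rates are the table above re-expressed at 0.1 yen per credit. Unlike the function above, it throws on an unknown model instead of falling back to a default rate, which is a design choice of this sketch.

```typescript
// Hypothetical integer variant of the rate table: credits per 1,000 tokens,
// i.e. the yen rates above divided by 0.1 yen per credit.
const RATES_CREDITS_PER_1K: Record<string, { inPer1k: number; outPer1k: number }> = {
  "claude-sonnet-4-6": { inPer1k: 3, outPer1k: 15 },
  "claude-opus-4-8": { inPer1k: 15, outPer1k: 75 },
  "claude-fable-5": { inPer1k: 30, outPer1k: 150 },
};

function tokensToCreditsInt(model: string, inTok: number, outTok: number): number {
  const r = RATES_CREDITS_PER_1K[model];
  if (!r) throw new Error(`no rate configured for model: ${model}`);
  // Thousandths of a credit; every operand is an integer, so this is exact.
  const milliCredits = inTok * r.inPer1k + outTok * r.outPer1k;
  const remainder = milliCredits % 1000;
  const whole = (milliCredits - remainder) / 1000;
  // Round up any remainder, and charge at least 1 credit per request.
  return Math.max(1, whole + (remainder > 0 ? 1 : 0));
}
```

For example, 1,500 input and 200 output tokens on the Sonnet row come to 7,500 thousandths of a credit, which rounds up to 8 credits.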

One idempotency key stops both double-counting and loss

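Billing accidents come in essentially two flavors: sending the same usage twice (overbilling), or believing you sent it when you didn't (underbilling, which comes out of your own pocket). The Billing Meters identifier field handles both with one design. Stripe enforces uniqueness on identifier, so a retry that reuses the same value is not counted a second time. Note that the API reference describes this uniqueness as enforced within a rolling window (at least 24 hours), so retries are safe inside that window rather than indefinitely.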

The key is making identifier deterministic from the request's natural key. A random UUID generated per attempt produces a new event on every retry and double-counts. I use the internal request ID.

import { createHash } from "crypto";
 
// Deterministic idempotency key from an internal request ID — same value on every retry.
function meterIdentifier(requestId: string): string {
  return "req_" + createHash("sha256").update(requestId).digest("hex").slice(0, 32);
}
 
// Separate "write to our ledger" from "send to Stripe".
export async function recordUsage(req: {
  requestId: string;
  userId: string;
  stripeCustomerId: string;
  model: string;
  inTok: number;
  outTok: number;
}) {
  const credits = tokensToCredits(req.model, req.inTok, req.outTok);
 
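  // 1) Commit to our own ledger first. This is the source of truth for usage; Stripe is the secondary aggregator.
  //    `ledger` is your own persistence layer (not shown). Keep requestId unique so a retried request cannot insert twice.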
  await ledger.insert({
    requestId: req.requestId,
    userId: req.userId,
    credits,
    model: req.model,
    reportedToStripe: false,
    createdAt: new Date(),
  });
 
  // 2) Send to Stripe. If it fails, the ledger still has it, so we can resend later.
  try {
    await sendMeterEvent({
      stripeCustomerId: req.stripeCustomerId,
      credits,
      identifier: meterIdentifier(req.requestId),
    });
    await ledger.markReported(req.requestId);
  } catch (err) {
    // Don't swallow send failures — leave the unreported flag for a retry worker to pick up.
    console.error("[meter send failed]", req.requestId, err);
  }
}

Making your own ledger the source of truth is the load-bearing decision here. If Stripe is your source of truth, data vanishes the moment a send fails. Write to the ledger first, and the Stripe send is demoted to a re-runnable derived step. Because identifier is deterministic, resending within Stripe's deduplication window does not double-count.
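
The retry worker mentioned in the catch block can be sketched as a pure planning step followed by a thin send loop. This is an illustration under stated assumptions, not the production worker: `LedgerRow`, `planReplay`, and `dedupWindowSeconds` are hypothetical names, the row carries `stripeCustomerId` directly for brevity, and `identifierFor` stands in for the `meterIdentifier` function above. The `timestamp` field is part of Stripe's meter event API; passing the ledger's creation time keeps a late replay in the period where the usage happened, subject to the limits Stripe documents on how old a timestamp may be.

```typescript
// Shape of a ledger row as assumed by this sketch.
type LedgerRow = {
  requestId: string;
  stripeCustomerId: string;
  credits: number;
  reportedToStripe: boolean;
  createdAt: Date;
};

type MeterEventParams = {
  event_name: string;
  identifier: string;
  timestamp: number; // Unix seconds
  payload: { stripe_customer_id: string; value: string };
};

// Split unreported rows into events that are safe to resend and rows that are
// too old to resend blindly. `dedupWindowSeconds` is an assumption: set it from
// the identifier-uniqueness window in Stripe's current documentation.
function planReplay(
  rows: LedgerRow[],
  identifierFor: (requestId: string) => string,
  now: Date,
  dedupWindowSeconds: number,
): { resend: MeterEventParams[]; review: LedgerRow[] } {
  const resend: MeterEventParams[] = [];
  const review: LedgerRow[] = [];
  const nowSec = Math.floor(now.getTime() / 1000);
  for (const row of rows) {
    if (row.reportedToStripe) continue; // already confirmed, nothing to do
    const createdSec = Math.floor(row.createdAt.getTime() / 1000);
    if (nowSec - createdSec > dedupWindowSeconds) {
      review.push(row); // outside the window: a resend could count twice
      continue;
    }
    resend.push({
      event_name: "claude_credits",
      identifier: identifierFor(row.requestId), // same key as the first attempt
      timestamp: createdSec, // when the usage happened, not when we replay it
      payload: { stripe_customer_id: row.stripeCustomerId, value: String(row.credits) },
    });
  }
  return { resend, review };
}
```

Each entry in `resend` can be passed to `stripe.billing.meterEvents.create()` and the row marked reported on success. Older rows go to `review` because a row can stay flagged as unreported even though its event arrived (for instance if the process died between the send and `markReported`), and outside the deduplication window a resend would count twice.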

With streaming, wait for the moment tokens finalize

This is what tripped me up most. While a response streams with stream: true, the token count isn't final. It's tempting to report mid-stream, but a dropped connection then underreports.

The fix is simple: only finalize and record usage on the stream-completion event. The Anthropic SDK resolves to a final message that includes usage; wait for that.

const stream = anthropic.messages.stream({
  model: "claude-sonnet-4-6",
  max_tokens: 1024,
  messages: [{ role: "user", content: prompt }],
});
 
for await (const event of stream) {
  // UI rendering only here — no billing yet.
}
 
// After completion, pull usage from the final message, then record.
const final = await stream.finalMessage();
await recordUsage({
  requestId,
  userId,
  stripeCustomerId,
  model: final.model,
  inTok: final.usage.input_tokens,
  outTok: final.usage.output_tokens,
});

If you want a live usage bar, optimistically bump a DB counter for display and defer the authoritative Stripe send until the stream finishes. Splitting display from billing gives you both visual immediacy and billing accuracy.
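
The display side of that split might look like the following. It is a display-only sketch: `LiveUsageEstimate` is a hypothetical class, it lives in memory rather than in a DB counter, and the characters-per-token ratio is a rough assumed heuristic that should never feed billing.

```typescript
// Display-only usage estimate for one in-flight request. Nothing here is billed.
class LiveUsageEstimate {
  private estimatedOutTok = 0;
  private settled = false;

  // Called per streamed text chunk. charsPerToken is a rough, assumed ratio.
  bump(textDelta: string, charsPerToken: number = 4): void {
    if (this.settled) return; // the final count has already replaced the estimate
    this.estimatedOutTok += textDelta.length / charsPerToken;
  }

  // Called once with the output token count from the final message's usage.
  settle(finalOutTok: number): void {
    this.estimatedOutTok = finalOutTok;
    this.settled = true;
  }

  // Whole-token figure for the usage bar.
  get displayTokens(): number {
    return Math.ceil(this.estimatedOutTok);
  }
}
```

Inside the streaming loop you would call `bump()` on each text chunk, then `settle(final.usage.output_tokens)` after `finalMessage()` resolves, while `recordUsage` remains the only path that writes the ledger and sends to Stripe.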

Reconcile within the month — finding out at month-end is too late

Here's the heart of it. Your ledger and Stripe's meter accumulate independently, so you must reconcile periodically and catch drift early. Discover it at month-end and the invoice is already finalizing, leaving few options. I compare, once a day per customer, the ledger's month-to-date total against Stripe's meter aggregation.

// Daily batch: reconcile our ledger against Stripe's aggregation, per customer.
async function reconcileCustomer(stripeCustomerId: string, periodStart: number, periodEnd: number) {
  const ledgerTotal = await ledger.sumCredits(stripeCustomerId, periodStart, periodEnd);
 
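  // Stripe meter aggregation for the period (per-meter event summaries).
  // METER_ID is the ID of the meter created earlier; start_time and end_time are
  // Unix seconds and must fall on minute boundaries.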
  const summaries = await stripe.billing.meters.listEventSummaries(METER_ID, {
    customer: stripeCustomerId,
    start_time: periodStart,
    end_time: periodEnd,
  });
  const stripeTotal = summaries.data.reduce((s, x) => s + x.aggregated_value, 0);
 
  const diff = ledgerTotal - stripeTotal;
  if (diff !== 0) {
    // The sign tells you the direction of the cause:
    //   diff > 0 : in our ledger but not at Stripe (failed sends piling up)
    //   diff < 0 : more at Stripe than in our ledger (double-send / broken identifier)
    await alertReconcileDrift({ stripeCustomerId, ledgerTotal, stripeTotal, diff });
  }
  return diff;
}

The sign of the difference is your triage. Ledger higher means dropped sends (recover by replaying the unreported rows); Stripe higher means double-counting (the identifier's determinism broke). Write that one-line sign rule into your runbook and you'll handle a 2 a.m. alert calmly.

When reconciliation finds diff > 0, resend the unreported ledger rows with their original identifier. Within Stripe's deduplication window, any already-delivered events won't duplicate. That's exactly where the "ledger as source of truth" design pays off.
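
Two details make the daily comparison less noisy in practice. Stripe's API reference requires `start_time` and `end_time` for event summaries to sit on minute boundaries, and recently sent events may not have been aggregated yet. Here is a minimal sketch of a window helper under those assumptions; `reconcileWindow` and the lag value are hypothetical, and the same window should be used for the ledger sum so both sides cover identical time.

```typescript
// Compute a window in Unix seconds for the meter summary query.
// Both bounds are floored to whole minutes, and `end` stops `lagSeconds` short
// of now so that events still being processed are not counted as missing.
function reconcileWindow(
  periodStartSec: number,
  nowSec: number,
  lagSeconds: number,
): { start: number; end: number } {
  const floorToMinute = (t: number): number => t - (t % 60);
  const start = floorToMinute(periodStartSec);
  // Never let the window run backwards early in the period.
  const end = Math.max(start, floorToMinute(nowSec - lagSeconds));
  return { start, end };
}
```

Calling `reconcileWindow(periodStart, nowSec, 3600)` before `reconcileCustomer` would leave a one-hour settling margin; when `start === end` the window is empty and the run can be skipped. The right lag depends on how quickly your meter's summaries catch up, which is worth measuring rather than assuming.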

Put the overuse safety valve on your side, not Stripe's

For users to accept usage-based billing, they need to know it isn't unbounded. Enforce the cap from your real-time ledger counter, not Stripe's aggregation. Stripe's meter rollup lags, which makes it a poor gatekeeper.

What you're deciding	Where to read	Why
Allow this request now? (the cap gate)	Real-time ledger counter	Stripe's rollup lags; using it for instant gating lets users exceed the cap
How much to bill this month (final amount)	Stripe meter aggregation	Stripe issues the invoice, so its aggregation determines the amount charged; the ledger stays the source of truth for usage and is what you audit against
Are we drifting? (audit)	The daily difference between the two	The sign splits the root cause

Separating the roles is the trick: gate = your ledger, billing = Stripe, audit = the difference. If one number serves all three purposes, aggregation lag and rounding effects pile up in the same figure and can no longer be told apart.
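
The gate row of the table can be sketched as a reserve-then-settle counter. This is an in-memory illustration only; `CreditGate` and its methods are hypothetical names, and a production version would keep the counter in the same database as the ledger (or another shared store) so that concurrent requests see one total.

```typescript
// In-memory stand-in for a real-time ledger counter, keyed by user.
class CreditGate {
  private used: Record<string, number> = {};
  private capCredits: number;

  constructor(capCredits: number) {
    this.capCredits = capCredits;
  }

  // Reserve `estimate` credits if that keeps the user within the cap.
  tryReserve(userId: string, estimate: number): boolean {
    const current = this.used[userId] ?? 0;
    if (current + estimate > this.capCredits) return false;
    this.used[userId] = current + estimate;
    return true;
  }

  // After the request finishes, swap the estimate for the actual credits.
  settle(userId: string, estimate: number, actual: number): void {
    const current = this.used[userId] ?? 0;
    this.used[userId] = Math.max(0, current - estimate + actual);
  }

  usedBy(userId: string): number {
    return this.used[userId] ?? 0;
  }
}
```

Reserving before the request and settling after means a burst of parallel requests cannot all pass the check against the same stale total, which is the failure mode a lagging aggregate invites.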

The design calls that quietly pay off

I run Stripe memberships across several sites under Dolice Labs, and the part of billing that takes the most care isn't dramatic outages. It's the class of issue that drifts a little at a time without anyone noticing. Once I moved to a ledger-as-source-of-truth setup with daily reconciliation, month-end finalization stopped being a scramble. Even when drift appears, knowing I can read the sign and replay unreported events to converge removes a surprising amount of operational worry.

Usage-based billing isn't done when it's "implemented correctly." It's only operable once you've designed for drift as a given and kept that drift explainable. A good next step is recording the reconciliation diff as a time-series metric, so you can spot a single user who drifts persistently — that's usually where a hole in your identifier design is hiding.
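
As a sketch of that next step, the daily diffs can be stored as rows and scanned for customers whose drift never returns to zero. The `DailyDiff` shape and `persistentDrifters` are hypothetical; in practice the samples would come from wherever the results of `reconcileCustomer` are recorded.

```typescript
type DailyDiff = { customerId: string; day: string; diff: number }; // day as YYYY-MM-DD

// Return customers whose most recent `minDays` recorded days all show nonzero drift.
function persistentDrifters(samples: DailyDiff[], minDays: number): string[] {
  const byCustomer: Record<string, DailyDiff[]> = {};
  for (const s of samples) {
    if (!byCustomer[s.customerId]) byCustomer[s.customerId] = [];
    byCustomer[s.customerId].push(s);
  }
  const flagged: string[] = [];
  for (const customerId of Object.keys(byCustomer)) {
    // ISO dates sort chronologically as plain strings.
    const recent = byCustomer[customerId]
      .slice()
      .sort((a, b) => a.day.localeCompare(b.day))
      .slice(-minDays);
    if (recent.length === minDays && recent.every((s) => s.diff !== 0)) {
      flagged.push(customerId);
    }
  }
  return flagged.sort();
}
```

A one-day blip that clears after a replay is normal operation; a customer who appears in this list is the one whose request IDs or identifier derivation deserve a closer look.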

I hope this gives you solid footing if you're wrestling with the same reconciliation problem.
