CLAUDE LAB
API & SDK · 2026-06-20 · Advanced

Putting Cloudflare AI Gateway in Front of Claude Made the Numbers I Needed Disappear — Field Notes on Instrumentation

After putting Cloudflare AI Gateway in front of the Claude API, here is where I actually got stung: cost attribution, semantic-cache false hits, fallback quietly lowering quality, and budgets that don't really stop anything. Each comes with the code I used to fix it.

Tags: Claude API, Cloudflare AI Gateway, Production, Cost Optimization, Semantic Cache, Fallback, Cloudflare Workers, Observability


The week after I added the gateway, my cost breakdown stopped making sense

The reasons for putting Cloudflare AI Gateway in front of the Claude API usually come down to four: make requests observable, throttle yourself before you hit the provider's rate limit, cut duplicate calls with caching, and route around a model outage. All are legitimate, and the gateway genuinely handles them in a single managed layer.

Yet the week after I placed it in front of the content pipeline for the four Dolice Labs sites I run as an indie developer, I ran into a paradox: the cost breakdown was harder to read than before. The total-cost and latency graphs came out clean, but "which feature, which batch, used how much" had fallen out of the dashboard. The gateway only brokers traffic, so unless you hand it your own context, every request is recorded as one indistinguishable blob.

This is not a setup guide. It's a record of the four places where, after installing the gateway, I realized it sees and does less than I assumed (cost attribution, cache false hits, fallback's quiet quality drop, and budget enforcement), along with the code I wrote and the judgment calls I made.

First, bind instrumentation context to every request

Whether you can filter the gateway logs later is decided at the moment you send the request: did you attach metadata? Whatever you pass in the cf-aig-metadata header lands in the logs, so put every axis you'll want to slice by in there. For me that was three: which site, which generation type, which batch run.

// src/lib/claude-gateway.ts
import Anthropic from "@anthropic-ai/sdk";
 
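// Expected shape (the gateway's Anthropic endpoint):
// https://gateway.ai.cloudflare.com/v1/<account_id>/<gateway_id>/anthropic
const GATEWAY_BASE_URL = process.env.CLOUDFLARE_GATEWAY_URL;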
const ANTHROPIC_API_KEY = process.env.ANTHROPIC_API_KEY;
 
if (!GATEWAY_BASE_URL || !ANTHROPIC_API_KEY) {
  throw new Error("Both CLOUDFLARE_GATEWAY_URL and ANTHROPIC_API_KEY are required");
}
 
// Just swap baseURL to the gateway; existing SDK calls go through unchanged
const client = new Anthropic({
  apiKey: ANTHROPIC_API_KEY,
  baseURL: GATEWAY_BASE_URL,
});
 
export interface CallContext {
  site: string;       // e.g. "claudelab"
  workload: string;   // e.g. "recovery" / "brushup" / "daily"
  runId: string;      // per-batch identifier
}
 
export async function gatewayChat(
  params: Anthropic.MessageCreateParamsNonStreaming,
  ctx: CallContext
) {
  return client.messages.create(params, {
    headers: {
      // Whatever you set here lands in the AI Gateway logs, sliceable later
      "cf-aig-metadata": JSON.stringify({
        site: ctx.site,
        workload: ctx.workload,
        runId: ctx.runId,
        ts: new Date().toISOString(),
      }),
    },
  });
}

The reason a baseURL swap is enough: the Anthropic TypeScript SDK takes the HTTP destination from baseURL and reuses its auth headers and retry logic as-is. The gateway proxies Anthropic-compatible paths, so your app logic doesn't change by a line. The flip side is that any request where you forget the metadata is recorded as part of the anonymous blob. My logs right after install were exactly that.
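One way to make a forgotten tag fail loudly instead of landing in the anonymous blob is to build the header in a single place and validate it there. The following is a minimal sketch, not part of the SDK or the gateway: `buildMetadataHeader` and `MAX_METADATA_ENTRIES` are hypothetical names, and the five-entry cap is an assumption that should be checked against the current AI Gateway documentation.

```typescript
// Hypothetical helper: builds the cf-aig-metadata header value and
// throws when a context field is empty or there are too many entries.
type MetadataValue = string | number | boolean;

// Mirrors the CallContext interface from claude-gateway.ts
interface CallContext {
  site: string;
  workload: string;
  runId: string;
}

// Assumed limit on metadata entries per request; verify before relying on it.
const MAX_METADATA_ENTRIES = 5;

function buildMetadataHeader(
  ctx: CallContext,
  extra: Record<string, MetadataValue> = {}
): Record<string, string> {
  const entries: Record<string, MetadataValue> = {
    site: ctx.site,
    workload: ctx.workload,
    runId: ctx.runId,
    ...extra,
  };

  // An empty string would still be logged, but it cannot be sliced on later
  for (const [key, value] of Object.entries(entries)) {
    if (typeof value === "string" && value.trim() === "") {
      throw new Error(`metadata field "${key}" is empty`);
    }
  }
  if (Object.keys(entries).length > MAX_METADATA_ENTRIES) {
    throw new Error(`metadata has more than ${MAX_METADATA_ENTRIES} entries`);
  }
  return { "cf-aig-metadata": JSON.stringify(entries) };
}
```

In `gatewayChat`, the `headers` option could then be `buildMetadataHeader(ctx, { ts: new Date().toISOString() })`, so every call path goes through the same check.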

To aggregate attribution, hit the Logs API and group by metadata. Only here does "which workload is driving spend" become a number.

// src/lib/gateway-attribution.ts
interface CostByWorkload {
  [workload: string]: { requests: number; tokens: number; usd: number };
}
 
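// Rough input prices (USD / 1M tokens, approx. as of 2026-06).
// Output tokens are not priced here, so the usd figures are an input-only
// lower bound: good for comparing workloads, not for reconciling a bill.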
const INPUT_PRICE: Record<string, number> = {
  "claude-haiku-4-5": 1,
  "claude-sonnet-4-6": 3,
  "claude-opus-4-8": 15,
};
 
export async function attributeCost(
  accountId: string,
  gatewayId: string,
  apiToken: string,
  since: Date
): Promise<CostByWorkload> {
  const url =
    `https://api.cloudflare.com/client/v4/accounts/${accountId}` +
    `/ai-gateway/gateways/${gatewayId}/logs?since=${since.toISOString()}`;
 
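  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${apiToken}` },
  });
  // Fail loudly rather than aggregating an error payload as "zero logs"
  if (!res.ok) {
    throw new Error(`AI Gateway Logs API returned ${res.status}`);
  }
  const { result = [] } = (await res.json()) as {
    result?: Array<{ metadata?: unknown; model: string; tokens_in?: number }>;
  };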
 
  const acc: CostByWorkload = {};
  for (const log of result) {
    const meta = safeParse(log.metadata);
    const key = meta?.workload ?? "unattributed";
    const price = INPUT_PRICE[log.model] ?? 3;
    const usd = (log.tokens_in ?? 0) * (price / 1_000_000);
    acc[key] ??= { requests: 0, tokens: 0, usd: 0 };
    acc[key].requests += 1;
    acc[key].tokens += log.tokens_in ?? 0;
    acc[key].usd += usd;
  }
  return acc;
}
 
function safeParse(s: unknown) {
  try { return typeof s === "string" ? JSON.parse(s) : s; } catch { return null; }
}
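The function above aggregates whatever a single response contains. If the Logs API returns results in pages, which is an assumption here and should be checked against the API reference along with the exact query parameter names, the totals would cover only the first page. A minimal sketch of collecting every page before aggregating, with the page request itself injected by the caller (`collectAllPages` and `FetchPage` are hypothetical names):

```typescript
// Hypothetical page fetcher: given a 1-based page number, resolves to that
// page's entries. How the page is requested is left to the caller.
type FetchPage<T> = (page: number) => Promise<T[]>;

async function collectAllPages<T>(
  fetchPage: FetchPage<T>,
  maxPages = 50 // safety cap so a misbehaving endpoint cannot loop forever
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const batch = await fetchPage(page);
    if (batch.length === 0) break; // an empty page is treated as the end
    all.push(...batch);
  }
  return all;
}
```

With this in place, the loop in `attributeCost` would iterate over the combined array instead of a single `result`. The cap trades completeness for safety, so a run that reaches it should be logged rather than trusted.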

If the "unattributed" bucket is swelling, that's a sign calls are still going out without metadata. I treat driving that number toward zero as my measure of whether instrumentation is finished.
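That measure can be turned into a single number. A minimal sketch, assuming the `CostByWorkload` shape returned by `attributeCost` and the "unattributed" key it uses as a fallback; `unattributedShare` is a hypothetical helper name:

```typescript
interface WorkloadTotals {
  requests: number;
  tokens: number;
  usd: number;
}
type CostByWorkload = Record<string, WorkloadTotals>;

// Fraction of requests (0 to 1) that arrived without a workload tag.
function unattributedShare(acc: CostByWorkload): number {
  const total = Object.values(acc).reduce((sum, w) => sum + w.requests, 0);
  if (total === 0) return 0; // no traffic in the window, nothing to flag
  return (acc["unattributed"]?.requests ?? 0) / total;
}
```

Counting by requests rather than by spend is a deliberate choice in this sketch: a single untagged call path shows up even when it is cheap.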
