API & SDK · 2026-07-04 · Advanced

Reading the Claude apps gateway Announcement, I Rebuilt My Indie-Scale Control Plane

The self-hosted Claude apps gateway is a control-plane/data-plane separation you can scale down. This article implements per-app cost attribution, model allowlists, and fail-closed spend caps as a small Cloudflare Workers proxy.


At the start of the month, I opened my Claude API bill and stopped. I could see the total. I could not see the breakdown.

In my setup, three things shared a single API key: an automation that drafts multilingual replies to App Store reviews, the content pipeline behind my blogs, and an experimental script I had written on a whim. Which one spent what? The billing page could not tell me.

Around the same time, Anthropic announced a self-hosted Claude apps gateway for Amazon Bedrock and Google Cloud. The announcement lists SSO, centralized policy enforcement, role-based access, per-user cost attribution, and spend limits. The vocabulary is enterprise, but the underlying problem was exactly the one I had that morning.

Rather than filing the announcement under "enterprise features that don't concern me," I tried reading it as a design document. Then I scaled the idea down to indie size and built a small proxy. This is the record of that implementation and migration.

What the gateway actually centralizes

Unpacking the official description, the gateway provides four things:

  1. Centralized authorization — deciding who (which app) may call a model, at the entrance of the call path
  2. Centralized policy — enforcing which models and features are allowed on the path itself, not in each app's code
  3. Cost attribution — recording usage per user (or per app), so the bill becomes decomposable
  4. Spend limits — capping attributed cost, and refusing requests before they go through once the cap is hit

The interesting part is that every one of these depends on a single move: funneling all model calls through one path. As long as each app calls api.anthropic.com directly, none of this is possible. Once the path is unified, authorization, measurement, and policy can all ride on that one point.
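
To make the metering half of that concrete, here is a minimal TypeScript sketch of per-app cost attribution. It assumes the proxy can read the `usage` object (`input_tokens`, `output_tokens`) returned in a Messages API response. The `Price` and `Ledger` shapes, the app names, and the prices are hypothetical placeholders, not a real rate card and not the gateway's actual data model.

```typescript
// Token counts as they appear in a Messages API response's `usage` field.
interface Usage {
  input_tokens: number;
  output_tokens: number;
}

// Hypothetical price entry: USD per million tokens. Placeholder values only.
interface Price {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Spend per app in micro-USD (millionths of a dollar). Integers avoid float drift.
type Ledger = Map<string, number>;

// tokens * (USD per million tokens) is exactly the cost in micro-USD.
function costMicroUsd(usage: Usage, price: Price): number {
  return Math.round(
    usage.input_tokens * price.inputPerMTok +
      usage.output_tokens * price.outputPerMTok,
  );
}

// Record one call against the app that made it and return that app's new total.
function attribute(ledger: Ledger, appId: string, usage: Usage, price: Price): number {
  const total = (ledger.get(appId) ?? 0) + costMicroUsd(usage, price);
  ledger.set(appId, total);
  return total;
}

// Usage: calls that used to share one key now land on separate ledger lines.
const ledger: Ledger = new Map();
const placeholderPrice: Price = { inputPerMTok: 3, outputPerMTok: 15 };
attribute(ledger, "review-replies", { input_tokens: 1000, output_tokens: 500 }, placeholderPrice);
attribute(ledger, "blog-pipeline", { input_tokens: 2000, output_tokens: 0 }, placeholderPrice);
```

The ledger can only be complete if every call passes through the code that writes to it, which is why the single path comes first.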

Borrowing networking vocabulary, this is a separation of control plane and data plane. The app focuses on the data plane — inference in, output out — while keys, policy, and metering move to the path. The gateway, as a product, is that separation in purchasable form. That is how I read it.

What to keep and what to drop at indie scale

There is no need to import an enterprise control plane wholesale. At my scale — one person, three products — the triage looked like this.

Safe to drop: SSO and role-based access. When the only operator is me, putting human authentication on the path adds little.

Worth keeping: three things.

  • Per-app tokens: apps never see the real key; each calls the proxy with its own internal token. The token becomes the unit of attribution, and if one leaks, you revoke it for that app alone.
  • Model allowlists: pin, on the path, which models each app may use. This catches typos in model strings and expensive models left behind in experimental code.
  • Spend caps: a monthly ceiling per app, enforced by refusing requests once it is reached. The cap should be fail-closed: if the proxy cannot determine current spend, it refuses the request rather than letting it through.
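
The three items above can be sketched as one gate function that runs before any request is forwarded. This is a sketch under assumptions: `AppPolicy`, `gate`, the token strings, the model names, and the status codes are all hypothetical choices for illustration. `readSpent` stands in for whatever counter holds month-to-date spend; in a real Worker that read would be asynchronous, and it is kept synchronous here only to keep the example short.

```typescript
// Hypothetical per-app policy; field names are illustrative, not from any product.
interface AppPolicy {
  appId: string;
  allowedModels: string[];
  monthlyCapMicroUsd: number;
}

type Decision =
  | { ok: true; appId: string }
  | { ok: false; status: number; reason: string };

function gate(
  token: string,
  model: string,
  policies: Map<string, AppPolicy>, // keyed by internal token
  readSpent: (appId: string) => number, // month-to-date spend; may throw
): Decision {
  // 1. Per-app token: an unknown or revoked token stops here.
  const policy = policies.get(token);
  if (!policy) return { ok: false, status: 401, reason: "unknown token" };

  // 2. Model allowlist: typos and forgotten expensive models stop here.
  if (!policy.allowedModels.includes(model)) {
    return { ok: false, status: 403, reason: `model not allowed: ${model}` };
  }

  // 3. Spend cap, fail-closed: if spend cannot be read, refuse instead of guessing.
  let spent: number;
  try {
    spent = readSpent(policy.appId);
  } catch {
    return { ok: false, status: 503, reason: "spend counter unavailable" };
  }
  if (spent >= policy.monthlyCapMicroUsd) {
    return { ok: false, status: 429, reason: "monthly cap reached" };
  }
  return { ok: true, appId: policy.appId };
}

// Usage with made-up tokens and model names.
const policies = new Map<string, AppPolicy>([
  ["tok_reviews", { appId: "review-replies", allowedModels: ["model-small"], monthlyCapMicroUsd: 5_000_000 }],
]);
const okDecision = gate("tok_reviews", "model-small", policies, () => 1_000_000);
```

The fail-closed choice is the `catch` branch: a fail-open proxy would let the request through there, and an outage of the counter would silently remove the cap.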

Here is the migration order I followed:

  1. Inventory every product using the API key (they hide in surprising places — I had forgotten one in a cron job)
  2. Deploy the proxy; the real key lives only in the proxy's secrets
  3. Issue an internal token per app and repoint each app's base URL at the proxy
  4. Only after confirming every app has switched, rotate the real key (any stragglers still calling the API directly get flushed out here)
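
Steps 2 and 3 come down to a header swap on the proxy. The sketch below assumes the app sends its internal token in the same `x-api-key` header a client would normally use for the real key, so the only app-side change is the base URL and the key value. `toUpstream`, `HeaderMap`, the proxy hostname, and the token and key strings are hypothetical; in a real deployment the real key would be read from the proxy's secret store.

```typescript
type HeaderMap = Record<string, string>;

const UPSTREAM = "https://api.anthropic.com";

// Turn a request the app sent to the proxy into the request the proxy sends upstream.
// The internal token is extracted for attribution and never forwarded;
// the real key is added here and never returned to the app.
function toUpstream(
  proxyUrl: string,
  incoming: HeaderMap,
  realKey: string,
): { url: string; headers: HeaderMap; internalToken: string | undefined } {
  // Keep path and query, drop the proxy's own scheme and host.
  const pathAndQuery = proxyUrl.replace(/^https?:\/\/[^/]+/, "");

  const headers: HeaderMap = {};
  let internalToken: string | undefined;
  for (const name of Object.keys(incoming)) {
    const lower = name.toLowerCase();
    if (lower === "x-api-key") {
      internalToken = incoming[name];
    } else if (lower !== "host") {
      headers[lower] = incoming[name];
    }
  }
  headers["x-api-key"] = realKey;

  return { url: UPSTREAM + pathAndQuery, headers, internalToken };
}

// Usage with made-up values.
const upstream = toUpstream(
  "https://proxy.example/v1/messages",
  { "X-Api-Key": "tok_reviews", "anthropic-version": "2023-06-01", Host: "proxy.example" },
  "sk-real-key-placeholder",
);
```

Because the token arrives where the key used to, an app that has not been repointed keeps working against the API directly until the real key is rotated, which is what makes step 4 a useful check.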

Putting step 4 last matters. Rotate first and any app you missed dies suddenly.
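
One way to make "confirming every app has switched" checkable is to compare the step-1 inventory against the apps the proxy has actually seen. This is a hypothetical sketch; `stragglers`, `safeToRotate`, and the app names are illustrative, and it assumes the proxy logs which app made each call.

```typescript
// Apps from the inventory that have not yet made a call through the proxy.
function stragglers(inventory: string[], seenOnProxy: Set<string>): string[] {
  return inventory.filter((app) => !seenOnProxy.has(app));
}

// Rotate the real key only when nothing in the inventory is still unaccounted for.
function safeToRotate(inventory: string[], seenOnProxy: Set<string>): boolean {
  return stragglers(inventory, seenOnProxy).length === 0;
}

// Usage with made-up app names: the cron job has not switched yet.
const inventory = ["review-replies", "blog-pipeline", "experiment", "forgotten-cron"];
const seen = new Set(["review-replies", "blog-pipeline", "experiment"]);
const remaining = stragglers(inventory, seen);
```

The observation window needs to be longer than the slowest schedule in the inventory, otherwise a job that runs rarely looks like a straggler when it has simply not run yet.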
