Tag: Cost optimization

17 articles

Claude API (8) · Claude Code (5) · Automation (2) · headless (2) · indie dev (2) · Model Selection (2) · streaming (2) · production (2) · Prompt Caching (2) · batch processing (2) · API (2) · effort (1)

⬡ API & SDK / 2026-06-20 / Advanced

Routing the effort Parameter Per Stage to Balance Claude's Output Cost and Latency

Claude's effort parameter governs all output tokens — thinking, prose, and tool calls. This guide replaces a blanket high setting with per-stage tiers and a dynamic router, grounded in measurements from a solo developer's automation pipeline.

⬡ API & SDK / 2026-06-20 / Advanced

Putting Cloudflare AI Gateway in Front of Claude Made the Numbers I Needed Disappear — Field Notes on Instrumentation

After I put Cloudflare AI Gateway in front of the Claude API, these are the places I actually got stung: cost attribution, semantic-cache false hits, fallbacks quietly lowering quality, and budgets that don't really stop anything. Each comes with the code I used to fix it.

⟐ Claude Code / 2026-06-14 / Advanced

Pacing Non-Rollover Monthly Credits: A Burn-Rate Scheduler That Avoids Both Early Exhaustion and Wasted Balance

Non-rollover monthly credits punish you for spending too fast and for spending too slowly. This post covers the design of a scheduler that derives a daily burn rate from the remaining balance and the days left and throttles headless runs automatically. It also includes the real numbers from running it on a personal automation setup.

⟐ Claude Code / 2026-06-14 / Intermediate

Measuring a Week of Headless Usage the Night Before the Billing Change

With headless Claude Code moving to monthly credits on June 15, I spent a week logging how many tokens my unattended runs actually consume, so I could pick a plan based on numbers instead of a guess.

⟐ Claude Code / 2026-06-13 / Advanced

Pin Your Execution Model with enforceAvailableModels — Don't Let Auto-Upgrades Burn Through Your Credits

Starting June 15, monthly credits no longer roll over. This post describes a setup-layer approach that keeps subagents and fallbacks from silently upgrading to pricier models. It uses enforceAvailableModels to pin the model set from the managed-settings layer and includes a verification step you can run in CI.

⬡ API & SDK / 2026-06-12 / Intermediate

Reallocating My Automation Pipeline Ahead of the June 15 Billing Change

On June 15, the Agent SDK, headless Claude Code, and GitHub Actions move to monthly usage credits. I audited every stage of my publishing pipeline against measured token logs and rerouted each stage to one of three execution paths. Here is the reasoning.

⬡ API & SDK / 2026-05-14 / Advanced

6 Traps I Hit Building In-App AI Chat with Claude API — Lessons from 10 Years of Indie Dev and 50M+ Downloads

Six real design mistakes I encountered shipping an in-app chat built on the Claude API to production, covering context management, streaming error detection, guardrails, session persistence, model versioning, and cost monitoring. Includes working TypeScript code.

⬡ API & SDK / 2026-05-12 / Advanced

Combining Haiku 4.5, Streaming, and Prompt Caching to Cut Costs in a Personal App — An Implementation Record

A hands-on record of combining Claude Haiku 4.5, streaming, and prompt caching to improve both cost and response speed in a personal iOS/Android app — including the mistakes made along the way.

⟐ Claude Code / 2026-05-04 / Intermediate

Build a Pipeline Where Docs Update Automatically Every Time Your Code Changes

Build a CI/CD pipeline that auto-generates README, CHANGELOG, and API docs whenever code changes. Use Claude Haiku 4.5 for cost-efficient classification and Sonnet 4.6 for high-quality generation, cutting API costs by up to 70% while keeping documentation accurate.

⟐ Claude Code / 2026-05-04 / Advanced

Claude Code vs OpenCode + Gemma 4 — A Strategic Guide for Cloud and Local AI Coding

OpenCode paired with Google Gemma 4 is being marketed as a 'free Claude Code'. Having run Claude Code in production for half a year, I lay out an honest, four-axis framework for choosing between them or, more often, combining them.

⬡ API & SDK / 2026-05-02 / Advanced

Building a Cost-Optimized Multi-Provider AI Gateway with Claude API and LiteLLM — Fallback Design, A/B Testing, and Provider Migration Strategy

Learn how to build a production-grade multi-provider AI gateway centered on the Claude API using LiteLLM. Covers fallback chain design, A/B testing, cost-based routing, and provider migration strategy, with complete code examples.

◉ Claude AI / 2026-04-26 / Intermediate

Claude Sonnet 4.6 vs Opus 4.6 — A Task-by-Task Selection Guide From Daily Use

Choosing between Sonnet 4.6 and Opus 4.6 comes up more often than you'd expect. As someone who uses both daily, I break it down task by task: when the cost gap is justified and when it isn't.