Building AI Application Observability with Claude API and OpenTelemetry
Learn how to integrate OpenTelemetry with your Claude API applications for unified tracing, metrics, and logging. Covers token usage visualization, latency monitoring, cost alerting, and distributed tracing for agent workflows.
As production applications powered by Claude API continue to grow, engineering teams face operational challenges that differ significantly from traditional web services. "Why did this request take 3 seconds?" "What caused our API costs to double this month?" "Where exactly did the agent's tool call chain fail?" Without the ability to answer these questions instantly, running AI applications reliably in production becomes a constant struggle.
OpenTelemetry is the CNCF-backed standard framework for observability. It provides a unified approach to collecting and exporting three core signals — traces, metrics, and logs — and supports all major monitoring backends including Grafana, Datadog, and New Relic.
Prerequisites and Required Packages
The code examples in this article use Node.js with TypeScript. The design patterns apply equally to Python SDK implementations.
We recommend routing telemetry through the OpenTelemetry Collector to your backend (Grafana Tempo + Prometheus, Datadog, etc.). For local development, direct export without a Collector is also supported.
WHAT YOU'LL LEARN
✦Master design patterns for unified tracing, metrics, and logging of Claude API calls using OpenTelemetry
✦Build real-time dashboards to visualize token usage, latency, and error rates across your AI application
✦Implement cost anomaly detection alerts and distributed tracing for agent workflows in production
Architecture Design — Three Signals for AI Workloads
Observability for AI applications requires tracking metrics that go beyond what traditional web applications need.
Traces visualize the execution path of individual API requests and agent workflows. Metrics provide aggregated data on token consumption and latency, forming the foundation for cost management. Logs capture detailed error information and prompt audit trails.
OpenTelemetry SDK Initialization
Start by setting up the OpenTelemetry SDK that the entire application will use.
```typescript
// src/telemetry/setup.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-http";
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { Resource } from "@opentelemetry/resources";
import {
  ATTR_SERVICE_NAME,
  ATTR_SERVICE_VERSION,
} from "@opentelemetry/semantic-conventions";

// Define application resource information
const resource = new Resource({
  [ATTR_SERVICE_NAME]: "claude-ai-app",
  [ATTR_SERVICE_VERSION]: "1.0.0",
  "ai.provider": "anthropic",
  "deployment.environment": process.env.NODE_ENV || "development",
});

// Resolve the base endpoint first, then append the signal path.
// Writing `env + "/v1/traces" || fallback` never reaches the fallback,
// because "undefined/v1/traces" is a truthy string.
const otlpEndpoint =
  process.env.OTEL_EXPORTER_OTLP_ENDPOINT || "http://localhost:4318";

// Trace exporter (OTLP over HTTP)
const traceExporter = new OTLPTraceExporter({
  url: `${otlpEndpoint}/v1/traces`,
});

// Metrics exporter
const metricExporter = new OTLPMetricExporter({
  url: `${otlpEndpoint}/v1/metrics`,
});

// Initialize the SDK
const sdk = new NodeSDK({
  resource,
  traceExporter,
  metricReader: new PeriodicExportingMetricReader({
    exporter: metricExporter,
    exportIntervalMillis: 30000, // Export metrics every 30 seconds
  }),
});

export function initTelemetry(): void {
  sdk.start();
  console.log("OpenTelemetry initialized for Claude AI application");

  // Graceful shutdown handler: flush pending telemetry before exiting
  process.on("SIGTERM", () => {
    sdk
      .shutdown()
      .then(() => console.log("Telemetry shut down"))
      .catch((err) => console.error("Telemetry shutdown error", err))
      .finally(() => process.exit(0));
  });
}
```
Call this setup at the very beginning of your application entry point.
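```typescript
// src/index.ts
import { initTelemetry } from "./telemetry/setup";

// Start telemetry before the server module is loaded.
// This ordering holds when TypeScript emits CommonJS, where each import is
// evaluated in place. Under native ES modules, static imports are hoisted and
// evaluated first; in that case load the server with a dynamic import() instead.
initTelemetry();

import { startServer } from "./server";

startServer();
```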
Instrumenting the Claude API Client
Build a wrapper around the Claude SDK that automatically traces every API request using the OpenTelemetry API.
```typescript
// src/telemetry/claude-instrumentation.ts
import { trace, metrics, SpanStatusCode } from "@opentelemetry/api";
import Anthropic from "@anthropic-ai/sdk";

// Get tracer and meter instances
const tracer = trace.getTracer("claude-api", "1.0.0");
const meter = metrics.getMeter("claude-api", "1.0.0");

// Define custom metrics
const tokenCounter = meter.createCounter("claude.tokens.total", {
  description: "Total tokens consumed by Claude API calls",
  unit: "tokens",
});
const inputTokenCounter = meter.createCounter("claude.tokens.input", {
  description: "Input tokens sent to Claude API",
  unit: "tokens",
});
const outputTokenCounter = meter.createCounter("claude.tokens.output", {
  description: "Output tokens received from Claude API",
  unit: "tokens",
});
const requestDuration = meter.createHistogram("claude.request.duration", {
  description: "Duration of Claude API requests",
  unit: "ms",
});
const requestCounter = meter.createCounter("claude.requests.total", {
  description: "Total number of Claude API requests",
});
const errorCounter = meter.createCounter("claude.errors.total", {
  description: "Total number of Claude API errors",
});

// Cost rates in USD per 1,000 tokens (as of March 2026)
const COST_PER_1K_INPUT: Record<string, number> = {
  "claude-opus-4-6": 0.015,
  "claude-sonnet-4-6": 0.003,
  "claude-haiku-4-5-20251001": 0.0008,
};
const COST_PER_1K_OUTPUT: Record<string, number> = {
  "claude-opus-4-6": 0.075,
  "claude-sonnet-4-6": 0.015,
  "claude-haiku-4-5-20251001": 0.004,
};
const costCounter = meter.createCounter("claude.cost.usd", {
  description: "Estimated cost of Claude API calls in USD",
  unit: "usd",
});

// Instrumented Claude API client
export class InstrumentedClaudeClient {
  private client: Anthropic;

  constructor(apiKey?: string) {
    this.client = new Anthropic({ apiKey });
  }

  async createMessage(
    params: Anthropic.MessageCreateParamsNonStreaming
  ): Promise<Anthropic.Message> {
    // Create a span to trace the API call
    return tracer.startActiveSpan(
      "claude.messages.create",
      {
        attributes: {
          "ai.model": params.model,
          "ai.max_tokens": params.max_tokens,
          "ai.message_count": params.messages.length,
          "ai.has_tools": params.tools ? "true" : "false",
          "ai.has_system": params.system ? "true" : "false",
        },
      },
      async (span) => {
        const startTime = Date.now();
        try {
          // Make the Claude API call
          const response = await this.client.messages.create(params);
          const duration = Date.now() - startTime;

          // Record response metadata on the span
          span.setAttributes({
            "ai.response.id": response.id,
            "ai.response.model": response.model,
            "ai.response.stop_reason": response.stop_reason || "unknown",
            "ai.tokens.input": response.usage.input_tokens,
            "ai.tokens.output": response.usage.output_tokens,
            "ai.tokens.total":
              response.usage.input_tokens + response.usage.output_tokens,
            "ai.duration_ms": duration,
          });

          // Record metrics
          const labels = { model: response.model };
          inputTokenCounter.add(response.usage.input_tokens, labels);
          outputTokenCounter.add(response.usage.output_tokens, labels);
          tokenCounter.add(
            response.usage.input_tokens + response.usage.output_tokens,
            labels
          );
          requestDuration.record(duration, labels);
          requestCounter.add(1, { ...labels, status: "success" });

          // Estimate cost (unknown models fall back to the Sonnet rates)
          const model = response.model;
          const inputCost =
            (response.usage.input_tokens / 1000) *
            (COST_PER_1K_INPUT[model] || 0.003);
          const outputCost =
            (response.usage.output_tokens / 1000) *
            (COST_PER_1K_OUTPUT[model] || 0.015);
          costCounter.add(inputCost + outputCost, labels);

          span.setStatus({ code: SpanStatusCode.OK });
          return response;
        } catch (error) {
          const duration = Date.now() - startTime;

          // Record error on span and metrics
          span.setStatus({
            code: SpanStatusCode.ERROR,
            message: error instanceof Error ? error.message : "Unknown error",
          });
          span.recordException(error as Error);
          errorCounter.add(1, {
            model: params.model,
            error_type:
              error instanceof Anthropic.APIError
                ? `${error.status}`
                : "unknown",
          });
          requestDuration.record(duration, { model: params.model });
          requestCounter.add(1, {
            model: params.model,
            status: "error",
          });
          throw error;
        } finally {
          span.end();
        }
      }
    );
  }
}
```
Simply using this client ensures that every Claude API call is automatically traced, with token consumption and latency recorded as metrics.
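The cost estimate inside the wrapper is plain arithmetic, so it can be pulled out and unit-tested without any telemetry or network access. The sketch below is one way to do that; `estimateCostUsd`, `ModelRate`, and `FALLBACK_RATE` are hypothetical names introduced here, and the rates are copied from the tables in the wrapper rather than from a pricing source.

```typescript
// Hypothetical helper mirroring the wrapper's cost arithmetic.
// Rates are USD per 1,000 tokens, copied from the tables in the wrapper.
interface ModelRate {
  inputPer1K: number;
  outputPer1K: number;
}

const RATES: Record<string, ModelRate> = {
  "claude-opus-4-6": { inputPer1K: 0.015, outputPer1K: 0.075 },
  "claude-sonnet-4-6": { inputPer1K: 0.003, outputPer1K: 0.015 },
  "claude-haiku-4-5-20251001": { inputPer1K: 0.0008, outputPer1K: 0.004 },
};

// Unknown models fall back to the Sonnet rates, as the wrapper does.
const FALLBACK_RATE: ModelRate = { inputPer1K: 0.003, outputPer1K: 0.015 };

function estimateCostUsd(
  model: string,
  inputTokens: number,
  outputTokens: number
): number {
  const rate = RATES[model] ?? FALLBACK_RATE;
  return (
    (inputTokens / 1000) * rate.inputPer1K +
    (outputTokens / 1000) * rate.outputPer1K
  );
}

// Example: 2,000 input and 500 output tokens on Sonnet.
const exampleCost = estimateCostUsd("claude-sonnet-4-6", 2000, 500);
console.log(exampleCost.toFixed(4));
```

Keeping the rate table in one place also makes it easier to notice when a new model ID is silently billed at the fallback rate.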
Distributed Tracing for Agent Workflows
When building agent workflows with tool use, tracking the execution path of each step becomes critical for debugging and optimization.
On the Collector side, the key setting is the tail_sampling processor configuration. Retaining every trace would incur substantial storage costs. By always keeping errors, slow requests, and expensive model calls, you preserve the data you need for troubleshooting while keeping costs manageable.
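To make the retention policy concrete, here is a minimal sketch of the keep-or-drop decision written as a plain TypeScript function. It illustrates the policy only and is not how the Collector implements it. `TraceSummary`, `SamplingPolicy`, and `shouldKeepTrace` are hypothetical names, the 3000 ms threshold and the Opus model list are example values, and the `baselineRatio` for sampling the remaining traces is an assumption that the text above does not specify.

```typescript
// Hypothetical summary of a finished trace (illustration only).
interface TraceSummary {
  hasError: boolean;
  durationMs: number;
  models: string[]; // model IDs seen on the trace's spans
}

// Hypothetical policy settings; all values are assumptions.
interface SamplingPolicy {
  slowThresholdMs: number;
  expensiveModels: string[];
  baselineRatio: number; // fraction (0..1) of ordinary traces to keep
}

function shouldKeepTrace(
  t: TraceSummary,
  p: SamplingPolicy,
  random: () => number = Math.random // injectable for deterministic tests
): boolean {
  if (t.hasError) return true; // always keep errors
  if (t.durationMs >= p.slowThresholdMs) return true; // always keep slow requests
  if (t.models.some((m) => p.expensiveModels.includes(m))) return true; // always keep expensive calls
  return random() < p.baselineRatio; // sample the rest
}

const examplePolicy: SamplingPolicy = {
  slowThresholdMs: 3000,
  expensiveModels: ["claude-opus-4-6"],
  baselineRatio: 0.1,
};

console.log(
  shouldKeepTrace(
    { hasError: false, durationMs: 4200, models: ["claude-sonnet-4-6"] },
    examplePolicy
  )
);
```

The decision needs the whole trace (final status, total duration, every model used), which is why this kind of sampling runs after the trace completes rather than when the first span starts.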
Monitoring Prompt Caching Hit Rates
If you're using Claude API's prompt caching, tracking cache hit rates helps you measure the effectiveness of your cost optimization strategy.
```typescript
// src/telemetry/cache-metrics.ts
import { metrics } from "@opentelemetry/api";
import Anthropic from "@anthropic-ai/sdk";

const meter = metrics.getMeter("claude-cache", "1.0.0");

const cacheHitCounter = meter.createCounter("claude.cache.hits", {
  description: "Number of prompt cache hits",
});
const cacheMissCounter = meter.createCounter("claude.cache.misses", {
  description: "Number of prompt cache misses",
});
const cacheSavingsCounter = meter.createCounter(
  "claude.cache.savings_tokens",
  {
    description: "Tokens saved by prompt caching",
    unit: "tokens",
  }
);

// Monitor cache effectiveness
export function recordCacheMetrics(
  response: Anthropic.Message,
  model: string
): void {
  // Recent SDK versions type these usage fields as number | null,
  // so read them directly and default missing values to 0.
  const cacheCreation = response.usage.cache_creation_input_tokens ?? 0;
  const cacheRead = response.usage.cache_read_input_tokens ?? 0;

  if (cacheRead > 0) {
    // Any cached tokens read counts as a hit
    cacheHitCounter.add(1, { model });
    // Cache reads are charged at 10% of normal rate, saving 90%
    cacheSavingsCounter.add(Math.floor(cacheRead * 0.9), { model });
  } else if (cacheCreation > 0) {
    // The cache was written but nothing was read: count as a miss
    cacheMissCounter.add(1, { model });
  }
}
```
```promql
# Prompt caching hit rate
sum(rate(claude_cache_hits[1h]))
  / (sum(rate(claude_cache_hits[1h])) + sum(rate(claude_cache_misses[1h]))) * 100

# Estimated tokens saved by caching (per day)
sum(increase(claude_cache_savings_tokens[24h]))
```
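The hit-rate and savings arithmetic can also be checked offline against a batch of recorded usage objects, which is handy for validating a dashboard query. The sketch below assumes the same classification as recordCacheMetrics (a read counts as a hit, a creation without a read counts as a miss); `CacheUsage`, `CacheSummary`, and `summarizeCacheUsage` are hypothetical names introduced for illustration.

```typescript
// Hypothetical shape holding only the cache-related usage fields.
interface CacheUsage {
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

interface CacheSummary {
  hits: number;
  misses: number;
  hitRatePercent: number;
  savedTokens: number;
}

function summarizeCacheUsage(usages: CacheUsage[]): CacheSummary {
  let hits = 0;
  let misses = 0;
  let savedTokens = 0;

  for (const u of usages) {
    const read = u.cache_read_input_tokens ?? 0;
    const created = u.cache_creation_input_tokens ?? 0;
    if (read > 0) {
      hits += 1;
      // Same 90% savings assumption as the metric above
      savedTokens += Math.floor(read * 0.9);
    } else if (created > 0) {
      misses += 1;
    }
    // Requests with no cache activity are ignored, as in recordCacheMetrics
  }

  const total = hits + misses;
  const hitRatePercent = total === 0 ? 0 : (hits / total) * 100;
  return { hits, misses, hitRatePercent, savedTokens };
}

const summary = summarizeCacheUsage([
  { cache_read_input_tokens: 1000 },
  { cache_creation_input_tokens: 500 },
  { cache_read_input_tokens: 2000, cache_creation_input_tokens: 100 },
  {},
]);
console.log(summary);
```

Returning 0 for an empty batch avoids a division by zero; the PromQL ratio would instead return no data for a window without cache activity.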
Local Development Environment with Docker Compose
Get observability running locally with this Docker Compose setup:
Run docker compose -f docker-compose.observability.yml up -d and access Grafana at http://localhost:3001.
Key Takeaways
Integrating OpenTelemetry into your Claude API applications gives you the tools to tackle the unique operational challenges of AI workloads: token consumption and cost visibility, latency optimization, error tracking, and distributed tracing of agent workflows. Tail sampling and prompt caching hit rate monitoring are particularly impactful in production, directly reducing costs while maintaining the observability you need.