Claude API with Go in Production: Anthropic Go SDK, Concurrency, Tool Use & Microservice Integration
A practical guide to using Claude API with Go in production. Covers streaming with goroutines, concurrent Tool Use, rate limiting with channels, Gin/Echo integration, graceful shutdown, and Kubernetes deployment with working code examples.
When you first try to call the Claude API from Go, you run into friction that Python and TypeScript developers never encounter.
Streaming responses through a goroutine leads to panics when the context gets cancelled early. Parallel Tool Use triggers rate limit errors. Wiring the Gin handler to SSE requires flusher configuration that isn't documented anywhere obvious.
These are Go-specific patterns. Articles about the Python or TypeScript SDK won't help you here. I've integrated Claude API into several production Go services, and this guide documents the walls I hit along the way — with working code that solves each one.
The official Anthropic Go SDK (anthropic-sdk-go) was released in late 2024 and is still actively developed. Because it's newer than the Python SDK, there's significantly less community content available, which makes Go backend engineers especially likely to get stuck.
Setting Up the Anthropic Go SDK
Let's start with the basics: adding the dependency and writing a first call that actually works.
go get github.com/anthropics/anthropic-sdk-go
// main.go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/anthropics/anthropic-sdk-go"
	"github.com/anthropics/anthropic-sdk-go/option"
)

func main() {
	// Always load the API key from the environment; never hardcode it
	client := anthropic.NewClient(
		option.WithAPIKey(os.Getenv("ANTHROPIC_API_KEY")),
	)

	ctx := context.Background()
	msg, err := client.Messages.New(ctx, anthropic.MessageNewParams{
		Model:     anthropic.F(anthropic.ModelClaude_Sonnet_4_6),
		MaxTokens: anthropic.F(int64(1024)),
		Messages: anthropic.F([]anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("Describe Go's concurrency model in three sentences.")),
		}),
	})
	if err != nil {
		log.Fatalf("API call failed: %v", err)
	}

	for _, block := range msg.Content {
		if block.Type == anthropic.ContentBlockTypeText {
			fmt.Println(block.Text)
		}
	}
}
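Key insight: The anthropic.F() helper wraps a value in the SDK's generic field type. The Go SDK explicitly tracks whether a field has been set, which lets it distinguish a zero value from an omitted field when it serializes the request. It looks verbose at first, but it means an explicit zero is sent as written while an untouched field is left out of the JSON entirely. Note that anthropic.F belongs to the earlier SDK releases; if your version does not export it, check the SDK's release notes for the current way to set fields.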
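The set-versus-unset distinction is easier to see in isolation. The sketch below uses only the standard library; Field, F, Params, and encode are hypothetical names made up for illustration and are not the SDK's actual types or implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Field is a hypothetical stand-in for an SDK field wrapper. It records
// whether a value was explicitly provided.
type Field[T any] struct {
	Value   T
	Present bool
}

// F marks a value as explicitly set, mirroring how anthropic.F is used above.
func F[T any](v T) Field[T] { return Field[T]{Value: v, Present: true} }

// Params is an illustrative request type, not an SDK type.
type Params struct {
	MaxTokens   Field[int64]
	Temperature Field[float64]
}

// encode emits only the fields that were explicitly set, so an explicit
// zero survives while an untouched field is omitted.
func encode(p Params) string {
	out := map[string]any{}
	if p.MaxTokens.Present {
		out["max_tokens"] = p.MaxTokens.Value
	}
	if p.Temperature.Present {
		out["temperature"] = p.Temperature.Value
	}
	b, _ := json.Marshal(out)
	return string(b)
}

func main() {
	// Temperature untouched: it does not appear in the JSON.
	fmt.Println(encode(Params{MaxTokens: F(int64(1024))}))
	// Temperature explicitly zero: it is sent as 0.
	fmt.Println(encode(Params{MaxTokens: F(int64(1024)), Temperature: F(0.0)}))
}
```

A plain float64 field could not tell these two cases apart, which is the problem the wrapper exists to solve.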
Managing System Prompts and Conversation History
Real applications need to maintain conversation history. In Go, a struct is the natural way to hold this state.
// conversation.go
package claude

import (
	"context"
	"fmt"

	anthropic "github.com/anthropics/anthropic-sdk-go"
)

// ConversationSession manages the state of a single conversation
type ConversationSession struct {
	client     *anthropic.Client
	systemText string
	history    []anthropic.MessageParam
	model      anthropic.Model
}

func NewConversationSession(client *anthropic.Client, systemPrompt string) *ConversationSession {
	return &ConversationSession{
		client:     client,
		systemText: systemPrompt,
		history:    make([]anthropic.MessageParam, 0),
		model:      anthropic.ModelClaude_Sonnet_4_6,
	}
}

// Send submits a user message and returns the assistant's reply
func (s *ConversationSession) Send(ctx context.Context, userMsg string) (string, error) {
	s.history = append(s.history, anthropic.NewUserMessage(
		anthropic.NewTextBlock(userMsg),
	))

	params := anthropic.MessageNewParams{
		Model:     anthropic.F(s.model),
		MaxTokens: anthropic.F(int64(2048)),
		Messages:  anthropic.F(s.history),
	}
	if s.systemText != "" {
		params.System = anthropic.F([]anthropic.TextBlockParam{
			anthropic.NewTextBlock(s.systemText),
		})
	}

	resp, err := s.client.Messages.New(ctx, params)
	if err != nil {
		// Roll back the user message on failure.
		// Without this, the next call will fail: "messages must alternate user/assistant"
		s.history = s.history[:len(s.history)-1]
		return "", fmt.Errorf("API call failed: %w", err)
	}

	var result string
	for _, block := range resp.Content {
		if block.Type == anthropic.ContentBlockTypeText {
			result += block.Text
		}
	}

	s.history = append(s.history, anthropic.NewAssistantMessage(
		anthropic.NewTextBlock(result),
	))
	return result, nil
}
The history rollback on error is easy to overlook, but it causes real problems. If a user message lands in history without a corresponding assistant response, the next API call fails with a message-ordering validation error.
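The rollback invariant can be checked without touching the network. This is a minimal sketch, assuming a simplified turn type and an injected call function in place of the SDK client; session, turn, alternates, and demoRollback are illustrative names, not SDK APIs.

```go
package main

import (
	"errors"
	"fmt"
)

// turn is a simplified stand-in for anthropic.MessageParam.
type turn struct {
	Role string
	Text string
}

// session keeps the history; call is injected and stands in for the API.
type session struct {
	history []turn
	call    func(history []turn) (string, error)
}

// send appends the user turn, and removes it again if the call fails.
func (s *session) send(userMsg string) (string, error) {
	s.history = append(s.history, turn{Role: "user", Text: userMsg})
	reply, err := s.call(s.history)
	if err != nil {
		s.history = s.history[:len(s.history)-1] // roll back the orphaned user turn
		return "", err
	}
	s.history = append(s.history, turn{Role: "assistant", Text: reply})
	return reply, nil
}

// alternates reports whether roles strictly alternate, starting with "user".
func alternates(h []turn) bool {
	for i, t := range h {
		want := "user"
		if i%2 == 1 {
			want = "assistant"
		}
		if t.Role != want {
			return false
		}
	}
	return true
}

// demoRollback fails the first call, succeeds on the second, and returns
// the history length after each step plus the final alternation check.
func demoRollback() (afterFail, afterOK int, ok bool) {
	fail := true
	s := &session{call: func([]turn) (string, error) {
		if fail {
			return "", errors.New("simulated overload")
		}
		return "hello", nil
	}}
	_, _ = s.send("first try")
	afterFail = len(s.history)
	fail = false
	_, _ = s.send("second try")
	return afterFail, len(s.history), alternates(s.history)
}

func main() {
	fmt.Println(demoRollback())
}
```

Injecting the call function also makes Send-style logic unit-testable without an API key.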
Streaming: The Right Way
Streaming is where most Go developers get into trouble. Here are the concrete mistakes and their fixes.
The Goroutine Leak Pattern
// BAD: This leaks a goroutine
func badStreaming(client *anthropic.Client) {
	ctx := context.Background()
	stream := client.Messages.NewStreaming(ctx, anthropic.MessageNewParams{
		Model:     anthropic.F(anthropic.ModelClaude_Sonnet_4_6),
		MaxTokens: anthropic.F(int64(1024)),
		Messages: anthropic.F([]anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock("Hello")),
		}),
	})

	go func() {
		for stream.Next() {
			event := stream.Current()
			_ = event
		}
		// stream.Close() is never called: goroutine leaks
	}()
	// Function returns, goroutine is now orphaned
}

// GOOD: Proper streaming with context-aware channel output
func streamToChannel(ctx context.Context, client *anthropic.Client, userMsg string, output chan<- string) error {
	stream := client.Messages.NewStreaming(ctx, anthropic.MessageNewParams{
		Model:     anthropic.F(anthropic.ModelClaude_Sonnet_4_6),
		MaxTokens: anthropic.F(int64(2048)),
		Messages: anthropic.F([]anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock(userMsg)),
		}),
	})
	defer stream.Close() // Always defer Close

	for stream.Next() {
		// Respect context cancellation between tokens
		select {
		case <-ctx.Done():
			return ctx.Err()
		default:
		}

		event := stream.Current()
		switch ev := event.AsUnion().(type) {
		case anthropic.ContentBlockDeltaEvent:
			if delta, ok := ev.Delta.AsUnion().(anthropic.TextDelta); ok {
				// Send with context awareness: don't block forever
				select {
				case output <- delta.Text:
				case <-ctx.Done():
					return ctx.Err()
				}
			}
		}
	}
	return stream.Err()
}
Two things matter here: defer stream.Close() to prevent resource leaks, and checking ctx.Done() on every channel send so the goroutine exits cleanly when the client disconnects.
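The context-aware send can be exercised on its own with a fake token source. In this sketch, produce stands in for the stream loop and readOneThenCancel simulates a client that disconnects after the first token; both names, and the errLeaked sentinel, are made up for the example.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errLeaked is returned by the demo if the producer fails to exit in time.
var errLeaked = errors.New("producer did not exit")

// produce sends tokens until they run out or ctx is cancelled. Every send
// is paired with ctx.Done() so a missing reader cannot block it forever.
func produce(ctx context.Context, tokens []string, out chan<- string) error {
	for _, tok := range tokens {
		select {
		case out <- tok:
		case <-ctx.Done():
			return ctx.Err()
		}
	}
	return nil
}

// readOneThenCancel reads a single token, cancels the context, and then
// waits for the producer goroutine to report how it exited.
func readOneThenCancel() (string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	out := make(chan string) // unbuffered: every send needs a reader
	done := make(chan error, 1)
	go func() { done <- produce(ctx, []string{"a", "b", "c"}, out) }()

	first := <-out
	cancel() // simulate the client disconnecting

	select {
	case err := <-done:
		return first, err
	case <-time.After(time.Second):
		return first, errLeaked
	}
}

func main() {
	first, err := readOneThenCancel()
	fmt.Println(first, err)
}
```

Replacing the select in produce with a bare `out <- tok` makes the same demo time out, because the goroutine stays blocked on a channel nobody reads.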
SSE Streaming Endpoint with Gin
// handler/stream.go
package handler

import (
	"fmt"
	"net/http"

	"github.com/gin-gonic/gin"
)

func (h *Handler) StreamChat(c *gin.Context) {
	var req struct {
		Message string `json:"message" binding:"required"`
	}
	if err := c.ShouldBindJSON(&req); err != nil {
		c.JSON(http.StatusBadRequest, gin.H{"error": err.Error()})
		return
	}

	// SSE headers
	c.Header("Content-Type", "text/event-stream")
	c.Header("Cache-Control", "no-cache")
	c.Header("Connection", "keep-alive")
	c.Header("X-Accel-Buffering", "no") // Critical: disables nginx buffering

	flusher, ok := c.Writer.(http.Flusher)
	if !ok {
		c.JSON(http.StatusInternalServerError, gin.H{"error": "streaming not supported"})
		return
	}

	ctx := c.Request.Context()
	tokenCh := make(chan string, 10) // Buffered to absorb backpressure
	errCh := make(chan error, 1)

	go func() {
		defer close(tokenCh)
		errCh <- h.claude.StreamMessage(ctx, req.Message, tokenCh)
	}()

	// Drain every token before looking at the result. Selecting on tokenCh and
	// errCh together can pick errCh first and drop tokens still in the buffer.
	// The producer exits on ctx cancellation, so this loop also ends when the
	// client disconnects.
	for token := range tokenCh {
		fmt.Fprintf(c.Writer, "data: %s\n\n", token)
		flusher.Flush()
	}

	if err := <-errCh; err != nil {
		if ctx.Err() == nil { // Client is still connected: report the failure
			fmt.Fprintf(c.Writer, "event: error\ndata: %s\n\n", err.Error())
			flusher.Flush()
		}
		return
	}
	fmt.Fprintf(c.Writer, "data: [DONE]\n\n")
	flusher.Flush()
}
The X-Accel-Buffering: no header is easy to miss, but without it nginx buffers the response and your streaming looks broken from the client's perspective.
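One more framing detail is worth a sketch. The handler writes each token as `data: %s` followed by a blank line, which is only valid while the token contains no newline; model output that includes code or lists regularly does. In the SSE format a multi-line payload is sent as one data line per payload line, and the client joins them with newlines. The helper names writeSSEData and frame below are illustrative.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// writeSSEData writes one SSE event, emitting a separate "data:" line for
// each line of the payload and a blank line to terminate the event.
func writeSSEData(w io.Writer, payload string) error {
	for _, line := range strings.Split(payload, "\n") {
		if _, err := fmt.Fprintf(w, "data: %s\n", line); err != nil {
			return err
		}
	}
	_, err := io.WriteString(w, "\n")
	return err
}

// frame renders the event into a string, which is convenient for tests.
func frame(payload string) string {
	var b strings.Builder
	_ = writeSSEData(&b, payload) // strings.Builder writes do not fail
	return b.String()
}

func main() {
	fmt.Print(frame("func main() {\n}"))
}
```

In the Gin handler, c.Writer satisfies io.Writer, so a helper like this could replace the Fprintf call, followed by the same Flush.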
Concurrent Tool Use
Claude can request multiple tools simultaneously. Go's concurrency model is a natural fit here — but the implementation details matter.
Tool Engine with Parallel Execution
// tools/engine.go
package tools

import (
	"context"
	"encoding/json"
	"fmt"
	"sync"

	anthropic "github.com/anthropics/anthropic-sdk-go"
	"golang.org/x/sync/errgroup"
)

type ToolFunc func(ctx context.Context, input json.RawMessage) (string, error)

type ToolEngine struct {
	tools map[string]ToolFunc
	defs  []anthropic.ToolParam
	mu    sync.RWMutex
}

func NewToolEngine() *ToolEngine {
	return &ToolEngine{
		tools: make(map[string]ToolFunc),
		defs:  make([]anthropic.ToolParam, 0),
	}
}

func (e *ToolEngine) Register(name, description string, inputSchema interface{}, fn ToolFunc) {
	e.mu.Lock()
	defer e.mu.Unlock()

	schemaBytes, _ := json.Marshal(inputSchema)
	e.tools[name] = fn
	e.defs = append(e.defs, anthropic.ToolParam{
		Name:        anthropic.F(name),
		Description: anthropic.F(description),
		InputSchema: anthropic.F(anthropic.ToolInputSchemaParam{
			Type:       anthropic.F(anthropic.ToolInputSchemaTypeObject),
			Properties: anthropic.Raw[interface{}](schemaBytes),
		}),
	})
}

func (e *ToolEngine) Definitions() []anthropic.ToolParam {
	e.mu.RLock()
	defer e.mu.RUnlock()
	return e.defs
}

// ExecuteParallel runs all tool calls concurrently using errgroup
func (e *ToolEngine) ExecuteParallel(ctx context.Context, calls []anthropic.ToolUseBlock) ([]anthropic.ToolResultBlockParam, error) {
	e.mu.RLock()
	defer e.mu.RUnlock()

	results := make([]anthropic.ToolResultBlockParam, len(calls))
	eg, ctx := errgroup.WithContext(ctx)

	for i, call := range calls {
		i, call := i, call // capture loop variables (required before Go 1.22)
		eg.Go(func() error {
			fn, ok := e.tools[call.Name]
			if !ok {
				// Return the error to Claude, don't fail the whole batch
				results[i] = anthropic.NewToolResultBlock(
					call.ID,
					fmt.Sprintf("tool '%s' not registered", call.Name),
					true,
				)
				return nil
			}

			output, err := fn(ctx, call.Input)
			if err != nil {
				results[i] = anthropic.NewToolResultBlock(call.ID, err.Error(), true)
				return nil
			}
			results[i] = anthropic.NewToolResultBlock(call.ID, output, false)
			return nil
		})
	}

	if err := eg.Wait(); err != nil {
		return nil, err
	}
	return results, nil
}
The key design decision: tool execution failures are returned to Claude via isError: true, not propagated as Go errors. This way, one failing tool doesn't abort the results from the others.
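To see the design decision in action without the SDK, here is a standard-library sketch of the same idea. It swaps errgroup for sync.WaitGroup and uses simplified toolCall and toolResult types in place of the SDK's blocks; all names are illustrative assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"sync"
)

// toolCall and toolResult are simplified stand-ins for the SDK's
// ToolUseBlock and ToolResultBlockParam types.
type toolCall struct {
	ID, Name, Input string
}

type toolResult struct {
	ID, Content string
	IsError     bool
}

type toolFunc func(input string) (string, error)

// executeAll runs every call in its own goroutine. A failure is recorded in
// the matching result slot instead of aborting the batch.
func executeAll(tools map[string]toolFunc, calls []toolCall) []toolResult {
	results := make([]toolResult, len(calls))
	var wg sync.WaitGroup
	for i, call := range calls {
		wg.Add(1)
		go func(i int, call toolCall) {
			defer wg.Done()
			res := toolResult{ID: call.ID}
			fn, ok := tools[call.Name]
			if !ok {
				res.Content, res.IsError = "tool not registered: "+call.Name, true
			} else if out, err := fn(call.Input); err != nil {
				res.Content, res.IsError = err.Error(), true
			} else {
				res.Content = out
			}
			results[i] = res // each goroutine writes only its own index
		}(i, call)
	}
	wg.Wait()
	return results
}

// demo runs one succeeding tool, one failing tool, and one unknown tool.
func demo() []toolResult {
	tools := map[string]toolFunc{
		"upper": func(in string) (string, error) { return strings.ToUpper(in), nil },
		"fail":  func(string) (string, error) { return "", errors.New("backend unavailable") },
	}
	return executeAll(tools, []toolCall{
		{ID: "1", Name: "upper", Input: "go"},
		{ID: "2", Name: "fail"},
		{ID: "3", Name: "missing"},
	})
}

func main() {
	for _, r := range demo() {
		fmt.Printf("%s error=%v %q\n", r.ID, r.IsError, r.Content)
	}
}
```

All three results come back, in call order, and the model can decide how to react to the two that failed.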
Complete Agent Loop
// agent/loop.go
package agent

import (
	"context"
	"fmt"

	anthropic "github.com/anthropics/anthropic-sdk-go"
	"your-module/tools"
)

func Run(ctx context.Context, client *anthropic.Client, engine *tools.ToolEngine, userMsg string) (string, error) {
	messages := []anthropic.MessageParam{
		anthropic.NewUserMessage(anthropic.NewTextBlock(userMsg)),
	}

	const maxIterations = 10 // Always set a ceiling: runaway loops are expensive

	for i := 0; i < maxIterations; i++ {
		resp, err := client.Messages.New(ctx, anthropic.MessageNewParams{
			Model:     anthropic.F(anthropic.ModelClaude_Sonnet_4_6),
			MaxTokens: anthropic.F(int64(4096)),
			Tools:     anthropic.F(engine.Definitions()),
			Messages:  anthropic.F(messages),
		})
		if err != nil {
			return "", fmt.Errorf("iteration %d: API error: %w", i+1, err)
		}

		messages = append(messages, anthropic.NewAssistantMessage(resp.Content...))

		if resp.StopReason == anthropic.StopReasonEndTurn {
			for _, block := range resp.Content {
				if block.Type == anthropic.ContentBlockTypeText {
					return block.Text, nil
				}
			}
			return "", nil
		}

		if resp.StopReason != anthropic.StopReasonToolUse {
			return "", fmt.Errorf("unexpected stop_reason: %s", resp.StopReason)
		}

		var toolCalls []anthropic.ToolUseBlock
		for _, block := range resp.Content {
			if block.Type == anthropic.ContentBlockTypeToolUse {
				toolCalls = append(toolCalls, block.AsToolUseBlock())
			}
		}

		toolResults, err := engine.ExecuteParallel(ctx, toolCalls)
		if err != nil {
			return "", fmt.Errorf("tool execution error: %w", err)
		}

		resultBlocks := make([]anthropic.ContentBlockParamUnion, len(toolResults))
		for j, r := range toolResults {
			resultBlocks[j] = r
		}
		messages = append(messages, anthropic.NewUserMessage(resultBlocks...))
	}

	return "", fmt.Errorf("reached max iterations (%d)", maxIterations)
}
Rate Limiting for Concurrent Services
When multiple users hit your Go service simultaneously, every request competes for Claude API quota. Here's a production-ready rate limiter using golang.org/x/time/rate.
// ratelimit/limiter.go
package ratelimit

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

type ClaudeLimiter struct {
	reqLimiter   *rate.Limiter
	tokenLimiter *rate.Limiter
}

// atLeastOne keeps a burst size usable; a burst of 0 rejects every request.
func atLeastOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// NewClaudeLimiter creates a dual limiter for request count and token budget.
// reqPerMin: max requests per minute (must be > 0), tokensPerMin: max tokens per minute
func NewClaudeLimiter(reqPerMin, tokensPerMin int) *ClaudeLimiter {
	return &ClaudeLimiter{
		reqLimiter:   rate.NewLimiter(rate.Every(time.Minute/time.Duration(reqPerMin)), atLeastOne(reqPerMin/10)),
		tokenLimiter: rate.NewLimiter(rate.Limit(float64(tokensPerMin))/60, atLeastOne(tokensPerMin/10)),
	}
}

// Wait blocks until the rate limits allow proceeding, or ctx is cancelled
func (l *ClaudeLimiter) Wait(ctx context.Context, estimatedTokens int) error {
	if err := l.reqLimiter.Wait(ctx); err != nil {
		return fmt.Errorf("request limit wait cancelled: %w", err)
	}
	// WaitN returns an error immediately when n exceeds the burst size,
	// so cap large estimates at the burst instead of failing the request.
	if burst := l.tokenLimiter.Burst(); estimatedTokens > burst {
		estimatedTokens = burst
	}
	if err := l.tokenLimiter.WaitN(ctx, estimatedTokens); err != nil {
		return fmt.Errorf("token limit wait cancelled: %w", err)
	}
	return nil
}
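Token quotas depend on the model and on your account's usage tier, so check the limits listed for your organization rather than relying on a fixed number; the configuration later in this guide assumes 40,000 tokens/minute. When estimating tokens for the limiter, a deliberately conservative heuristic is: input character count × 1.5. For English text this overestimates the input several times over, which leaves headroom for the expected output.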
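The character-based heuristic above fits in a few lines. This sketch assumes "character" means a Unicode code point rather than a byte, which matters for non-ASCII prompts; estimateTokens is a hypothetical helper name, and the result is a budgeting guess, not a real token count.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// estimateTokens applies the guide's heuristic: characters x 1.5, rounded up.
// It counts runes, not bytes, so multi-byte text is not over-counted.
func estimateTokens(prompt string) int {
	chars := utf8.RuneCountInString(prompt)
	return (chars*3 + 1) / 2 // integer form of ceil(chars * 1.5)
}

func main() {
	for _, p := range []string{"", "hello", "日本語"} {
		fmt.Printf("%q -> %d\n", p, estimateTokens(p))
	}
}
```

The value returned here is what you would pass as estimatedTokens to the limiter's Wait method.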
Common Pitfalls and Fixes
Pitfall 1: Context Timeout Too Short for Streaming
// BAD: 5s times out before streaming completes
func badTimeout(client *anthropic.Client, msg string) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	stream := client.Messages.NewStreaming(ctx, anthropic.MessageNewParams{
		Model:     anthropic.F(anthropic.ModelClaude_Sonnet_4_6),
		MaxTokens: anthropic.F(int64(2048)),
		Messages: anthropic.F([]anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock(msg)),
		}),
	})
	defer stream.Close()
	// Most non-trivial responses take longer than 5s to stream fully
}

// GOOD: Give streaming enough runway
func goodTimeout(client *anthropic.Client, msg string) {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Minute)
	defer cancel()

	stream := client.Messages.NewStreaming(ctx, anthropic.MessageNewParams{
		Model:     anthropic.F(anthropic.ModelClaude_Sonnet_4_6),
		MaxTokens: anthropic.F(int64(2048)),
		Messages: anthropic.F([]anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock(msg)),
		}),
	})
	defer stream.Close()
	// ...
}
Pitfall 2: Type Assertion Panics on Union Types
// BAD: Panics if the block is a ToolUseBlock, not a TextBlock
block := resp.Content[0]
text := block.AsTextBlock() // panic if block.Type != ContentBlockTypeText

// GOOD: Always branch on block.Type before converting
for _, block := range resp.Content {
	switch block.Type {
	case anthropic.ContentBlockTypeText:
		fmt.Println(block.Text)
	case anthropic.ContentBlockTypeToolUse:
		toolBlock := block.AsToolUseBlock()
		fmt.Printf("Tool: %s, ID: %s\n", toolBlock.Name, toolBlock.ID)
	default:
		// Handle future block types gracefully
		fmt.Printf("unknown block type: %s\n", block.Type)
	}
}
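Pitfall 3: Nginx Upstream Timeout Cutting Streaming Short
If nginx proxies the service, its proxy_read_timeout (60 seconds by default) closes the upstream connection whenever no data arrives within that window. Raise it for the streaming route so that a slow response is not cut off mid-stream.
Pitfall 4: Loop Variable Capture in Goroutines
// BAD: All goroutines reference the last value of i and call
for i, call := range calls {
	go func() {
		process(i, call) // wrong values after loop ends
	}()
}

// GOOD: Shadow the loop variables inside the loop body
for i, call := range calls {
	i, call := i, call
	go func() {
		process(i, call) // correct
	}()
}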
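Go 1.22 gives each loop iteration its own copy of the loop variables, but only for modules whose go.mod declares go 1.22 or later. Many production services still build with 1.21 or earlier, or with an older go directive, and keep the old behavior.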
Graceful Shutdown for Kubernetes
When Kubernetes rolls out a new Pod version, running Streaming requests need time to finish.
// main.go
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	router := setupRouter() // defined elsewhere in the service

	srv := &http.Server{
		Addr:         ":8080",
		Handler:      router,
		ReadTimeout:  30 * time.Second,
		WriteTimeout: 10 * time.Minute, // Long enough for streaming responses
		IdleTimeout:  120 * time.Second,
	}

	// Register for signals before serving so an early SIGTERM is not missed
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGINT, syscall.SIGTERM)

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatalf("server error: %v", err)
		}
	}()
	log.Println("Server started on :8080")

	<-quit
	log.Println("Shutting down: waiting for in-flight requests to complete")

	shutdownCtx, shutdownCancel := context.WithTimeout(context.Background(), 90*time.Second)
	defer shutdownCancel()

	if err := srv.Shutdown(shutdownCtx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
	log.Println("Shutdown complete")
}
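Set terminationGracePeriodSeconds in your Kubernetes Deployment to at least the 90-second shutdown timeout plus a small margin. The default is 30 seconds, after which the kubelet sends SIGKILL whether or not srv.Shutdown has returned.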
Tag integration tests separately so CI doesn't require an API key:
# Unit tests (no API key needed)
go test ./...

# Integration tests (requires ANTHROPIC_API_KEY)
go test -tags=integration ./...
Where to Go From Here
Start by adding the Anthropic SDK to an existing Go service and implementing ConversationSession. Once you have Streaming and Tool Use working together in an agent loop, the fundamentals are solid.
Go's channel and context model turns out to be a genuinely good fit for LLM streaming — the concurrency primitives that Go developers already know map naturally onto the async, token-by-token nature of Claude's responses. Once the patterns click, integrating Claude into a Go microservice feels surprisingly clean.
Echo Framework Integration
If you're using Echo instead of Gin, the streaming setup is slightly different. Echo's Response() writer implements http.Flusher directly.
The Echo version calls c.Response().Flush() instead of a separate flusher variable — Echo's response writer wraps http.Flusher under the hood, so no type assertion is needed.
Configuration Management
Production services need externalized configuration. Hardcoding model names or token limits is a maintenance problem — every change requires a recompile.
// config/config.go
package config

import (
	"fmt"
	"os"
	"strconv"
	"time"
)

type Config struct {
	APIKey         string
	Model          string
	MaxTokens      int
	RequestTimeout time.Duration
	MaxRetries     int
	ReqPerMin      int
	TokensPerMin   int
	Port           int
}

func Load() (*Config, error) {
	apiKey := os.Getenv("ANTHROPIC_API_KEY")
	if apiKey == "" {
		return nil, fmt.Errorf("ANTHROPIC_API_KEY is not set")
	}

	maxTokens, _ := strconv.Atoi(getEnv("CLAUDE_MAX_TOKENS", "2048"))
	timeoutSec, _ := strconv.Atoi(getEnv("CLAUDE_TIMEOUT_SEC", "300"))
	reqPerMin, _ := strconv.Atoi(getEnv("CLAUDE_REQ_PER_MIN", "50"))
	tokensPerMin, _ := strconv.Atoi(getEnv("CLAUDE_TOKENS_PER_MIN", "40000"))
	port, _ := strconv.Atoi(getEnv("PORT", "8080"))

	return &Config{
		APIKey:         apiKey,
		Model:          getEnv("CLAUDE_MODEL", "claude-sonnet-4-6"),
		MaxTokens:      maxTokens,
		RequestTimeout: time.Duration(timeoutSec) * time.Second,
		MaxRetries:     3,
		ReqPerMin:      reqPerMin,
		TokensPerMin:   tokensPerMin,
		Port:           port,
	}, nil
}

func getEnv(key, defaultVal string) string {
	if val := os.Getenv(key); val != "" {
		return val
	}
	return defaultVal
}
For Kubernetes deployments, these values map cleanly to ConfigMap and Secret resources:
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: claude-service-config
data:
  CLAUDE_MODEL: "claude-sonnet-4-6"
  CLAUDE_MAX_TOKENS: "2048"
  CLAUDE_TIMEOUT_SEC: "300"
  CLAUDE_REQ_PER_MIN: "50"
  PORT: "8080"
---
# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: claude-service-secrets
type: Opaque
stringData:
  ANTHROPIC_API_KEY: "your-api-key-here" # Use a secrets manager in production
Never put the API key in a ConfigMap. It belongs in a Secret, or better, pulled from a secrets manager like AWS Secrets Manager or HashiCorp Vault at startup.
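One weakness in the Load function is that it discards the strconv.Atoi errors, so a typo such as CLAUDE_MAX_TOKENS=2k silently becomes 0. A stricter variant might look like the sketch below. The envInt and fromMap helpers and the lookupFunc type are hypothetical names introduced for this example; os.LookupEnv is the real standard-library function they wrap.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// lookupFunc matches the signature of os.LookupEnv so tests can inject values.
type lookupFunc func(key string) (string, bool)

// fromMap builds a lookupFunc backed by a map, for tests and examples.
func fromMap(m map[string]string) lookupFunc {
	return func(key string) (string, bool) {
		v, ok := m[key]
		return v, ok
	}
}

// envInt returns def when the variable is unset or empty, and an error when
// it is set to something that is not an integer or is below minimum.
func envInt(lookup lookupFunc, key string, def, minimum int) (int, error) {
	raw, ok := lookup(key)
	if !ok || raw == "" {
		return def, nil
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		return 0, fmt.Errorf("%s=%q is not an integer: %w", key, raw, err)
	}
	if n < minimum {
		return 0, fmt.Errorf("%s=%d is below the minimum %d", key, n, minimum)
	}
	return n, nil
}

func main() {
	maxTokens, err := envInt(os.LookupEnv, "CLAUDE_MAX_TOKENS", 2048, 1)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("max tokens:", maxTokens)
}
```

Failing at startup on a bad value is usually preferable to discovering a zero token limit through rejected API calls.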
Prompt Caching Integration
If your service sends the same long system prompt on every request, prompt caching can cut costs and latency significantly. The Go SDK supports it via the cache_control field.
// cache/cached_client.go
package cache

import (
	"context"
	"fmt"

	anthropic "github.com/anthropics/anthropic-sdk-go"
)

// CachedConversation wraps a fixed system prompt with cache_control enabled
type CachedConversation struct {
	client     *anthropic.Client
	systemText string // Long, expensive system prompt to be cached
}

func NewCachedConversation(client *anthropic.Client, systemPrompt string) *CachedConversation {
	return &CachedConversation{
		client:     client,
		systemText: systemPrompt,
	}
}

func (c *CachedConversation) Ask(ctx context.Context, userMsg string) (string, error) {
	resp, err := c.client.Messages.New(ctx, anthropic.MessageNewParams{
		Model:     anthropic.F(anthropic.ModelClaude_Sonnet_4_6),
		MaxTokens: anthropic.F(int64(2048)),
		System: anthropic.F([]anthropic.TextBlockParam{
			{
				Type: anthropic.F(anthropic.TextBlockParamTypeText),
				Text: anthropic.F(c.systemText),
				CacheControl: anthropic.F(anthropic.CacheControlEphemeralParam{
					Type: anthropic.F(anthropic.CacheControlEphemeralTypeEphemeral),
				}),
			},
		}),
		Messages: anthropic.F([]anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock(userMsg)),
		}),
	})
	if err != nil {
		return "", fmt.Errorf("cached request failed: %w", err)
	}

	var result string
	for _, block := range resp.Content {
		if block.Type == anthropic.ContentBlockTypeText {
			result += block.Text
		}
	}

	// Log cache hits for monitoring
	if resp.Usage.CacheReadInputTokens > 0 {
		fmt.Printf("Cache hit: %d tokens served from cache\n", resp.Usage.CacheReadInputTokens)
	}
	return result, nil
}
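Prompt caching only applies once the cached prefix reaches the model's minimum cacheable length (1,024 tokens on many models, so check the figure for the model you use). For a document-analysis service where you prepend a large reference document to every request, the savings can be substantial: Anthropic cites reductions of up to 90% in cost and up to 85% in latency for long prompts.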
Observability: Logging and Metrics
A production service needs visibility into API usage, error rates, and latency. Here's a minimal but useful instrumentation setup using structured logging and Prometheus metrics.
// observability/metrics.go
package observability

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	APIRequestsTotal = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "claude_api_requests_total",
			Help: "Total number of Claude API requests",
		},
		[]string{"model", "status"}, // status: "success" | "error" | "rate_limited"
	)

	APIRequestDuration = promauto.NewHistogramVec(
		prometheus.HistogramOpts{
			Name:    "claude_api_request_duration_seconds",
			Help:    "Claude API request duration in seconds",
			Buckets: []float64{0.5, 1.0, 2.0, 5.0, 10.0, 30.0, 60.0},
		},
		[]string{"model"},
	)

	TokensUsed = promauto.NewCounterVec(
		prometheus.CounterOpts{
			Name: "claude_tokens_used_total",
			Help: "Total tokens consumed",
		},
		[]string{"model", "type"}, // type: "input" | "output" | "cache_read"
	)
)

// RecordRequest wraps an API call with metric collection
func RecordRequest(model string, fn func() error) error {
	start := time.Now()
	err := fn()
	duration := time.Since(start).Seconds()

	status := "success"
	if err != nil {
		status = "error"
	}

	APIRequestsTotal.WithLabelValues(model, status).Inc()
	APIRequestDuration.WithLabelValues(model).Observe(duration)
	return err
}
Expose the /metrics endpoint and scrape it with Prometheus. The token usage metric is particularly useful for cost forecasting and quota planning.
Dockerfile for Production
# Build stage
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o claude-service ./cmd/server

# Runtime stage: minimal image
FROM alpine:3.19
RUN apk --no-cache add ca-certificates tzdata

# Never run as root in production
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Keep the binary outside /root, which the non-root user cannot read
WORKDIR /app
COPY --from=builder /app/claude-service .
USER appuser

EXPOSE 8080
CMD ["./claude-service"]
The multi-stage build keeps the final image small (typically under 20MB). Running as a non-root user is a Kubernetes security requirement in many organizations.
Putting It All Together
The patterns in this guide work well independently, but they're designed to compose. A production Claude service in Go typically looks like this:
config.Load() reads all settings from environment variables at startup
NewClaudeLimiter() enforces rate limits before every API call
ToolEngine registers domain-specific functions once at startup
ConversationSession holds per-request state
Gin or Echo routes expose HTTP/SSE endpoints
Metrics middleware wraps every API call for Prometheus scraping
Graceful shutdown gives in-flight streaming requests 90 seconds to complete
The best starting point is ConversationSession. Add it to an existing service, confirm that conversations work correctly, then layer in streaming, then Tool Use. Trying to add everything at once makes debugging much harder.
Once the integration is stable, the OpenTelemetry observability guide covers distributed tracing across multiple services — useful when your Claude service is one component in a larger system.
Go's concurrency model is, somewhat surprisingly, a natural fit for LLM APIs. The token-by-token stream maps cleanly to a channel, context cancellation propagates cleanly across goroutines, and errgroup handles parallel tool execution without much ceremony. The rough edges are real, but once you've hit each pitfall once, the patterns become second nature.
Exponential Backoff and Retry Logic
Rate limit errors (HTTP 429) and transient server errors (HTTP 529) happen in production. Rather than letting them surface as user-facing failures, implement retry logic with exponential backoff.
// retry/retry.go
package retry

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"time"

	anthropic "github.com/anthropics/anthropic-sdk-go"
)

// isRetryable returns true for errors that warrant a retry
func isRetryable(err error) bool {
	var apiErr *anthropic.Error
	if errors.As(err, &apiErr) {
		switch apiErr.StatusCode {
		case http.StatusTooManyRequests, // 429: rate limited
			http.StatusServiceUnavailable, // 503: temporary outage
			529: // Claude-specific overload code
			return true
		}
	}
	return false
}

// WithRetry wraps an API call with retries and exponential backoff
func WithRetry(ctx context.Context, maxAttempts int, fn func() (*anthropic.Message, error)) (*anthropic.Message, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		msg, err := fn()
		if err == nil {
			return msg, nil
		}
		lastErr = err

		if !isRetryable(err) {
			return nil, err // Non-retryable: fail immediately
		}
		if attempt == maxAttempts-1 {
			break // Last attempt failed
		}

		// Exponential backoff: 1s, 2s, 4s, ...
		backoff := time.Duration(1<<uint(attempt)) * time.Second
		select {
		case <-time.After(backoff):
		case <-ctx.Done():
			return nil, fmt.Errorf("retry cancelled: %w", ctx.Err())
		}
	}
	return nil, fmt.Errorf("all %d attempts failed, last error: %w", maxAttempts, lastErr)
}
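Three retry attempts with exponential backoff handle the vast majority of transient failures without adding significant latency for successful calls. Keep in mind that the SDK client also retries some failures on its own by default (configurable with option.WithMaxRetries), so the two layers multiply unless you lower one of them.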
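Two refinements are common once many instances retry at the same moment: cap the delay, and randomize it so clients do not retry in lockstep. The sketch below shows a capped exponential schedule with "full jitter" (a uniform pick between zero and the cap for that attempt). The function names and the base and limit constants are assumptions for illustration, not part of the retry package above.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	base  = time.Second      // delay ceiling for the first retry
	limit = 30 * time.Second // upper bound for any single delay
)

// backoffCeiling returns min(start * 2^attempt, ceiling) without overflowing.
func backoffCeiling(attempt int, start, ceiling time.Duration) time.Duration {
	d := start
	for i := 0; i < attempt; i++ {
		if d > ceiling/2 {
			return ceiling // doubling again would exceed the ceiling
		}
		d *= 2
	}
	if d > ceiling {
		return ceiling
	}
	return d
}

// jittered picks a uniformly random delay in [0, ceiling].
func jittered(ceiling time.Duration) time.Duration {
	if ceiling <= 0 {
		return 0
	}
	return time.Duration(rand.Int63n(int64(ceiling) + 1))
}

func main() {
	for attempt := 0; attempt < 7; attempt++ {
		c := backoffCeiling(attempt, base, limit)
		fmt.Printf("attempt %d: ceiling %v, sleeping %v\n", attempt, c, jittered(c))
	}
}
```

To use it in WithRetry, the fixed backoff value would be replaced by jittered(backoffCeiling(attempt, base, limit)).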
Structured Output Parsing
When you need Claude to return structured data rather than free-form text, combining a JSON schema instruction in the system prompt with Go's encoding/json gives reliable results.
// structured/parser.go
package structured

import (
	"context"
	"encoding/json"
	"fmt"

	anthropic "github.com/anthropics/anthropic-sdk-go"
)

// ExtractJSON sends a prompt instructing Claude to return valid JSON,
// then unmarshals the response into the target struct
func ExtractJSON[T any](ctx context.Context, client *anthropic.Client, userPrompt string) (*T, error) {
	systemPrompt := `You are a data extraction assistant. Always respond with valid JSON only.
Do not include markdown code fences, explanations, or any text outside the JSON object.`

	resp, err := client.Messages.New(ctx, anthropic.MessageNewParams{
		Model:     anthropic.F(anthropic.ModelClaude_Sonnet_4_6),
		MaxTokens: anthropic.F(int64(1024)),
		System: anthropic.F([]anthropic.TextBlockParam{
			anthropic.NewTextBlock(systemPrompt),
		}),
		Messages: anthropic.F([]anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock(userPrompt)),
		}),
	})
	if err != nil {
		return nil, fmt.Errorf("API call failed: %w", err)
	}

	var rawJSON string
	for _, block := range resp.Content {
		if block.Type == anthropic.ContentBlockTypeText {
			rawJSON = block.Text
			break
		}
	}

	var result T
	if err := json.Unmarshal([]byte(rawJSON), &result); err != nil {
		return nil, fmt.Errorf("JSON unmarshal failed (response: %q): %w", rawJSON, err)
	}
	return &result, nil
}

// Example usage:
type ProductReview struct {
	Sentiment string   `json:"sentiment"` // "positive" | "neutral" | "negative"
	Score     int      `json:"score"`     // 1-10
	Keywords  []string `json:"keywords"`
	Summary   string   `json:"summary"`
}

func AnalyzeReview(ctx context.Context, client *anthropic.Client, reviewText string) (*ProductReview, error) {
	prompt := fmt.Sprintf(`Analyze the following product review and extract structured data:

%s

Return a JSON object with these fields:
- sentiment: "positive", "neutral", or "negative"
- score: integer 1-10 (10 = most positive)
- keywords: array of key terms from the review
- summary: one sentence summary`, reviewText)

	return ExtractJSON[ProductReview](ctx, client, prompt)
}
The generic ExtractJSON[T] function works for any struct that can be represented as JSON. In practice, I add a validation step after unmarshaling to check required fields — Claude occasionally returns partial JSON under heavy load or when the schema is ambiguous.
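The validation step mentioned above might look like the following. It is a sketch only: parseReview is a hypothetical helper, and the rules (allowed sentiments, score range, non-empty summary) are taken from the field comments on ProductReview rather than from any SDK feature.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// ProductReview mirrors the struct used in the example above.
type ProductReview struct {
	Sentiment string   `json:"sentiment"`
	Score     int      `json:"score"`
	Keywords  []string `json:"keywords"`
	Summary   string   `json:"summary"`
}

// parseReview unmarshals the model's reply and rejects values that are
// syntactically valid JSON but violate the expected schema.
func parseReview(raw string) (*ProductReview, error) {
	var r ProductReview
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		return nil, fmt.Errorf("invalid JSON: %w", err)
	}
	switch r.Sentiment {
	case "positive", "neutral", "negative":
	default:
		return nil, fmt.Errorf("unexpected sentiment %q", r.Sentiment)
	}
	if r.Score < 1 || r.Score > 10 {
		return nil, fmt.Errorf("score %d is outside 1-10", r.Score)
	}
	if strings.TrimSpace(r.Summary) == "" {
		return nil, fmt.Errorf("summary is empty")
	}
	return &r, nil
}

func main() {
	good := `{"sentiment":"positive","score":8,"keywords":["fast"],"summary":"Works well."}`
	partial := `{"sentiment":"positive","keywords":["fast"]}`

	r, err := parseReview(good)
	fmt.Println(r != nil, err)

	_, err = parseReview(partial)
	fmt.Println(err)
}
```

A missing score unmarshals to 0 without any error from encoding/json, which is why the range check matters; a failed validation is also a reasonable trigger for a single retry.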
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.