Building Enterprise AI Backends with Claude API and NestJS: Production
A complete production guide to integrating Claude API into NestJS using dependency injection, TypeORM, SSE streaming, JWT auth, and Bull queues—with working code you can deploy today.
There's a specific moment in every backend codebase when things start going wrong. The Express /chat endpoint that started at 50 lines gradually absorbs auth logic, conversation history management, streaming, and rate limiting until it becomes a 1,000-line file that nobody wants to touch. New team members don't know where to add things. Tests are hard to write because dependencies are implicit. Everyone works around the problem instead of through it.
NestJS was designed to prevent exactly this. Its dependency injection system, module boundaries, and decorator-based patterns force the kind of structure that makes large codebases navigable. When you integrate Claude API into this structure, you end up with code that's easier to test, easier to hand off, and easier to extend.
This guide walks through building a production-grade Claude API backend with NestJS from first principles—covering every layer from the DI container to Docker Compose deployment.
Why NestJS Over Express or Hono
Hono and Express remain excellent choices for lightweight APIs, edge workers, and rapid prototypes. The case for NestJS is more specific: it pays off when teams grow and codebases need to be maintained long-term.
The cross-cutting concern problem: In Express, where to put middleware for auth, logging, and validation is an implicit convention that new team members have to learn by reading existing code. NestJS Guards, Interceptors, and Pipes have explicit, documented roles. Code review conversations shift from "where does this go?" to "is this the right implementation?"
Claude API client instance management: Calling new Anthropic() in multiple files means configuration changes have to be made in multiple places and mocking in tests becomes difficult. Registering the client in NestJS's DI container means every service gets the same configured instance, and tests can swap it out with a single provider override.
Extensibility: Adding a Bull queue, WebSocket gateway, or gRPC service to a NestJS app means creating a new module. The existing code doesn't change. In an unstructured Express app, the same additions often require refactoring existing files.
A practical decision framework: choose NestJS when your team is 5 or more people, when you need testable code, and when the service will be maintained for more than a year. For edge deployments, single-purpose microservices, or prototypes, Hono or Express remains the right call.
The key architectural decision here is the direction of dependencies: ai/ depends on conversation/, but not the reverse. Claude API call logic is contained in ai.service.ts. When you switch models or providers in the future, the blast radius is limited to that one service.
WHAT YOU'LL LEARN
✦If you've been managing Claude API integrations in Express and finding them increasingly hard to maintain as your team grows, you'll come away with a NestJS module structure that makes ownership and testing obvious from day one
✦You'll get copy-paste-ready implementations for TypeORM conversation history, SSE streaming with proper disconnect handling, and Bull queue-based async processing—all wired together and explained
✦You'll leave with a deployable Docker Compose stack including JWT auth, throttling, and health checks that you can adapt directly to your own product
Registering the Anthropic SDK in the DI Container
The most important architectural decision in any NestJS + Claude API integration is how the SDK client is provided. Creating it in individual services makes the app impossible to test and hard to configure consistently.
// src/ai/ai.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import Anthropic from '@anthropic-ai/sdk';
import { AiService } from './ai.service';
import { AiController } from './ai.controller';

export const ANTHROPIC_CLIENT = 'ANTHROPIC_CLIENT';

@Module({
  imports: [ConfigModule],
  providers: [
    {
      provide: ANTHROPIC_CLIENT,
      inject: [ConfigService],
      useFactory: (config: ConfigService) => {
        const apiKey = config.get<string>('anthropic.apiKey');
        if (!apiKey) {
          // Fail fast at startup: catch configuration mistakes before production
          throw new Error('ANTHROPIC_API_KEY is not configured');
        }
        return new Anthropic({ apiKey });
      },
    },
    AiService,
  ],
  controllers: [AiController],
  exports: [ANTHROPIC_CLIENT, AiService],
})
export class AiModule {}
The ANTHROPIC_CLIENT string token is the key to testability. In unit tests, you can override this provider with a mock and all services that inject it will receive the mock automatically.
Returning Observable<StreamChunk> rather than an AsyncGenerator makes the SSE controller simpler—NestJS's @Sse() decorator works directly with Observables and handles the SSE protocol framing automatically.
TypeORM Conversation History Persistence
Persisting conversation history to PostgreSQL enables cross-session continuity and per-user cost tracking—both requirements in any production AI service.
NestJS's @Sse() decorator, combined with an Observable, handles SSE protocol framing automatically. The controller's job is to map StreamChunk objects to the MessageEvent format NestJS expects.
Combining @Sse() with @Post() may look unusual—SSE conventionally uses GET—but NestJS supports POST for SSE when you need to send a request body. This is the right pattern for passing message content and conversation IDs.
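To make the mapping step concrete, here is a minimal, framework-free sketch. The StreamChunk fields (delta, done, error) are an assumption inferred from what the client reads, SseMessageEvent is a local stand-in that mirrors the fields of NestJS's MessageEvent interface, and frameEvent only approximates the wire format; it is not NestJS's actual serializer.

```typescript
// Hypothetical chunk shape, inferred from the fields the client reads
interface StreamChunk {
  delta?: string;
  done?: boolean;
  error?: string;
}

// Local stand-in mirroring the fields of NestJS's MessageEvent interface
interface SseMessageEvent {
  data: string | object;
  id?: string;
  type?: string;
  retry?: number;
}

// Map one service-level chunk to the event object the SSE layer serializes
function toMessageEvent(chunk: StreamChunk, seq: number): SseMessageEvent {
  return { id: String(seq), data: chunk };
}

// Simplified view of what one event looks like on the wire
function frameEvent(event: SseMessageEvent): string {
  const payload =
    typeof event.data === 'string' ? event.data : JSON.stringify(event.data);
  const idLine = event.id !== undefined ? `id: ${event.id}\n` : '';
  // A blank line terminates each SSE event
  return `${idLine}data: ${payload}\n\n`;
}
```

In a controller, the same mapping would typically sit inside an RxJS map() over the service's Observable, so the controller stays a thin adapter.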
On the client side, connect using the Fetch API's streaming support:
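async function streamChat(message, token) {
  const response = await fetch('/ai/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`,
    },
    body: JSON.stringify({ message }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Stream request failed: ${response.status}`);
  }

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // Holds a partial line until the rest of it arrives

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // The last element may be an incomplete line

    for (const line of lines) {
      if (!line.startsWith('data: ')) continue;
      const data = JSON.parse(line.slice(6));
      // process.stdout is Node.js only; in a browser, append to the DOM instead
      if (data.delta) process.stdout.write(data.delta);
      if (data.done) console.log('\n[complete]');
      if (data.error) console.error('Error:', data.error);
    }
  }
}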
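Network chunks do not have to line up with SSE line boundaries, so a single data: line can arrive split across two reads. The line-buffering step is the part most worth unit testing, and it can be isolated as a small pure function. This is a sketch: createSseParser is a hypothetical helper name, and it assumes \n line endings and one JSON payload per data: line.

```typescript
// Hypothetical helper: feed it raw text chunks, get back parsed `data:` payloads
function createSseParser(): (chunk: string) => unknown[] {
  let buffer = '';

  return (chunk: string): unknown[] => {
    buffer += chunk;
    const lines = buffer.split('\n');
    // The final element is either '' or an incomplete line; keep it for later
    buffer = lines.pop() ?? '';

    const events: unknown[] = [];
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        events.push(JSON.parse(line.slice(6)));
      }
    }
    return events;
  };
}
```

Feeding it half a line returns nothing; the event is emitted only once the rest of the line arrives in a later chunk.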
Batch processing, long document summarization, and analysis tasks that take more than a few seconds should run through a job queue rather than blocking HTTP connections.
The exponential backoff with a 2-second base delay handles Claude API rate limit errors (429) gracefully without hammering the API during periods of high load.
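To show what a 2-second exponential schedule looks like in numbers, here is a small sketch of one common doubling formula. The helper names, the 60-second cap, and the choice to also retry 5xx responses are assumptions for illustration; a queue library's built-in exponential strategy may compute its delays slightly differently.

```typescript
// Hypothetical helper: delay before retry number `attempt` (1-based)
function backoffDelayMs(
  attempt: number,
  baseMs = 2000,
  capMs = 60_000,
): number {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError('attempt must be a positive integer');
  }
  // Double the delay on every attempt, but never exceed the cap
  return Math.min(baseMs * 2 ** (attempt - 1), capMs);
}

// Assumption: rate limits (429) and server errors (5xx) are worth retrying;
// other 4xx responses usually indicate a request that will fail again
function isRetryable(httpStatus: number): boolean {
  return httpStatus === 429 || httpStatus >= 500;
}
```

With these defaults the first three retries wait 2, 4, and 8 seconds, which gives a rate limit window time to reset before the next attempt.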
Five Common Pitfalls and How to Avoid Them
These are issues that surface in real production deployments, not edge cases.
1. DI Scope and Anthropic Client Lifetime
Providing the ANTHROPIC_CLIENT with REQUEST scope creates a new SDK instance per request, exhausting connection pools under load. Keep it as a singleton (the default).
// Wrong: creates a new instance per request
{ provide: ANTHROPIC_CLIENT, scope: Scope.REQUEST, useFactory: ... }

// Correct: one instance for the application lifetime
{ provide: ANTHROPIC_CLIENT, useFactory: ... }
2. Subject Memory Leaks on Client Disconnect
When a client disconnects mid-stream, the Subject won't complete unless you handle the disconnect explicitly. The controller implementation above handles this via socket.on('close'). Without this, each abandoned request leaves a Subject open until the stream finishes.
3. TypeORM Connection Pool Exhaustion
The default connection pool limit is 10. With Bull workers processing jobs in parallel, this limit is easily exceeded. Increase it in your TypeORM configuration:
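As a rough sketch of the relevant options, assuming the PostgreSQL driver: TypeORM passes the extra object through to node-postgres, whose max option caps the pool size. The concurrency figures below are placeholders, not recommendations; size the pool from your own worker concurrency and HTTP load, and keep the total across all replicas under the database's connection limit.

```typescript
// Placeholder numbers: adjust to your own worker and HTTP concurrency
const workerConcurrency = 8;
const httpHeadroom = 12;

// Pool-related part of a TypeORM options object (other fields omitted)
const typeOrmPoolOptions = {
  type: 'postgres' as const,
  // `extra` is handed to the underlying node-postgres pool
  extra: {
    // Upper bound on simultaneously open connections for this process
    max: workerConcurrency + httpHeadroom,
    // Close connections that have been idle for 30 seconds
    idleTimeoutMillis: 30_000,
  },
};
```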
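4. Keep-Alive Timeouts on Long-Running Streams
NestJS inherits Node.js's default keepAliveTimeout of 5 seconds. When a load balancer or reverse proxy in front of the app uses a longer idle timeout, that mismatch can drop connections and interrupt long-running streams. Raise both timeouts in main.ts: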
const app = await NestFactory.create(AppModule);
const server = app.getHttpServer();
server.keepAliveTimeout = 120 * 1000;
server.headersTimeout = 125 * 1000; // Must be slightly longer than keepAliveTimeout
await app.listen(3000);
5. ConfigService Returning Undefined Silently
config.get<string>('anthropic.apiKey') will return undefined if the env var isn't set, but TypeScript won't catch this at compile time. Use getOrThrow (NestJS 9.4+) or the explicit startup check in useFactory:
// Safe: throws at startup if the key is missing
const apiKey = this.config.getOrThrow<string>('anthropic.apiKey');
The condition: service_healthy in depends_on prevents the race condition where the API container starts before PostgreSQL is ready to accept connections. Without this, deployments frequently fail with transient DB connection errors on startup.
Implement the health endpoint using @nestjs/terminus:
// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck,
  HealthCheckService,
  TypeOrmHealthIndicator,
  MicroserviceHealthIndicator,
} from '@nestjs/terminus';

@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private db: TypeOrmHealthIndicator,
  ) {}

  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      () => this.db.pingCheck('database'),
    ]);
  }
}
Cost Monitoring and Observability
Storing token counts on Message entities enables per-user cost attribution and budget alerts. Add an Interceptor to log latency and token usage across all AI endpoints:
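For the cost-attribution half of this, a minimal sketch of rolling stored token counts up into a per-user cost. The UsageRow and Pricing shapes are hypothetical, and the rates are supplied by the caller on purpose: the figures used in any example call are placeholders, not real prices.

```typescript
// Hypothetical row shape: one entry per stored assistant message
interface UsageRow {
  userId: string;
  inputTokens: number;
  outputTokens: number;
}

// Caller-supplied rates in USD per million tokens (no real prices assumed)
interface Pricing {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Sum the estimated cost of every row, grouped by user
function costByUser(rows: UsageRow[], pricing: Pricing): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    const cost =
      (row.inputTokens / 1_000_000) * pricing.inputPerMTok +
      (row.outputTokens / 1_000_000) * pricing.outputPerMTok;
    totals.set(row.userId, (totals.get(row.userId) ?? 0) + cost);
  }
  return totals;
}
```

In production the grouping would more likely be a SQL aggregate over the messages table; the in-memory version is just easier to test and reason about.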
For prompt caching strategies that can reduce costs by 50-80% when system prompts are static, see Claude API cost optimization patterns. Combining caching with NestJS's singleton DI pattern for the Anthropic client gives you optimal performance with minimal configuration.
The immediate next step after deploying this stack is to hook the /health endpoint into an uptime monitor. That single change cuts incident response time significantly—you'll know something is broken before a user reports it, and you'll know whether the issue is in the app, the database, or Redis.
NestJS's conventions feel like overhead at first. The payoff comes when a new team member joins and can navigate the codebase without a guided tour, when a test needs to mock the Claude API and it's a three-line provider override, and when adding a WebSocket gateway means creating a new module rather than restructuring existing files. That's the trade-off this architecture is making—and for production AI services maintained by growing teams, it's consistently the right one.
Unit Testing with DI Mocking
The DI architecture pays its most obvious dividend in tests. Swapping out the real Anthropic client for a mock is a single provider override:
// src/ai/ai.service.spec.ts
import { Test, TestingModule } from '@nestjs/testing';
import { ConfigService } from '@nestjs/config';
import Anthropic from '@anthropic-ai/sdk';
import { AiService } from './ai.service';
import { ANTHROPIC_CLIENT } from './ai.module';

const mockAnthropicClient = {
  messages: {
    create: jest.fn(),
    stream: jest.fn(),
  },
};

describe('AiService', () => {
  let service: AiService;

  beforeEach(async () => {
    const module: TestingModule = await Test.createTestingModule({
      providers: [
        AiService,
        {
          provide: ANTHROPIC_CLIENT,
          // The real client is replaced entirely; no HTTP calls in tests
          useValue: mockAnthropicClient,
        },
        {
          provide: ConfigService,
          useValue: {
            get: (key: string) => {
              const config: Record<string, string | number> = {
                'anthropic.defaultModel': 'claude-sonnet-4-6',
                'anthropic.maxTokens': 1024,
              };
              return config[key];
            },
          },
        },
      ],
    }).compile();

    service = module.get<AiService>(AiService);
  });

  afterEach(() => {
    jest.clearAllMocks();
  });

  it('should return text content from Claude API', async () => {
    mockAnthropicClient.messages.create.mockResolvedValueOnce({
      content: [{ type: 'text', text: 'Hello from Claude' }],
    });

    const result = await service.chat([
      { role: 'user', content: 'Hello' },
    ]);

    expect(result).toBe('Hello from Claude');
    expect(mockAnthropicClient.messages.create).toHaveBeenCalledWith(
      expect.objectContaining({
        model: 'claude-sonnet-4-6',
        max_tokens: 1024,
        messages: [{ role: 'user', content: 'Hello' }],
      }),
    );
  });

  it('should propagate API errors', async () => {
    mockAnthropicClient.messages.create.mockRejectedValueOnce(
      new Error('Rate limit exceeded'),
    );

    await expect(
      service.chat([{ role: 'user', content: 'Hello' }]),
    ).rejects.toThrow('Rate limit exceeded');
  });
});
This test pattern works because the ANTHROPIC_CLIENT token acts as a seam—real behavior in production, mock behavior in tests, with zero changes to AiService itself. The same pattern applies to ConversationService: swap the TypeORM repositories for in-memory mocks in the test module and all service logic can be tested without a database connection.
Integration Testing with a Real Database
For integration tests, use a test PostgreSQL instance and TypeORM's synchronize: true to create the schema fresh for each test run:
// src/conversation/conversation.service.integration-spec.ts
import { Test, TestingModule } from '@nestjs/testing';
import { TypeOrmModule } from '@nestjs/typeorm';
import { ConversationService } from './conversation.service';
import { Conversation } from './entities/conversation.entity';
import { Message } from './entities/message.entity';

describe('ConversationService (integration)', () => {
  let module: TestingModule;
  let service: ConversationService;

  beforeAll(async () => {
    module = await Test.createTestingModule({
      imports: [
        TypeOrmModule.forRoot({
          type: 'postgres',
          url: process.env.TEST_DATABASE_URL,
          entities: [Conversation, Message],
          synchronize: true, // OK in test; never in production
          dropSchema: true, // Start fresh each run
        }),
        TypeOrmModule.forFeature([Conversation, Message]),
      ],
      providers: [ConversationService],
    }).compile();

    service = module.get<ConversationService>(ConversationService);
  });

  afterAll(async () => {
    // Close the database connection so the test runner can exit cleanly
    await module.close();
  });

  it('should create and retrieve a conversation', async () => {
    const conv = await service.createConversation('user-123', 'You are helpful.');
    expect(conv.id).toBeDefined();

    await service.appendMessage(conv.id, 'user', 'Hello');
    await service.appendMessage(conv.id, 'assistant', 'Hi there!');

    const retrieved = await service.getConversationWithMessages(conv.id, 'user-123');
    expect(retrieved.messages).toHaveLength(2);
    expect(retrieved.messages[0].role).toBe('user');
  });
});
Monitoring and Alerting in Production
Structured logging from the AiLoggingInterceptor becomes useful for anomaly detection. Pipe logs to a centralized service (Datadog, Grafana Loki, or CloudWatch) and set up alerts for:
P99 latency exceeding 10 seconds on the /ai/stream endpoint
Error rate exceeding 1% over a 5-minute window
Monthly token usage crossing 80% of your budget threshold
The getMonthlyTokenUsage method on ConversationService gives you the data for that last alert. A simple cron job that checks usage daily and sends a notification when the threshold is crossed is enough for early-stage products:
// src/billing/billing.scheduler.ts
import { Injectable } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';
import { ConversationService } from '../conversation/conversation.service';

@Injectable()
export class BillingScheduler {
  constructor(private conversationService: ConversationService) {}

  @Cron(CronExpression.EVERY_DAY_AT_9AM)
  async checkBudgetUsage() {
    // Aggregate across all users and compare against monthly budget
    // Send alert if projected monthly spend exceeds threshold
    // Implementation depends on your notification infrastructure
  }
}
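The comparison step inside that job could look something like the sketch below. It assumes a simple linear projection from month-to-date usage and reuses the 80% alert threshold mentioned earlier; the function names and the three-level status are illustrative, not part of the original scheduler.

```typescript
type BudgetStatus = 'ok' | 'warn' | 'over';

// Linear projection: assume the rest of the month continues at the same daily rate
function projectMonthlyTokens(
  usedSoFar: number,
  dayOfMonth: number,
  daysInMonth: number,
): number {
  if (dayOfMonth < 1 || dayOfMonth > daysInMonth) {
    throw new RangeError('dayOfMonth out of range');
  }
  return (usedSoFar / dayOfMonth) * daysInMonth;
}

// Classify current usage against the monthly budget
function budgetStatus(
  usedSoFar: number,
  monthlyBudget: number,
  dayOfMonth: number,
  daysInMonth: number,
  warnAt = 0.8,
): BudgetStatus {
  if (usedSoFar >= monthlyBudget) return 'over';
  const projected = projectMonthlyTokens(usedSoFar, dayOfMonth, daysInMonth);
  // Warn when the threshold is already crossed or the projection overshoots
  if (usedSoFar >= warnAt * monthlyBudget || projected > monthlyBudget) {
    return 'warn';
  }
  return 'ok';
}
```

Keeping this logic in pure functions means the cron handler only has to fetch the usage figure and send the notification, and the thresholds can be unit tested without a scheduler or database.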
Deployment Options Beyond Docker Compose
Docker Compose works well for self-hosted deployments on a single VPS. For managed deployments, the same containerized app deploys directly to Railway, Fly.io, or Render with minimal configuration changes.
Railway in particular works well with this stack: it can provision a PostgreSQL database and Redis instance alongside your app container, inject their connection strings as environment variables, and run health checks automatically. The Procfile for Railway is simply:
Separate the Bull queue processor into its own process (queue-worker.ts) so it can scale independently from the HTTP server. Under load, you might run 2 HTTP replicas and 4 worker replicas without changing any application code.
When This Architecture Is (and Isn't) Worth It
This setup—NestJS, TypeORM, Bull, JWT—has real upfront cost. You're writing more files and more configuration than you would in Express.
The architecture earns its keep when: the service handles money or user data (the auth and validation patterns matter), the team has more than 2 people working on the backend simultaneously, or the service is expected to last more than 6 months.
It's overkill when: you're building a prototype to validate an idea, the service will be replaced when you find product-market fit, or you're the only engineer and don't need the conventions.
For teams in the first category, the patterns in this guide—DI-based client registration, TypeORM entities for conversation history, Observable-based SSE streaming, and Bull for async jobs—represent a production-ready baseline. Add the monitoring, health checks, and test coverage, and you have an AI backend that can scale with your product.