API & SDK · 2026-04-13 · Advanced

Building Enterprise AI Backends with Claude API and NestJS: Production

A complete production guide to integrating Claude API into NestJS using dependency injection, TypeORM, SSE streaming, JWT auth, and Bull queues—with working code you can deploy today.


There's a specific moment in every backend codebase when things start going wrong. The Express /chat endpoint that started at 50 lines gradually absorbs auth logic, conversation history management, streaming, and rate limiting until it becomes a 1,000-line file that nobody wants to touch. New team members don't know where to add things. Tests are hard to write because dependencies are implicit. Everyone works around the problem instead of through it.

NestJS was designed to prevent exactly this. Its dependency injection system, module boundaries, and decorator-based patterns force the kind of structure that makes large codebases navigable. When you integrate Claude API into this structure, you end up with code that's easier to test, easier to hand off, and easier to extend.

This guide walks through building a production-grade Claude API backend with NestJS from first principles—covering every layer from the DI container to Docker Compose deployment.

Why NestJS Over Express or Hono

Hono and Express remain excellent choices for lightweight APIs, edge workers, and rapid prototypes. The case for NestJS is more specific: it pays off when teams grow and codebases need to be maintained long-term.

The cross-cutting concern problem: In Express, where to put middleware for auth, logging, and validation is an implicit convention that new team members have to learn by reading existing code. NestJS Guards, Interceptors, and Pipes have explicit, documented roles. Code review conversations shift from "where does this go?" to "is this the right implementation?"

Claude API client instance management: Calling new Anthropic() in multiple files means configuration changes have to be made in multiple places and mocking in tests becomes difficult. Registering the client in NestJS's DI container means every service gets the same configured instance, and tests can swap it out with a single provider override.

Extensibility: Adding a Bull queue, WebSocket gateway, or gRPC service to a NestJS app means creating a new module. The existing code doesn't change. In an unstructured Express app, the same additions often require refactoring existing files.

A practical decision framework: choose NestJS when your team is 5 or more people, when you need testable code, and when the service will be maintained for more than a year. For edge deployments, single-purpose microservices, or prototypes, Hono or Express remains the right call.

Project Architecture: Domain-Oriented Module Structure

src/
├── app.module.ts
├── main.ts
├── config/
│   └── anthropic.config.ts
├── ai/
│   ├── ai.module.ts
│   ├── ai.service.ts
│   ├── ai.controller.ts
│   └── dto/
│       ├── chat.dto.ts
│       └── stream-chat.dto.ts
├── conversation/
│   ├── conversation.module.ts
│   ├── conversation.service.ts
│   └── entities/
│       ├── conversation.entity.ts
│       └── message.entity.ts
├── auth/
│   ├── auth.module.ts
│   ├── auth.guard.ts
│   └── current-user.decorator.ts
└── health/
    └── health.controller.ts

The key architectural decision here is the direction of dependencies: ai/ depends on conversation/, but not the reverse. Claude API call logic is contained in ai.service.ts. When you switch models or providers in the future, the blast radius is limited to that one service.

Install dependencies:

npm i -g @nestjs/cli
nest new claude-enterprise-api
cd claude-enterprise-api
npm install @anthropic-ai/sdk @nestjs/config @nestjs/typeorm typeorm pg
npm install @nestjs/bull bull @nestjs/jwt @nestjs/throttler @nestjs/terminus
npm install class-validator class-transformer @nestjs/schedule
npm install -D @types/bull


Registering the Anthropic SDK in the DI Container

The most important architectural decision in any NestJS + Claude API integration is how the SDK client is provided. Creating it in individual services makes the app hard to test and hard to configure consistently.

// src/config/anthropic.config.ts
import { registerAs } from '@nestjs/config';
 
export default registerAs('anthropic', () => ({
  apiKey: process.env.ANTHROPIC_API_KEY,
  defaultModel: process.env.ANTHROPIC_DEFAULT_MODEL || 'claude-sonnet-4-6',
  maxTokens: parseInt(process.env.ANTHROPIC_MAX_TOKENS || '4096', 10),
}));
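One gap worth noting in the config above: parseInt returns NaN when ANTHROPIC_MAX_TOKENS is set to a non-numeric value, and that NaN would then flow into every API call. A minimal sketch of a stricter parser follows. The function name parseAnthropicEnv, the AnthropicSettings type, and the error messages are assumptions made for illustration; they are not part of @nestjs/config.

```typescript
// Hypothetical helper: validates the same three settings the config factory reads.
interface AnthropicSettings {
  apiKey: string;
  defaultModel: string;
  maxTokens: number;
}

function parseAnthropicEnv(
  env: Record<string, string | undefined>,
): AnthropicSettings {
  const apiKey = env.ANTHROPIC_API_KEY;
  if (!apiKey) {
    throw new Error('ANTHROPIC_API_KEY is not configured');
  }

  // Same fallback as the original config: unset or empty means 4096.
  const rawMaxTokens = env.ANTHROPIC_MAX_TOKENS || '4096';
  const maxTokens = Number.parseInt(rawMaxTokens, 10);
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new Error(
      `ANTHROPIC_MAX_TOKENS must be a positive integer, got "${rawMaxTokens}"`,
    );
  }

  return {
    apiKey,
    defaultModel: env.ANTHROPIC_DEFAULT_MODEL || 'claude-sonnet-4-6',
    maxTokens,
  };
}
```

A helper like this could be called from inside the registerAs factory with process.env, so that a malformed value stops the application at startup rather than surfacing as a failed request later.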

// src/ai/ai.module.ts
import { Module } from '@nestjs/common';
import { ConfigModule, ConfigService } from '@nestjs/config';
import Anthropic from '@anthropic-ai/sdk';
import { AiService } from './ai.service';
import { AiController } from './ai.controller';
 
export const ANTHROPIC_CLIENT = 'ANTHROPIC_CLIENT';
 
@Module({
  imports: [ConfigModule],
  providers: [
    {
      provide: ANTHROPIC_CLIENT,
      inject: [ConfigService],
      useFactory: (config: ConfigService) => {
        const apiKey = config.get<string>('anthropic.apiKey');
        if (!apiKey) {
          // Fail fast at startup—catch configuration mistakes before production
          throw new Error('ANTHROPIC_API_KEY is not configured');
        }
        return new Anthropic({ apiKey });
      },
    },
    AiService,
  ],
  controllers: [AiController],
  exports: [ANTHROPIC_CLIENT, AiService],
})
export class AiModule {}

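The ANTHROPIC_CLIENT string token is the key to testability. In unit tests, you can override this provider with a mock and all services that inject it will receive the mock automatically. One caveat: ai.module.ts imports ai.service.ts, and the service imports the token back from the module, so the two files form an import cycle. Depending on load order, the token can still be undefined when the @Inject() decorator runs; defining it in a separate file (for example, a hypothetical ai.constants.ts) avoids the problem.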

Complete AiService with Streaming

// src/ai/ai.service.ts
import { Injectable, Inject, Logger } from '@nestjs/common';
import Anthropic from '@anthropic-ai/sdk';
import { ConfigService } from '@nestjs/config';
import { Observable, Subject } from 'rxjs';
import { ANTHROPIC_CLIENT } from './ai.module';
 
export interface StreamChunk {
  type: 'delta' | 'done' | 'error';
  content?: string;
  error?: string;
}
 
@Injectable()
export class AiService {
  private readonly logger = new Logger(AiService.name);
 
  constructor(
    @Inject(ANTHROPIC_CLIENT) private readonly client: Anthropic,
    private readonly config: ConfigService,
  ) {}
 
  async chat(
    messages: Anthropic.MessageParam[],
    systemPrompt?: string,
  ): Promise<string> {
    const model = this.config.get<string>('anthropic.defaultModel');
    const maxTokens = this.config.get<number>('anthropic.maxTokens');
 
    try {
      const response = await this.client.messages.create({
        model,
        max_tokens: maxTokens,
        ...(systemPrompt ? { system: systemPrompt } : {}),
        messages,
      });
 
      const content = response.content[0];
      if (content.type !== 'text') {
        throw new Error(`Unexpected content type: ${content.type}`);
      }
      return content.text;
    } catch (error) {
      this.logger.error('Claude API call failed', { error });
      throw error;
    }
  }
 
  streamChat(
    messages: Anthropic.MessageParam[],
    systemPrompt?: string,
  ): Observable<StreamChunk> {
    // Subject bridges async iteration to the Observable world
    const subject = new Subject<StreamChunk>();
    const model = this.config.get<string>('anthropic.defaultModel');
    const maxTokens = this.config.get<number>('anthropic.maxTokens');
 
    (async () => {
      try {
        const stream = await this.client.messages.stream({
          model,
          max_tokens: maxTokens,
          ...(systemPrompt ? { system: systemPrompt } : {}),
          messages,
        });
 
        for await (const event of stream) {
          if (
            event.type === 'content_block_delta' &&
            event.delta.type === 'text_delta'
          ) {
            subject.next({ type: 'delta', content: event.delta.text });
          } else if (event.type === 'message_stop') {
            subject.next({ type: 'done' });
            subject.complete();
          }
        }
      } catch (error) {
        const message =
          error instanceof Error ? error.message : 'Unknown error';
        subject.next({ type: 'error', error: message });
        subject.error(error);
      }
    })();
 
    return subject.asObservable();
  }
}

Returning Observable<StreamChunk> rather than an AsyncGenerator makes the SSE controller simpler—NestJS's @Sse() decorator works directly with Observables and handles the SSE protocol framing automatically.
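To make "protocol framing" concrete, the sketch below shows roughly what a single event looks like on the wire in the text/event-stream format. frameSseEvent and the SseEvent type are hypothetical names written for illustration; this is not NestJS's internal implementation.

```typescript
// Fields of a single server-sent event. Only `data` is required.
interface SseEvent {
  data: string;
  event?: string;
  id?: string;
}

// Serializes one event in the text/event-stream wire format.
function frameSseEvent({ data, event, id }: SseEvent): string {
  const lines: string[] = [];
  if (event !== undefined) lines.push(`event: ${event}`);
  if (id !== undefined) lines.push(`id: ${id}`);
  // A payload containing newlines is sent as one `data:` line per line.
  for (const part of data.split('\n')) {
    lines.push(`data: ${part}`);
  }
  // A blank line terminates the event.
  return lines.join('\n') + '\n\n';
}

const frame = frameSseEvent({
  data: JSON.stringify({ delta: 'Hello' }),
  id: '1',
});
// frame === 'id: 1\ndata: {"delta":"Hello"}\n\n'
```

Because each event is just a few prefixed lines ending in a blank line, any client that can read the response body incrementally can consume the stream, which is what the Fetch-based client later in this guide relies on.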

TypeORM Conversation History Persistence

Persisting conversation history to PostgreSQL enables cross-session continuity and per-user cost tracking—both requirements in any production AI service.

// src/conversation/entities/conversation.entity.ts
import {
  Entity, PrimaryGeneratedColumn, Column,
  CreateDateColumn, UpdateDateColumn, OneToMany,
} from 'typeorm';
import { Message } from './message.entity';
 
@Entity('conversations')
export class Conversation {
  @PrimaryGeneratedColumn('uuid')
  id: string;
 
  @Column()
  userId: string;
 
  @Column({ nullable: true })
  title: string;
 
  @Column({ default: 'claude-sonnet-4-6' })
  model: string;
 
  @Column({ nullable: true, type: 'text' })
  systemPrompt: string;
 
  @OneToMany(() => Message, (message) => message.conversation, {
    cascade: true,
  })
  messages: Message[];
 
  @CreateDateColumn()
  createdAt: Date;
 
  @UpdateDateColumn()
  updatedAt: Date;
}

// src/conversation/entities/message.entity.ts
import {
  Entity, PrimaryGeneratedColumn, Column,
  CreateDateColumn, ManyToOne, JoinColumn,
} from 'typeorm';
import { Conversation } from './conversation.entity';
 
@Entity('messages')
export class Message {
  @PrimaryGeneratedColumn('uuid')
  id: string;
 
  @ManyToOne(() => Conversation, (conv) => conv.messages, {
    onDelete: 'CASCADE',
  })
  @JoinColumn({ name: 'conversation_id' })
  conversation: Conversation;
 
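  // Map the scalar ID to the same column as the relation's join column,
  // so that saving with conversationId alone also sets the foreign key.
  @Column({ name: 'conversation_id', type: 'uuid' })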
  conversationId: string;
 
  @Column({ type: 'enum', enum: ['user', 'assistant'] })
  role: 'user' | 'assistant';
 
  @Column({ type: 'text' })
  content: string;
 
  // Store token counts for cost monitoring
  @Column({ nullable: true, type: 'int' })
  inputTokens: number;
 
  @Column({ nullable: true, type: 'int' })
  outputTokens: number;
 
  @CreateDateColumn()
  createdAt: Date;
}

// src/conversation/conversation.service.ts
import { Injectable, NotFoundException } from '@nestjs/common';
import { InjectRepository } from '@nestjs/typeorm';
import { Repository } from 'typeorm';
import Anthropic from '@anthropic-ai/sdk';
import { Conversation } from './entities/conversation.entity';
import { Message } from './entities/message.entity';
 
@Injectable()
export class ConversationService {
  constructor(
    @InjectRepository(Conversation)
    private conversationRepo: Repository<Conversation>,
    @InjectRepository(Message)
    private messageRepo: Repository<Message>,
  ) {}
 
  async createConversation(
    userId: string,
    systemPrompt?: string,
  ): Promise<Conversation> {
    return this.conversationRepo.save(
      this.conversationRepo.create({ userId, systemPrompt }),
    );
  }
 
  async getConversationWithMessages(
    id: string,
    userId: string,
  ): Promise<Conversation> {
    const conv = await this.conversationRepo.findOne({
      where: { id, userId },
      relations: ['messages'],
      order: { messages: { createdAt: 'ASC' } },
    });
    if (!conv) throw new NotFoundException(`Conversation ${id} not found`);
    return conv;
  }
 
  toAnthropicMessages(messages: Message[]): Anthropic.MessageParam[] {
    return messages.map((m) => ({ role: m.role, content: m.content }));
  }
 
  async appendMessage(
    conversationId: string,
    role: 'user' | 'assistant',
    content: string,
    tokenUsage?: { input?: number; output?: number },
  ): Promise<Message> {
    return this.messageRepo.save(
      this.messageRepo.create({
        conversationId,
        role,
        content,
        inputTokens: tokenUsage?.input,
        outputTokens: tokenUsage?.output,
      }),
    );
  }
 
  // For long conversations: fetch only recent messages to stay within context limits
  async getRecentMessages(
    conversationId: string,
    limit = 50,
  ): Promise<Message[]> {
    const messages = await this.messageRepo
      .createQueryBuilder('m')
      .where('m.conversationId = :id', { id: conversationId })
      .orderBy('m.createdAt', 'DESC')
      .take(limit)
      .getMany();
    return messages.reverse();
  }
 
  async getMonthlyTokenUsage(userId: string): Promise<number> {
    const startOfMonth = new Date();
    startOfMonth.setDate(1);
    startOfMonth.setHours(0, 0, 0, 0);
 
    const result = await this.messageRepo
      .createQueryBuilder('m')
      .innerJoin('m.conversation', 'c')
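      // Entity property names are used here; TypeORM maps them to column names.
      // COALESCE keeps rows with a missing token count from nulling the sum.
      .select('SUM(COALESCE(m.inputTokens, 0) + COALESCE(m.outputTokens, 0))', 'total')
      .where('c.userId = :userId', { userId })
      .andWhere('m.createdAt >= :start', { start: startOfMonth })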
      .getRawOne<{ total: string }>();
 
    return parseInt(result?.total || '0', 10);
  }
}
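One caveat with getRecentMessages: taking the last N rows can leave an assistant message at the head of the list, while the Messages API has historically expected a conversation to open with a user turn. The sketch below trims a window so that it starts on a user message. windowForApi and the ChatTurn type are hypothetical names introduced here, and the opening-turn constraint is an assumption you should confirm against the current API documentation.

```typescript
// Minimal stand-in for the role/content pairs stored on Message entities.
interface ChatTurn {
  role: 'user' | 'assistant';
  content: string;
}

// Keeps at most `limit` recent turns and drops any leading assistant turns.
function windowForApi(history: ChatTurn[], limit: number): ChatTurn[] {
  // slice(-0) would return the whole array, so guard non-positive limits.
  const recent = limit > 0 ? history.slice(-limit) : [];
  const firstUser = recent.findIndex((turn) => turn.role === 'user');
  return firstUser === -1 ? [] : recent.slice(firstUser);
}

const sampleHistory: ChatTurn[] = [
  { role: 'user', content: 'Q1' },
  { role: 'assistant', content: 'A1' },
  { role: 'user', content: 'Q2' },
  { role: 'assistant', content: 'A2' },
];
const windowed = windowForApi(sampleHistory, 3);
// windowed contains Q2 and A2: the leading A1 was dropped
```

The trade-off is that the window may end up shorter than the requested limit; for most chat workloads that is preferable to a rejected request.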

Implementing SSE Streaming in the Controller

NestJS's @Sse() decorator, combined with an Observable, handles SSE protocol framing automatically. The controller's job is to map StreamChunk objects to the MessageEvent format NestJS expects.

// src/ai/dto/stream-chat.dto.ts
import { IsString, IsOptional, IsUUID } from 'class-validator';
 
export class StreamChatDto {
  @IsString()
  message: string;
 
  @IsOptional()
  @IsString()
  systemPrompt?: string;
 
  @IsOptional()
  @IsUUID()
  conversationId?: string;
}

// src/ai/ai.controller.ts
import {
  Controller, Post, Body, Sse,
  UseGuards, Req, HttpCode,
} from '@nestjs/common';
import { Observable, map, catchError, of } from 'rxjs';
import { MessageEvent } from '@nestjs/common';
import { Throttle } from '@nestjs/throttler';
import { JwtAuthGuard } from '../auth/auth.guard';
import { AiService, StreamChunk } from './ai.service';
import { StreamChatDto } from './dto/stream-chat.dto';
 
@Controller('ai')
@UseGuards(JwtAuthGuard)
export class AiController {
  constructor(private readonly aiService: AiService) {}
 
  @Sse('stream')
  @Post()
  @HttpCode(200)
  @Throttle({ default: { limit: 10, ttl: 60000 } })
  streamChat(
    @Body() dto: StreamChatDto,
    @Req() req: { user: { id: string }; socket?: { on: Function } },
  ): Observable<MessageEvent> {
    const source$ = this.aiService.streamChat(
      [{ role: 'user', content: dto.message }],
      dto.systemPrompt,
    );
 
    // Wrap to handle client disconnect cleanly
    return new Observable<StreamChunk>((observer) => {
      const sub = source$.subscribe(observer);
      req.socket?.on('close', () => {
        sub.unsubscribe();
      });
      return () => sub.unsubscribe();
    }).pipe(
      map((chunk): MessageEvent => ({
        data: JSON.stringify(
          chunk.type === 'delta'
            ? { delta: chunk.content }
            : chunk.type === 'done'
            ? { done: true }
            : { error: chunk.error },
        ),
      })),
      catchError((error) =>
        of({
          data: JSON.stringify({
            error: 'Stream interrupted',
            message: error instanceof Error ? error.message : 'Unknown error',
          }),
        }),
      ),
    );
  }
}

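Combining @Sse() with @Post() is unusual, since SSE conventionally uses GET, but a POST route is what you need in order to send message content and conversation IDs in a request body. Be aware that @Sse() registers a GET route on its own and that method decorators are applied bottom-up, so whether this stack ends up as POST depends on the decorator order and your NestJS version. Check the route mapping that NestJS logs at startup before relying on @Body().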

On the client side, connect using the Fetch API's streaming support:

async function streamChat(message, token) {
  const response = await fetch('/ai/stream', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      'Authorization': `Bearer ${token}`,
    },
    body: JSON.stringify({ message }),
  });
 
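  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = ''; // holds a partial line when an event spans two network chunks

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // stream: true keeps multi-byte characters intact across chunk boundaries
    buffer += decoder.decode(value, { stream: true });
    const lines = buffer.split('\n');
    buffer = lines.pop() ?? ''; // the last element is an incomplete line (or '')

    for (const line of lines.filter((l) => l.startsWith('data: '))) {
      const data = JSON.parse(line.slice(6));
      if (data.delta) process.stdout.write(data.delta);
      if (data.done) console.log('\n[complete]');
      if (data.error) console.error('Error:', data.error);
    }
  }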
}

JWT Authentication and Throttling

// src/auth/auth.guard.ts
import {
  Injectable, CanActivate, ExecutionContext, UnauthorizedException,
} from '@nestjs/common';
import { JwtService } from '@nestjs/jwt';
import { Request } from 'express';
 
@Injectable()
export class JwtAuthGuard implements CanActivate {
  constructor(private jwtService: JwtService) {}
 
  async canActivate(context: ExecutionContext): Promise<boolean> {
    const request = context.switchToHttp().getRequest<Request>();
    const token = this.extractToken(request);
 
    if (!token) {
      throw new UnauthorizedException('No token provided');
    }
 
    try {
      const payload = await this.jwtService.verifyAsync<{
        sub: string;
        email: string;
      }>(token, { secret: process.env.JWT_SECRET });
 
      request['user'] = { id: payload.sub, email: payload.email };
      return true;
    } catch {
      throw new UnauthorizedException('Invalid or expired token');
    }
  }
 
  private extractToken(request: Request): string | undefined {
    const [type, token] = request.headers.authorization?.split(' ') ?? [];
    return type === 'Bearer' ? token : undefined;
  }
}

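Using verifyAsync rather than the synchronous verify keeps the guard asynchronous from the start, so a later move to RS256 (or another asymmetric algorithm, where the verification key may come from an async lookup) does not require refactoring.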

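Configure multi-tier throttling in AppModule to protect API cost from abuse. Per-route @Throttle() overrides are keyed by throttler name, so an override keyed as default only applies if a throttler with that name is configured: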

ThrottlerModule.forRoot([
  { name: 'burst', ttl: 1000, limit: 3 },     // 3 req/second
  { name: 'medium', ttl: 10000, limit: 20 },   // 20 req/10 seconds
  { name: 'hourly', ttl: 3600000, limit: 200 }, // 200 req/hour
]),
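To illustrate what "multi-tier" means in practice, here is a minimal fixed-window limiter in which a request must pass every tier to be allowed. MultiTierLimiter is a hypothetical class written for this guide under simplifying assumptions (in-memory state, fixed windows, caller-supplied clock); it is not how @nestjs/throttler is implemented.

```typescript
// One rate-limit tier, mirroring the { name, ttl, limit } shape used above.
interface Tier {
  name: string;
  ttl: number; // window length in milliseconds
  limit: number; // maximum requests allowed per window
}

interface WindowState {
  start: number;
  count: number;
}

class MultiTierLimiter {
  private readonly tiers: Tier[];
  private readonly windows = new Map<string, WindowState>();

  constructor(tiers: Tier[]) {
    this.tiers = tiers;
  }

  // Returns the name of the first exhausted tier, or null when the request is allowed.
  check(clientKey: string, now: number): string | null {
    const states = this.tiers.map((tier) => this.windowFor(clientKey, tier, now));
    const blocked = this.tiers.findIndex((tier, i) => states[i].count >= tier.limit);
    if (blocked !== -1) return this.tiers[blocked].name;
    // Count the request against every tier only once it has passed all of them.
    states.forEach((state) => {
      state.count += 1;
    });
    return null;
  }

  // Returns the live window for this client and tier, starting a new one if it expired.
  private windowFor(clientKey: string, tier: Tier, now: number): WindowState {
    const id = `${clientKey}:${tier.name}`;
    const existing = this.windows.get(id);
    if (existing && now - existing.start < tier.ttl) return existing;
    const fresh: WindowState = { start: now, count: 0 };
    this.windows.set(id, fresh);
    return fresh;
  }
}

const limiter = new MultiTierLimiter([
  { name: 'burst', ttl: 1000, limit: 3 },
  { name: 'medium', ttl: 10000, limit: 20 },
]);
```

The short tier absorbs accidental double-clicks and retry storms, while the longer tiers cap sustained usage. A client that stays just under the burst limit will still be stopped by the medium tier.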

Async Task Processing with Bull and Redis

Batch processing, long document summarization, and analysis tasks that take more than a few seconds should run through a job queue rather than blocking HTTP connections.

// src/ai/ai-queue.processor.ts
import { Process, Processor } from '@nestjs/bull';
import { Logger } from '@nestjs/common';
import { Job } from 'bull';
import { AiService } from './ai.service';
import { ConversationService } from '../conversation/conversation.service';
 
export interface AiJobData {
  conversationId: string;
  userId: string;
  userMessage: string;
  systemPrompt?: string;
}
 
@Processor('ai-tasks')
export class AiQueueProcessor {
  private readonly logger = new Logger(AiQueueProcessor.name);
 
  constructor(
    private readonly aiService: AiService,
    private readonly conversationService: ConversationService,
  ) {}
 
  @Process('chat-completion')
  async handleChatCompletion(job: Job<AiJobData>): Promise<string> {
    const { conversationId, userId, userMessage, systemPrompt } = job.data;
    this.logger.log(`Processing job ${job.id}`);
 
    try {
      const conversation = await this.conversationService
        .getConversationWithMessages(conversationId, userId);
 
      const messages = this.conversationService
        .toAnthropicMessages(conversation.messages);
      messages.push({ role: 'user', content: userMessage });
 
      const response = await this.aiService.chat(messages, systemPrompt);
 
      // Save sequentially so createdAt preserves the user-then-assistant order
      await this.conversationService.appendMessage(conversationId, 'user', userMessage);
      await this.conversationService.appendMessage(conversationId, 'assistant', response);
 
      await job.progress(100);
      return response;
    } catch (error) {
      this.logger.error(`Job ${job.id} failed`, error);
      // Rethrow so Bull marks the job as failed; it is retried only if the job
      // was added with attempts > 1 (see the add() options below)
      throw error;
    }
  }
}

Add jobs with retry configuration:

await this.aiQueue.add('chat-completion', jobData, {
  attempts: 3,
  backoff: { type: 'exponential', delay: 2000 },
  removeOnComplete: 100,
  removeOnFail: 50,
});

The exponential backoff with a 2-second base delay handles Claude API rate limit errors (429) gracefully without hammering the API during periods of high load.
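To get a feel for how much waiting these options buy, the sketch below computes a simple doubling schedule. It is an illustration only: backoffDelays is a hypothetical helper, and the schedule is an assumed doubling pattern rather than Bull's internal formula, which may produce different delays.

```typescript
// Illustrative doubling schedule: base, 2x base, 4x base, and so on.
// This is an assumption for discussion, not Bull's internal formula.
function backoffDelays(baseMs: number, retries: number): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// attempts: 3 means one initial try plus two retries.
const retryDelays = backoffDelays(2000, 2);
const totalWaitMs = retryDelays.reduce((sum, ms) => sum + ms, 0);
// retryDelays is [2000, 4000]; totalWaitMs is 6000
```

Under this assumption, three attempts cover only a few seconds of rate limiting. If your workload regularly sees longer 429 periods, consider raising attempts or the base delay rather than relying on the defaults shown here.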

Five Common Pitfalls and How to Avoid Them

These are issues that surface in real production deployments, not edge cases.

1. DI Scope and Anthropic Client Lifetime

Providing the ANTHROPIC_CLIENT with REQUEST scope creates a new SDK instance per request, exhausting connection pools under load. Keep it as a singleton (the default).

// Wrong: creates a new instance per request
{ provide: ANTHROPIC_CLIENT, scope: Scope.REQUEST, useFactory: ... }
 
// Correct: one instance for the application lifetime
{ provide: ANTHROPIC_CLIENT, useFactory: ... }

2. Subject Memory Leaks on Client Disconnect

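When a client disconnects mid-stream, the Subject won't complete unless you handle the disconnect explicitly. The controller implementation above unsubscribes on socket.on('close'), which stops events from being written to a dead connection. Note that unsubscribing does not cancel the upstream request: the async loop in streamChat keeps reading from Claude, and accruing output tokens, until the response finishes. To stop it early, the service needs to abort the SDK stream (the stream object exposes an abort() method) when the last subscriber leaves.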

3. TypeORM Connection Pool Exhaustion

The default connection pool limit is 10. With Bull workers processing jobs in parallel, this limit is easily exceeded. Increase it in your TypeORM configuration:

TypeOrmModule.forRoot({
  type: 'postgres',
  url: process.env.DATABASE_URL,
  entities: [Conversation, Message],
  synchronize: process.env.NODE_ENV !== 'production',
  extra: { max: 30, connectionTimeoutMillis: 5000 },
})

4. HTTP Server Timeout vs. Streaming Duration

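NestJS inherits Node.js's default keepAliveTimeout of 5 seconds. That setting governs how long an idle keep-alive socket stays open between requests, so it does not cut off a response that is still streaming, but a value shorter than your load balancer's or reverse proxy's idle timeout can cause intermittent connection resets. Raise it above the proxy's idle timeout in main.ts: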

const app = await NestFactory.create(AppModule);
const server = app.getHttpServer();
server.keepAliveTimeout = 120 * 1000;
server.headersTimeout = 125 * 1000; // Must be slightly longer than keepAliveTimeout
await app.listen(3000);

5. ConfigService Returning Undefined Silently

config.get<string>('anthropic.apiKey') will return undefined if the env var isn't set, but TypeScript won't catch this at compile time. Use getOrThrow (NestJS 9.4+) or the explicit startup check in useFactory:

// Safe: throws at startup if the key is missing
const apiKey = this.config.getOrThrow<string>('anthropic.apiKey');

Docker Compose Production Configuration

services:
  api:
    build: .
    ports:
      - "3000:3000"
    environment:
      NODE_ENV: production
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
      DATABASE_URL: postgresql://postgres:${POSTGRES_PASSWORD}@postgres:5432/claude_api
      REDIS_URL: redis://redis:6379
      JWT_SECRET: ${JWT_SECRET}
    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
 
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_DB: claude_api
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 10s
      timeout: 5s
      retries: 5
 
  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 5
 
volumes:
  postgres_data:
  redis_data:

The condition: service_healthy in depends_on prevents the race condition where the API container starts before PostgreSQL is ready to accept connections. Without this, deployments frequently fail with transient DB connection errors on startup.

Implement the health endpoint using @nestjs/terminus:

// src/health/health.controller.ts
import { Controller, Get } from '@nestjs/common';
import {
  HealthCheck, HealthCheckService, TypeOrmHealthIndicator,
} from '@nestjs/terminus';
 
@Controller('health')
export class HealthController {
  constructor(
    private health: HealthCheckService,
    private db: TypeOrmHealthIndicator,
  ) {}
 
  @Get()
  @HealthCheck()
  check() {
    return this.health.check([
      () => this.db.pingCheck('database'),
    ]);
  }
}

Cost Monitoring and Observability

Storing token counts on Message entities enables per-user cost attribution and budget alerts. Add an Interceptor to log latency and token usage across all AI endpoints:

// src/ai/ai-logging.interceptor.ts
import {
  Injectable, NestInterceptor, ExecutionContext,
  CallHandler, Logger,
} from '@nestjs/common';
import { Observable, tap } from 'rxjs';
 
@Injectable()
export class AiLoggingInterceptor implements NestInterceptor {
  private readonly logger = new Logger('AiMetrics');
 
  intercept(context: ExecutionContext, next: CallHandler): Observable<unknown> {
    const start = Date.now();
    const request = context.switchToHttp().getRequest();
 
    return next.handle().pipe(
      tap(() => {
        const duration = Date.now() - start;
        this.logger.log('AI request completed', {
          userId: request.user?.id,
          path: request.path,
          durationMs: duration,
        });
      }),
    );
  }
}

For prompt caching strategies that can reduce costs by 50-80% when system prompts are static, see Claude API cost optimization patterns. Combining caching with NestJS's singleton DI pattern for the Anthropic client gives you optimal performance with minimal configuration.
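As a rough sketch of what opting into prompt caching looks like, the Messages API accepts the system prompt as an array of text blocks, and a block can carry a cache_control marker. The helper below builds that shape. cacheableSystemPrompt and the CachedSystemBlock type are hypothetical names declared locally so the example has no SDK dependency; confirm field names and minimum cacheable prompt lengths against the current API documentation.

```typescript
// Shape of a cacheable system prompt block, mirroring the array form of the
// Messages API `system` parameter. Declared locally for illustration.
interface CachedSystemBlock {
  type: 'text';
  text: string;
  cache_control: { type: 'ephemeral' };
}

// Wraps a static system prompt so the API may cache it across requests.
function cacheableSystemPrompt(text: string): CachedSystemBlock[] {
  return [{ type: 'text', text, cache_control: { type: 'ephemeral' } }];
}

const systemBlocks = cacheableSystemPrompt('You are a helpful support agent.');
```

In AiService, the result would take the place of the plain string currently passed as system. Caching only helps when the prompt text is byte-for-byte identical between calls, so keep per-user details out of the cached block.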

The immediate next step after deploying this stack is to hook the /health endpoint into an uptime monitor, so that you learn about an outage before a user reports it. As written, the endpoint only pings the database; add indicators for Redis and any other dependency if you want the check to tell you which component failed.

NestJS's conventions feel like overhead at first. The payoff comes when a new team member joins and can navigate the codebase without a guided tour, when a test needs to mock the Claude API and it's a three-line provider override, and when adding a WebSocket gateway means creating a new module rather than restructuring existing files. That's the trade-off this architecture is making—and for production AI services maintained by growing teams, it's consistently the right one.

Unit Testing with DI Mocking

The DI architecture pays its most obvious dividend in tests. Swapping out the real Anthropic client for a mock is a single provider override:

// src/ai/ai.service.spec.ts
import { Test, TestingModule } from '@nestjs/testing';
import { ConfigService } from '@nestjs/config';
import Anthropic from '@anthropic-ai/sdk';
import { AiService } from './ai.service';
import { ANTHROPIC_CLIENT } from './ai.module';
 
const mockAnthropicClient = {
  messages: {
    create: jest.fn(),
    stream: jest.fn(),
  },
};
 
describe('AiService', () => {
  let service: AiService;
 
  beforeEach(async () => {
    const module: TestingModule = await Test.createTestingModule({
      providers: [
        AiService,
        {
          provide: ANTHROPIC_CLIENT,
          // The real client is replaced entirely—no HTTP calls in tests
          useValue: mockAnthropicClient,
        },
        {
          provide: ConfigService,
          useValue: {
            get: (key: string) => {
              const config: Record<string, string | number> = {
                'anthropic.defaultModel': 'claude-sonnet-4-6',
                'anthropic.maxTokens': 1024,
              };
              return config[key];
            },
          },
        },
      ],
    }).compile();
 
    service = module.get<AiService>(AiService);
  });
 
  afterEach(() => {
    jest.clearAllMocks();
  });
 
  it('should return text content from Claude API', async () => {
    mockAnthropicClient.messages.create.mockResolvedValueOnce({
      content: [{ type: 'text', text: 'Hello from Claude' }],
    });
 
    const result = await service.chat([
      { role: 'user', content: 'Hello' },
    ]);
 
    expect(result).toBe('Hello from Claude');
    expect(mockAnthropicClient.messages.create).toHaveBeenCalledWith(
      expect.objectContaining({
        model: 'claude-sonnet-4-6',
        max_tokens: 1024,
        messages: [{ role: 'user', content: 'Hello' }],
      }),
    );
  });
 
  it('should propagate API errors', async () => {
    mockAnthropicClient.messages.create.mockRejectedValueOnce(
      new Error('Rate limit exceeded'),
    );
 
    await expect(
      service.chat([{ role: 'user', content: 'Hello' }]),
    ).rejects.toThrow('Rate limit exceeded');
  });
});

This test pattern works because the ANTHROPIC_CLIENT token acts as a seam—real behavior in production, mock behavior in tests, with zero changes to AiService itself. The same pattern applies to ConversationService: swap the TypeORM repositories for in-memory mocks in the test module and all service logic can be tested without a database connection.

Integration Testing with a Real Database

For integration tests, use a test PostgreSQL instance and TypeORM's synchronize: true to create the schema fresh for each test run:

// src/conversation/conversation.service.integration-spec.ts
import { Test, TestingModule } from '@nestjs/testing';
import { TypeOrmModule } from '@nestjs/typeorm';
import { ConversationService } from './conversation.service';
import { Conversation } from './entities/conversation.entity';
import { Message } from './entities/message.entity';

describe('ConversationService (integration)', () => {
  let moduleRef: TestingModule;
  let service: ConversationService;

  beforeAll(async () => {
    moduleRef = await Test.createTestingModule({
      imports: [
        TypeOrmModule.forRoot({
          type: 'postgres',
          url: process.env.TEST_DATABASE_URL,
          entities: [Conversation, Message],
          synchronize: true,  // OK in test; never in production
          dropSchema: true,   // Start fresh each run
        }),
        TypeOrmModule.forFeature([Conversation, Message]),
      ],
      providers: [ConversationService],
    }).compile();

    service = moduleRef.get<ConversationService>(ConversationService);
  });

  // Close the database connection so Jest can exit cleanly
  afterAll(async () => {
    await moduleRef.close();
  });
 
  it('should create and retrieve a conversation', async () => {
    const conv = await service.createConversation('user-123', 'You are helpful.');
    expect(conv.id).toBeDefined();
 
    await service.appendMessage(conv.id, 'user', 'Hello');
    await service.appendMessage(conv.id, 'assistant', 'Hi there!');
 
    const retrieved = await service.getConversationWithMessages(conv.id, 'user-123');
    expect(retrieved.messages).toHaveLength(2);
    expect(retrieved.messages[0].role).toBe('user');
  });
});

Monitoring and Alerting in Production

Structured logging from the AiLoggingInterceptor becomes useful for anomaly detection. Pipe logs to a centralized service (Datadog, Grafana Loki, or CloudWatch) and set up alerts for:

P99 latency exceeding 10 seconds on the /ai/stream endpoint
Error rate exceeding 1% over a 5-minute window
Monthly token usage crossing 80% of your budget threshold

The getMonthlyTokenUsage method on ConversationService gives you the data for that last alert. A simple cron job that checks usage daily and sends a notification when the threshold is crossed is enough for early-stage products:

// src/billing/billing.scheduler.ts
import { Injectable } from '@nestjs/common';
import { Cron, CronExpression } from '@nestjs/schedule';
import { ConversationService } from '../conversation/conversation.service';
 
@Injectable()
export class BillingScheduler {
  constructor(private conversationService: ConversationService) {}
 
  @Cron(CronExpression.EVERY_DAY_AT_9AM)
  async checkBudgetUsage() {
    // Aggregate across all users and compare against monthly budget
    // Send alert if projected monthly spend exceeds threshold
    // Implementation depends on your notification infrastructure
  }
}
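The body of checkBudgetUsage is left open above. One possible core for it is a pure function that projects month-end usage from usage so far, sketched below. projectMonthlyUsage, the BudgetStatus type, and the linear extrapolation are all assumptions made for illustration; the 0.8 default mirrors the 80% threshold mentioned earlier.

```typescript
interface BudgetStatus {
  projectedTokens: number; // linear extrapolation to the end of the month
  fractionUsed: number; // usage so far as a fraction of the budget
  shouldAlert: boolean;
}

// Hypothetical helper: decides whether a budget alert is due.
function projectMonthlyUsage(
  usedTokens: number,
  budgetTokens: number,
  now: Date,
  threshold = 0.8,
): BudgetStatus {
  const dayOfMonth = now.getDate();
  // Day 0 of the next month is the last day of the current month.
  const daysInMonth = new Date(now.getFullYear(), now.getMonth() + 1, 0).getDate();
  const projectedTokens = Math.round((usedTokens / dayOfMonth) * daysInMonth);
  const fractionUsed = usedTokens / budgetTokens;
  return {
    projectedTokens,
    fractionUsed,
    shouldAlert: fractionUsed >= threshold || projectedTokens > budgetTokens,
  };
}

// Example: 10 April, 300,000 of a 1,000,000-token budget used so far.
const status = projectMonthlyUsage(300_000, 1_000_000, new Date(2026, 3, 10));
```

Keeping the arithmetic in a pure function means the scheduler only has to fetch totals and send notifications, and the threshold logic can be unit tested without a database or a clock.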

Deployment Options Beyond Docker Compose

Docker Compose works well for self-hosted deployments on a single VPS. For managed deployments, the same containerized app deploys directly to Railway, Fly.io, or Render with minimal configuration changes.

Railway in particular works well with this stack: it can provision a PostgreSQL database and Redis instance alongside your app container, inject their connection strings as environment variables, and run health checks automatically. The Procfile for Railway is simply:

web: node dist/main.js
worker: node dist/queue-worker.js

Separate the Bull queue processor into its own process (queue-worker.ts) so it can scale independently from the HTTP server. Under load, you might run 2 HTTP replicas and 4 worker replicas without changing any application code.

When This Architecture Is (and Isn't) Worth It

This setup—NestJS, TypeORM, Bull, JWT—has real upfront cost. You're writing more files and more configuration than you would in Express.

The architecture earns its keep when: the service handles money or user data (the auth and validation patterns matter), the team has more than 2 people working on the backend simultaneously, or the service is expected to last more than 6 months.

It's overkill when: you're building a prototype to validate an idea, the service will be replaced when you find product-market fit, or you're the only engineer and don't need the conventions.

For teams in the first category, the patterns in this guide—DI-based client registration, TypeORM entities for conversation history, Observable-based SSE streaming, and Bull for async jobs—represent a production-ready baseline. Add the monitoring, health checks, and test coverage, and you have an AI backend that can scale with your product.
