API & SDK / 2026-04-10 / Advanced

Designing Production Architecture for Claude Managed Agents — Sandboxed Execution, Persistent Memory, Credential Management, and Cost Optimization Patterns

A practical guide to designing production-grade architectures with Claude Managed Agents. Covers sandboxed execution, persistent memory, credential management, multi-agent orchestration, and cost optimization.



Taking Claude Managed Agents to Production

Claude Managed Agents, released as a public beta in April 2026, is a cloud-hosted platform for building and running agents. Its built-in sandboxed execution, persistent memory, credential management, and end-to-end tracing are intended to compress months of agent development into weeks.

However, the transition from prototype to production involves numerous critical design decisions. How do you meet security requirements? How should persistent memory be architected? How do you orchestrate multiple agents? And how do you optimize runtime billing? This article systematically walks through practical design patterns for each of these challenges.

For an introduction to Managed Agents concepts and setup, check out our "Claude Managed Agents Complete Guide" first.

Sandbox Execution Environment Design Patterns

Managed Agents runs each agent in a fully isolated sandbox environment. This minimizes the risk of agents causing unintended side effects on external systems while providing fine-grained control over tool and resource access.

Execution Environment Components

Each sandbox includes the following components:

Code execution runtime: Supports Python, Node.js, and shell scripts
Filesystem: Agent-specific temporary storage (automatically cleaned up when the session ends)
Network access: Allowlist-based external API calls
Tool bindings: Dynamic connections to MCP servers and custom tools

In production, explicitly configuring security policies for each of these is essential.

// Sandbox configuration when creating an agent
import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic();
 
const agent = await client.agents.create({
  name: "data-processor",
  model: "claude-sonnet-4-6",
  instructions: "Data processing agent. External API communication restricted to allowlist only.",
  sandbox: {
    // Network access allowlist
    allowed_domains: [
      "api.internal.example.com",
      "storage.googleapis.com"
    ],
    // Filesystem restrictions
    filesystem: {
      max_size_mb: 512,
      writable_paths: ["/workspace", "/tmp"],
      read_only_paths: ["/config"]
    },
    // Maximum runtime (seconds)
    max_runtime_seconds: 3600,
    // Memory limit
    max_memory_mb: 2048
  },
  tools: [
    { type: "code_execution" },
    { type: "mcp", server_url: "https://mcp.internal.example.com/data" }
  ]
});
 
console.log(`Agent created: ${agent.id}`);
// Output: Agent created: agent_01JZ8K...
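The allowlist in the configuration above is enforced by the sandbox, but a client-side pre-check can catch a misconfigured tool URL before a session starts. The following is a minimal sketch: `isAllowedDomain` is a hypothetical helper, not part of any SDK, and it assumes exact hostname matching, which may differ from the platform's actual matching rules.

```javascript
// Hypothetical helper (not an SDK function): checks a URL against a domain allowlist.
// Assumption: exact hostname matching; subdomains are NOT implicitly allowed.
function isAllowedDomain(url, allowedDomains) {
  let hostname;
  try {
    hostname = new URL(url).hostname.toLowerCase();
  } catch {
    return false; // Unparseable URLs are rejected
  }
  return allowedDomains.some((domain) => domain.toLowerCase() === hostname);
}

const allowedDomains = ["api.internal.example.com", "storage.googleapis.com"];

console.log(isAllowedDomain("https://api.internal.example.com/v1/jobs", allowedDomains)); // true
console.log(isAllowedDomain("https://evil.example.com/", allowedDomains)); // false
```

Comparing the parsed hostname, rather than searching the raw URL string, avoids accepting look-alike hosts such as `storage.googleapis.com.evil.io`.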

Production Security Layers

For production deployments, we recommend a three-layer security architecture.

Layer 1 (agent-level restrictions): Use sandbox configuration (as shown above) to limit network access, filesystem operations, and runtime duration.
Layer 2 (minimized authentication scopes): Use the credential management features (covered below) to grant each agent only the permissions it needs.
Layer 3 (monitoring and alerting): Use OpenTelemetry traces and log integration to detect anomalous behavior immediately.


Persistent Memory Design and Implementation

One of the most powerful features of Managed Agents is persistent memory that retains context across sessions. Agents can remember past interactions and learning outcomes, applying them in future sessions.

Memory Architecture Design

A practical approach is to design persistent memory in three tiers:

// Memory architecture pattern
const agent = await client.agents.create({
  name: "customer-support-agent",
  model: "claude-sonnet-4-6",
  instructions: `
    You are a customer support agent.
    Reference past conversation history to understand the customer's context
    before responding.
  `,
  memory: {
    // Short-term: conversation context within the current session
    short_term: {
      enabled: true,
      max_turns: 50
    },
    // Medium-term: user-specific state (tickets, preferences, interaction patterns)
    medium_term: {
      enabled: true,
      storage: "per_user",
      ttl_days: 90,
      max_entries: 1000
    },
    // Long-term: organization-level knowledge (FAQ, policies, product info)
    long_term: {
      enabled: true,
      storage: "shared",
      source: "knowledge_base",
      refresh_interval_hours: 24
    }
  }
});
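To make the effect of `ttl_days` and `max_entries` concrete, here is a small sketch of how a per-user store could be bounded. The entry shape (`{ text, createdAt }`) and the `pruneEntries` helper are assumptions for illustration only; the platform's internal behavior is not documented in this article.

```javascript
// Hypothetical sketch: bound a per-user memory store by age and entry count.
// Assumption: each entry is { text, createdAt } with createdAt in epoch milliseconds.
function pruneEntries(entries, { ttlDays, maxEntries, now = Date.now() }) {
  const cutoff = now - ttlDays * 24 * 60 * 60 * 1000;
  return entries
    .filter((entry) => entry.createdAt >= cutoff) // drop expired entries
    .sort((a, b) => b.createdAt - a.createdAt)    // newest first
    .slice(0, maxEntries);                        // keep only the newest maxEntries
}

const DAY_MS = 24 * 60 * 60 * 1000;
const now = Date.UTC(2026, 3, 10);
const entries = [
  { text: "asked about refund policy", createdAt: now - 100 * DAY_MS },
  { text: "opened a billing ticket", createdAt: now - 10 * DAY_MS },
  { text: "prefers email contact", createdAt: now - 1 * DAY_MS },
];

const kept = pruneEntries(entries, { ttlDays: 90, maxEntries: 1000, now });
console.log(kept.map((entry) => entry.text));
// Keeps the two entries newer than 90 days, newest first
```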

State Recovery with Checkpoints

For long-running agent tasks, the checkpoint mechanism enables recovery from interruptions. This benefits both disaster recovery and cost optimization (explained in the idle time reduction section below).

// Task execution with checkpoints
const session = await client.agents.sessions.create({
  agent_id: agent.id,
  checkpoint: {
    enabled: true,
    interval_seconds: 300, // Checkpoint every 5 minutes
    max_checkpoints: 10
  }
});
 
// Execute the task
const result = await client.agents.sessions.run(session.id, {
  messages: [
    {
      role: "user",
      content: "Analyze the past 30 days of sales data and generate a report"
    }
  ]
});
 
// Resume from interruption (after failure or timeout)
if (result.status === "interrupted") {
  const resumed = await client.agents.sessions.resume(session.id, {
    from_checkpoint: "latest"
  });
  console.log(`Resumed from checkpoint: ${resumed.checkpoint_id}`);
  // Output: Resumed from checkpoint: ckpt_01JZ9M...
}

Memory Design Anti-Patterns

Here are common pitfalls we've seen in production deployments.

Anti-pattern 1: Unbounded memory accumulation. Running medium-term memory without max_entries lets stored entries grow without limit, so each session loads more tokens into the context window, which raises both response latency and cost. Always set a TTL and an entry limit.

Anti-pattern 2: Storing all data in memory. Structured data (product catalogs, inventory, etc.) should live in external databases accessed via MCP tools. Memory should only store contextual information like "this user asked about X last time."

Anti-pattern 3: Inappropriate checkpoint intervals. Checkpointing too often adds I/O overhead; checkpointing too rarely risks losing significant work after a failure. A 5–15 minute range works well for most task types.
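One way to reason about the checkpoint trade-off is Young's first-order approximation, which estimates the interval as the square root of twice the checkpoint cost times the mean time between failures. The sketch below applies it; the 5-second checkpoint cost and 6-hour failure interval are assumed example values, not measurements from Managed Agents.

```javascript
// Young's first-order approximation: interval ≈ sqrt(2 * checkpointCost * MTBF).
// Both arguments are in seconds; the result is an interval in seconds.
function suggestCheckpointIntervalSeconds(checkpointCostSeconds, mtbfSeconds) {
  return Math.sqrt(2 * checkpointCostSeconds * mtbfSeconds);
}

// Assumed example values: 5 s to write a checkpoint, one failure every 6 hours on average
const interval = suggestCheckpointIntervalSeconds(5, 6 * 3600);
console.log(Math.round(interval)); // 465 (a little under 8 minutes)
```

With these assumed inputs the estimate falls inside the 5 to 15 minute range suggested above; cheaper checkpoints or more frequent failures push the interval down.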

Credential Management and Scoped Permissions

The most critical design consideration for production agents is credential management. Managed Agents includes built-in credential management for securely passing API keys and OAuth tokens to agents.

Registering and Using Credentials

// Register secrets (admin pre-configuration)
await client.agents.secrets.create({
  agent_id: agent.id,
  name: "GITHUB_TOKEN",
  value: process.env.GITHUB_PAT, // From environment variable
  scope: "session", // Accessible only within a session
  permissions: ["read"] // Read-only
});
 
await client.agents.secrets.create({
  agent_id: agent.id,
  name: "DATABASE_URL",
  value: process.env.DATABASE_URL,
  scope: "agent", // Shared across the agent
  permissions: ["read"]
});
 
// Secrets are available as environment variables ($GITHUB_TOKEN)
// inside the code execution tool.
// Automatic filtering prevents secret leakage in agent responses.
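Even with platform-side filtering, it can be worth scrubbing secret values from your own application logs as a second line of defense. The helper below is a hypothetical illustration of that idea and is not the platform's filtering implementation; the token value is a made-up placeholder.

```javascript
// Hypothetical log scrubber: replaces every literal occurrence of a secret value
// with a named placeholder. Not the platform's own filtering logic.
function redactSecrets(text, secrets) {
  let output = text;
  for (const [name, value] of Object.entries(secrets)) {
    if (!value) continue; // an empty value would match everywhere, so skip it
    output = output.split(value).join(`[REDACTED:${name}]`);
  }
  return output;
}

const secrets = { GITHUB_TOKEN: "ghp_exampleToken123" }; // placeholder value
console.log(redactSecrets("curl -H 'Authorization: token ghp_exampleToken123'", secrets));
// curl -H 'Authorization: token [REDACTED:GITHUB_TOKEN]'
```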

Implementing the Principle of Least Privilege

In enterprise environments, strictly limiting what resources each agent can access through "scoped permissions" is critical.

// Permission design example: sales report agent
const salesAgent = await client.agents.create({
  name: "sales-report-agent",
  model: "claude-sonnet-4-6",
  instructions: "Agent for generating monthly sales reports",
  permissions: {
    // Data source access control
    data_access: [
      {
        resource: "crm_api",
        operations: ["read"],
        filters: { department: "sales" }
      },
      {
        resource: "analytics_db",
        operations: ["read"],
        filters: { tables: ["revenue", "pipeline", "forecasts"] }
      }
    ],
    // Output destination control
    output_destinations: [
      { type: "file", path: "/reports/" },
      { type: "api", endpoint: "https://slack.internal/webhook" }
    ],
    // Inter-agent communication control
    agent_communication: {
      can_spawn: false, // Cannot spawn sub-agents
      can_message: ["analytics-agent"] // Can only communicate with analytics agent
    }
  }
});
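The `data_access` rules above follow a default-deny model: anything not listed is refused. A client-side mirror of that check, useful in tests or pre-flight validation, might look like the sketch below. `canAccess` is a hypothetical helper, and it ignores the `filters` field for brevity.

```javascript
// Hypothetical default-deny check over data_access rules (filters are ignored here).
function canAccess(permissions, resource, operation) {
  return permissions.data_access.some(
    (rule) => rule.resource === resource && rule.operations.includes(operation)
  );
}

const permissions = {
  data_access: [
    { resource: "crm_api", operations: ["read"] },
    { resource: "analytics_db", operations: ["read"] },
  ],
};

console.log(canAccess(permissions, "crm_api", "read"));    // true
console.log(canAccess(permissions, "crm_api", "write"));   // false
console.log(canAccess(permissions, "billing_db", "read")); // false
```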

Multi-Agent Orchestration Design Patterns

As a Research Preview feature, Managed Agents supports dynamically spawning agents from within other agents. This enables breaking complex tasks into specialized sub-agents.

Orchestrator Pattern

The most common pattern uses a single orchestrator agent to coordinate multiple specialized agents.

// Orchestrator agent design
const orchestrator = await client.agents.create({
  name: "project-orchestrator",
  model: "claude-sonnet-4-6",
  instructions: `
    You are a project management orchestrator.
    Analyze user requests and delegate work to the appropriate
    specialized agents.
    
    Available specialized agents:
    - code-reviewer: Code review and quality checks
    - test-generator: Automated test code generation
    - doc-writer: Documentation and API reference creation
    - security-auditor: Security vulnerability detection
    
    Integrate results from each agent into a final report.
  `,
  sub_agents: {
    enabled: true,
    max_concurrent: 3,
    allowed_agents: [
      "code-reviewer",
      "test-generator",
      "doc-writer",
      "security-auditor"
    ],
    timeout_seconds: 600
  }
});
 
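// Create a session for the orchestrator, then submit a task to it
const orchestratorSession = await client.agents.sessions.create({
  agent_id: orchestrator.id
});
const result = await client.agents.sessions.run(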
  orchestratorSession.id,
  {
    messages: [{
      role: "user",
      content: `
        Please review the following PR: https://github.com/org/repo/pull/42
        Run code review, test generation, documentation updates, and
        security audit in parallel, then produce a consolidated report.
      `
    }]
  }
);
 
// Access individual sub-agent results via result.sub_agent_results

Pipeline Pattern

For data processing workflows, a sequential pipeline pattern is highly effective.

// Pipeline-style multi-agent setup
const pipeline = await client.agents.pipelines.create({
  name: "data-etl-pipeline",
  stages: [
    {
      agent: "data-extractor",
      input: "raw_data_url",
      output: "extracted_data",
      retry: { max_attempts: 3, backoff: "exponential" }
    },
    {
      agent: "data-transformer",
      input: "extracted_data",
      output: "transformed_data",
      checkpoint: true // Save checkpoint between stages
    },
    {
      agent: "data-validator",
      input: "transformed_data",
      output: "validated_data",
      on_failure: "pause" // Pause on validation failure
    },
    {
      agent: "report-generator",
      input: "validated_data",
      output: "final_report"
    }
  ]
});
 
// Async pipeline execution
const run = await client.agents.pipelines.run(pipeline.id, {
  raw_data_url: "https://storage.example.com/sales-2026-q1.csv"
});
 
// Poll for stage progress
const status = await client.agents.pipelines.get_status(run.id);
console.log(`Current stage: ${status.current_stage}`);
console.log(`Progress: ${status.completed_stages}/${status.total_stages}`);
// Output: Current stage: data-transformer
// Output: Progress: 1/4
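Because each stage consumes the named output of an earlier stage, a wiring mistake (for example a misspelled `input`) would only surface at run time. The following pre-flight check is a sketch of how such mistakes could be caught locally; `validatePipeline` is a hypothetical helper, not an SDK call.

```javascript
// Hypothetical pre-flight check: every stage input must be either an initial
// input or the output of an earlier stage.
function validatePipeline(stages, initialInputs) {
  const available = new Set(initialInputs);
  const errors = [];
  for (const stage of stages) {
    if (!available.has(stage.input)) {
      errors.push(`${stage.agent}: input "${stage.input}" is not produced by an earlier stage`);
    }
    available.add(stage.output);
  }
  return errors;
}

const stages = [
  { agent: "data-extractor", input: "raw_data_url", output: "extracted_data" },
  { agent: "data-transformer", input: "extracted_data", output: "transformed_data" },
  { agent: "data-validator", input: "transformed_data", output: "validated_data" },
  { agent: "report-generator", input: "validated_data", output: "final_report" },
];

console.log(validatePipeline(stages, ["raw_data_url"])); // [] (no wiring errors)
```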

Error Handling in Multi-Agent Systems

The most critical concern in multi-agent environments is resilience against partial failures.

// Orchestration with error recovery
const MAX_RATE_LIMIT_RETRIES = 5;

const runWithRecovery = async (orchestratorSessionId, task, retryCount = 0) => {
  try {
    const result = await client.agents.sessions.run(
      orchestratorSessionId,
      { messages: [{ role: "user", content: task }] }
    );
    
    // Check each sub-agent result individually
    for (const subResult of result.sub_agent_results || []) {
      if (subResult.status === "failed") {
        console.warn(
          `Sub-agent ${subResult.agent_name} failed: ${subResult.error}`
        );
        // Retry the failed sub-agent's task
        await client.agents.sessions.retry_sub_agent(
          orchestratorSessionId,
          subResult.agent_name,
          { max_retries: 2 }
        );
      }
    }
    
    return result;
  } catch (error) {
    if (error.status === 429 && retryCount < MAX_RATE_LIMIT_RETRIES) {
      // Rate limit: exponential backoff (1s, 2s, 4s, ...), bounded by MAX_RATE_LIMIT_RETRIES
      const waitMs = Math.pow(2, retryCount) * 1000;
      await new Promise(r => setTimeout(r, waitMs));
      return runWithRecovery(orchestratorSessionId, task, retryCount + 1);
    }
    throw error;
  }
};
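The rate-limit branch above doubles the wait on each retry. In practice the delay is usually capped, and random jitter is often added so that many clients do not retry in lockstep. The helper below sketches both; the default base and cap values are assumptions, not platform recommendations.

```javascript
// Exponential backoff with a cap and optional full jitter.
// attempt is zero-based; pass a random function (e.g. Math.random) to enable jitter.
function backoffDelayMs(attempt, { baseMs = 1000, capMs = 30000, random = null } = {}) {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter: pick a delay uniformly in [0, exponential)
  return random ? Math.floor(random() * exponential) : exponential;
}

console.log([0, 1, 2, 3, 4, 5].map((attempt) => backoffDelayMs(attempt)));
// Delays of 1000, 2000, 4000, 8000, 16000, 30000 ms (the last one is capped)
```

Passing `{ random: Math.random }` spreads retries across the window, which tends to matter most when many sessions hit the same rate limit at once.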

Observability with OpenTelemetry

Visibility into agent behavior is essential for production systems. Managed Agents includes native OpenTelemetry integration.

Trace Configuration

// OpenTelemetry trace setup
const agent = await client.agents.create({
  name: "monitored-agent",
  model: "claude-sonnet-4-6",
  instructions: "Production-monitored agent",
  observability: {
    tracing: {
      enabled: true,
      exporter: {
        type: "otlp",
        endpoint: "https://otel-collector.internal:4317",
        headers: {
          // Template literal so the token is actually interpolated from the environment
          "Authorization": `Bearer ${process.env.OTEL_TOKEN}`
        }
      },
      // Events to trace
      events: [
        "session.start",
        "session.end",
        "tool.call",
        "tool.result",
        "sub_agent.spawn",
        "sub_agent.complete",
        "checkpoint.save",
        "checkpoint.restore",
        "error"
      ],
      // Sampling rate (10–20% recommended for production)
      sampling_rate: 0.1
    },
    // Metrics collection
    metrics: {
      enabled: true,
      interval_seconds: 60,
      custom_metrics: [
        "tokens_used",
        "runtime_seconds",
        "tools_called",
        "sub_agents_spawned"
      ]
    }
  }
});

Production Monitoring Dashboard Design

For agent systems, prioritize tracking these metrics:

Response latency: P50 / P95 / P99 distributions. Sudden increases signal memory bloat or tool failures
Token consumption: Input/output tokens per session. Directly impacts cost forecasting and anomaly detection
Runtime duration: Active agent time. The most critical metric since it directly affects billing
Error rate: Frequency of tool call failures, timeouts, and rate limits
Checkpoint restore rate: Indicates whether automatic failure recovery is functioning properly
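For the latency metric, P50/P95/P99 can be computed from raw samples with the nearest-rank method. The sketch below uses made-up latency values purely to show the calculation; in production these numbers would normally come from your metrics backend rather than hand-rolled code.

```javascript
// Nearest-rank percentile over a list of latency samples (milliseconds).
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Illustrative samples: mostly fast responses with two slow outliers
const latenciesMs = [120, 180, 150, 900, 200, 170, 160, 140, 2500, 190];

console.log(percentile(latenciesMs, 50)); // 170
console.log(percentile(latenciesMs, 95)); // 2500
console.log(percentile(latenciesMs, 99)); // 2500
```

The gap between P50 and P95 here shows why averages alone hide tail latency: the median looks healthy while the slowest requests are more than ten times slower.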

Cost Optimization Strategies

Managed Agents billing has a dual structure: Claude model usage fees plus agent runtime at $0.08/hour. Production environments need to optimize both.

Reducing Runtime Costs

// Cost optimization pattern 1: Minimize idle time
const session = await client.agents.sessions.create({
  agent_id: agent.id,
  lifecycle: {
    // Idle detection: auto-suspend after specified inactivity
    idle_timeout_seconds: 120,
    // Suspend with checkpoint to stop billing
    on_idle: "suspend_with_checkpoint",
    // Auto-resume when new messages arrive
    on_resume: "restore_from_checkpoint"
  }
});
 
// Cost optimization pattern 2: Batch processing to consolidate runtime
const batchRun = await client.agents.batch.create({
  agent_id: agent.id,
  tasks: [
    { id: "task-1", content: "Generate Report A" },
    { id: "task-2", content: "Generate Report B" },
    { id: "task-3", content: "Generate Report C" }
  ],
  // Sequential processing in same session (one session startup cost)
  execution: "sequential_in_session",
  // Lower priority for cost savings (potential 50% off)
  priority: "low"
});
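To see why the idle timeout matters, the sketch below compares billed runtime under two timeout settings. It assumes that runtime is billed while a session is working or idling and stops once the session is suspended; the article states only the $0.08/hour rate, so treat this billing model and the `billedSeconds` helper as assumptions.

```javascript
// Assumed billing model: active time is always billed, and idle time is billed
// until either the next task starts or the idle timeout suspends the session.
function billedSeconds(activeIntervals, idleTimeoutSeconds) {
  let total = 0;
  for (let i = 0; i < activeIntervals.length; i++) {
    const { start, end } = activeIntervals[i];
    total += end - start; // active work
    const next = activeIntervals[i + 1];
    const gap = next ? next.start - end : Infinity; // idle time until the next task
    total += Math.min(gap, idleTimeoutSeconds); // idle time billed before suspend
  }
  return total;
}

// Three 10-minute tasks, each followed by a 1-hour gap (times in seconds)
const activeIntervals = [
  { start: 0, end: 600 },
  { start: 4200, end: 4800 },
  { start: 8400, end: 9000 },
];

const RATE_PER_HOUR = 0.08;
const shortTimeout = billedSeconds(activeIntervals, 120);  // 2160 seconds
const longTimeout = billedSeconds(activeIntervals, 3600);  // 12600 seconds
console.log(((shortTimeout / 3600) * RATE_PER_HOUR).toFixed(3)); // 0.048
console.log(((longTimeout / 3600) * RATE_PER_HOUR).toFixed(3));  // 0.280
```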

Reducing Token Costs

// Cost optimization pattern 3: Leverage prompt caching
const agent = await client.agents.create({
  name: "cost-optimized-agent",
  model: "claude-sonnet-4-6",
  instructions: "Cost-optimized agent",
  optimization: {
    // Prompt caching: cache common system prompts
    prompt_caching: {
      enabled: true,
      // Cache TTL in seconds (cached input tokens cost about 90% less when reused within 5 minutes)
      ttl_seconds: 300
    },
    // Context compression: auto-summarize long conversation histories
    context_compression: {
      enabled: true,
      trigger_tokens: 50000, // Start compressing above 50K tokens
      target_tokens: 20000   // Compress to 20K tokens
    },
    // Model selection optimization
    model_routing: {
      // Use Haiku for simple tasks
      simple_tasks: "claude-haiku-4-5",
      // Use Sonnet for complex tasks
      complex_tasks: "claude-sonnet-4-6",
      // Complexity threshold
      complexity_threshold: 0.7
    }
  }
});

Concrete Cost Estimation

Let's estimate monthly costs for a sales report agent that runs 10 times per day with an average of 15 minutes per run.

Runtime cost: 10 runs × 0.25 hours × $0.08/h × 30 days = $6/month
Model usage (Sonnet 4.6, ~10K tokens/run): 10 runs × $0.03 × 30 days = $9/month
Total: approximately $15/month

With idle time minimization and batch processing, you can cut runtime by about 40%. Adding Haiku routing for simple tasks reduces model costs by roughly 50%, bringing the total down to $8–9/month.
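The arithmetic above can be captured in a small function so the estimate is easy to rerun with your own numbers. This is a sketch using only the figures stated in this section; the 40% runtime reduction and 50% model-cost reduction are the article's rough estimates, not guarantees.

```javascript
// Monthly cost estimate: runtime billing plus model usage.
function estimateMonthlyCost({ runsPerDay, hoursPerRun, runtimeRatePerHour, modelCostPerRun, days = 30 }) {
  const runtime = runsPerDay * hoursPerRun * runtimeRatePerHour * days;
  const model = runsPerDay * modelCostPerRun * days;
  return { runtime, model, total: runtime + model };
}

const baseline = estimateMonthlyCost({
  runsPerDay: 10,
  hoursPerRun: 0.25,
  runtimeRatePerHour: 0.08,
  modelCostPerRun: 0.03,
});

// Apply the estimated savings: 40% less runtime, 50% lower model cost
const optimizedTotal = baseline.runtime * 0.6 + baseline.model * 0.5;

console.log(baseline.runtime.toFixed(2)); // 6.00
console.log(baseline.model.toFixed(2));   // 9.00
console.log(baseline.total.toFixed(2));   // 15.00
console.log(optimizedTotal.toFixed(2));   // 8.10
```

The optimized figure of $8.10 corresponds to a reduction of about 46% from the $15 baseline.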

Enterprise Integration Patterns

Here are practical patterns for integrating Managed Agents into existing enterprise systems.

Identity Provider Integration

// OAuth 2.0 / SAML integration example
const enterpriseAgent = await client.agents.create({
  name: "enterprise-assistant",
  model: "claude-sonnet-4-6",
  instructions: "Internal company assistant agent",
  auth: {
    // Integrate with corporate IdP
    provider: "oauth2",
    issuer: "https://auth.example.com",
    audience: "managed-agents",
    // Propagate user tokens to agent sessions
    user_token_propagation: true,
    // RBAC: restrict agent permissions based on user roles
    role_mapping: {
      "admin": { data_access: "full", can_spawn_agents: true },
      "manager": { data_access: "department", can_spawn_agents: false },
      "member": { data_access: "own", can_spawn_agents: false }
    }
  }
});
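When mapping IdP roles to agent permissions, unknown roles should fall back to the most restrictive settings rather than failing open. The helper below sketches that lookup; `resolveRolePermissions` and the `"none"` access level are assumptions for illustration and do not appear in the configuration above.

```javascript
// Hypothetical role lookup with a default-deny fallback.
// hasOwnProperty guards against inherited keys such as "constructor".
function resolveRolePermissions(roleMapping, role) {
  if (Object.prototype.hasOwnProperty.call(roleMapping, role)) {
    return roleMapping[role];
  }
  return { data_access: "none", can_spawn_agents: false }; // assumed restrictive default
}

const roleMapping = {
  admin: { data_access: "full", can_spawn_agents: true },
  manager: { data_access: "department", can_spawn_agents: false },
  member: { data_access: "own", can_spawn_agents: false },
};

console.log(resolveRolePermissions(roleMapping, "manager").data_access);    // department
console.log(resolveRolePermissions(roleMapping, "contractor").data_access); // none
```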

Audit Logging and Compliance

Regulated industries like finance, healthcare, and government require audit logging of all agent actions.

// Audit logging configuration
const complianceAgent = await client.agents.create({
  name: "compliance-agent",
  model: "claude-sonnet-4-6",
  instructions: "Compliance-ready agent",
  audit: {
    enabled: true,
    // Encrypt and store all inputs/outputs
    log_messages: true,
    log_tool_calls: true,
    log_tool_results: true,
    // Storage destination (S3-compatible)
    destination: {
      type: "s3",
      bucket: "audit-logs-managed-agents",
      prefix: "compliance/",
      encryption: "AES-256"
    },
    // Retention period (set to match regulatory requirements)
    retention_days: 2555, // 7 years
    // PII masking
    pii_masking: {
      enabled: true,
      fields: ["email", "phone", "ssn", "credit_card"]
    }
  }
});
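As a rough picture of what field-based PII masking does, the sketch below masks email addresses and US Social Security numbers with regular expressions. The patterns are deliberately simple assumptions and will miss many real-world formats; this is not how the platform implements `pii_masking`.

```javascript
// Simplified patterns for illustration; real PII detection needs far more care.
const PII_PATTERNS = {
  email: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g,
  ssn: /\b\d{3}-\d{2}-\d{4}\b/g,
};

// Replaces each match of the selected fields with a placeholder such as [EMAIL].
function maskPii(text, fields = Object.keys(PII_PATTERNS)) {
  let output = text;
  for (const field of fields) {
    const pattern = PII_PATTERNS[field];
    if (pattern) output = output.replace(pattern, `[${field.toUpperCase()}]`);
  }
  return output;
}

console.log(maskPii("Contact jane.doe@example.com, SSN 123-45-6789."));
// Contact [EMAIL], SSN [SSN].
```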

Summary

Running Claude Managed Agents in production requires systematically designing five pillars: sandbox configuration, persistent memory architecture, credential management, multi-agent orchestration, and cost optimization.

The most critical aspects are building in security and observability from the start, and leveraging checkpoint mechanisms to create failure-resilient systems. On the cost side, combining idle time minimization, batch processing, and model routing can reduce monthly expenses by 40–60%.

While Managed Agents is still in public beta, major services like Notion, Rakuten, and Asana have already begun adoption. Building architectural expertise now will give you a competitive advantage when the platform reaches general availability.
