API & SDK / 2026-05-08 / Intermediate

Type-Safe Claude API Tool Calling with Zod: Patterns for TypeScript Developers

How to implement Claude API tool calling with TypeScript and Zod for full type safety. Covers schema-to-API conversion, runtime validation, and three common pitfalls with practical code examples.

Tags: claude-api, typescript, tool-use, zod, type-safe

When I first started implementing Claude API tool calling, my codebase was full of as any casts.

The input field in a tool_use block arrives as Record<string, unknown>, and every handler I wrote needed repeated type assertions to get anything useful out of it. TypeScript was technically in use, but the parts that mattered most—where the LLM returns structured data—were effectively untyped.

The real problem wasn't boilerplate. It was that validation errors only surfaced at runtime, in production, after the LLM returned something unexpected.

Using Zod solved this. A single schema definition gives you three things at once: the Claude API tool definition, the TypeScript type, and runtime validation. This article walks through that implementation, along with three specific pitfalls that tripped me up along the way.

Why Zod Works Here

TypeScript's type system operates at compile time. LLM outputs are generated at runtime—so TypeScript types alone can't protect you from malformed tool inputs.

Zod bridges that gap. With z.infer<typeof Schema>, TypeScript types are derived automatically from the schema, eliminating the need to maintain them separately. And with safeParse, you can validate the actual value the LLM returns before acting on it.
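To make the compile-time versus runtime gap concrete, here is a minimal sketch. SearchInputSchema is a hypothetical schema used only for this illustration; it is not part of the weather example in this article. A type cast satisfies the compiler, while safeParse looks at the value that actually arrived.

```typescript
import { z } from "zod";

// Hypothetical schema, used only for this illustration
const SearchInputSchema = z.object({
  query: z.string(),
  limit: z.number().int().positive(),
});

// The static type is derived from the schema rather than written by hand
type SearchInput = z.infer<typeof SearchInputSchema>;

// Simulates what a model might send: limit arrives as a string
const fromModel: unknown = JSON.parse('{"query": "zod", "limit": "10"}');

// A cast keeps the compiler quiet but checks nothing at runtime
const trusted = fromModel as SearchInput;
const castLooksFine = typeof trusted.limit === "number";

// safeParse inspects the actual value and reports the mismatch
const checked = SearchInputSchema.safeParse(fromModel);
const validationPassed = checked.success;
```

Here castLooksFine and validationPassed both end up false: the cast let a string through where the type promised a number, and only the schema check noticed.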

The other benefit: using zod-to-json-schema, the same Zod schema can be converted into the JSON Schema format that Claude API's input_schema field requires. One definition, three artifacts.

Installation and Basic Setup

npm install zod zod-to-json-schema @anthropic-ai/sdk

Here's a minimal working example with a weather tool:

import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic();
 
// Define tool input with Zod
const GetWeatherSchema = z.object({
  location: z.string().describe("City name (e.g., Tokyo, New York)"),
  unit: z
    .enum(["celsius", "fahrenheit"])
    .optional()
    .nullable() // the model may send null instead of omitting the field (see Pitfall #1)
    .describe("Temperature unit. Defaults to celsius"),
});
 
// TypeScript type derived automatically—no duplication
type GetWeatherInput = z.infer<typeof GetWeatherSchema>;
 
// Convert to Claude API's expected format
const getWeatherTool: Anthropic.Tool = {
  name: "get_weather",
  description: "Get the current weather for a given city",
  input_schema: zodToJsonSchema(GetWeatherSchema, {
    $refStrategy: "none", // Critical: inline-expand all $ref nodes
  }) as Anthropic.Tool["input_schema"],
};

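The $refStrategy: "none" option matters as soon as a schema reuses a sub-schema. By default, zod-to-json-schema emits a $ref pointer when the same sub-schema appears more than once. In my experience the Claude API does not always resolve these pointers as intended, and the failure is quiet: the tool definition is accepted, but the referenced part behaves as if it were untyped. Setting the option to "none" inlines every sub-schema, which sidesteps the problem.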

Runtime Validation of Tool Call Results

When the LLM invokes a tool, tool_use.input is typed as Record<string, unknown>. Validation should happen before any handler logic runs:

async function processToolCall(
  toolUseBlock: Anthropic.ToolUseBlock
): Promise<string> {
  if (toolUseBlock.name === "get_weather") {
    const result = GetWeatherSchema.safeParse(toolUseBlock.input);
 
    if (!result.success) {
      // Log structured validation errors for debugging
      console.error("Tool input validation failed:", result.error.format());
      return JSON.stringify({
        error: "Invalid input format",
        details: result.error.format(),
      });
    }
 
    // result.data is now fully typed as GetWeatherInput
    const { location, unit } = result.data;
    // ?? covers both an omitted unit (undefined) and an explicit null
    return JSON.stringify(await fetchWeather(location, unit ?? "celsius"));
  }
 
  return JSON.stringify({ error: "Unknown tool" });
}

safeParse vs parse: parse throws a ZodError on failure. In tool execution contexts where you want to return a structured error response rather than crash, safeParse is easier to work with. The result.error.format() output is also useful for understanding exactly what value the LLM passed—handy for prompt debugging.
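The difference is easiest to see side by side. The following is a small sketch with a hypothetical InputSchema, not the article's weather tool: parse forces a try/catch, while safeParse hands back a value you can branch on and turn into a tool response.

```typescript
import { z } from "zod";

// Hypothetical schema for illustration
const InputSchema = z.object({ location: z.string() });

// What the model sent: the required field is missing
const badInput: unknown = {};

// parse: failure surfaces as an exception that has to be caught
let threw = false;
try {
  InputSchema.parse(badInput);
} catch (err) {
  threw = err instanceof z.ZodError;
}

// safeParse: failure is an ordinary value
const result = InputSchema.safeParse(badInput);
const toolResponse = result.success
  ? JSON.stringify({ ok: true })
  : JSON.stringify({
      error: "Invalid input format",
      // format() nests messages under each field name in an _errors array
      details: result.error.format(),
    });
```

In the failure case, details.location._errors carries the message for the missing field, which is the part worth logging when you are trying to work out what the model actually sent.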

Pitfall #1: optional and nullable Are Different

When Claude calls a tool with an optional field, it sometimes passes null rather than omitting the field entirely. Zod's optional() allows undefined but not null—so null from the LLM will fail validation.

// ⚠️ Fails when LLM passes null for optional fields
const UnsafeSchema = z.object({
  description: z.string().optional(),
});
 
// ✅ Handle both undefined and null
const SafeSchema = z.object({
  description: z.string().optional().nullable().describe("Optional description"),
});

For any field that might be omitted, .optional().nullable() is the safe default. I discovered this the hard way after intermittent validation failures that were hard to reproduce locally—LLMs don't consistently choose between omission and null.
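A quick way to confirm the behavior locally is to run both variants against an omitted field and an explicit null. This sketch uses Zod's .nullish(), which is shorthand for .optional().nullable(); the normalize helper is a hypothetical addition that collapses null to undefined so downstream code only has to handle one kind of "absent".

```typescript
import { z } from "zod";

const OptionalOnly = z.object({ description: z.string().optional() });
// .nullish() is equivalent to .optional().nullable()
const Nullish = z.object({ description: z.string().nullish() });

const omitted = {};
const explicitNull = { description: null };

const optionalAcceptsOmitted = OptionalOnly.safeParse(omitted).success;
const optionalAcceptsNull = OptionalOnly.safeParse(explicitNull).success;
const nullishAcceptsNull = Nullish.safeParse(explicitNull).success;

// Hypothetical helper: drop the key entirely when the value is null or undefined
function normalize(input: z.infer<typeof Nullish>): { description?: string } {
  return input.description == null ? {} : { description: input.description };
}
```

Only optionalAcceptsNull comes out false, which is exactly the intermittent failure described above.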

Pitfall #2: Nested union Types Are Fragile

Schemas with z.union() often convert to anyOf in JSON Schema, and LLMs can return values that partially match multiple branches, causing validation failures.

// ⚠️ LLMs sometimes return values that don't cleanly match either branch
const ProblematicSchema = z.object({
  action: z.union([
    z.object({ type: z.literal("search"), query: z.string() }),
    z.object({ type: z.literal("fetch"), id: z.number() }),
  ]),
});
 
// ✅ Flatten into a single object with optional fields
const StableSchema = z.object({
  type: z.enum(["search", "fetch"]).describe("Action to perform"),
  query: z
    .string()
    .optional()
    .nullable()
    .describe("Search query (required when type=search)"),
  id: z
    .number()
    .optional()
    .nullable()
    .describe("Target ID (required when type=fetch)"),
});

Flattening union types into optional fields on a single object removes the ambiguity. The LLM just fills in the fields relevant to the selected action type.
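One cost of flattening is that "required when type=search" now lives only in a description string, so nothing enforces it. A possible way to restore the check on your side is superRefine, sketched below. ActionSchema and its error messages are my own illustration, not part of the original schemas.

```typescript
import { z } from "zod";

// Flattened shape plus a cross-field check that runs after the basic types pass
const ActionSchema = z
  .object({
    type: z.enum(["search", "fetch"]),
    query: z.string().nullish(),
    id: z.number().nullish(),
  })
  .superRefine((value, ctx) => {
    if (value.type === "search" && value.query == null) {
      ctx.addIssue({
        code: "custom",
        path: ["query"],
        message: "query is required when type is search",
      });
    }
    if (value.type === "fetch" && value.id == null) {
      ctx.addIssue({
        code: "custom",
        path: ["id"],
        message: "id is required when type is fetch",
      });
    }
  });

const validSearch = ActionSchema.safeParse({ type: "search", query: "zod" }).success;
const searchWithoutQuery = ActionSchema.safeParse({ type: "search" }).success;
const fetchWithoutId = ActionSchema.safeParse({ type: "fetch", query: "x" }).success;
```

The refinement is not part of the JSON Schema sent to the API, so the model still learns the rule only from the descriptions. The check exists to catch the cases where it ignores them.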

Pitfall #3: $ref Not Inline-Expanded

This one is easy to miss because the error isn't always obvious. If you forget $refStrategy: "none" and your schema has any internal references (common with reused sub-schemas), the tool definition may silently pass validation but behave unexpectedly during tool use.

Always include this option:

zodToJsonSchema(YourSchema, { $refStrategy: "none" })
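If you want to check whether one of your own schemas is affected, a small script can compare the two outputs. This is a sketch using a hypothetical TripSchema that reuses one Address sub-schema twice; with the default settings I would expect the second occurrence to be emitted as a $ref, but treat that as something to verify against your installed versions rather than a guarantee.

```typescript
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// Hypothetical schema that reuses the same sub-schema in two places
const Address = z.object({ city: z.string() });
const TripSchema = z.object({ origin: Address, destination: Address });

// Serialize both variants so they can be searched for $ref pointers
const withDefaults = JSON.stringify(zodToJsonSchema(TripSchema));
const inlined = JSON.stringify(
  zodToJsonSchema(TripSchema, { $refStrategy: "none" })
);

const defaultUsesRef = withDefaults.includes('"$ref"');
const inlinedUsesRef = inlined.includes('"$ref"');

console.log({ defaultUsesRef, inlinedUsesRef });
```

A check like inlinedUsesRef === false also works as a cheap unit test over every tool definition, so a forgotten option is caught before it reaches the API.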

A Reusable Tool Factory

As the number of tools grows, keeping schema definitions and handlers in sync becomes error-prone. A typed factory function makes the relationship explicit:

type ToolHandler<T extends z.ZodTypeAny> = (
  input: z.infer<T>
) => Promise<string>;
 
function defineTool<T extends z.ZodObject<z.ZodRawShape>>(config: {
  name: string;
  description: string;
  schema: T;
  handler: ToolHandler<T>;
}) {
  return {
    definition: {
      name: config.name,
      description: config.description,
      input_schema: zodToJsonSchema(config.schema, {
        $refStrategy: "none",
      }) as Anthropic.Tool["input_schema"],
    },
    execute: async (rawInput: unknown): Promise<string> => {
      const result = config.schema.safeParse(rawInput);
      if (!result.success) {
        console.error(`[${config.name}] Validation error:`, result.error.format());
        return JSON.stringify({ error: "Invalid input", details: result.error.format() });
      }
      return config.handler(result.data);
    },
  };
}
 
// Define each tool in one place
const weatherTool = defineTool({
  name: "get_weather",
  description: "Get current weather for a city",
  schema: GetWeatherSchema,
  handler: async ({ location, unit }) =>
    JSON.stringify(await fetchWeather(location, unit ?? "celsius")),
});
 
// Wire everything together for the agent loop
const tools = [weatherTool];
const toolMap = new Map(tools.map((t) => [t.definition.name, t]));
 
async function runAgent(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
 
  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      tools: tools.map((t) => t.definition),
      messages,
    });
 
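    // Anything other than tool_use (end_turn, max_tokens, ...) ends the loop;
    // otherwise an unhandled stop reason would resend the same messages forever.
    if (response.stop_reason !== "tool_use") {
      const text = response.content.find((b) => b.type === "text");
      return text?.type === "text" ? text.text : "";
    }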
 
    if (response.stop_reason === "tool_use") {
      messages.push({ role: "assistant", content: response.content });
 
      const toolResults: Anthropic.ToolResultBlockParam[] = [];
      for (const block of response.content) {
        if (block.type !== "tool_use") continue;
 
        const tool = toolMap.get(block.name);
        const result = tool
          ? await tool.execute(block.input)
          : JSON.stringify({ error: `Unknown tool: ${block.name}` });
 
        toolResults.push({ type: "tool_result", tool_use_id: block.id, content: result });
      }
 
      messages.push({ role: "user", content: toolResults });
    }
  }
}

The handler in defineTool receives a fully typed, already-validated value. It can be unit tested without mocking the LLM: call execute with a test input, and the handler runs only if validation passes.
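The execute function above reports validation failures as an ordinary result string. The Messages API also accepts an is_error flag on tool_result blocks, which tells the model the call failed. Below is a sketch of one way to wire that in. ToolResult is a local stand-in for the SDK's block type, and ToolOutcome and toToolResult are hypothetical names introduced here, not part of the SDK or the factory above.

```typescript
// Local stand-in for the tool_result block shape (only the fields used here)
interface ToolResult {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Hypothetical outcome type: execute would return this instead of a bare string
type ToolOutcome =
  | { ok: true; output: string }
  | { ok: false; error: string };

// Hypothetical helper that maps an outcome to a tool_result block
function toToolResult(toolUseId: string, outcome: ToolOutcome): ToolResult {
  if (outcome.ok) {
    return { type: "tool_result", tool_use_id: toolUseId, content: outcome.output };
  }
  return {
    type: "tool_result",
    tool_use_id: toolUseId,
    content: JSON.stringify({ error: outcome.error }),
    is_error: true, // marks the result as a failure
  };
}

const success = toToolResult("toolu_1", { ok: true, output: "{\"temperature\":22}" });
const failure = toToolResult("toolu_2", { ok: false, error: "Invalid input" });
```

Whether this is worth the extra type depends on how often your tools fail. For a handful of tools, the plain string approach in the factory is simpler to read.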

What Changed After Adopting This Pattern

The main win wasn't eliminating runtime errors, though that improved. It was the debugging experience. When validation fails, result.error.format() tells you exactly what the LLM passed and why it didn't match the schema. That information is useful both for diagnosing broken behavior and for refining prompts.

Getting Started

If you have existing tool definitions, the lowest-friction path is converting them one at a time: replace the input_schema object with a Zod schema, add safeParse to the handler, and let z.infer replace your manually written input types.

The defineTool factory is worth setting up before you have many tools rather than after—retrofitting is more disruptive than starting with it. For tool-heavy agents, the investment pays off quickly.

Testing Tool Handlers in Isolation

One underrated benefit of the defineTool pattern: handlers are testable without any LLM involvement. Since safeParse happens inside execute, and handler only runs with already-validated data, you can write focused unit tests for each tool's logic.

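import { describe, it, expect, vi } from "vitest";
// Assumed module path: wherever weatherTool is defined and exported
import { weatherTool } from "./tools";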
 
// Mock the actual API call, but keep the Zod validation real
vi.mock("./weather-api", () => ({
  fetchWeather: vi.fn().mockResolvedValue({
    temperature: 22,
    condition: "sunny",
  }),
}));
 
describe("weatherTool", () => {
  it("returns weather data for a valid location", async () => {
    const result = await weatherTool.execute({
      location: "Tokyo",
      unit: "celsius",
    });
    const parsed = JSON.parse(result);
    expect(parsed.temperature).toBe(22);
  });
 
  it("rejects invalid input and returns a structured error", async () => {
    // location is required—passing undefined should fail validation
    const result = await weatherTool.execute({ unit: "celsius" });
    const parsed = JSON.parse(result);
    expect(parsed.error).toBe("Invalid input");
    expect(parsed.details).toBeDefined();
  });
 
  it("handles null unit gracefully", async () => {
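    // LLMs sometimes pass null for optional fields. This passes only when
    // unit is declared .optional().nullable(); with .optional() alone,
    // validation rejects null and temperature is undefined.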
    const result = await weatherTool.execute({
      location: "Osaka",
      unit: null,
    });
    const parsed = JSON.parse(result);
    expect(parsed.temperature).toBeDefined();
  });
});

These tests run fast, don't consume API credits, and catch schema regressions before deployment. The third test case—unit: null—is specifically the scenario that tripped me up in production. Adding it as an explicit test case makes sure it stays covered.

Alternatives Worth Knowing

If you prefer a more lightweight approach, TypeBox offers similar capabilities with a different API and generally smaller bundle size. The tradeoff is that TypeBox schemas use a builder pattern rather than chainable methods, which some teams find less ergonomic.

For projects already using Valibot, the @valibot/to-json-schema package serves the same conversion purpose. The pattern described here applies equally: the key is a schema library that covers both the TypeScript type side and the JSON Schema output side.

The specific library matters less than the pattern: define once, derive the API definition and TypeScript types from the same source, validate at runtime before processing.
