Type-Safe Claude API Tool Calling with Zod: Patterns for TypeScript Developers

When I first started implementing Claude API tool calling, my codebase was full of as any casts.

The input field in a tool_use block is typed as unknown in the TypeScript SDK, and every handler I wrote needed repeated type assertions to get anything useful out of it. TypeScript was technically in use, but the parts that mattered most, where the LLM returns structured data, were effectively untyped.

The real problem wasn't boilerplate. It was that validation errors only surfaced at runtime, in production, after the LLM returned something unexpected.

Using Zod solved this. A single schema definition gives you three things at once: the Claude API tool definition, the TypeScript type, and runtime validation. This article walks through that implementation, along with three specific pitfalls that tripped me up along the way.

Why Zod Works Here

TypeScript's type system operates at compile time. LLM outputs are generated at runtime—so TypeScript types alone can't protect you from malformed tool inputs.
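To make the gap concrete, here is a minimal sketch with no libraries involved. The WeatherInput type and the payload are illustrative assumptions, not part of any SDK; the only point is that a type assertion compiles while checking nothing.

```typescript
// Hypothetical input shape, standing in for a tool's expected arguments.
type WeatherInput = { location: string; unit?: "celsius" | "fahrenheit" };

// Simulates what might arrive at runtime: location is a number, unit is invalid.
const raw: unknown = JSON.parse('{"location": 42, "unit": "kelvin"}');

// The assertion satisfies the compiler, but no check happens at runtime.
const trusted = raw as WeatherInput;

// A hand-written guard performs the check the type system cannot.
function isWeatherInput(value: unknown): value is WeatherInput {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const unitOk =
    v.unit === undefined || v.unit === "celsius" || v.unit === "fahrenheit";
  return typeof v.location === "string" && unitOk;
}

console.log(typeof trusted.location); // "number", despite the declared type
console.log(isWeatherInput(raw)); // false
```

The guard works, but it has to be kept in sync with the type by hand, which is exactly the duplication a schema library removes.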

Zod bridges that gap. With z.infer<typeof Schema>, TypeScript types are derived automatically from the schema, eliminating the need to maintain them separately. And with safeParse, you can validate the actual value the LLM returns before acting on it.

The other benefit: using zod-to-json-schema, the same Zod schema can be converted into the JSON Schema format that Claude API's input_schema field requires. One definition, three artifacts.

Installation and Basic Setup

npm install zod zod-to-json-schema @anthropic-ai/sdk

Here's a minimal working example with a weather tool:

import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";
import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic();
 
// Define tool input with Zod
const GetWeatherSchema = z.object({
  location: z.string().describe("City name (e.g., Tokyo, New York)"),
  unit: z
    .enum(["celsius", "fahrenheit"])
    .optional()
    .nullable() // LLMs sometimes send null instead of omitting the field
    .describe("Temperature unit. Defaults to celsius"),
});
 
// TypeScript type derived automatically—no duplication
type GetWeatherInput = z.infer<typeof GetWeatherSchema>;
 
// Convert to Claude API's expected format
const getWeatherTool: Anthropic.Tool = {
  name: "get_weather",
  description: "Get the current weather for a given city",
  input_schema: zodToJsonSchema(GetWeatherSchema, {
    $refStrategy: "none", // Critical: inline-expand all $ref nodes
  }) as Anthropic.Tool["input_schema"],
};

The $refStrategy: "none" option is important. By default, zod-to-json-schema emits $ref pointers when the same sub-schema appears more than once in a definition. The Claude API JSON Schema parser can fail to resolve these correctly, in which case the affected fields are silently treated as unconstrained. Setting the option to "none" inlines every sub-schema, which is reliably handled.

Runtime Validation of Tool Call Results

When the LLM invokes a tool, the SDK types tool_use.input as unknown. Validation should happen before any handler logic runs:

async function processToolCall(
  toolUseBlock: Anthropic.ToolUseBlock
): Promise<string> {
  if (toolUseBlock.name === "get_weather") {
    const result = GetWeatherSchema.safeParse(toolUseBlock.input);
 
    if (!result.success) {
      // Log structured validation errors for debugging
      console.error("Tool input validation failed:", result.error.format());
      return JSON.stringify({
        error: "Invalid input format",
        details: result.error.format(),
      });
    }
 
    // result.data is now fully typed as GetWeatherInput
    // ?? covers both an omitted unit and an explicit null
    const { location, unit } = result.data;
    return JSON.stringify(await fetchWeather(location, unit ?? "celsius"));
  }
 
  return JSON.stringify({ error: "Unknown tool" });
}

safeParse vs parse: parse throws a ZodError on failure. In tool execution contexts where you want to return a structured error response rather than crash, safeParse is easier to work with. The result.error.format() output is also useful for understanding exactly what value the LLM passed—handy for prompt debugging.
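The Messages API also accepts an optional is_error flag on tool_result blocks, which tells the model the call failed instead of leaving it to infer that from the JSON body. The sketch below is one possible way to carry that flag through. ExecuteOutcome and toToolResult are hypothetical names, and ToolResult is a local stand-in for Anthropic.ToolResultBlockParam so that the snippet runs on its own.

```typescript
// Local stand-in for Anthropic.ToolResultBlockParam (only the fields used here).
type ToolResult = {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
};

// Hypothetical return shape for a tool executor: the payload plus a success flag.
type ExecuteOutcome = { ok: boolean; content: string };

// Builds the tool_result block, setting is_error only when the call failed.
function toToolResult(toolUseId: string, outcome: ExecuteOutcome): ToolResult {
  const block: ToolResult = {
    type: "tool_result",
    tool_use_id: toolUseId,
    content: outcome.content,
  };
  if (!outcome.ok) block.is_error = true;
  return block;
}

const failed = toToolResult("toolu_example", {
  ok: false,
  content: JSON.stringify({ error: "Invalid input format" }),
});
console.log(failed.is_error); // true
```

Whether the flag changes the model's behavior enough to matter is something to verify against your own prompts; the structured error body from safeParse is still worth returning either way.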

Pitfall #1: optional and nullable Are Different

When Claude calls a tool with an optional field, it sometimes passes null rather than omitting the field entirely. Zod's optional() allows undefined but not null—so null from the LLM will fail validation.

// ⚠️ Fails when LLM passes null for optional fields
const UnsafeSchema = z.object({
  description: z.string().optional(),
});
 
// ✅ Handle both undefined and null
const SafeSchema = z.object({
  description: z.string().optional().nullable().describe("Optional description"),
});

For any field that might be omitted, .optional().nullable() is the safe default (.nullish() is Zod's shorthand for the same thing). I discovered this the hard way, after intermittent validation failures that were difficult to reproduce locally: LLMs don't consistently choose between omission and null.
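If you would rather keep schemas on plain .optional(), an alternative is to strip null-valued keys before calling safeParse. This is a different technique from .optional().nullable(), sketched here under the assumption that null never carries meaning in your tool inputs. dropNullFields is a hypothetical helper, and it only looks at top-level keys.

```typescript
// Returns a shallow copy of a plain object with null-valued keys removed.
// Anything that is not a plain object (arrays, primitives, null) is returned as-is.
function dropNullFields(input: unknown): unknown {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return input;
  }
  const cleaned: Record<string, unknown> = {};
  for (const [key, value] of Object.entries(input as Record<string, unknown>)) {
    if (value !== null) cleaned[key] = value;
  }
  return cleaned;
}

// What the LLM sent versus what the schema would then see.
const fromModel = { location: "Osaka", unit: null };
console.log(JSON.stringify(dropNullFields(fromModel))); // {"location":"Osaka"}
```

The cleaned value would then go to safeParse in place of the raw input. The downside is that the JSON Schema sent to the API still does not advertise null as allowed, so this only papers over the mismatch on the receiving side.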

Pitfall #2: Nested union Types Are Fragile

Schemas with z.union() often convert to anyOf in JSON Schema, and LLMs can return values that partially match multiple branches, causing validation failures.

// ⚠️ LLMs sometimes return values that don't cleanly match either branch
const ProblematicSchema = z.object({
  action: z.union([
    z.object({ type: z.literal("search"), query: z.string() }),
    z.object({ type: z.literal("fetch"), id: z.number() }),
  ]),
});
 
// ✅ Flatten into a single object with optional fields
const StableSchema = z.object({
  type: z.enum(["search", "fetch"]).describe("Action to perform"),
  query: z
    .string()
    .optional()
    .nullable()
    .describe("Search query (required when type=search)"),
  id: z
    .number()
    .optional()
    .nullable()
    .describe("Target ID (required when type=fetch)"),
});

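Flattening union types into optional fields on a single object removes the ambiguity. The LLM just fills in the fields relevant to the selected action type. The tradeoff is that the "required when" rules now live only in the descriptions, so Zod no longer enforces them and the handler has to check that query or id is actually present.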
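One way to restore the per-action guarantees after safeParse succeeds is a small narrowing step in the handler. The sketch below is an assumption about how that could look, not something the schema gives you for free. FlatAction mirrors what z.infer would produce for StableSchema, and narrowAction and the result shape are hypothetical names.

```typescript
// Mirrors the inferred type of the flattened schema.
type FlatAction = {
  type: "search" | "fetch";
  query?: string | null;
  id?: number | null;
};

// The precise shape the handler actually wants to work with.
type Action =
  | { type: "search"; query: string }
  | { type: "fetch"; id: number };

type Narrowed = { ok: true; action: Action } | { ok: false; error: string };

// Enforces the "required when type=..." rules that flattening moved into descriptions.
function narrowAction(input: FlatAction): Narrowed {
  if (input.type === "search") {
    if (typeof input.query !== "string" || input.query.length === 0) {
      return { ok: false, error: "query is required when type=search" };
    }
    return { ok: true, action: { type: "search", query: input.query } };
  }
  if (typeof input.id !== "number") {
    return { ok: false, error: "id is required when type=fetch" };
  }
  return { ok: true, action: { type: "fetch", id: input.id } };
}

console.log(narrowAction({ type: "search", query: null }));
// { ok: false, error: "query is required when type=search" }
```

Returning the error string to the model as the tool result gives it a chance to retry with the missing field filled in.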

Pitfall #3: $ref Not Inline-Expanded

This one is easy to miss because the error isn't always obvious. If you forget $refStrategy: "none" and your schema has any internal references (common with reused sub-schemas), the tool definition may silently pass validation but behave unexpectedly during tool use.

Always include this option:

zodToJsonSchema(YourSchema, { $refStrategy: "none" })
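Because the failure mode is silent, a small guard in the test suite can help. The sketch below walks any JSON Schema object and reports where $ref keys appear. findRefPaths is a hypothetical helper, and the two sample schemas are hand-written for illustration rather than actual zod-to-json-schema output.

```typescript
// Recursively collects the JSON-pointer-style path of every "$ref" key.
// Caveat: a tool property literally named "$ref" would also be reported.
function findRefPaths(node: unknown, path: string = "#"): string[] {
  if (typeof node !== "object" || node === null) return [];
  const found: string[] = [];
  // Object.entries also covers arrays, yielding index keys ("0", "1", ...).
  for (const [key, value] of Object.entries(node as Record<string, unknown>)) {
    if (key === "$ref") {
      found.push(`${path}/${key}`);
    } else {
      found.push(...findRefPaths(value, `${path}/${key}`));
    }
  }
  return found;
}

// Hand-written sample: the shared address shape is repeated inline.
const inlined = {
  type: "object",
  properties: {
    home: { type: "object", properties: { city: { type: "string" } } },
    work: { type: "object", properties: { city: { type: "string" } } },
  },
};

// Hand-written sample: the second use points back at the first.
const withRef = {
  type: "object",
  properties: {
    home: { type: "object", properties: { city: { type: "string" } } },
    work: { $ref: "#/properties/home" },
  },
};

console.log(findRefPaths(inlined)); // []
console.log(findRefPaths(withRef)); // ["#/properties/work/$ref"]
```

Asserting that findRefPaths returns an empty array for every tool's input_schema turns a forgotten option into a failing test instead of a production surprise.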

A Reusable Tool Factory

As the number of tools grows, keeping schema definitions and handlers in sync becomes error-prone. A typed factory function makes the relationship explicit:

type ToolHandler<T extends z.ZodTypeAny> = (
  input: z.infer<T>
) => Promise<string>;
 
function defineTool<T extends z.ZodObject<z.ZodRawShape>>(config: {
  name: string;
  description: string;
  schema: T;
  handler: ToolHandler<T>;
}) {
  return {
    definition: {
      name: config.name,
      description: config.description,
      input_schema: zodToJsonSchema(config.schema, {
        $refStrategy: "none",
      }) as Anthropic.Tool["input_schema"],
    },
    execute: async (rawInput: unknown): Promise<string> => {
      const result = config.schema.safeParse(rawInput);
      if (!result.success) {
        console.error(`[${config.name}] Validation error:`, result.error.format());
        return JSON.stringify({ error: "Invalid input", details: result.error.format() });
      }
      return config.handler(result.data);
    },
  };
}
 
// Define each tool in one place
const weatherTool = defineTool({
  name: "get_weather",
  description: "Get current weather for a city",
  schema: GetWeatherSchema,
  handler: async ({ location, unit }) =>
    JSON.stringify(await fetchWeather(location, unit ?? "celsius")),
});
 
// Wire everything together for the agent loop
const tools = [weatherTool];
const toolMap = new Map(tools.map((t) => [t.definition.name, t]));
 
async function runAgent(userMessage: string): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
 
  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-6",
      max_tokens: 1024,
      tools: tools.map((t) => t.definition),
      messages,
    });
 
    if (response.stop_reason === "end_turn") {
      // Type predicate narrows the content block union to TextBlock
      const text = response.content.find(
        (b): b is Anthropic.TextBlock => b.type === "text"
      );
      return text?.text ?? "";
    }
 
    if (response.stop_reason === "tool_use") {
      messages.push({ role: "assistant", content: response.content });
 
      const toolResults: Anthropic.ToolResultBlockParam[] = [];
      for (const block of response.content) {
        if (block.type !== "tool_use") continue;
 
        const tool = toolMap.get(block.name);
        const result = tool
          ? await tool.execute(block.input)
          : JSON.stringify({ error: `Unknown tool: ${block.name}` });
 
        toolResults.push({ type: "tool_result", tool_use_id: block.id, content: result });
      }
 
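      messages.push({ role: "user", content: toolResults });
    } else {
      // max_tokens, stop_sequence, etc.: fail loudly instead of re-sending
      // the same messages forever
      throw new Error(`Unexpected stop_reason: ${response.stop_reason}`);
    }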
  }
}

The handler in defineTool receives a fully typed, already-validated value. Each tool can be unit tested without mocking the LLM: call execute with a plain object, or define the handler as a named function and call it directly with a typed input.

What Changed After Adopting This Pattern

The main win wasn't eliminating runtime errors, though that improved. It was the debugging experience. When validation fails, result.error.format() tells you exactly what the LLM passed and why it didn't match the schema. That information is useful both for diagnosing broken behavior and for refining prompts.
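The tree that format() returns is thorough but noisy in logs: every level carries an _errors array, with one nested object per field. A small flattener can turn it into one line per problem. This is a sketch under the assumption that the input has that nested _errors shape. flattenFormatted is a hypothetical helper, and the sample tree and its messages are hand-written for illustration.

```typescript
// Loose description of a format()-style tree: _errors at each level,
// plus one nested node per field that has problems.
type FormattedNode = { _errors?: string[]; [field: string]: unknown };

// Produces "path: message" lines, using dotted paths for nested fields.
function flattenFormatted(node: FormattedNode, path: string = ""): string[] {
  const lines: string[] = [];
  for (const message of node._errors ?? []) {
    lines.push(`${path || "(root)"}: ${message}`);
  }
  for (const [key, child] of Object.entries(node)) {
    if (key === "_errors" || typeof child !== "object" || child === null) continue;
    const childPath = path ? `${path}.${key}` : key;
    lines.push(...flattenFormatted(child as FormattedNode, childPath));
  }
  return lines;
}

// Hand-written sample in the same nested shape.
const sampleTree: FormattedNode = {
  _errors: [],
  location: { _errors: ["Required"] },
  options: { _errors: [], unit: { _errors: ["Invalid enum value"] } },
};

console.log(flattenFormatted(sampleTree).join("\n"));
// location: Required
// options.unit: Invalid enum value
```

The one-line form is easier to grep across many failed calls, which is where patterns in what the LLM gets wrong tend to show up.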

Getting Started

If you have existing tool definitions, the lowest-friction path is converting them one at a time: replace the input_schema object with a Zod schema, add safeParse to the handler, and let z.infer replace your manually written input types.

The defineTool factory is worth setting up before you have many tools rather than after—retrofitting is more disruptive than starting with it. For tool-heavy agents, the investment pays off quickly.

Testing Tool Handlers in Isolation

One underrated benefit of the defineTool pattern: handlers are testable without any LLM involvement. Since safeParse happens inside execute, and handler only runs with already-validated data, you can write focused unit tests for each tool's logic.

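import { describe, it, expect, vi } from "vitest";
// Hypothetical path: import weatherTool from wherever it is defined
import { weatherTool } from "./tools";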
 
// Mock the actual API call, but keep the Zod validation real
vi.mock("./weather-api", () => ({
  fetchWeather: vi.fn().mockResolvedValue({
    temperature: 22,
    condition: "sunny",
  }),
}));
 
describe("weatherTool", () => {
  it("returns weather data for a valid location", async () => {
    const result = await weatherTool.execute({
      location: "Tokyo",
      unit: "celsius",
    });
    const parsed = JSON.parse(result);
    expect(parsed.temperature).toBe(22);
  });
 
  it("rejects invalid input and returns a structured error", async () => {
    // location is required, so omitting it should fail validation
    const result = await weatherTool.execute({ unit: "celsius" });
    const parsed = JSON.parse(result);
    expect(parsed.error).toBe("Invalid input");
    expect(parsed.details).toBeDefined();
  });
 
  it("handles null unit gracefully", async () => {
    // LLMs sometimes pass null for optional fields
    const result = await weatherTool.execute({
      location: "Osaka",
      unit: null,
    });
    const parsed = JSON.parse(result);
    expect(parsed.temperature).toBeDefined();
  });
});

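These tests run fast, don't consume API credits, and catch schema regressions before deployment. The third test case, unit: null, is the exact scenario that tripped me up in production. It passes only when unit is declared with .optional().nullable(), as Pitfall #1 recommends; with .optional() alone, Zod rejects the null and execute returns the validation error instead. Keeping it as an explicit test case makes sure the schema stays covered.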

Alternatives Worth Knowing

If you prefer a more lightweight approach, TypeBox offers similar capabilities with a different API and a generally smaller bundle size. Its schemas are themselves JSON Schema objects, so no conversion step is needed. The tradeoff is that TypeBox builds schemas from nested function calls rather than chainable methods, which some teams find less ergonomic.

For projects already using Valibot, the @valibot/to-json-schema package serves the same conversion purpose. The pattern described here applies equally: the key insight is using a schema library that covers both the TypeScript type side and the JSON Schema output side.

The specific library matters less than the pattern: define once, derive the API definition and TypeScript types from the same source, validate at runtime before processing.