API & SDK / 2026-03-29 / Advanced

Claude API Think Tool — Dramatically Improve Tool Call Accuracy with Interleaved Reasoning in Agentic Workflows

Master the Claude API Think Tool pattern. Learn the key differences from Extended Thinking, implement interleaved reasoning in agent loops, and apply production design patterns that improve tool call accuracy by up to 54%.



What Is the Think Tool — The Critical Difference from Extended Thinking

One of the most impactful yet underappreciated techniques in Claude API agent development is the Think Tool pattern. Published by Anthropic's engineering team, this approach lets agents pause and reason during multi-step tool use chains, dramatically improving decision accuracy at each step.

The natural first question most developers ask is: "How is this different from Extended Thinking?" Despite the similar names, these two features operate at fundamentally different points in the response generation process, and understanding this distinction is the key to using both effectively.

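Extended Thinking happens before Claude begins generating its visible response. When you enable the thinking parameter, Claude runs an internal reasoning chain before producing any output. This is useful for complex problems that require upfront planning and analysis, but in its basic form it happens once per turn, at the very beginning. (On models that support interleaved thinking, Claude can also think between tool calls; the comparison in this article is with the basic mode.)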

The Think Tool is invoked during response generation. It's a regular tool that Claude can call between other tool calls to explicitly reason about the current situation, analyze intermediate results, and decide what to do next. Unlike Extended Thinking, which gives Claude one big opportunity to reason, the Think Tool provides multiple smaller reasoning checkpoints throughout an entire workflow.

Think of it this way: Extended Thinking is the deep breath before you start a chess game, carefully considering your opening strategy. The Think Tool is pausing after each of your opponent's moves to reassess the board and plan your next move. Both are valuable, but they serve very different purposes.

// Think Tool definition — simple yet powerful
const thinkTool = {
  name: "think",
  description:
    "Use this tool to think about the information you have gathered " +
    "and plan your next steps. Use it when you need to analyze data, " +
    "consider multiple options, or reflect on tool results before proceeding.",
  input_schema: {
    type: "object" as const,
    properties: {
      thought: {
        type: "string",
        description: "Your detailed reasoning and analysis",
      },
    },
    required: ["thought"],
  },
};

When this tool is called, no actual server-side processing occurs. Claude outputs its reasoning as structured text, which remains in the conversation context and directly informs subsequent decisions. It's essentially a structured scratchpad that lives within the agent's tool use flow — invisible to the end user but transformative for the quality of the agent's decisions.
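
To make the scratchpad idea concrete, here is a minimal sketch of how a single think call might sit in the message history. The tool_use and tool_result block shapes follow the Messages API tool-use format, but the local types are simplified (real assistant turns can also contain text blocks), and the ID and thought text are made-up placeholders.

```typescript
// Simplified local types mirroring the tool-use block shapes (illustration only).
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: unknown };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string };
type Message =
  | { role: "assistant"; content: ToolUseBlock[] }
  | { role: "user"; content: ToolResultBlock[] };

// Hypothetical excerpt: the model calls think, the server acknowledges it.
const history: Message[] = [
  {
    role: "assistant",
    content: [
      {
        type: "tool_use",
        id: "toolu_example_1", // placeholder ID
        name: "think",
        input: { thought: "Order is 40 days old; check the return window first." },
      },
    ],
  },
  {
    role: "user",
    content: [
      { type: "tool_result", tool_use_id: "toolu_example_1", content: "Thought recorded." },
    ],
  },
];

// The reasoning lives in the assistant's tool_use input, so it stays in
// context for as long as that assistant message is kept in the history.
function extractThoughts(messages: Message[]): string[] {
  const thoughts: string[] = [];
  for (const m of messages) {
    if (m.role !== "assistant") continue;
    for (const block of m.content) {
      if (block.name === "think") {
        thoughts.push((block.input as { thought: string }).thought);
      }
    }
  }
  return thoughts;
}
```

Calling extractThoughts(history) returns the single recorded thought, which is the same text Claude sees on its next turn.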

Why the Think Tool Matters — The Agent Accuracy Wall

In complex agentic workflows, Claude frequently chains multiple tool calls together. Consider a customer support agent that needs to: retrieve customer info → check order history → look up return policies → verify eligibility → calculate refund amount → execute the refund. Each step involves parsing results and making decisions based on accumulated context.

The fundamental challenge is that as tool call chains grow longer, decision accuracy at each step tends to degrade. This happens for several interconnected reasons. First, relevant information from earlier tool calls gets buried deeper in the context as new results are added. Second, the model must hold multiple pieces of information in working memory while simultaneously planning the next action. Third, without explicit reasoning checkpoints, the model may jump to conclusions based on incomplete analysis of the available data.

This mirrors human cognition. When juggling multiple pieces of information while deciding on next actions, oversights and misjudgments become more likely. The solution in both cases is the same: pause, organize your thoughts, and then proceed.

Anthropic's τ-bench results show measurable improvements when the Think Tool is introduced:

Airline customer service domain: pass^1 improved from 0.370 to 0.570 with the Think Tool plus an optimized prompt, a 54% relative improvement
Retail customer service domain: pass^1 improved from 0.783 to 0.812 with the Think Tool alone, a relative improvement of roughly 3.7%

The improvement is particularly dramatic in the airline domain because it involves complex policy decisions with multiple interacting conditions — passenger status, ticket class, flight disruption reasons, compensation rules, and rebooking options all factor into a single decision. The Think Tool gives Claude a dedicated space to systematically work through which conditions apply to the specific case before taking action, rather than jumping directly from data retrieval to execution.
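
The "54% relative improvement" figure is easy to misread as 54 percentage points, so a short sketch of the arithmetic may help. The function name is our own; the inputs are the scores quoted above.

```typescript
// Relative improvement = (after - before) / before
function relativeImprovement(before: number, after: number): number {
  if (before <= 0) throw new Error("baseline must be positive");
  return (after - before) / before;
}

const airline = relativeImprovement(0.370, 0.570);
const retail = relativeImprovement(0.783, 0.812);

console.log(`airline: ${(airline * 100).toFixed(1)}%`); // airline: 54.1%
console.log(`retail: ${(retail * 100).toFixed(1)}%`); // retail: 3.7%
```

In absolute terms the airline score rose by 0.20 and the retail score by about 0.03, which is why the relative figures differ so much between the two domains.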


Server-Side Implementation

One of the Think Tool's greatest strengths is that server-side implementation is remarkably simple. When the tool is invoked, you simply return the thought content — no external processing, no database queries, no API calls. The entire value comes from giving Claude a structured space to reason within the flow.

import Anthropic from "@anthropic-ai/sdk";
 
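// Initialize Anthropic client (reads ANTHROPIC_API_KEY from the environment)
const client = new Anthropic();

// Assumed to be implemented elsewhere: dispatches a non-think tool call
// by name and returns its result as a string
declare function executeExternalTool(
  name: string,
  input: unknown
): Promise<string>;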
 
// Agent loop implementation with Think Tool support
async function runAgentLoop(
  userMessage: string,
  tools: Anthropic.Tool[]
): Promise<string> {
  const messages: Anthropic.MessageParam[] = [
    { role: "user", content: userMessage },
  ];
 
  while (true) {
    const response = await client.messages.create({
      model: "claude-opus-4-6",
      max_tokens: 8096,
      tools: tools,
      messages: messages,
    });
 
    // Add response to message history
    messages.push({ role: "assistant", content: response.content });
 
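    // Exit the loop unless Claude is requesting tools. Checking only for
    // "end_turn" would loop forever on other stop reasons such as "max_tokens"
    if (response.stop_reason !== "tool_use") {
      const textBlock = response.content.find(
        (block): block is Anthropic.TextBlock => block.type === "text"
      );
      return textBlock?.text ?? "";
    }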
 
    // Process tool calls
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        if (block.name === "think") {
          // Think Tool: return the thought as-is
          // No server-side processing needed
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: `Thought recorded: ${(block.input as { thought: string }).thought}`,
          });
        } else {
          // Regular tools: execute actual processing
          const result = await executeExternalTool(
            block.name,
            block.input
          );
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: result,
          });
        }
      }
    }
 
    // Add tool results to message history
    messages.push({ role: "user", content: toolResults });
  }
}

The key implementation detail that many developers miss is that Think Tool calls must remain in the conversation context. The thought itself lives in the tool_use block of the assistant message, and every tool_use block needs a matching tool_result in the next user message. Some developers are tempted to intercept Think Tool calls and strip them from the history to save tokens, but this defeats the entire purpose. Claude references its previous Think Tool outputs when making subsequent decisions, building up a chain of reasoning step by step. Removing these thoughts from context is like tearing out pages from a detective's notebook mid-investigation.

Another important consideration is the tool result format. While you could return an empty string for Think Tool results, a confirmation like "Thought recorded: {thought}" signals to Claude that its reasoning has been captured and is available for reference. This may improve the consistency of Think Tool usage across long conversations. Note that echoing the full thought also repeats those tokens in the context, so a short acknowledgement without the thought text is a cheaper alternative.

Designing Effective System Prompts

Maximizing Think Tool effectiveness requires clear system prompt guidance on when and how Claude should use the tool. Without this guidance, Claude may use the tool too frequently (wasting tokens), too rarely (missing critical reasoning opportunities), or with poorly structured thoughts (reducing reasoning quality).

const systemPrompt = `You are a customer support agent.
 
## Think Tool Usage Guidelines
 
Use the think tool BEFORE taking action in these situations:
 
1. **After receiving tool results**: When you need to interpret
   data returned by a tool before deciding what to do next
2. **Before policy decisions**: Before applying refund,
   cancellation, or exception policies — verify all conditions
3. **When facing multiple options**: When there are several
   possible next actions and the right choice isn't obvious
4. **When information conflicts**: When customer claims don't
   match system data — reason about the discrepancy
5. **Before final responses**: Review all gathered information
   before composing your response to the customer
 
## How to Structure Your Thinking
 
When using the think tool, organize your thoughts as follows:
- What I know so far (summarize key facts from tool results)
- What remains unclear (identify gaps in information)
- The action I should take next and why (explicit reasoning)
- Risks or caveats to consider (potential issues)
 
## When NOT to Use the Think Tool
 
Do not use the think tool for simple, straightforward steps
where the next action is obvious. For example, if the customer
asks for their order status and you need to call the
get_order tool, just call it directly.`;

The "when NOT to use" section is just as important as the "when to use" section. Without negative guidance, Claude can fall into a pattern of thinking before every single tool call, which adds latency and cost without proportional benefit. The goal is targeted reasoning at high-stakes decision points, not reflexive thinking at every step.

An advanced technique is to adjust the system prompt dynamically based on the complexity of the incoming request. For simple queries like order status checks, you might use a lightweight prompt that discourages Think Tool usage. For complex multi-policy scenarios like flight disruption compensation, you'd use a detailed prompt that strongly encourages systematic reasoning at each decision point.
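
A minimal sketch of this dynamic approach follows. Everything here is an assumption for illustration: the keyword list, the two tiers, and the guidance wording are placeholders, and a real system might use a classifier or routing metadata instead of keyword matching.

```typescript
// Hypothetical complexity tiers; the keyword list is illustrative, not tuned.
type Complexity = "simple" | "complex";

const COMPLEX_HINTS = ["refund", "cancel", "compensation", "rebook", "exception"];

function classifyRequest(userMessage: string): Complexity {
  const text = userMessage.toLowerCase();
  return COMPLEX_HINTS.some((hint) => text.includes(hint)) ? "complex" : "simple";
}

const BASE_PROMPT = "You are a customer support agent.";

// Per-tier guidance: discourage thinking for simple requests, encourage it otherwise.
const THINK_GUIDANCE: Record<Complexity, string> = {
  simple:
    "Only use the think tool if a tool result is surprising or contradictory.",
  complex:
    "Use the think tool after each tool result and before any policy decision " +
    "to list which conditions apply to this case.",
};

function buildSystemPrompt(userMessage: string): string {
  const tier = classifyRequest(userMessage);
  return `${BASE_PROMPT}\n\n## Think Tool Usage\n${THINK_GUIDANCE[tier]}`;
}
```

The returned string would be passed as the system parameter of the request, so the tier is chosen once per conversation rather than per tool call.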

Production Design Patterns — Three Architectures

Pattern 1: Stateful Agent Pattern

The most fundamental approach. Agent state is managed explicitly, with Think Tool outputs stored as part of the state for monitoring, debugging, and post-hoc analysis.

interface AgentState {
  conversationHistory: Anthropic.MessageParam[];
  thinkingLog: Array<{
    timestamp: number;
    thought: string;
    context: string;
  }>;
  toolCallCount: number;
  maxToolCalls: number;
}
 
async function statefulAgentLoop(
  state: AgentState,
  tools: Anthropic.Tool[]
): Promise<{ response: string; state: AgentState }> {
  while (state.toolCallCount < state.maxToolCalls) {
    const response = await client.messages.create({
      model: "claude-opus-4-6",
      max_tokens: 8096,
      tools: tools,
      messages: state.conversationHistory,
    });
 
    state.conversationHistory.push({
      role: "assistant",
      content: response.content,
    });
 
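    // Stop unless Claude is requesting tools (this also covers "max_tokens")
    if (response.stop_reason !== "tool_use") {
      const text = response.content.find(
        (b): b is Anthropic.TextBlock => b.type === "text"
      );
      return { response: text?.text ?? "", state };
    }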
 
    const toolResults: Anthropic.ToolResultBlockParam[] = [];
    for (const block of response.content) {
      if (block.type === "tool_use") {
        state.toolCallCount++;
        if (block.name === "think") {
          const thought = (block.input as { thought: string })
            .thought;
          // Record thinking log for debugging and monitoring
          state.thinkingLog.push({
            timestamp: Date.now(),
            thought,
            context: `tool_call_${state.toolCallCount}`,
          });
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: thought,
          });
        } else {
          const result = await executeExternalTool(
            block.name,
            block.input
          );
          toolResults.push({
            type: "tool_result",
            tool_use_id: block.id,
            content: result,
          });
        }
      }
    }
 
    state.conversationHistory.push({
      role: "user",
      content: toolResults,
    });
  }
 
  return { response: "Tool call limit reached.", state };
}

The thinkingLog array is the workhorse of this pattern. After each conversation, you can export these logs for analysis, feeding them into dashboards that track reasoning quality over time. This is invaluable for prompt engineering iterations — when you see the agent consistently struggling with a particular type of reasoning, you can refine the system prompt to provide more specific guidance for that scenario.

Pattern 2: Pipeline Gate Pattern

This pattern analyzes Think Tool output and halts the pipeline if certain safety conditions aren't met — a critical safeguard for production agents that can take real-world actions.

interface ThinkAnalysis {
  confidence: "high" | "medium" | "low";
  risks: string[];
  shouldProceed: boolean;
}
 
function analyzeThought(thought: string): ThinkAnalysis {
  // Detect risk indicators and uncertainty
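  // All signals must be lowercase: they are matched against the
  // lowercased thought, so a capitalized "I think" would never match
  const lowConfidenceSignals = [
    "not sure",
    "uncertain",
    "might be",
    "unclear",
    "possibly",
    "i think",
  ];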
  const riskSignals = [
    "refund",
    "delete",
    "cancel",
    "override",
    "escalate",
  ];
 
  const hasLowConfidence = lowConfidenceSignals.some((s) =>
    thought.toLowerCase().includes(s)
  );
  const hasRisk = riskSignals.some((s) =>
    thought.toLowerCase().includes(s)
  );
 
  return {
    confidence: hasLowConfidence
      ? "low"
      : hasRisk
        ? "medium"
        : "high",
    risks: riskSignals.filter((s) =>
      thought.toLowerCase().includes(s)
    ),
    // Block when low confidence AND risky action
    shouldProceed: !(hasLowConfidence && hasRisk),
  };
}

When shouldProceed is false, your system should route the conversation to a human agent or request additional confirmation from the user. This creates a natural escalation path that prevents agents from executing uncertain, high-stakes actions while still allowing them to handle clear-cut cases autonomously.

In practice, you'd insert this analysis step into your agent loop immediately after a Think Tool call. If the analysis indicates the agent should not proceed, you inject a message telling Claude to seek clarification or escalate, rather than continuing with the next tool call.
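
One way to wire this in is to put the escalation instruction into the tool_result returned for the think call itself, as sketched below. This is an assumption about placement rather than the only option, and shouldProceed is a simplified stand-in for the analyzeThought function above so that the snippet stays self-contained.

```typescript
type ToolResult = { type: "tool_result"; tool_use_id: string; content: string };

// Simplified stand-in for analyzeThought: block when uncertainty and a
// risky action appear in the same thought.
function shouldProceed(thought: string): boolean {
  const text = thought.toLowerCase();
  const uncertain = ["not sure", "uncertain", "unclear"].some((s) => text.includes(s));
  const risky = ["refund", "delete", "cancel"].some((s) => text.includes(s));
  return !(uncertain && risky);
}

// Build the tool_result for a think call. When the gate trips, the result
// carries an instruction instead of a plain acknowledgement.
function gateThinkCall(
  toolUseId: string,
  thought: string
): { result: ToolResult; escalate: boolean } {
  const proceed = shouldProceed(thought);
  return {
    escalate: !proceed,
    result: {
      type: "tool_result",
      tool_use_id: toolUseId,
      content: proceed
        ? "Thought recorded."
        : "Thought recorded. Do not execute any irreversible action yet: " +
          "ask the customer for clarification or hand off to a human agent.",
    },
  };
}
```

The escalate flag lets the surrounding loop also take action on its own side, for example by notifying a human queue, instead of relying only on Claude following the instruction.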

Pattern 3: Multi-Agent Think Sharing Pattern

When multiple agents collaborate on a task, sharing Think Tool outputs maintains coherence and prevents agents from working at cross purposes.

class SharedThinkingContext {
  private thoughts: Map<string, string[]> = new Map();
 
  addThought(agentId: string, thought: string): void {
    const existing = this.thoughts.get(agentId) ?? [];
    existing.push(thought);
    this.thoughts.set(agentId, existing);
  }
 
  getSummary(): string {
    let summary = "=== Cross-Agent Thinking Summary ===\n";
    for (const [agentId, thoughts] of this.thoughts) {
      summary += `\n[${agentId}]:\n`;
      thoughts.forEach((t, i) => {
        summary += `  ${i + 1}. ${t.substring(0, 200)}\n`;
      });
    }
    return summary;
  }
}

By injecting the shared thinking summary into each agent's context before their next turn, you ensure that agents working on different aspects of a task maintain a consistent understanding of the overall situation. This is particularly effective when one agent's reasoning reveals information that should influence another agent's decisions — for example, a data analysis agent discovering an anomaly that should inform a report generation agent's conclusions.
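
The injection step might look like the sketch below. The helper name and the choice to append the summary to the system prompt are assumptions; it takes the per-agent thoughts as a plain Map so it does not depend on the class above, and it skips the receiving agent's own thoughts since those are already in its history.

```typescript
// Hypothetical helper: append other agents' thoughts to an agent's system prompt.
function withSharedThinking(
  baseSystemPrompt: string,
  thoughtsByAgent: Map<string, string[]>,
  selfId: string,
  maxChars = 200
): string {
  const sections: string[] = [];
  thoughtsByAgent.forEach((thoughts, agentId) => {
    // Skip the agent's own thoughts and agents with nothing recorded
    if (agentId === selfId || thoughts.length === 0) return;
    const lines = thoughts.map((t, i) => `  ${i + 1}. ${t.slice(0, maxChars)}`);
    sections.push(`[${agentId}]:\n${lines.join("\n")}`);
  });
  if (sections.length === 0) return baseSystemPrompt;
  return (
    `${baseSystemPrompt}\n\n=== Cross-Agent Thinking Summary ===\n` +
    sections.join("\n")
  );
}
```

Truncating each thought to maxChars keeps the shared summary from growing without bound as agents accumulate thoughts, at the cost of losing detail from longer ones.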

Combining Extended Thinking with the Think Tool

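Extended Thinking and the Think Tool aren't mutually exclusive. They can be used together, each covering the other's blind spots. One practical requirement when thinking is enabled alongside tool use: the thinking blocks in an assistant turn must be sent back unchanged with the tool results, which the agent loops shown earlier already do by pushing response.content as-is.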

// Extended Thinking + Think Tool combined
const response = await client.messages.create({
  model: "claude-opus-4-6",
  max_tokens: 16000,
  thinking: {
    type: "enabled",
    budget_tokens: 5000, // Budget for initial deep reasoning
  },
  tools: [thinkTool, ...businessTools],
  messages: messages,
});

Here's a practical framework for dividing responsibilities between the two:

Extended Thinking handles the strategic layer. Before generating any response, Claude uses Extended Thinking to understand the overall task, identify potential complications, and formulate a high-level plan. This is where complex constraint satisfaction and multi-factor analysis happen — the kind of reasoning that benefits from a large, uninterrupted thinking space.

The Think Tool handles the tactical layer. During execution, Claude uses the Think Tool at key decision points to assess whether the plan still holds given new information from tool results. It's where mid-course corrections happen, where unexpected data gets reconciled with the initial plan, and where specific policy conditions get evaluated against concrete customer data.

The combination is powerful because it addresses the two main failure modes of agentic workflows. Extended Thinking prevents "starting without a plan" failures. The Think Tool prevents "sticking to the plan despite new information" failures. Together, they produce agents that are both strategically sound and tactically adaptive.

Performance Optimization and Cost Management

Since the Think Tool consumes output tokens with each invocation, and those thoughts are then re-sent as input tokens on every later turn, cost management is an important engineering consideration for production deployments. The trade-off is often favorable: the accuracy improvements can reduce expensive failure modes like incorrect actions, customer escalations, and retry loops.

// Cost-aware Think Tool with token guidance
function createCostAwareThinkTool(
  maxThinkTokens: number = 500
) {
  return {
    name: "think",
    description:
      "Use this tool to analyze the current situation " +
      "and plan next steps. Keep your thinking concise " +
      `(under ${maxThinkTokens} tokens). ` +
      "Focus on: what you know, what you need, " +
      "and your next action.",
    input_schema: {
      type: "object" as const,
      properties: {
        thought: {
          type: "string",
          description: `Concise reasoning (max ~${maxThinkTokens} tokens)`,
        },
      },
      required: ["thought"],
    },
  };
}
 
// Token usage monitoring
interface TokenUsageTracker {
  thinkTokens: number;
  toolTokens: number;
  totalTokens: number;
}
 
function trackThinkToolUsage(
  response: Anthropic.Message,
  tracker: TokenUsageTracker
): void {
  for (const block of response.content) {
    if (
      block.type === "tool_use" &&
      block.name === "think"
    ) {
      const thought = (block.input as { thought: string })
        .thought;
      // Estimate: ~1.3 tokens per English word
      const estimatedTokens = Math.ceil(
        thought.split(/\s+/).length * 1.3
      );
      tracker.thinkTokens += estimatedTokens;
    }
  }
  // Accumulate across calls so the total stays consistent with thinkTokens
  tracker.totalTokens +=
    (response.usage?.input_tokens ?? 0) +
    (response.usage?.output_tokens ?? 0);
}

The most effective cost optimization strategy is controlling Think Tool frequency through the system prompt. Rather than instructing Claude to think before every tool call, guide it to think only at critical decision points. This creates a tiered thinking approach: lightweight tasks might trigger zero Think Tool calls, moderate tasks might trigger one or two, and complex multi-policy decisions might trigger three to five.

Another optimization is to set explicit length guidance in the Think Tool description. Without guidance, Claude sometimes produces verbose thoughts that cover tangential considerations. Adding a token budget like "keep your thinking concise (under 500 tokens)" focuses the reasoning on what matters most for the immediate decision.

Real-World Example: Customer Support Refund Flow

To make these patterns concrete, let's walk through how the Think Tool transforms a real customer support scenario. A customer contacts support saying their flight was cancelled and they want a refund.

Without the Think Tool, the agent might retrieve the booking, see the cancellation, and immediately process a full refund — missing that the customer had already been rebooked on a later flight and the original cancellation was weather-related, which has different compensation rules than airline-initiated cancellations.

With the Think Tool, the flow becomes far more reliable. After retrieving the booking details, Claude calls the Think Tool: "The customer's original flight UA234 was cancelled. I can see they were rebooked on UA567. Before processing a refund, I need to check: (1) the cancellation reason — weather vs airline-initiated affects compensation eligibility, (2) whether the rebooking was accepted, and (3) what the fare difference is." This structured reasoning ensures that Claude checks the right information before taking any irreversible action.

After the policy lookup, Claude thinks again: "The cancellation was weather-related. Per policy section 4.2, weather cancellations don't qualify for automatic refunds, but the customer is entitled to a travel voucher. The rebooking was accepted, so this isn't an abandoned journey. I should offer the voucher and explain the weather policy rather than process a refund." This second Think step prevents a policy violation while still providing the customer with their entitled compensation.

This example illustrates why the 54% relative improvement in the airline domain matters: each Think step is an opportunity to catch a mistake that would otherwise require a correction, escalation, or customer complaint to resolve.

Debugging and Observability

One of the Think Tool's most valuable side effects is that it makes agent decision-making processes fully visible and auditable. Traditional agents are black boxes — you see the inputs and outputs but not the reasoning that connected them. The Think Tool transforms agents into glass boxes where every decision point is documented.

// Structured Think Tool logging
interface ThinkLog {
  id: string;
  timestamp: string;
  agentStep: number;
  thought: string;
  precedingTool: string | null;
  followingTool: string | null;
  confidence: string;
}
 
function formatThinkLogs(logs: ThinkLog[]): string {
  return logs
    .map(
      (log) =>
        `[Step ${log.agentStep}] ${log.timestamp}\n` +
        `  After: ${log.precedingTool ?? "initial"}\n` +
        `  Thought: ${log.thought}\n` +
        `  Then: ${log.followingTool ?? "final_response"}\n` +
        `  Confidence: ${log.confidence}`
    )
    .join("\n---\n");
}

In production environments, Think Tool logs serve multiple critical purposes. For prompt improvement, you can analyze how Claude reasoned through incorrect decisions to refine system prompts and tool descriptions. When an agent makes the wrong tool call, the Think log often reveals exactly why — perhaps a tool description was ambiguous, or a policy rule wasn't stated clearly enough. For quality monitoring, you can track confidence levels across Think Tool invocations and trigger alerts when low-confidence decisions spike, catching degradation before it impacts customers. And for compliance auditing in regulated domains like finance and healthcare, Think logs provide the auditable decision trail that regulators increasingly require for AI-powered systems.

A practical tip for production: store Think logs in a structured format (JSON) with the preceding and following tool names. This creates a decision graph that you can visualize and analyze. Patterns like "always uncertain after policy_lookup" immediately surface areas where your tool descriptions or system prompts need improvement.
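
As a sketch of the kind of analysis this enables, the function below groups logs by the preceding tool and reports the share of low-confidence thoughts for each. It uses only the precedingTool and confidence fields of the ThinkLog shape above, and assumes confidence is stored as the string "low" for uncertain thoughts.

```typescript
// Minimal subset of the ThinkLog fields needed for this aggregation.
type ThinkLogEntry = { precedingTool: string | null; confidence: string };

// Share of low-confidence thoughts, grouped by the tool that ran just before.
function lowConfidenceRateByTool(logs: ThinkLogEntry[]): Map<string, number> {
  const totals = new Map<string, { low: number; all: number }>();
  for (const log of logs) {
    const key = log.precedingTool ?? "initial";
    const entry = totals.get(key) ?? { low: 0, all: 0 };
    entry.all += 1;
    if (log.confidence === "low") entry.low += 1;
    totals.set(key, entry);
  }
  const rates = new Map<string, number>();
  totals.forEach((v, k) => rates.set(k, v.low / v.all));
  return rates;
}
```

A rate near 1.0 for a tool such as policy_lookup would be the "always uncertain after policy_lookup" pattern described above, pointing at that tool's description or output format as the first thing to revisit.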

Looking back

The Think Tool pattern is a simple yet powerful technique that dramatically improves decision accuracy in agentic workflows. While Extended Thinking provides deep pre-flight reasoning before Claude begins responding, the Think Tool serves as in-flight course correction throughout the entire tool use chain. The combination of both maximizes agent reliability by providing strategic planning upfront and tactical reasoning at every critical decision point.

The impact is especially pronounced in agents that chain multiple tool calls or make complex policy-based decisions. The server-side implementation cost is near zero, since the tool requires no backend processing, and Anthropic's benchmarks show up to a 54% relative improvement in challenging domains like airline customer service.

Take the three production patterns from this guide — stateful agent, pipeline gate, and multi-agent think sharing — and integrate the Think Tool into your own agents. Start with the stateful pattern to build observability, add the pipeline gate pattern when your agent performs high-stakes actions, and progress to think sharing when you build multi-agent systems. For a deeper dive into Claude API tool use, see the Claude API Advanced Tool Use Complete Guide. For strategies on combining with Extended Thinking, check out the Claude API Adaptive Thinking Production Guide. And for streaming implementations, the Claude API Streaming & Tool Use Production Guide provides comprehensive coverage.
