CLAUDE LAB
API & SDK / 2026-05-06 / Intermediate

Claude API × Python in Practice: Building an AI Assistant with Tool Calling and Streaming

A practical guide to combining the Claude API's Tool Use and Streaming in Python. Build a working AI assistant with real tool execution, complete source code included, plus a breakdown of the tricky parts that trip up most developers.


Tool calling works. Streaming works. But the moment you try to combine them, something breaks.

If you've hit that wall, you're not alone. The "Tool Use × Streaming" combination is one of the first real challenges when building with the Claude API. Each feature is straightforward on its own, but together they require careful handling of the event stream, and the documentation doesn't always spell out how the pieces fit.

How Tool Use + Streaming Actually Works

The key to understanding this combination is knowing what the event stream looks like when a tool call happens.

In a normal streaming response, you receive text chunks. When Claude decides to call a tool, it pauses text generation and emits a different kind of block. The event types you need to handle are:

  • content_block_start — Start of a text block or tool call block
  • content_block_delta — A text chunk, or a fragment of the tool input JSON
  • content_block_stop — End of a block
  • message_delta — Contains stop_reason; if it's tool_use, you need to execute tools and continue

The challenge: tool input JSON arrives in fragments via input_json_delta, and you can't parse it until the block ends. A single response can also contain a text block followed by one or more tool call blocks, so your handler has to track which kind of block each delta belongs to.
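To make the event order concrete, here is a minimal sketch that replays a hand-written event sequence for one turn: a short text block followed by a single tool call. The events are plain dicts shaped like the stream events rather than real SDK objects, and the tool id and helper names (joined_tool_json, fragment_is_parseable) are made up for illustration.

```python
import json

# Hand-written stand-ins for one streamed turn (illustrative values only).
SAMPLE_EVENTS = [
    {"type": "content_block_start", "index": 0,
     "content_block": {"type": "text", "text": ""}},
    {"type": "content_block_delta", "index": 0,
     "delta": {"type": "text_delta", "text": "Let me calculate that."}},
    {"type": "content_block_stop", "index": 0},
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_example", "name": "calculate"}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"expression": "2**'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '32"}'}},
    {"type": "content_block_stop", "index": 1},
    {"type": "message_delta", "delta": {"stop_reason": "tool_use"}},
]


def joined_tool_json(events):
    """Concatenate every input_json_delta fragment in arrival order.

    Assumes the events contain a single tool call, as in SAMPLE_EVENTS.
    """
    return "".join(
        event["delta"]["partial_json"]
        for event in events
        if event["type"] == "content_block_delta"
        and event["delta"]["type"] == "input_json_delta"
    )


def fragment_is_parseable(fragment):
    """Report whether a string is complete JSON on its own."""
    try:
        json.loads(fragment)
        return True
    except json.JSONDecodeError:
        return False
```

Calling fragment_is_parseable on the first fragment alone returns False, while the joined string parses to a dict with a single expression key. That is why the parse has to wait for content_block_stop.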

For background on tool calling fundamentals, see the Claude API Tool Use Guide. For a deeper look at how streaming and tools interact, Streaming × Tool Use Implementation Details is worth reading alongside this article.

Python Setup

You'll need Python 3.10+ and the Anthropic SDK:

pip install "anthropic>=0.40.0"

We'll define two simple tools to keep the focus on the event-handling logic:

  • get_current_time — Returns the current datetime in ISO 8601 format (no arguments)
  • calculate — Evaluates a math expression (takes an expression string)

Implementation: Tool Definitions and Stream Processing

First, the tool definitions and execution functions:

import anthropic
import json
import math
from datetime import datetime
from typing import Any
 
client = anthropic.Anthropic()
 
TOOLS = [
    {
        "name": "get_current_time",
        "description": "Returns the current date and time in ISO 8601 format",
        "input_schema": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "calculate",
        "description": "Evaluates a mathematical expression and returns the result. Supports basic arithmetic, exponentiation, and common math functions like sqrt.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The expression to evaluate (e.g., '2 + 3 * 4', 'sqrt(16)', '2**32')"
                }
            },
            "required": ["expression"]
        }
    }
]
 
def execute_tool(name: str, inputs: dict[str, Any]) -> str:
    """Execute a tool and return the result as a string."""
    if name == "get_current_time":
        return datetime.now().isoformat()
 
    elif name == "calculate":
        expr = inputs.get("expression", "")
        try:
            # Restrict the names eval() can see. Emptying __builtins__ lowers the risk
            # but is not a real sandbox, so don't expose this to untrusted input as-is.
            allowed_names = {
                "sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
                "tan": math.tan, "log": math.log, "abs": abs, "pi": math.pi
            }
            result = eval(expr, {"__builtins__": {}}, allowed_names)
            return str(result)
        except Exception as e:
            return f"Calculation error: {str(e)}"
 
    return f"Unknown tool: {name}"
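If the calculator will ever see input you don't control, one option is to avoid eval() altogether and walk the parsed expression tree, allowing only the node types you expect. The following is a minimal sketch of that idea, not a drop-in from the article: the name safe_calculate and the exact whitelist are assumptions, and it does not bound computation cost (a very large exponent can still take a long time).

```python
import ast
import math
import operator

# Whitelisted operators, functions, and constants; anything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
          "tan": math.tan, "log": math.log, "abs": abs}
_CONSTS = {"pi": math.pi}


def safe_calculate(expression: str):
    """Evaluate arithmetic by walking the AST instead of calling eval()."""

    def walk(node):
        # Numeric literals only (bool is excluded even though it subclasses int).
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        # Calls are allowed only to whitelisted names, with positional arguments.
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    tree = ast.parse(expression, mode="eval")
    return walk(tree.body)
```

Attribute access, subscripts, and calls to any other name raise ValueError, so the existing try/except in execute_tool would turn them into a "Calculation error" string.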

Now the core streaming function. This is where the real work happens:

def chat_with_tools(messages: list, model: str = "claude-sonnet-4-6", depth: int = 0) -> str:
    """
    Run a chat turn with streaming + tool use support.
    If Claude calls tools, execute them and recurse with the results.
    """
    if depth > 5:
        return "(Tool call depth limit reached)"
 
    collected_text = []
    tool_calls = []
    current_tool = None
    current_tool_input_raw = ""
 
    with client.messages.stream(
        model=model,
        max_tokens=2048,
        tools=TOOLS,
        messages=messages
    ) as stream:
        for event in stream:
            event_type = event.type
 
            if event_type == "content_block_start":
                block = event.content_block
                if block.type == "tool_use":
                    # A tool call block is starting
                    current_tool = {"id": block.id, "name": block.name}
                    current_tool_input_raw = ""
 
            elif event_type == "content_block_delta":
                delta = event.delta
                if delta.type == "text_delta":
                    # Stream text to output in real time
                    print(delta.text, end="", flush=True)
                    collected_text.append(delta.text)
                elif delta.type == "input_json_delta":
                    # Accumulate JSON fragments — don't parse yet
                    current_tool_input_raw += delta.partial_json
 
            elif event_type == "content_block_stop":
                if current_tool is not None:
                    # JSON is complete — now we can parse it
                    tool_input = json.loads(current_tool_input_raw) if current_tool_input_raw else {}
                    tool_calls.append({**current_tool, "input": tool_input})
                    current_tool = None
                    current_tool_input_raw = ""
 
    # No tool calls? Return the text response directly
    if not tool_calls:
        return "".join(collected_text)
 
    # Execute tools and continue the conversation
    print("\n[Executing tools...]")
 
    # Build the assistant message (text + tool use blocks)
    assistant_content = []
    if collected_text:
        assistant_content.append({"type": "text", "text": "".join(collected_text)})
    for tc in tool_calls:
        assistant_content.append({
            "type": "tool_use",
            "id": tc["id"],
            "name": tc["name"],
            "input": tc["input"]
        })
    messages.append({"role": "assistant", "content": assistant_content})
 
    # Execute each tool and collect results
    tool_results = []
    for tc in tool_calls:
        result = execute_tool(tc["name"], tc["input"])
        print(f"  {tc['name']}({tc['input']}) → {result}")
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": tc["id"],
            "content": result
        })
    messages.append({"role": "user", "content": tool_results})
 
    # Get Claude's response now that it has the tool results
    return chat_with_tools(messages, model, depth + 1)
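The recursion above can also be written as a plain loop, which makes the round limit explicit. This is a sketch under stated assumptions: run_turn is a hypothetical callable that streams one response and returns (text, tool_calls) in the same shape the function above collects, and execute stands in for execute_tool. Neither name comes from the SDK.

```python
from typing import Any, Callable

Message = dict[str, Any]
ToolCall = dict[str, Any]  # {"id": ..., "name": ..., "input": {...}}


def run_tool_loop(
    messages: list[Message],
    run_turn: Callable[[list[Message]], tuple[str, list[ToolCall]]],
    execute: Callable[[str, dict[str, Any]], str],
    max_rounds: int = 5,
) -> str:
    """Repeat 'stream a turn, run its tools' until no more tools are requested."""
    for _ in range(max_rounds + 1):
        text, tool_calls = run_turn(messages)
        if not tool_calls:
            return text  # plain answer, nothing left to execute

        # Record the assistant turn: any text first, then the tool_use blocks.
        content: list[dict[str, Any]] = []
        if text:
            content.append({"type": "text", "text": text})
        for call in tool_calls:
            content.append({"type": "tool_use", **call})
        messages.append({"role": "assistant", "content": content})

        # Answer every tool_use with a tool_result in a single user turn.
        results = [
            {"type": "tool_result", "tool_use_id": call["id"],
             "content": execute(call["name"], call["input"])}
            for call in tool_calls
        ]
        messages.append({"role": "user", "content": results})
    return "(Tool call depth limit reached)"
```

Because the streaming step is passed in, the loop can be exercised with a fake run_turn that returns canned turns, which is a convenient way to test the message bookkeeping without calling the API.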

Finally, the conversation loop:

def main():
    print("Claude AI Assistant (Tools + Streaming)")
    print("Type 'quit' or 'exit' to stop\n")
 
    messages = []
 
    while True:
        user_input = input("You: ").strip()
        if not user_input or user_input.lower() in ("quit", "exit"):
            break
 
        messages.append({"role": "user", "content": user_input})
        print("Assistant: ", end="", flush=True)
        response = chat_with_tools(messages)
        print()
 
        messages.append({"role": "assistant", "content": response})
 
if __name__ == "__main__":
    main()

Run this and try asking: "What time is it?" or "What's 2 to the power of 32?" You'll see Claude invoke the tools and answer accurately.

The Gotchas That Trip Most Developers Up

Three things consistently catch people off guard when implementing this pattern.

Tool input JSON arrives in fragments

input_json_delta delivers JSON in pieces—you might see {"expression": "2** as one event. Never call json.loads() on a fragment. Always wait for content_block_stop before parsing. The current_tool_input_raw accumulation pattern handles this correctly.

Text and tool calls can appear in the same turn

Claude sometimes outputs a brief text response before making a tool call, something like "Let me calculate that for you." The blocks arrive one after another rather than interleaved, but a single turn can still contain both text_delta and input_json_delta events. Managing collected_text and tool_calls separately ensures both get included correctly in the conversation history.
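An alternative to keeping two separate lists is to rebuild the blocks in stream order, keyed by each event's index field. The sketch below works on dict-shaped events rather than SDK objects, and assemble_blocks is a hypothetical helper name. Its output is a list of text and tool_use blocks that could be used directly as the assistant message content.

```python
import json


def assemble_blocks(events):
    """Rebuild content blocks in stream order from dict-shaped events."""
    blocks = {}  # block index -> partially built block
    for event in events:
        kind = event["type"]
        if kind == "content_block_start":
            start = event["content_block"]
            if start["type"] == "tool_use":
                blocks[event["index"]] = {
                    "type": "tool_use", "id": start["id"],
                    "name": start["name"], "_raw": "",
                }
            elif start["type"] == "text":
                blocks[event["index"]] = {"type": "text", "text": ""}
        elif kind == "content_block_delta" and event["index"] in blocks:
            block, delta = blocks[event["index"]], event["delta"]
            if delta["type"] == "text_delta":
                block["text"] += delta["text"]
            elif delta["type"] == "input_json_delta":
                block["_raw"] += delta["partial_json"]  # accumulate only
        elif kind == "content_block_stop" and event["index"] in blocks:
            block = blocks[event["index"]]
            if block["type"] == "tool_use":
                # The block is complete, so the JSON can be parsed now.
                raw = block.pop("_raw")
                block["input"] = json.loads(raw) if raw else {}
    return [blocks[i] for i in sorted(blocks)]
```

A tool with no arguments, such as get_current_time, may produce no input_json_delta at all, which is why an empty accumulator maps to an empty dict here.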

The assistant message must include both text and tool use blocks

When you add the assistant's response back to the messages array before calling Claude again, you must include all content blocks: both any text Claude generated and the tool use requests. Each tool_result refers to a tool_use block by id, so dropping the tool_use blocks makes the follow-up request invalid, and dropping the text leaves the history out of step with what Claude actually said.
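A small pre-flight check can catch a mismatch before the request goes out. This is a sketch with a hypothetical helper, unmatched_tool_results, that operates on the same dict-shaped messages the code above builds. It is not part of the SDK.

```python
def unmatched_tool_results(assistant_message, user_message):
    """Compare tool_use ids in one turn with tool_result ids in the next.

    Returns (ids requested but never answered, ids answered but never requested).
    """

    def blocks(message, block_type):
        content = message["content"]
        if isinstance(content, str):  # plain-text messages carry no blocks
            return []
        return [b for b in content if b.get("type") == block_type]

    requested = {b["id"] for b in blocks(assistant_message, "tool_use")}
    answered = {b["tool_use_id"] for b in blocks(user_message, "tool_result")}
    return sorted(requested - answered), sorted(answered - requested)
```

Both lists should be empty for a well-formed pair of messages. A non-empty second list usually means the tool_use blocks were left out of the assistant message.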

Adding More Tools

Once the pattern clicks, extending the assistant is straightforward. Here's a file-reading tool as an example:

def read_file_tool(file_path: str) -> str:
    """Read a file and return its contents (truncated if too long)."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            # Read at most 5000 characters instead of loading the whole file
            return f.read(5000)
    except FileNotFoundError:
        return f"File not found: {file_path}"
    except Exception as e:
        return f"Read error: {str(e)}"

The same pattern applies to database queries, external API calls, or code execution. What matters most is the description field in your tool definition—Claude uses it to decide when to call the tool. Write it as clearly as you'd explain the function to a colleague.
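Wiring a new tool in takes two more pieces: a schema entry for the tools list and a branch in the dispatcher. A minimal sketch follows, assuming the tool is exposed under the name read_file (an illustrative choice, as is the description wording) and using a lookup table in place of the if/elif chain. The reader is repeated in compact form so the block runs on its own.

```python
from typing import Any, Callable

# Schema entry to append to TOOLS; name and description are illustrative.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": (
        "Reads a UTF-8 text file from disk and returns up to the first "
        "5000 characters. Use it when the user asks about a file's contents."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "file_path": {"type": "string", "description": "Path of the file to read"}
        },
        "required": ["file_path"],
    },
}


def read_file_tool(file_path: str) -> str:
    """Compact copy of the reader above so this sketch is self-contained."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return f.read(5000)
    except FileNotFoundError:
        return f"File not found: {file_path}"
    except Exception as e:
        return f"Read error: {e}"


# One handler per tool name; each receives the parsed input dict.
HANDLERS: dict[str, Callable[[dict[str, Any]], str]] = {
    "read_file": lambda inputs: read_file_tool(inputs.get("file_path", "")),
}


def dispatch_tool(name: str, inputs: dict[str, Any]) -> str:
    """Look up the handler by name and fall back to an error string."""
    handler = HANDLERS.get(name)
    return handler(inputs) if handler else f"Unknown tool: {name}"
```

As written, this tool lets the model read any path the process can open. Restricting it to a specific directory is a sensible next step before using it on real data.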

For more on building with the Python SDK, the Python SDK Chatbot Tutorial covers the foundational patterns in detail.

Start with This Code

The fastest way to understand Tool Use × Streaming is to run this code and watch what happens. Try asking questions that require tools, then questions that don't. Observe how Claude decides when to call a tool versus answering directly.

Once that behavior makes sense, you'll find it straightforward to add tools that fit your actual use case—and the assistant starts feeling genuinely useful rather than just an experiment.
