Claude API × Python in Practice: Building an AI Assistant with Tool Calling and Streaming

Tool calling works. Streaming works. But the moment you try to combine them, something breaks.

If you've hit that wall, you're not alone. The "Tool Use × Streaming" combination is one of the first real challenges when building with the Claude API. Each feature is straightforward on its own, but combining them requires careful handling of the event stream, and the documentation doesn't always make this clear.

How Tool Use + Streaming Actually Works

The key to understanding this combination is knowing what the event stream looks like when a tool call happens.

In a normal streaming response, you receive text chunks. When Claude decides to call a tool, it pauses text generation and emits a different kind of block. The event types you need to handle are:

content_block_start — Start of a text block or tool call block
content_block_delta — A text chunk, or a fragment of the tool input JSON
content_block_stop — End of a block
message_delta — Contains stop_reason; if it's tool_use, you need to execute tools and continue

The challenge: tool input JSON arrives in fragments via input_json_delta, and you can't parse it until the block ends. A single response can also contain a text block followed by one or more tool call blocks, so your handler has to track both kinds of content within the same stream.
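
To make the event order concrete, here is a minimal offline sketch that needs no API key. The events are hand-written stand-ins built with SimpleNamespace, not real SDK objects. The toolu_demo id and the summarize helper are made up for illustration, and the attribute names are assumed to mirror the ones the streaming function later in this article relies on.

```python
import json
from types import SimpleNamespace as NS

# Hand-written stand-ins for stream events (assumption: same attribute names
# as the real events handled later in the article).
SIMULATED_EVENTS = [
    NS(type="content_block_start", content_block=NS(type="text")),
    NS(type="content_block_delta",
       delta=NS(type="text_delta", text="Let me calculate that.")),
    NS(type="content_block_stop"),
    NS(type="content_block_start",
       content_block=NS(type="tool_use", id="toolu_demo", name="calculate")),
    NS(type="content_block_delta",
       delta=NS(type="input_json_delta", partial_json='{"expres')),
    NS(type="content_block_delta",
       delta=NS(type="input_json_delta", partial_json='sion": "2**32"}')),
    NS(type="content_block_stop"),
    NS(type="message_delta", delta=NS(stop_reason="tool_use")),
]


def summarize(events):
    """Walk the events once and return (text, tool_calls, stop_reason)."""
    text_parts, tool_calls = [], []
    current_tool, raw_json, stop_reason = None, "", None
    for event in events:
        if event.type == "content_block_start":
            if event.content_block.type == "tool_use":
                current_tool = {"id": event.content_block.id,
                                "name": event.content_block.name}
                raw_json = ""
        elif event.type == "content_block_delta":
            if event.delta.type == "text_delta":
                text_parts.append(event.delta.text)
            elif event.delta.type == "input_json_delta":
                raw_json += event.delta.partial_json  # accumulate, don't parse
        elif event.type == "content_block_stop":
            if current_tool is not None:
                # The block is closed, so the accumulated JSON is complete
                current_tool["input"] = json.loads(raw_json) if raw_json else {}
                tool_calls.append(current_tool)
                current_tool = None
        elif event.type == "message_delta":
            stop_reason = event.delta.stop_reason
    return "".join(text_parts), tool_calls, stop_reason


text, tool_calls, stop_reason = summarize(SIMULATED_EVENTS)
print(text)
print(tool_calls)
print(stop_reason)
```

The text block opens and closes before the tool call block starts, and stop_reason only shows up at the very end, which is why the handler has to carry state across events.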

For background on tool calling fundamentals, see the Claude API Tool Use Guide. For a deeper look at how streaming and tools interact, Streaming × Tool Use Implementation Details is worth reading alongside this article.

Python Setup

You'll need Python 3.10+ and the Anthropic SDK:

pip install "anthropic>=0.40.0"

We'll define two simple tools to keep the focus on the event-handling logic:

get_current_time — Returns the current datetime in ISO 8601 format (no arguments)
calculate — Evaluates a math expression (takes an expression string)

Implementation: Tool Definitions and Stream Processing

First, the tool definitions and execution functions:

import anthropic
import json
import math
from datetime import datetime
from typing import Any
 
client = anthropic.Anthropic()
 
TOOLS = [
    {
        "name": "get_current_time",
        "description": "Returns the current date and time in ISO 8601 format",
        "input_schema": {
            "type": "object",
            "properties": {},
            "required": []
        }
    },
    {
        "name": "calculate",
        "description": "Evaluates a mathematical expression and returns the result. Supports basic arithmetic, exponentiation, and common math functions like sqrt.",
        "input_schema": {
            "type": "object",
            "properties": {
                "expression": {
                    "type": "string",
                    "description": "The expression to evaluate (e.g., '2 + 3 * 4', 'sqrt(16)', '2**32')"
                }
            },
            "required": ["expression"]
        }
    }
]
 
def execute_tool(name: str, inputs: dict[str, Any]) -> str:
    """Execute a tool and return the result as a string."""
    if name == "get_current_time":
        return datetime.now().isoformat()
 
    elif name == "calculate":
        expr = inputs.get("expression", "")
        try:
            # Restrict the names eval() can see. Emptying __builtins__ is not a
            # real sandbox, so keep this to demos and never use bare eval() on user input.
            allowed_names = {
                "sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
                "tan": math.tan, "log": math.log, "abs": abs, "pi": math.pi
            }
            result = eval(expr, {"__builtins__": {}}, allowed_names)
            return str(result)
        except Exception as e:
            return f"Calculation error: {str(e)}"
 
    return f"Unknown tool: {name}"
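
If the calculate tool will ever see untrusted input, one alternative is to skip eval() entirely and walk the expression's syntax tree, accepting only whitelisted nodes. The sketch below is not part of the original assistant: safe_eval is a hypothetical helper, and the whitelists simply mirror the names allowed above.

```python
import ast
import math
import operator
from typing import Union

# Whitelists: only these operators, functions, and constants are evaluated.
_BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCS = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos,
          "tan": math.tan, "log": math.log, "abs": abs}
_CONSTS = {"pi": math.pi}


def safe_eval(expr: str) -> Union[int, float]:
    """Evaluate an arithmetic expression by walking its AST (hypothetical helper)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        # Numeric literals only (bools and strings are rejected)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](walk(node.operand))
        if isinstance(node, ast.Name) and node.id in _CONSTS:
            return _CONSTS[node.id]
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id in _FUNCS and not node.keywords):
            return _FUNCS[node.func.id](*[walk(arg) for arg in node.args])
        # Attribute access, subscripts, lambdas, etc. all end up here
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expr, mode="eval"))


print(safe_eval("2 + 3 * 4"))
print(safe_eval("sqrt(16)"))
```

Swapping it in would mean replacing the eval(...) call with safe_eval(expr). Note that this sketch does not limit how expensive an expression can be (a very large exponent still takes time and memory), so a production version would likely add bounds.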

Now the core streaming function. This is where the real work happens:

def chat_with_tools(messages: list, model: str = "claude-sonnet-4-6", depth: int = 0) -> str:
    """
    Run a chat turn with streaming + tool use support.
    If Claude calls tools, execute them and recurse with the results.
    Note: appends the tool-use and tool-result messages to `messages` in place.
    """
    if depth > 5:
        return "(Tool call depth limit reached)"
 
    collected_text = []
    tool_calls = []
    current_tool = None
    current_tool_input_raw = ""
 
    with client.messages.stream(
        model=model,
        max_tokens=2048,
        tools=TOOLS,
        messages=messages
    ) as stream:
        for event in stream:
            event_type = event.type
 
            if event_type == "content_block_start":
                block = event.content_block
                if block.type == "tool_use":
                    # A tool call block is starting
                    current_tool = {"id": block.id, "name": block.name}
                    current_tool_input_raw = ""
 
            elif event_type == "content_block_delta":
                delta = event.delta
                if delta.type == "text_delta":
                    # Stream text to output in real time
                    print(delta.text, end="", flush=True)
                    collected_text.append(delta.text)
                elif delta.type == "input_json_delta":
                    # Accumulate JSON fragments — don't parse yet
                    current_tool_input_raw += delta.partial_json
 
            elif event_type == "content_block_stop":
                if current_tool is not None:
                    # JSON is complete — now we can parse it
                    tool_input = json.loads(current_tool_input_raw) if current_tool_input_raw else {}
                    tool_calls.append({**current_tool, "input": tool_input})
                    current_tool = None
                    current_tool_input_raw = ""
 
    # No tool calls? Return the text response directly
    if not tool_calls:
        return "".join(collected_text)
 
    # Execute tools and continue the conversation
    print("\n[Executing tools...]")
 
    # Build the assistant message (text + tool use blocks)
    assistant_content = []
    if collected_text:
        assistant_content.append({"type": "text", "text": "".join(collected_text)})
    for tc in tool_calls:
        assistant_content.append({
            "type": "tool_use",
            "id": tc["id"],
            "name": tc["name"],
            "input": tc["input"]
        })
    messages.append({"role": "assistant", "content": assistant_content})
 
    # Execute each tool and collect results
    tool_results = []
    for tc in tool_calls:
        result = execute_tool(tc["name"], tc["input"])
        print(f"  {tc['name']}({tc['input']}) → {result}")
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": tc["id"],
            "content": result
        })
    messages.append({"role": "user", "content": tool_results})
 
    # Get Claude's response now that it has the tool results
    return chat_with_tools(messages, model, depth + 1)

Finally, the conversation loop:

def main():
    print("Claude AI Assistant (Tools + Streaming)")
    print("Type 'quit' or 'exit' to stop\n")
 
    messages = []
 
    while True:
        user_input = input("You: ").strip()
        if not user_input or user_input.lower() in ("quit", "exit"):
            break
 
        messages.append({"role": "user", "content": user_input})
        print("Assistant: ", end="", flush=True)
        response = chat_with_tools(messages)
        print()
 
        messages.append({"role": "assistant", "content": response})
 
if __name__ == "__main__":
    main()

Run this and try asking: "What time is it?" or "What's 2 to the power of 32?" You'll see Claude invoke the tools and answer accurately.

The Gotchas That Trip Most Developers Up

Three things consistently catch people off guard when implementing this pattern.

Tool input JSON arrives in fragments

input_json_delta delivers JSON in pieces—you might see {"expression": "2** as one event. Never call json.loads() on a fragment. Always wait for content_block_stop before parsing. The current_tool_input_raw accumulation pattern handles this correctly.
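
A quick way to convince yourself is to try parsing a fragment directly. This is a minimal sketch: the two fragments are illustrative (real chunk boundaries are arbitrary), and parse_tool_input is a hypothetical helper name, not something from the SDK.

```python
import json


def parse_tool_input(raw: str) -> dict:
    """Parse the fully accumulated tool input; an empty buffer means no arguments."""
    return json.loads(raw) if raw else {}


# Illustrative fragments; real chunk boundaries are arbitrary.
fragments = ['{"expression": "2**', '32"}']

try:
    json.loads(fragments[0])
    fragment_parsed = True
except json.JSONDecodeError:
    fragment_parsed = False  # a partial fragment is not valid JSON

# Joining the fragments after the block ends gives parseable JSON
tool_input = parse_tool_input("".join(fragments))
print(fragment_parsed)
print(tool_input)
```

The empty-buffer case matters for tools like get_current_time that take no arguments, where there may be nothing to parse at all.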

Text and tool calls can share a turn

Claude sometimes outputs a brief text response before making a tool call, something like "Let me calculate that for you." The text arrives as its own content block, followed by the tool call block, so a single stream can contain both text_delta and input_json_delta events. Managing collected_text and tool_calls separately ensures both get included correctly in the conversation history.

The assistant message must include both text and tool use blocks

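When you add the assistant's response back to the messages array before calling Claude again, include every content block: any text Claude generated and all of the tool use requests. Each tool_result you send must match, by id, a tool_use block in the preceding assistant message, so dropping the tool use blocks makes the API reject the request. Dropping the text is subtler: the history no longer matches what Claude actually said, which can lead to unexpected behavior in subsequent turns.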
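
One way to catch this mistake before it reaches the API is a small local check on the history. The sketch below is an assumption-laden illustration: unmatched_tool_results is a hypothetical helper, the toolu_demo id is made up, and the check only covers the id pairing between an assistant message and the user message right after it.

```python
def unmatched_tool_results(messages: list) -> list:
    """Return tool_use_id values with no matching tool_use block in the
    assistant message immediately before them (hypothetical helper)."""
    missing = []
    for i, msg in enumerate(messages):
        if msg["role"] != "user" or not isinstance(msg["content"], list):
            continue
        prev = messages[i - 1] if i > 0 else None
        offered = set()
        if prev and prev["role"] == "assistant" and isinstance(prev["content"], list):
            offered = {b["id"] for b in prev["content"] if b.get("type") == "tool_use"}
        for block in msg["content"]:
            if block.get("type") == "tool_result" and block["tool_use_id"] not in offered:
                missing.append(block["tool_use_id"])
    return missing


good_history = [
    {"role": "user", "content": "What's 2 to the power of 32?"},
    {"role": "assistant", "content": [
        {"type": "text", "text": "Let me calculate that for you."},
        {"type": "tool_use", "id": "toolu_demo", "name": "calculate",
         "input": {"expression": "2**32"}},
    ]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_demo", "content": "4294967296"},
    ]},
]

# Same history, but the assistant message kept only its text block
broken_history = [
    good_history[0],
    {"role": "assistant", "content": [good_history[1]["content"][0]]},
    good_history[2],
]

print(unmatched_tool_results(good_history))
print(unmatched_tool_results(broken_history))
```

Running a check like this in tests or behind a debug flag is cheaper than discovering the mismatch from an API error mid-conversation.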

Adding More Tools

Once the pattern clicks, extending the assistant is straightforward. Here's a file-reading tool as an example:

def read_file_tool(file_path: str) -> str:
    """Read a file and return its contents (truncated if too long)."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            content = f.read()
        return content[:5000]  # slicing is a no-op when the file is shorter
    except FileNotFoundError:
        return f"File not found: {file_path}"
    except Exception as e:
        return f"Read error: {str(e)}"

The same pattern applies to database queries, external API calls, or code execution. What matters most is the description field in your tool definition—Claude uses it to decide when to call the tool. Write it as clearly as you'd explain the function to a colleague.
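
To wire a new tool in, you need a definition for Claude and a way to route the call on your side. The sketch below shows one possible shape: the read_file tool name, the description wording, the TOOL_HANDLERS registry, and the dispatch helper are all assumptions made for illustration, not something the original assistant defines.

```python
from typing import Any, Callable

# Hypothetical definition; the description tells Claude what the tool does
# and when to reach for it.
READ_FILE_TOOL = {
    "name": "read_file",
    "description": (
        "Reads a UTF-8 text file from the local disk and returns up to the "
        "first 5000 characters. Use it when the user asks about the contents "
        "of a specific file."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "file_path": {"type": "string", "description": "Path to the file to read"}
        },
        "required": ["file_path"],
    },
}


def read_file_tool(file_path: str) -> str:
    """Read a file and return its contents (truncated to 5000 characters)."""
    try:
        with open(file_path, "r", encoding="utf-8") as f:
            return f.read()[:5000]
    except FileNotFoundError:
        return f"File not found: {file_path}"
    except Exception as e:
        return f"Read error: {e}"


# Registry mapping tool names to handlers; adding a tool is one new entry.
TOOL_HANDLERS: dict[str, Callable[..., str]] = {"read_file": read_file_tool}


def dispatch(name: str, inputs: dict[str, Any]) -> str:
    """Route a tool call to its handler, returning errors as strings."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return f"Unknown tool: {name}"
    try:
        return handler(**inputs)
    except TypeError as e:
        # Typically raised when required arguments are missing or misnamed
        return f"Invalid arguments for {name}: {e}"


print(dispatch("read_file", {"file_path": "/no/such/file.txt"}))
print(dispatch("send_email", {}))
```

A registry like this could stand in for the if/elif chain in execute_tool as the tool list grows, with READ_FILE_TOOL appended to TOOLS. A file-reading tool should also restrict which paths it will open before it goes anywhere near real users.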

For more on building with the Python SDK, the Python SDK Chatbot Tutorial covers the foundational patterns in detail.

Start with This Code

The fastest way to understand Tool Use × Streaming is to run this code and watch what happens. Try asking questions that require tools, then questions that don't. Observe how Claude decides when to call a tool versus answering directly.

Once that behavior makes sense, you'll find it straightforward to add tools that fit your actual use case—and the assistant starts feeling genuinely useful rather than just an experiment.