Claude API Token Counting Guide — How to Estimate Token Usage and Optimize Costs Before Sending Requests

Why Token Counting Matters for Your API Workflow

If you've ever been surprised by unexpected costs after sending large multimodal requests to Claude, you're not alone. Images, PDFs, tool definitions, and lengthy system prompts can consume far more tokens than you'd expect — and by the time you check your usage dashboard, the bill is already there.

Anthropic provides a dedicated Token Counting endpoint that solves this problem. It reports how many input tokens a request will consume before you send it, which gives you control over costs and context window usage. Anthropic documents the result as an estimate, so the billed count can differ by a small amount.

Token Counting Endpoint Basics

What It Does

The Token Counting endpoint accepts the same request structure as the Messages API and returns the total number of input tokens — without actually generating a response.

Key features:

Free to use: There's no charge for token counting requests
Works with all models: Claude Opus 4.6, Sonnet 4.6, Haiku 4.5, and all other active models
Multimodal support: Counts tokens for text, images, PDFs, and tool definitions
Independent rate limits: Token counting has its own rate limits, separate from the Messages API, so counting won't eat into your message quota
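
For readers who want to see roughly what sits underneath the SDK, here is a minimal standard-library sketch of the HTTP request. It assumes the endpoint path /v1/messages/count_tokens, the anthropic-version header value 2023-06-01, and a JSON reply carrying an input_tokens field; none of these are stated in this guide, so check them against the API reference. The helper names build_count_request and count_tokens are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint URL; verify against the current API reference.
COUNT_TOKENS_URL = "https://api.anthropic.com/v1/messages/count_tokens"


def build_count_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for the token counting endpoint."""
    return urllib.request.Request(
        COUNT_TOKENS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",  # assumed version string
            "content-type": "application/json",
        },
        method="POST",
    )


def count_tokens(api_key: str, payload: dict) -> int:
    """Send the request and return the input_tokens field of the JSON reply."""
    with urllib.request.urlopen(build_count_request(api_key, payload)) as resp:
        return json.load(resp)["input_tokens"]
```

The payload has the same shape as a Messages API request body (model, messages, and optionally system and tools). Splitting request construction from sending keeps the first function easy to inspect without network access.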

Basic Usage

Here's the simplest way to count tokens using the Python SDK:

import anthropic
 
client = anthropic.Anthropic()
 
# Count tokens for a message
response = client.messages.count_tokens(
    model="claude-sonnet-4-6-20260320",
    messages=[
        {
            "role": "user",
            "content": "Explain how token counting works in the Claude API."
        }
    ]
)
 
print(f"Input tokens: {response.input_tokens}")
# Output: Input tokens: 22

And the same thing in TypeScript:

import Anthropic from "@anthropic-ai/sdk";
 
const client = new Anthropic();
 
const response = await client.messages.countTokens({
  model: "claude-sonnet-4-6-20260320",
  messages: [
    {
      role: "user",
      content: "Explain how token counting works in the Claude API.",
    },
  ],
});
 
console.log(`Input tokens: ${response.input_tokens}`);
// Output: Input tokens: 22

Counting Tokens with System Prompts and Tools

In real-world applications, system prompts and tool definitions often make up a significant portion of your input tokens. The Token Counting endpoint accounts for these as well.

import anthropic
 
client = anthropic.Anthropic()
 
# Count tokens including system prompt + tool definitions + messages
response = client.messages.count_tokens(
    model="claude-sonnet-4-6-20260320",
    system="You are a helpful weather assistant. Provide accurate weather information based on user queries.",
    tools=[
        {
            "name": "get_weather",
            "description": "Get the current weather for a specified city",
            "input_schema": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city to check the weather for"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    ],
    messages=[
        {
            "role": "user",
            "content": "What's the weather like in Tokyo today?"
        }
    ]
)
 
print(f"Input tokens (system + tools + messages): {response.input_tokens}")
# Output: Input tokens (system + tools + messages): 372

For applications with many tool definitions, tokens can add up to hundreds or even thousands. Knowing this upfront helps you decide which tools to include in each request.
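
One way to decide which tools to include is to measure each definition on its own. The sketch below is an illustration rather than an official recipe: it takes a hypothetical count_fn callable that returns the input token count for an otherwise fixed request with the given tools list, and it reports each tool's cost relative to a no-tools baseline. Any fixed overhead the API adds when tools are present will show up in every tool's number.

```python
from typing import Callable

# Hypothetical callable: given a tools list, return the input token count
# for a request whose model, system prompt, and messages stay fixed.
CountFn = Callable[[list], int]


def per_tool_cost(tools: list, count_fn: CountFn) -> dict:
    """Return {tool name: tokens added versus no tools}, most expensive first."""
    baseline = count_fn([])
    costs = {tool["name"]: count_fn([tool]) - baseline for tool in tools}
    return dict(sorted(costs.items(), key=lambda item: item[1], reverse=True))


# Possible wiring to the SDK (not executed here; MODEL and MESSAGES are
# placeholders you would define yourself):
#
# def count_fn(tools):
#     kwargs = {"tools": tools} if tools else {}
#     return client.messages.count_tokens(
#         model=MODEL, messages=MESSAGES, **kwargs
#     ).input_tokens
```

Because the counter is passed in, the ranking logic can be exercised with a stub before spending any rate limit on real counting calls.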

Counting Tokens for Images and PDFs

Multimodal requests are where token counting becomes especially valuable, since image and PDF token consumption is hard to predict without measuring it.

import anthropic
import base64
 
client = anthropic.Anthropic()
 
# Read and encode an image file
with open("screenshot.png", "rb") as f:
    image_data = base64.standard_b64encode(f.read()).decode("utf-8")
 
response = client.messages.count_tokens(
    model="claude-sonnet-4-6-20260320",
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": "image/png",
                        "data": image_data
                    }
                },
                {
                    "type": "text",
                    "text": "Describe what you see in this image."
                }
            ]
        }
    ]
)
 
print(f"Input tokens (with image): {response.input_tokens}")
# Example output: Input tokens (with image): 1584

Image token counts scale with resolution, so a full-size screenshot can cost well over a thousand tokens while a thumbnail costs far fewer. Checking before you send prevents unpleasant surprises on your bill.
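
PDFs follow the same pattern as images. The sketch below builds the messages list for a PDF question, assuming the document content block with a base64 source and the application/pdf media type; the helper names pdf_block and pdf_question, and the file name in the usage comment, are hypothetical.

```python
import base64


def pdf_block(pdf_bytes: bytes) -> dict:
    """Wrap raw PDF bytes in a base64 document content block."""
    return {
        "type": "document",
        "source": {
            "type": "base64",
            "media_type": "application/pdf",
            "data": base64.standard_b64encode(pdf_bytes).decode("utf-8"),
        },
    }


def pdf_question(pdf_bytes: bytes, question: str) -> list:
    """Build a single-turn messages list: the PDF followed by a text prompt."""
    return [
        {
            "role": "user",
            "content": [pdf_block(pdf_bytes), {"type": "text", "text": question}],
        }
    ]


# Possible usage with the SDK (not executed here; "report.pdf" is a placeholder):
#
# with open("report.pdf", "rb") as f:
#     messages = pdf_question(f.read(), "Summarize this document.")
# response = client.messages.count_tokens(model=MODEL, messages=messages)
```

The returned list can be passed as the messages argument to count_tokens in the same way as the image example.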

Production Patterns

Pattern 1: Context Window Management

For chatbots that maintain long conversations, you need to trim old messages before hitting the context window limit. Token Counting turns this from guesswork into a measurement.

import anthropic
 
client = anthropic.Anthropic()
 
MODEL = "claude-sonnet-4-6-20260320"
MAX_INPUT_TOKENS = 180_000  # Safety margin below an assumed 200K-token context window
SYSTEM_PROMPT = "You are a helpful customer support assistant."
 
def manage_conversation(messages: list, new_message: dict) -> list:
    """Manage conversation history to stay within the context window."""
    candidate = messages + [new_message]
 
    # Count current tokens
    count_response = client.messages.count_tokens(
        model=MODEL,
        system=SYSTEM_PROMPT,
        messages=candidate
    )
 
    # Remove the oldest user-assistant pair until we're under the limit.
    # Stop once two or fewer messages remain so the new message is never
    # dropped and the API never receives an empty message list.
    while count_response.input_tokens > MAX_INPUT_TOKENS and len(candidate) > 2:
        candidate = candidate[2:]
        count_response = client.messages.count_tokens(
            model=MODEL,
            system=SYSTEM_PROMPT,
            messages=candidate
        )
        print(f"Trimmed conversation: {count_response.input_tokens} tokens")
 
    return candidate
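
The loop above issues one counting request per dropped pair. Since removing more leading messages should never increase the count, a binary search over the number of dropped pairs is one way to reduce the number of requests on long histories. This is a sketch under that monotonicity assumption, not part of the SDK; trim_to_budget and count_fn are hypothetical names, and the history is assumed to alternate user and assistant turns.

```python
from typing import Callable

# Hypothetical callable: given a messages list, return its input token count.
CountFn = Callable[[list], int]


def trim_to_budget(messages: list, budget: int, count_fn: CountFn) -> list:
    """Drop the fewest leading user-assistant pairs so the count fits the budget.

    The final message is always kept. If even the shortest allowed history
    exceeds the budget, that shortest history is returned anyway.
    """
    if count_fn(messages) <= budget:
        return messages

    max_pairs = (len(messages) - 1) // 2  # never drop the last message
    lo, hi, best = 1, max_pairs, max_pairs
    while lo <= hi:
        mid = (lo + hi) // 2
        if count_fn(messages[2 * mid:]) <= budget:
            best = mid      # fits; try dropping fewer pairs
            hi = mid - 1
        else:
            lo = mid + 1    # still too large; drop more pairs
    return messages[2 * best:]


# Possible wiring to the SDK (not executed here):
#
# count_fn = lambda msgs: client.messages.count_tokens(
#     model=MODEL, system=SYSTEM_PROMPT, messages=msgs
# ).input_tokens
```

For a history of n pairs this needs roughly log2(n) counting requests plus the initial check, instead of up to n.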

Pattern 2: Batch Cost Estimation Dashboard

Before kicking off a large batch job, estimate total costs upfront.

import anthropic
 
client = anthropic.Anthropic()
 
# Input token pricing per 1M tokens (USD); illustrative values, confirm against the current pricing page
PRICING = {
    "claude-opus-4-6-20260205": 15.0,
    "claude-sonnet-4-6-20260320": 3.0,
    "claude-haiku-4-5-20251001": 0.80,
}
 
def estimate_batch_cost(
    model: str,
    tasks: list[dict],
    system: str = ""
) -> dict:
    """Estimate input costs for a batch of tasks before execution."""
    # Fail loudly rather than silently pricing an unknown model
    if model not in PRICING:
        raise ValueError(f"No pricing configured for model: {model}")
    # Only send a system prompt when one was provided
    extra = {"system": system} if system else {}
    total_tokens = 0
 
    for task in tasks:
        response = client.messages.count_tokens(
            model=model,
            messages=[{"role": "user", "content": task["content"]}],
            **extra
        )
        total_tokens += response.input_tokens
 
    price_per_token = PRICING[model] / 1_000_000
    estimated_cost = total_tokens * price_per_token
 
    return {
        "total_input_tokens": total_tokens,
        "estimated_input_cost_usd": round(estimated_cost, 4),
        # Guard against division by zero for an empty batch
        "average_tokens_per_task": total_tokens // len(tasks) if tasks else 0,
        "task_count": len(tasks)
    }
 
# Usage example
tasks = [
    {"content": "Analyze the sentiment of this product review: ..."},
    {"content": "Summarize the following article: ..."},
    {"content": "Find the bug in this code snippet: ..."},
]
 
estimate = estimate_batch_cost("claude-sonnet-4-6-20260320", tasks)
print(f"Total input tokens: {estimate['total_input_tokens']:,}")
print(f"Estimated input cost: ${estimate['estimated_input_cost_usd']}")
# Output:
# Total input tokens: 1,245
# Estimated input cost: $0.0037
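
The arithmetic behind the example output is a single multiplication: tokens times the per-million price, divided by one million. The short sketch below reproduces it using the token count and Sonnet price shown above; input_cost_usd and fits_budget are hypothetical helper names, and the budget figure is made up for illustration.

```python
def input_cost_usd(input_tokens: int, usd_per_million: float) -> float:
    """Convert an input token count into an estimated input cost in USD."""
    return input_tokens * usd_per_million / 1_000_000


def fits_budget(input_tokens: int, usd_per_million: float, budget_usd: float) -> bool:
    """Return True when the estimated input cost stays within the budget."""
    return input_cost_usd(input_tokens, usd_per_million) <= budget_usd


# 1,245 tokens at $3.00 per million tokens
cost = input_cost_usd(1_245, 3.0)
print(f"Estimated input cost: ${cost:.4f}")  # prints: Estimated input cost: $0.0037
print(fits_budget(1_245, 3.0, budget_usd=0.01))  # prints: True
```

This covers input tokens only. Output tokens are billed separately and cannot be known before the response is generated.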

Pattern 3: Combining with Prompt Caching

If you use prompt caching, Token Counting helps you optimize cache breakpoint placement. By knowing exactly how many tokens each section of your prompt consumes, you can ensure cache-eligible portions meet the minimum threshold.

import anthropic
 
client = anthropic.Anthropic()
 
# Check if a system prompt qualifies for caching
large_system_prompt = "..." * 1000  # Large system prompt
 
response = client.messages.count_tokens(
    model="claude-sonnet-4-6-20260320",
    system=large_system_prompt,
    messages=[{"role": "user", "content": "test"}]
)
 
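print(f"Input tokens with system prompt: {response.input_tokens}")
# The count also includes the short user message, so it slightly overstates
# the system prompt itself. The caching minimum depends on the model;
# 1,024 tokens is the value assumed here.
MIN_CACHEABLE_TOKENS = 1024
if response.input_tokens >= MIN_CACHEABLE_TOKENS:
    print("✅ Eligible for prompt caching")
else:
    print("⚠️ Below the 1,024-token minimum for prompt caching")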
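
To isolate how many tokens one section contributes, one approach is to count the request twice, with and without that section, and take the difference. The sketch below does this for the system prompt. The names system_prompt_tokens, meets_cache_minimum, and count_fn are hypothetical, and the minimum is passed in as a parameter because it depends on the model.

```python
from typing import Callable

# Hypothetical callable: given a system prompt ("" for none) and a messages
# list, return the input token count for that request.
CountFn = Callable[[str, list], int]


def system_prompt_tokens(system: str, probe_messages: list, count_fn: CountFn) -> int:
    """Estimate the tokens contributed by the system prompt alone."""
    with_system = count_fn(system, probe_messages)
    without_system = count_fn("", probe_messages)
    return with_system - without_system


def meets_cache_minimum(section_tokens: int, minimum: int) -> bool:
    """Check a section's token count against a model-specific caching minimum."""
    return section_tokens >= minimum


# Possible wiring to the SDK (not executed here); the system argument is
# omitted entirely when the prompt is empty:
#
# def count_fn(system, messages):
#     kwargs = {"system": system} if system else {}
#     return client.messages.count_tokens(
#         model=MODEL, messages=messages, **kwargs
#     ).input_tokens
```

The same differencing idea applies to any other section you are considering as a cache breakpoint, such as a block of tool definitions.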

Wrapping Up

The Token Counting endpoint is easy to overlook, but it pays off quickly once it is part of your workflow. Here's where it makes the biggest difference:

Pre-flight cost checks for batch processing: Know what you'll spend before you commit to processing thousands of requests
Smart conversation management: Keep chatbots running smoothly by monitoring context window usage in real time
Multimodal optimization: Measure how many tokens images and PDFs cost, and adjust resolutions or content accordingly

Since it's free to use, there's really no reason not to integrate it into your workflow. Start by identifying which requests in your application consume the most tokens — that's where you'll find the best optimization opportunities.

For a broader introduction to the Claude API, check out the API Quickstart guide. For managing rate limits effectively, see Rate Limits Best Practices.