Claude Lab
API & SDK / 2026-06-30 / Advanced

When a Tool Result Is Too Big and Melts Your Context Window: Designing Cursor-Based Pagination

When a list tool returns hundreds of rows at once, an agent's context can collapse in a single call. Here is a cursor-based pagination design that keeps tool output small and protects your token budget, with working code.

Tags: Claude API, Agent SDK, MCP, Context Management, Token Optimization

Running four technical blogs as an indie developer, I ask my agent to "show me the published articles" several times a day. One morning the MCP tool that answers that request handed back more than 400 titles and slugs in one shot, and a single call ate what felt like a fifth of the context window.

We were still early in the conversation. Articles still had to be generated, quality gates still had to run, and a push still had to happen. Yet the very first tool call had already crowded the budget. Everything stalled.

The cause was plain: the tool returned everything. What the agent actually needed was the most recent few dozen items and the simple fact that there was more. This article builds the cursor-based pagination that closes that gap, all the way down to a signed opaque-cursor implementation.

Why a "return everything" tool breaks agents

With an ordinary web API, an oversized response is something the client receives and discards. With an agent, a tool's return value becomes input tokens for every following turn. The larger the list, the longer the agent carries that weight through the rest of the conversation.

I measured it. My article-list tool returns a title, slug, category, publish date, and tags per row. With Japanese titles included, that is roughly 45 tokens per row, or about 18,000 tokens for 400 rows. Park that at the top of a conversation alongside the system prompt, generation instructions, and code snippets, and the budget is already strained before anything has been built.
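The arithmetic above is easy to reproduce. The sketch below is only a back-of-the-envelope estimate: the 45-tokens-per-row figure comes from the measurement quoted above, while the function names and the 10-turn example are assumptions made for illustration, and prompt caching is ignored.

```python
TOKENS_PER_ROW = 45  # rough per-row estimate quoted in the article


def estimated_tokens(row_count: int, tokens_per_row: int = TOKENS_PER_ROW) -> int:
    """Approximate tokens a list result adds to the conversation."""
    return row_count * tokens_per_row


def carried_tokens(result_tokens: int, following_turns: int) -> int:
    """Input tokens spent if the result is resent on every following turn.

    Assumption: no prompt caching, so the full result is billed each turn.
    """
    return result_tokens * following_turns


full_list = estimated_tokens(400)  # the "return everything" tool
one_page = estimated_tokens(20)    # a single 20-item page

print(full_list, one_page)             # 18000 900
print(carried_tokens(full_list, 10))   # 180000
```

The second function is the part that separates an agent from a web client: the cost is paid again on each later turn, not once.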

There is a second, subtler problem: this output is hostile to prompt caching. The list changes every time an article is added, so it becomes a large, highly volatile block. Even with a careful split between stable and volatile cache blocks, this one list keeps producing cache misses.

Why offset pagination is not enough

"Just paginate it," I thought, and reached for offset pagination first. It is the simplest thing to implement.

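def list_articles_offset(offset: int = 0, limit: int = 20):
    # `db` is the application's database handle, defined elsewhere.
    # Fetch one page of published articles, newest first.
    rows = db.query(
        "SELECT slug, title, category, published_at "
        "FROM articles WHERE status = 'published' "
        "ORDER BY published_at DESC LIMIT ? OFFSET ?",
        (limit, offset),
    )
    # The next page is addressed purely by position in the ordering.
    return {"items": rows, "next_offset": offset + limit}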

That brought a page down to 20 items, about 900 tokens. By token count alone, a win.

But soon after it went live, the agent processed the same article twice. While it was reviewing page one, another scheduled task published a fresh article. Under published_at DESC, the new row cut in at the front. The article that had been row 20 slid to row 21 and reappeared on page two at offset=20. A deletion produces the mirror image: a skipped row.
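The failure can be reproduced without a database. The following is a small simulation under assumed data: an in-memory list stands in for the articles table, and the identifiers (a01 through a41) are made up for the example.

```python
def page_by_offset(rows, offset, limit):
    """Return one page from rows that are already sorted newest-first."""
    return rows[offset:offset + limit]


# 40 published articles, newest first: a40, a39, ..., a01 (hypothetical ids)
articles = [f"a{n:02d}" for n in range(40, 0, -1)]

page_one = page_by_offset(articles, offset=0, limit=20)

# A scheduled task publishes a new article while the agent reviews page one.
articles.insert(0, "a41")

page_two = page_by_offset(articles, offset=20, limit=20)

# The last row of page one has shifted down by one and shows up again.
duplicates = set(page_one) & set(page_two)
print(duplicates)  # {'a21'}
```

Replacing the insert with a deletion from the front of the list shifts everything the other way, and one row is never returned on any page.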

Offset pagination assumes a fixed ordering. In an agent's world, that assumption frequently fails, because several tasks touch the same data concurrently.
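One common way to drop that assumption is to address the next page by the last row seen instead of by position (often called keyset pagination). The sketch below is an illustration, not the article's own implementation: it uses the standard-library sqlite3 module, assumes the same articles columns as the query above, and the function name, the (published_at, slug) key, and the has_more / next_after fields are choices made for this example.

```python
import sqlite3


def list_articles_keyset(conn, after=None, limit=20):
    """Fetch one page strictly older than `after`, a (published_at, slug) pair."""
    sql = (
        "SELECT slug, title, category, published_at "
        "FROM articles WHERE status = 'published' "
    )
    params = []
    if after is not None:
        # Resume below the last row seen; slug breaks ties on equal timestamps.
        sql += "AND (published_at < ? OR (published_at = ? AND slug < ?)) "
        params += [after[0], after[0], after[1]]
    sql += "ORDER BY published_at DESC, slug DESC LIMIT ?"
    params.append(limit + 1)  # one extra row tells us whether more exist

    rows = conn.execute(sql, params).fetchall()
    has_more = len(rows) > limit
    items = rows[:limit]
    # Row layout is (slug, title, category, published_at).
    next_after = (items[-1][3], items[-1][0]) if has_more else None
    return {"items": items, "has_more": has_more, "next_after": next_after}
```

Because the next page is defined relative to a row rather than a count, an article published at the front of the list in the meantime no longer shifts what the next page contains.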

WHAT YOU'LL LEARN
The implementation steps that took a list tool returning 400 rows at ~18,000 tokens down to ~900 tokens using 20-item cursor pages
A signed opaque-cursor encoder (Python and TypeScript) that avoids the duplicate-and-skip failures of offset pagination
A result-object design that ships has_more plus a short summary hint so the agent decides for itself whether to fetch the next page
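As a rough idea of what a signed opaque cursor and a has_more result object can look like, here is a minimal Python sketch. It is not the article's implementation: the HMAC-SHA256 scheme, the SECRET placeholder, and the names encode_cursor, decode_cursor, and build_page are all assumptions made for illustration, using only the standard library.

```python
import base64
import hashlib
import hmac
import json

SECRET = b"replace-with-a-real-secret"  # hypothetical key; load from config in practice
SIG_LEN = 32  # SHA-256 digest size in bytes


def encode_cursor(position: dict, secret: bytes = SECRET) -> str:
    """Serialize a page position and prefix it with an HMAC signature."""
    payload = json.dumps(position, separators=(",", ":"), sort_keys=True).encode()
    sig = hmac.new(secret, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(sig + payload).decode()


def decode_cursor(cursor: str, secret: bytes = SECRET) -> dict:
    """Verify the signature and return the position, or raise ValueError."""
    raw = base64.urlsafe_b64decode(cursor)  # raises a ValueError subclass if malformed
    sig, payload = raw[:SIG_LEN], raw[SIG_LEN:]
    expected = hmac.new(secret, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        raise ValueError("cursor signature mismatch")
    return json.loads(payload)


def build_page(items: list, has_more: bool, last_key: dict) -> dict:
    """Wrap a page with has_more and a one-line hint the agent can act on."""
    hint = "more are available via next_cursor" if has_more else "this is the last page"
    return {
        "items": items,
        "has_more": has_more,
        "next_cursor": encode_cursor(last_key) if has_more else None,
        "summary": f"Showing {len(items)} items; {hint}.",
    }
```

The signature means the tool can reject a cursor that the model has altered or invented, and the summary line gives the agent a cheap signal for deciding whether another page is worth its tokens.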