API & SDK · 2026-06-30 · Advanced

When a Tool Result Is Too Big and Melts Your Context Window: Designing Cursor-Based Pagination

When a list tool returns hundreds of rows at once, an agent's context can collapse in a single call. Here is a cursor-based pagination design that keeps tool output small and protects your token budget, with working code.

Claude API · Agent SDK · MCP · Context Management · Token Optimization


Running four technical blogs as an indie developer, I ask my agent to "show me the published articles" several times a day. One morning the MCP tool that answers that request handed back more than 400 titles and slugs in one shot, and a single call ate what felt like a fifth of the context window.

We were still early in the conversation. Articles still had to be generated, quality gates still had to run, and a push still had to happen. Yet the very first tool call had already crowded the budget. Everything stalled.

The cause was plain: the tool returned everything. What the agent actually needed was the most recent few dozen items and the simple fact that there was more. This article builds the cursor-based pagination that closes that gap, all the way down to a signed opaque-cursor implementation.

Why a "return everything" tool breaks agents

With an ordinary web API, an oversized response is something the client receives and discards. With an agent, a tool's return value becomes input tokens for every following turn. The larger the list, the heavier the load the agent carries through the rest of the conversation.

I measured it. My article-list tool returns a title, slug, category, publish date, and tags per row. With Japanese titles included, that is roughly 45 tokens per row, or about 18,000 tokens for 400 rows. Parked at the top of a conversation alongside the system prompt, generation instructions, and code snippets, that list has the budget crumbling before anything has been built.
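The arithmetic is worth spelling out, because the cost recurs on every turn. A minimal sketch, assuming the 45-tokens-per-row figure above holds for every row and that each later turn re-sends the full history:

```python
# TOKENS_PER_ROW is the measurement quoted above (title, slug, category,
# publish date and tags, Japanese titles included); substitute your own figure.
TOKENS_PER_ROW = 45


def list_result_tokens(rows: int, tokens_per_row: int = TOKENS_PER_ROW) -> int:
    """Approximate input tokens a list result occupies in the context window."""
    return rows * tokens_per_row


def carried_tokens(rows: int, later_turns: int) -> int:
    """Input tokens re-sent when the result stays in history for later_turns calls.

    Ignores prompt caching, which lowers the price of those tokens but
    not the room they take up in the window.
    """
    return list_result_tokens(rows) * later_turns


print(list_result_tokens(400))   # 18000: the whole list in one call
print(list_result_tokens(20))    # 900: one 20-row page
print(carried_tokens(400, 10))   # 180000 input tokens over ten more turns
```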

There is a second, subtler problem: this output is hostile to prompt caching. The list changes every time an article is added, so it becomes a large, highly volatile block. Even with a careful split between stable and volatile cache blocks, this one list keeps producing cache misses.
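Prompt caching only reuses work for a prefix that is identical to an earlier request, so a volatile block early in the prompt invalidates everything after it. A toy model of that rule, comparing two sessions where one article was published in between (the block contents are placeholders, and this is a simplification, not how any provider implements caching internally):

```python
def reusable_prefix(prev_blocks: list[str], next_blocks: list[str]) -> int:
    """Number of leading blocks two requests share, i.e. what a prefix cache can reuse."""
    shared = 0
    for old, new in zip(prev_blocks, next_blocks):
        if old != new:
            break
        shared += 1
    return shared


system = "system prompt + generation instructions"
list_v1 = "article list (400 rows)"
list_v2 = "article list (401 rows)"  # one article published between sessions
turns = ["turn 1", "turn 2", "turn 3"]

before = [system, list_v1, *turns]
after = [system, list_v2, *turns]
print(reusable_prefix(before, after))  # 1: only the system block is reusable
```

Everything after the changed list block misses the cache, even though the later turns are byte-for-byte the same.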

Why offset pagination is not enough

"Just paginate it," I thought, and reached for offset pagination first. It is the simplest thing to implement.

def list_articles_offset(offset: int = 0, limit: int = 20):
    rows = db.query(
        "SELECT slug, title, category, published_at "
        "FROM articles WHERE status = 'published' "
        "ORDER BY published_at DESC LIMIT ? OFFSET ?",
        (limit, offset),
    )
    return {"items": rows, "next_offset": offset + limit}

That brought a page down to 20 items, about 900 tokens. By token count alone, a win.

But soon after it went live, the agent processed the same article twice. While it was reviewing page one, another scheduled task published a fresh article. Under published_at DESC, the new row cut in at the front. The article that had been row 20 slid to row 21 and reappeared on page two at offset=20. A deletion produces the mirror image: a skipped row.
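Both failure modes are easy to reproduce without a database. A minimal sketch, using a hypothetical in-memory list of slugs ordered newest-first as a stand-in for ORDER BY published_at DESC:

```python
def offset_page(rows: list[str], offset: int, limit: int) -> list[str]:
    """Stand-in for ORDER BY published_at DESC LIMIT ? OFFSET ?."""
    return rows[offset:offset + limit]


def fresh_catalogue() -> list[str]:
    """400 hypothetical article slugs, newest first."""
    return [f"article-{n:03d}" for n in range(400, 0, -1)]


# Insert between two page fetches: one row shows up on both pages.
articles = fresh_catalogue()
page1 = offset_page(articles, 0, 20)
articles.insert(0, "article-401")         # another task publishes a new article
page2 = offset_page(articles, 20, 20)
print(sorted(set(page1) & set(page2)))    # ['article-381']

# Delete between two page fetches: one row appears on neither page.
articles = fresh_catalogue()
page1 = offset_page(articles, 0, 20)
articles.remove("article-395")            # a row on page one is deleted
page2 = offset_page(articles, 20, 20)
print("article-380" in page1 + page2)     # False: article-380 was skipped
```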

Offset pagination assumes that nothing ahead of the offset changes between calls. In an agent's world that assumption fails often, because several tasks touch the same data concurrently.


How cursor pagination thinks

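A cursor is not a position but a pointer: "the last item I returned was this one." The next page fetches everything after that item. If rows are inserted or deleted in between, the anchor value itself does not move, so rows that existed throughout are neither duplicated nor skipped. Even deleting the anchor row is harmless, because the comparison uses its values, not its position. Articles published mid-walk land ahead of the anchor, so the current pass does not see them; the next call from the top picks them up.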
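Replaying the same insert and delete against an "after this item" anchor shows the difference. A minimal in-memory sketch, assuming hypothetical integer (published_at, id) pairs:

```python
from typing import Optional

Row = tuple[int, int]  # (published_at, id); hypothetical integer values


def keyset_page(rows: list[Row], after: Optional[Row], limit: int) -> list[Row]:
    """Rows strictly older than `after` in (published_at, id) DESC order."""
    ordered = sorted(rows, reverse=True)  # ORDER BY published_at DESC, id DESC
    if after is not None:
        # Python compares tuples lexicographically, like SQL's
        # (published_at, id) < (?, ?) row-value comparison.
        ordered = [r for r in ordered if r < after]
    return ordered[:limit]


rows = [(t, t) for t in range(1, 401)]    # 400 articles, newest is (400, 400)
page1 = keyset_page(rows, None, 20)
rows.append((401, 401))                   # a new article is published meanwhile
rows.remove((395, 395))                   # and one on page one is deleted
page2 = keyset_page(rows, page1[-1], 20)  # resume after the last row returned
print(page2[0])                           # (380, 380): nothing repeated, nothing skipped
```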

For the sort key, pick a value that is both monotonic and unique. I used the composite key (published_at, id). Even when several articles share a publish timestamp, id makes the final order unambiguous. A timestamp-only cursor skips rows at the boundary where two articles publish in the same second — a trap I actually hit.
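The same-second trap is easy to see with a handful of rows. A sketch with six hypothetical (published_at, id) rows, three of which share a timestamp, walked three at a time with each kind of cursor:

```python
from typing import Callable, Optional

# Six hypothetical rows as (published_at, id), newest first; three share 1000.
ROWS = sorted(
    [(1002, 7), (1001, 6), (1000, 5), (1000, 4), (1000, 3), (999, 2)],
    reverse=True,
)


def page_by_timestamp(after: Optional[int], limit: int) -> list[tuple[int, int]]:
    # Cursor holds published_at only: WHERE published_at < ?
    return [r for r in ROWS if after is None or r[0] < after][:limit]


def page_by_composite(after: Optional[tuple[int, int]], limit: int) -> list[tuple[int, int]]:
    # Cursor holds (published_at, id): WHERE (published_at, id) < (?, ?)
    return [r for r in ROWS if after is None or r < after][:limit]


def walk(page_fn: Callable, cursor_of: Callable, limit: int = 3) -> list[tuple[int, int]]:
    """Follow cursors until a short page signals the end; return every row seen."""
    seen, cursor = [], None
    while True:
        page = page_fn(cursor, limit)
        seen.extend(page)
        if len(page) < limit:
            return seen
        cursor = cursor_of(page[-1])


by_ts = walk(page_by_timestamp, lambda r: r[0])
by_key = walk(page_by_composite, lambda r: r)
print(len(by_ts), len(by_key))  # 4 6: the timestamp cursor lost (1000, 4) and (1000, 3)
```

Loosening the timestamp-only predicate to published_at <= ? does not help; it repeats (1000, 5) instead.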

def list_articles_cursor(after_published_at=None, after_id=None, limit=20):
    if after_published_at is None:
        rows = db.query(
            "SELECT id, slug, title, category, published_at "
            "FROM articles WHERE status = 'published' "
            "ORDER BY published_at DESC, id DESC LIMIT ?",
            (limit + 1,),
        )
    else:
        rows = db.query(
            "SELECT id, slug, title, category, published_at "
            "FROM articles WHERE status = 'published' "
            # Row-value comparison: rows strictly after the cursor in sort order
            "AND (published_at, id) < (?, ?) "
            "ORDER BY published_at DESC, id DESC LIMIT ?",
            (after_published_at, after_id, limit + 1),
        )
    return rows

The key detail is passing limit + 1 to LIMIT ?. Fetch one row more than requested, and the presence of that surplus tells you whether there is a next page (has_more). It is cheaper and more reliable than a separate COUNT(*).
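To see the limit + 1 check and the composite key working together end to end, here is a self-contained sketch against an in-memory SQLite table through Python's standard sqlite3 module. The table and data are hypothetical, and the WHERE clause spells the row-value comparison out as an equivalent OR/AND pair, which is handy on engines without row values (SQLite itself has accepted the row-value form since 3.15).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows are addressable by column name
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, slug TEXT, "
    "published_at TEXT, status TEXT)"
)
# 45 hypothetical articles; consecutive pairs share a publish date, so ties exist.
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?, 'published')",
    [(i, f"post-{i}", f"2026-06-{(i + 1) // 2:02d}") for i in range(1, 46)],
)


def fetch_page(after=None, limit=20):
    """Return (page, has_more); `after` is (published_at, id) of the last row seen."""
    sql = "SELECT id, slug, published_at FROM articles WHERE status = 'published' "
    params: list = []
    if after is not None:
        # Same meaning as (published_at, id) < (?, ?)
        sql += "AND (published_at < ? OR (published_at = ? AND id < ?)) "
        params += [after[0], after[0], after[1]]
    sql += "ORDER BY published_at DESC, id DESC LIMIT ?"
    rows = conn.execute(sql, [*params, limit + 1]).fetchall()
    return rows[:limit], len(rows) > limit  # the extra row only signals has_more


seen, after, calls = [], None, 0
while True:
    page, has_more = fetch_page(after)
    calls += 1
    seen.extend(row["id"] for row in page)
    if not has_more:
        break
    after = (page[-1]["published_at"], page[-1]["id"])

print(calls, len(seen), len(set(seen)))  # 3 45 45
```

The first page boundary falls between ids 26 and 25, which share 2026-06-13, and id 25 still arrives on the next page because the id tiebreaker carries the position.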

Hiding internals with an opaque cursor

Here a decision is required: should you hand the raw published_at and id back to the agent as the cursor?

I recommend against returning them raw, for two reasons. First, leaking internal column names and sort keys gives the agent — or any input trying to steer it — room to rewrite the cursor and fire an unintended query. Second, if you later want to switch the sort key to (updated_at, id), raw cursors in the wild break backward compatibility.

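So encode the cursor's contents, sign them, and make the result an opaque string. To the agent it is just a "password to fetch the next page," with no internals to interpret. Base64 is an encoding, not encryption, so anyone who decodes the string can still read the payload; what the signature adds is that any edit to it is detected.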

import base64
import hmac
import hashlib
import json
 
CURSOR_SECRET = b"replace-with-env-secret"  # load from an env var in practice
 
def encode_cursor(published_at: str, item_id: int) -> str:
    payload = json.dumps(
        {"p": published_at, "i": item_id, "k": "published_at_id_v1"},
        separators=(",", ":"),
    ).encode()
    sig = hmac.new(CURSOR_SECRET, payload, hashlib.sha256).digest()[:8]
    return base64.urlsafe_b64encode(payload + sig).decode()
 
def decode_cursor(cursor: str) -> tuple[str, int]:
    raw = base64.urlsafe_b64decode(cursor.encode())
    payload, sig = raw[:-8], raw[-8:]
    expected = hmac.new(CURSOR_SECRET, payload, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(sig, expected):
        raise ValueError("cursor signature mismatch")
    data = json.loads(payload)
    if data.get("k") != "published_at_id_v1":
        raise ValueError("cursor scheme mismatch — stale cursor")
    return data["p"], data["i"]

Embedding k (a key-scheme version) lets you explicitly reject old cursors as "stale" when you change the sort key. A tampered cursor fails the signature check, and malformed base64 raises binascii.Error, which is a subclass of ValueError. Catching ValueError from decode_cursor and replying "the cursor is invalid; please start from the top" is therefore the safe operational move.
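Both failure paths, and the restart-from-the-top fallback, can be exercised in a few lines. This sketch is a compact, self-contained copy of the signing scheme above with a demo secret; `resolve`, the forged cursor, and the `published_at_v0` scheme name are illustrative additions, not part of the original code.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional

SECRET = b"demo-secret"        # illustrative only; load the real secret from the environment
SCHEME = "published_at_id_v1"  # current key-scheme version


def sign(data: dict) -> str:
    payload = json.dumps(data, separators=(",", ":")).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()[:8]
    return base64.urlsafe_b64encode(payload + tag).decode()


def decode(cursor: str) -> tuple[str, int]:
    raw = base64.urlsafe_b64decode(cursor.encode())  # bad base64 raises binascii.Error
    payload, tag = raw[:-8], raw[-8:]
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()[:8]
    if not hmac.compare_digest(tag, expected):
        raise ValueError("cursor signature mismatch")
    data = json.loads(payload)
    if data.get("k") != SCHEME:
        raise ValueError("cursor scheme mismatch: stale cursor")
    return data["p"], data["i"]


def resolve(cursor: Optional[str]) -> Optional[tuple[str, int]]:
    """Decoded position, or None meaning 'start from the newest page'."""
    if cursor is None:
        return None
    try:
        return decode(cursor)
    except ValueError:  # also catches binascii.Error, a ValueError subclass
        return None


good = sign({"p": "2026-06-30", "i": 412, "k": SCHEME})

# Tampering: decode, rewrite the id, keep the old tag.
raw = base64.urlsafe_b64decode(good)
forged = base64.urlsafe_b64encode(raw[:-8].replace(b"412", b"999") + raw[-8:]).decode()

# Stale: correctly signed, but under a hypothetical older scheme name.
stale = sign({"p": "2026-06-30", "i": 412, "k": "published_at_v0"})

print(resolve(good))    # ('2026-06-30', 412)
print(resolve(forged))  # None
print(resolve(stale))   # None
```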

A result object the agent can page through on its own

The real difficulty of pagination is not chopping the data up — it is making the agent aware that it is only seeing part of the list. Return 20 items naively and the agent assumes that is the whole set and acts on it.

So bundle "is there more" and "a summary of the whole" into the return value.

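def list_articles_tool(cursor: str | None = None, limit: int = 20):
    # A tampered, stale, or malformed cursor restarts from the newest page
    restarted = False
    try:
        after_p, after_i = decode_cursor(cursor) if cursor else (None, None)
    except ValueError:
        after_p, after_i, restarted = None, None, True
    rows = list_articles_cursor(after_p, after_i, limit)  # fetches limit + 1 rows

    has_more = len(rows) > limit
    page = rows[:limit]
    next_cursor = (
        encode_cursor(page[-1]["published_at"], page[-1]["id"])
        if has_more and page else None
    )

    total = db.scalar(
        "SELECT COUNT(*) FROM articles WHERE status = 'published'"
    )
    # Only the first page is "most recent"; later pages say so explicitly
    first_page = not cursor or restarted
    hint = (
        f"Showing the {len(page)} most recent of {total} articles. "
        if first_page
        else f"Showing {len(page)} older articles out of {total}. "
    )
    if restarted:
        hint = "The cursor was invalid, so this starts again from the newest. " + hint
    hint += (
        "Pass next_cursor to call again only if you need older ones."
        if has_more
        else "There are no older articles."
    )
    return {
        "items": [
            {"slug": r["slug"], "title": r["title"],
             "category": r["category"], "published_at": r["published_at"]}
            for r in page
        ],
        "page_size": len(page),
        "has_more": has_more,
        "next_cursor": next_cursor,
        "total_published": total,
        "hint": hint,
    }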

The hint field looks trivial but earns its place. Told explicitly that it is seeing "the 20 most recent of 400," the agent usually stops there. It fetches more only when it is genuinely hunting for older articles. After I added this hint, average page fetches per request dropped from 2.1 to 1.3. Fewer wasted fetches means an even lighter context.
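The behaviour the hint encourages can be written down as a plain loop: stop as soon as a page answers the question, and follow next_cursor only while has_more is true. A sketch using a hypothetical in-memory stand-in for the tool, with the same response shape but an unsigned, id-only cursor for brevity (the real tool would use the signed (published_at, id) cursor):

```python
from typing import Optional

ARTICLES = [{"id": i, "slug": f"post-{i}"} for i in range(400, 0, -1)]  # newest first


def fake_list_articles(cursor: Optional[str] = None, limit: int = 20) -> dict:
    """Hypothetical stand-in for list_articles_tool, ordered by id alone."""
    after = int(cursor) if cursor else None
    older = [a for a in ARTICLES if after is None or a["id"] < after]
    page, has_more = older[:limit], len(older) > limit
    return {
        "items": page,
        "has_more": has_more,
        "next_cursor": str(page[-1]["id"]) if has_more else None,
    }


def find_article(slug: str) -> tuple[Optional[dict], int]:
    """Page only until the slug turns up; returns (article, calls made)."""
    cursor, calls = None, 0
    while True:
        result = fake_list_articles(cursor)
        calls += 1
        for item in result["items"]:
            if item["slug"] == slug:
                return item, calls
        if not result["has_more"]:
            return None, calls
        cursor = result["next_cursor"]


print(find_article("post-395")[1])  # 1: found on the first page
print(find_article("post-350")[1])  # 3: two extra fetches, then stop
```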

The schema when you expose it as an MCP tool

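When you present this to Claude as a tool, keep the input schema minimal. Make cursor optional and cap limit, then clamp it again on the server, so an agent that sends limit: 5000 still gets at most 50. The definition uses the Claude Messages API field name input_schema; an MCP server declares the same JSON Schema under inputSchema in its tool listing.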

const listArticlesTool = {
  name: "list_articles",
  description:
    "Returns published articles newest-first, at most 50 per call. " +
    "Only when you need more, pass the previous response's next_cursor as cursor.",
  input_schema: {
    type: "object",
    properties: {
      cursor: {
        type: "string",
        description: "next_cursor from the previous response. Omit on the first call.",
      },
      limit: {
        type: "integer",
        minimum: 1,
        maximum: 50,
        default: 20,
      },
    },
  },
} as const;
 
function clampLimit(requested?: number): number {
  if (!requested || requested < 1) return 20;
  return Math.min(Math.floor(requested), 50); // floor guards against fractional values
}

Writing "only when you need more" into the description is another small but effective nudge. A tool's description is the closest instruction the agent has, so one line on pagination etiquette there meaningfully curbs needless full scans.

Operational lessons that surfaced

A few weeks of running this revealed things the docs do not mention.

First, the right default for limit is the smallest value at which one page is enough to make the decision at hand. I started at 50, but for the agent's job of scanning the list to weed out duplicate candidates, 20 already supplied enough context. The lower the default, the lighter each call.

Second, whether to return total deserves a careful call. COUNT(*) gets quietly expensive as rows grow and, on a huge table, can be slower than fetching the page itself. For lists where an estimate is fine, I drop total and run on has_more alone. I split off a separate tool for screens that need the exact count.
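Where an approximate total is still worth showing, a bounded count is one option between an exact COUNT(*) and none at all: stop counting after cap + 1 matching rows and report "100+" beyond that. The work then scales with the cap rather than the table, though how much that saves depends on the engine and its indexes. A sketch with a hypothetical minimal table; `capped_count` and its cap are illustrative:

```python
import sqlite3


def capped_count(conn: sqlite3.Connection, cap: int = 100) -> str:
    """Exact count up to `cap`; past that, return e.g. "100+" instead of counting every row."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM ("
        "SELECT 1 FROM articles WHERE status = 'published' LIMIT ?"
        ")",
        (cap + 1,),
    ).fetchone()
    return f"{cap}+" if n > cap else str(n)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO articles (status) VALUES (?)", [("published",)] * 400)
print(capped_count(conn))           # 100+
print(capped_count(conn, cap=500))  # 400
```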

Third, cursor lifetime. A signed cursor is valid forever in principle, but the moment you change the sort-key schema, every cursor goes invalid on the k mismatch. A scheduled task resuming with an old cursor will throw there. I made decode_cursor failures silently restart from the top so the agent does not stall. Swallowing an exception is usually a call I avoid, but here I prioritize "do not stall."

Fourth, the same design paid off in a tool that aggregates AdMob revenue for a dashboard. Instead of returning every daily revenue row, I return only the last seven days with a cursor, and for the agent's "check this morning's numbers" task it has never once gone back for more. List-shaped and time-series tools almost always benefit from this pagination design.
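For time-series data the cursor can be simpler still. A sketch, assuming one row per calendar day (so the date alone is unique and needs no id tiebreaker) and a hypothetical in-memory dict of daily totals; in production the date would go through the same signed encoder:

```python
from datetime import date, timedelta
from typing import Optional


def revenue_page(daily: dict[date, float], before: Optional[date] = None, days: int = 7) -> dict:
    """Newest `days` days strictly before `before`, plus a cursor for older ones."""
    newest_first = sorted(daily, reverse=True)
    if before is not None:
        newest_first = [d for d in newest_first if d < before]
    window = newest_first[:days]
    has_more = len(newest_first) > days  # same idea as fetching limit + 1
    return {
        "items": [{"date": d.isoformat(), "revenue": daily[d]} for d in window],
        "has_more": has_more,
        "next_cursor": window[-1].isoformat() if has_more else None,
    }


# Placeholder data: 30 days of zero revenue, standing in for real figures.
start = date(2026, 6, 1)
daily = {start + timedelta(days=i): 0.0 for i in range(30)}

first = revenue_page(daily)
older = revenue_page(daily, before=date.fromisoformat(first["next_cursor"]))
print(first["items"][0]["date"], first["next_cursor"])  # 2026-06-30 2026-06-24
print(older["items"][0]["date"])                         # 2026-06-23
```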

Where to start

If an existing agent's context balloons early, the first suspect is "a tool that returns a whole list." Open your tool-call logs and find the one with the largest return value. Usually there are only one or two culprits.
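Finding that culprit can be a few lines of scripting. A sketch, assuming a hypothetical JSON Lines log with one `{"tool": ..., "result": ...}` record per call (adapt the field names to whatever your agent actually logs) and using serialized character length as a rough proxy for tokens:

```python
import json
from collections import defaultdict


def largest_tool_results(lines, top: int = 3) -> list[tuple[str, int]]:
    """Rank tools by the largest single result they returned, in characters."""
    biggest: dict[str, int] = defaultdict(int)
    for line in lines:
        record = json.loads(line)
        size = len(json.dumps(record["result"], ensure_ascii=False))
        biggest[record["tool"]] = max(biggest[record["tool"]], size)
    return sorted(biggest.items(), key=lambda kv: kv[1], reverse=True)[:top]


# Three hypothetical log lines.
log = [
    json.dumps({"tool": "list_articles", "result": ["x" * 50] * 400}),
    json.dumps({"tool": "get_article", "result": "y" * 3000}),
    json.dumps({"tool": "list_articles", "result": ["x" * 50] * 20}),
]
ranking = largest_tool_results(log)
print(ranking)  # [('list_articles', 21600), ('get_article', 3002)]
```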

Replace that one with a version that detects has_more by fetching limit + 1 rows and returns the rest through a signed cursor. Always attach has_more and a short hint to the return object. Even this minimal shape largely resolves the budget-melting problem at the start of a conversation. The first thing I fixed was a single tool, and that alone visibly changed the stability of the whole day's generation pipeline.

I hope this helps anyone working on the same problem.
