●SANDBOX — Claude Managed Agents can now run in your own sandbox and connect to private MCP servers (self-hosted beta, MCP tunnels in preview)●PLATFORM — The Claude Developer Platform adds new code execution, web search, and web fetch tools, exposing a 90-second per-cell limit●CONTEXT — response_inclusion trims consumed result blocks to save context in agentic workflows●MCP — Enterprise-managed MCP connectors (Okta) continue: zero-touch access across Claude, Claude Code, and Cowork (Team/Enterprise beta)●CODE — Claude Code adds /cd, a post-session hook, and a safe mode while tightening MCP policy enforcement●MODEL — Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; Fable 5 is available from Claude Code●SANDBOX — Claude Managed Agents can now run in your own sandbox and connect to private MCP servers (self-hosted beta, MCP tunnels in preview)●PLATFORM — The Claude Developer Platform adds new code execution, web search, and web fetch tools, exposing a 90-second per-cell limit●CONTEXT — response_inclusion trims consumed result blocks to save context in agentic workflows●MCP — Enterprise-managed MCP connectors (Okta) continue: zero-touch access across Claude, Claude Code, and Cowork (Team/Enterprise beta)●CODE — Claude Code adds /cd, a post-session hook, and a safe mode while tightening MCP policy enforcement●MODEL — Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; Fable 5 is available from Claude Code
Surviving the 90-Second Code Execution Cell Limit with Checkpointed Chunking
Claude's code execution tool now enforces a 90-second per-cell limit. Here is how to keep a long batch from getting cut off there: persist progress to the container filesystem and resume across cells, with working code for timing, idempotent checkpoints, and knowing when to offload.
I had been handing a job that inspects every article MDX across my four sites to a single code execution cell. One day the response just stopped mid-way. No clear error, no stack trace — the output simply ended. It took a second look at my logs to understand why: the June tool update made the 90-second limit on each code execution cell explicit, and my inspection loop had been quietly creeping toward that wall as the file count grew.
At first glance the limit looks like an annoyance. Reframe it as a design starting point, though, and it becomes a clean contract. The key is an asymmetry: the 90 seconds apply per cell, but the container's filesystem lives on across cells in the same session. Use that, and you can split a long job into sub-90-second cells, carrying progress forward and resuming as many times as you need.
This article narrows the carry-forward craft to three things: measuring, idempotent checkpoints, and the offload decision. My example is a batch job from my own work as an indie developer at Dolice Labs, but the approach is the same whether you are post-processing search results or bulk-converting image metadata.
The 90 seconds are per cell; the container keeps living — that asymmetry is the starting point
The first thing to internalize is that the limit applies to a single cell's execution, not the whole conversation. If one cell gets cut at 90 seconds, the container for that session stays alive. A file you wrote to /tmp in the previous cell is still readable from the next one.
Get this backwards and your design falls apart. The container's lifetime is tied to the conversation (the session); it is not rebuilt for each cell. So the strategy is simple. Split the work into units that fit inside 90 seconds, and at the end of each cell, write "how far I got" to a file in the container. The next cell picks up from there. That's the whole idea.
The flip side: cross a session boundary and the assumption breaks. A new conversation means a fresh container, and the /tmp progress is gone. If you need to carry state beyond a session, push progress to external storage (your own MCP server or an object store). This article stays inside the "finish a long batch within one session" scope. For the basics of the code execution tool itself, I cover them in automating CSV reports with the code execution tool; start there if the foundation feels shaky.
Measure one unit's wall time first — don't split by guesswork
Pick a chunk size by intuition and you'll usually miss. My first attempt was a lazy "one site per cell should be fine," and of course one site had far more files and hit 90 seconds again. Measuring the per-unit wall time first looks like a detour but is the shortcut.
import timedef measure_unit(items, process, sample=5): """Run only the first `sample` items for real to estimate per-item wall time.""" start = time.monotonic() n = min(sample, len(items)) for item in items[:n]: process(item) elapsed = time.monotonic() - start per_item = elapsed / n if n else 0.0 usable = 75.0 # the time budget discussed below safe_chunk = max(1, int(usable / per_item)) if per_item else len(items) print(f"{per_item*1000:.0f}ms/item -> safely ~{safe_chunk} items per cell") return safe_chunk# Example output:# 320ms/item -> safely ~234 items per cell
Use time.monotonic(), not time.time(). The latter can jump backward when the system clock is adjusted, which makes it unreliable for measuring elapsed time. The former is guaranteed to move forward, so it is safe for deadline checks like this.
When you actually measure, the numbers often surprise you. In my MDX inspection, parsing the MDX body dominated over the regex passes, and per-item time swung more with the number of code blocks than with file size. Measuring moves your split decision from "impression" to "number."
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦Take a long batch that kept dying at 90 seconds and run it to completion by checkpointing across cells
✦Walk away with a resumable loop that persists progress to the container filesystem, plus an in-cell time-budget guard that bails out on its own
✦Draw a measured line between work that belongs in the code execution tool and work you should push to your own infrastructure
Secure payment via Stripe · Cancel anytime
✦
Unlock This Article
Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.
Persist progress to a container file and resume across cells
This is the core. A checkpoint that writes "where to start next" to a single file is plenty. The one rule that matters: make the write atomic, so the body file never breaks even if the cell is cut at 90 seconds mid-write.
import json, osCKPT = "/tmp/batch_progress.json"def load_ckpt(): if os.path.exists(CKPT): with open(CKPT) as f: return json.load(f) return {"done": 0, "results": []}def save_ckpt(state): # Write to a temp file, then rename. rename is atomic on the same FS, # so even if we're cut off, CKPT itself stays at the last valid state. tmp = CKPT + ".tmp" with open(tmp, "w") as f: json.dump(state, f, ensure_ascii=False) os.replace(tmp, CKPT)
Going through os.replace() is the crux. If you json.dump() straight into the body file and the cell dies during that write, you're left with half-written, broken JSON. The next cell can't read it and restarts from scratch — the worst possible loop. Write the temp file completely, then rename, and the body file is only ever either a complete old state or a complete new one.
With save_ckpt in hand, the resume loop is surprisingly short.
state = load_ckpt()todo = all_items[state["done"]:] # skip what's already donefor item in todo: state["results"].append(process(item)) state["done"] += 1 save_ckpt(state) # record after each itemprint(f"Advanced to {state['done']}/{len(all_items)} this cell")
Run the same code in the next cell, and load_ckpt() reads done, while all_items[state["done"]:] returns only what's left. Finished work is never touched again. As long as the session continues, you can fire this cell as many times as you like and it keeps inching forward.
Put a time-budget guard in the cell — fold at 75 seconds on your own terms
That loop has a hole. With many items left, the cell still hits 90 seconds and gets cut. The checkpoint protects your progress, but the cut is a forced kill from the outside, and you don't control what happens in the last few hundred milliseconds. Better to fold just before that, on your own terms.
So set a deadline at the top of the cell and check the remaining time on every item.
import time, json, osCKPT = "/tmp/batch_progress.json"BUDGET_SEC = 75.0 # 15-second margin under the 90-second limitdef run_one_cell(all_items, process): start = time.monotonic() state = load_ckpt() todo = all_items[state["done"]:] for item in todo: if time.monotonic() - start > BUDGET_SEC: save_ckpt(state) print(f"PAUSED at budget. Stopped safely at {state['done']}/{len(all_items)}") return False # not done; continue next cell state["results"].append(process(item)) state["done"] += 1 save_ckpt(state) print(f"DONE all {len(all_items)} items") return True # complete
The 15-second margin fits my environment, where the heaviest item takes two to three seconds. The heavier a single item, the greater the risk that starting one more pushes you past the deadline, so set the margin "larger than your heaviest single item." Cross-check it against measure_unit's return value and tune BUDGET_SEC per environment.
The False / True return is the signal for the outside. In automation built around code execution, you read the completion sentinel printed to stdout (DONE all ... items); if it's not done, you simply ask the model to run the cell once more with a "keep going." Partial responses are now preserved when a stream is interrupted, so this sentinel approach pairs well with cut-resistant design. Keeping background jobs from crowding out user traffic — see isolating background workloads with service_tier — helps you decide where long-running work belongs.
Make checkpoints idempotent — no double work on re-run
The reliability of a cross-cell design comes down to one question: if you run the same cell again, does the result stay intact? That's idempotency.
The implementation above advances the done index monotonically and always resumes from all_items[state["done"]:], so completed elements are never processed twice. The thing to watch is when process() itself has side effects (writing files, writing to an external system). If you perform the side effect before advancing the index and get cut just before advancing, the next resume runs that same side effect again.
When side effects are involved, the safe order is not "side effect -> checkpoint update" but "make the output destination deterministic per element." If you derive a result filename from the element's slug, writing twice just overwrites with the same content — no duplication.
def process(item): # Derive the output path deterministically from item (re-run overwrites = idempotent) out = f"/tmp/out/{item['slug']}.json" os.makedirs(os.path.dirname(out), exist_ok=True) result = compute(item) tmp = out + ".tmp" with open(tmp, "w") as f: json.dump(result, f, ensure_ascii=False) os.replace(tmp, out) return out
Guarantee "re-running yields the same result" at the per-element level, and a slightly coarse checkpoint granularity won't break you. I apply the same idempotency-first principle to scheduled runs in splitting Managed Agents scheduled deploys from a home-grown cron.
Where to keep work in code execution vs. offload to your own infrastructure
Checkpointed chunking is powerful but not universal. A single indivisible unit that already exceeds 90 seconds can't be split no matter what. How far to lean on the code execution tool, and where to offload to your own infrastructure — I decide with this table.
Nature of the work
Where it goes
Why
One-shot pure compute that fits in 90s
Code execution tool
Completes in a single cell. No splitting machinery needed
N items, each light but large in total
Code execution + checkpointed chunking
Can advance across cells. The subject of this article
One indivisible unit over 90s (heavy transform, large inference)
Your own infra (a tool via MCP)
Can't be split, so hand it to an environment without the time limit
Needs a persistent outbound connection / long-lived process
Managed Agents / your own worker
Code execution assumes short-lived cells; not for residency
State must persist across sessions
External storage + your own logic
The container FS dies with the conversation
The nice thing about this line is that it speeds up the decision to retreat. Push too hard on "maybe checkpoints can make it work" and you end up cramming genuinely heavy work — which should go to your own infrastructure — into a cell. The moment you've measured one unit's wall time, you can see which row it lands on relative to 90 seconds. If the numbers are in, the right call is to offload without hesitation.
Three things that tend to trip you up
To close, three potholes I actually stepped in.
First, non-atomic checkpoint writes. Skip os.replace and json.dump straight into the body file, and a cut leaves broken JSON behind, forcing the next cell to redo everything. Don't omit the atomic write.
Second, mistaking the container lifetime. Assume /tmp progress survives across conversations and you'll panic when a different session "lost the progress." Confirm up front that carry-forward only works within the same session.
Third, I/O at too fine a granularity. This article calls save_ckpt per item, but at tens of thousands of items the write itself becomes the bottleneck. Then coarsen it — "save every N items, or once remaining time drops below half the budget." The checkpoint is a safety device; writing every time is not the goal.
When a long job gets cut at 90 seconds, start by measuring one unit's wall time. Once you have the number, how many to chunk per cell — and whether to keep it in code execution or offload — becomes a decision you can make without hesitation. I'm still tuning this as I run it, but making measurement the starting point is the one habit that has never let me down across any automation.
Share
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.