CLAUDE LAB JP
API & SDK · 2026-06-27 · Advanced

Designing the Give-Up Condition in Self-Repair Loops: Four Error Classes, Four Retry Budgets

LLM self-repair loops break on the fantasy that 'if you keep fixing, it eventually passes.' Classify errors into four classes and give each its own retry budget. Working TypeScript and real cost numbers included.

Claude API · self-repair · retry design · error handling · production

Late at night, the cost of an unattended generation pipeline spiked to three times the previous day's.

Nothing had crashed. The opposite, in fact: every job was logged as "finally succeeded." When I traced the spike, I found one job that had regenerated the same output 27 times; the loop had dutifully thrown "try again" at an output that could never pass validation.

LLM self-repair loops have a quiet trap. Write the loop on the assumption that "fixing makes it pass," and it will try to fix errors that are unfixable, forever. What matters in production is not how cleverly you repair, but when you decide to stop.

This piece covers a design that classifies errors into four classes and assigns a retry budget per class. The code is shown in copy-and-run form.

Why naive retry breaks in production

The loop most of us write first looks like this.

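```typescript
// Anti-pattern: keep fixing until it passes
// (callClaude and validate stand in for your model client and validator.)
declare function callClaude(prompt: string): Promise<string>;
declare function validate(out: string): { ok: boolean };

async function generateUntilValid(prompt: string): Promise<string> {
  while (true) {
    const out = await callClaude(prompt);
    if (validate(out).ok) return out;
    // No attempt cap, no cost ceiling, no error class, and the prompt grows on every pass.
    prompt = `${prompt}\n\nThe previous output failed validation. Please fix it.`;
  }
}
```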

This is fine when a human is watching interactively. They give up after a few rounds and fix it by hand.

It breaks under unattended operation. A bug in the validator, a constraint that can't be satisfied, a transient model hiccup — none of these are solved by "please fix it." Yet the loop keeps spinning and keeps burning tokens.
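The cheapest first step, before any classification, is to give the loop hard ceilings of its own. This is a minimal sketch, assuming a hypothetical `CallModel` that returns the text together with the tokens it consumed and a hypothetical `Validate`; both are injected so the snippet runs without any SDK. The limits (`maxAttempts`, `maxTokens`) are illustrative values, not recommendations.

```typescript
// Hypothetical shapes: the real model client and validator are injected by the caller.
type ModelResult = { text: string; tokensUsed: number };
type CallModel = (prompt: string) => Promise<ModelResult>;
type Validate = (text: string) => { ok: boolean; reason?: string };

class GiveUpError extends Error {
  attempts: number;
  tokensUsed: number;
  constructor(message: string, attempts: number, tokensUsed: number) {
    super(message);
    this.name = "GiveUpError";
    this.attempts = attempts;
    this.tokensUsed = tokensUsed;
  }
}

// Same idea as the anti-pattern, but with two explicit give-up conditions:
// an attempt cap and a cumulative token ceiling.
async function generateWithCeiling(
  prompt: string,
  callModel: CallModel,
  validate: Validate,
  maxAttempts = 3,
  maxTokens = 50_000,
): Promise<string> {
  let tokensUsed = 0;
  let currentPrompt = prompt;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await callModel(currentPrompt);
    tokensUsed += result.tokensUsed;
    const verdict = validate(result.text);
    if (verdict.ok) return result.text;
    if (tokensUsed >= maxTokens) {
      throw new GiveUpError("token ceiling reached", attempt, tokensUsed);
    }
    // Rebuild from the original prompt so it does not grow on every pass.
    currentPrompt = `${prompt}\n\nThe previous output failed validation: ${verdict.reason ?? "unknown"}. Please fix it.`;
  }
  throw new GiveUpError("attempt cap reached", maxAttempts, tokensUsed);
}
```

This only bounds the damage; it still treats every failure the same way, which is the problem the rest of the design addresses.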

The root issue is treating all retries uniformly. A transient failure (429 rate limit or 529 overload) and a structurally unsatisfiable constraint demand completely different actions. The former heals if you wait; the latter never heals no matter how many times you resend.
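One way to encode the "heals if you wait" side is to classify by HTTP status before anything looks at the output, then back off exponentially without touching the prompt. The sketch below uses the full-jitter variant of exponential backoff. Treating other 5xx codes as transient is an assumption to adapt to your client, and `classifyStatus`, `backoffDelayMs`, `withTransientRetry` and the caller-supplied `getStatus` hook are hypothetical names.

```typescript
type TransportClass = "transient" | "hard-fail";

// Decide from the HTTP status alone, before anyone looks at the output.
// `undefined` means no response arrived at all (timeout, dropped connection).
function classifyStatus(status: number | undefined): TransportClass {
  if (status === undefined) return "transient";
  if (status === 429 || status === 529) return "transient"; // rate limited / overloaded
  if (status >= 500) return "transient"; // assumption: other 5xx are worth a wait
  return "hard-fail"; // 400, 401, 404, ...: resending the same request cannot help
}

// Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000, rand: () => number = Math.random): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Resend the identical request while failures stay transient; never touch the prompt.
// `getStatus` extracts the HTTP status from whatever your client throws.
async function withTransientRetry<T>(
  send: () => Promise<T>,
  getStatus: (err: unknown) => number | undefined,
  maxRetries = 6,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send();
    } catch (err) {
      if (classifyStatus(getStatus(err)) === "hard-fail" || attempt >= maxRetries) throw err;
      await sleep(backoffDelayMs(attempt));
    }
  }
}
```

The default of six retries sits inside the 5–7 range suggested below; treat every constant here as a starting point.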

Classify errors into four classes

What actually drives the retry decision in production is this four-way split. The trick is to divide by "how should I respond," not by where the error originates.

| Class | Typical examples | Correct response | Suggested retry budget |
|---|---|---|---|
| transient | 429 / 529 / timeout / network drop | Wait with exponential backoff and resend. Do not change the prompt. | 5–7 (including backoff) |
| repairable | Broken JSON / schema mismatch / missing required field | Rebuild once or twice with the error attached. | Up to 2 |
| semantic-invalid | Factual error / constraint violation / quality-gate failure | Do not repeat the same request. Change the approach itself. | 1 (with a different strategy) |
| hard-fail | 401 / 400 (bad input) / model not found / unsatisfiable constraint | Abort immediately. Hand off to a human or a fallback. | 0 |
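A sketch of the table above encoded as data, so the retry loop and the documented policy cannot drift apart. The names `POLICIES`, `RetryPolicy` and `maxCallsPerJob` are hypothetical, and the transient budget of 6 is one pick from the suggested 5–7 range.

```typescript
type ErrorClass = "transient" | "repairable" | "semantic-invalid" | "hard-fail";

type RetryPolicy = {
  maxRetries: number; // retries allowed after the first attempt
  changesRequest: boolean; // does a retry modify the prompt or approach?
  onFailure: string; // what a retry (or the give-up) actually does
};

// The table encoded as data; transient uses 6, inside the suggested 5–7 range.
const POLICIES: Record<ErrorClass, RetryPolicy> = {
  transient: { maxRetries: 6, changesRequest: false, onFailure: "wait with exponential backoff, resend unchanged" },
  repairable: { maxRetries: 2, changesRequest: true, onFailure: "rebuild with the validation error attached" },
  "semantic-invalid": { maxRetries: 1, changesRequest: true, onFailure: "one retry with a different strategy" },
  "hard-fail": { maxRetries: 0, changesRequest: false, onFailure: "abort and hand off to a human or fallback" },
};

// If each class keeps its own counter, one job can make at most this many model calls:
// the first attempt plus every class's retry budget.
function maxCallsPerJob(): number {
  return 1 + Object.values(POLICIES).reduce((sum, p) => sum + p.maxRetries, 0);
}
```

With these numbers a single job tops out at 10 model calls, which turns "how much can one job cost?" into a number you can alert on, compared with the 27 calls in the opening incident.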

The value of this split is that the give-up condition falls out naturally per class: hard-fail gets zero retries, and semantic-invalid gets exactly one, with a different strategy. Resending the identical request only ever makes sense for transient errors.
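The per-class give-up rule can be made mechanical with one counter per class. This is a sketch under stated assumptions: the budgets come from the table (6 for transient), the backoff is plain exponential without jitter to keep it short, and `RetryLedger` and `Decision` are hypothetical names rather than part of any library.

```typescript
type ErrorClass = "transient" | "repairable" | "semantic-invalid" | "hard-fail";

type Decision =
  | { kind: "retry-same"; delayMs: number } // resend the identical request after a wait
  | { kind: "repair"; feedback: string } // rebuild with the error attached
  | { kind: "switch-strategy" } // one attempt with a different approach
  | { kind: "give-up"; reason: string }; // hand off to a human or a fallback

const BUDGET: Record<ErrorClass, number> = {
  transient: 6,
  repairable: 2,
  "semantic-invalid": 1,
  "hard-fail": 0,
};

// One counter per class, so spending the repair budget does not eat into
// the transient budget and vice versa.
class RetryLedger {
  private used: Record<ErrorClass, number> = {
    transient: 0,
    repairable: 0,
    "semantic-invalid": 0,
    "hard-fail": 0,
  };

  decide(errorClass: ErrorClass, detail: string): Decision {
    if (this.used[errorClass] >= BUDGET[errorClass]) {
      // hard-fail always lands here because its budget is 0.
      return { kind: "give-up", reason: `${errorClass} budget (${BUDGET[errorClass]}) exhausted: ${detail}` };
    }
    this.used[errorClass] += 1;
    if (errorClass === "transient") {
      // 500 ms, 1 s, 2 s, ... capped at 30 s.
      return { kind: "retry-same", delayMs: Math.min(30_000, 500 * 2 ** (this.used.transient - 1)) };
    }
    if (errorClass === "repairable") return { kind: "repair", feedback: detail };
    return { kind: "switch-strategy" }; // only semantic-invalid reaches this line
  }
}
```

The caller classifies each failure, asks the ledger, and acts on the returned `Decision`; the loop itself never has to know the budgets.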

The distinction between repairable and semantic-invalid is the most important one in the design. repairable means "the shape is broken," so showing the model the error fixes it. semantic-invalid means "the content does not meet the requirement," so repeating the same ask just goes in circles. The 27 attempts in the opening incident were a textbook case of mistaking semantic-invalid for repairable.
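A minimal sketch of drawing that line in code, assuming the model is asked for a JSON object: shape checks (parsing, object type, required fields) map to repairable, and anything a content rule rejects after the shape passes maps to semantic-invalid. `classifyOutput`, `requiredFields` and `semanticCheck` are hypothetical, caller-supplied pieces; the invoice-style sum rule in the test is only an example of a content rule.

```typescript
type OutputClass = "ok" | "repairable" | "semantic-invalid";
type Verdict = { cls: OutputClass; detail: string };

// Shape checks first (repairable), content checks second (semantic-invalid).
function classifyOutput(
  raw: string,
  requiredFields: string[],
  semanticCheck: (value: Record<string, unknown>) => string | null,
): Verdict {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (e) {
    return { cls: "repairable", detail: `invalid JSON: ${(e as Error).message}` };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { cls: "repairable", detail: "expected a JSON object" };
  }
  const obj = parsed as Record<string, unknown>;
  const missing = requiredFields.filter((field) => !(field in obj));
  if (missing.length > 0) {
    return { cls: "repairable", detail: `missing required fields: ${missing.join(", ")}` };
  }
  // The shape is fine; anything wrong from here on is about content.
  const problem = semanticCheck(obj);
  return problem === null ? { cls: "ok", detail: "" } : { cls: "semantic-invalid", detail: problem };
}
```

Ordering matters: a content rule never sees a half-parsed object, and any failure that survives parsing is routed to a strategy change instead of another "please fix it."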

WHAT YOU'LL LEARN
A classifier that maps errors into transient / repairable / semantic-invalid / hard-fail and assigns a separate retry budget per class (working TypeScript)
Why an unattended pipeline's token cost balloons several-fold without an explicit give-up condition, and how to set retry budgets and a hard cost ceiling
How to add structured attempt logs and fallbacks (switch strategy, hand off to a human) so a silent loop never spins forever undetected

Related Articles

API & SDK · 2026-06-21
Reserving Priority Capacity for User Traffic with service_tier
If you pay for Priority Tier but your user-facing responses still slow down at peak, the culprit is often your own background jobs eating the priority pool. Here is how to read service_tier, prove the contention, and isolate background work.
API & SDK · 2026-06-17
When Claude API Extracts the Wrong Value With Full Confidence — Designing the Verification Layer
When you extract invoices or contracts with Claude API, the scariest failure isn't an exception — it's plausible-but-wrong JSON. Here is how I build a verification layer that catches silent extraction errors with schema checks, arithmetic reconciliation, and dual-extraction agreement, in TypeScript.
API & SDK · 2026-06-17
Making the Numbers Add Up in a Multi-Tenant Claude API SaaS — Field Notes on Isolation and Cost Attribution
The first thing that breaks when you make a Claude API SaaS multi-tenant is the month-end reconciliation. Here are field notes on a single metering chokepoint, atomic counters, reconciling against Anthropic's bill, and proving tenant isolation with adversarial tests — with production TypeScript.