API & SDK / 2026-06-28 / Advanced

Every Tool Call Succeeds, Yet Nothing Moves Forward: Detecting Stagnation in Unattended Agents

No errors, yet the agent keeps replaying the same move while your budget quietly drains. Here is how to detect a success-but-no-progress loop using a progress oracle and action fingerprints, with a working Python implementation that halts safely.


The article-generation pipeline I run overnight was still going in the morning.

I opened the logs. Not a single tool call had failed. read_file returned, edit_file returned, run_tests returned, then back to read_file. Everything was a clean 200, not one error line. And yet the output was barely different from six hours earlier.

We tend to guard against loops that fail: retry limits, circuit breakers, give-up budgets. But a loop that keeps succeeding while making no progress gives you nothing to trip on. No exception, no non-zero exit code.

As an indie developer running several sites unattended, this is the worst kind of failure I deal with. An outright failure, at least, is something you notice. A loop that quietly stops advancing while draining the budget costs far more.

This article lays out how to detect that "succeeds but never stops" loop structurally, and how to halt it safely while leaving diagnostics behind.

Error budgets cannot catch stagnation

First, why the usual guardrails slip past this.

Almost every safety device in a typical agent loop counts failures.

Guardrail | What it counts | Fires on a succeeding-but-stuck loop?
Retry limit | Consecutive exceptions | No (no exceptions thrown)
Circuit breaker | Error rate | No (error rate is 0)
Give-up budget | Self-repair attempts | No (no error to repair)
Turn limit | Total turns | Late (only after the whole budget is spent)

Only the turn limit eventually kicks in, but that is not detection of stagnation; it is exhaustion of the entire budget. With a 50-turn cap, a loop that stalls after its first turn wastes the other 49 before stopping. That is far too late to call cost control.
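To make the table concrete, here is a minimal sketch of a loop guarded only by failure counters. The `FailureGuards` class, its thresholds, and the simulated tool cycle are all hypothetical stand-ins, not the pipeline's real code. Every simulated call succeeds, so the only guard that can fire is the turn limit.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FailureGuards:
    """Hypothetical guardrails that only count failures and total turns."""
    max_consecutive_errors: int = 3
    max_turns: int = 50
    consecutive_errors: int = 0
    turns: int = 0

    def record(self, ok: bool) -> Optional[str]:
        """Record one tool call. Returns the guard that fired, or None."""
        self.turns += 1
        self.consecutive_errors = 0 if ok else self.consecutive_errors + 1
        if self.consecutive_errors >= self.max_consecutive_errors:
            return "retry_limit"
        if self.turns >= self.max_turns:
            return "turn_limit"
        return None


def run_stuck_loop(guards: FailureGuards) -> Tuple[str, int]:
    """Simulate an agent cycling through tool calls that always succeed."""
    cycle = ["read_file", "edit_file", "run_tests"]
    while True:
        _tool = cycle[guards.turns % len(cycle)]  # the call "succeeds", nothing changes
        fired = guards.record(ok=True)
        if fired is not None:
            return fired, guards.turns


fired, turns = run_stuck_loop(FailureGuards())
print(fired, turns)  # turn_limit 50
```

The retry limit never sees an error, so the loop runs until the cap. Nothing in this guard set can tell a productive turn from a wasted one.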

Stagnation has to be framed not as the absence of errors but as the absence of progress. And progress cannot be measured unless you define it explicitly. That is where the design starts.

"Stagnation" does not exist without a definition of progress

Stagnation means "no progress for a stretch of steps." Put the other way around: if you have not defined progress, you cannot define stagnation. The reason so many agent implementations cannot detect it is, I think, that they never had a progress oracle in the first place.

A progress oracle is a function that returns a roughly monotone number tied to the task goal. As long as the value keeps improving, the agent is moving forward.

from typing import Callable, Protocol, Tuple


class ProgressOracle(Protocol):
    def score(self) -> float:
        """Higher means closer to the goal. Observes current state, no side effects."""
        ...


# Example: an oracle for a code-fixing task
class TestPassProgress:
    def __init__(self, run_tests: Callable[[], Tuple[int, int]]):
        # run_tests returns (passed, total)
        self._run_tests = run_tests

    def score(self) -> float:
        passed, total = self._run_tests()
        if total == 0:
            # No tests collected: treat as no measurable progress
            return 0.0
        # Pass ratio as the main component, with a strong bonus for all-green
        return passed / total + (1.0 if passed == total else 0.0)
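Once an oracle exists, "no progress for a stretch of steps" can be counted directly. The following is a minimal sketch under my own assumptions: the `StagnationBudget` name and the `patience` and `min_delta` knobs are illustrative, and the scores are made-up values of the kind an oracle might return.

```python
class StagnationBudget:
    """Signals a halt when the best score has not improved for `patience` steps.

    `patience` and `min_delta` are hypothetical tuning knobs.
    """

    def __init__(self, patience: int = 5, min_delta: float = 1e-9):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("-inf")
        self.steps_since_improvement = 0

    def observe(self, score: float) -> bool:
        """Record one step's score. Returns True when the loop should halt."""
        if score > self.best + self.min_delta:
            self.best = score
            self.steps_since_improvement = 0
        else:
            self.steps_since_improvement += 1
        return self.steps_since_improvement >= self.patience


# Made-up oracle readings: three steps of progress, then a plateau.
scores = [0.2, 0.4, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
budget = StagnationBudget(patience=3)
halted_at = next((i for i, s in enumerate(scores) if budget.observe(s)), None)
print(halted_at)  # 5
```

Comparing against the best score seen, rather than the previous score, is a deliberate choice in this sketch: a score that bounces between two values never beats its own best, so it still drains the budget.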

The natural metric for progress changes by task.

Task type | Candidate progress oracle
Code fixing | Tests passing / drop in remaining linter violations
Research gathering | Sub-questions answered / new primary sources obtained
File organizing | Unsorted files remaining (a decrease is progress)
Article writing | Quality-gate items satisfied
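For metrics where a decrease is progress, such as the file-organizing row, one option is to negate the count so that "higher is better" still holds. This is a sketch only; the `UnsortedFilesProgress` class and the `inbox` and `sorted` directory layout are assumptions made for the demo.

```python
import tempfile
from pathlib import Path


class UnsortedFilesProgress:
    """Score is the negated number of files still sitting in the inbox."""

    def __init__(self, inbox: Path):
        self._inbox = inbox

    def score(self) -> float:
        # Read-only observation: count files, touch nothing.
        remaining = sum(1 for p in self._inbox.iterdir() if p.is_file())
        return -float(remaining)


with tempfile.TemporaryDirectory() as tmp:
    inbox = Path(tmp) / "inbox"
    sorted_dir = Path(tmp) / "sorted"
    inbox.mkdir()
    sorted_dir.mkdir()
    for name in ("a.txt", "b.txt", "c.txt"):
        (inbox / name).write_text("x")

    oracle = UnsortedFilesProgress(inbox)
    before = oracle.score()
    (inbox / "a.txt").rename(sorted_dir / "a.txt")  # one file gets sorted
    after = oracle.score()

print(before, after)  # -3.0 -2.0
```

The score rises from -3.0 to -2.0 as the inbox shrinks, so the same "did the number go up?" check works for every row in the table.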

Some tasks resist a progress oracle. For those, use behavioral novelty, described below, as a proxy for progress. A proxy is a weaker signal than the real thing, but it catches stagnation far sooner than nothing.
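One way to approximate behavioral novelty is to hash each tool call into a fingerprint and track how many distinct fingerprints appear in a recent window. The sketch below is my own illustration of that idea, not the full implementation: `fingerprint`, `NoveltyTracker`, and the window size are hypothetical, and it assumes tool arguments are JSON-serializable.

```python
import hashlib
import json
from collections import deque
from typing import Any, Deque, Dict


def fingerprint(tool: str, args: Dict[str, Any]) -> str:
    """Stable short hash of a tool call; key order in args does not matter."""
    payload = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


class NoveltyTracker:
    """Fraction of distinct fingerprints among the last `window` actions."""

    def __init__(self, window: int = 6):
        self._recent: Deque[str] = deque(maxlen=window)

    def observe(self, tool: str, args: Dict[str, Any]) -> float:
        self._recent.append(fingerprint(tool, args))
        return len(set(self._recent)) / len(self._recent)


# An A-B-A-B oscillation: the same two calls repeated three times.
tracker = NoveltyTracker(window=6)
calls = [
    ("read_file", {"path": "a.py"}),
    ("edit_file", {"path": "a.py", "patch": "x"}),
] * 3
ratios = [tracker.observe(tool, args) for tool, args in calls]
print(ratios)
```

The ratio starts at 1.0 and falls toward 2/6 as the two calls keep alternating. A threshold on that ratio gives a halt signal even when no goal-tied score is available.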

One thing I learned in practice: the progress oracle must be cheap and side-effect-free to call. If you run the full test suite every turn, the cost of detecting stagnation outgrows the work itself. Reuse a cached test result, or for a research task just read a counter of answered sub-questions. Keep the observation cost flat.
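A small caching wrapper is one way to keep the observation cost flat. In this sketch the `CachedProgress` class and its `invalidate()` hook are assumptions: the idea is that the agent loop calls `invalidate()` only after a tool call that could have changed the state, and every other read is served from the cache.

```python
from typing import Callable, Optional


class CachedProgress:
    """Wraps an expensive measurement; re-measures only after invalidate()."""

    def __init__(self, measure: Callable[[], float]):
        self._measure = measure
        self._cached: Optional[float] = None

    def invalidate(self) -> None:
        """Call after a step that may have changed the observed state."""
        self._cached = None

    def score(self) -> float:
        if self._cached is None:
            self._cached = self._measure()
        return self._cached


# Stand-in for an expensive measurement such as a full test run.
measure_calls = {"n": 0}


def expensive_measure() -> float:
    measure_calls["n"] += 1
    return 0.5


oracle = CachedProgress(expensive_measure)
for _ in range(10):
    oracle.score()       # ten reads, one measurement
oracle.invalidate()      # e.g. after an edit_file call
oracle.score()           # second measurement
print(measure_calls["n"])  # 2
```

Eleven reads cost two measurements here. The trade-off is that a missed `invalidate()` makes the oracle report stale progress, so the hook belongs on every write path.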
