Noticing From the Outside When a Scheduled Job Quietly Did Nothing
Exit code 0, but zero output. This is how I catch a silent no-op from the outside, using an external heartbeat ledger and repository ground truth instead of the job's own log. It comes from running several sites on a nightly schedule as an indie developer.
One morning I opened the update log and a single line was missing — the line for a generation job that should have run overnight. The error log held nothing. The exit code was 0. So as far as the system was concerned, it had succeeded, yet not a single article had been added. There was no commit at that time in the git history either.
That state — "succeeded, but nothing was produced" — was the hardest to deal with. A crash is easier; at least you notice. This failed in silence, and worse, it believed it had succeeded. When you run several sites in sequence overnight as an indie developer, these silent no-ops slip in now and then. Today I want to write down how to notice them from the outside.
Start from the premise that "success" and "result" are different
We tend to treat the exit code as proof of a result. But exit 0 only guarantees that the last command returned no error. There are plenty of paths where the job finds nothing to generate and exits cleanly without creating anything.
In my case, the cause was usually one of these. A cat of the reference data pointed at the wrong path and returned nothing, so the job slid past topic selection in silence. The disk filled up and a clone gave up halfway, yet the following steps still went through the motions. A model pause left generation empty-handed, with nothing to push. None of these is an isolated bug; they share one structure: a no-op that still counts as success.
So the goal of monitoring is not "did the job crash." It is "did the expected output actually come into existence." Narrowing to that one question makes the design much cleaner.
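As a minimal sketch of that question, assume a job that writes .mdx files into an output directory. The helper name run_and_verify and its parameters are illustrative, not part of the setup described here. It reports the exit code and the number of newly created files side by side, so the two can disagree in plain sight.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_verify(cmd, output_dir, pattern="*.mdx"):
    """Run cmd, then return (exit_code, number_of_new_files).

    Hypothetical helper: it compares the files matching `pattern`
    before and after the run instead of trusting the exit code.
    """
    out = Path(output_dir)
    before = set(out.glob(pattern))
    exit_code = subprocess.run(cmd, cwd=out).returncode
    produced = set(out.glob(pattern)) - before
    return exit_code, len(produced)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # A stand-in job that finds nothing to do and exits cleanly.
        code, produced = run_and_verify([sys.executable, "-c", "pass"], tmp)
        print(f"exit={code} produced={produced}")
        if code == 0 and produced == 0:
            print("succeeded, but nothing was produced")
```

The exit code answers "did the last command fail," while the file count answers "did the output come into existence." Only the second is the monitoring target.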
Why you cannot trust the job's own log
The first thing you want to reach for is writing a "done" line at the end of the job. I did exactly that. It does not work.
Silent no-ops happen precisely when the run takes a path off the main flow. And the "write the done log" step sits at the tail of that main flow. In other words, the situation that drops your log and the situation that drops your output share the same root. The one time it fails is the one time the line announcing failure never gets written. Self-reporting goes quiet at exactly the moment you need it most.
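A toy illustration of that shared root, with hypothetical names (generate, topics, log) standing in for a real job: the early return that skips the work also skips the line that would have reported it.

```python
def generate(topics, log):
    """Toy job: writes one article per topic, then logs 'done' at the tail."""
    if not topics:
        # Off the main flow: nothing to do, so leave cleanly.
        # The 'done' line below is never reached, and no error line is either.
        return 0
    articles = [f"article about {t}" for t in topics]
    log.append(f"done: {len(articles)} articles")
    return 0


if __name__ == "__main__":
    log = []
    exit_code = generate([], log)  # e.g. the reference data came back empty
    print(exit_code, log)          # exit code 0 and an empty log
```

From the inside, an empty log after a clean exit looks the same whether the job was never meant to write anything or silently skipped everything.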
That realization became the starting point for the design. Move the observer outside the job. Base the verdict not on what the job says about itself, but on facts a third party — independent of the job — can see. Once you decide that, what to record becomes obvious.
Instead of a self-reported success, ask the job to leave only a heartbeat. A single line that says, "at this time I produced this output, this many of them," appended to a shared ledger. No judgment. Just a timestamped fact.
I made the ledger a JSONL file, one record per line. Append-only, so it is hard to corrupt and easy to follow by machine or by eye later. At the end of a generation job, you add just a few lines.
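```bash
#!/usr/bin/env bash
# heartbeat.sh: call at the tail of a generation job. No judgment; just stamp what happened.
# usage: heartbeat.sh <job_id> <repo_dir> <expected_min_articles>
set -uo pipefail  # pipefail is a bash option, hence the bash shebang

JOB_ID="$1"
REPO_DIR="$2"
EXPECTED_MIN="${3:-1}"
LEDGER="${HEARTBEAT_LEDGER:-$HOME/ops/heartbeat.jsonl}"
mkdir -p "$(dirname "$LEDGER")"

# Ground truth: count ja articles in the latest commit (do not self-count).
# If the repo is unreachable, leave no stamp at all; that absence is itself the signal.
cd "$REPO_DIR" || exit 0
# Note: --name-only lists every file the commit touched; add --diff-filter=A
# to count only newly added files. grep -c prints 0 when nothing matches.
ADDED=$(git show --name-only --pretty=format: HEAD 2>/dev/null \
  | grep -c '^content/articles/ja/.*\.mdx$')
HEAD_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo "nohead")
TS=$(TZ=Asia/Tokyo date +%Y-%m-%dT%H:%M:%S%z)

# Append one JSON line. Job IDs are assumed to need no JSON escaping.
printf '{"job":"%s","ts":"%s","sha":"%s","added":%s,"expected_min":%s}\n' \
  "$JOB_ID" "$TS" "$HEAD_SHA" "${ADDED:-0}" "$EXPECTED_MIN" >> "$LEDGER"
```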
There is one deliberate choice here. The added value comes not from a number the job counted in its head, but from the files that actually entered the commit, via git show. The principle of excluding self-reports is enforced from the moment of stamping.
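For jobs driven from Python rather than shell, the same stamp might look like the sketch below. append_beat and read_beats are hypothetical helpers, and the field names simply mirror the shell script above. One practical difference is that json.dumps handles quoting, which the printf version leaves to the caller.

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path

JST = timezone(timedelta(hours=9))  # fixed offset; Japan observes no DST


def append_beat(ledger, job, sha, added, expected_min=1, now=None):
    """Append one heartbeat record as a single JSON line (append-only)."""
    now = now or datetime.now(JST)
    rec = {
        "job": job,
        "ts": now.isoformat(timespec="seconds"),
        "sha": sha,
        "added": int(added),
        "expected_min": int(expected_min),
    }
    ledger = Path(ledger)
    ledger.parent.mkdir(parents=True, exist_ok=True)
    with ledger.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
    return rec


def read_beats(ledger):
    """Yield parsed records, skipping lines that are not valid JSON."""
    for line in Path(ledger).read_text(encoding="utf-8").splitlines():
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue
```

The caller still has to pass in an added count taken from the repository, not from the job's own bookkeeping; the helper only records it.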
Absence is the signal — the dead-man's-switch idea
Here is the crux. Treat the heartbeat's absence, not its presence, as the signal. That is a dead-man's switch.
It helps to picture it this way. A train driver has to keep a handle held down; if they stop holding it for a set interval, the device brakes automatically. While the handle is held, nothing happens; the instant the hand lets go, it triggers. Our monitoring is the same — while jobs keep leaving heartbeats, it stays quiet, and it speaks up only the moment a beat goes missing.
In implementation, you prepare "the set of jobs that should have stamped today" (the expected set) and "the set that actually stamped" (the observed set), and check only whether expected ⊆ observed still holds.
The reconciliation runs in these steps.
Declare the job IDs scheduled for that weekday as the expected set
Read today's ledger lines and collect job IDs into the observed set
Subtract observed from expected and confirm the difference is empty
If it is not empty, report the missing job IDs as no-ops
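The four steps above reduce to a set difference. A minimal sketch, with made-up job IDs and a hypothetical helper name find_missing:

```python
def find_missing(expected, observed):
    """Return job IDs that were scheduled but left no heartbeat, sorted."""
    return sorted(set(expected) - set(observed))


if __name__ == "__main__":
    expected = {"site-a", "site-b", "site-c"}  # hypothetical job IDs
    observed = {"site-a", "site-c", "adhoc"}   # extra beats are harmless
    print("expected is a subset of observed:", expected <= observed)
    print("missing:", find_missing(expected, observed))
```

Because only expected minus observed is checked, an unscheduled job that happens to stamp the ledger never raises an alert; only a scheduled job that stayed silent does.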
Because you ask "did what was scheduled arrive," you also catch the complete silence of a job that never even started. Monitoring that waits for a crash log has its blind spot exactly there.
What the watchdog reads is ground truth, not self-reports
Even when a beat arrives, you cannot relax yet. There is a failure mode where the stamp exists but added is 0 — a hollow beat. So the watchdog cross-checks both the presence of the beat in the ledger and the ground truth in the repository.
Ground truth here is a single question: did the number of .mdx files under that repo's ja directory actually grow that day? Rather than trust the ledger's self-count, the watchdog recounts in its own cloned copy of the repo. When the two numbers disagree, it is the ledger that gets doubted.
The script below is the body of the monitor. Run it once a day, after every generation window has closed.
```python
#!/usr/bin/env python3
# watchdog.py: reconcile expected job heartbeats against repository ground truth
import json
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo

JST = ZoneInfo("Asia/Tokyo")
LEDGER = Path.home() / "ops" / "heartbeat.jsonl"

# Expected set per weekday (0=Mon .. 6=Sun). Declare rest days here to avoid false alarms.
EXPECTED_BY_WEEKDAY = {
    0: {"claudelab-premium", "gemilab-premium", "antigravitylab-premium", "rorklab-premium"},
}
for wd in range(1, 5):
    EXPECTED_BY_WEEKDAY[wd] = EXPECTED_BY_WEEKDAY[0]
EXPECTED_BY_WEEKDAY[5] = {"weekend-content"}
EXPECTED_BY_WEEKDAY[6] = {"weekend-content"}

REPO_OF = {
    "claudelab-premium": Path.home() / "repos" / "claudelab.net",
    "gemilab-premium": Path.home() / "repos" / "gemilab.net",
    "antigravitylab-premium": Path.home() / "repos" / "antigravitylab.net",
    "rorklab-premium": Path.home() / "repos" / "rorklab.net",
}


def today_jst():
    return datetime.now(JST).date()


def load_today_beats():
    beats = {}
    if not LEDGER.exists():
        return beats
    for line in LEDGER.read_text(encoding="utf-8").splitlines():
        try:
            rec = json.loads(line)
            job = rec["job"]
            # heartbeat.sh stamps the offset as +0900 (no colon). strptime's %z
            # accepts that form; datetime.fromisoformat rejects it before Python 3.11.
            ts = datetime.strptime(rec["ts"], "%Y-%m-%dT%H:%M:%S%z")
        except (json.JSONDecodeError, KeyError, TypeError, ValueError):
            continue  # skip malformed lines rather than abort the whole check
        if ts.astimezone(JST).date() == today_jst():
            beats[job] = rec  # keep the last stamp for a given job
    return beats


def ground_truth_added(repo: Path) -> int:
    """The watchdog counts for itself. It does not trust the ledger's self-count."""
    if not (repo / ".git").exists():
        return -1  # no clone == cannot verify. Treat that as an anomaly too.
    # Pin midnight JST explicitly: git resolves a date-only --since against the
    # current time of day, which would skip commits made earlier today.
    since = f"{today_jst().isoformat()} 00:00:00 +0900"
    out = subprocess.run(
        ["git", "-C", str(repo), "log", "--since", since,
         "--name-only", "--pretty=format:"],
        capture_output=True, text=True,
    ).stdout
    return sum(
        1 for ln in out.splitlines()
        if ln.startswith("content/articles/ja/") and ln.endswith(".mdx")
    )


def main():
    weekday = today_jst().weekday()
    expected = EXPECTED_BY_WEEKDAY.get(weekday, set())
    beats = load_today_beats()
    missing, hollow = [], []
    for job in sorted(expected):
        rec = beats.get(job)
        if rec is None:
            missing.append(job)  # no beat == complete silence
            continue
        repo = REPO_OF.get(job)
        # Jobs without a mapped repo fall back to the ledger's own count.
        truth = ground_truth_added(repo) if repo else int(rec.get("added", 0))
        if truth <= 0:
            hollow.append((job, rec.get("added", 0), truth))  # stamped, but not on the ground
    if not missing and not hollow:
        print(f"[OK] {today_jst()} confirmed output for all {len(expected)} expected jobs")
        return 0
    if missing:
        print(f"[ALERT] silent no-op (no beat): {', '.join(missing)}")
    for job, claimed, truth in hollow:
        print(f"[ALERT] hollow beat (self={claimed} / measured={truth}): {job}")
    return 1


if __name__ == "__main__":
    sys.exit(main())
```
The reason missing and hollow are kept apart is that they call for different responses. A job with no beat points to the scheduler side (a failed launch, a broken upstream dependency), while a hollow beat points to the generation logic (topic exhaustion, missing reference data). When the symptom narrows the cause, the next morning's investigation gets shorter.
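The watchdog script above raises a hollow alert only when the recount is zero or unverifiable. If you also want the "two numbers disagree" case surfaced on its own, one possible extension is sketched below. The classify function and the "mismatch" label are assumptions of this sketch, not part of the script. Since the heartbeat counts only the latest commit while the watchdog counts every commit of the day, a mismatch can be legitimate when a job commits more than once.

```python
def classify(rec, measured):
    """Classify one expected job from its ledger record and a recount.

    rec      -- the job's heartbeat record, or None if no beat arrived
    measured -- article count recounted from the repository (-1 = unverifiable)
    """
    if rec is None:
        return "missing"   # look at the scheduler side
    if measured <= 0:
        return "hollow"    # look at the generation logic
    if int(rec.get("added", 0)) != measured:
        return "mismatch"  # beat and recount disagree: doubt the ledger
    return "ok"


if __name__ == "__main__":
    print(classify(None, 3))          # missing
    print(classify({"added": 2}, 0))  # hollow
    print(classify({"added": 1}, 2))  # mismatch
    print(classify({"added": 2}, 2))  # ok
```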
Reducing false positives — how to define the expected set
This monitor lives or dies by the correctness of the expected set. If the expected set drifts, it keeps crying "no-op!" over a job that was simply on a rest day, and eventually no one reads the alerts. Keeping it from becoming the boy who cried wolf is what keeps the monitor alive.
I try to hold to three things.
Declare the expected set by weekday, and update it in the same commit as the scheduler change. Never fix just one side
Make intentional pauses explicit, as a declaration of the empty set, so they are distinguishable from an implicit absence
On special days like the start of a month, confirm the expected set by hand the day before. Do not let exceptions through quietly
The second point matters most. Once you tell the monitor "nothing is scheduled today," the machine can tell whether an absence is normal or abnormal. You stop reacting to silence itself and start picking up only "silence that contradicts the plan."
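One way to make that distinction mechanical is to refuse to guess when a day has no declaration at all. The sketch below uses a hypothetical schedule and hypothetical names (SCHEDULE, ScheduleGap, expected_for). It is stricter than a lookup such as .get(weekday, set()), which treats an undeclared day exactly like a declared rest day.

```python
class ScheduleGap(Exception):
    """Raised when a weekday has no declaration at all."""


# Hypothetical schedule: weekday (0=Mon .. 6=Sun) -> expected job IDs.
SCHEDULE = {
    0: {"site-a", "site-b"},
    1: {"site-a", "site-b"},
    2: {"site-a", "site-b"},
    3: {"site-a", "site-b"},
    4: {"site-a", "site-b"},
    5: set(),  # declared rest day: silence is the plan
    # 6 is deliberately left out to show the undeclared case
}


def expected_for(weekday, schedule=SCHEDULE):
    """Return the declared set; raise instead of guessing when none exists."""
    if weekday not in schedule:
        raise ScheduleGap(f"no expected set declared for weekday {weekday}")
    return schedule[weekday]
```

With this shape, expected_for(5) returns an empty set and the monitor stays quiet, while expected_for(6) fails loudly until someone writes the declaration.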
The numbers I saw in operation, and where I tripped
Over about three weeks, I caught two silent no-ops and one hollow beat, all by the next morning. Previously I could not notice these until I happened to spot the missing line in the update log myself. Time from detection to root cause settled around 15 minutes on average, helped by the symptom split.
There were stumbles. At first I stamped the ledger with a bare date, which came out in UTC as the previous day and dropped records out of the same-day reconciliation. It only stabilized once I pinned both the stamp and the watchdog to TZ=Asia/Tokyo. Time zones, I was reminded, are the first thing to pin down in any process whose subject is "when."
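The off-by-one-day effect is easy to reproduce. In this small sketch the date is made up and a fixed +09:00 offset stands in for Asia/Tokyo. A job finishing in the small hours in Japan still belongs to the previous calendar day in UTC.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # fixed offset; Japan observes no DST

# A job that finishes at 03:00 JST on 2 January (illustrative date).
finished = datetime(2025, 1, 2, 3, 0, tzinfo=JST)

utc_day = finished.astimezone(timezone.utc).date()
jst_day = finished.astimezone(JST).date()

print(utc_day)  # 2025-01-01
print(jst_day)  # 2025-01-02
```

If the stamp uses one of these days and the reconciliation uses the other, every overnight record falls outside "today."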
One more. If the watchdog itself dies, all monitoring goes silent. This is the eternal nesting problem of monitoring, and it cannot be fully removed. I drew a line: the watchdog alone carries an outward heartbeat (one line to a separate file when it manages to run), and I folded a check into my morning routine so that if it goes quiet for two days I notice by hand. The last rung is left to a person.
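A sketch of that outward heartbeat and the two-day check, under assumed names: stamp_watchdog and watchdog_is_stale are hypothetical, and the file location is up to you. The staleness check is meant to run somewhere other than the watchdog itself, for example by hand as part of a morning routine.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path

JST = timezone(timedelta(hours=9))  # fixed offset; Japan observes no DST


def stamp_watchdog(path, now=None):
    """Append one timestamp line each time the watchdog manages to run."""
    now = now or datetime.now(JST)
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(now.isoformat(timespec="seconds") + "\n")


def watchdog_is_stale(path, now=None, max_age=timedelta(days=2)):
    """Return True if the last stamp is missing, unreadable, or older than max_age."""
    now = now or datetime.now(JST)
    path = Path(path)
    if not path.exists():
        return True
    lines = path.read_text(encoding="utf-8").split()
    if not lines:
        return True
    try:
        last = datetime.fromisoformat(lines[-1])
    except ValueError:
        return True
    if last.tzinfo is None:
        return True  # a stamp without an offset cannot be compared safely
    return now - last > max_age
```

Every unreadable state counts as stale, so a corrupted or deleted stamp file prompts a look instead of passing as healthy.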
If you too run several jobs overnight, start by adding this one heartbeat line to the single job whose silence would hurt most. The expected set can grow later. I expanded mine from that first job, little by little. I hope it gives you a starting point if you are wrestling with the same quiet kind of failure.
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.