Logged as success, but it produced nothing — stopping silent failures in Cowork scheduled tasks with end-of-run assertions
A Cowork scheduled task exits 0, yet not a single artifact was produced. Trusting the exit code alone hides this silent failure. Here is how to turn your definition of done into end-of-run assertions that fail loudly with an evidence log.
A task you believe is running on schedule keeps logging "SUCCESS", but you open the output folder and not a single file has appeared since last week. I am an indie developer running several sites under Dolice, and if you run a handful of unattended Cowork scheduled tasks the way I do, you will hit this exact accident at least once. I did, one morning when I sat down to review a batch of logs and realized that a recurring job marked "success" for three days straight had not actually written a single line. My stomach dropped.
The cruel part is that nothing crashed. No exception was raised. The exit code was 0. The scheduler history was green. And yet the deliverable count was zero. This article is about catching that silent success: not by trusting the exit code, but by turning the definition of done itself into assertions that fail loudly.
An exit code does not promise that work happened
We unconsciously read "exit 0" as "it worked." But all an exit code guarantees is that the last command that ran returned 0 — not that the job you actually wanted got done. Those are two completely different propositions.
Silent failure tends to arrive by one of three routes.
| Route | What happens | Why it stays exit 0 |
| --- | --- | --- |
| Wrong write target | You think you generated a file, but it landed in a stale temp path or the wrong directory | cat > file itself succeeds. The contents just aren't where you meant |
| Empty input | You mistype the path of an input file, and processing proceeds on an empty string | cat wrong-path complains on stderr, but in a pipeline without pipefail its status is discarded and its stdout is simply empty. Downstream "succeeds" on nothing |
| Commit that never landed | An unset git identity means no commit is created, and the push goes green with "up to date" | Unless the script checks the commit's status, execution moves on. There's no diff to push, so the push itself counts as a success |

What they share is that the job as a whole honestly ends on 0. In the first route not one command is lying; in the other two, the command that did complain was not the last to run, so its status never reached the scheduler. And yet the whole thing failed. That is precisely why staring at the final exit code from above will never reveal it.
The first time this bit me was a freshly cloned repo where I had forgotten to set the git identity. git commit complained about the missing identity and created nothing; the job carried on, and git push came back with "Everything up-to-date." The scheduler history stayed a clean green for three days while the remote gained not one line.
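The empty-input route is easy to reproduce. Here is a minimal sketch in Python, assuming a POSIX sh is on the PATH; the file names are made up for the demonstration. The input path is deliberately wrong, yet the pipeline reports 0 because its status is that of its last command.

```python
import subprocess
import tempfile
from pathlib import Path


def run_pipeline(workdir: Path) -> tuple[int, int]:
    """Run a job whose input path is mistyped; return (exit code, output bytes)."""
    # no-such-input.txt does not exist: cat complains on stderr, but the
    # pipeline's status is that of its last command (tr), which succeeds.
    cmd = "cat no-such-input.txt | tr a-z A-Z > out.txt"
    result = subprocess.run(["sh", "-c", cmd], cwd=workdir, capture_output=True)
    return result.returncode, (workdir / "out.txt").stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    code, size = run_pipeline(Path(tmp))
    print(f"exit code = {code}, output size = {size} bytes")
    # prints: exit code = 0, output size = 0 bytes
```

The scheduler only ever sees the first number. The second one is the fact that matters.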
Write your definition of done out loud
The first step toward killing silent failure is not writing code. It is stating, as concrete observable facts, what it means for this job to have succeeded. If that stays vague, you have no way to decide what to assert.
For a recurring job that "reflects generated output into a repo," the definition of done decomposes like this:
- The output file actually exists and its size is at least a floor
- The expected count matches the real file count (for a JA/EN pair, both sides have the same number of files)
- The local commit SHA has changed from its value before the run
- The remote SHA and the local SHA match
The point is that each of these is a fact you can check from the outside, not a "should be done." Not "I committed" but "the SHA changed." Not "I wrote the file" but "there is a file of at least the floor size at that path." Once you can make that translation, the assertions write themselves.
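One way to make that translation concrete is to write each fact as a small function that returns both a verdict and the value it observed. The sketch below is an illustration in Python; the names Check, file_at_least, counts_match, sha_changed and evaluate are hypothetical, and the SHAs and counts in the usage are placeholder values.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Check:
    name: str
    observe: Callable[[], tuple[bool, str]]  # returns (ok, observed value)


def file_at_least(path: Path, floor: int) -> tuple[bool, str]:
    """'I wrote the file' becomes 'a file of at least `floor` bytes is at `path`'."""
    if not path.is_file():
        return False, f"missing: {path}"
    size = path.stat().st_size
    return size >= floor, f"{path.name} = {size}B (floor {floor}B)"


def counts_match(left: int, right: int) -> tuple[bool, str]:
    return left == right, f"left={left} right={right}"


def sha_changed(before: str, after: str) -> tuple[bool, str]:
    """'I committed' becomes 'the SHA is different from before the run'."""
    return before != after, f"{before} -> {after}"


def evaluate(checks: list[Check]) -> list[tuple[str, bool, str]]:
    return [(c.name, *c.observe()) for c in checks]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "example.mdx"
    out.write_text("x" * 50, encoding="utf-8")  # deliberately below the floor
    checks = [
        Check("file_min_size", lambda: file_at_least(out, 800)),
        Check("count_match", lambda: counts_match(12, 12)),
        Check("sha_changed", lambda: sha_changed("abc1234", "abc1234")),
    ]
    for name, ok, observed in evaluate(checks):
        print("OK  " if ok else "FAIL", name, "|", observed)
```

Returning the observed value alongside the verdict is the design choice that matters here: a bare True or False tells you that something broke, but not what the job actually saw.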
Once the definition of done is settled, place a single gate at the very end of the job that checks it mechanically. If even one condition is broken, refuse to write the success log and exit non-zero. That alone drags silent failure into the open.
```bash
#!/usr/bin/env bash
# verify_done.sh: verify a job's definition of done before it exits
# Usage: source this file in the job, line up assert_* calls at the end, close with finish_run
set -uo pipefail

EVIDENCE_LOG="${EVIDENCE_LOG:-/tmp/run_evidence_$(date +%s).log}"
FAILED=0

# Record failures instead of swallowing them. $1=condition, $2=observed value
fail() {
  FAILED=1
  printf '[FAIL] %s | %s\n' "$1" "$2" | tee -a "$EVIDENCE_LOG" >&2
}

pass() {
  printf '[ OK ] %s | %s\n' "$1" "$2" | tee -a "$EVIDENCE_LOG"
}

# Condition 1: file exists and meets a byte floor
assert_file_min() {
  local path="$1" min="${2:-200}"
  if [ ! -f "$path" ]; then
    fail "file_exists" "missing: $path"; return
  fi
  local size; size=$(wc -c < "$path")
  if [ "$size" -lt "$min" ]; then
    fail "file_min_size" "$path = ${size}B (< ${min}B)"; return
  fi
  pass "file_min_size" "$path = ${size}B"
}

# Condition 2: two counts match (e.g. a JA/EN pair)
assert_count_match() {
  local label="$1" a="$2" b="$3"
  if [ "$a" != "$b" ]; then
    fail "count_match:$label" "left=$a right=$b"; return
  fi
  pass "count_match:$label" "$a == $b"
}

# Condition 3: commit SHA advanced from its prior value
assert_sha_changed() {
  local before="$1" after="$2"
  if [ "$before" = "$after" ]; then
    fail "sha_changed" "commit did not advance: $after"; return
  fi
  pass "sha_changed" "$before -> $after"
}

# Close: if any FAIL occurred, exit non-zero
finish_run() {
  if [ "$FAILED" -ne 0 ]; then
    printf '\n🛑 Definition of done not met. Refusing to log SUCCESS.\n' >&2
    printf '   Evidence log: %s\n' "$EVIDENCE_LOG" >&2
    exit 1
  fi
  printf '\n✅ All conditions met.\n'
}
```
The caller looks like this. Capturing SHA_BEFORE before the job commits anything is essential: without it you cannot decide afterward whether anything changed.
```bash
source verify_done.sh

OUT="content/articles/en/cowork/example.mdx"
SHA_BEFORE=$(git rev-parse HEAD)

# ... generate, commit, push here ...

SHA_AFTER=$(git rev-parse HEAD)
REMOTE_SHA=$(git rev-parse '@{u}')

assert_file_min "$OUT" 800
assert_count_match "ja_en" \
  "$(find content/articles/ja -name '*.mdx' | wc -l)" \
  "$(find content/articles/en -name '*.mdx' | wc -l)"
assert_sha_changed "$SHA_BEFORE" "$SHA_AFTER"
assert_count_match "local_remote_sha" "$SHA_AFTER" "$REMOTE_SHA"

finish_run   # success or failure is only decided here
```
What I like about this shape is that finish_run holds the right to write the success log. If anything upstream broke, execution simply never reaches the logging step. The worst path — the log marching forward on a job that only thinks it finished — is physically sealed off.
Why "don't swallow it" is the whole point
You might think set -e (exit on error) would handle this. But most silent failures never become errors, so set -e won't catch them. A cat of an empty file, a write to the wrong path, a diff-less push: all of them exit 0.
set -e protects you only when a command explicitly fails; it sails straight past "the command succeeded but produced nothing." So the direction of the fix cannot be "propagate the error." It has to be "actively confirm that the work was produced." It's a shift in where the burden of proof sits: you make the job prove it succeeded.
This echoes the verification-before-completion discipline in Anthropic's internal skills: before you declare you're done, produce evidence that you're done. The reason we always keep an evidence log is so a human can later trace why it went red. Silent failures are hard to reproduce, so capturing the observed values at the moment of failure is the most valuable asset you can leave yourself.
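If the evidence needs to be read by tooling as well as by a human, one option is to record each checked condition as a JSON line. This is a sketch under assumptions, not the format the harness above writes: the field names ts, condition, ok and observed, and the helper names, are hypothetical.

```python
import json
import tempfile
import time
from pathlib import Path


def record_evidence(log: Path, condition: str, ok: bool, observed: str) -> None:
    """Append one JSON line per checked condition, pass or fail."""
    entry = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S%z"),
        "condition": condition,
        "ok": ok,
        "observed": observed,  # the value seen at the moment of the check
    }
    with log.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def failures(log: Path) -> list[dict]:
    """Return every recorded entry whose condition did not hold."""
    with log.open(encoding="utf-8") as f:
        return [entry for entry in map(json.loads, f) if not entry["ok"]]


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "run_evidence.jsonl"
    record_evidence(log, "file_min_size", True, "example.mdx = 1204B")
    record_evidence(log, "sha_changed", False, "commit did not advance: abc1234")
    for entry in failures(log):
        print(entry["condition"], "|", entry["observed"])
    # prints: sha_changed | commit did not advance: abc1234
```

Recording the passes as well as the failures keeps the trail complete: when a run goes red, you can see which conditions still held at that moment.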
Separate empty, partial, and double production
If you take silent failure seriously, two neighboring states also need attention. End-of-run assertions catch "zero output," but operations need a finer distinction.
| State | Symptom | Remedy |
| --- | --- | --- |
| Empty success | exit 0 but no artifact | End-of-run assertions (this article) |
| Partial success | only part generated, then ran out of steam | Count-match assertion + cleanup of half-built artifacts |
| Double production | a retry creates the same artifact twice | An idempotency key that checks "did I already do this?" first |
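For the partial-success row, one common way to keep half-built artifacts away from the final path is to write to a temporary file in the same directory and rename it into place only when the write has finished. The sketch below assumes that approach; write_atomically, remove_partials and the .partial- prefix are hypothetical names, not part of the harness in this article.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(target: Path, content: str) -> None:
    """Write to a temp file next to the target, then rename it into place."""
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=".partial-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_name, target)  # a single rename on the same filesystem
    except BaseException:
        # The write died midway: remove the half-built file and re-raise
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def remove_partials(directory: Path) -> int:
    """Delete leftovers from runs that were killed outright; return how many."""
    removed = 0
    for leftover in directory.glob(".partial-*"):
        leftover.unlink()
        removed += 1
    return removed


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "articles" / "example.mdx"
    write_atomically(target, "# Example\n")
    print(target.read_text(encoding="utf-8"), end="")
    print("leftovers removed:", remove_partials(target.parent))
```

With this shape, a file at the final path is either complete or absent, so the count-match assertion never has to guess whether a file it counted is finished.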
Double production is the trap that pairs badly with retries. Once you design things to go red on a failed assertion and re-run, you create duplicates in the case where "last time was actually half a success." To avoid that, put an idempotency check at the very start of the job.
```python
import hashlib
import os


def idempotency_key(*parts: str) -> str:
    """Build a stable key from the input combination. Same input, same key."""
    raw = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


def already_done(key: str, ledger: str = ".run_ledger") -> bool:
    """True if this input was completed before. One key per line in the ledger."""
    if not os.path.exists(ledger):
        return False
    with open(ledger, encoding="utf-8") as f:
        return any(line.strip() == key for line in f)


def mark_done(key: str, ledger: str = ".run_ledger") -> None:
    with open(ledger, "a", encoding="utf-8") as f:
        f.write(key + "\n")


# Usage: derive the key from inputs that uniquely identify the target
key = idempotency_key("cowork", "2026-06-27", "example-topic")
if already_done(key):
    print(f"⏭ Already completed: {key}, skipping")
else:
    # ... generation ...
    mark_done(key)  # write to the ledger only AFTER the conditions are met
    print(f"✅ Recorded completion: {key}")
```
Where you call mark_done matters. Write it not mid-generation but after the end-of-run assertions pass. That way a run that went red never lands in the ledger, so the next retry correctly redoes it. Write to the ledger at the start of generation instead, and a job that dies midway gets falsely recorded as "done" and is skipped forever — a brand new silent failure of its own making.
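The ordering can be sketched as a single function: check the ledger, generate, verify, and only then record. This is an illustration under assumptions; run_once and its generate and verify callables are hypothetical stand-ins for the real job and for the end-of-run assertions.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_once(key: str, ledger: Path,
             generate: Callable[[], object],
             verify: Callable[[], bool]) -> str:
    """Skip if the key is in the ledger; record it only after verify() passes."""
    done = ledger.read_text(encoding="utf-8").split() if ledger.exists() else []
    if key in done:
        return "skipped"
    generate()
    if not verify():
        return "failed"  # nothing reaches the ledger, so a retry redoes the work
    with ledger.open("a", encoding="utf-8") as f:
        f.write(key + "\n")
    return "done"


with tempfile.TemporaryDirectory() as tmp:
    ledger = Path(tmp) / ".run_ledger"
    artifact = Path(tmp) / "out.txt"

    def verify() -> bool:
        return artifact.is_file() and artifact.stat().st_size > 0

    # Run 1: generation silently produces nothing, so the run goes red
    print(run_once("k1", ledger, lambda: None, verify))                        # failed
    # Run 2: the retry is not skipped, because run 1 never reached the ledger
    print(run_once("k1", ledger, lambda: artifact.write_text("ok"), verify))   # done
    # Run 3: a further retry is skipped instead of producing a duplicate
    print(run_once("k1", ledger, lambda: artifact.write_text("dup"), verify))  # skipped
```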
Don't trust the log itself too much
One last small operational habit that punches above its weight: always attach observed values like counts and SHAs to the success log. A log that says only "SUCCESS" gives a human no way to spot a silent failure after the fact.
When the success log carries those values, even if some unknown route slips past the assertions, a human can notice the anomaly in the numbers. If Files: ja=683 en=683 shows the same figure for days on end, that itself is a sign of silent failure. A machine gate plus a human glance, two layers, feels very different in practice.
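A minimal sketch of both halves, assuming a hypothetical log format built around the Files: ja=... en=... fragment: one function formats the success line with its observed values, and another flags a run of identical counts. Whether a flat count is really a failure depends on the job, so treat the flag as a prompt to look rather than a verdict.

```python
def success_line(ja: int, en: int, sha: str) -> str:
    """Format a success entry that carries the observed values with it."""
    return f"SUCCESS Files: ja={ja} en={en} HEAD={sha}"


def looks_stalled(counts: list[int], runs: int = 3) -> bool:
    """True if the last `runs` recorded counts are all identical."""
    recent = counts[-runs:]
    return len(recent) == runs and len(set(recent)) == 1


print(success_line(683, 683, "abc1234"))
# prints: SUCCESS Files: ja=683 en=683 HEAD=abc1234

print(looks_stalled([680, 683, 683, 683]))  # prints: True
print(looks_stalled([681, 682, 683]))       # prints: False
```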
After switching to this design, the meaning of my morning habit of skimming the logs changed entirely. I used to glance and think "green, so we're fine." Now I look at whether the numbers are moving. Being green and the work actually progressing are, I have learned in my bones, two separate things.
Pick one of your own scheduled jobs and write out three observable facts that define "this has succeeded." Turning them into assertions can wait — that part comes easily afterward.