◈ Cowork/2026-06-29Advanced

Failing Loud on Stale Inputs: A Freshness Contract for Unattended Pipelines

How to stop a scheduled, unattended pipeline from silently shipping degraded work when its upstream data is empty or stale. We implement a freshness contract in bash that asserts recency, non-emptiness, and provenance, plus two real pitfalls I hit running Cowork scheduled tasks.

Cowork²⁹ Scheduled Tasks⁷ Automation³² Pipelines Reliability⁴

✦ Premium Article

I opened the scheduled-run log one morning and the output count had gone up by one, exactly as planned. Then I read what it had produced, and felt a small chill.

The shape was right, but the specific, hard-won details were missing. Tracing it back, the data file the job read had still been empty at that moment. An upstream refresh had run a few minutes late, and the downstream step happily concluded it had "successfully read an empty file" — then quietly shipped a thin result.

When you run things unattended, the scariest outcome is not a crash. It is a run that does not stop and finishes wrong anyway. Crashes get noticed. Silent degradation hides until a human re-reads the output days later.

I am an indie developer, and alongside my apps I run automated content updates across several sites using Cowork's scheduled tasks. After hitting this class of bug more than once, I gave up on patching symptoms. What I actually needed was to treat the input itself as something to verify on every run. This article turns that into what I call an input freshness contract.

Silent degradation is almost always an input problem

When output degrades quietly, the generation logic is rarely the broken part. Far more often the input reaching the generator is wrong, and nothing checks it. I kept seeing the same three shapes.

The first is an empty file. Misread a path with cat, or read a path that does not exist, and the shell returns an empty string and marches on. The generator dutifully treats "no reference data" as a valid state and produces something generic and inoffensive.

The second is a stale file. When an upstream refresh fails or lags, last run's file is still sitting there. The downstream step reads it as "today's data" and builds today's output from yesterday's context. The file exists and has content. What it lacks is recency.

The third is leftover residue. If you write to a fixed-name temp file, a failed write this run leaves the previous contents in place, and they get mixed in silently. This is nastier than empty or stale, because it looks plausible and so survives longer.

What they share is that the process appears to succeed. Unless you place an explicit gate that distrusts the input, the degradation slips straight through.

Treat the input as a contract: recency, non-emptiness, provenance

So I started treating upstream inputs not as an implicit assumption but as an explicit contract. I call it a contract because I want enforcement: if it is not satisfied, the downstream step does not get to run. I narrowed the conditions to three.

Recency: was the file updated within an acceptable window? For reference data regenerated daily, say within 24 hours. Past that threshold, it is "stale."

Non-emptiness: is there a meaningful minimum of content? Not merely "non-zero bytes," but a sensible minimum byte count per input, so a few dozen bytes of headers are not mistaken for real content.

Provenance: did the input come from the right source? For a cloned repository, does local HEAD match the remote tip? For a temp file, was it written by this run? This is the condition that stops you mistaking an old location for the latest one.

Only inputs that satisfy all three count as trustworthy; if even one is missing, the run halts. The key is to decide this at a mechanical gate, not to leave the judgment to the generator.

✦

Thank you for reading this far.

Continue Reading

What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.

WHAT YOU'LL LEARN

✦The three failure modes — empty, stale, and leftover input — that make automated pipelines ship degraded output silently, and how to tell them apart

✦A roughly 60-line bash freshness gate that asserts recency, non-emptiness, and provenance before any generation runs

✦Design decisions shaped by two pitfalls I hit on Cowork scheduled runs: UTC date overwrites and a nobody-owned stale clone

Secure payment via Stripe · Cancel anytime

✦

Unlock This Article

Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.

Unlock all articles with Membership →

Writing the freshness gate

A contract only bites once it is code. What I use is a small bash gate that reads a manifest of inputs and checks each file's recency and non-emptiness. I did not want new dependencies, so it leans only on standard commands.

The manifest is just the inputs I want to verify, tab-separated.

# path<TAB>max_age_min<TAB>min_bytes
_documents/_reference_data/claudelab/reference_data.md	1440	500
_documents/_reference_data/claudelab/reference_keywords.txt	1440	200
_documents/_reference_data/_shared/tech_trends_2026.md	2880	400

Each line is one contract: updated within 1440 minutes (24 hours), at least 500 bytes. Fail that, and the input is not allowed through.

The gate walks the manifest top to bottom.

#!/usr/bin/env bash
# input_freshness_gate.sh — verify upstream inputs are fresh and non-empty; abort on breach
# usage: input_freshness_gate.sh <manifest.tsv>
set -euo pipefail
 
manifest="${1:?manifest path required}"
now=$(date +%s)
fail=0
 
while IFS=$'\t' read -r path max_age_min min_bytes; do
  [ -z "${path:-}" ] && continue
  case "$path" in \#*) continue ;; esac          # skip comment lines
 
  if [ ! -f "$path" ]; then
    echo "❌ missing: $path"; fail=1; continue    # missing == breeding ground for empty reads
  fi
 
  bytes=$(wc -c < "$path" | tr -d ' ')
  if [ "$bytes" -lt "$min_bytes" ]; then
    echo "❌ too small: $path (${bytes}B < ${min_bytes}B)"; fail=1; continue
  fi
 
  mtime=$(stat -c %Y "$path" 2>/dev/null || date -r "$path" +%s)
  age_min=$(( (now - mtime) / 60 ))
  if [ "$age_min" -gt "$max_age_min" ]; then
    echo "❌ stale: $path (age ${age_min}min > ${max_age_min}min)"; fail=1; continue
  fi
 
  echo "✅ ok: $path (${bytes}B / ${age_min}min)"
done < "$manifest"
 
[ "$fail" -eq 0 ] || { echo "🛑 freshness gate failed — abort"; exit 1; }
echo "✅ all inputs fresh"

Under sixty lines, but it earned its keep. The crucial parts are set -euo pipefail, so unexpected failures are not swallowed, and exit 1, so the downstream step never runs. I call this gate before invoking generation, and abort the whole run on a non-zero exit.

One easy trap: how you read mtime differs by OS. Linux containers want stat -c %Y; macOS wants date -r. Chaining both with || keeps it from being quietly wrong on either. I originally wrote it Linux-only, and on my Mac stat returned a different value and threw off the recency check.

Provenance is confirmed by matching HEAD

Recency and non-emptiness are measurable on a single file, but provenance needs one more step. If a cloned repository is your input, you have to confirm "is this clone actually current?" or you will mistake an old state for the latest.

I check this by comparing local HEAD to the remote tip.

git fetch --depth 1 origin main -q
local_sha=$(git rev-parse HEAD)
remote_sha=$(git rev-parse origin/main)
if [ "$local_sha" != "$remote_sha" ]; then
  echo "❌ stale clone: local ${local_sha:0:7} != remote ${remote_sha:0:7}"
  exit 1
fi

Provenance checks also pay off after the write. Once the generator produces output, assert that the output actually contains the tokens it should — proper nouns, the target slug — and treat a miss as failure. Turning "it should have been written" into "I confirmed it was written" removes a surprising number of silent drops.

Halt on breach, and make the halt visible

A contract that only halts is half a contract. Without a path to notice the halt, unattended runs turn it into a different degradation: "quietly produced nothing."

I use two layers. On a gate breach, I leave a log file whose name contains _FAILED, with the missing detail recorded. Separately, a next-morning check detects that zero outputs exist for the day. The first answers "why did it stop," the second catches "that it stopped at all." They play different roles.

I would strongly recommend designing the fallback, too. Ending at zero outputs just because an input was stale also halts whatever revenue or traffic path depended on it. In my case, when fresh generation is impossible I switch to improving existing output instead. What you want to stop is "degraded new generation," not "the entire day's activity."

The cost worry turned out to be mostly unfounded. All gates together add a few seconds per run, under 5% of total even counting the repository fetch. If those few seconds stop one output that would have made me wince on re-reading, I add them without hesitation.

Two pitfalls I actually hit

Design talk lands better with scars, so here are two of mine. Both came from not questioning the provenance of an input.

The first was a UTC date overwrite. I was saving scheduled-run logs under date-stamped filenames, but a container's date defaults to UTC. For a task that runs in the early morning Japan time, UTC is still the previous day. So the job thought yesterday's log was "today's," overwrote it, and the record was lost. Time is itself an input, and unless you pin its provenance — which timezone's "today" — it breaks quietly. The fix is trivial: state it, as in TZ=Asia/Tokyo date.

The second was a nobody-owned stale clone. A working clone I had persisted for speed was owned by another process and not writable, and it had frozen at a state from days earlier. Keep reading it unaware, and you mistake deleted, outdated information for current, and start re-fixing something you already fixed once. This very pitfall recurred today, while I was preparing the environment to write this article. That is exactly why I now include the clone itself in the input contract — verifying writability and HEAD match before using it.

The shared lesson is the mundane fact that a file or repo "being there" and "being the correct, current thing" are not the same. Unattended, you have to make the machine confirm that mundane fact every time.

Start from the one line worth defending

You do not have to build all of this at once. The first line worth defending is a handful of lines, right before you invoke generation, that confirm only one thing: the input is not empty. Even that prevents a large share of silent degradation.

From there, add the recency threshold, the HEAD-match for provenance, and the post-write assert in the order the pain teaches you. Mine was not complete from day one either; I added one gate each time I caused one incident.

Whether you can trust an unattended pipeline comes down, I think, to how thoroughly you are willing to distrust its inputs. Next time you write a scheduled task, before the generation logic, add the one line that confirms the input is genuinely new, non-empty, and from the right source. You should get one fewer cold-sweat morning.

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.