
Claude Code / 2026-06-14 / Advanced

Running Claude Code Hooks as a Quality Gate Without Breaking Your Pipeline

An implementation note on running Claude Code Hooks as a safety valve for automation: when to block with exit code 2 versus JSON output, how to keep formatters from looping or over-blocking, and how to log every hook firing so misfires are traceable.


I run several sites unattended as an indie developer, and the first time my Claude Code setup ground to a halt, the culprit was a hook I had written myself. I had put a formatter on PostToolUse, meaning to "tidy up every file after it's written." The formatter rewrote the file, the rewrite was treated as more work, and that triggered the formatter again: a back-and-forth that came close to never settling.

Hooks don't ask the model to do something; they guarantee that something will happen. That power is exactly why a poorly designed hook can take down your whole automation. What follows is a set of hard-won notes — grounded in the documented behavior but shaped by actually running an automated publishing pipeline — on how to wield that power without breaking things.

Start with the contract: exit codes and stdout

Before writing any clever script, nail down the contract: what does a hook return to Claude Code? Get this wrong and you'll have gates that wave through what you meant to block, or block what you meant to allow.

A hook reports its result in two ways: the exit code, and JSON written to stdout.

The exit codes break down into three cases:

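exit 0: success. How stdout is treated depends on the hook type (more below).
exit 2: a blocking error. stderr is fed back to Claude. What gets stopped depends on the event: PreToolUse cancels the tool call, Stop keeps Claude from stopping, and PostToolUse can only report back because the tool has already run.
anything else (e.g. exit 1): a non-blocking error. The user sees a warning, but execution continues.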

For automation, exit 2 is the one that matters most. Every gate — "stop a dangerous command," "reject an edit that violates a rule" — is built from exit 2 plus a message on stderr. The corollary: if you mean to block but return exit 1, you get a warning and the operation still goes through, so it isn't a gate at all.

#!/usr/bin/env bash
# block-force-push.sh: stop a dangerous push from PreToolUse(Bash)
input=$(cat)                      # hooks receive JSON on stdin
cmd=$(printf '%s' "$input" | jq -r '.tool_input.command // empty')

# Match --force or -f only as a whole argument, so --force-with-lease
# (which the message below recommends) is not blocked by accident.
if printf '%s\n' "$cmd" | grep -qE 'git +push( .*)? (--force|-f)( |$)'; then
  # write the reason to stderr and exit 2 -> fed back to Claude, operation stops
  echo "Force pushes are disabled on this repo. Consider --force-with-lease." >&2
  exit 2
fi
exit 0

Three things matter here. Hook input arrives on stdin as JSON, not as arguments; the block reason goes to stderr, not stdout; and you must read the right field (tool_input.command). My first version wrote the reason to stdout, which produced a confusing state: the block happened, but no feedback ever reached Claude.
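Before wiring a hook into settings, it can help to check this contract by hand: pipe a synthetic payload into the script and look at the exit code. The sketch below is a hypothetical helper (run_hook is my own name, not part of Claude Code), and the payload you pass is assumed to carry only the fields the hook reads.

```shell
#!/usr/bin/env bash
# run_hook <script> <json-payload>
# Hypothetical helper: pipes the payload to a hook on stdin, the same way the
# hook receives its input, and prints how the exit code is interpreted under
# the three-case contract described above.
run_hook() {
  local script=$1 payload=$2 code
  printf '%s' "$payload" | bash "$script" >/dev/null 2>&1
  code=$?
  case "$code" in
    0) echo "success (exit 0)" ;;
    2) echo "block (exit 2)" ;;
    *) echo "non-blocking error (exit $code)" ;;
  esac
}
```

With jq installed, run_hook .claude/hooks/block-force-push.sh '{"tool_input":{"command":"git push --force origin main"}}' should report a block, and the same call with a plain git push should report success. Note that "success" only describes the exit code; a hook can still deny through JSON on stdout.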

Use JSON output to make "stop or continue" explicit

Exit-code control is simple and robust, but limited in expressiveness. When you want finer control — "block, but for this reason" or "continue, but inject extra context" — emitting JSON on stdout is the better fit.

For PreToolUse, returning JSON like this lets you express allow/deny without relying on the exit code:

#!/usr/bin/env bash
# guard-writes.sh: deny writes to protected paths from PreToolUse(Write|Edit)
input=$(cat)
path=$(printf '%s' "$input" | jq -r '.tool_input.file_path // empty')

case "$path" in
  *.env|*/secrets/*|*/.git/*)
    # build the JSON with jq so a path containing quotes or backslashes
    # still produces valid JSON (a raw heredoc would not escape it)
    jq -n --arg path "$path" '{
      hookSpecificOutput: {
        hookEventName: "PreToolUse",
        permissionDecision: "deny",
        permissionDecisionReason: "Protected path (\($path)). Make this change by hand if it is intentional."
      }
    }'
    exit 0
    ;;
esac
exit 0

The advantage of the JSON approach is that you express "deny" while keeping the exit code at 0. That separates a hook error (a script bug exiting non-zero) from a business-logic denial. In practice this distinction pays off later: your logs can tell whether the hook crashed or the rule rejected something.
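To make that distinction concrete, here is a minimal sketch of how a log line could be labeled from a hook's exit code and stdout. The function name and the four labels are my own assumptions for illustration; the field names it looks for are the ones used in the hooks in this article.

```shell
#!/usr/bin/env bash
# classify_outcome <exit-code> <stdout>
# Hypothetical helper: separates "the script failed" from "the rule said no".
classify_outcome() {
  local code=$1 out=$2
  if [ "$code" = "2" ]; then
    echo "blocked-by-exit-code"
  elif [ "$code" != "0" ]; then
    echo "hook-error"            # script bug, missing tool, and so on
  elif printf '%s' "$out" | grep -qE \
      '"permissionDecision"[[:space:]]*:[[:space:]]*"deny"|"decision"[[:space:]]*:[[:space:]]*"block"'; then
    echo "denied-by-rule"        # exit 0, but the JSON carries a deny/block
  else
    echo "allowed"
  fi
}
```

The grep is a deliberately loose text match rather than a JSON parse, so it keeps working even when the hook's output is malformed; swap in jq if you prefer strictness.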

Stop hooks expose a decision field. Returning "decision": "block" overrides Claude's attempt to stop and keeps it working per the reason you provide. A "don't stop until the tests are green" gate looks like this:

#!/usr/bin/env bash
# require-green-tests.sh — force continuation from a Stop hook if tests fail
input=$(cat)
# loop guard: if this hook already blocked once, don't block again
if [ "$(echo "$input" | jq -r '.stop_hook_active // false')" = "true" ]; then
  exit 0
fi
 
if ! npm test --silent >/tmp/test.log 2>&1; then
  cat <<JSON
{"decision": "block", "reason": "Tests are failing. Check /tmp/test.log, fix them, then finish."}
JSON
  exit 0
fi
exit 0

The single most important line is the stop_hook_active check. Without it, the Stop hook blocks -> works -> stops again -> blocks again, an endless round trip. As I'll get to below, most unattended-run incidents come from leaving this loop entrance open.

The traps of putting a formatter on Hooks

Back to the opening story. Running a formatter on PostToolUse is convenient, but it has two traps: a re-edit loop, and hook slowness dragging down the whole session.

On re-edits: PostToolUse fires after a tool completes, so a formatter rewriting the file there doesn't recurse as a fresh Edit call. What actually bites is a formatter failure that surfaces as a blocking result: the hook exits 2 (for example, by passing the formatter's own exit status straight through), stderr is fed back to Claude as "formatting failed," and Claude retries the same fix over and over. Any other non-zero exit is only a warning under the contract above, but it still adds noise. The fix is simple: treat formatting as "must never stop the session" and always exit 0.

#!/usr/bin/env bash
# auto-format.sh — PostToolUse(Write|Edit|MultiEdit)
input=$(cat)
file=$(echo "$input" | jq -r '.tool_input.file_path // empty')
[ -z "$file" ] && exit 0
[ -f "$file" ] || exit 0
 
case "${file##*.}" in
  ts|tsx|js|jsx) npx prettier --write "$file" >/dev/null 2>&1 ;;
  py)            ruff format "$file" >/dev/null 2>&1 ;;
  rs)            rustfmt "$file" >/dev/null 2>&1 ;;
esac
# always succeed regardless of formatting result; a formatter failure must not stop the session
exit 0

Pairing >/dev/null 2>&1 with the trailing exit 0 is deliberate. Even if a formatter isn't installed, or a file has a syntax error that can't be formatted, the hook won't drag the session down with it. I keep formatting as a "nice to have" and leave the actual quality enforcement to a separate lint gate — a clear division of labor.

The second trap, slowness, also shows up in real runs. Hooks run synchronously on every matching tool call, so a 2–3 second format on every file makes the whole session feel noticeably slower. I scope formatting to the file that was just edited and push heavier type checks to the Stop hook (run once when Claude finishes responding) rather than PostToolUse, which keeps the edit loop light without giving up the final quality check.
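As a rough sketch of moving a heavy check to Stop, the functions below run a type check only when a tracked TypeScript file differs from HEAD. Everything project-specific here is an assumption: the script name, the use of npx tsc --noEmit, the log path, and the choice of git diff to detect changes (which ignores untracked files).

```shell
#!/usr/bin/env bash
# typecheck-on-stop.sh: hypothetical Stop hook body; names are assumptions.

# Succeeds if any file name read from stdin ends in .ts or .tsx.
has_ts_changes() {
  grep -qE '\.(ts|tsx)$'
}

# Reads the Stop payload on stdin and prints a block decision if tsc fails.
typecheck_gate() {
  local input
  input=$(cat)
  # Loop guard: if a Stop hook already forced continuation, let Claude stop.
  if [ "$(printf '%s' "$input" | jq -r '.stop_hook_active // false')" = "true" ]; then
    return 0
  fi
  # Skip the expensive check when no tracked TypeScript file differs from HEAD.
  git diff --name-only HEAD 2>/dev/null | has_ts_changes || return 0
  if ! npx tsc --noEmit >/tmp/typecheck.log 2>&1; then
    jq -n '{decision: "block", reason: "Type check failed. See /tmp/typecheck.log."}'
  fi
  return 0
}
```

To use it as a hook, call typecheck_gate on the last line of the script. The cheap file-name filter runs first, so sessions that never touch TypeScript pay almost nothing.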

Use matcher so hooks don't run on unrelated tools

Hook configuration lives in .claude/settings.json. Put project-shared rules here, commit them to Git, and every teammate gets the same gates. A matcher (a tool-name regex) narrows what a hook applies to.

{
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/block-force-push.sh" }
        ]
      },
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/guard-writes.sh" }
        ]
      }
    ],
    "PostToolUse": [
      {
        "matcher": "Write|Edit|MultiEdit",
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/auto-format.sh" }
        ]
      }
    ],
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": "bash .claude/hooks/require-green-tests.sh" }
        ]
      }
    ]
  }
}

Omit the matcher and the hook runs for every target of that event. Events that don't involve a tool, like Stop, take no matcher. Conversely, loading heavy work onto PreToolUse/PostToolUse without scoping the tools means even light operations like Read or Grep drag an extra script along, and the session becomes noticeably sluggish. Deciding "on which event, and only for which tools" up front is the first step to staying responsive.

For precedence, both the user level (~/.claude/settings.json) and the project level (.claude/settings.json) are loaded, and if several hooks match an event they all run. The documentation describes matching hooks as running in parallel, so don't design one hook to depend on another having finished first. Put the shared safety valves in the project and personal things like notification settings at the user level; that split is the easiest to manage.

Observability so you can trace a misfired hook

The scariest thing in unattended operation is the quiet failure: "a hook had been blocking every edit, and the overnight task made no progress at all." Because hooks act silently, it's hard to notice when they act too much. So I wrap every hook in an observer that appends one JSONL line per firing.

#!/usr/bin/env bash
# hook-observe.sh <name> <target-script> — record each firing, duration, and exit code
name="$1"; shift
target="$1"; shift
log="${CLAUDE_PROJECT_DIR:-.}/.claude/hook-metrics.jsonl"
input=$(cat)
start=$(date +%s.%N)   # %N (nanoseconds) needs GNU date; BSD/macOS date does not support it
 
# run the real script with stdin, capturing its output and exit code
out=$(printf '%s' "$input" | bash "$target")
code=$?
 
end=$(date +%s.%N)
dur=$(awk "BEGIN{printf \"%.3f\", $end-$start}")
ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
# record a block (exit 2, or a deny/block in the JSON output)
blocked="false"
{ [ "$code" = "2" ] || echo "$out" | grep -qE '"(deny|block)"'; } && blocked="true"
 
printf '{"ts":"%s","hook":"%s","code":%s,"dur":%s,"blocked":%s}\n' \
  "$ts" "$name" "$code" "$dur" "$blocked" >> "$log"
 
# pass the real output and exit code through to Claude Code unchanged
printf '%s' "$out"
exit $code

With this wrapper in place you can tally, after the fact, which hook blocked how many times over whatever period the log covers.

# pull out only blocking firings and see their frequency per hook
jq -r 'select(.blocked == true) | .hook' .claude/hook-metrics.jsonl \
  | sort | uniq -c | sort -rn

I glance at this weekly to catch hooks that block more than expected. One week, the protected-path guard was blocking ten times its usual rate; it turned out a newly added directory name happened to match a protection pattern. Without the log, that work would have been quietly stalling the whole time. Making it visible as a number lets you catch this "over-blocking" before it becomes an incident.
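To narrow that tally to a time window such as the last 24 hours, one option is to filter on the ts field. This is a sketch under the assumption that the log uses the wrapper's UTC timestamp format, which sorts correctly as a plain string; blocks_since is a hypothetical name.

```shell
#!/usr/bin/env bash
# blocks_since <since-iso8601-utc> [log-file]
# Counts blocking firings per hook at or after the given timestamp.
# Timestamps of the form YYYY-MM-DDTHH:MM:SSZ compare correctly as strings,
# so no date parsing is needed inside jq.
blocks_since() {
  local since=$1 log=${2:-.claude/hook-metrics.jsonl}
  jq -r --arg since "$since" \
    'select(.blocked == true and .ts >= $since) | .hook' "$log" \
    | sort | uniq -c | sort -rn
}
```

With GNU date, the cutoff for the last 24 hours can be produced with date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ; on BSD/macOS the equivalent is date -u -v-24H +%Y-%m-%dT%H:%M:%SZ.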

Recording duration (dur) too lets you watch for hooks getting slow. If the formatter's median creeps from 0.3s to 1.5s, that's a sign dependencies grew or target files got bigger — a cue to revisit how tightly I'm scoping the targets.
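For the median itself, a small helper is enough. The sketch below reads one number per line and prints the median; the function name is my own, and feeding it from the log is left to jq.

```shell
#!/usr/bin/env bash
# median: reads one number per line on stdin and prints the median.
# For an even count it prints the mean of the two middle values.
median() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      if (NR % 2) print v[(NR + 1) / 2]
      else        print (v[NR / 2] + v[NR / 2 + 1]) / 2
    }'
}
```

For example, jq -r 'select(.hook == "auto-format") | .dur' .claude/hook-metrics.jsonl | median gives the formatter's median duration over the whole log. I prefer the median to the mean here because a single cold-start run would otherwise dominate the number.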

A minimal rollout order

Loading everything at once makes it impossible to tell which hook caused a stall. I recommend adding them one at a time, in this order:

1. Put only block-force-push.sh on PreToolUse(Bash) and confirm by hand that an exit 2 block fires as intended. This is where you get comfortable with the exit-code contract.
2. Add auto-format.sh to PostToolUse and, using a file that deliberately fails to format, verify it always exits 0, so that a failure doesn't stop the session.
3. Finally put require-green-tests.sh on Stop and, by deliberately turning a test red, verify the stop_hook_active loop guard holds.

Only promote to production once these three are stable.

The benefit of this order is that you observe one failure mode per step. Growing the set one hook at a time stabilizes faster than loading five at once and finding the overnight run produced nothing.

What to delegate to hooks, and what not to

To close, the line I've settled on. Hooks suit work that is deterministic, fast, and recoverable on failure: blocking dangerous commands, guarding protected paths, light formatting, checking tests before completion all meet that bar.

What I keep off hooks: heavy network-bound work, destructive writes to external services, and anything where the judgment is context-dependent and shifts. "Is this change safe to deploy?", for example, rarely reduces to a deterministic rule; forcing a hook to block it just breeds false rejections. I leave those to the Claude side (CLAUDE.md and prompts) and let hooks guard only the lines that must never be crossed.

Hooks are powerful, and precisely because they are, unattended operation only becomes something you can run with peace of mind when you pair deliberate design of their stopping power with a way to observe over-blocking. If you're building automation the same way, I hope these notes spare you the same stumbles.
