Claude Code · 2026-06-14 · Advanced

A SubagentStop Hook That Grades Subagent Output and Sends It Back to Be Redone

When a Claude Code subagent occasionally returns rule-breaking work, a SubagentStop hook can grade it automatically and ask for a redo. Here is a working setup with code and field notes.


When you hand routine work to a subagent, nine times out of ten it comes back clean. The real problem is the one stray result that breaks a rule you agreed on. As an indie developer running several sites (Dolice Labs), I delegate article drafting to a separate agent, and violations such as a banned word slipping in or too few headings tend to get merged unnoticed on the busiest days.

Reviewing everything by hand would catch them, but then delegation buys you nothing. So I built a check that grades the work the instant a subagent finishes and sends it back for a redo when it falls short, using a SubagentStop hook.

SubagentStop sits right on the subagent's "submit" button

Claude Code hooks fire on several events, and the one dedicated to subagents is SubagentStop. It is distinct from Stop, which reacts to the parent agent halting; SubagentStop fires only when a subagent finishes its response. That makes it the natural inspection point before you accept the delivery.

What matters is that this hook can steer behavior through its exit code and JSON output. Printing {"decision": "block", "reason": "..."} to stdout tells Claude Code not to let the subagent stop, and to keep it going with the reason text injected as an instruction. In other words, you can feed back "this part is wrong, fix it." Reusing the grading result directly as the rejection note is the heart of this design.
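To make the two outcomes concrete, here is a minimal sketch of the output side. The helper name toHookOutput and the wording of the reason are hypothetical; only the decision and reason fields come from the description above.

```javascript
// Hypothetical helper: turn a list of failure strings into hook stdout.
// An empty string means "print nothing", which lets the subagent stop.
function toHookOutput(failures) {
  if (failures.length === 0) return "";
  return JSON.stringify({
    decision: "block",
    reason:
      "Fix these and resubmit:\n" + failures.map(f => `- ${f}`).join("\n"),
  });
}

// One failure produces a single-line JSON object on stdout.
console.log(toHookOutput(["3 H2 headings (need at least 4)"]));
```

Keeping this step separate from the checks means the grading logic never needs to know how Claude Code expects the verdict to be encoded.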

Pin the criteria to a JSON rubric, not prose

A vague bar like "write a good article" is unsuited to machine grading. Break it down until each violation can be judged unambiguously. The rubric I use for the drafting subagent looks like this.

{
  "min_h2": 4,
  "max_chars": 12000,
  "min_chars": 2500,
  "banned_words": ["sensational", "the best", "blazing", "godlike", "complete guide", "definitive"],
  "forbidden_openers": ["In this article", "This article will", "How was that"],
  "require_code_block": true
}

The key is that every item is either countable or decidable by string match. Subjective judgments (is it interesting, is it readable) do not belong here. Keep only "rule violations a machine can reject with certainty," and leave creative quality to the human and the main prompt. Drawing that boundary is what keeps the hook from looping on false positives.
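One way to enforce that boundary is to validate the rubric itself before grading. The sketch below is an assumption layered on top of the rubric above, not part of the original setup: SCHEMA and validateRubric are hypothetical names, and the rule is simply that every entry must be a count, a flag, or a list of literal strings.

```javascript
// Hypothetical schema: each known key is a count, a flag, or a string list.
const SCHEMA = {
  min_h2: "count",
  min_chars: "count",
  max_chars: "count",
  banned_words: "strings",
  forbidden_openers: "strings",
  require_code_block: "flag",
};

// Returns a list of problems; an empty list means the rubric is machine-checkable.
function validateRubric(rubric) {
  const problems = [];
  for (const [key, value] of Object.entries(rubric)) {
    if (!Object.hasOwn(SCHEMA, key)) {
      problems.push(`unknown key "${key}"`);
      continue;
    }
    const kind = SCHEMA[key];
    const ok =
      kind === "count" ? Number.isInteger(value) && value >= 0 :
      kind === "flag" ? typeof value === "boolean" :
      Array.isArray(value) &&
        value.every(s => typeof s === "string" && s.length > 0);
    if (!ok) problems.push(`"${key}" is not a valid ${kind} entry`);
  }
  // A range that cannot be satisfied would make every draft fail.
  if (Number.isInteger(rubric.min_chars) && Number.isInteger(rubric.max_chars) &&
      rubric.min_chars > rubric.max_chars) {
    problems.push("min_chars exceeds max_chars");
  }
  return problems;
}
```

A subjective entry such as "tone": "engaging" is rejected as an unknown key, so it cannot slip into the hook unnoticed.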

The hook body: open the transcript from the stdin JSON

A SubagentStop hook receives session info as JSON on stdin. Inside it, transcript_path points to the subagent's conversation log (JSONL). The last assistant message is the deliverable, so we pull it out and pass it to the grader.

#!/usr/bin/env bash
# .claude/hooks/grade-subagent.sh
set -euo pipefail
 
INPUT="$(cat)"                       # the hook receives JSON on stdin
TRANSCRIPT="$(printf '%s' "$INPUT" | node -e \
  'let d="";process.stdin.on("data",c=>d+=c).on("end",()=>{
     console.log(JSON.parse(d).transcript_path || "")})')"
 
if [ -z "$TRANSCRIPT" ] || [ ! -f "$TRANSCRIPT" ]; then
  exit 0                             # nothing to grade, pass through
fi
 
node "$(dirname "$0")/grade.mjs" "$TRANSCRIPT"

The set -euo pipefail line is there so that if the grader crashes, the hook exits non-zero and the failure is reported instead of silently counting as a pass. A broken inspection line that quietly approves everything is the scariest failure mode for a quality gate.
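The script only runs once it is registered under the SubagentStop event in the project's settings. The following is a sketch of that registration, assuming the script path used above and the hooks layout of .claude/settings.json; it is written as a JavaScript object to stay within the article's Node toolchain, and the printed JSON is what would be merged into the settings file. Check the exact shape against the hooks reference for your Claude Code version.

```javascript
// Sketch of the hook registration. Assumes the script lives at
// .claude/hooks/grade-subagent.sh inside the project directory.
const settings = {
  hooks: {
    SubagentStop: [
      {
        hooks: [
          {
            type: "command",
            // $CLAUDE_PROJECT_DIR is quoted so paths with spaces still work
            command: '"$CLAUDE_PROJECT_DIR"/.claude/hooks/grade-subagent.sh',
          },
        ],
      },
    ],
  },
};

// Print the JSON fragment to merge into .claude/settings.json
console.log(JSON.stringify(settings, null, 2));
```

The shell script also needs to be executable (chmod +x), otherwise the command fails before the grader ever runs.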

The grader: read the tail of the transcript and match the rubric

Keep the grader deterministic (same input, same result). It calls no external API, so it is fast, free, and unaffected by network outages.

// .claude/hooks/grade.mjs
import { readFileSync } from "node:fs";
 
const RUBRIC = JSON.parse(
  readFileSync(new URL("./rubric.json", import.meta.url), "utf8")
);
 
const transcriptPath = process.argv[2];
const lines = readFileSync(transcriptPath, "utf8").trim().split("\n");
 
// Walk the JSONL from the end to find the last assistant text
let text = "";
for (let i = lines.length - 1; i >= 0; i--) {
  let ev;
  try { ev = JSON.parse(lines[i]); } catch { continue; }  // skip blank or partial lines
  if (ev.message?.role !== "assistant") continue;
  const content = ev.message.content ?? [];
  // content may be a plain string or an array of typed blocks
  text = typeof content === "string"
    ? content
    : content.filter(b => b.type === "text").map(b => b.text).join("\n");
  if (text) break;
}
 
const fail = [];
const h2 = (text.match(/^##\s+/gm) ?? []).length;
if (h2 < RUBRIC.min_h2) fail.push(`${h2} H2 headings (need at least ${RUBRIC.min_h2})`);
 
const chars = [...text].length;
if (chars < RUBRIC.min_chars) fail.push(`${chars} chars (need at least ${RUBRIC.min_chars})`);
if (chars > RUBRIC.max_chars) fail.push(`${chars} chars (over the ${RUBRIC.max_chars} cap)`);
 
for (const w of RUBRIC.banned_words)
  if (text.includes(w)) fail.push(`contains banned word "${w}"`);
 
for (const o of RUBRIC.forbidden_openers)
  if (text.includes(o)) fail.push(`contains boilerplate phrase "${o}"`);
 
if (RUBRIC.require_code_block && !text.includes("```"))
  fail.push("no code block");
 
if (fail.length === 0) process.exit(0);   // pass: exit with no output
 
// fail: return a block decision as JSON; the reason becomes the redo instruction
console.log(JSON.stringify({
  decision: "block",
  reason:
    "The deliverable does not meet the quality rubric. Fix these and resubmit:\n" +
    fail.map(f => `- ${f}`).join("\n"),
}));
process.exit(0);

The trick is returning decision: block while still exiting with code 0. Exit code 2 also blocks (with whatever the hook wrote to stderr as the feedback), but returning JSON puts the rejection note in a dedicated reason field that goes straight to the subagent as an instruction, so it knows what to fix. Making the rejection note a bullet list visibly improved the accuracy of the regeneration.

Always add a guard against infinite loops

The first thing this setup got wrong was a subagent retrying forever on a violation it could not fix. If the rubric is too strict to satisfy structurally, block-and-regenerate never stops.

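As a guard, count how many prior rejections appear in the transcript, and once the count reaches a limit, stop blocking and escalate to a human.

// In grade.mjs, place this before the final block output.
// The pattern tolerates optional whitespace and backslash-escaped quotes, in case
// the earlier hook output is stored as a nested string inside a JSONL line.
const BLOCK_RE = /\\?"decision\\?"\s*:\s*\\?"block/;
const blockCount = lines.filter(l => BLOCK_RE.test(l)).length;
if (fail.length && blockCount >= 2) {
  // if two redos still do not pass, stop and log it for a human
  console.error("[grade] retry limit. Escalate to manual review: " + fail.join(" / "));
  process.exit(0);          // do NOT block here = break the loop
}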

For a grading gate, "not blocking forever" turned out to matter more in practice than "blocking automatically." A gate that never stops is like an alert that never stops: eventually everyone ignores it.
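Counting rejections in the transcript depends on how earlier hook output is recorded there, so a second signal is a useful fallback. The sketch below assumes the hook input JSON carries a stop_hook_active flag that is true when the agent is already continuing because of an earlier stop-hook block; verify that field against the hooks reference for your version. The function name decide and its return values are hypothetical.

```javascript
// Hypothetical guard combining the transcript count with stop_hook_active.
// failures: strings from the grader; hookInput: parsed stdin JSON;
// priorBlocks: rejections counted in the transcript.
function decide(failures, hookInput, priorBlocks, maxRetries = 2) {
  if (failures.length === 0) return "pass";
  // If Claude Code reports that a stop hook already forced a continuation
  // but the transcript count found nothing, the count is unreliable.
  // Fail safe: stop blocking and hand off to a human.
  const countLooksBroken =
    hookInput.stop_hook_active === true && priorBlocks === 0;
  if (priorBlocks >= maxRetries || countLooksBroken) return "escalate";
  return "block";
}

console.log(decide(["no code block"], { stop_hook_active: false }, 0));
```

The design choice is that any doubt about the retry count resolves toward escalation, which matches the priority stated above: a gate that stops blocking is preferable to one that loops.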

Three practical lessons from running it

First, I pulled the rubric out into a JSON file instead of hard-coding it. Criteria always change in operation. If you have to touch the script every time a new banned word appears, you will not keep it up.

Second, I kept the grader deterministic. The temptation to let a model grade is strong, but if the same deliverable passes or fails at random, the subagent cannot learn anything except "bad luck." Splitting the layers, with mechanical violations in code and subjective quality left to the main prompt and the human, proved far more stable.

Third, I always made the rejection reason concrete. Returning "3 H2 headings (need 4)" with numbers, instead of "insufficient quality," raises the odds the redo passes on the first try. The granularity of feedback directly sets how fast the self-correction loop converges.
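The second and third lessons can be sketched together: a pure grading function with no file or network access, whose failure messages carry the measured numbers. This is a reduced illustration using only min_h2 and banned_words, not the full grader; the function name grade and the sample draft are made up for the example.

```javascript
// Sketch of a pure grader for a two-rule rubric (min_h2 and banned_words).
// Same text and rubric in, same failure list out; no I/O, so it is easy to test.
function grade(text, rubric) {
  const fail = [];
  // Count Markdown H2 headings: lines starting with exactly "## "
  const h2 = (text.match(/^##\s+/gm) ?? []).length;
  if (h2 < rubric.min_h2) {
    fail.push(`${h2} H2 headings (need at least ${rubric.min_h2})`);
  }
  // Literal, case-sensitive match, same as the full grader above
  for (const w of rubric.banned_words) {
    if (text.includes(w)) fail.push(`contains banned word "${w}"`);
  }
  return fail;
}

const draft = "## Setup\nA blazing intro.\n## Usage\nDetails.\n";
console.log(grade(draft, { min_h2: 3, banned_words: ["blazing", "godlike"] }));
```

Because the function touches nothing outside its arguments, a handful of fixed drafts can serve as regression tests whenever the rubric changes.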

As a next step, wire up a single SubagentStop hook with a minimal rubric of just banned_words and min_h2. Once the inspection line is running, adding criteria later is easy. I hope this helps anyone wrestling with the quality of delegated work.
