●CODE: Claude Code ships a broad quality and reliability update with /rewind, stronger MCP resilience, and steadier OAuth handling
●CODE: CPU and memory use drops during streaming and long sessions, keeping always-on automation stable
●ADMIN: New org model restrictions let administrators control which models are available
●MCP: Structured output, remote MCP, and session resume all get more reliable
●MODEL: Claude Fable 5 is generally available, with a 1M-token context window, always-on adaptive thinking, and 128K output
●LINEUP: Opus 4.8, Sonnet 4.6, and Haiku 4.5 lead the lineup; pick the right one per task
Will It Stay Light When You Run It Unattended? Observing and Capping Claude Code's Long-Session Memory
How to keep long, unattended Claude Code sessions from slowly getting heavier — with a tiny ps-based RSS sampler, a rolling-baseline watchdog, and session segmentation, shown with working scripts and a before/after comparison.
There was a quiet line in the 2026-06-27 Claude Code update that I did not want to skim past: CPU and memory usage during streaming and long-running sessions were reduced. It is not a flashy feature, but for anyone who keeps Claude Code running unattended for hours, a lower baseline is exactly the kind of improvement that matters.
As an indie developer, I run headless Claude Code to auto-publish articles across several sites. The failure I fear most is not a crash. It is the quiet one, where the process slowly gets heavier and only the last few jobs of the batch get dropped. A crash is loud; a gradual slowdown that costs you the final couple of jobs is easy to overlook. This update eases that worry somewhat, but if I lean on the app-side improvement alone and stop measuring, I will eventually fall back into the same hole.
This article builds the three operational layers I rely on: observe Claude Code's resident memory in a few dozen lines, detect the onset of bloat, and split long runs into segments so RSS levels off instead of climbing.
What Actually Becomes "Heavy" in a Long Unattended Session
The first thing to separate is that "memory" here means two different things. One is the context Claude carries (the context window), which drives token cost and latency. The other is the resident memory of the local Claude Code process itself (RSS: resident set size), which is how heavy the process looks to the OS. The 6/27 improvement mainly lowered the latter. Context-grooming techniques are a separate topic, so this article stays consistently on RSS.
In unattended runs, RSS tends to become a problem in a few recognizable shapes:
Stacking dozens of tool calls into a single session lets streaming buffers and intermediate state accumulate, and RSS creeps upward
If a VM or container has a low memory ceiling, the grown RSS hits it and the OS OOM killer takes the whole process down
Even without a kill, once swapping starts everything slows down and the later jobs miss their window
A crash is loud, so you notice it. The nasty case is the third one: it just "gets slow" without raising an error. That is exactly why we start from observation.
What the 6/27 Update Lowers — and What It Doesn't
The published improvement is a reduction in CPU and memory usage during streaming and long-running sessions. In other words, the same work now ramps RSS up more gently than before. That is genuinely helpful.
But what the app lowers is the baseline cost of each unit of work. If your job crams hundreds of operations into one session, even a gentler slope will reach the ceiling given enough time. The update delays arrival; it does not promise you can run forever. Observation and segmentation stack on top of the app's improvement without conflicting with it.
A useful framing: the app's memory reduction makes the hill less steep, while the operational practice in this article adds landings partway up, where RSS drops back to its starting level.
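The "delays arrival" point is plain arithmetic. Here is a minimal sketch, assuming RSS grows linearly; the slopes are hypothetical values chosen for illustration, not measurements of the update.

```python
def minutes_to_ceiling(start_mb: float, ceiling_mb: float, slope_mb_per_min: float) -> float:
    """Minutes until RSS reaches the ceiling, assuming linear growth."""
    if slope_mb_per_min <= 0:
        return float("inf")  # flat or shrinking RSS never reaches the ceiling
    return (ceiling_mb - start_mb) / slope_mb_per_min


# Hypothetical numbers: start at 320 MB under a 2048 MB ceiling.
steeper = minutes_to_ceiling(320, 2048, 4.0)  # 432.0 minutes
gentler = minutes_to_ceiling(320, 2048, 2.0)  # 864.0 minutes
```

Halving the slope doubles the time to the ceiling, but the answer is still finite. Only a reset, which is what segmentation provides, removes the deadline.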
The first step is not installing a monitoring platform. Periodically writing out RSS with ps tells you a surprising amount. The script below appends the combined RSS of a target process and its children to a CSV every 15 seconds.
#!/usr/bin/env bash
# rss-sample.sh: append combined RSS of a PID and its descendants to a CSV
# usage: ./rss-sample.sh <root_pid> <out.csv> [interval_sec]
set -euo pipefail

ROOT_PID="$1"
OUT="$2"
INTERVAL="${3:-15}"

# collect descendant PIDs recursively
collect_descendants() {
  local pid="$1"
  echo "$pid"
  for child in $(pgrep -P "$pid" 2>/dev/null || true); do
    collect_descendants "$child"
  done
}

# header (first time only)
[ -s "$OUT" ] || echo "ts_epoch,iso,proc_count,rss_kb" >> "$OUT"

while kill -0 "$ROOT_PID" 2>/dev/null; do
  pids="$(collect_descendants "$ROOT_PID" | sort -u)"
  count=0
  total=0
  for p in $pids; do
    # ps rss is in KB; ignore PIDs that no longer exist
    kb="$(ps -o rss= -p "$p" 2>/dev/null | tr -d ' ' || true)"
    if [ -n "${kb:-}" ]; then
      total=$(( total + kb ))
      count=$(( count + 1 ))
    fi
  done
  printf '%s,%s,%s,%s\n' "$(date +%s)" "$(date -u +%FT%TZ)" "$count" "$total" >> "$OUT"
  sleep "$INTERVAL"
done
From your headless wrapper, start this sampler in the background right after launching Claude Code.
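One way to wire that up is a small wrapper like the sketch below. It assumes the sampler is the rss-sample.sh script above and that the job is started as a direct child, so its PID is known. The function name `run_with_sampler`, the paths, and the prompt are placeholders, not part of any existing tool.

```python
import subprocess


def run_with_sampler(job_cmd, sampler_cmd, out_csv, interval_sec=15):
    """Launch the job, attach the RSS sampler to its PID, and return the job's exit code."""
    job = subprocess.Popen(job_cmd)
    # The sampler receives <root_pid> <out.csv> [interval_sec], matching rss-sample.sh.
    sampler = subprocess.Popen([*sampler_cmd, str(job.pid), out_csv, str(interval_sec)])
    try:
        return job.wait()
    finally:
        # rss-sample.sh stops by itself once the root PID is gone; terminate as a backstop.
        sampler.terminate()
        sampler.wait()


# Example call (placeholder prompt and paths):
# run_with_sampler(["claude", "-p", "generate article A"], ["./rss-sample.sh"], "rss.csv")
```

Starting the sampler from the same wrapper keeps the CSV lifetime tied to the job, so each run produces exactly one time series.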
That alone gives you a time series of RSS for a single session: the peak, the slope of growth, and whether it ever flattens back down. For the first few runs, plotting and just looking is enough. In my environment, a light article-generation job peaked roughly in the 320–420 MB range, and when I stacked many jobs back-to-back into the same session, that figure slowly drifted upward. The absolute number matters less than the question: does it return to flat, or not?
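If you would rather have numbers than a plot, the three questions can be reduced to a few lines. This is a rough sketch that assumes the CSV columns written by rss-sample.sh (`ts_epoch`, `rss_kb`); the `summarize` helper and its output keys are illustrative names.

```python
import csv
import io


def summarize(csv_text: str) -> dict:
    """Return peak RSS, least-squares growth slope, and how close the final sample is to the peak."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no samples")
    ts = [int(r["ts_epoch"]) for r in rows]
    rss_mb = [int(r["rss_kb"]) / 1024 for r in rows]
    n = len(rows)
    mean_t = sum(ts) / n
    mean_y = sum(rss_mb) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, rss_mb))
    slope_per_min = cov / var_t * 60 if var_t else 0.0
    peak = max(rss_mb)
    return {
        "peak_mb": peak,
        "slope_mb_per_min": slope_per_min,
        "end_over_peak": rss_mb[-1] / peak,  # near 1.0 means RSS never came back down
    }


# Usage with a real file: summarize(open("rss.csv").read())
```

An `end_over_peak` close to 1.0 together with a positive slope is the "keeps climbing" shape; a ratio well below 1.0 means RSS returned toward flat.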
Catching It With a Threshold — A Rolling-Baseline Watchdog
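A fixed threshold (say, "warn above 600 MB") misfires easily across environments, because changing the machine changes the baseline. So instead, build the baseline from the run's own samples and judge by deviation from it. The script below takes the baseline from the stable region just after the initial ramp, so a slow climb shows up as drift instead of being absorbed into the baseline. Using the median and the median absolute deviation (MAD) keeps the baseline resistant to transient spikes while still catching the onset of bloat.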
#!/usr/bin/env python3
# rss-watchdog.py: read the CSV, detect drift via rolling median + MAD
import csv
import statistics
import sys

WINDOW = 20      # number of samples used for the baseline
K = 6.0          # threshold coefficient (how many MADs to allow)
MIN_SAMPLES = 8  # minimum samples needed to form a baseline
RECENT = 3       # judge the median of the last few samples so a single spike does not fire


def mad(xs, med):
    # fall back to 1.0 (KB) when the baseline is perfectly flat, to avoid a zero-width band
    return statistics.median([abs(x - med) for x in xs]) or 1.0


def main(path):
    rss = []
    with open(path) as f:
        for row in csv.DictReader(f):
            rss.append(int(row["rss_kb"]))
    if len(rss) < MIN_SAMPLES:
        print("not enough samples")
        return 0
    # use the stable region after the initial ramp as the baseline
    baseline = rss[2:2 + WINDOW] if len(rss) >= 2 + WINDOW else rss[2:]
    med = statistics.median(baseline)
    spread = mad(baseline, med)
    latest = statistics.median(rss[-RECENT:])
    threshold = med + K * spread
    drift = (latest - med) / med * 100
    print(f"baseline_median={med/1024:.0f}MB latest={latest/1024:.0f}MB "
          f"threshold={threshold/1024:.0f}MB drift={drift:+.1f}%")
    if latest > threshold:
        print("ALERT: RSS drifted above rolling baseline")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1]))
If your operational loop reads the watchdog's exit code, you can turn it into a decision: "once it starts getting heavy, close out the current job at a safe boundary and move to the next segment." The crucial part is that detection does not mean killing the process immediately — advance the current work to a safe boundary, then cut. Killing mid-flight loses partial results.
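A sketch of that decision in an operational loop, assuming the safe boundary is "between jobs" and that the watchdog signals drift with exit code 1 as in rss-watchdog.py. The helpers `watchdog_alerts` and `run_jobs` and the `run_job` callback are hypothetical names for illustration.

```python
import subprocess


def watchdog_alerts(watchdog_cmd) -> bool:
    """True when the watchdog process exits with code 1 (drift detected)."""
    return subprocess.run(watchdog_cmd).returncode == 1


def run_jobs(jobs, run_job, watchdog_cmd):
    """Run jobs in order and check the watchdog only after a job has finished.

    Returns (completed, remaining); the remaining jobs belong to the next segment.
    """
    completed = []
    for idx, job in enumerate(jobs):
        run_job(job)           # never interrupted mid-flight
        completed.append(job)
        if watchdog_alerts(watchdog_cmd):
            return completed, list(jobs[idx + 1:])
    return completed, []


# Example watchdog command (placeholder paths): ["python3", "rss-watchdog.py", "rss.csv"]
```

Because the check happens only after `run_job` returns, an alert shortens the segment without discarding work that is already under way.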
Here is the fixed-threshold versus rolling-baseline trade-off laid out:
| Aspect | Fixed threshold | Rolling baseline + MAD |
| --- | --- | --- |
| Robustness to environment | Weak (needs per-machine tuning) | Strong (baselines on that run's stable region) |
| Transient-spike tolerance | Low (fires on instantaneous values) | High (absorbed by the median) |
| What it detects | Exceeding an absolute amount | Upward trend (drift) |
| Initial setup cost | Re-tune per environment | Reusable with just the K coefficient |
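To make the comparison concrete, here is a small sketch that runs both rules over two made-up series. The values are hypothetical and in MB. In this sketch the MAD rule judges the median of the last three samples rather than a single reading.

```python
import statistics


def mad_threshold(baseline, k=6.0):
    """Baseline median plus k median absolute deviations."""
    med = statistics.median(baseline)
    mad = statistics.median([abs(x - med) for x in baseline]) or 1.0
    return med + k * mad


def drifted(series, k=6.0, recent=3):
    """True when the median of the last `recent` samples exceeds the baseline threshold."""
    baseline = series[:-recent]
    return statistics.median(series[-recent:]) > mad_threshold(baseline, k)


FIXED_LIMIT_MB = 600  # the "warn above 600 MB" rule from the text

# Small machine: genuine drift from about 300 MB to about 365 MB, never near 600 MB.
small = [300, 304, 298, 302, 301, 299, 303, 300, 360, 365, 370]
# Large machine: flat around 640 MB with one transient 900 MB spike.
large = [640, 644, 638, 642, 641, 639, 643, 640, 641, 900, 642]

fixed_small = max(small) > FIXED_LIMIT_MB  # False: the drift is missed
fixed_large = large[-1] > FIXED_LIMIT_MB   # True: fires although nothing is growing
mad_small = drifted(small)                 # True: drift caught
mad_large = drifted(large)                 # False: the spike is absorbed
```

The fixed rule is wrong on both machines for opposite reasons, while the baseline-relative rule gets both right with the same K.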
Splitting a Long Session Into Bounded Segments
Observation and the watchdog are the "notice it" layer. The real fix is not to stack infinitely into one session in the first place. Split a long unattended job into a handful of bounded segments, and end the process at each segment boundary; RSS resets there and levels off.
What makes this work is the ability to resume a session while preserving context. If continuity across segments matters, --resume or --continue carries the prior session forward; if the jobs are fully independent, just start fresh each time. For my own site auto-posting, I use "one segment = one or two articles' worth" as a rule of thumb.
#!/usr/bin/env bash
# segmented-run.sh: run a long job split into bounded segments
set -euo pipefail

JOBS=("generate article A" "generate article B" "generate article C" "generate article D")
SEGMENT_SIZE=2   # jobs per segment
i=0
session_id=""

while [ "$i" -lt "${#JOBS[@]}" ]; do
  batch=("${JOBS[@]:i:SEGMENT_SIZE}")
  prompt="$(printf '%s\n' "${batch[@]}")"

  # use --resume if context must carry over; omit it if independent
  if [ -n "$session_id" ]; then
    claude -p "$prompt" --resume "$session_id" --output-format stream-json > "seg-$i.log" 2>&1
  else
    claude -p "$prompt" --output-format stream-json > "seg-$i.log" 2>&1
  fi

  # move to the next segment; the process exits each time, so RSS resets here
  session_id="$(grep -o '"session_id":"[^"]*"' "seg-$i.log" | head -1 | cut -d'"' -f4 || true)"
  i=$(( i + SEGMENT_SIZE ))
done
The key is that the claude process always exits at a segment boundary. The next segment launches as a new process, so the resident memory the previous segment was holding is returned to the OS. If the app's memory reduction makes the hill less steep, this landing is the flat stretch you build on the operations side.
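If the orchestration lives in Python instead of bash, the session handoff can be done by parsing the log as JSON lines rather than with a grep pattern, which also tolerates whitespace after the colon. This sketch assumes, as the script above does, that the stream-json log contains one JSON object per line with a `session_id` field; the helper names and the sample event in the usage comment are made up.

```python
import json


def first_session_id(log_text: str):
    """Return the first session_id found in a log of JSON lines, or None."""
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip stderr noise mixed in by 2>&1
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict) and event.get("session_id"):
            return event["session_id"]
    return None


def build_cmd(prompt: str, session_id=None):
    """Build the claude argv for one segment; add --resume only when context must carry over."""
    cmd = ["claude", "-p", prompt, "--output-format", "stream-json"]
    if session_id:
        cmd += ["--resume", session_id]
    return cmd


# Usage (made-up event): first_session_id('{"type": "system", "session_id": "abc-123"}')
```

Each call to `build_cmd` describes a fresh process, so the property that matters, exiting at every segment boundary, is preserved.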
Before / After — One Long Session vs Segmented
Let's compare RSS behavior for the same four jobs, run once as a single long session and once split into segments of two. In minimal form, the two approaches look like this.
# Before: stack everything into one session (RSS tends to climb)
claude -p "$(printf '%s\n' "${ALL_JOBS[@]}")" --output-format stream-json > run.log 2>&1

# After: split into segments, ending the process at each boundary (RSS resets at each segment start)
for batch in "${SEGMENTS[@]}"; do
  # append so each segment's output is kept rather than overwritten
  claude -p "$batch" --output-format stream-json >> "seg.log" 2>&1
done
The RSS trend I measured in my environment looked roughly like the following. The numbers depend on the machine, so read the shape (does it keep climbing, or return to flat?) rather than the absolute values.
| Metric | Before (single session, 4 jobs) | After (2 jobs × 2 segments) |
| --- | --- | --- |
| RSS peak | ~690 MB | ~430 MB |
| RSS at exit | ~660 MB (stays high) | returns to ~320 MB at each segment start |
| Time for later jobs | visibly longer than the early ones | roughly constant across segments |
| OOM headroom (assuming a 2 GB ceiling) | ~34% consumed at peak | ~21% consumed at peak |
This is not about chasing dramatic numbers. The goal is to keep the peak well clear of the ceiling and to avoid staying high at exit. When it stays high at exit like the Before case, chaining the next task on the same VM carries the leftover forward. When it returns at each segment start like the After case, you can keep going for many segments without nearing the ceiling.
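The headroom row is straightforward to reproduce. A minimal sketch, assuming the 2 GB ceiling means 2048 MB:

```python
CEILING_MB = 2048  # assumption: "2 GB" taken as 2048 MB


def pct_of_ceiling(peak_mb: float, ceiling_mb: float = CEILING_MB) -> int:
    """Share of the memory ceiling consumed at peak, rounded to a whole percent."""
    return round(peak_mb / ceiling_mb * 100)


before_pct = pct_of_ceiling(690)     # 34
after_pct = pct_of_ceiling(430)      # 21
before_headroom = CEILING_MB - 690   # 1358 MB left at peak
after_headroom = CEILING_MB - 430    # 1618 MB left at peak
```

The difference in peak is modest. What changes the long-run picture is that the segmented run starts each segment from about 320 MB again instead of from wherever the previous jobs left off.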
Reading the Numbers and Turning Them Into Decisions
Once you have numbers, there are only three things to watch.
First, how far the peak sits from the ceiling. With a 2 GB ceiling and a 1.5 GB peak, a single spike is within reach of OOM — a cue to shrink segments or raise the ceiling. Second, whether RSS returns at the start of each segment. If it accumulates instead of returning, suspect that the process is not actually exiting (a lingering background child, for instance). Third, whether the time per job stays constant across segments. If only the later ones stretch, that is a sign swapping has begun.
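Those three checks can be written down as one small function. The cutoffs here (70% of the ceiling, 15% growth in segment-start RSS, 25% slowdown) are assumptions picked for illustration, and `review_run` is a hypothetical helper, so tune both to your own runs.

```python
def review_run(peak_mb, ceiling_mb, segment_start_mb, job_seconds,
               peak_limit=0.70, start_tolerance=0.15, time_tolerance=0.25):
    """Return a list of findings for the three checks: peak, reset, and job time."""
    findings = []
    # 1. How far the peak sits from the ceiling.
    if peak_mb / ceiling_mb > peak_limit:
        findings.append("peak too close to ceiling")
    # 2. Whether RSS returns at the start of each segment.
    if segment_start_mb and max(segment_start_mb) > segment_start_mb[0] * (1 + start_tolerance):
        findings.append("RSS not returning at segment start")
    # 3. Whether later jobs take longer than early ones.
    half = len(job_seconds) // 2
    if half:
        early = sum(job_seconds[:half]) / half
        late = sum(job_seconds[half:]) / (len(job_seconds) - half)
        if late > early * (1 + time_tolerance):
            findings.append("later jobs slower")
    return findings


healthy_but_tight = review_run(1536, 2048, [320, 325, 318], [60, 62, 61, 63])
accumulating = review_run(430, 2048, [320, 400, 480], [60, 60, 90, 100])
```

The first call flags only the peak (1.5 GB of 2 GB). The second has a comfortable peak but flags both the missing reset and the slower later jobs, which is the pattern to expect when a child process lingers across segments.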
When you wire this into an operational loop, use the watchdog's alert as a signal to bring the next segment boundary forward — never as a trigger to forcibly kill the running process. A forced kill invites loss of partial results and, in the worst case, an inconsistent cleanup afterward. Closing out at a safe boundary, consistently, ends up dropping fewer jobs.
A Next Step
Start with observation alone. Adding rss-sample.sh to one job you already run unattended and capturing a single run's RSS time series to a CSV is enough. Where is the peak, and does RSS stay high at exit? Once you can see that, the right segment size becomes much easier to choose. Ride the app-side memory reduction, but keep one eye on the measurements. That alone prevents most of the quiet drop-outs.
I hope this helps anyone wrestling with the same unattended-operation headaches. Thank you for reading.