Claude Code · 2026-06-27 · Advanced

Will It Stay Light When You Run It Unattended? Observing and Capping Claude Code's Long-Session Memory

How to keep long, unattended Claude Code sessions from slowly getting heavier — with a tiny ps-based RSS sampler, a rolling-baseline watchdog, and session segmentation, shown with working scripts and a before/after comparison.

There was a quiet line in the 2026-06-27 Claude Code update that I did not want to skim past: CPU and memory usage during streaming and long-running sessions were reduced. It is not a flashy feature, but for anyone who keeps Claude Code running unattended for hours, this kind of baseline improvement is the most welcome kind.

As an indie developer, I run headless Claude Code to auto-publish articles across several sites, and the failure I fear most is not a crash. It is the quiet one where the process slowly gets heavier and only the last few jobs of the batch get dropped. A crash is loud; a gradual slowdown that costs you the final couple of jobs is easy to overlook. This update softens that worry a notch, but if I lean on the app-side improvement alone and stop measuring, I will eventually fall back into the same hole.

This article builds the three operational layers I rely on: observe Claude Code's resident memory in a few dozen lines, detect the onset of bloat, and split long runs into segments so RSS levels off instead of climbing.

What Actually Becomes "Heavy" in a Long Unattended Session

The first thing to separate is that "memory" here means two different things. One is the context Claude carries (the context window), which drives token cost and latency. The other is the resident memory of the local Claude Code process itself (RSS: resident set size), which is how heavy the process looks to the OS. The 6/27 improvement mainly lowered the latter. Context-grooming techniques are a separate topic, so this article stays consistently on RSS.

In unattended runs, RSS tends to become a problem in a few recognizable shapes:

- Stacking dozens of tool calls into a single session lets streaming buffers and intermediate state accumulate, and RSS creeps upward.
- If a VM or container has a low memory ceiling, the grown RSS hits it and the OS OOM killer takes the whole process down.
- Even without a kill, once swapping starts everything slows down and the later jobs miss their window.

A crash is loud, so you notice it. The nasty case is the third one: it just "gets slow" without raising an error. That is exactly why we start from observation.

What the 6/27 Update Lowers — and What It Doesn't

The published improvement is a reduction in CPU and memory usage during streaming and long-running sessions. In other words, the same work now ramps RSS up more gently than before. That is genuinely helpful.

But what the app lowers is the basal metabolism. If your job crams hundreds of operations' worth of work into one session, even a gentler slope will reach the ceiling given enough time. The update delays arrival; it does not promise you can run forever. Observation and segmentation stack on top of the app's improvement without conflicting with it.

A useful framing: the app's memory reduction makes the hill less steep, while the operational practice in this article adds landings partway up, where RSS drops back to its starting level before the climb resumes.
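To make the "a gentler slope still arrives" point concrete, here is a minimal sketch of the arithmetic. It assumes RSS grows roughly linearly, which real sessions only approximate. The function name and the figures are hypothetical illustrations, not measurements from this article.

```python
def minutes_until_ceiling(current_mb, ceiling_mb, slope_mb_per_min):
    """Estimate minutes until RSS reaches the ceiling under linear growth.

    Returns None when RSS is flat or falling, because the ceiling is never reached.
    """
    if slope_mb_per_min <= 0:
        return None
    remaining_mb = max(ceiling_mb - current_mb, 0)
    return remaining_mb / slope_mb_per_min


# Hypothetical figures: halving the slope doubles the time, but the limit remains.
steep = minutes_until_ceiling(400, 2048, 4.0)
gentle = minutes_until_ceiling(400, 2048, 2.0)
print(f"steep slope: {steep:.0f} min, gentle slope: {gentle:.0f} min")
```

A flat or falling slope returns None, which is the outcome segmentation aims for.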

The Smallest Mechanism to Observe Memory

The first step is not installing a monitoring platform. Periodically writing out RSS with ps tells you a surprising amount. The script below appends the combined RSS of a target process and all of its descendants to a CSV every 15 seconds.

#!/usr/bin/env bash
# rss-sample.sh: append the combined RSS of a PID and its descendants to a CSV
# usage: ./rss-sample.sh <root_pid> <out.csv> [interval_sec]
set -euo pipefail

if [ "$#" -lt 2 ]; then
  echo "usage: $0 <root_pid> <out.csv> [interval_sec]" >&2
  exit 2
fi

ROOT_PID="$1"
OUT="$2"
INTERVAL="${3:-15}"

# print the given PID followed by all of its descendants, recursively
collect_descendants() {
  local pid="$1" child
  echo "$pid"
  for child in $(pgrep -P "$pid" 2>/dev/null || true); do
    collect_descendants "$child"
  done
}

# write the header only when the file is missing or empty
[ -s "$OUT" ] || echo "ts_epoch,iso,proc_count,rss_kb" >> "$OUT"

# sample until the root process exits
while kill -0 "$ROOT_PID" 2>/dev/null; do
  pids="$(collect_descendants "$ROOT_PID" | sort -u)"
  count=0
  total=0
  for p in $pids; do
    # ps reports rss in KB; a PID that has already exited yields an empty string
    kb="$(ps -o rss= -p "$p" 2>/dev/null | tr -d ' ' || true)"
    if [ -n "${kb:-}" ]; then
      total=$(( total + kb ))
      count=$(( count + 1 ))
    fi
  done
  printf '%s,%s,%s,%s\n' "$(date +%s)" "$(date -u +%FT%TZ)" "$count" "$total" >> "$OUT"
  sleep "$INTERVAL"
done

From your headless wrapper, start this sampler in the background right after launching Claude Code.

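# excerpt from run-with-sampling.sh
mkdir -p metrics   # the sampler appends to a file under metrics/, so the directory must exist
claude -p "$PROMPT" --output-format stream-json > run.log 2>&1 &
CLAUDE_PID=$!

./rss-sample.sh "$CLAUDE_PID" "metrics/rss-$(date +%Y%m%d-%H%M%S).csv" 15 &
SAMPLER_PID=$!

# capture claude's exit code instead of aborting, so the sampler is always stopped
status=0
wait "$CLAUDE_PID" || status=$?
kill "$SAMPLER_PID" 2>/dev/null || true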

That alone gives you a time series of RSS for a single session: the peak, the slope of growth, and whether it ever flattens back down. For the first few runs, plotting and just looking is enough. In my environment, a light article-generation job peaked roughly in the 320–420 MB range, and when I stacked many jobs back-to-back into the same session, that figure slowly drifted upward. The absolute number matters less than the question: does it return to flat, or not?
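If plotting is inconvenient, the same three questions (peak, slope, return to flat) can be answered numerically. The sketch below assumes the CSV layout written by rss-sample.sh (columns ts_epoch and rss_kb); the function name summarize_rss and the returned keys are hypothetical. The slope is an ordinary least-squares fit over the whole run, which is only a rough summary when growth is not linear.

```python
import csv
import statistics


def summarize_rss(path):
    """Return peak, final value, and least-squares slope of rss_kb over time."""
    ts, rss = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            ts.append(int(row["ts_epoch"]))
            rss.append(int(row["rss_kb"]))
    if len(rss) < 2 or ts[0] == ts[-1]:
        raise ValueError("need at least two samples taken at different times")

    # ordinary least-squares slope, in KB per second
    mean_t = statistics.fmean(ts)
    mean_r = statistics.fmean(rss)
    numerator = sum((t - mean_t) * (r - mean_r) for t, r in zip(ts, rss))
    denominator = sum((t - mean_t) ** 2 for t in ts)
    slope_kb_per_s = numerator / denominator

    return {
        "peak_mb": max(rss) / 1024,
        "last_mb": rss[-1] / 1024,   # close to peak_mb means it never came back down
        "slope_mb_per_min": slope_kb_per_s * 60 / 1024,
    }
```

Calling summarize_rss on one of the files under metrics/ gives a quick read: a last_mb close to peak_mb together with a positive slope is the "does not return to flat" shape.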

Catching It With a Threshold — A Rolling-Baseline Watchdog

A fixed threshold (say, "warn above 600 MB") misfires easily across environments, because changing the machine changes the baseline. So instead, build the baseline from the run's own samples (the stable stretch right after the startup ramp) and judge the latest sample by its deviation from it. Using the median and the median absolute deviation (MAD) keeps the baseline resistant to transient spikes while still catching the onset of bloat.

#!/usr/bin/env python3
# rss-watchdog.py: read the sampler CSV, detect drift via a median + MAD baseline
import csv
import statistics
import sys

WINDOW = 20          # number of samples in the baseline window
K = 6.0              # threshold coefficient (how many MADs to allow)
MIN_SAMPLES = 8      # minimum samples needed to form a baseline
SKIP = 2             # leading samples dropped as the startup ramp


def mad(xs, med):
    # median absolute deviation; falls back to 1.0 (KB) when the baseline is
    # perfectly flat, so the threshold is never exactly the median
    return statistics.median([abs(x - med) for x in xs]) or 1.0


def main(path):
    with open(path, newline="") as f:
        rss = [int(row["rss_kb"]) for row in csv.DictReader(f)]

    if len(rss) < MIN_SAMPLES:
        print("not enough samples")
        return 0

    # use the stable region after the initial ramp as the baseline
    # (slicing past the end simply yields the samples that exist)
    baseline = rss[SKIP:SKIP + WINDOW]
    med = statistics.median(baseline)
    spread = mad(baseline, med)

    latest = rss[-1]
    threshold = med + K * spread
    drift = (latest - med) / med * 100

    print(f"baseline_median={med/1024:.0f}MB latest={latest/1024:.0f}MB "
          f"threshold={threshold/1024:.0f}MB drift={drift:+.1f}%")

    if latest > threshold:
        print("ALERT: RSS drifted above rolling baseline")
        return 1
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 2:
        # exit code 2 keeps usage errors distinct from the alert code 1
        print("usage: rss-watchdog.py <rss.csv>", file=sys.stderr)
        sys.exit(2)
    sys.exit(main(sys.argv[1]))

If your operational loop reads the watchdog's exit code, you can turn it into a decision: "once it starts getting heavy, close out the current job at a safe boundary and move to the next segment." The crucial part is that detection does not mean killing the process immediately — advance the current work to a safe boundary, then cut. Killing mid-flight loses partial results.
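One way to wire the exit code into a loop is sketched below. It assumes the watchdog exits with 1 on an alert and 0 otherwise, as in the script above. The names should_cut_early and run_segmented, and the callables passed in, are hypothetical placeholders for whatever your wrapper actually does to run a job and start a fresh process.

```python
import subprocess
import sys


def should_cut_early(csv_path, watchdog="rss-watchdog.py"):
    """Run the watchdog on the CSV; True means the drift alert fired."""
    result = subprocess.run([sys.executable, watchdog, csv_path],
                            capture_output=True, text=True)
    # Only exit code 1 is the drift alert. Any other code (0, or a failure of
    # the watchdog itself) is treated as "no alert" in this sketch.
    return result.returncode == 1


def run_segmented(jobs, run_job, start_segment, alert, max_jobs=2):
    """Run jobs in segments; an alert only brings the next boundary forward.

    run_job(job), start_segment() and alert() are caller-supplied callables.
    """
    in_segment = 0
    start_segment()
    for index, job in enumerate(jobs):
        run_job(job)                 # the current job always runs to completion
        in_segment += 1
        is_last = index == len(jobs) - 1
        if not is_last and (in_segment >= max_jobs or alert()):
            start_segment()          # safe boundary: between jobs, never mid-job
            in_segment = 0
```

The alert is consulted only between jobs, so a firing watchdog shortens the current segment rather than interrupting work in flight.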

Here is the fixed-threshold versus rolling-baseline trade-off laid out:

Aspect	Fixed threshold	Rolling baseline + MAD
Robustness to environment	Weak (needs per-machine tuning)	Strong (baselines on that run's stable region)
Transient-spike tolerance	Low (fires on instantaneous values)	High (absorbed by the median)
What it detects	Exceeding an absolute amount	Upward trend (drift)
Initial setup cost	Re-tune per environment	Reusable with just the K coefficient
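The spike-tolerance row can be checked on a toy series. The sketch below uses made-up RSS samples in MB with one transient spike inside the baseline window, and compares the median + MAD threshold with a mean + standard deviation threshold built from the same samples. The helper names and the data are hypothetical; the mean-based variant is included only as a contrast and is not something the watchdog uses.

```python
import statistics


def mad_threshold(baseline, k=6.0):
    """Median + k * MAD, the same formula the watchdog applies."""
    med = statistics.median(baseline)
    mad = statistics.median([abs(x - med) for x in baseline]) or 1.0
    return med + k * mad


def mean_threshold(baseline, k=6.0):
    """Mean + k * standard deviation, shown only for comparison."""
    return statistics.fmean(baseline) + k * statistics.pstdev(baseline)


# Made-up samples in MB: a stable run with one transient spike (650).
baseline = [400, 404, 398, 402, 401, 650, 399, 403]

print(f"median + MAD threshold:  {mad_threshold(baseline):.1f} MB")
print(f"mean + stdev threshold:  {mean_threshold(baseline):.1f} MB")
```

With these samples the single spike barely moves the median-based threshold, while it inflates the mean-based one so far that a genuine drift of a few percent would go unnoticed.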

Splitting a Long Session Into Bounded Segments

Observation and the watchdog are the "notice it" layer. The real fix is not to stack infinitely into one session in the first place. Split a long unattended job into a handful of bounded segments, and end the process at each segment boundary; RSS resets there and levels off.

What makes this work is the ability to resume a session while preserving context. If continuity across segments matters, --resume or --continue carries the prior session forward; if the jobs are fully independent, just start fresh each time. For my own site auto-posting, I use "one segment = one or two articles' worth" as a rule of thumb.

#!/usr/bin/env bash
# segmented-run.sh: run a long job split into bounded segments
set -euo pipefail

JOBS=("generate article A" "generate article B" "generate article C" "generate article D")
SEGMENT_SIZE=2   # jobs per segment

i=0
session_id=""
while [ "$i" -lt "${#JOBS[@]}" ]; do
  # take the next SEGMENT_SIZE jobs and join them into one prompt, one job per line
  batch=("${JOBS[@]:i:SEGMENT_SIZE}")
  prompt="$(printf '%s\n' "${batch[@]}")"

  # use --resume if context must carry over; omit it if independent
  if [ -n "$session_id" ]; then
    claude -p "$prompt" --resume "$session_id" --output-format stream-json > "seg-$i.log" 2>&1
  else
    claude -p "$prompt" --output-format stream-json > "seg-$i.log" 2>&1
  fi
  # the claude process has exited at this point, so its RSS is released here

  # remember this segment's session id for the next --resume;
  # keep the previous id if none is found in the log
  new_id="$(grep -o '"session_id":"[^"]*"' "seg-$i.log" | head -1 | cut -d'"' -f4 || true)"
  if [ -n "$new_id" ]; then
    session_id="$new_id"
  fi
  i=$(( i + SEGMENT_SIZE ))
done

The key is that the claude process always exits at a segment boundary. The next segment launches as a new process, so the resident memory the previous segment was holding is returned to the OS. If the app's memory reduction makes the hill less steep, this landing is the flat stretch you build on the operations side.
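The grep-based extraction of the session id depends on the exact spacing of the JSON. A slightly sturdier alternative is to parse the log line by line, as sketched below. This assumes, as the grep pattern does, that the log contains one JSON object per line and that some of those objects carry a top-level "session_id" string; the function name extract_session_id is hypothetical.

```python
import json


def extract_session_id(log_path):
    """Return the first session_id found in a JSON-lines log, or None."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        for line in f:
            line = line.strip()
            if not line.startswith("{"):
                continue              # skip stderr noise mixed in by 2>&1
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue              # skip truncated or non-JSON lines
            if isinstance(event, dict):
                session_id = event.get("session_id")
                if isinstance(session_id, str) and session_id:
                    return session_id
    return None
```

Returning None rather than an empty string makes the "no id found, start fresh" case explicit for the caller.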

Before / After — One Long Session vs Segmented

Let's compare the RSS behavior of the same four jobs' worth of work, run once in a single long session versus split into segments of two. Reduced to the smallest code, the idea looks like this.

# Before: stack everything into one session (RSS tends to climb)
claude -p "$(printf '%s\n' "${ALL_JOBS[@]}")" --output-format stream-json > run.log 2>&1

# After: split into segments, ending the process at each boundary (RSS resets at each segment start)
n=0
for batch in "${SEGMENTS[@]}"; do
  n=$(( n + 1 ))
  # one log per segment, so a later segment does not overwrite an earlier one
  claude -p "$batch" --output-format stream-json > "seg-$n.log" 2>&1
done

The RSS trend I measured in my environment looked roughly like the following. The numbers depend on the machine, so read the shape (does it keep climbing, or return to flat?) rather than the absolute values.

Metric	Before (single session, 4 jobs)	After (2 jobs x 2 segments)
RSS peak	~690 MB	~430 MB
RSS at exit	~660 MB (stays high)	returns to ~320 MB at each segment start
Time for later jobs	visibly longer than the early ones	roughly constant across segments
Share of ceiling consumed at peak (assuming a 2 GB ceiling)	~34%	~21%

This is not about chasing dramatic numbers. The goal is to keep the peak well clear of the ceiling and to avoid staying high at exit. When it stays high at exit like the Before case, chaining the next task on the same VM carries the leftover forward. When it returns at each segment start like the After case, you can keep going for many segments without nearing the ceiling.
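The ceiling percentages in the table follow directly from the peaks. A minimal sketch of that arithmetic, taking 2 GB as 2048 MB (the helper name is hypothetical):

```python
def ceiling_used_pct(peak_mb, ceiling_mb):
    """Share of the memory ceiling consumed at peak, in percent."""
    return peak_mb / ceiling_mb * 100


CEILING_MB = 2 * 1024  # the 2 GB ceiling assumed in the table

for label, peak_mb in [("before", 690), ("after", 430)]:
    pct = ceiling_used_pct(peak_mb, CEILING_MB)
    print(f"{label}: {pct:.0f}% of the ceiling at peak")
```

Substituting your own peak and ceiling gives the same "how far from the limit" reading for any machine.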

Reading the Numbers and Turning Them Into Decisions

Once you have numbers, there are only three things to watch.

First, how far the peak sits from the ceiling. With a 2 GB ceiling and a 1.5 GB peak, a single spike is within reach of OOM — a cue to shrink segments or raise the ceiling. Second, whether RSS returns at the start of each segment. If it accumulates instead of returning, suspect that the process is not actually exiting (a lingering background child, for instance). Third, whether the time per job stays constant across segments. If only the later ones stretch, that is a sign swapping has begun.
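The three checks can be turned into a small review function, sketched below. The SegmentStats fields, the function name, and especially the cut-off ratios (0.75, 1.2, 1.5) are hypothetical starting points to tune, not values measured for this article.

```python
from dataclasses import dataclass


@dataclass
class SegmentStats:
    start_rss_mb: float      # RSS shortly after the segment's process starts
    peak_rss_mb: float       # highest RSS seen during the segment
    seconds_per_job: float   # average wall time per job in the segment


def review(segments, ceiling_mb, peak_ratio=0.75, start_growth=1.2, slowdown=1.5):
    """Return a warning string for each of the three checks that fails."""
    warnings = []
    if not segments:
        return warnings
    first, last = segments[0], segments[-1]

    # 1. how far the peak sits from the ceiling
    if max(s.peak_rss_mb for s in segments) >= peak_ratio * ceiling_mb:
        warnings.append("peak is close to the ceiling: shrink segments or raise the ceiling")

    # 2. whether RSS returns at the start of each segment
    if last.start_rss_mb > start_growth * first.start_rss_mb:
        warnings.append("RSS does not return at segment start: check for a lingering process")

    # 3. whether time per job stays constant across segments
    if last.seconds_per_job > slowdown * first.seconds_per_job:
        warnings.append("later jobs are slower: swapping may have begun")

    return warnings
```

With the default ratio, the 1.5 GB peak against a 2 GB ceiling from the text sits exactly on the first check's boundary and is flagged.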

When you wire this into an operational loop, use the watchdog's alert as a signal to bring the next segment boundary forward — never as a trigger to forcibly kill the running process. A forced kill invites loss of partial results and, in the worst case, an inconsistent cleanup afterward. Closing out at a safe boundary, consistently, ends up dropping fewer jobs.

A Next Step

Start with observation alone. It is enough to add one rss-sample.sh to a job you already run unattended and capture a single run's RSS time series to a CSV. Where is the peak, and does it stay high at exit? Once you can see that, the right segment granularity decides itself. Ride the app-side memory reduction, but keep one eye on the measurements. That alone prevents most of the quiet drop-outs.

I hope this helps anyone wrestling with the same unattended-operation headaches. Thank you for reading.