●CORPS — Anthropic unveils Claude Corps (Jun 11), a $150M national fellowship placing 1,000 early-career workers inside US nonprofits; the first cohort starts in October●SUBAGENTS — Claude Code sub-agents can now spawn their own sub-agents, up to 5 levels deep — multi-stage delegation workflows out of the box●WORKFLOWS — Dynamic workflows arrive in research preview across CLI, Desktop, and VS Code for codebase-wide bug hunts and large migrations (Max/Team/Enterprise)●BILLING — 2 days to the Jun 15 change: Agent SDK, headless runs, and GitHub Actions move to monthly credits ($20/$100/$200); Sonnet 4 and Opus 4 retire from the API the same day●FABLE5 — Fable 5 remains included free on Pro, Max, Team, and Enterprise through Jun 22●CODE80 — IPO coverage reports Claude now writes over 80% of its own code, up from under 10% in February 2025●CORPS — Anthropic unveils Claude Corps (Jun 11), a $150M national fellowship placing 1,000 early-career workers inside US nonprofits; the first cohort starts in October●SUBAGENTS — Claude Code sub-agents can now spawn their own sub-agents, up to 5 levels deep — multi-stage delegation workflows out of the box●WORKFLOWS — Dynamic workflows arrive in research preview across CLI, Desktop, and VS Code for codebase-wide bug hunts and large migrations (Max/Team/Enterprise)●BILLING — 2 days to the Jun 15 change: Agent SDK, headless runs, and GitHub Actions move to monthly credits ($20/$100/$200); Sonnet 4 and Opus 4 retire from the API the same day●FABLE5 — Fable 5 remains included free on Pro, Max, Team, and Enterprise through Jun 22●CODE80 — IPO coverage reports Claude now writes over 80% of its own code, up from under 10% in February 2025
Wiring Claude's Dreaming Into Memory Hygiene for Long-Running Agents
Managed Agents' Dreaming reviews past sessions and rewrites memory as a self-improvement loop. We unpack the published Harvey and Wisedocs numbers, build a complete self-hosted consolidation loop with the Anthropic SDK, and cover the production pitfalls.
When you let autonomous agents run four technical blogs as an indie developer, the memory files those agents write balloon to dozens of entries within a few months. At first it was a gift: the agent remembered "we tried this fix before and it failed." But as the count grew, old facts and new facts started colliding quietly. A note from six months ago said "this flag was disabled," while a newer note said "the same flag was re-enabled." The agent had no way to know which to trust, and its judgment dulled.
The Dreaming feature (research preview) for Managed Agents, announced at the June 2026 Code w/ Claude developer conference, tackles exactly this problem. It periodically reviews past sessions and memory, extracts patterns, and updates the agent itself. It folds the "memory inventory" I had been doing by hand into the agent's own operating loop. This article unpacks the thinking behind Dreaming, then shows a self-hosted consolidation loop you can build today with the Anthropic SDK, complete with working code.
Why agents with more memory often make worse decisions
Left alone, a long-running agent's memory grows monotonically. Append "what I learned" every session and you reach dozens of entries in half a year, sometimes past a hundred in a year.
The problem isn't the count itself; it's that facts of wildly different freshness sit side by side. In my case, three notes about the same setting existed across different points in time. Only the newest was correct; the other two were already wrong. The agent loaded all three as equally valid "past learnings" and got dragged back toward the old instructions, re-introducing a bug I had already fixed.
A human team periodically reviews its docs and deletes outdated lines. An agent's memory, however, tends to become an append-only log that nobody prunes. Dreaming makes more sense once you see it as a mechanism for handing that pruning to the agent itself.
Dreaming folds session records into knowledge you can use
The core of Dreaming is that it does not hoard raw session history. Instead it extracts recurring patterns and folds them into more general knowledge.
Suppose several sessions in a row read "the API timed out, so I retried with exponential backoff." A naive memory adds one record each time: "On date X, resolved a timeout with a retry." Dreaming looks across them and rewrites them into one level-up policy: "This workload sees intermittent timeouts, so exponential backoff should be the default from the start."
This is the same thing I was attempting by hand: delete the individual events, keep only the judgment they taught. The difference is that Dreaming runs that loop autonomously on a schedule.
✦
Thank you for reading this far.
Continue Reading
What follows includes implementation code, benchmarks, and practical content we hope you'll find useful. This site runs without ads — server and development costs are supported entirely by members like you. If it's been helpful, we'd be truly grateful for your support.
WHAT YOU'LL LEARN
✦How Dreaming folds past sessions into reusable knowledge, and how to read the published Harvey (~6x task completion) and Wisedocs (50% review-time cut) numbers
✦A complete Python consolidation loop built on the Anthropic SDK: duplicate detection, staleness flags, and index regeneration
✦Which workloads Dreaming helps versus hurts, plus three production pitfalls and their fixes
Secure payment via Stripe · Cancel anytime
✦
Unlock This Article
Get full access to the rest of this article. Buy once, read anytime. This site is ad-free — your support goes directly toward keeping it running.
How to read the published numbers — Harvey 6x, Wisedocs 50%
Anthropic reports that with Dreaming, Harvey raised task completion roughly sixfold and Wisedocs cut review time by 50%. Those are striking figures, but you should discount them somewhat when carrying them into an indie-developer context.
First, these are workloads where accumulated memory maps directly to outcomes. Legal and medical document review treats consistency with past judgments as the quality itself, so memory hygiene flows straight into the success metric. For tasks that are nearly independent each time (a one-off translation, say), folding memory barely moves completion rates.
Second, "6x" assumes the baseline is an unmaintained memory. In my own experience, an agent with dirty memory degrades visibly, so the improvement from that state looks large. A big upside is, flipped around, a measure of how deeply things rot when you neglect it.
So I read these numbers not as "add Dreaming and go 6x faster," but as a warning: "neglect memory freshness and outcomes alone can drop to a fraction."
Building a self-hosted consolidation loop before Dreaming ships
Dreaming is still a research preview and isn't available everywhere yet. But its essence is a simple loop: periodically read memory, judge duplicates and staleness, fold, and write back. You can reproduce that with the Anthropic SDK alone.
The complete script below loads memory accumulated as JSON Lines, has Claude consolidate it, and writes it back. Each memory carries id, created, text, and status, and the result comes back as structured output.
import jsonimport osfrom datetime import datetime, timezonefrom pathlib import Pathfrom anthropic import Anthropicclient = Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])MEMORY_FILE = Path("memory.jsonl")MODEL = "claude-sonnet-4-6"def load_memories() -> list[dict]: if not MEMORY_FILE.exists(): return [] rows = [] for line in MEMORY_FILE.read_text(encoding="utf-8").splitlines(): line = line.strip() if line: rows.append(json.loads(line)) # only consolidate entries that are still active return [m for m in rows if m.get("status", "active") == "active"]CONSOLIDATION_PROMPT = """\You manage memory for a long-running agent.Below is a list of operational notes accumulated over time (oldest first).Consolidate them by the following rules and return JSON only.1. If new and old notes about the same target conflict, treat the newest as authoritative and mark the older ones stale.2. If the same lesson appears across 3+ separate events, fold them into one generalized policy.3. Mark one-off, non-reproducible records as archive.4. Do not alter the underlying facts of notes you keep.Return format:{ "keep": [{"id": "...", "text": "...", "merged_from": ["id1","id2"]}], "stale": ["id3", "id4"], "archive": ["id5"]}Notes:"""def consolidate(memories: list[dict]) -> dict: listing = "\n".join( f'- id={m["id"]} created={m["created"]} : {m["text"]}' for m in memories ) resp = client.messages.create( model=MODEL, max_tokens=4096, messages=[{"role": "user", "content": CONSOLIDATION_PROMPT + listing}], ) raw = resp.content[0].text start, end = raw.find("{"), raw.rfind("}") return json.loads(raw[start : end + 1])def apply_plan(memories: list[dict], plan: dict) -> list[dict]: by_id = {m["id"]: m for m in memories} now = datetime.now(timezone.utc).isoformat() result = [] demoted = set(plan.get("stale", [])) | set(plan.get("archive", [])) for item in plan.get("keep", []): base = by_id.get(item["id"], {}) result.append({ "id": item["id"], "created": base.get("created", now), "updated": now, "text": item["text"], "status": "active", "merged_from": item.get("merged_from", []), }) for mid in demoted: if mid in by_id: row = dict(by_id[mid]) row["status"] = "stale" if mid in plan.get("stale", []) else "archived" row["updated"] = now result.append(row) return resultdef main(): memories = load_memories() if len(memories) < 10: print(f"active memories = {len(memories)}: no consolidation needed") return plan = consolidate(memories) merged = apply_plan(memories, plan) MEMORY_FILE.write_text( "\n".join(json.dumps(m, ensure_ascii=False) for m in merged) + "\n", encoding="utf-8", ) kept = sum(1 for m in merged if m["status"] == "active") print(f"consolidated: active {len(memories)} -> {kept} entries")if __name__ == "__main__": main()
Three things matter here. First, carry a three-valued status (active, stale, archived) and never physically delete. Dreaming similarly keeps history, which is the safe design: a wrong call is reversible. Second, restrict the consolidation input to active only so stale notes are never re-read. Third, keep merged_from so you can trace why a policy emerged later.
Designing the consolidation prompt — what to keep, what to drop
The loop's quality is almost entirely down to the prompt. After several failures, here is the rubric I settled on.
keep: reproducible judgment rules, facts confirmed multiple times, constraints still in forcemark stale: old facts that conflict with a newer notemark archive: one-off events, records meaningful only in a single sessiondo not touch: the factual core of notes you keep (you may tidy wording, but never change the content)
Early on, I only told the model to "summarize and shorten." It helpfully rewrote the facts themselves: "disabled" got softened into "adjusted," and the crucial decision turned vague. Stating that folding (generalize) and summarizing are different made the difference. Folding derives a policy from many facts; summarizing shortens a single fact — and you must not do the latter to memory.
Frequency matters too. I run this not every session but only once active crosses ten entries. Running it every time forces folding before enough of a pattern exists, which loses information instead.
When it helps and when it backfires
Neither Dreaming nor a self-hosted loop is universal. In my operation, the workloads split cleanly.
It helps on long tasks where consistency with past judgment maps to quality: continuously touching the same codebase, handling the same customer, operating the same set of sites. Memory freshness directly governs the outcome. Harvey and Wisedocs fall into this shape.
It tends to backfire when tasks are mutually independent. For a workload that processes a different user's one-off request each time, "patterns" extracted from past sessions act as biases that don't fit the task in front of you. When I added a consolidation loop to a translation agent, it over-generalized past stylistic habits and started ignoring the new request's instructions. I fixed it by narrowing the consolidation scope to operational constraints only and keeping content-specific judgments out of memory entirely.
The rule of thumb in one sentence: do you want to reference past judgment on the next task? If yes, folding is worth it. If no, it's safer not to keep that in memory at all.
Three pitfalls I hit in production
Wiring a self-hosted consolidation loop into the Dolice Labs operation, I hit three reproducible pitfalls.
First, a race where another process reads memory mid-consolidation. If the agent reads memory while the write-back is in progress, it grabs a half-updated state. Writing to a temp file and swapping atomically with os.replace() fixed it. Overwriting directly with write_text opens that window.
Second, runaway folding. Over repeated passes, the model decided it could "make this cleaner" and merged two policies that should have stayed independent, losing one context. As a guard, I discard the consolidation and notify a human if keep drops by 30% or more versus the previous run. A sharp shrink is almost always a sign of over-folding.
Third, misjudged staleness. The newest note isn't always correct. A note from a temporary experiment, treated as "latest," once marked a correct older policy stale. I prevented this by labeling notes as "confirmed" or "experimental" beyond just created, and forbidding experimental notes from marking others stale.
While Dreaming is still a research preview, building these guards into your own loop means that when the official version lands, you already have your own sense of where to draw the line between what to delegate and what to keep in hand.
Your next step
Start by dumping your current agent's memory once and counting the active entries and how many conflicting pairs sit inside. If you're past ten entries with even one conflict, it's already dulling your agent's judgment. Run just the consolidate function above in a read-only mode with no write-back, and you'll see which notes Claude considers stale. Write back only once you agree with that call.
Since I made memory-freshness management an explicit part of operations, I've hit the same bug twice far less often. The longer you run an agent, the more tending its memory beats growing it.
Share
Thank You for Reading
Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.