CLAUDE LAB
Cowork · 2026-06-23 · Advanced

Stopping an Unattended Writer From Publishing the Same Article Twice

When a Cowork scheduled task generates articles every day, the real danger isn't a crash — it's quietly publishing a piece that overlaps with one from a few days ago. Here is a gate that compares slug similarity and the day's log before publishing, built from a near-miss I caught this morning.

Tags: Cowork, scheduled tasks, duplicate detection, automation, Python, SEO, content operations


This morning my Cowork scheduled task was about to start writing on extending the prompt cache TTL from five minutes to one hour. Had it slipped through and published, it would have produced a near-twin of claude-api-prompt-cache-5m-1h-two-tier-ttl-design, the piece I shipped half a year ago — same substance, different URL.

When you generate articles unattended every day, the scariest failure isn't a crash. A crash stops, lands in the log, and you notice it the next morning. The truly dangerous failure is the one that never stops: calmly publishing pieces that overlap with something from a few days back. Each one reads fine in isolation, so nobody catches it unless a human reviews every article. And to Google, this is the textbook behavior of a site mass-producing thin content.

Running four sites on unattended scheduled tasks for the past six months, I've come to see this "quiet duplication" as the single biggest thing eroding search standing. Today I want to share the countermeasure: a gate that, before publishing, compares slug proximity and the day's log and stops the run if a duplicate looks likely — with the actual code that's running in production.

Why a count check can't catch duplicates

Most auto-publishing pipelines confirm that the Japanese and English article counts match right before push. That's essential for avoiding 404s, but it does nothing for duplicate detection. If the new piece overlaps an old one, matching counts only confirm that the duplicate exists in both languages.
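To make the blind spot concrete, here is a minimal sketch of a count-parity check. The helper `counts_match` and the in-memory slug lists are hypothetical stand-ins for whatever the real pipeline reads from disk; the two slugs are the ones from this morning's near-miss.

```python
def counts_match(ja_slugs: list[str], en_slugs: list[str]) -> bool:
    """Hypothetical parity check: same number of Japanese and English articles."""
    return len(ja_slugs) == len(en_slugs)

existing = ["claude-api-prompt-cache-5m-1h-two-tier-ttl-design"]
new_slug = "claude-api-prompt-cache-ttl-5m-to-1h-refresh-design"

# The near-duplicate is added to both languages, so parity still holds.
ja = existing + [new_slug]
en = existing + [new_slug]

print(counts_match(ja, en))  # True
```

The check answers "is every article present in both languages?" and nothing else, so it has no way to notice that the second slug covers the same ground as the first.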

Exact title matching doesn't help either. An unattended task phrases each title slightly differently, so "Prompt cache TTL design" and "Cutting cost by extending cache lifetime" never match as strings — they sail right through.

To catch duplicates you have to compare the concept an article covers, not its wording. Fortunately, we already hold a short string that summarizes the concept: the slug. A slug is a hyphen-separated list of English words, essentially the article's subject terms in a row. Compare slugs as sets of tokens and you get duplicate detection that's robust to rewording.
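A small sketch of the contrast, assuming (purely for illustration) that the two example titles above belong to the two slugs from this morning's case. The titles fail a string comparison, while a plain hyphen split of the slugs already exposes the shared terms.

```python
# Illustrative pairing of titles and slugs; the pairing itself is an assumption.
title_new = "Cutting cost by extending cache lifetime"
title_old = "Prompt cache TTL design"
slug_new = "claude-api-prompt-cache-ttl-5m-to-1h-refresh-design"
slug_old = "claude-api-prompt-cache-5m-1h-two-tier-ttl-design"

# Exact title matching, even case-insensitive, sees two unrelated strings.
titles_equal = title_new.lower() == title_old.lower()

# Splitting the slugs on hyphens and intersecting the sets shows the overlap.
shared = set(slug_new.split("-")) & set(slug_old.split("-"))

print(titles_equal)    # False
print(sorted(shared))  # ['1h', '5m', 'api', 'cache', 'claude', 'design', 'prompt', 'ttl']
```

Note that "claude" and "api" land in the shared set even though they say nothing about the subject. A raw split overstates the overlap, which is why noise words have to be filtered before scoring.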

Turn the slug into a token set and measure Jaccard similarity

The idea is simple. Split the candidate slug and each existing slug on hyphens into sets of words. Measure how much the two sets overlap with the Jaccard coefficient (size of intersection ÷ size of union), and flag anything above a threshold as a suspected same-concept piece.
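Written out, with A and B as the two token sets, the coefficient is as follows. Scoring an empty set as 0 is the convention the script uses, and the toy sets in the second line are illustrative only.

```latex
J(A, B) = \frac{|A \cap B|}{|A \cup B|},
\qquad J(A, B) = 0 \ \text{ if } A = \emptyset \text{ or } B = \emptyset

% Toy example:
% A = \{\text{prompt}, \text{cache}, \text{ttl}\}, \quad
% B = \{\text{prompt}, \text{cache}, \text{cost}\}
% |A \cap B| = 2, \quad |A \cup B| = 4, \quad J(A, B) = 2/4 = 0.5
```

Two three-word slugs that share two words land exactly on 0.5, which gives a feel for how much overlap a threshold in that range demands.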

#!/usr/bin/env python3
"""dup_gate.py — check whether a candidate slug overlaps conceptually with existing articles.
Usage:
  python3 dup_gate.py <repo> <category> <candidate-slug>
Exit codes:
  0  no duplicate (safe to publish)
  1  suspected duplicate (re-angle or switch to enrichment)
"""
import sys
from pathlib import Path
 
# Reduce a slug to its subject terms. Noise words aren't subjects, so drop them.
STOPWORDS = {
    "claude", "api", "sdk", "cli", "guide", "the", "a", "to", "for",
    "with", "and", "of", "in", "on", "how", "your", "cowork",
}
 
def slug_tokens(slug: str) -> set:
    parts = [p for p in slug.lower().split("-") if p]
    return {p for p in parts if p not in STOPWORDS and len(p) > 1}
 
def jaccard(a: set, b: set) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)
 
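def main():
    # Fail closed on bad input: a missing argument or a mistyped category
    # would otherwise report "no duplicate" without comparing anything.
    # sys.exit() with a message prints it to stderr and exits with status 1.
    if len(sys.argv) != 4:
        sys.exit("usage: python3 dup_gate.py <repo> <category> <candidate-slug>")
    repo, category, candidate = sys.argv[1], sys.argv[2], sys.argv[3]
    cand = slug_tokens(candidate)
    ja_dir = Path(repo) / "content" / "articles" / "ja" / category
    if not ja_dir.is_dir():
        sys.exit(f"category directory not found: {ja_dir}")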
    hits = []
    for mdx in ja_dir.glob("*.mdx"):
        existing = mdx.stem
        if existing == candidate:
            continue
        score = jaccard(cand, slug_tokens(existing))
        if score >= 0.5:
            hits.append((score, existing))
    hits.sort(reverse=True)
    if hits:
        print(f"❌ suspected duplicate: {candidate}")
        for score, existing in hits[:5]:
            print(f"   {score:.2f}  {existing}")
        sys.exit(1)
    print(f"✅ no duplicate: {candidate}")
    sys.exit(0)
 
if __name__ == "__main__":
    main()

Run this on this morning's case and the candidate claude-api-prompt-cache-ttl-5m-to-1h-refresh-design trips on the existing claude-api-prompt-cache-5m-1h-two-tier-ttl-design at 0.67. After stopword removal the shared tokens are prompt, cache, 5m, 1h, ttl, and design: six words against a union of nine (the three unshared tokens are refresh, two, and tier). The wording differs, yet the number makes the conceptual overlap unambiguous.
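The overlap can be checked by hand with a few lines. This sketch copies `STOPWORDS` and `slug_tokens` from the script so it runs on its own; with that stopword list it works out to 6 shared tokens over a union of 9, about 0.67. A different stopword list would shift the figure.

```python
# Copied from dup_gate.py so the snippet is self-contained.
STOPWORDS = {
    "claude", "api", "sdk", "cli", "guide", "the", "a", "to", "for",
    "with", "and", "of", "in", "on", "how", "your", "cowork",
}

def slug_tokens(slug: str) -> set[str]:
    parts = [p for p in slug.lower().split("-") if p]
    return {p for p in parts if p not in STOPWORDS and len(p) > 1}

candidate = slug_tokens("claude-api-prompt-cache-ttl-5m-to-1h-refresh-design")
existing = slug_tokens("claude-api-prompt-cache-5m-1h-two-tier-ttl-design")

shared = candidate & existing
union = candidate | existing
score = len(shared) / len(union)

print(sorted(shared))          # ['1h', '5m', 'cache', 'design', 'prompt', 'ttl']
print(sorted(union - shared))  # ['refresh', 'tier', 'two']
print(f"{len(shared)} / {len(union)} = {score:.2f}")  # 6 / 9 = 0.67
```

Only three tokens separate the two slugs, and none of them changes the subject. That is the pattern the gate exists to catch.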

WHAT YOU'LL LEARN

  • A Python gate that tokenizes existing slugs, measures Jaccard similarity, and blocks "same-concept" articles before they publish
  • How to pick a threshold (around 0.5) that avoids both false positives and missed near-duplicates, using a banded decision table
  • What to do after a hit: re-pick a different angle, or switch to enriching the existing article instead of adding a new URL