API & SDK / 2026-06-16 / Advanced

PII Masking for Claude API Lives or Dies on the Ledger — Restore, Encrypt, Measure

The hard part of masking PII before it reaches the Claude API isn't detection; it's operating the token ledger you restore from. This article covers encrypted storage, multi-instance sharing, and a daily leak-rate loop, with working code.

Few people argue with the idea of masking PII before it reaches the Claude API. The trouble starts right after. Once you've built a detector from regexes and an NER model, you're left with operational questions: how do you put the masked values back, where does that mapping live, and how do you prove nothing leaked? This is where the design suddenly gets hard.

The detection logic comes together in a few days. What hurts in production is almost never detection — it's the handling of the restore ledger and the absence of a way to keep measuring leak rate. Across the business assistants I've run on the Claude API as an indie developer, nearly every review finding and incident traced back to those two things. This article walks through building a detector, but puts its weight on ledger operations and continuous measurement, with the implementation code to match.

Think in round trips, not one-way trips

If you treat masking as "strip it before sending," you will break every summarization or assistant use case. When Claude replies with "Contacted <PERSON_001>", that string is meaningless to the user until you turn <PERSON_001> back into Taro Tanaka. Masking is not a one-way send-time step; it is a round trip in which masking (outbound) and restoration (inbound) are a matched pair.

Seen as a round trip, the protagonist is not the detector but the ledger that maps tokens to originals. The ledger has to satisfy three things at once.

First, it must be consistent across turns and requests. The same Taro Tanaka must always get the same <PERSON_001>, or Claude can't tell it's the same person and output quality degrades. Second, it must be encrypted. The ledger holds the raw values together with the mapping that links each one to its token, so a leaked ledger exposes the PII and also de-anonymizes every masked prompt, log line, and transcript that uses those tokens. That makes it worse than a leak of raw PII alone. Third, it must be shareable across instances, because production never runs in a single process.

If you only need irreversible masking (collapse everything to <PERSON>), you don't need a ledger at all. The ledger exists only for restoration. That's why "do we default to reversible?" is the first fork that decides your operational cost. When I'm unsure whether restoration will be needed, I choose reversible: dropping to irreversible later is one line, but going from irreversible back to reversible is impossible because the original is already gone.
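The asymmetry between the two modes can be sketched in a few lines. `makeMasker` and its `mode` flag are hypothetical names used only for illustration; the point is that the irreversible branch never records the original, which is why that direction cannot be undone later.

```typescript
// Hypothetical sketch: one masker, two modes. Names are illustrative only.
type Mode = "reversible" | "irreversible";

function makeMasker(mode: Mode) {
  const ledger: Record<string, string> = {}; // token -> original (reversible mode only)
  const reverse = new Map<string, string>(); // original -> token
  const counters: Record<string, number> = {};

  const mask = (raw: string, category: string): string => {
    if (mode === "irreversible") return `<${category}>`; // the original is discarded here
    const existing = reverse.get(raw);
    if (existing) return existing; // same value, same token
    counters[category] = (counters[category] ?? 0) + 1;
    const token = `<${category}_${String(counters[category]).padStart(3, "0")}>`;
    ledger[token] = raw;
    reverse.set(raw, token);
    return token;
  };
  return { mask, ledger };
}

const rev = makeMasker("reversible");
const first = rev.mask("Taro Tanaka", "PERSON");
const second = rev.mask("Taro Tanaka", "PERSON");
const irr = makeMasker("irreversible");
const collapsed = irr.mask("Taro Tanaka", "PERSON");
console.log(first, second, collapsed); // <PERSON_001> <PERSON_001> <PERSON>
```

Switching a reversible deployment to irreversible only changes which branch runs; the reverse switch has nothing to restore from, because `irr.ledger` stays empty.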

Two detection layers are enough — Luhn-filtered regex and a Haiku NER

Before the ledger, lock down detection in its minimal form. In my deployments, two layers were plenty: regex (Layer 1: identifiers) and a lightweight-model NER (Layer 2: quasi-identifiers). Layer 3, PII that can only be recognized through contextual inference, I don't try to detect automatically; I handle it with input guidance in the UI.

The key in the regex layer is to filter card-like digit runs through a Luhn check. Catching "13+ digits" alone sweeps up ISBNs, JANs, and tracking numbers, and summarization accuracy visibly drops.

// pii-detect.ts — Layer 1 (identifiers). Tokens use a uniform <CATEGORY_NNN> shape
// Rationale: false positives hurt model quality, misses hurt privacy. Luhn curbs card false positives
export type Span = { start: number; end: number; category: string; text: string };
 
const REGEXES: Record<string, RegExp> = {
  EMAIL: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  PHONE_JP: /(?:\+?81[-\s]?|0)\d{1,4}[-\s]?\d{1,4}[-\s]?\d{3,4}/g,
  CREDIT_CARD: /\b(?:\d[ -]*?){13,19}\b/g,
};
 
export function detectLayer1(input: string): Span[] {
  const spans: Span[] = [];
  for (const [category, regex] of Object.entries(REGEXES)) {
    for (const m of input.matchAll(regex)) {
      const text = m[0];
      if (category === "CREDIT_CARD" && !isLuhnValid(text.replace(/[ -]/g, ""))) continue;
      spans.push({ start: m.index!, end: m.index! + text.length, category, text });
    }
  }
  return spans;
}
 
function isLuhnValid(num: string): boolean {
  if (!/^\d{13,19}$/.test(num)) return false;
  let sum = 0, alt = false;
  for (let i = num.length - 1; i >= 0; i--) {
    let n = parseInt(num[i], 10);
    if (alt) { n *= 2; if (n > 9) n -= 9; }
    sum += n; alt = !alt;
  }
  return sum % 10 === 0;
}
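To see what the Luhn filter buys you, here is a small worked check. It repeats the same algorithm under a different name (`luhnOk`, chosen for this sketch) so the snippet runs on its own, and compares a widely used test card number with an ISBN-13 of the same digit-run shape.

```typescript
// Same algorithm as isLuhnValid, repeated so this snippet is self-contained.
function luhnOk(num: string): boolean {
  if (!/^\d{13,19}$/.test(num)) return false;
  let sum = 0;
  let alt = false;
  for (let i = num.length - 1; i >= 0; i--) {
    let n = num.charCodeAt(i) - 48; // digit value
    if (alt) { n *= 2; if (n > 9) n -= 9; }
    sum += n;
    alt = !alt;
  }
  return sum % 10 === 0;
}

const strip = (s: string) => s.replace(/[ -]/g, "");
const testCard = luhnOk(strip("4111 1111 1111 1111")); // well-known test card number
const isbn = luhnOk(strip("978-0-306-40615-7"));       // ISBN-13, also 13 digits
console.log({ testCard, isbn }); // { testCard: true, isbn: false }
```

Luhn is a filter, not proof: roughly one in ten arbitrary digit runs passes the checksum by chance, so some ISBNs and tracking numbers will still be masked.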

Names and addresses won't fall to regex. Here I use Claude Haiku as an NER. Since the NER input also contains PII, keeping it inside a Claude call I control is easier to justify in an audit than shipping it to a generic cloud NER. The crucial part is to not over-trust the output and to pin a fixed model version (see pitfall 3 below).

// pii-ner.ts — Layer 2 (quasi-identifiers: names, addresses). Pin the model to a dated ID
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
 
const NER_MODEL = "claude-haiku-4-5-20251001"; // dated ID to freeze behavior
const NER_SYSTEM = `You are a Japanese named-entity extractor. Extract only PERSON and ADDRESS_JP spans
from the input and return strictly this JSON. No guessing, no invention. Omit anything you cannot classify.
{"entities":[{"text":"string","category":"PERSON"|"ADDRESS_JP"}]}`;
 
export async function detectLayer2(input: string): Promise<{ text: string; category: string }[]> {
  const res = await client.messages.create({
    model: NER_MODEL, max_tokens: 1024, system: NER_SYSTEM,
    messages: [{ role: "user", content: input }],
  });
  const text = res.content.filter((b) => b.type === "text").map((b: any) => b.text).join("");
  const match = text.match(/\{[\s\S]*\}/);
  if (!match) return []; // nothing extracted -> rely on the rescan to catch misses
  try {
    const parsed = JSON.parse(match[0]) as { entities?: { text: string; category: string }[] };
    return (parsed.entities ?? []).filter((e) => e.text && ["PERSON", "ADDRESS_JP"].includes(e.category));
  } catch {
    return []; // on parse failure, skip and let measurement catch the leak
  }
}
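One way to act on "don't over-trust the output" is to keep only entities that literally occur in the input. The sketch below is an assumption on my part, not part of the pipeline above: `groundEntities` and its length bounds are hypothetical, and the bounds would need tuning for real data.

```typescript
// Hypothetical post-filter for NER output; the name and length bounds are assumptions.
type NerEntity = { text: string; category: string };

function groundEntities(
  input: string,
  entities: NerEntity[],
  minLen = 2,
  maxLen = 80,
): NerEntity[] {
  const seen = new Set<string>();
  const kept: NerEntity[] = [];
  for (const e of entities) {
    const text = e.text.trim();
    if (text.length < minLen || text.length > maxLen) continue; // implausible span
    if (!input.includes(text)) continue; // paraphrased or invented by the model
    const key = `${e.category}:${text}`;
    if (seen.has(key)) continue; // duplicate entity
    seen.add(key);
    kept.push({ text, category: e.category });
  }
  return kept;
}
```

Because the wrapper masks by exact string replacement, an entity that does not appear verbatim in the input could never be masked anyway; dropping it early keeps the ledger free of values the user never sent.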

Make the ledger a first-class citizen — an encrypted shared store

Now the real subject. You replace detected spans with tokens and accumulate the mapping in a ledger. Holding the ledger in an in-memory Map is fine for a sample but breaks in production immediately: in a multi-instance setup, if the process that handled turn 1 differs from the one handling turn 3, the ledger isn't shared and restoration fails.

The standard approach is to store an encrypted JSON ledger per conversation in a shared store (Redis, Cloudflare KV, etc.). Encryption is non-negotiable. As noted, the ledger contains the original values and the token mapping, so a plaintext copy sitting in a Redis snapshot or slow log is a worse incident than leaking the raw PII itself.

// pii-ledger.ts — seal the restore ledger with AES-256-GCM into a shared store
// The key is injected from a KMS. Don't keep a plaintext key resident in-process for long
import { createCipheriv, createDecipheriv, randomBytes, createHash } from "node:crypto";
 
const KEY = Buffer.from(process.env.LEDGER_KEY_BASE64!, "base64"); // 32 bytes
 
export function sealLedger(ledger: Record<string, string>): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", KEY, iv);
  const body = Buffer.concat([cipher.update(JSON.stringify(ledger), "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, body]).toString("base64"); // iv(12)+tag(16)+body
}
 
export function openLedger(sealed: string): Record<string, string> {
  const buf = Buffer.from(sealed, "base64");
  const iv = buf.subarray(0, 12), tag = buf.subarray(12, 28), body = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", KEY, iv);
  decipher.setAuthTag(tag);
  const out = Buffer.concat([decipher.update(body), decipher.final()]);
  return JSON.parse(out.toString("utf8"));
}
 
// Per-conversation key. Hashing keeps raw conversation IDs out of key names and logs.
// The hash is unsalted, so it does not hide the key from anyone who already knows the ID
export const ledgerKey = (conversationId: string) =>
  `pii:ledger:${createHash("sha256").update(conversationId).digest("hex").slice(0, 32)}`;

GCM is chosen because it bundles tamper detection (an auth tag) with encryption. CBC on its own provides no integrity check, so a corrupted or altered ledger can decrypt without an error and your restoration quietly breaks. The sample reads the key from an environment variable for brevity; in production, fetch it from a KMS at startup rather than hard-coding it in configuration. During rotation, allow decryption with the old key for a limited window, so that ledgers sealed before the switch stay readable.
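A two-key rotation window can be sketched as "try the current key, then the previous one." The helper names below (`sealWith`, `openWith`, `openWithRotation`) are hypothetical, and how the keys arrive from your KMS is left out; the sketch relies only on the fact that GCM decryption throws when the auth tag does not verify.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Same iv(12)+tag(16)+body layout as sealLedger, with the key passed in explicitly.
function sealWith(key: Buffer, plaintext: string): string {
  const iv = randomBytes(12);
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const body = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return Buffer.concat([iv, cipher.getAuthTag(), body]).toString("base64");
}

function openWith(key: Buffer, sealed: string): string {
  const buf = Buffer.from(sealed, "base64");
  const decipher = createDecipheriv("aes-256-gcm", key, buf.subarray(0, 12));
  decipher.setAuthTag(buf.subarray(12, 28));
  // final() throws if the tag does not verify (wrong key or tampered data)
  return Buffer.concat([decipher.update(buf.subarray(28)), decipher.final()]).toString("utf8");
}

// keys[0] is the current key; later entries are previous keys still inside the window.
function openWithRotation(keys: Buffer[], sealed: string): { plaintext: string; keyIndex: number } {
  for (let i = 0; i < keys.length; i++) {
    const key = keys[i];
    if (!key) continue;
    try {
      return { plaintext: openWith(key, sealed), keyIndex: i };
    } catch {
      // authentication failed under this key; try the next one
    }
  }
  throw new Error("ledger could not be opened with any active key");
}
```

A `keyIndex` greater than 0 tells the caller the ledger was sealed under an old key, which is a natural moment to re-seal it with the current one. Once the window closes, drop the old key from the list and any remaining old ledgers fail loudly instead of silently.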

The final wrapper — round trip and multi-turn consistency in one place

With two detection layers and an encrypted ledger, you can fold everything into a wrapper so callers never think about PII. The core ideas are "collapse identical strings to the same token across the whole conversation" and "restore by replacing every ledger entry in one pass."

// pii-pipeline.ts — final Claude API wrapper. Inherit the ledger per conversation and restore the reply
import Anthropic from "@anthropic-ai/sdk";
import { detectLayer1 } from "./pii-detect";
import { detectLayer2 } from "./pii-ner";
import { sealLedger, openLedger, ledgerKey } from "./pii-ledger";
import type { Redis } from "ioredis";
 
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY! });
 
export async function callClaude(
  redis: Redis, conversationId: string,
  messages: { role: "user" | "assistant"; content: string }[],
): Promise<string> {
  const key = ledgerKey(conversationId);
  const sealed = await redis.get(key);
  const ledger: Record<string, string> = sealed ? openLedger(sealed) : {};
  const reverse = new Map(Object.entries(ledger).map(([t, o]) => [o, t]));
  const counters: Record<string, number> = {};
  for (const t of Object.keys(ledger)) {
    const cat = t.slice(1, t.lastIndexOf("_"));
    counters[cat] = Math.max(counters[cat] ?? 0, parseInt(t.slice(t.lastIndexOf("_") + 1), 10) || 0);
  }
  const register = (raw: string, category: string): string => {
    const ex = reverse.get(raw);
    if (ex) return ex;
    counters[category] = (counters[category] ?? 0) + 1;
    const token = `<${category}_${String(counters[category]).padStart(3, "0")}>`;
    ledger[token] = raw; reverse.set(raw, token);
    return token;
  };
 
  const masked = [];
  for (const m of messages) {
    // Run NER (Layer 2) first, then mask Layer 1 + Layer 2 longest string first,
    // so a shorter match never splits a longer one that contains it
    const ents = await detectLayer2(m.content);
    const items = [
      ...ents.map((e) => ({ text: e.text, category: e.category })),
      ...detectLayer1(m.content).map((s) => ({ text: s.text, category: s.category })),
    ].sort((a, b) => b.text.length - a.text.length);
    let body = m.content;
    for (const it of items) body = body.split(it.text).join(register(it.text, it.category));
    masked.push({ role: m.role, content: body });
  }
 
  // Persist the encrypted ledger (TTL = expected max conversation lifetime, e.g. 24h)
  await redis.set(key, sealLedger(ledger), "EX", 60 * 60 * 24);
 
  const res = await client.messages.create({
    model: "claude-sonnet-4-6", max_tokens: 2048,
    system: "Strings shaped like <PERSON_001> (i.e. <CATEGORY_NNN>) must be emitted verbatim — never translate or alter them.",
    messages: masked,
  });
  const raw = res.content.filter((b) => b.type === "text").map((b: any) => b.text).join("");
 
  // Restore: turn every ledger token back into its original
  let restored = raw;
  for (const [token, original] of Object.entries(ledger)) restored = restored.split(token).join(original);
  return restored;
}

The round trip the wrapper intends looks like this. Because the ledger is keyed by conversation, Taro Tanaka converges to the same token across turns.

input:  "Please confirm the next visit with Tanaka. Contact: tanaka@example.com"
sent:   "Please confirm the next visit with <PERSON_001>. Contact: <EMAIL_001>"
reply:  "Reached out to <PERSON_001> via <EMAIL_001>"
output: "Reached out to Tanaka via tanaka@example.com"
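The restore step can also be written as a single regex pass over the reply, which is sketched below as an alternative to looping over ledger entries. `restoreOnce` and `TOKEN_RE` are hypothetical names. The pass looks each token up exactly once, never re-scans text it has already restored, and reports tokens that are not in the ledger (for example a `<PERSON_009>` the model invented).

```typescript
// Hypothetical single-pass restore. Matches the <CATEGORY_NNN> token shape.
const TOKEN_RE = /<[A-Z][A-Z0-9_]*_\d{3,}>/g;

function restoreOnce(
  reply: string,
  ledger: Record<string, string>,
): { text: string; unknown: string[] } {
  const unknown: string[] = [];
  const text = reply.replace(TOKEN_RE, (token) => {
    const original = ledger[token];
    if (original === undefined) {
      unknown.push(token); // leave it in place, but report it
      return token;
    }
    return original;
  });
  return { text, unknown };
}

const demo = restoreOnce(
  "Reached out to <PERSON_001> via <EMAIL_001>, cc <PERSON_009>",
  { "<PERSON_001>": "Tanaka", "<EMAIL_001>": "tanaka@example.com" },
);
console.log(demo.text);
// Reached out to Tanaka via tanaka@example.com, cc <PERSON_009>
```

A non-empty `unknown` list is worth logging as a metric: it means the reply referred to a token the ledger has never issued.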

Measure leak rate daily — golden dataset and shadow rescan

The most important thing in production isn't how refined the detector is — it's whether you can say, every day, that nothing is leaking. I run two measurements in parallel.

The first is a golden dataset in CI. Prepare 100–500 synthetic texts with PII deliberately embedded, and measure recall and false-positive rate daily. If the numbers regress from the previous day, block the merge. The thresholds are a tradeoff between "recall that targets zero leaks" and "false-positive rate that preserves accuracy"; I bias toward recall.

# golden_eval.py — daily CI check that no PII trace remains in masked output
# Uses synthetic data only (no raw PII) and surfaces misses (false negatives) as recall
import json, re, sys
from statistics import mean
 
EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")
PHONE = re.compile(r"(?:\+?81[-\s]?|0)\d{1,4}[-\s]?\d{1,4}[-\s]?\d{3,4}")
TOKEN = re.compile(r"<[A-Z][A-Z0-9_]*_\d{3,}>")  # <CATEGORY_NNN> tokens emitted by the masker
 
def leaked_count(masked: str) -> int:
    return len(EMAIL.findall(masked)) + len(PHONE.findall(masked))
 
def run(path: str, threshold: float = 0.99) -> int:
    # One JSON object per line; skip blank lines and close the file when done
    with open(path, encoding="utf-8") as f:
        cases = [json.loads(line) for line in f if line.strip()]
    recalls = []
    for c in cases:
        masked = c["masked"]            # masker output (pre-generated)
        expected = c["expected"]        # number of PII occurrences embedded
        # Count real tokens rather than every "<"; leaks are reported separately below
        found = len(TOKEN.findall(masked))
        recalls.append(min(found, expected) / max(expected, 1))
    r = mean(recalls) if recalls else 0.0
    leaks = sum(1 for c in cases if leaked_count(c["masked"]) > 0)
    print(f"recall={r:.4f} leaks={leaks}/{len(cases)}")
    return 0 if (r >= threshold and leaks == 0) else 1
 
if __name__ == "__main__":
    sys.exit(run(sys.argv[1]))
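The script above tracks recall by counting tokens. If your golden cases record which values were planted, you can compare values directly and get a false-positive figure as well. The TypeScript sketch below assumes a hypothetical case shape (`pii` for planted values, `maskedValues` for the originals the masker put in the ledger) and defines the false-positive rate as the share of masked values that were not planted PII; other definitions are possible.

```typescript
// Hypothetical golden-case shape: `pii` lists the values planted in `input`,
// `maskedValues` lists what the masker actually replaced (the ledger's originals).
type GoldenCase = { input: string; pii: string[]; maskedValues: string[] };

function score(cases: GoldenCase[]): { recall: number; falsePositiveRate: number } {
  let planted = 0;
  let caught = 0;
  let masked = 0;
  let spurious = 0;
  for (const c of cases) {
    const truth = new Set(c.pii);
    const got = new Set(c.maskedValues);
    planted += truth.size;
    masked += got.size;
    truth.forEach((v) => { if (got.has(v)) caught++; });    // planted and masked
    got.forEach((v) => { if (!truth.has(v)) spurious++; }); // masked but not PII
  }
  return {
    recall: planted === 0 ? 1 : caught / planted,
    // Assumption: measured as the share of masked values that were not planted PII
    falsePositiveRate: masked === 0 ? 0 : spurious / masked,
  };
}
```

Scoring by value also catches the case where the masker emits the right number of tokens but for the wrong strings, which a pure token count cannot see.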

The second is a production shadow rescan. Sample 0.1% of all requests into structured logs, and have a separate job rescan only the masked body for residual identifier patterns. The crucial constraint: the rescan never looks at raw PII. By targeting only masked bodies, it catches signs of leakage without creating a second exposure. When the rescan finds a residue, add that input pattern to the golden dataset and update the detector — that loop steadily lowers leak rate month over month.
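A minimal sketch of the rescan side, under a few assumptions: the patterns simply mirror Layer 1, sampling is done by hashing a request ID (my choice here, so retries of the same request land in the same bucket), and the function names are hypothetical. The rescan returns category counts only, so the findings log never stores the matched text.

```typescript
// Hypothetical shadow-rescan helpers. Patterns mirror Layer 1; names are illustrative.
const RESIDUE: Record<string, RegExp> = {
  EMAIL: /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g,
  PHONE_JP: /(?:\+?81[-\s]?|0)\d{1,4}[-\s]?\d{1,4}[-\s]?\d{3,4}/g,
};

// Deterministic sampling: map the request ID into [0, 1) with FNV-1a (32-bit).
function sampled(requestId: string, rate = 0.001): boolean {
  let h = 2166136261;
  for (let i = 0; i < requestId.length; i++) {
    h ^= requestId.charCodeAt(i);
    h = Math.imul(h, 16777619);
  }
  return (h >>> 0) / 4294967296 < rate;
}

// Scans a masked body only. Returns counts per category, never the matched text.
function rescan(maskedBody: string): Record<string, number> {
  const hits: Record<string, number> = {};
  for (const [category, re] of Object.entries(RESIDUE)) {
    const n = (maskedBody.match(re) ?? []).length;
    if (n > 0) hits[category] = n;
  }
  return hits;
}
```

An empty result means the sampled body looked clean for these patterns. Any non-empty result is a signal to pull the corresponding input pattern into the golden dataset.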

Pitfalls I hit in production

Even with a correct design, some bugs only show up in operation. Here are the ones I actually hit.

The first is token translation. Unless you state it in the system prompt, Claude will occasionally return <PERSON_001> translated as Person 001. The reverse lookup then misses, and Person 001 is shown to the user. That's exactly why the wrapper's system prompt says to emit the format verbatim.
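The system prompt reduces this, but it is worth detecting when it still happens. The sketch below is a heuristic of my own, with a hypothetical name: for every token in the ledger it builds a loose, case-insensitive pattern and reports any hit that is not the exact token. It only catches variants that keep the same word and number (such as "Person 001" or "<person_001>"), not translations into another language.

```typescript
// Hypothetical check: find ledger tokens that came back in an altered form.
function findMangledTokens(reply: string, ledger: Record<string, string>): string[] {
  const mangled: string[] = [];
  for (const token of Object.keys(ledger)) {
    const m = /^<([A-Z0-9_]+)_(\d+)>$/.exec(token);
    if (!m) continue;
    const category = m[1] ?? "";
    const num = m[2] ?? "";
    // ADDRESS_JP -> ADDRESS[\s_-]*JP, so "Address JP 001" also matches
    const words = category.split("_").join("[\\s_-]*");
    const loose = new RegExp(`<?${words}[\\s_-]*${num}>?`, "gi");
    for (const hit of reply.match(loose) ?? []) {
      if (hit !== token) mangled.push(hit); // exact tokens are fine; variants are not
    }
  }
  return mangled;
}

console.log(findMangledTokens(
  "Contacted Person 001 and <PERSON_001>.",
  { "<PERSON_001>": "Taro Tanaka" },
)); // [ 'Person 001' ]
```

Counting these hits per day gives you a number to watch after any prompt or model change.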

The second is token splitting during streaming. text_delta can deliver <PERSON_ and 001> in separate chunks, and restoring per chunk fails. Buffer chunks and restore once a <...> closes, or do it all after the stream ends. The key is to never substitute partial fragments incrementally.
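The buffering approach can be sketched as a small stateful helper. `makeStreamRestorer` is a hypothetical name and the 64-character cap is an assumption: it holds back everything from the last unclosed `<` until the matching `>` arrives, and gives up holding once the fragment is too long to be a token, so a literal "less than" sign in prose cannot stall the stream.

```typescript
// Hypothetical streaming restorer: holds back text that could be the start of a token.
function makeStreamRestorer(ledger: Record<string, string>, maxTokenLen = 64) {
  let pending = "";
  const restore = (s: string) =>
    s.replace(/<[A-Z][A-Z0-9_]*_\d{3,}>/g, (t) => ledger[t] ?? t);

  return {
    // Feed one text_delta chunk; returns the text that is safe to show now.
    push(chunk: string): string {
      pending += chunk;
      const open = pending.lastIndexOf("<");
      // Hold back from the last "<" if it has no ">" yet and is short enough to be a token
      const incomplete =
        open !== -1 &&
        pending.indexOf(">", open) === -1 &&
        pending.length - open <= maxTokenLen;
      const cut = incomplete ? open : pending.length;
      const ready = pending.slice(0, cut);
      pending = pending.slice(cut);
      return restore(ready);
    },
    // Call once when the stream ends to flush whatever is still buffered.
    flush(): string {
      const rest = pending;
      pending = "";
      return restore(rest);
    },
  };
}

const r = makeStreamRestorer({ "<PERSON_001>": "Taro Tanaka" });
const shown = r.push("Contacted <PERSON_") + r.push("001> today") + r.flush();
console.log(shown); // Contacted Taro Tanaka today
```

If you do not need incremental display, restoring once after the stream ends is simpler and avoids this state entirely.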

The third is NER model update drift. Bumping the Claude version used for NER subtly shifts the extraction criteria, moving your miss and false-positive rates. So pin a dated ID like claude-haiku-4-5-20251001, and on any update, regress against the golden dataset before switching. A detector whose behavior silently changes is the scariest kind of change in PII masking.

On top of that, the June 2026 Claude API billing change — which moved headless execution and agent delegation onto a separate credit pool — affects designs that lean heavily on Haiku for NER. Measure your NER call frequency and cache (key on the hash of the masked body) early in the rollout.

Where to start

If you're adding this today, a single move is enough: slot detectLayer1 in right before the Claude call. Just machine-masking email addresses, phone numbers, and card numbers (PANs) clears half the findings you'll get in internal review. Next, add encrypted ledger storage (sealLedger/openLedger) to make restoration safe, and finally start running the golden-dataset CI. Stack it in that order and you reach an operation that lowers leak probability every month, without chasing perfect detection.

Perfect masking doesn't exist. What matters is closing the round trip with an encrypted ledger and keeping a state where you can state, in numbers, every day, that nothing leaked. Only once you've built that far can you entrust business data containing PII to Claude with confidence.
