API & SDK / 2026-06-17 / Advanced

Stop Terminology Drift in Localized Apps: A Consistent Localizable.strings Pipeline with the Batch API and a Cached Glossary

Translating UI strings one at a time invites inconsistency. Pair Claude's Message Batches API with a prompt-cached glossary to translate Localizable.strings across 10+ languages consistently, with measured costs and the pitfalls I hit in production.

api-sdk batch-api prompt-caching localization ios


I learned this the hard way through an App Store rejection. The iOS wallpaper app I run as an indie developer supports more than ten languages, and on the settings screen I had translated "壁紙" as Wallpaper, but in the widget hint it had come out as Background. The cause was mundane: every time I added a string, I translated just that one string. Translate them one at a time and each one is done at a different moment, in a different context, so the same concept ends up with different words.

This kind of drift is less a translation-quality problem than a translation-procedure problem. Hire a human translator and the same thing happens if you never hand them a glossary. So the approach I settled on was to let Claude cache a glossary and a style guide, then translate every entry in Localizable.strings in one pass through the Message Batches API. This article walks through that design and implementation, the cost I actually measured when I pushed thousands of strings through it, and the pitfalls I hit when writing the results back.

Why I stopped translating one string at a time

A Localizable.strings file is just a flat list of key-value pairs:

"settings.wallpaper.title" = "壁紙の品質";
"widget.hint.background" = "ウィジェットに壁紙を設定";
"paywall.cta.primary" = "プレミアムを始める";

The trouble is that translating these in separate requests gives Claude (or a person) the freedom to render "壁紙" as Wallpaper here and Background there. UI consistency is not the sum of individually correct translations. The constraint "how do we unify the word wallpaper across the whole app" has to be fixed at the input stage of translation, not patched up afterward.

I chose to express that constraint in three layers. First, a glossary that declares "this source term must always map to this target term." Second, a style guide that sets tone — buttons in the imperative, settings titles as noun phrases. Third, an explicit placeholder-protection rule: never move %@ or %1$d. All three layers are identical for every string, so resending them with each request is pure waste. That is exactly where prompt caching earns its keep.
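Because the glossary is a hard rule, you can also check mechanically whether it held once translations come back. The following is a minimal sketch. It assumes a glossary shaped like {source_term: {lang: target_term}}, the same shape as the GLOSSARY dict in the code below. glossary_violations is a hypothetical helper. It is a plain case-insensitive substring test, so inflected forms (French plurals such as fonds d'écran, for instance) can raise false alarms that need a human look.

```python
def glossary_violations(
    src: str, dst: str, lang: str, glossary: dict[str, dict[str, str]]
) -> list[str]:
    """Return glossary entries whose source term appears in `src`
    but whose fixed translation is missing from `dst` (case-insensitive)."""
    missing = []
    for src_term, targets in glossary.items():
        target = targets.get(lang)
        if not target or src_term not in src:
            continue  # term not used in this string, or no fixed term for this language
        if target.casefold() not in dst.casefold():
            missing.append(f"{src_term} -> {target}")
    return missing
```

For example, glossary_violations("ウィジェットに壁紙を設定", "Set background for widget", "en", GLOSSARY) flags 壁紙 -> Wallpaper, which is exactly the drift described at the top of this article.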

Turning the glossary and style guide into a cacheable system prompt

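Prompt caching lets you attach cache_control to a system block so that the prompt prefix up to and including that block is cached. A glossary can run to hundreds of lines, but once it is cached, later requests with an identical prefix read it at the cache-read rate, a small fraction of the normal input price. The first request pays a modest premium to write the cache. Every request after that references the glossary at the discounted rate, so even with thousands of strings the glossary is effectively paid for once per language. Two caveats apply. The cached prefix must be identical across requests. A prefix shorter than the model's minimum cacheable length is not cached at all, so a toy glossary of three terms will show no cache hits.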

# glossary.py — build the glossary and style guide as a cacheable system block
import anthropic
 
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
 
# glossary: source term -> fixed translation per language; the app must always use these
GLOSSARY = {
    "壁紙": {"en": "Wallpaper", "fr": "Fond d'écran", "de": "Hintergrundbild"},
    "プレミアム": {"en": "Premium", "fr": "Premium", "de": "Premium"},
    "広告を非表示": {"en": "Remove Ads", "fr": "Supprimer les pubs", "de": "Werbung entfernen"},
}
 
STYLE_GUIDE = """\
- Buttons and CTAs: imperative and short (e.g. Remove Ads, Start Premium).
- Settings titles: noun phrases (e.g. Wallpaper Quality).
- Match politeness level to each language's conventions; do not over-formalize.
- Never alter format specifiers like %@ %1$@ %d %1$d, including their order.
- Preserve \\n and \\t exactly as they appear.
"""
 
def build_cached_system(target_lang: str) -> list:
    glossary_lines = "\n".join(
        f'  "{src}" => "{langs.get(target_lang, "")}"'
        for src, langs in GLOSSARY.items()
        if langs.get(target_lang)
    )
    instructions = (
        f"You translate app UI strings from Japanese into {target_lang}.\n\n"
        f"## Glossary (always use these target terms)\n{glossary_lines}\n\n"
        f"## Style guide\n{STYLE_GUIDE}"
    )
    # the prefix of a block carrying cache_control becomes cacheable
    return [{
        "type": "text",
        "text": instructions,
        "cache_control": {"type": "ephemeral"},
    }]

The key idea is to keep the glossary and style guide as a fixed, unchanging preamble that is separate from the strings being translated. By placing the variable part — the actual strings — in messages rather than system, the system block keeps hitting the cache. Rewrite system per string and the cache misses every time. I failed to honor that separation at first and spent a frustrating afternoon convinced "caching doesn't work" while my hit rate sat at zero.
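The quickest way to catch that zero-hit situation is to read each response's usage block, which reports cache_creation_input_tokens and cache_read_input_tokens alongside input_tokens. The following is a small sketch. It assumes you aggregate the usage of succeeded batch results (entry.result.message.usage in the collection code later on). cache_hit_ratio and summarize_usage are hypothetical helper names.

```python
def cache_hit_ratio(input_tokens: int, cache_creation: int, cache_read: int) -> float:
    """Share of prompt tokens that were served from the cache (0.0 to 1.0)."""
    total = input_tokens + cache_creation + cache_read
    return cache_read / total if total else 0.0


def summarize_usage(usages) -> dict:
    """Sum usage objects from many responses and add an overall hit ratio.

    Accepts anthropic Usage objects or anything with the same attribute names.
    The cache fields may be None, so missing or None values count as 0.
    """
    totals = {
        "input_tokens": 0,
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0,
    }
    for usage in usages:
        for field in totals:
            totals[field] += getattr(usage, field, 0) or 0
    totals["hit_ratio"] = cache_hit_ratio(
        totals["input_tokens"],
        totals["cache_creation_input_tokens"],
        totals["cache_read_input_tokens"],
    )
    return totals
```

A ratio that stays near zero across many requests for the same language usually means something in the system block changes between requests, or that the prefix is below the minimum cacheable length.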

Message Batches and prompt-caching pricing change often, so I base my cost reasoning on two earlier articles: async cost design for the Batch API, and halving your monthly bill with prompt caching.


Guaranteeing a clean write-back with structured output

When you write translations straight back into Localizable.strings, the scariest failure is Claude smuggling in extra decoration: a "Here's the translation:" preamble, or quotes added or removed. To prevent that, I define the output shape as the input_schema of a tool and force that tool with tool_choice, so the model returns only key/value pairs as JSON.

# force the output through a tool call so no free-form text leaks in
TRANSLATE_TOOL = {
    "name": "emit_translations",
    "description": "Return the translated key-value pairs",
    "input_schema": {
        "type": "object",
        "properties": {
            "translations": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "key": {"type": "string"},
                        "value": {"type": "string"},
                    },
                    "required": ["key", "value"],
                },
            }
        },
        "required": ["translations"],
    },
}
 
def build_user_message(entries: dict) -> str:
    # entries: {"settings.wallpaper.title": "壁紙の品質", ...}
    lines = [f'{k}\t{v}' for k, v in entries.items()]
    return (
        "Translate the following keys and source text, and return them via the "
        "emit_translations tool. Never change the keys.\n\n" + "\n".join(lines)
    )

Forcing this tool call with tool_choice means the response comes back as a structured tool_use block with no stray text mixed in. For the schema-validation and repair-loop side of this, see validating and repairing structured output against a schema.
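tool_choice guarantees that a tool_use block comes back. It does not guarantee that every key you sent comes back, or that no key was invented or altered along the way. The following is a minimal coverage check. It assumes tool_input is the block.input dict from the emit_translations call and requested is the chunk that was sent. check_coverage is a hypothetical name.

```python
def check_coverage(requested: dict[str, str], tool_input: dict) -> dict[str, list[str]]:
    """Compare the keys sent in one chunk with the keys returned by emit_translations."""
    returned: dict[str, str] = {}
    duplicates: list[str] = []
    for pair in tool_input.get("translations", []):
        key, value = pair.get("key"), pair.get("value")
        if not isinstance(key, str) or not isinstance(value, str):
            continue  # malformed pair; its key will show up as missing below
        if key in returned:
            duplicates.append(key)
        returned[key] = value
    return {
        "missing": [k for k in requested if k not in returned],
        "unexpected": [k for k in returned if k not in requested],
        "duplicates": duplicates,
    }
```

Missing keys can simply go back into the next batch. Unexpected keys usually mean the model "fixed" a key's casing or spelling, and those should never be written back.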

Submitting every language and string with the Message Batches API

At this point the shape of a single correct request is settled. Next comes scale. Hitting a synchronous API in sequence for ten languages times hundreds or thousands of strings burns both time and money. The Message Batches API processes up to 100,000 requests asynchronously in one job, at half the price of the synchronous API. For a task like UI-string translation — no urgency, but high volume — it is about as good a fit as you will find.
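submit_batch below takes every source string as one dict (all_entries), but reading that dict out of the file is left implicit. The following is a minimal parser sketch. It assumes the common one-entry-per-line layout shown at the top of this article and skips comments only when they sit on their own lines. It keeps escape sequences such as \n and \" in their raw file form, which is also the form the later validate function compares. parse_strings is a hypothetical helper name.

```python
import re

# One `"key" = "value";` entry; the quoted parts may contain escapes such as \" or \n.
ENTRY = re.compile(r'^\s*"((?:[^"\\]|\\.)*)"\s*=\s*"((?:[^"\\]|\\.)*)"\s*;\s*$')


def parse_strings(text: str) -> dict[str, str]:
    """Parse a simple Localizable.strings file into {key: raw_value}.

    Escape sequences are kept exactly as written, so a newline stays as backslash + n.
    Blank lines and comments on their own lines are skipped; anything else raises.
    """
    entries: dict[str, str] = {}
    in_block_comment = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if in_block_comment:
            # inside a multi-line /* ... */ comment: wait for the closing marker
            in_block_comment = "*/" not in stripped
            continue
        if not stripped or stripped.startswith("//"):
            continue
        if stripped.startswith("/*"):
            in_block_comment = "*/" not in stripped
            continue
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"line {lineno}: unsupported syntax: {line!r}")
        entries[match.group(1)] = match.group(2)
    return entries
```

Typical usage is parse_strings(Path("ja.lproj/Localizable.strings").read_text(encoding="utf-8")). Older projects sometimes store .strings files as UTF-16, so check the file's encoding before reading it as UTF-8.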

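# build_batch.py: assemble and submit batch requests per language
# assumes client, build_cached_system, TRANSLATE_TOOL and build_user_message from the earlier snippets are in scope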
from anthropic.types.messages.batch_create_params import Request
from anthropic.types.message_create_params import MessageCreateParamsNonStreaming
 
def chunk(d: dict, size: int):
    items = list(d.items())
    for i in range(0, len(items), size):
        yield dict(items[i:i + size])
 
def submit_batch(all_entries: dict, target_langs: list[str], chunk_size: int = 40):
    requests = []
    for lang in target_langs:
        system = build_cached_system(lang)
        for idx, group in enumerate(chunk(all_entries, chunk_size)):
            requests.append(Request(
                custom_id=f"{lang}-{idx}",  # lets us recover which language/chunk later
                params=MessageCreateParamsNonStreaming(
                    model="claude-opus-4-8",
                    max_tokens=4096,
                    system=system,
                    tools=[TRANSLATE_TOOL],
                    tool_choice={"type": "tool", "name": "emit_translations"},
                    messages=[{"role": "user", "content": build_user_message(group)}],
                ),
            ))
    batch = client.messages.batches.create(requests=requests)
    print(f"submitted batch: {batch.id}, requests={len(requests)}")
    return batch.id

Encoding the language and chunk index into custom_id is the trick that keeps you sane. Batch results do not necessarily come back in submission order, so the custom_id is how you reconstruct which response belongs to which language and chunk. I keep the chunk size around 40 so that no single response runs past max_tokens and gets cut off mid-way. In practice, the more long strings your app has, the smaller you want those chunks.
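A fixed count of 40 treats a two-word button and a paragraph of onboarding copy the same. One alternative is to cap each chunk by a rough size budget. The following is a sketch of that approach. The character budget and the 40-entry cap are assumptions to tune against your own max_tokens. Characters are only a rough proxy for output tokens. chunk_by_budget is a hypothetical drop-in for the chunk helper.

```python
from typing import Iterator


def chunk_by_budget(
    entries: dict[str, str], max_chars: int = 2000, max_items: int = 40
) -> Iterator[dict[str, str]]:
    """Yield dicts of entries whose combined key+value length stays under max_chars.

    An entry longer than the budget on its own still gets a chunk to itself.
    """
    current: dict[str, str] = {}
    size = 0
    for key, value in entries.items():
        cost = len(key) + len(value)
        # close the current chunk if this entry would overflow it
        if current and (size + cost > max_chars or len(current) >= max_items):
            yield current
            current, size = {}, 0
        current[key] = value
        size += cost
    if current:
        yield current
```

Because the chunks are still yielded in source order, the f"{lang}-{idx}" custom_id scheme keeps working unchanged.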

Collecting results and writing them back safely

A batch can finish within minutes or take up to 24 hours; any request still unprocessed after 24 hours expires. Poll for completion, gather only the succeeded results, and write them back.

# collect.py: wait for the batch, then gather the translations per language
# assumes `client` from glossary.py is in scope
import time
 
def wait_and_collect(batch_id: str, poll_seconds: int = 30) -> dict:
    while True:
        batch = client.messages.batches.retrieve(batch_id)
        if batch.processing_status == "ended":
            break
        time.sleep(poll_seconds)  # poll until the job ends
 
    results: dict[str, dict] = {}  # {lang: {key: value}}
    for entry in client.messages.batches.results(batch_id):
        # custom_id is "{lang}-{idx}"; rsplit keeps hyphenated codes such as zh-Hans intact
        lang = entry.custom_id.rsplit("-", 1)[0]
        if entry.result.type != "succeeded":
            print(f"⚠️ failed: {entry.custom_id} ({entry.result.type})")
            continue
        message = entry.result.message
        if message.stop_reason == "max_tokens":
            # a truncated tool call can carry incomplete input; resend this chunk smaller
            print(f"⚠️ truncated: {entry.custom_id}")
            continue
        for block in message.content:
            if block.type == "tool_use":
                for pair in block.input["translations"]:
                    results.setdefault(lang, {})[pair["key"]] = pair["value"]
    return results

Always run validation right before the write-back. Skip it, and a translation with a missing or altered format specifier can ship to production and crash. The one that actually bit me, in the Law of Attraction app, was a %@ quietly dropped from a translation: only users of that one language crashed, on one specific screen, and it was miserable to reproduce.

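import re
 
# Matches printf-style specifiers: %@, %d, %ld, %lld, %1$@, %.2f, ... plus the literal %% escape.
# Length modifiers (l, ll, h, q, z) matter: a plain [@dfsu] class silently skips %ld and %lld.
PLACEHOLDER = re.compile(r'%%|%(?:\d+\$)?[-+0#]*\d*(?:\.\d+)?(?:hh|h|ll|l|q|z|t|j|L)?[@dDiuUxXfeEgGcCsSp]')
 
def placeholders(s: str) -> list[str]:
    # "%%" is a literal percent sign, not an argument, so it is left out of the comparison
    return sorted(p for p in PLACEHOLDER.findall(s) if p != "%%")
 
def validate(src: str, dst: str) -> list[str]:
    errors = []
    # do the format specifiers match the source in count and kind?
    if placeholders(src) != placeholders(dst):
        errors.append("placeholder mismatch")
    # is the newline count preserved? (values stay in raw .strings form, so \n is two characters)
    if src.count("\\n") != dst.count("\\n"):
        errors.append("newline mismatch")
    # an odd number of quotes usually means the model dropped or added one
    if dst.count('"') % 2 != 0:
        errors.append("unbalanced quote")
    return errors
 
def write_strings(lang: str, src_entries: dict, translated: dict, out_path: str):
    lines, failed = [], []
    for key, src_val in src_entries.items():
        dst_val = translated.get(key)
        if dst_val is None:
            failed.append(key)
            continue
        problems = validate(src_val, dst_val)
        if problems:
            failed.append(f"{key} ({', '.join(problems)})")
            continue
        # escape only quotes that are not already escaped, so \" does not turn into \\"
        escaped = re.sub(r'(?<!\\)"', r'\\"', dst_val)
        lines.append(f'"{key}" = "{escaped}";')
    with open(out_path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
    print(f"[{lang}] wrote {len(lines)} / failed {len(failed)}")
    return failed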

Keys rejected by validate are not written back; they go to a re-translation queue. Rather than aiming to get everything perfect in one shot, I let the machine reject the failures the machine can check, so a human only looks at what genuinely needs judgment. That division of labor is what keeps the pipeline from falling apart in production.
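One way to run that re-translation queue is to resend only the rejected keys and tell the model which check each one failed. The following is a sketch. It assumes you collected validate's output per key into a {key: [problems]} dict. build_retry_message is a hypothetical helper. Its text would take the place of build_user_message's in a second, much smaller batch or in a synchronous call with the same tool setup.

```python
def build_retry_message(src_entries: dict[str, str], problems: dict[str, list[str]]) -> str:
    """Build a user message that resends only the rejected keys, with the reason for each."""
    lines = []
    for key, reasons in problems.items():
        if key not in src_entries:
            continue  # key no longer exists in the source file
        lines.append(f"{key}\t{src_entries[key]}\t# previous attempt failed: {', '.join(reasons)}")
    return (
        "Re-translate the following keys and return them via the emit_translations tool. "
        "Each line is key<TAB>source<TAB># reason the previous translation was rejected. "
        "Never change the keys, and keep every format specifier and \\n exactly as in the source.\n\n"
        + "\n".join(lines)
    )
```

Keys that fail a second time are the ones worth a human's attention; looping automatically beyond that tends to burn tokens without converging.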

What the cost actually felt like

This is the part worth paying to read, so I'll be candid about what I observed. In one run of roughly 2,400 strings across 6 languages (14,400 translations in total), every request within a language shared the same glossary and style guide, and prompt-cache hits cut the input-side cost by what felt like 60–70%. Compared with the days when I sent the glossary at full cost on every request, you can feel the cache paying off more the more strings you have. Layer the Batch API's 50% discount on top and the whole run came in at less than half what the same volume cost me when I processed it sequentially through the synchronous API.
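To redo this arithmetic with your own numbers, here is a rough estimator for one language. It assumes the multipliers in Anthropic's published pricing at the time of writing: 5-minute cache writes at 1.25x the base input price, cache reads at 0.1x, and a 50% Batch API discount that stacks with both. Check the current pricing page before relying on them. Prices are left as parameters. estimate_cost is a hypothetical helper. cache_writes lets you model batch requests that miss the cache and write it again.

```python
CACHE_WRITE_MULTIPLIER = 1.25  # assumed 5-minute cache write rate, relative to base input
CACHE_READ_MULTIPLIER = 0.10   # assumed cache read rate, relative to base input
BATCH_MULTIPLIER = 0.50        # assumed Batch API discount


def estimate_cost(
    n_requests: int,
    glossary_tokens: int,
    variable_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    cached: bool = True,
    batch: bool = True,
    cache_writes: int = 1,
) -> float:
    """Rough cost of n_requests that share one glossary prefix (one target language).

    Token counts are per request; prices are per token in any currency.
    """
    if cached:
        writes = min(cache_writes, n_requests)
        reads = n_requests - writes
        glossary = (writes * CACHE_WRITE_MULTIPLIER + reads * CACHE_READ_MULTIPLIER) * glossary_tokens
    else:
        glossary = n_requests * glossary_tokens
    input_cost = (glossary + n_requests * variable_tokens) * input_price
    output_cost = n_requests * output_tokens * output_price
    total = input_cost + output_cost
    return total * (BATCH_MULTIPLIER if batch else 1.0)
```

The number of requests per language is roughly the string count divided by the chunk size. Comparing the uncached synchronous case (cached=False, batch=False) against the defaults shows how much of the saving comes from each layer. Because output tokens are never discounted by the cache, a glossary that is small relative to the output narrows the gap.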

The exact figures shift with model pricing, glossary length, and chunk size, so I strongly recommend running one small batch on your own app and measuring. My rule of thumb is simple: the longer the glossary, the more target languages, and the more strings, the bigger this design's cost advantage. Conversely, for a tiny fix — a few dozen strings in one language — the batch's wait time is more annoying than helpful, so I translate those on the spot with the synchronous API. Pick the tool by the job.

Four pitfalls that are easy to hit

In the order I tripped over them:

Rewriting system per string, so caching never works. Fix the glossary as an immutable preamble and put the variable strings in messages. The cache matches on the system prefix, so mixing the two guarantees a miss every time.
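Cache TTL expiring. The default cache lifetime is 5 minutes, refreshed each time the cache is hit. Inside a batch you don't control when each request is processed, so cache hits are best-effort. Keep every request for a language in the same batch with an identical system block. Consider the 1-hour cache ("cache_control": {"type": "ephemeral", "ttl": "1h"}); its writes cost more, but it survives a slow batch. Drip the work in slowly across separate submissions and you pay to re-cache.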
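Translating plural variants naively one-to-one. Languages differ in how many plural categories they have. Japanese uses only other, English needs one and other, and Russian needs one, few, many, and other. Mapping the Japanese source variant by variant therefore leaves categories missing. If you handle .stringsdict, produce every category the target language requires and keep the variant structure intact on write-back.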
Forgetting tool_choice and letting free text leak in. Without a forced tool call, explanatory text can wrap the translation. Structured output is enforced, not requested.
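For the .stringsdict case, a cheap guard is to check each translated plural rule against the categories the target language needs. The following is a sketch. It assumes a hand-maintained table taken from the CLDR plural rules for a few languages; extend it for the languages you ship. missing_plural_categories is a hypothetical helper. It only reports missing categories and ignores extra ones.

```python
# Plural categories per language, following CLDR plural rules (assumed subset; extend as needed).
REQUIRED_PLURAL_CATEGORIES: dict[str, set[str]] = {
    "ja": {"other"},
    "en": {"one", "other"},
    "de": {"one", "other"},
    "ru": {"one", "few", "many", "other"},
    "pl": {"one", "few", "many", "other"},
    "ar": {"zero", "one", "two", "few", "many", "other"},
}

CATEGORY_ORDER = ["zero", "one", "two", "few", "many", "other"]


def missing_plural_categories(lang: str, variants: dict[str, str]) -> list[str]:
    """Return the plural categories `lang` needs that the translated variants lack.

    `variants` maps category names (zero/one/two/few/many/other) to translated strings.
    Unknown languages fall back to requiring only "other", which .stringsdict always needs.
    """
    required = REQUIRED_PLURAL_CATEGORIES.get(lang, {"other"})
    return [c for c in CATEGORY_ORDER if c in required and not variants.get(c)]
```

Running this right next to validate means a Russian rule that came back with only one and other lands in the re-translation queue instead of shipping.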

Your next step

Pull just five terms whose drift bothers you out of your own Localizable.strings, build a glossary from them, and run one small batch of a few dozen strings in a single language. Once you can see the cache-hit behavior and the cost, you'll know whether to roll it out to every language. I was half-skeptical that the glossary was worth the effort on my first run — but seeing the settings screen and the widget finally agree on the word wallpaper convinced me this procedure is worth keeping. I hope it helps other indie developers wrestling with the same multilingual headache.

Thank You for Reading

Claude Lab is ad-free, supported entirely by members like you. We publish practical guides daily with implementation code, benchmarks, and production-ready patterns. If you've found it useful, we'd love to have you on board.