●MODEL — Claude Opus 4.8 and Haiku 4.5 arrive in the Messages API for coding and agentic work●CODE — Claude Code adds /rewind to resume before /clear, with steadier MCP reliability and OAuth retries●CODE — CPU use during streaming drops about 37%, improving stability on long-running sessions●CLOUD — Claude is generally available in Microsoft Foundry on Azure with Azure-native access●SECURITY — Static API keys can now be replaced with WIF short-lived, scoped credentials●POLICY — The US government clears Anthropic to release Mythos 5 to about 100 firms and agencies
The Same 429 Wears a Different Face on Each Route: Running Claude Safely over Anthropic Direct and Azure Foundry
With Claude now generally available on Microsoft Foundry, a two-route setup is realistic even for solo developers. Here is how to fold the route-by-route differences in 429s and retry-after into one normalized error type and a single backoff policy.
On 2026-06-30, the same day Claude Opus 4.8 and Haiku 4.5 landed in the Messages API, Claude also went generally available on Microsoft Foundry (Azure). The pitch is that you can call Claude natively on Azure while keeping your existing identity, billing, and governance. That makes a two-route setup — normally hit Anthropic directly, and divert to the Azure route when one side jams — a realistic option even at a solo-developer scale.
But the first thing you hit when you start running both routes is not performance or price. It is a quiet asymmetry: the same 429 comes back wearing a different face on each route. A retry path written around one route misfires silently on the other. As someone running unattended publishing across the Dolice Labs sites, I find that "silent misfire" the scariest failure mode of all. This article works through those differences and folds them into a single policy that drives both routes.
Running two routes means the "same 429" returns in different shapes
A rate-limit overflow returns HTTP 429 on either route. So far, identical. What differs is the shape of the information attached to that 429.
A direct 429 carries Anthropic's own error envelope ({"type":"error","error":{"type":"rate_limit_error"}}), and the grace period arrives in a lowercase retry-after header as integer seconds. Under load you may also see 529 rather than 429. The Azure Foundry 429, on the other hand, carries Azure's error envelope ({"error":{"code":"429","message":"..."}}), and the grace period arrives in a Retry-After header that is sometimes integer seconds and sometimes an HTTP-date. Transient server trouble can return 503, which does not line up with the direct route's 529.
So the shortest possible code — "see a 429, read retry-after seconds, sleep" — breaks the instant you add a second route. The header name shifts, the value's unit shifts, and the key in the error body shifts.
Lay both routes' error surfaces side by side first
Before designing anything, pin down the differences by putting both routes next to each other. Abstract too early and you end up with a normalization skewed toward one route that quietly fails on the other.
| Aspect | Anthropic direct | Azure Foundry |
| --- | --- | --- |
| Rate-limit status | 429 | 429 |
| Overload / transient | 529 (overloaded) | 503, etc. |
| Grace header name | retry-after (lowercase) | Retry-After |
| Grace value | integer seconds | integer seconds OR HTTP-date |
| Error body key | error.type (e.g. rate_limit_error) | error.code (e.g. "429") |
| Auth | x-api-key header | Bearer token (Azure-side credential) |
| Extra metadata | anthropic-ratelimit-* headers | availability is route-dependent |
HTTP header names are case-insensitive by spec, so a robust client reads retry-after and Retry-After alike. The real problem is not there — it is the value's unit (seconds vs HTTP-date) and the name of the body key. Those differ per route.
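To make the case-insensitive read concrete, here is a minimal sketch. It assumes headers arrive as a plain dict; the helper name `get_header` is hypothetical, and many HTTP clients already expose a case-insensitive header mapping that makes it unnecessary.

```python
from typing import Mapping, Optional


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Return the value of the first header matching `name`, ignoring case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


# Illustrative header dicts, one per route (values are made up for the demo).
direct_headers = {"retry-after": "7"}
azure_headers = {"Retry-After": "Tue, 30 Jun 2026 20:31:05 GMT"}

print(get_header(direct_headers, "Retry-After"))
print(get_header(azure_headers, "retry-after"))
```

Both lookups succeed whatever the casing, which is why the header name is the easy part. The two values that come back are still in different units, and that is what the rest of the design has to absorb.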
Before: a retry built around one route's shape breaks silently
My first version assumed only the direct route. The moment I went dual-route, it broke in two ways.
# Before: a retry that assumes only the direct route's shape
import time

def call_with_retry_naive(client, **kwargs):
    for attempt in range(5):
        resp = client.post("/v1/messages", json=kwargs)
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code == 429:
            # Pitfall 1: assumes retry-after is always integer seconds
            wait = int(resp.headers.get("retry-after", 1))
            time.sleep(wait)
            continue
        # Pitfall 2: ignores 529/503 and body differences, raises on everything
        resp.raise_for_status()
    raise RuntimeError("exhausted")
This trips over the Azure route in two ways. One: when Retry-After comes back as an HTTP-date (e.g. Tue, 30 Jun 2026 20:31:05 GMT), int(...) raises ValueError and the retry path itself dies. Two: when no wait value is readable, it charges ahead on the default 1 second, hammering a window that has not opened yet and manufacturing more 429s. In practice, when an HTTP-date got misread as a huge integer, I saw it over-wait by tens of times in a case where one second would have done. Both are nasty because neither surfaces as a clear rate-limit failure: the retry logic itself quietly does the wrong thing.
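The first failure is easy to reproduce in isolation. The sketch below uses a hypothetical `naive_wait` helper that does what the Before code does with the header value, and feeds it an HTTP-date string.

```python
http_date = "Tue, 30 Jun 2026 20:31:05 GMT"


def naive_wait(value: str) -> int:
    """Mimic the Before code: treat the header value as integer seconds."""
    return int(value)


try:
    naive_wait(http_date)
    outcome = "parsed"
except ValueError as exc:
    # int() rejects the HTTP-date, so the retry branch never reaches sleep().
    outcome = f"ValueError: {exc}"

print(outcome)
```

The exception is raised inside the 429 branch, so what reaches the caller is a parsing error rather than anything that says "rate limited".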
Fold everything into a normalized error type
The fix rests on one simple principle: translate each route's raw response into one normalized error type first, then decide. The decision logic looks only at the normalized type; route differences are sealed inside the resolver.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class ErrorCategory(Enum):
    OK = "ok"
    RATE_LIMIT = "rate_limit"    # 429
    OVERLOADED = "overloaded"    # 529 / 503
    AUTH = "auth"                # 401 / 403
    BAD_REQUEST = "bad_request"  # 400 / 404 / 422
    SERVER = "server"            # 5xx

@dataclass
class NormalizedError:
    route: str                      # "anthropic" / "azure"
    status: int
    category: ErrorCategory
    retryable: bool                 # may we wait and retry the same route
    failover_worthy: bool           # is it worth diverting to the other route
    retry_after_s: Optional[float]  # seconds, with route differences absorbed
    raw_code: Optional[str] = None  # keep the original key for auditing
The key move is splitting retryable (waiting on the same route is likely to fix it) and failover_worthy (switching routes is worth it) into two separate flags. A 429 is usually retryable on the same route, but if it persists, failover becomes worthwhile too. A 400 or 401, by contrast, gets rejected on both routes for the same reason, so it is neither retryable nor failover-worthy. Squeeze these into one boolean and the decision will collapse somewhere.
Accept retry-after as both seconds and HTTP-date
The finest detail in a dual-route setup is parsing this grace value. Accept both integer seconds and HTTP-date, and when it is neither, quietly return None and defer to the upper-layer backoff.
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

def parse_retry_after(value: Optional[str]) -> Optional[float]:
    if not value:
        return None
    value = value.strip()
    # Form 1: integer seconds (Anthropic direct / some Azure)
    if value.isdigit():
        return float(value)
    # Form 2: HTTP-date (mixed on Azure)
    try:
        dt = parsedate_to_datetime(value)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        delta = (dt - datetime.now(timezone.utc)).total_seconds()
        return max(0.0, delta)  # clamp past dates to 0
    except (TypeError, ValueError):
        return None  # unknown format defers to upper-layer backoff
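The max(0.0, delta) is there to kill a bug where a slight server-clock skew yields a past date and a negative wait. Passing a negative value to time.sleep raises ValueError in Python, so without the clamp a slightly stale date would crash the retry path instead of retrying immediately. And because only the HTTP-date form can go negative, that crash would show up on one route and not the other: behavior that splits per route.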
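A small sketch of the clamp at work, with the clock pinned to a fixed `now` so the arithmetic is reproducible. The two-second skew is an arbitrary illustration, and the `time.sleep` behavior shown is CPython's.

```python
import time
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Pin "now" so the example does not depend on the wall clock.
now = datetime(2026, 6, 30, 20, 31, 5, tzinfo=timezone.utc)

# A Retry-After date that is already two seconds in the past (clock skew).
stale = format_datetime(now - timedelta(seconds=2), usegmt=True)

delta = (parsedate_to_datetime(stale) - now).total_seconds()
clamped = max(0.0, delta)

try:
    time.sleep(delta)  # negative wait, unclamped
    unclamped_result = "slept"
except ValueError:
    unclamped_result = "ValueError"

time.sleep(clamped)  # clamped wait of 0.0 returns immediately

print(stale, delta, clamped, unclamped_result)
```

With the clamp in place, a stale date simply means "retry now", which is the behavior an integer 0 would have produced on the direct route.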
Separate "retry the same route" from "fail over to the other"
With the normalized type in place, write a per-route resolver that builds it from the raw response. Look only at the status and the body key; push the decision into the normalized layer.
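As a minimal sketch of what such per-route resolvers could look like: `ErrorCategory`, `NormalizedError`, and `parse_retry_after` are repeated compactly from the article so the block runs alone. The helper names (`_classify`, `_header`, `_resolve`, `resolve_anthropic`, `resolve_azure`) and the flag policy for 5xx and unlisted 4xx statuses are assumptions for illustration, not a confirmed mapping.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from enum import Enum
from typing import Mapping, Optional


class ErrorCategory(Enum):
    RATE_LIMIT = "rate_limit"
    OVERLOADED = "overloaded"
    AUTH = "auth"
    BAD_REQUEST = "bad_request"
    SERVER = "server"


@dataclass
class NormalizedError:
    route: str
    status: int
    category: ErrorCategory
    retryable: bool
    failover_worthy: bool
    retry_after_s: Optional[float]
    raw_code: Optional[str] = None


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Accept integer seconds or an HTTP-date; return None for anything else."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        dt = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return max(0.0, (dt - datetime.now(timezone.utc)).total_seconds())


def _classify(status: int) -> tuple:
    """Map a status to (category, retryable, failover_worthy). Assumed policy."""
    if status == 429:
        return ErrorCategory.RATE_LIMIT, True, True
    if status in (529, 503):
        return ErrorCategory.OVERLOADED, True, True
    if status in (401, 403):
        return ErrorCategory.AUTH, False, False
    if 400 <= status < 500:
        return ErrorCategory.BAD_REQUEST, False, False
    return ErrorCategory.SERVER, True, True  # assumption: other 5xx are transient


def _header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Case-insensitive header lookup; `name` must already be lowercase."""
    return next((v for k, v in headers.items() if k.lower() == name), None)


def _resolve(route, status, headers, body, code_key) -> NormalizedError:
    category, retryable, failover = _classify(status)
    error = body.get("error") if isinstance(body, dict) else None
    raw = error.get(code_key) if isinstance(error, dict) else None
    return NormalizedError(
        route=route, status=status, category=category,
        retryable=retryable, failover_worthy=failover,
        retry_after_s=parse_retry_after(_header(headers, "retry-after")),
        raw_code=None if raw is None else str(raw),
    )


def resolve_anthropic(status, headers, body) -> NormalizedError:
    return _resolve("anthropic", status, headers, body, "type")  # error.type


def resolve_azure(status, headers, body) -> NormalizedError:
    return _resolve("azure", status, headers, body, "code")  # error.code
```

The body is read defensively (a missing or non-JSON body just leaves `raw_code` as None), so an unexpected envelope degrades to status-only classification rather than raising inside the error path.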
The two resolvers look alike, but the point is that only the header name and body key differ, while the output type is identical. That lets the upper loop stay route-agnostic. Auth and bad_request are set retryable=False and failover_worthy=False because a wrong key or a malformed request gets rejected on the other route for the same reason. Fail those over and you only dirty both routes equally.
After: apply one backoff policy to both routes
Once normalization is done, the calling loop collapses into one. While the same route is retryable, wait and retry; when attempts run out or failover_worthy persists, switch to the other route.
import random, time

def backoff_seconds(attempt: int, retry_after: Optional[float]) -> float:
    # retry_after wins. Otherwise exponential + full jitter, capped at 30s
    if retry_after is not None:
        return min(retry_after, 30.0)
    base = min(2 ** attempt, 30.0)
    return random.uniform(0, base)  # full jitter to avoid thundering herd

def call_dual_route(routes, payload, max_attempts_per_route=4):
    # routes: [(name, send_fn, resolver), ...] e.g. [direct, azure]
    last_err = None
    for name, send_fn, resolver in routes:
        for attempt in range(max_attempts_per_route):
            status, headers, body = send_fn(payload)
            if status == 200:
                return body
            err = resolver(status, headers, body)
            last_err = err
            if not err.retryable:
                if err.failover_worthy:
                    break  # give up on this route, try the next
                raise ApiError(err)  # 400/401 are identical on both. fail fast
            wait = backoff_seconds(attempt, err.retry_after_s)
            time.sleep(wait)
        # exhausted this route's attempts -> fail over to the next
    raise ApiError(last_err)

class ApiError(Exception):
    def __init__(self, err: NormalizedError):
        self.err = err
        super().__init__(f"{err.route} {err.status} {err.category.value}")
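To see what this policy produces in numbers, the sketch below recomputes the jitter window per attempt and the capped server hint. `jitter_ceiling` and `wait_for` are hypothetical names that mirror the arithmetic in `backoff_seconds`; they are not part of the article's code.

```python
import random

CAP_S = 30.0  # same cap as the article's backoff_seconds


def jitter_ceiling(attempt: int) -> float:
    """Upper bound of the full-jitter window for a zero-based attempt."""
    return min(float(2 ** attempt), CAP_S)


def wait_for(attempt: int, retry_after=None) -> float:
    """Server hint wins (capped); otherwise draw uniformly from [0, ceiling]."""
    if retry_after is not None:
        return min(retry_after, CAP_S)
    return random.uniform(0, jitter_ceiling(attempt))


ceilings = [jitter_ceiling(a) for a in range(6)]
print(ceilings)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
print(wait_for(0, retry_after=3600.0))  # 30.0: an outlier hint is capped
```

With the default of four attempts per route, the exponential window tops out at 8 seconds. In that configuration the 30-second cap only ever bites on the server-supplied retry_after.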
After moving to this shape, the cross-route "misreads" disappeared. Under the Before setup, the retry path died during windows when HTTP-dates were mixed in, and job success rates dipped intermittently. After, HTTP-dates fold correctly into seconds, so failures attributable to that route went effectively to zero, and busy windows quietly divert to the Azure route and finish. Because it respects retry-after, wasted 429s dropped noticeably too.
Pitfalls I hit in operation
Apart from code correctness, here are three things I hit while putting two routes into real operation.
Treat failover as an unconditional safety net and your bill grows. Fail over even a 401 (a misconfigured key) and you spray the same bad request at both routes and get billed twice. Keeping auth and bad_request at failover_worthy=False is the safe default.
Without a cap on retry_after, an oversized grace stalls you. When an HTTP-date points far into the future as an outlier, the absence of a cap (30 seconds here) lets a single request freeze the loop for a long time. In my unattended jobs, the missing cap let downstream work pile up.
Do not conflate model-identifier differences with this normalization. Error normalization and model-name resolution (identifiers differ between direct and Azure) are separate layers. Mix them in one place and every fix to error handling breaks model resolution. Keep identifier resolution in its own resolver.
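For the third pitfall, a separate identifier layer can be as small as a lookup table. This is a sketch only: the alias `"fast"`, the function `resolve_model`, and both identifier strings are hypothetical placeholders, to be replaced with whatever your direct account and Azure deployment actually use.

```python
from typing import Dict

# Hypothetical placeholders: substitute the identifiers your accounts really use.
MODEL_IDS: Dict[str, Dict[str, str]] = {
    "fast": {
        "anthropic": "<direct-model-id>",
        "azure": "<azure-deployment-name>",
    },
}


def resolve_model(alias: str, route: str) -> str:
    """Translate a route-neutral alias into the identifier one route expects."""
    try:
        return MODEL_IDS[alias][route]
    except KeyError:
        raise ValueError(
            f"no model id for alias={alias!r} on route={route!r}"
        ) from None


print(resolve_model("fast", "anthropic"))
print(resolve_model("fast", "azure"))
```

Because this table knows nothing about status codes or headers, changes to error handling cannot touch it, and a new model alias never requires editing a resolver.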
What to try next
Start by routing just your current calls through NormalizedError. Even before adding a second route, parsing both retry-after forms and separating retryable from failover_worthy cuts retry misreads considerably. Then, when you add the second route, you only write one more resolver and leave the upper loop untouched. I hope this helps anyone else running AI across multiple routes.