API & SDK / 2026-06-30 / Advanced

The Same 429 Wears a Different Face on Each Route: Running Claude Safely over Anthropic Direct and Azure Foundry

With Claude now generally available on Microsoft Foundry, a two-route setup is realistic even for solo developers. Here is how to fold the route-by-route differences in 429s and retry-after into one normalized error type and a single backoff policy.

On 2026-06-30, the same day Claude Opus 4.8 and Haiku 4.5 landed in the Messages API, Claude also went generally available on Microsoft Foundry (Azure). The pitch is that you can call Claude natively on Azure while keeping your existing identity, billing, and governance. That makes a two-route setup — normally hit Anthropic directly, and divert to the Azure route when one side jams — a realistic option even at a solo-developer scale.

But the first thing you hit when you start running both routes is not performance or price. It is a quiet asymmetry: the same 429 comes back wearing a different face on each route. A retry path written around one route misfires silently on the other. As someone running unattended publishing across the Dolice Labs sites, I find that "silent misfire" the scariest failure mode of all. This article works through those differences and folds them into a single policy that drives both routes.

Running two routes means the "same 429" returns in different shapes

A rate-limit overflow returns HTTP 429 on either route. So far, identical. What differs is the shape of the information attached to that 429.

A direct 429 carries Anthropic's own error envelope ({"type":"error","error":{"type":"rate_limit_error"}}), and the grace period arrives in a lowercase retry-after header as integer seconds. Under load you may also see 529 rather than 429. The Azure Foundry 429, on the other hand, carries Azure's error envelope ({"error":{"code":"429","message":"..."}}), and the grace period arrives in a Retry-After header that is sometimes integer seconds and sometimes an HTTP-date. Transient server trouble can return 503, which does not line up with the direct route's 529.

So the shortest possible code — "see a 429, read retry-after seconds, sleep" — breaks the instant you add a second route. The header name shifts, the value's unit shifts, and the key in the error body shifts.
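To make the asymmetry concrete, here is a minimal sketch of the two 429 shapes reduced to (status, headers, body) tuples. The header values and the helper name error_key are illustrative assumptions; only the envelope shapes follow the description above.

```python
# Illustrative 429 responses, reduced to (status, headers, body).
# The values are made up; only the shapes follow the article's description.
DIRECT_429 = (
    429,
    {"retry-after": "7"},
    {"type": "error", "error": {"type": "rate_limit_error"}},
)
AZURE_429 = (
    429,
    {"Retry-After": "Tue, 30 Jun 2026 20:31:05 GMT"},
    {"error": {"code": "429", "message": "..."}},
)


def error_key(body: dict) -> str:
    """Return whichever identifying key the error envelope carries."""
    err = body.get("error") or {}
    # Direct carries error.type; the Azure envelope carries error.code.
    return str(err.get("type") or err.get("code") or "")


for name, (status, headers, body) in (("direct", DIRECT_429), ("azure", AZURE_429)):
    print(name, status, sorted(headers), error_key(body))
```

Same status code on both lines of output, but the header name, the header value format, and the body key all differ.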

Lay both routes' error surfaces side by side first

Before designing anything, pin down the differences by putting both routes next to each other. Abstract too early and you end up with a normalization skewed toward one route that quietly fails on the other.

Aspect	Anthropic direct	Azure Foundry
Rate-limit status	429	429
Overload / transient	529 (overloaded)	503, etc.
Grace header name	retry-after (lowercase)	Retry-After
Grace value	integer seconds	integer seconds OR HTTP-date
Error body key	error.type (e.g. rate_limit_error)	error.code (e.g. "429")
Auth	x-api-key header	Bearer token (Azure-side credential)
Extra metadata	anthropic-ratelimit-* headers	availability is route-dependent

HTTP header names are case-insensitive by spec, so a robust client reads retry-after and Retry-After alike. The real problem is not there — it is the value's unit (seconds vs HTTP-date) and the name of the body key. Those differ per route.
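If your HTTP client hands you headers as a plain dict, the case-insensitive read has to be done by hand. The helper below is a minimal sketch; get_header is a hypothetical name, not part of any SDK.

```python
from typing import Mapping, Optional


def get_header(headers: Mapping[str, str], name: str) -> Optional[str]:
    """Look up a header by name, ignoring case."""
    wanted = name.lower()
    for key, value in headers.items():
        if key.lower() == wanted:
            return value
    return None


print(get_header({"Retry-After": "5"}, "retry-after"))
print(get_header({"retry-after": "5"}, "Retry-After"))
```

Some clients already expose case-insensitive header mappings (for example requests and httpx), in which case this helper is unnecessary. It matters mainly when headers have been copied into an ordinary dict along the way.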

Before: a retry built around one route's shape breaks silently

My first version assumed only the direct route. The moment I went dual-route, it broke in two ways.

# Before: a retry that assumes only the direct route's shape
import time
 
def call_with_retry_naive(client, **kwargs):
    for attempt in range(5):
        resp = client.post("/v1/messages", json=kwargs)
        if resp.status_code == 200:
            return resp.json()
        if resp.status_code == 429:
            # Pitfall 1: assumes retry-after is always integer seconds
            wait = int(resp.headers.get("retry-after", 1))
            time.sleep(wait)
            continue
        # Pitfall 2: ignores 529/503 and body differences, raises on everything
        resp.raise_for_status()
    raise RuntimeError("exhausted")

This trips over the Azure route in two ways. One: when Retry-After comes back as an HTTP-date (e.g. Tue, 30 Jun 2026 20:31:05 GMT), int(...) raises ValueError and the retry path itself dies. Two: when no wait can be read at all, it charges ahead on the default 1 second, hammering a window that has not opened yet and manufacturing more 429s. In practice, in a case where an HTTP-date got misread as a huge integer, I saw it over-wait by tens of times when one second would have done. These are nasty because nothing flags them as rate-limit handling bugs: the retry path either dies on an unrelated-looking ValueError or quietly waits the wrong amount.
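The failure is easy to reproduce in isolation. This sketch pulls the wait expression out of the Before code into a hypothetical naive_wait function and feeds it the header shapes discussed above, using plain dicts as stand-ins for real response headers.

```python
def naive_wait(headers: dict) -> int:
    # Same expression as the Before code.
    return int(headers.get("retry-after", 1))


print(naive_wait({"retry-after": "7"}))   # integer seconds parse fine
print(naive_wait({}))                      # missing header falls back to 1
print(naive_wait({"Retry-After": "7"}))   # plain dict, other casing: also falls back to 1

try:
    naive_wait({"retry-after": "Tue, 30 Jun 2026 20:31:05 GMT"})
except ValueError as exc:
    print("retry path dies:", exc)
```

The third call is the quiet one: with a case-sensitive mapping, a perfectly valid seven-second hint is ignored and the loop retries after one second.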

Fold everything into a normalized error type

The fix follows a simple axis: translate each route's raw response into one normalized error type first, then decide. The decision logic looks only at the normalized type; route differences are sealed inside the resolver.

from dataclasses import dataclass
from enum import Enum
from typing import Optional
 
class ErrorCategory(Enum):
    OK = "ok"
    RATE_LIMIT = "rate_limit"   # 429
    OVERLOADED = "overloaded"   # 529 / 503
    AUTH = "auth"               # 401 / 403
    BAD_REQUEST = "bad_request" # 400 / 404 / 422
    SERVER = "server"           # 5xx
 
@dataclass
class NormalizedError:
    route: str                          # "anthropic" / "azure"
    status: int
    category: ErrorCategory
    retryable: bool                     # may we wait and retry the same route
    failover_worthy: bool               # is it worth diverting to the other route
    retry_after_s: Optional[float]      # seconds, with route differences absorbed
    raw_code: Optional[str] = None      # keep the original key for auditing

The key move is splitting retryable (waiting on the same route is likely to fix it) and failover_worthy (switching routes is worth it) into two separate flags. A 429 is usually retryable on the same route, but if it persists, failover becomes worthwhile too. A non-429 4xx such as 400 or 401, by contrast, gets rejected on both routes for the same reason, so it is neither retryable nor failover-worthy. Squeeze these into one boolean and the decision breaks down somewhere.
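One way to see why two flags are needed is to write out the decision they drive. The following is a sketch under my own naming (Action and next_action are hypothetical, and attempts_left is an assumed counter kept by the caller), not code from the production loop.

```python
from enum import Enum


class Action(Enum):
    RETRY_SAME_ROUTE = "retry_same_route"
    FAIL_OVER = "fail_over"
    FAIL_FAST = "fail_fast"


def next_action(retryable: bool, failover_worthy: bool, attempts_left: int) -> Action:
    """Pick the next step from the two flags plus the remaining attempt budget."""
    if retryable and attempts_left > 0:
        return Action.RETRY_SAME_ROUTE   # waiting on this route may still fix it
    if failover_worthy:
        return Action.FAIL_OVER          # this route is spent; the other may work
    return Action.FAIL_FAST              # the other route would reject it too


print(next_action(True, True, 2))     # fresh 429
print(next_action(True, True, 0))     # 429 that persisted
print(next_action(False, False, 3))   # 400 or 401
```

With a single boolean, the second and third cases cannot be told apart: both mean "stop retrying here", but only one of them justifies trying the other route.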

Accept retry-after as both seconds and HTTP-date

The fiddliest detail in a dual-route setup is parsing this grace value. Accept both integer seconds and HTTP-date; when the value is neither, return None and defer to the upper-layer backoff.

from email.utils import parsedate_to_datetime
from datetime import datetime, timezone
 
def parse_retry_after(value: Optional[str]) -> Optional[float]:
    if not value:
        return None
    value = value.strip()
    # Form 1: integer seconds (Anthropic direct / some Azure)
    if value.isdigit():
        return float(value)
    # Form 2: HTTP-date (mixed on Azure)
    try:
        dt = parsedate_to_datetime(value)
        if dt.tzinfo is None:
            dt = dt.replace(tzinfo=timezone.utc)
        delta = (dt - datetime.now(timezone.utc)).total_seconds()
        return max(0.0, delta)   # clamp past dates to 0
    except (TypeError, ValueError):
        return None              # unknown format defers to upper-layer backoff

The max(0.0, delta) guards against a slight server-clock skew yielding a past date and therefore a negative wait. Python's time.sleep raises ValueError on a negative argument, so without the clamp the retry path would die only on the route that sends HTTP-dates, which is exactly the kind of per-route split in behavior this normalization is meant to prevent.
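Because the parser reads the wall clock, it is awkward to test as written. One option, sketched here as an assumption rather than as the code I run, is a variant that takes the current time as a parameter so each form can be checked against a fixed instant. retry_after_seconds and NOW are hypothetical names.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: datetime) -> Optional[float]:
    """Same idea as parse_retry_after, with the clock passed in for tests."""
    if not value:
        return None
    value = value.strip()
    if value.isdigit():                      # Form 1: integer seconds
        return float(value)
    try:
        dt = parsedate_to_datetime(value)    # Form 2: HTTP-date
    except (TypeError, ValueError):
        return None                          # unknown format
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return max(0.0, (dt - now).total_seconds())   # clamp past dates to 0


NOW = datetime(2026, 6, 30, 20, 31, 0, tzinfo=timezone.utc)

print(retry_after_seconds("7", NOW))
print(retry_after_seconds("Tue, 30 Jun 2026 20:31:05 GMT", NOW))
print(retry_after_seconds("Tue, 30 Jun 2026 20:30:00 GMT", NOW))
print(retry_after_seconds("soon", NOW))
```

The third call is the clock-skew case: the date is one minute in the past relative to NOW, and the clamp turns it into a zero wait instead of a negative one.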

Separate "retry the same route" from "fail over to the other"

With the normalized type in place, write a per-route resolver that builds it from the raw response. Look only at the status and the body key; push the decision into the normalized layer.

def resolve_anthropic(status: int, headers: dict, body: dict) -> NormalizedError:
    # Header names are case-insensitive, so lowercase the keys before lookup
    lowered = {k.lower(): v for k, v in headers.items()}
    ra = parse_retry_after(lowered.get("retry-after"))
    etype = (body.get("error") or {}).get("type")      # direct: error.type
    if status == 429:
        cat, retryable, fo = ErrorCategory.RATE_LIMIT, True, True
    elif status == 529:                                 # direct's overload status
        cat, retryable, fo = ErrorCategory.OVERLOADED, True, True
    elif status in (401, 403):
        cat, retryable, fo = ErrorCategory.AUTH, False, False
    elif 400 <= status < 500:
        cat, retryable, fo = ErrorCategory.BAD_REQUEST, False, False
    elif status >= 500:
        cat, retryable, fo = ErrorCategory.SERVER, True, True
    else:
        cat, retryable, fo = ErrorCategory.OK, False, False
    return NormalizedError("anthropic", status, cat, retryable, fo, ra, etype)

def resolve_azure(status: int, headers: dict, body: dict) -> NormalizedError:
    # Same case-insensitive lookup; the value may be seconds or an HTTP-date
    lowered = {k.lower(): v for k, v in headers.items()}
    ra = parse_retry_after(lowered.get("retry-after"))
    code = str((body.get("error") or {}).get("code", ""))   # Azure: error.code
    if status == 429:
        cat, retryable, fo = ErrorCategory.RATE_LIMIT, True, True
    elif status == 503:                                 # Azure's transient status
        cat, retryable, fo = ErrorCategory.OVERLOADED, True, True
    elif status in (401, 403):
        cat, retryable, fo = ErrorCategory.AUTH, False, False
    elif 400 <= status < 500:
        cat, retryable, fo = ErrorCategory.BAD_REQUEST, False, False
    elif status >= 500:
        cat, retryable, fo = ErrorCategory.SERVER, True, True
    else:
        cat, retryable, fo = ErrorCategory.OK, False, False
    return NormalizedError("azure", status, cat, retryable, fo, ra, code)

The two resolvers look alike, and that is the point: only the overload status (529 vs 503) and the body key differ, while the output type is identical. The Retry-After lookup lowercases header names first, so it works whichever casing a route or HTTP client delivers. That lets the upper loop stay route-agnostic. Auth and bad_request are set to retryable=False and failover_worthy=False because a wrong key or a malformed request gets rejected on the other route for the same reason. Fail those over and you only pollute both routes equally.
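Since the resolvers share almost all of their branches, an alternative is to express the per-route differences as data. This is a sketch of that variation, not the code I run: RouteSpec, classify, and raw_code are hypothetical names, and categories are plain strings here to keep the block self-contained.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class RouteSpec:
    name: str
    overload_statuses: FrozenSet[int]   # 529 on direct, 503 on Azure
    body_key: str                       # "type" on direct, "code" on Azure


DIRECT = RouteSpec("anthropic", frozenset({529}), "type")
AZURE = RouteSpec("azure", frozenset({503}), "code")


def classify(spec: RouteSpec, status: int) -> Tuple[str, bool, bool]:
    """Return (category, retryable, failover_worthy) for one route."""
    if status == 429:
        return ("rate_limit", True, True)
    if status in spec.overload_statuses:
        return ("overloaded", True, True)
    if status in (401, 403):
        return ("auth", False, False)
    if 400 <= status < 500:
        return ("bad_request", False, False)
    if status >= 500:
        return ("server", True, True)
    return ("ok", False, False)


def raw_code(spec: RouteSpec, body: dict) -> str:
    """Pull the route's own error key out of the body for auditing."""
    return str((body.get("error") or {}).get(spec.body_key, ""))


print(classify(DIRECT, 529), classify(AZURE, 529))
print(raw_code(AZURE, {"error": {"code": "429"}}))
```

The trade-off is readability against duplication: two explicit functions are easier to scan when a route grows a genuinely unique branch, while the table keeps the shared rules in one place.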

After: apply one backoff policy to both routes

Once normalization is done, the calling loop collapses into one. While the same route is retryable, wait and retry; when attempts run out or failover_worthy persists, switch to the other route.

import random
import time
from typing import Optional

class ApiError(Exception):
    def __init__(self, err: NormalizedError):
        self.err = err
        super().__init__(f"{err.route} {err.status} {err.category.value}")

def backoff_seconds(attempt: int, retry_after: Optional[float]) -> float:
    # retry_after wins. Otherwise exponential + full jitter, capped at 30s
    if retry_after is not None:
        return min(retry_after, 30.0)
    base = min(2 ** attempt, 30.0)
    return random.uniform(0, base)   # full jitter to avoid thundering herd

def call_dual_route(routes, payload, max_attempts_per_route=4):
    # routes: [(name, send_fn, resolver), ...]  e.g. [direct, azure]
    last_err = None
    for name, send_fn, resolver in routes:
        for attempt in range(max_attempts_per_route):
            status, headers, body = send_fn(payload)
            if status == 200:
                return body
            err = resolver(status, headers, body)
            last_err = err
            if not err.retryable:
                if err.failover_worthy:
                    break            # give up on this route, try the next
                raise ApiError(err)  # 400/401 are identical on both. fail fast
            if attempt == max_attempts_per_route - 1:
                break                # out of attempts: fail over without sleeping
            time.sleep(backoff_seconds(attempt, err.retry_after_s))
    if last_err is None:
        # Nothing was ever sent: empty routes or max_attempts_per_route < 1
        raise ValueError("no attempt was made; check routes and max_attempts_per_route")
    raise ApiError(last_err)

With this shape in place, the cross-route "misreads" disappeared. Under the Before setup, the retry path died during windows when HTTP-dates were mixed in, and job success rates dipped intermittently. After, HTTP-dates fold correctly into seconds, so failures attributable to that retry path went effectively to zero, and busy windows quietly divert to the Azure route and finish. Because the loop respects retry-after, wasted 429s dropped noticeably too.
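It helps to work out what the exponential branch can actually cost when no retry-after is present. The sketch below computes the upper bound of the jittered wait per attempt, using the same 2 ** attempt growth and 30-second cap as the policy above. backoff_ceiling and worst_case_wait are hypothetical helper names.

```python
def backoff_ceiling(attempt: int, cap: float = 30.0) -> float:
    """Upper bound of the jittered wait for a zero-based attempt number."""
    return min(2 ** attempt, cap)


def worst_case_wait(sleeps: int, cap: float = 30.0) -> float:
    """Sum of the ceilings for the first `sleeps` waits on one route."""
    return sum(backoff_ceiling(a, cap) for a in range(sleeps))


for attempt in range(6):
    print(attempt, backoff_ceiling(attempt))

print(worst_case_wait(3), worst_case_wait(4))
```

The ceilings run 1, 2, 4, 8, 16 and then hit the 30-second cap, so three jittered waits on one route can add up to at most 7 seconds and four to at most 15. With full jitter the expected wait is half the ceiling, which is why the route switch usually happens well before those bounds.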

Pitfalls I hit in operation

Apart from code correctness, here are three things I hit while putting two routes into real operation.

Treat failover as an unconditional safety net and your bill grows. Fail over even a 401 (a misconfigured key) and you spray the same bad request at both routes and get billed twice. Keeping auth and bad_request at failover_worthy=False is the safe default.
Without a cap on retry_after, an oversized grace stalls you. When an HTTP-date points far into the future as an outlier, the absence of a cap (30 seconds here) lets a single request freeze the loop for a long time. In my unattended jobs, the missing cap let downstream work pile up.
Do not conflate model-identifier differences with this normalization. Error normalization and model-name resolution (identifiers differ between direct and Azure) are separate layers. Mix them in one place and every fix to error handling breaks model resolution. Keep identifier resolution in its own resolver.
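For the third pitfall, keeping identifier resolution separate can be as small as a lookup table with its own function. This is a minimal sketch: the alias "fast" and both identifier strings are placeholders I made up, not real model or deployment names.

```python
# Hypothetical identifiers: placeholders, not real model or deployment names.
MODEL_IDS = {
    "anthropic": {"fast": "<direct-model-id>"},
    "azure": {"fast": "<azure-deployment-name>"},
}


def resolve_model(route: str, alias: str) -> str:
    """Map a route-neutral alias to the identifier that route expects."""
    try:
        return MODEL_IDS[route][alias]
    except KeyError:
        raise ValueError(
            f"no model mapping for route={route!r}, alias={alias!r}"
        ) from None


print(resolve_model("anthropic", "fast"))
print(resolve_model("azure", "fast"))
```

The error resolvers never import this table and this function never looks at a status code, so a change to one layer cannot break the other.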

What to try next

Start by routing just your current calls through NormalizedError. Even before adding a second route, parsing both retry-after forms and separating retryable from failover_worthy cuts retry misreads considerably. Then, when you add the second route, you only write one more resolver and leave the upper loop untouched. I hope this helps anyone else running AI across multiple routes.
