When an OAuth Token Expires, Your Unattended Run Has Nowhere to Go — A Token-Lifecycle Design That Keeps Remote MCP Alive
Remote MCP connectors are authorized via OAuth, but access tokens are short-lived. Interactive sessions can re-authorize in a browser; an unattended scheduled run has nobody to click the dialog. Here is a token-lifecycle design that owns expiry and refreshes ahead of time.
A job that was supposed to run at 2 a.m. has produced nothing by morning. Tracing the log, a single remote MCP tool call failed, and the cause was an expired access token. When you work interactively, an expired token is invisible: an authorization dialog appears, you click a button in the browser once, and everything continues. But an unattended scheduled run has nobody to click that button.
As an indie developer, I run automated posting across several sites for Dolice Labs, and more than once a freshly connected remote MCP connector worked fine, only to stall quietly some tens of hours later. The connector configuration itself was correct. What had broken was the one thing nobody was managing: the token's lifespan.
This article lays out, with working code, how to hold a remote MCP connector's OAuth tokens as state that you own — refreshing them ahead of time, both before and during a run — so your unattended jobs keep moving.
Why token expiry is invisible during interactive use
An OAuth access token is usually short-lived, expiring in anywhere from tens of minutes to a few hours. The refresh token, by contrast, is long-lived, and you can use it to obtain a new access token without a human present.
Interactive clients do this renewal silently behind the scenes. If the access token has expired, they quietly trade the refresh token for a new one; if that has also expired, they open a browser and ask the human, "Authorize?" In other words, the assumption that "a human can always click the authorize button" is exactly what turns expiry into an invisible problem.
That assumption collapses under unattended execution. Opening a browser does nothing because there is no one to click, so the moment both the access token and the refresh token expire, the job can no longer move forward. Worse, expiry comes back as a 401, which the MCP client layer tends to flatten into a generic tool error. All you see is "the tool failed," and it takes a while to realize the real reason was "authorization expired."
So for unattended operation, you cannot leave renewal to the client. You have to hold the lifecycle yourself. Let's build it up step by step.
Hold the token as state with an expiry
The first step is to persist the access token, refresh token, and expiry time together. Without an expiry time you have no idea when it will lapse, which makes refreshing ahead of time impossible.
import json
import os
import time
import tempfile
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class TokenSet:
    access_token: str
    refresh_token: str
    expires_at: float  # UNIX seconds: the absolute time the access token expires

    def expires_in(self) -> float:
        return self.expires_at - time.time()


class TokenStore:
    """Persists the token set as JSON. Writes are atomic."""

    def __init__(self, path: str):
        self.path = Path(path)

    def load(self) -> TokenSet:
        data = json.loads(self.path.read_text(encoding="utf-8"))
        return TokenSet(**data)

    def save(self, tokens: TokenSet) -> None:
        # Write to a temp file in the same directory, then rename.
        # If the process dies mid-write, we never overwrite with a partial file.
        d = self.path.parent
        d.mkdir(parents=True, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
        try:
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                json.dump(asdict(tokens), f)
            os.replace(tmp, self.path)
            os.chmod(self.path, 0o600)  # only the owner may read the token file
        finally:
            if os.path.exists(tmp):
                os.unlink(tmp)
The quietly important part is that the save is atomic (temp file plus rename). Without that, a process that dies mid-save leaves a half-written file behind, and the next load fails. Renewal happens on every unattended run, so a break here becomes a breeding ground for silent stalls. The chmod 0o600 narrows the file's permissions because tokens are secrets.
Token endpoints usually return expires_in (seconds remaining) rather than expires_at (an absolute time), so convert it to an absolute time with time.time() + expires_in the instant you receive it and store that. Keeping seconds-remaining forces you to separately remember when you fetched it, which is a source of drift.
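As a minimal sketch of that conversion, assuming the token response has already been parsed into a dict with an expires_in field (the names absolute_expiry and received_at are made up for this illustration):

```python
import time
from typing import Optional


def absolute_expiry(payload: dict, received_at: Optional[float] = None) -> float:
    """Turn the relative expires_in from a token response into UNIX seconds.

    received_at is a hypothetical parameter that lets the caller pin the
    moment the response arrived; it defaults to the current time.
    """
    if received_at is None:
        received_at = time.time()
    return received_at + float(payload["expires_in"])


# A response received at t=1000 that is valid for one hour expires at t=4600.
expires_at = absolute_expiry({"expires_in": 3600}, received_at=1000.0)
```

Calling it the moment the response arrives keeps the stored value independent of when you later read it back.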
If you decide based only on whether the token is "expired right now," you hit a race where it lapses in the few seconds between your check and the call. To avoid this, refresh proactively once it drops below a margin (the skew), even while it is still valid.
import requests

TOKEN_ENDPOINT = "https://example-mcp-provider.com/oauth/token"
CLIENT_ID = os.environ["MCP_OAUTH_CLIENT_ID"]
CLIENT_SECRET = os.environ["MCP_OAUTH_CLIENT_SECRET"]

# Refresh 120 seconds before expiry. The longer the run, the larger this should be.
REFRESH_SKEW_SECONDS = 120


class ReauthRequired(Exception):
    """The refresh token itself has expired; a human re-authorization is needed."""


def refresh_tokens(store: TokenStore, current: TokenSet) -> TokenSet:
    resp = requests.post(
        TOKEN_ENDPOINT,
        data={
            "grant_type": "refresh_token",
            "refresh_token": current.refresh_token,
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
        },
        timeout=15,
    )
    # invalid_grant means the refresh token itself is dead. Re-auth required.
    if resp.status_code == 400 and "invalid_grant" in resp.text:
        raise ReauthRequired("refresh token is no longer valid")
    resp.raise_for_status()
    payload = resp.json()
    new_tokens = TokenSet(
        access_token=payload["access_token"],
        # If rotation returns a new refresh_token, swap it in.
        # Some providers don't return one; in that case keep the existing one.
        refresh_token=payload.get("refresh_token", current.refresh_token),
        expires_at=time.time() + payload["expires_in"],
    )
    store.save(new_tokens)
    return new_tokens


def ensure_fresh_token(store: TokenStore) -> TokenSet:
    tokens = store.load()
    if tokens.expires_in() <= REFRESH_SKEW_SECONDS:
        tokens = refresh_tokens(store, tokens)
    return tokens
That single line, refresh_token=payload.get("refresh_token", current.refresh_token), earns its keep in production. Many providers use refresh-token rotation: every renewal returns a new refresh token and invalidates the old one. If you fail to save the new one, your next renewal returns invalid_grant, and from there you cannot recover unattended.
Set the skew value by working backward from the job's length. For a light task that finishes in one call, 120 seconds is plenty; for a job that runs for tens of minutes, a token valid at the start will lapse partway through. The next section handles that mid-run expiry.
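One way to turn "work backward from the job's length" into a number is sketched here. This is a rule of thumb of my own for illustration, not something a provider prescribes: cover the longest single call plus an allowance for clock drift, but cap the result at half the token lifetime so that not every check triggers a refresh. The function name and all default values are assumptions.

```python
def choose_skew(
    longest_call_seconds: float,
    clock_drift_seconds: float = 30.0,      # assumed drift allowance
    token_lifetime_seconds: float = 3600.0,  # assumed access-token lifetime
) -> float:
    """Hypothetical heuristic for picking REFRESH_SKEW_SECONDS."""
    wanted = longest_call_seconds + clock_drift_seconds
    # Never ask for more than half the lifetime, or a fresh token would
    # already look "about to expire" on the very next check.
    return min(wanted, token_lifetime_seconds / 2)


light_task_skew = choose_skew(90)   # one short call
long_call_skew = choose_skew(3000)  # a call nearly as long as the token lives
```

When the cap kicks in, as in the second case, the skew alone cannot cover the call, and mid-run handling has to take over.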
Prepare for long jobs that expire mid-run
Even if you refresh before starting, a long job will see its access token expire during execution. The defense is two layers. First, route every token-needing call through ensure_fresh_token right before it. Then, if a 401 still slips through, force a refresh and retry exactly once.
def call_mcp_tool(store: TokenStore, tool_name: str, arguments: dict) -> dict:
    tokens = ensure_fresh_token(store)

    def _do(access_token: str) -> requests.Response:
        return requests.post(
            "https://example-mcp-provider.com/mcp/call",
            headers={"Authorization": f"Bearer {access_token}"},
            json={"name": tool_name, "arguments": arguments},
            timeout=30,
        )

    resp = _do(tokens.access_token)
    # Even with proactive refresh, server-side clock skew can return a 401.
    # Only then do we force a refresh and retry a single time.
    if resp.status_code == 401:
        tokens = refresh_tokens(store, tokens)
        resp = _do(tokens.access_token)
    resp.raise_for_status()
    return resp.json()
The key is to cap the 401 retry at exactly one. If a freshly refreshed token returns 401 again, it is not expiry — it is something else, like insufficient permissions or a scope mismatch. Looping refresh-and-retry forever only piles load onto the token endpoint and eventually trips rate limits, blocking even the renewals you genuinely need. Try once; if it fails, raise an exception and handle it higher up.
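To check the "exactly once" rule without touching the network, the same control flow can be sketched with the HTTP call and the refresh passed in as plain functions. Everything here is hypothetical scaffolding for illustration: call_with_one_refresh, AuthStillFailing, and the (status, body) tuple shape are not part of any MCP client.

```python
from typing import Callable, Tuple


class AuthStillFailing(Exception):
    """Hypothetical error: a freshly refreshed token was rejected as well."""


def call_with_one_refresh(
    do_call: Callable[[str], Tuple[int, dict]],  # takes a token, returns (status, body)
    refresh: Callable[[], str],                  # returns a new access token
    access_token: str,
) -> dict:
    status, body = do_call(access_token)
    if status == 401:
        # Forced refresh, then a single retry. There is deliberately no loop.
        access_token = refresh()
        status, body = do_call(access_token)
    if status == 401:
        # Not an expiry problem any more; let a higher layer decide.
        raise AuthStillFailing("401 persisted after one forced refresh")
    return body
```

A fake do_call that always returns 401 should cause one refresh and then AuthStillFailing, which is the behaviour that keeps a scope mismatch from hammering the token endpoint.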
A mid-run pitfall that is easy to miss is contention when several jobs share one token store. If two processes simultaneously decide "it's about to expire" and refresh at the same time, rotation invalidates one's refresh token first, and the other dies with invalid_grant. If you run concurrent jobs on the same host, wrap the refresh in a file lock so they cannot run at once.
import fcntl
from contextlib import contextmanager


@contextmanager
def refresh_lock(lock_path: str):
    with open(lock_path, "w") as lf:
        fcntl.flock(lf, fcntl.LOCK_EX)  # other processes wait until refresh ends
        try:
            yield
        finally:
            fcntl.flock(lf, fcntl.LOCK_UN)
Wrap the refresh_tokens call in this lock, and after acquiring the lock re-read the latest expiry with store.load() before deciding whether a refresh is still needed. That way you avoid duplicating a refresh that another process completed while you were waiting.
When the refresh itself fails, do not stall silently
This is the most important design decision. A ReauthRequired — the refresh token has expired too — cannot self-recover unattended. If you simply die with an exception here, the job is never even recorded as "failed," and it stays stuck for the entire wait until the next run.
In unattended operation, the loss comes not from being unrecoverable but from being unnoticed. So even when you retreat, leave a structured trace a human can pick up later.
import sys
import datetime


def run_job_with_token_guard(store: TokenStore, run_job) -> int:
    try:
        run_job(store)  # the job body; uses call_mcp_tool internally
        return 0
    except ReauthRequired as e:
        record = {
            "event": "mcp_reauth_required",
            "connector": "example-mcp-provider",
            "detail": str(e),
            "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "action": "skipped_run; human re-authorization needed",
        }
        # Emit the structured log to stderr, not to "success-looking" stdout.
        print(json.dumps(record, ensure_ascii=False), file=sys.stderr)
        notify_owner(record)  # email/notification — a channel you will actually see
        # Make "not a clean exit" explicit via the exit code.
        return 75  # EX_TEMPFAIL: temporary failure, recoverable after re-auth
Three things are deliberate here. First, make the unrecoverable state a dedicated exception (ReauthRequired) that is distinguishable from a generic error. Second, route the trace to stderr and a notification rather than stdout, so it cannot be mistaken for "success." Third, declare a temporary failure through the exit code, telling both the human and the scheduler that this is the kind of stop that recovers once you re-authorize.
The body of notify_owner can be anything. A single line into a channel you are sure to glance at within the day — a daily summary or a message notification — is enough. What matters is that a "needs re-auth" state does not pile up unseen.
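For illustration only, here is about the smallest notify_owner I can think of: it appends the record as one JSON line to a file that a daily summary could read. The inbox_path parameter and its default filename are assumptions of this sketch; swap in email or a chat message if that is the channel you actually look at.

```python
import json
from pathlib import Path


def notify_owner(record: dict, inbox_path: str = "needs_attention.jsonl") -> None:
    """Append one JSON line per event to a hypothetical 'needs attention' file."""
    path = Path(inbox_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier unresolved events visible until a human clears them.
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Because it appends rather than overwrites, two skipped runs in a row show up as two lines, which makes it obvious how long the connector has been waiting for re-authorization.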
Pitfalls you are likely to hit in operation
Even once the design is settled, a few things will trip you in operation. Here are the ones I actually ran into.
| Pitfall | Symptom | Fix |
| --- | --- | --- |
| Not saving the rotated refresh token | Works a few times, then always invalid_grant | Save the refresh_token from every renewal response; design for rotation |
| Clock skew with the server | Sporadic 401s despite proactive refresh | Use a larger skew and pair it with the one-shot forced refresh on 401 |
| Concurrent-run contention | Only one of the jobs dies with invalid_grant | Wrap refresh in a file lock; re-read expiry after acquiring it |
| Corrupted token file | JSON load throws; all jobs stop right at startup | Make saves atomic with temp-file-plus-rename |
Clock skew in particular is a hard pitfall to spot. If your local clock and the provider's clock differ by tens of seconds, you think the token is "still valid" while the other side has decided it is "already expired." A slightly larger proactive skew absorbs this drift. Proactive refresh and the one-shot forced refresh on 401 are not alternatives — only together do they make you resilient to both drift and mid-run expiry.
A first step to try
If you have a remote MCP process running unattended right now, start by logging a single line with the token's expiry time. Inside ensure_fresh_token, just record whether you refreshed and how many seconds remained. After a few days of logs, you can see in numbers how often your job refreshes its token and where the proactive skew should sit. Build the refresh and lock out from there, and the kind of stall that quietly happens at midnight should drop off considerably.
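That first logging step could be as small as the sketch below. It assumes the standard logging module and a logger name of my own choosing; log_token_check, the field names, and the now parameter are illustrative and would be called from inside ensure_fresh_token just before the refresh decision.

```python
import json
import logging
import time
from typing import Optional

logger = logging.getLogger("token_lifecycle")  # hypothetical logger name


def log_token_check(
    expires_at: float, skew_seconds: float, now: Optional[float] = None
) -> dict:
    """Log one structured line per check: seconds remaining and whether a
    refresh is about to be triggered."""
    if now is None:
        now = time.time()
    remaining = expires_at - now
    record = {
        "event": "token_check",
        "seconds_remaining": round(remaining, 1),
        "will_refresh": remaining <= skew_seconds,
    }
    logger.info(json.dumps(record))
    return record
```

Collecting these lines for a few days gives you the distribution of seconds_remaining at each check, which is the number the skew should be tuned against.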
I hope this helps anyone else wrestling with token expiry in unattended operation. Thank you for reading.