# OpenRouter Reliability and Automatic Failover: How Requests Keep Succeeding

- Source: OpenRouter: Announcements (RSS)
- Published: 2026-06-13 00:00
- AIHOT score: 59
- AIHOT link: https://aihot.virxact.com/items/cmqbpxbo803t1slamvkhw62lr
- Original article: https://openrouter.ai/blog/insights/reliability-failover

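## AI Summary

OpenRouter enables provider failover by default, while model fallbacks are opt-in. The two layers handle different kinds of failure: provider failover automatically switches to another provider when an API call fails, and model fallbacks switch to a backup model when the requested model is unavailable. The announcement explains how each layer works and when failover stops.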

## Full Text

OpenRouter Reliability & Automatic Failover: How Requests Keep Succeeding

OpenRouter · 6/12/2026

Tl;dr

Why Do LLM API Requests Fail?

Do You Pay for Failed Requests on OpenRouter?

Provider Failover vs Model Fallbacks

How Provider-Layer Failover Keeps One Model Up

How to Set Up Model Fallbacks

How OpenRouter Routes Around Outages in Real Time

What Failover Does NOT Cover

OpenRouter vs Portkey: Two Approaches to Failover

Configuring Failover for Production: A Checklist

Frequently Asked Questions

Calling one provider directly means a single point of failure: when it goes down, your users get errors, and you find out from a support ticket an hour later. That’s the problem OpenRouter solves. It routes each request so that it keeps succeeding, across providers automatically and across models when you configure fallbacks.

With OpenRouter, you build reliability into your app with 2 separate configurations. Provider failover is automatic and on by default. Model fallbacks are opt-in.

The 2 layers cover different failures; if every provider for a primary model fails simultaneously, provider failover has nowhere to go. Model fallbacks are the second line of defense.

Here’s a config worth starting from on every project. Copy it and adjust:

```python
from openrouter import OpenRouter

client = OpenRouter(api_key="")
completion = client.chat.send(
    model="anthropic/claude-sonnet-4.6",
    models=["openai/gpt-5.4-mini"],  # fallback if the primary fails
    messages=[{"role": "user", "content": "Summarize this incident report."}],
)
```


### TL;DR

- LLM requests fail for predictable reasons: provider outages, rate limits (429), context-length errors, and content-moderation refusals.
- Reliability comes in 2 layers: provider-layer failover (on by default, recovers within one model) and model-layer fallbacks (opt-in via a `models` array, recovers across models).
- A routing layer detects provider health in real time and steers around outages, so its worst-case uptime beats that of any single provider you’d integrate directly.
- Failover walks your `models` list in order. Once the list is exhausted, the last error comes back, so put a reliable floor model last.
- You don’t pay for a request that ultimately fails, but users have reported edge cases (some 429 paths, partial outputs) that still consume credits, so watch your activity log and set spend limits.
- Restricting providers with `only`/`ignore`/`order` trades reliability for control: a narrower candidate set means fewer fallbacks.

### Why Do LLM API Requests Fail?

Provider outages, rate limits (429), context-length validation errors, and content-moderation refusals are the predictable reasons an LLM request fails. A single direct provider integration has no recovery path for any of them, so each one becomes a user-facing error.

The simplest example is rate limits. You call one provider directly, hit its per-minute ceiling, and your only options are to back off, queue, or fail. None of that helps the user staring at a spinner.

There’s a reason the community calls a routing layer “the DNS of AI”: it stays up because it has more than one place to send the request.

Each of those 4 failure modes maps to a specific recovery layer in OpenRouter, and knowing which layer handles what is how you configure reliability correctly.

#### The 4 failure modes, mapped to a recovery layer

Here’s what fails, and which layer of OpenRouter recovers it.

| Failure mode | What it looks like | Recovered by |
|---|---|---|
| Provider outage / downtime | 5xx, timeouts, dropped connections | Provider-layer failover (next provider) |
| Rate limiting (429) | “Too Many Requests” from the provider | Provider-layer failover, then model fallback |
| Context-length error | Prompt exceeds the model’s window | Model-layer fallback (try a larger-context model) |
| Moderation refusal | Filtered model refuses to reply | Model-layer fallback (try an unfiltered model) |

The first 2 are infrastructure problems a second provider solves. The last 2 are model problems that a second model solves.
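The mapping above can be expressed as a small lookup. This is a minimal sketch for reasoning about your own error handling, not OpenRouter’s internal logic; the function name `recovery_layers` and the error labels `"context_length"`, `"moderation"`, and `"timeout"` are hypothetical.

```python
from typing import List, Optional


def recovery_layers(status: Optional[int], error_kind: str = "") -> List[str]:
    """Return the recovery layers that would act on a failure, in order.

    Hypothetical helper: the `error_kind` labels are illustrative, not OpenRouter error codes.
    """
    if error_kind in ("context_length", "moderation"):
        # Model problems: a different provider can't fix them, a different model can.
        return ["model"]
    if status == 429 or error_kind == "timeout" or (status is not None and 500 <= status < 600):
        # Infrastructure problems: try the next provider first, then the next model.
        return ["provider", "model"]
    # In this sketch, anything else (e.g. a malformed request) is not treated as recoverable.
    return []


print(recovery_layers(503))                    # ['provider', 'model']
print(recovery_layers(400, "context_length"))  # ['model']
```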

### Do You Pay for Failed Requests on OpenRouter?

Short answer: no. When a request ultimately fails after failover is exhausted, you aren’t billed; you pay only for the successful run (zero-completion insurance).

This makes retries cheap to design for: a fallback chain that burns through 3 providers before succeeding costs you one successful completion. You can be aggressive with fallbacks without watching the meter on every failed try.
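To make the billing rule concrete, here is a toy model of a fallback chain in which only the attempt that succeeds is billed. It assumes the stated policy holds and ignores the edge cases discussed next; the `Attempt` type, the provider slugs, and the per-attempt costs are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attempt:
    provider: str    # hypothetical provider slug
    succeeded: bool
    cost_usd: float  # what this attempt would cost if it were billed


def billed_usd(attempts: List[Attempt]) -> float:
    """Zero-completion insurance as stated: failed attempts cost nothing."""
    return sum(a.cost_usd for a in attempts if a.succeeded)


chain = [
    Attempt("provider-a", succeeded=False, cost_usd=0.004),  # e.g. a 503
    Attempt("provider-b", succeeded=False, cost_usd=0.004),  # e.g. a 429
    Attempt("provider-c", succeeded=True, cost_usd=0.004),
]
print(billed_usd(chain))  # 0.004: one successful completion
```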

#### The exception you should plan for

Real-world edge cases exist, and we’d rather you read about them here than find out from your billing dashboard. Some users have reported cases where error 429 consumed credits, or where partial outputs were counted despite an error. So the policy is “pay only for successful runs,” but a few 429 paths and partial outputs have slipped through.

Honest trade-off: zero-completion insurance is real, but it isn’t airtight. Check your activity log to confirm what you were charged for, and set hard spend limits so an edge case can’t run up a bill. Design with a spend cap instead of assuming every failed request is free.
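One way to complement the account-level spend limits recommended above is a client-side budget check. The class below is a hypothetical sketch of that idea, not an OpenRouter feature: you feed it the amounts you were actually charged (for example, reconciled against your activity log), and it refuses new requests whose estimated cost would cross the cap.

```python
class SpendGuard:
    """Hypothetical client-side spend cap; complements account spend limits, doesn't replace them."""

    def __init__(self, limit_usd: float) -> None:
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def record(self, charged_usd: float) -> None:
        # Record what was actually charged, including any edge-case charge on an error.
        self.spent_usd += charged_usd

    def allow(self, estimated_usd: float) -> bool:
        # Refuse a request whose estimated cost would push total spend over the cap.
        return self.spent_usd + estimated_usd <= self.limit_usd


guard = SpendGuard(limit_usd=1.00)
guard.record(0.75)
print(guard.allow(0.20))  # True
print(guard.allow(0.30))  # False
```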

### Provider Failover vs Model Fallbacks

OpenRouter recovers from failures in 2 distinct layers. Provider-layer failover is automatic and on by default; model-layer fallbacks are opt-in through a `models` array. One keeps a single model alive across providers, the other moves to a different model entirely.

OpenRouter fails over between providers automatically, and you shape the candidate set with `ignore`, `only`, and `order`. You don’t write retry logic for the common case.

`ignore` blocks specific providers by slug. `only` restricts to an allow-list. `order` sets an explicit try-this-first sequence.

All 3 narrow the candidate set, so use them deliberately; fewer eligible providers mean fewer fallback options.
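Here is a minimal sketch of how `only`, `ignore`, and `order` shrink and reorder a pool of providers. It follows the descriptions above and assumes that providers named in `order` go first while the remaining eligible ones follow as backups; it is not OpenRouter’s router code, and the provider pool is hypothetical.

```python
from typing import List, Optional


def candidate_providers(
    available: List[str],
    only: Optional[List[str]] = None,
    ignore: Optional[List[str]] = None,
    order: Optional[List[str]] = None,
) -> List[str]:
    """Illustrative narrowing of a provider pool by only/ignore/order."""
    pool = [p for p in available if only is None or p in only]     # allow-list
    pool = [p for p in pool if ignore is None or p not in ignore]  # block-list
    if order:
        # Providers named in `order` come first, in that sequence; the rest keep their relative position.
        first = [p for p in order if p in pool]
        pool = first + [p for p in pool if p not in first]
    return pool


available = ["anthropic", "together", "deepinfra", "fireworks"]  # hypothetical pool
print(candidate_providers(available, ignore=["deepinfra"]))          # ['anthropic', 'together', 'fireworks']
print(candidate_providers(available, only=["anthropic", "together"]))  # ['anthropic', 'together']
```

Every filter can only remove or reorder entries, never add them, which is the trade-off described above: the shorter the list, the fewer places a failed request can go.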

| | Provider-layer failover | Model-layer fallbacks |
|---|---|---|
| What it recovers | Outage or 429 on the provider serving your model | A whole model being unavailable, plus context-length and moderation refusals |
| Default | On (`allow_fallbacks: true`) | Off until you set a `models` array |
| Config that controls it | `allow_fallbacks`, `order`, `only`, `ignore` | `models` array (priority order) |
| Scope of recovery | Same model, different provider | Different model entirely |

That’s the static view. At runtime, a single request moves through both layers in turn and exits either as a success or as a final error.

#### Provider-layer failover: one model, many providers

A single model like Claude Sonnet 4.6 is often served by several providers. If the provider OpenRouter picks returns a 5xx or rate-limits, OpenRouter automatically tries the next provider for that same model. This is governed by `allow_fallbacks`, which defaults to `true` (provider-selection docs).

Zero config. You get this the moment you send a request.
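A minimal sketch of that loop, assuming a `send(provider)` callable that raises on failure. The `ProviderError` type, the split into `preferred` and `others`, and the rule that only 5xx and 429 responses move on to the next provider are assumptions made for illustration, chosen to mirror the behavior described above.

```python
from typing import Callable, List, Optional


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails with an HTTP status."""

    def __init__(self, provider: str, status: int) -> None:
        super().__init__(f"{provider} returned {status}")
        self.provider = provider
        self.status = status


def call_one_model(
    preferred: List[str],
    others: List[str],
    send: Callable[[str], str],
    allow_fallbacks: bool = True,
) -> str:
    """Try providers for a single model in turn; return the first success."""
    candidates = preferred + (others if allow_fallbacks else [])
    last_error: Optional[ProviderError] = None
    for provider in candidates:
        try:
            return send(provider)
        except ProviderError as err:
            if err.status != 429 and err.status < 500:
                raise  # in this sketch, only 5xx and 429 move on to the next provider
            last_error = err
    if last_error is None:
        raise RuntimeError("no eligible providers")
    raise last_error


def fake_send(provider: str) -> str:
    if provider == "provider-a":
        raise ProviderError(provider, 503)
    return f"ok from {provider}"


print(call_one_model(["provider-a"], ["provider-b"], fake_send))  # ok from provider-b
```

With `allow_fallbacks=False`, the `others` list is never consulted, which corresponds to the hard-stop behavior.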

#### Model-layer fallbacks: when a whole model is unavailable

If every provider for your primary model fails, provider-layer failover has nowhere left to go. That’s where the `models` array takes over: OpenRouter moves to the next model in your list (model-fallbacks docs). This layer is opt-in because it changes which model answers, which is a decision only you can make.

A context-length error or a moderation refusal also triggers this layer, since those are problems a different provider can’t fix.
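Model-layer fallbacks walk the `models` list in priority order and, once it is exhausted, surface the last error. A minimal sketch, assuming a `call(model)` callable that raises a hypothetical `ModelError` when a model can’t serve the request (every provider down, context too long, or a moderation refusal):

```python
from typing import Callable, List, Tuple


class ModelError(Exception):
    """Hypothetical: this model could not serve the request."""


def call_with_fallbacks(
    primary: str, fallbacks: List[str], call: Callable[[str], str]
) -> Tuple[str, str]:
    """Return (model_that_answered, output), trying models in priority order."""
    last_error = ModelError("no models tried")
    for model in [primary, *fallbacks]:
        try:
            return model, call(model)
        except ModelError as err:
            last_error = err  # remember it and move on to the next model
    raise last_error  # list exhausted: the last error comes back


def fake_call(model: str) -> str:
    if model == "google/gemini-3.5-flash":
        return "summary"
    raise ModelError(f"{model} unavailable")


print(call_with_fallbacks(
    "anthropic/claude-sonnet-4.6",
    ["openai/gpt-5.4-mini", "google/gemini-3.5-flash"],
    fake_call,
))
```

This is why the floor model goes last: it is the final chance before the caller sees an error.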

The 2 layers work together, but they work differently behind the scenes. Provider-layer failover runs automatically with no setup. Here’s what it’s actually doing with each request.

### How Provider-Layer Failover Keeps One Model Up

For each model, OpenRouter load-balances across providers to maximize uptime using a published 3-step rule: prioritize providers with no significant outages in the last 30 seconds, pick the lowest-cost stable candidate weighted by the inverse square of price, and keep the rest as fallbacks (provider-selection docs). This is a reliability mechanism first and a cost mechanism second.

In practice, any provider that errored in the last 30 seconds drops to the back of the line, and among stable providers, cheaper ones are favored in proportion to the inverse square of their price: a provider at half the price is about 4x as likely to be picked first. Reliability first, cost second, automatic.

The 30-second outage window is the part that matters for uptime. A provider that hiccuped in the last half-minute drops out of the front of the line automatically, with no action from you.
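The 3-step rule can be sketched as follows. This is one reading of the published description, not OpenRouter’s code: it assumes “stable” means no recorded outage in the last 30 seconds, picks the first stable provider at random with weight 1/price², keeps the remaining stable providers as fallbacks ordered by price, and puts recently failing providers last. The `Provider` fields are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

OUTAGE_WINDOW_S = 30.0  # providers with an outage this recent go to the back


@dataclass
class Provider:
    slug: str
    price_per_m: float                      # USD per million tokens
    last_outage_at: Optional[float] = None  # unix time of the last significant outage


def is_stable(p: Provider, now: float) -> bool:
    return p.last_outage_at is None or now - p.last_outage_at > OUTAGE_WINDOW_S


def routing_order(providers: List[Provider], now: float, rng: random.Random) -> List[str]:
    stable = [p for p in providers if is_stable(p, now)]
    flaky = [p for p in providers if not is_stable(p, now)]
    order: List[Provider] = []
    if stable:
        # Weighted pick among stable providers, weight = 1 / price^2.
        first = rng.choices(stable, weights=[1 / p.price_per_m ** 2 for p in stable], k=1)[0]
        order.append(first)
        # The remaining stable providers stay as fallbacks, cheapest first.
        order += sorted((p for p in stable if p is not first), key=lambda p: p.price_per_m)
    # Recently failing providers are still tried, just last.
    order += sorted(flaky, key=lambda p: p.price_per_m)
    return [p.slug for p in order]


pool = [Provider("A", 1.0), Provider("B", 2.0, last_outage_at=990.0), Provider("C", 3.0)]
print(routing_order(pool, now=1000.0, rng=random.Random(0)))
```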

#### Walking the load-balancing math

The docs’ worked example shows how reliability and cost work together. Say Provider A costs $1/M tokens, Provider B costs $2/M, and Provider C costs $3/M, and B recently saw a few outages.

OpenRouter most likely routes to A first: under the inverse-square weighting, A’s weight is 1/1² = 1 and C’s is 1/3² = 1/9, so A is roughly 9x more likely than C to be tried first. If A fails, C is next. B, the recently flaky one, is tried last: its outage history pushes the unreliable provider to the back without excluding it.
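The “9x” is just the ratio of the two inverse-square weights. A quick check, using the prices from the worked example and leaving B out because its recent outages already push it to the back:

```python
prices = {"A": 1.0, "C": 3.0}  # USD per million tokens, from the example above
weights = {slug: 1 / price ** 2 for slug, price in prices.items()}  # A: 1, C: 1/9
total = sum(weights.values())
p_first = {slug: w / total for slug, w in weights.items()}

print(round(weights["A"] / weights["C"], 6))                # 9.0
print({slug: round(p, 3) for slug, p in p_first.items()})  # {'A': 0.9, 'C': 0.1}
```

In other words, among the two stable candidates, A is tried first about 90% of the time and C about 10%.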

That’s the default behavior. But if you already know a provider is bad, you don’t have to wait for the routing math to figure it out.

#### Controlling the candidate set

You can shape which providers are eligible, which is how you block an unreliable provider:

- `order`: try providers in an explicit sequence, e.g. `order: ["anthropic", "together"]`.
- `only`: an allow-list of provider slugs for the request.
- `ignore`: a block-list, e.g. `provider: { ignore: ["deepinfra"] }` to skip an endpoint you’ve found serves an over-quantized model.
- `allow_fallbacks: false`: hard-stop to your chosen providers, no automatic backups.

Honest trade-off: narrowing with `only`, `ignore`, or `order` “may significantly reduce fallback options and limit request recovery” (provider-selection docs, verbatim). Every provider you exclude is one fewer place to recover. Restricting the pool buys you control and costs you reliability, so prune deliberately.
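To keep that trade-off deliberate, it can help to add restrictions only when you mean to. A small sketch that builds the `provider` object from the options above and leaves out anything you didn’t set, so the defaults (including `allow_fallbacks: true`) stay in effect. The helper name `provider_preferences` is hypothetical; the keys are the ones described in this article.

```python
import json
from typing import Any, Dict, List, Optional


def provider_preferences(
    order: Optional[List[str]] = None,
    only: Optional[List[str]] = None,
    ignore: Optional[List[str]] = None,
    allow_fallbacks: bool = True,
) -> Dict[str, Any]:
    """Hypothetical helper: include only the restrictions you explicitly ask for."""
    prefs: Dict[str, Any] = {}
    if order:
        prefs["order"] = order
    if only:
        prefs["only"] = only
    if ignore:
        prefs["ignore"] = ignore
    if not allow_fallbacks:
        prefs["allow_fallbacks"] = False  # hard-stop: no automatic backups
    return prefs


body = {
    "model": "anthropic/claude-sonnet-4.6",
    "provider": provider_preferences(order=["anthropic", "together"], allow_fallbacks=False),
    "messages": [{"role": "user", "content": "Summarize this incident report."}],
}
print(json.dumps(body["provider"]))  # {"order": ["anthropic", "together"], "allow_fallbacks": false}
```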

#### Bounding worst-case latency without losing the pool

If you need predictable latency, set `preferred_max_latency` or `preferred_min_throughput` with percentile cutoffs over a rolling 5-minute window (provider-selection docs). Endpoints that miss the threshold are deprioritized rather than excluded. Here’s what that looks like combined with `ignore`:

```python
completion = client.chat.send(
    model="deepseek/deepseek-v4-flash",
    provider={
        "preferred_max_latency": {"p90": 3},  # prefer …
        # …
    },
    # …
)
```
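The deprioritize-but-keep behavior can be sketched with a per-provider rolling window. This is an illustration under stated assumptions, not how OpenRouter computes its percentiles: it treats the threshold as seconds, uses a nearest-rank p90 over samples from the last 5 minutes, and moves endpoints that miss the threshold to the back instead of dropping them. `LatencyTracker` and `prefer_fast` are hypothetical names.

```python
import math
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

WINDOW_S = 300.0  # rolling 5-minute window


class LatencyTracker:
    def __init__(self) -> None:
        self.samples: Dict[str, Deque[Tuple[float, float]]] = {}  # provider -> (time, latency_s)

    def record(self, provider: str, latency_s: float, at: float) -> None:
        self.samples.setdefault(provider, deque()).append((at, latency_s))

    def p90(self, provider: str, now: float) -> Optional[float]:
        window = self.samples.get(provider, deque())
        while window and now - window[0][0] > WINDOW_S:
            window.popleft()  # evict samples older than the window
        if not window:
            return None
        values = sorted(latency for _, latency in window)
        return values[max(0, math.ceil(0.9 * len(values)) - 1)]  # nearest-rank p90


def prefer_fast(providers: List[str], tracker: LatencyTracker, max_p90_s: float, now: float) -> List[str]:
    def misses(provider: str) -> bool:
        p90 = tracker.p90(provider, now)
        return p90 is not None and p90 > max_p90_s

    # sorted() is stable and False < True: endpoints that miss move to the back, none are removed.
    return sorted(providers, key=misses)


tracker = LatencyTracker()
for _ in range(10):
    tracker.record("fast", 1.0, at=1000.0)
    tracker.record("slow", 5.0, at=1000.0)
print(prefer_fast(["slow", "fast", "unknown"], tracker, max_p90_s=3, now=1010.0))  # ['fast', 'unknown', 'slow']
```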

```python
from openrouter import OpenRouter

client = OpenRouter(api_key="")
completion = client.chat.send(
    model="anthropic/claude-sonnet-4.6",
    models=["openai/gpt-5.4-mini", "google/gemini-3.5-flash"],  # floor model last
    provider={
        "ignore": ["deepinfra"],  # exclude a known-bad endpoint
        "preferred_max_latency": {"p90": 3},  # bound worst-case latency
        # allow_fallbacks stays true by default
    },
    messages=[{"role": "user", "content": "Summarize this thread."}],
)
print(completion.model)  # confirm which model answered
```

Get an API key, and the failover defaults are already on. We recommend adding a `models` array on day one. It’s the cheapest safety net you’ll set up.

### Frequently Asked Questions

#### How does OpenRouter handle failover when a provider goes down?

For a single model served by multiple providers, OpenRouter automatically tries the next provider when the chosen one returns a 5xx or rate-limits. This provider-layer failover is on by default (`allow_fallbacks: true`) and requires no configuration (provider-selection docs).

#### What is the difference between provider failover and model fallbacks?

Provider-layer failover keeps one model alive by switching providers, and it’s automatic. Model-layer fallbacks switch to a different model entirely via a `models` array, and they’re opt-in. The first recovers from provider outages and rate limits; the second also recovers from context-length errors and moderation refusals.

#### What triggers an automatic fallback on OpenRouter?

There are 4 triggers: downtime, rate-limiting, context-length validation errors, and moderation flags for filtered models (model-fallbacks docs). Downtime and rate limits are handled at the provider layer first, then the model layer; context-length errors and moderation flags are handled at the model layer.

#### Does OpenRouter charge for failed requests?

No. You pay only for the successful run; a request that fails after failover is exhausted is not billed (zero-completion insurance). Plan for one documented exception: users have reported that some 429 paths and partial outputs still consumed credits, so set spend limits and check your activity log.

#### How reliable is OpenRouter for production use?

It routes around provider outages in real time using a 30-second health window and published per-model uptime (uptime-optimization docs), which makes worst-case uptime better than any single provider integrated directly. It isn’t zero-risk: an August 2025 gateway outage showed the routing layer has its own dependencies. Design retries and monitor status.openrouter.ai.

#### How do I set up fallback models on OpenRouter?

Pass a `models` array in priority order. The OpenRouter SDKs take `models` as a first-class field, e.g. `models=["openai/gpt-5.4-mini"]`; with the OpenAI SDK, pass it through `extra_body` (model-fallbacks docs).
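Outside the SDKs, `models` is just another field in the JSON request body. A minimal standard-library sketch that builds (but does not send) such a request; it assumes the OpenAI-compatible endpoint `https://openrouter.ai/api/v1/chat/completions` and an API key in a hypothetical `OPENROUTER_API_KEY` environment variable.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_fallback_request(api_key: str) -> urllib.request.Request:
    body = {
        "model": "anthropic/claude-sonnet-4.6",
        "models": ["openai/gpt-5.4-mini", "google/gemini-3.5-flash"],  # priority order, floor model last
        "messages": [{"role": "user", "content": "Summarize this thread."}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


req = build_fallback_request(os.environ.get("OPENROUTER_API_KEY", ""))
# Sending it: json.load(urllib.request.urlopen(req)); the response's "model" field shows which model answered.
```

With the OpenAI SDK, the same list goes in `extra_body={"models": [...]}`.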