# OpenRouter launches Response Caching: zero-cost caching for identical requests

- Source: OpenRouter: Announcements (RSS)
- Published: 2026-05-01 02:00
- AIHOT score: 58
- AIHOT tag: Featured
- AIHOT link: https://aihot.virxact.com/items/cmq9zl2hb0gpfslld1sjoke4v
- Original link: https://openrouter.ai/blog/announcements/response-caching

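## Why it was featured

OpenRouter's new caching feature is genuinely practical: identical requests cost nothing, which makes it a real money-saver for developers who call the API frequently. Unfortunately the announcement is 42 days old, so by now it mostly serves as reference documentation.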

## AI summary

OpenRouter has added a Response Caching header that caches fully identical API requests, so subsequent requests return much faster, and cached calls are completely free.

## Full text

Response Caching: Zero Cost for Identical Requests

Brian Thomas · 4/30/2026

You can now add X-OpenRouter-Cache: true to your chat completions, responses, messages, or embeddings requests to start caching identical calls. The first call hits the provider and gets billed normally. Every identical call after that returns the same response in a tiny fraction of the time, with zero tokens billed.

### What it does

Response caching sits in front of the model provider. When you send a request with caching enabled, OpenRouter hashes the request body, model, API key, and streaming mode into a cache key. If an identical request was made before and hasn’t expired, the cached response comes back immediately. No provider call, no token consumption, no charge.
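
To make the cache-key idea concrete, here is a minimal sketch of how such a key could be derived from the four inputs named above. The post does not describe OpenRouter's actual hashing scheme, so the sorted-key JSON serialization, the choice of SHA-256, and the `cache_key` function name are all assumptions made for illustration.

```python
import hashlib
import json


def cache_key(body: dict, api_key: str, stream: bool) -> str:
    """Hypothetical cache key over the request body (which carries the
    model), the API key, and the streaming mode."""
    # Assumption: serialize with sorted keys so equal bodies hash equally.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    material = "\n".join([canonical, api_key, str(stream)])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


body = {
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}],
}

# An identical request maps to the same key and would be a cache hit.
same = cache_key(body, "key-A", stream=False) == cache_key(dict(body), "key-A", stream=False)
# A different API key or streaming mode yields a different key.
other_key = cache_key(body, "key-B", stream=False)
other_mode = cache_key(body, "key-A", stream=True)
```

In this sketch, changing any one of the inputs produces a different key, which mirrors the post's statement that all of them feed the hash.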

Both streaming and non-streaming requests work. Cached streaming responses replay through the same pipeline, so your client code doesn’t need to change. Text, images, audio, documents, and tool calls all cache normally. Multimodal inputs (base64 images, audio clips, file attachments) are included in the cache key hash. One caveat: very large multimodal payloads that get offloaded internally for processing aren’t eligible for caching. Standard-sized requests cache fine.

Response caching is separate from prompt caching. Prompt caching (which many providers offer natively) reduces the cost of the prompt portion when messages share a common prefix. Response caching skips the provider entirely and returns the full response from OpenRouter’s edge cache.

### Reduces response times from seconds to milliseconds

Cached responses come back in 80-300ms, most of which is serialization and network. The cache lookup itself averages 4ms. For comparison, a typical uncached request to Gemini 2.5 Flash takes about 1.3 seconds, Kimi K2.6 takes 4.6 seconds, and GPT-5.5 takes 9.1 seconds. Cache hits are billed at zero: no prompt tokens, no completion tokens, no charge.
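
As a rough worked example, the figures quoted above can be turned into speedup ranges. The numbers come from the post; the `speedup_range` helper is a hypothetical name used only for this calculation.

```python
# Uncached latencies quoted in the post, in seconds.
uncached_s = {"Gemini 2.5 Flash": 1.3, "Kimi K2.6": 4.6, "GPT-5.5": 9.1}
# Cached responses are quoted as 80-300 ms end to end.
cached_fast_s, cached_slow_s = 0.080, 0.300


def speedup_range(uncached: float) -> tuple[float, float]:
    """Return (slowest-hit, fastest-hit) speedup factors for a cache hit."""
    return uncached / cached_slow_s, uncached / cached_fast_s


for model, seconds in uncached_s.items():
    low, high = speedup_range(seconds)
    print(f"{model}: {low:.1f}x to {high:.1f}x faster on a cache hit")
```

Under these figures a cache hit is roughly 4x to 16x faster for Gemini 2.5 Flash and roughly 30x to 114x faster for GPT-5.5.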

### Enable it with a request header or with presets

Add the X-OpenRouter-Cache: true header to each API call you want to be eligible:

```bash
curl https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-OpenRouter-Cache: true" \
  -d '{
    "model": "google/gemini-2.5-flash",
    "messages": [{"role": "user", "content": "What is the meaning of life?"}]
  }'
```
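
For readers calling the API from code rather than the shell, the same request might look like the following standard-library Python sketch. The `build_cached_request` helper is a hypothetical name, and the request is only built here, not sent.

```python
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_cached_request(prompt: str, model: str = "google/gemini-2.5-flash") -> urllib.request.Request:
    """Build (but do not send) a chat completion request that opts in to response caching."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
            "X-OpenRouter-Cache": "true",  # opt this call in to response caching
        },
        method="POST",
    )


request = build_cached_request("What is the meaning of life?")
# To actually send it: urllib.request.urlopen(request)
```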

Presets. Enable caching for all requests using a specific preset by setting cache_enabled: true in the preset config. No header needed on individual requests.

You can control how long responses stay cached with X-OpenRouter-Cache-TTL (1 second to 24 hours, default 5 minutes). Need a fresh response? Send X-OpenRouter-Cache-Clear: true to bust the cache for that specific request.
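
A small helper can keep these headers consistent across calls. The sketch below assumes the TTL header takes a whole number of seconds and that the clear header is sent alongside `X-OpenRouter-Cache: true`; the post confirms neither detail, and `cache_headers` is a hypothetical name.

```python
from typing import Optional

MIN_TTL_SECONDS = 1
MAX_TTL_SECONDS = 24 * 60 * 60  # 24 hours, the upper bound given in the post


def cache_headers(ttl_seconds: Optional[int] = None, clear: bool = False) -> dict:
    """Return request headers that enable caching, optionally with a TTL or a cache clear."""
    headers = {"X-OpenRouter-Cache": "true"}
    if ttl_seconds is not None:
        if not MIN_TTL_SECONDS <= ttl_seconds <= MAX_TTL_SECONDS:
            raise ValueError("TTL must be between 1 second and 24 hours")
        # Assumption: the header value is a plain integer number of seconds.
        headers["X-OpenRouter-Cache-TTL"] = str(ttl_seconds)
    if clear:
        headers["X-OpenRouter-Cache-Clear"] = "true"
    return headers


one_hour = cache_headers(ttl_seconds=3600)  # keep entries for an hour
fresh = cache_headers(clear=True)           # bust the cache for this request
```

Validating the range client-side is a design choice of the sketch; it simply surfaces an out-of-range TTL before the request is made.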

Response headers tell you what happened: X-OpenRouter-Cache-Status: HIT or MISS, plus X-OpenRouter-Cache-Age and X-OpenRouter-Cache-TTL so you can see exactly how the cache is performing.
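
On the client side, those response headers could be read into a small structure for logging or metrics. This is a sketch under the assumption that the age and TTL values are integer seconds, which the post does not state; `CacheInfo` and `parse_cache_headers` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class CacheInfo:
    hit: bool
    age_seconds: Optional[int]
    ttl_seconds: Optional[int]


def _to_int(value: Optional[str]) -> Optional[int]:
    """Parse an integer header value, returning None if absent or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_cache_headers(headers: Mapping[str, str]) -> CacheInfo:
    """Extract cache status, age, and TTL from response headers (case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    status = lowered.get("x-openrouter-cache-status", "")
    return CacheInfo(
        hit=status.strip().upper() == "HIT",
        age_seconds=_to_int(lowered.get("x-openrouter-cache-age")),
        ttl_seconds=_to_int(lowered.get("x-openrouter-cache-ttl")),
    )


info = parse_cache_headers({
    "X-OpenRouter-Cache-Status": "HIT",
    "X-OpenRouter-Cache-Age": "12",
    "X-OpenRouter-Cache-TTL": "300",
})
```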

### Where it helps most

Agent retries. When an agent workflow fails partway through, you can retry from the top. Cached steps return instantly and for free, so you only pay for the new work.

Test suites. Run your LLM-backed tests repeatedly without burning tokens. After the first run populates the cache, subsequent runs that land before the entries expire are deterministic and free.

Repeated context processing. If your app sends the same prompt to the same model (same system prompt, same user input, same parameters), only the first call costs anything.

### Available now across most generation endpoints

The cache is scoped to your API key. Different keys (even under the same account) don’t share cache entries.

The feature works across /chat/completions, /responses, /messages, and /embeddings. Other endpoints — legacy /completions, /audio/speech (TTS), /audio/transcriptions (STT), /rerank, and video generation — are not yet supported. It’s currently in beta, and we’re watching how it performs before locking down the API surface.
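
A client that talks to several endpoints might attach the cache header only where it is supported. The following sketch encodes the endpoint list from the paragraph above; the `supports_response_cache` helper is a hypothetical name, and the `/api/v1` prefix on the example URLs is assumed from the curl example rather than confirmed for every endpoint.

```python
from urllib.parse import urlparse

# Endpoints the post lists as supporting response caching (beta).
SUPPORTED_SUFFIXES = ("/chat/completions", "/responses", "/messages", "/embeddings")


def supports_response_cache(url: str) -> bool:
    """Return True if the URL path ends with one of the supported endpoint paths."""
    path = urlparse(url).path.rstrip("/")
    return path.endswith(SUPPORTED_SUFFIXES)


chat_ok = supports_response_cache("https://openrouter.ai/api/v1/chat/completions")
legacy_ok = supports_response_cache("https://openrouter.ai/api/v1/completions")
```

Because the feature is in beta, a hard-coded list like this would need to be revisited as the supported set changes.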

Cache hits don’t count toward provider rate limits (since the request never reaches the provider), and they’re visible in your Activity log with a cache indicator for easy monitoring.

Full details in the docs.
