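Gemini API Introduces Flex and Priority Tiers to Balance Cost and Reliability
Source: blog.google. Google has added two inference tiers, Flex and Priority, to the Gemini API to help developers trade off cost against latency.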
New ways to balance cost and reliability in the Gemini API
Apr 02, 2026
Introducing Flex and Priority inference: advanced controls for developers to optimize costs and reliability through a single, unified interface.
Today, we are adding two new service tiers to the Gemini API: Flex and Priority. These new options give you granular control over cost and reliability through a single, unified interface.
As AI evolves from simple chat into complex, autonomous agents, developers typically have to manage two distinct types of logic:
Background tasks: High-volume workflows like data enrichment or "thinking" processes that don't need instant responses.
Interactive tasks: User-facing features like chatbots and copilots where high reliability is needed.
Until now, supporting both meant splitting your architecture between standard synchronous serving and the asynchronous Batch API. Flex and Priority help to bridge this gap. You can now route background jobs to Flex and interactive jobs to Priority, both using standard synchronous endpoints. This eliminates the complexity of async job management while giving you the economic and performance benefits of specialized tiers.
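The routing idea above can be illustrated with a minimal sketch. It does not call the Gemini SDK: the `Job` type, the `build_request` helper, the `"tier"` field name, and the `"flex"` and `"priority"` string values are all assumptions made for illustration, and the actual request parameter should be taken from the Gemini API reference. The sketch only shows how one synchronous code path might tag each job with a tier according to whether it is background or interactive work.

```python
from dataclasses import dataclass
from enum import Enum


class TaskKind(Enum):
    BACKGROUND = "background"    # e.g. data enrichment, long "thinking" jobs
    INTERACTIVE = "interactive"  # e.g. chatbots and copilots


# Hypothetical tier labels; the real values may differ in the API.
TIER_FOR_KIND = {
    TaskKind.BACKGROUND: "flex",
    TaskKind.INTERACTIVE: "priority",
}


@dataclass(frozen=True)
class Job:
    prompt: str
    kind: TaskKind


def build_request(job: Job) -> dict:
    """Build the payload for one synchronous call.

    The "tier" key is a placeholder name, not the documented parameter.
    """
    return {"prompt": job.prompt, "tier": TIER_FOR_KIND[job.kind]}


jobs = [
    Job("Enrich this product record with a category.", TaskKind.BACKGROUND),
    Job("Answer the user's question about their order.", TaskKind.INTERACTIVE),
]

# Both kinds of work go through the same request builder.
payloads = [build_request(job) for job in jobs]
```

Keeping the tier choice in a single lookup table means the calling code stays identical for both workloads, which is the simplification the post describes relative to maintaining a separate asynchronous batch pipeline.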