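The Information reports that OpenAI has cut inference costs by more than half on some existing models, with logged-out ChatGPT traffic running on only a couple hundred Nvidia GPUs. Likely techniques include quantization, KV-cache optimization, batching, speculative decoding, and routing simple queries. If true, this would be a core competitive lever that could raise gross margins, expand usage limits, or ease pressure on API pricing. For context, OpenAI's adjusted gross margin fell from 40% in 2024 to 33% in 2025 as inference costs quadrupled. Gross margin is projected to recover to 39% in Q1 2026, with a year-end target of 52%. Anthropic's gross margin is about 44%; frontier labs have not yet reached the economics of mature software companies.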
The Information reports that OpenAI has cut inference costs by more than half on some existing models, while logged-out ChatGPT traffic ran on only a couple hundred Nvidia GPUs.
The obvious guesses include quantization, KV-cache changes, batching, speculative decoding, and routing easy queries to cheaper models.
If true, this would be a core competitive lever: lower costs can raise margins, expand usage limits, or reduce pressure on API pricing.
For some context, OpenAI's adjusted gross margin fell to 33% in 2025 from 40% in 2024, after inference costs quadrupled.
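To get a feel for how much a cost cut like this could move the margin, here is a rough back-of-the-envelope sketch. It is not OpenAI's accounting: the report does not say what fraction of cost of revenue is inference, so `inference_share_of_cogs` is a hypothetical input, and the sketch assumes the cut applies to all inference spend with revenue held flat (the report only mentions some existing models).

```python
def margin_after_inference_cut(gross_margin, inference_share_of_cogs, cost_cut):
    """New gross margin if inference cost falls by `cost_cut`, revenue held flat.

    All arguments are fractions between 0 and 1. `inference_share_of_cogs`
    is a hypothetical input; the source does not report it.
    """
    cogs = 1.0 - gross_margin                    # cost of revenue per $1 of revenue
    inference = cogs * inference_share_of_cogs   # portion of that cost spent on inference
    new_cogs = cogs - inference * cost_cut       # remove the saved inference spend
    return 1.0 - new_cogs


# 0.33 is the 2025 adjusted gross margin cited above.
# The inference shares below are illustrative assumptions, not reported figures.
for share in (0.25, 0.50, 0.75):
    new_margin = margin_after_inference_cut(0.33, share, 0.50)
    print(f"inference share of COGS {share:.0%}: new margin {new_margin:.1%}")
```

Under these assumptions the margin gain is simply cost of revenue times inference share times the size of the cut, so the outcome depends heavily on how much of the cost base is actually inference.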