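OpenAI has reportedly found new inference optimization techniques that cut the cost of running its models by more than half. According to The Information, engineers said earlier this month that the techniques at one point supported ChatGPT access for non-paying users on only a few hundred Nvidia GPUs. The exact method is unknown; it may combine quantization, KV caching, batching, and routing simple queries to cheaper models. On the business side, OpenAI's first-quarter gross margin was 39%, and it is targeting 52% by year-end. Lower inference costs could improve margins, raise ChatGPT usage limits, or ease API pricing pressure. OpenAI's moat is shifting toward inference and cost advantages, especially relative to Anthropic.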
OpenAI has reportedly found new inference optimizations that more than halved the cost of running its models.
According to The Information, engineers told colleagues this month that, at one point, the techniques let OpenAI serve ChatGPT to logged-out visitors (those without a free or paid account) on only a couple hundred Nvidia GPUs.
The exact method is unclear. It could involve quantization, KV caching, batching, routing simpler queries to cheaper models, or some mix of all of those.
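To make one of these candidates concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It illustrates the general technique only; nothing in the report says OpenAI uses this scheme, and the function names and sample weights are made up for the example.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to ints in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Rounding loses at most half a quantization step per weight.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight is stored in one byte instead of two or four, which reduces memory traffic per generated token. The price is a rounding error of at most half a quantization step per weight.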
The business angle is bigger than the technical detail: OpenAI ended Q1 with a 39% gross margin and wants to reach 52% by year-end. Lower inference costs give it room to improve margins, raise ChatGPT usage limits, or ease API pricing pressure for developers.
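A rough back-of-the-envelope calculation shows how a cost cut of this size relates to the margin target. The sketch below assumes revenue stays flat and that the reported saving applies to some share of cost of revenue. That share is a hypothetical parameter, not a reported figure, and the 50% cut is the floor implied by "more than halved".

```python
def gross_margin_after_cut(margin, inference_share, cost_cut):
    """New gross margin when `inference_share` of cost of revenue drops by `cost_cut`.

    Assumes revenue is unchanged; inference_share is a hypothetical input.
    """
    cost_ratio = 1.0 - margin  # cost of revenue as a fraction of revenue
    new_cost_ratio = cost_ratio * (1.0 - inference_share * cost_cut)
    return 1.0 - new_cost_ratio


def share_needed(margin, target, cost_cut):
    """Share of cost of revenue that must see `cost_cut` to reach `target` margin."""
    return (target - margin) / ((1.0 - margin) * cost_cut)


# If every dollar of cost of revenue were halved (an upper bound, not a claim):
best_case = gross_margin_after_cut(0.39, 1.0, 0.5)   # about 0.695

# Share of costs that would need the 50% cut to move from 39% to 52%:
required_share = share_needed(0.39, 0.52, 0.5)        # about 0.426

print(f"best case margin: {best_case:.1%}")
print(f"share of costs needing the cut: {required_share:.1%}")
```

Under these assumptions, halving costs on roughly 43% of cost of revenue would be enough to reach the 52% target. Anything beyond that leaves room for higher usage limits or lower API prices.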