# OpenAI Cuts Inference Costs by More Than Half on Some Models; Logged-Out ChatGPT Ran on Only a Couple Hundred GPUs

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-07-01 05:15
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmr15w2vg0067slh4mun44zqy
- Original link: https://x.com/rohanpaul_ai/status/2072066442623852965

## AI Summary

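The Information reports that OpenAI has cut inference costs on some existing models by more than half, and that logged-out ChatGPT traffic ran on only a couple hundred Nvidia GPUs. Possible techniques include quantization, KV-cache optimization, batching, speculative decoding, and routing simple queries. If true, this would be a core competitive lever that could raise gross margins, expand usage limits, or ease pressure on API pricing. For context, OpenAI's adjusted gross margin fell from 40% in 2024 to 33% in 2025 as inference costs quadrupled. Gross margin is expected to recover to 39% in Q1 2026, with a year-end target of 52%. Anthropic's gross margin is about 44%, so frontier labs have yet to reach the economics of mature software companies.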

## Body

The Information reports that OpenAI has cut inference costs by more than half on some existing models, while logged-out ChatGPT traffic ran on only a couple hundred Nvidia GPUs.

The obvious guesses include quantization, KV-cache changes, batching, speculative decoding, and cheaper routing of easy queries.

If true, it will be a huge core competitive lever: lower cost can raise margins, expand usage limits, or reduce pressure on API pricing.

For some context, OpenAI's adjusted gross margin fell from 40% in 2024 to 33% in 2025, after inference costs quadrupled.

Some reporting now puts Q1 2026 at 39%, with a 52% target by year-end.

Anthropic looks similar at roughly 44%, so frontier labs remain far below mature software economics.
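
To get a feel for how a cost cut of this size could move margins, here is a minimal back-of-the-envelope sketch. It is not from the report. It assumes revenue stays flat and that the cut applies to all inference spend, whereas the report only covers some existing models. The inference share of cost of revenue is a hypothetical input, since the source does not give one. The function name and parameters are illustrative.

```python
def margin_after_cost_cut(gross_margin, inference_share_of_cogs, cost_cut):
    """Estimate gross margin after a cut in inference cost.

    gross_margin: current gross margin as a fraction of revenue (0.33 for 33%).
    inference_share_of_cogs: ASSUMED fraction of cost of revenue that is
        inference spend (hypothetical; not stated in the source).
    cost_cut: fraction by which inference cost falls (0.5 for "half").
    Revenue is normalized to 1.0 and held constant.
    """
    cogs = 1.0 - gross_margin                  # cost of revenue per unit of revenue
    inference = cogs * inference_share_of_cogs # portion of that cost spent on inference
    new_cogs = cogs - inference * cost_cut     # remove the saved inference spend
    return 1.0 - new_cogs


if __name__ == "__main__":
    # Start from the 33% figure cited for 2025 and try a few assumed shares.
    for share in (0.25, 0.50, 0.75):
        new_margin = margin_after_cost_cut(0.33, share, 0.5)
        print(f"inference share {share:.0%}: margin {new_margin:.1%}")
```

The takeaway is only directional: the larger the share of cost that inference represents, the more a halving of that cost lifts the margin, and the real effect would be smaller if the savings apply to only part of the traffic.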

---

theinformation.com/newsletters/ai-agenda/openai-discovers-new-way-cut-inference-costs-half