OpenAI reportedly cut response costs for guest ChatGPT users by more than half
Read the original: the-decoder.com
OpenAI engineers told colleagues earlier this month that they'd managed to cut inference costs—the expense of running existing AI models—by more than half. That's according to a person familiar with the discussions, as reported by The Information.
OpenAI applied the new optimizations to ChatGPT, specifically for visitors who don't have an account. The number of Nvidia GPUs needed to serve those users dropped to just a few hundred. It's not clear how many were required before or what techniques OpenAI used to pull it off. Guest users can only access a very limited set of ChatGPT features, so whether these gains would carry over to the full product is an open question.
DeepSeek also just released a new open-source method that can speed up inference requests by 60 to 85 percent. The freed-up resources could go toward scaling services, better models, faster responses, or bigger margins. But since data center buildouts are moving slowly, gains like these will probably give labs more breathing room rather than cut into chip demand.
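To put these percentages in perspective, here is a back-of-the-envelope sketch of what they could mean for hardware. It rests on two assumptions the article does not state: that "speed up by X percent" means each GPU serves X percent more requests, and that serving cost scales with GPU-hours. Both helper functions are hypothetical illustrations, not anything published by OpenAI or DeepSeek.

```python
# Back-of-the-envelope arithmetic for the figures in the article.
# Assumptions (not stated in the source): "speed up by X percent" means each GPU
# serves X percent more requests, and serving cost scales with GPU-hours.


def gpu_fraction_after_speedup(speedup_pct: float) -> float:
    """Share of the original GPUs needed to serve the same load after a speedup."""
    return 1.0 / (1.0 + speedup_pct / 100.0)


def efficiency_gain_from_cost_cut(cost_cut: float) -> float:
    """Throughput-per-dollar multiplier implied by cutting cost by `cost_cut` (0..1)."""
    if not 0.0 <= cost_cut < 1.0:
        raise ValueError("cost_cut must be in [0, 1)")
    return 1.0 / (1.0 - cost_cut)


if __name__ == "__main__":
    for pct in (60, 85):  # low and high ends of the reported DeepSeek range
        frac = gpu_fraction_after_speedup(pct)
        print(f"{pct}% speedup -> {frac:.1%} of the GPUs, {1 - frac:.1%} freed")
    # prints "60% speedup -> 62.5% of the GPUs, 37.5% freed"
    # prints "85% speedup -> 54.1% of the GPUs, 45.9% freed"

    # A cut of exactly half doubles throughput per dollar; "more than half" means more.
    print(f"50% cost cut -> {efficiency_gain_from_cost_cut(0.5):.1f}x efficiency")
    # prints "50% cost cut -> 2.0x efficiency"
```

Under these assumptions, even the top of DeepSeek's range frees a bit under half the GPUs for the same load, while a cost cut of more than half, like the one reported for OpenAI, implies more than a 2x efficiency gain.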