swyx 🔜 @aiDotEngineer (@swyx)

2026-06-28 03:16 · 4 hours ago

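AI Summary

swyx cites OpenAI researcher Noam Brown's view that any eval report should hold the inference budget constant. Because open models yield far more tokens per dollar than closed-model APIs, anyone releasing an open model should report thinking levels by dollar cost on mainstream inference providers rather than by token count. The view comes from a podcast between @saranormous and Noam Brown, in which they discuss the consequences of large-scale test-time compute (a model being given a $10 million budget to work on a single task) and explore the breakdown of benchmarks, scaling compute budgets, capability growing with spend, and safety.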

An interesting way to take Noam at his word in regards to always keeping a constant inference budget for any eval reporting -

is that open models have a lot more dollar per token mileage than closed model APIs. So anyone launching an open model today or situationally incentivized toward open models should obviously report thinking levels measured by dollar inference on popular inference providers, instead of by number of tokens on the x axis
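To make the suggestion concrete, here is a minimal sketch of re-indexing an eval curve from tokens to dollars. Everything in it is an assumption for illustration: the helper names, the two per-million-token prices, and the sample eval points are hypothetical and do not come from the post or from any real provider's price list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalPoint:
    thinking_tokens: int  # reasoning/output tokens spent per task
    score: float          # benchmark score at that thinking level


def dollars_per_task(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a per-task token count into a per-task dollar cost."""
    return tokens / 1_000_000 * usd_per_million_tokens


def reindex_by_cost(points, usd_per_million_tokens):
    """Return (dollars, score) pairs sorted by cost, usable as an x axis."""
    return sorted(
        (dollars_per_task(p.thinking_tokens, usd_per_million_tokens), p.score)
        for p in points
    )


def tokens_for_budget(budget_usd: float, usd_per_million_tokens: float) -> int:
    """Whole tokens a fixed dollar budget buys at a given price."""
    return int(budget_usd * 1_000_000 // usd_per_million_tokens)


# Hypothetical prices, chosen only to illustrate the gap.
OPEN_USD_PER_M = 0.50     # assumed price for an open model on a hosted provider
CLOSED_USD_PER_M = 10.00  # assumed price for a closed-model API

# Hypothetical eval points for an open model.
open_points = [EvalPoint(10_000, 0.40), EvalPoint(100_000, 0.55)]
open_curve = reindex_by_cost(open_points, OPEN_USD_PER_M)

# At the same $1 per-task budget, the two models get very different token counts.
open_tokens = tokens_for_budget(1.0, OPEN_USD_PER_M)
closed_tokens = tokens_for_budget(1.0, CLOSED_USD_PER_M)
```

Under these assumed prices, a fixed per-task dollar budget buys the open model 20 times as many thinking tokens, which is why a dollar-denominated x axis would favor it compared with a token-denominated one.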

sarah guo: Really fun to hang again with my friend 🃏 @polynoamial (OpenAI research scientist, our first guest ever on @NoPriorsPod in early 2023) to talk about the implic...

Expert Opinion · Open-Source Ecosystem · Evals/Benchmarks
