IMO DeepSeek v4 demonstrated utter confidence and competence by not benchmaxxing, not focusing on some BS final run cost, not even spending inference-optimal compute.
just showed up, demonstrated SOTA long context efficiency techniques (CSA, HCA, mHC; flash at 8% of the cost of pro, which itself is 14% of the cost of opus), dropped the best open base models in the world, peaced out.
BYO posttraining. leave that to the agent labs to pick up the scraps. bravo.
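
quick arithmetic on the cost claim above, as a rough sketch: if flash is 8% of pro and pro is 14% of opus, the two ratios multiply. this assumes both percentages are measured on the same basis (e.g. price per token), which the post doesn't specify; `chained_cost_ratio` is just an illustrative helper name.

```python
def chained_cost_ratio(*ratios: float) -> float:
    """Multiply a chain of relative-cost ratios into one overall ratio."""
    result = 1.0
    for r in ratios:
        result *= r
    return result


# Figures quoted in the post (assumed to share the same cost basis).
FLASH_VS_PRO = 0.08  # flash at 8% of the cost of pro
PRO_VS_OPUS = 0.14   # pro at 14% of the cost of opus

flash_vs_opus = chained_cost_ratio(FLASH_VS_PRO, PRO_VS_OPUS)
print(f"flash vs opus: {flash_vs_opus:.2%}")
```

so under those assumptions flash lands at roughly 1% of the cost of opus.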