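Last week saw the release of three leading open weights models: Kimi K2.6, MiMo V2.5 Pro and DeepSeek V4 Pro. They score 52-54 on the Artificial Analysis Intelligence Index, with the best of them within 6 points of the top proprietary model, GPT-5.5 at 60, a marked improvement over the open weights models of a year ago, which scored 22. All three are trillion-parameter-scale MoE models. However, open weights models still trail proprietary models noticeably in complex reasoning, agentic coding and knowledge accuracy. For example, they score far lower on specialized evaluations such as HLE, CritPt and TerminalBench Hard, and in the Omniscience evaluation DeepSeek V4 Pro's hallucination problem is especially pronounced.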
All three leading open weights models were released last week. Progress continues for open weights models alongside proprietary ones, with the gap to GPT-5.5, the leading proprietary model, sitting at 6 points on the Artificial Analysis Intelligence Index.
@Kimi_Moonshot's Kimi K2.6 (Reasoning) and @Xiaomi's MiMo V2.5 Pro (Reasoning) tie as the leading open weights models on the Artificial Analysis Intelligence Index at 54, with @deepseek_ai's DeepSeek V4 Pro (Reasoning, Max Effort) at 52. This places the best open weights models within 3-6 points of the leading proprietary models: @OpenAI's GPT-5.5 (xhigh) at 60, and @Google's Gemini 3.1 Pro Preview and @AnthropicAI's Claude Opus 4.7 (Adaptive Reasoning, Max Effort) at 57.