Gemini 3 Pro was the first model to score at least 23% on ARC-AGI-2, which it did in November 2025 (its actual score was 31%).
So the 8–12 month gap between closed-weights and open-weights models still seems to hold. But they are also more jagged: better at some tasks, worse at others.