I took the new AA-Briefcase scores from @ArtificialAnlys (basically having the AI do complex, multi-week consulting gigs) and graphed the frontier curves for open and closed models: 1) Surprise, rapid gains! 2) The open-weights gap is clear. https://artificialanalysis.ai/evaluations/aa-briefcase