Given the messy naming scheme used by all the AI companies, I caused a chart to be made showing the gain in GPQA per 0.1 version in model names (estimated, since model names skip version numbers).
There has never been a more misnamed model than Claude 3.7; it should have been 4.4. https://t.co/ZynramTEpG