Haider. @haider1

2026-04-08 06:30

i still can't get over this

look at those benchmark results:

swe-bench verified: mythos 93.9% vs opus 4.6 80.8%

swe-bench pro: mythos 77.8% vs opus 4.6 53.4%

swe-bench multilingual: mythos 87.3% vs opus 4.6 77.8%

swe-bench multimodal: mythos 59.0% vs opus 4.6 27.1%

terminal-bench 2.0: mythos 82.0% vs opus 4.6 65.4%
