i still can't get over this
look at those benchmark results:
swe-bench verified: mythos 93.9% vs opus 4.6 80.8%
swe-bench pro: mythos 77.8% vs opus 4.6 53.4%
swe-bench multilingual: mythos 87.3% vs opus 4.6 77.8%
swe-bench multimodal: mythos 59.0% vs opus 4.6 27.1%
terminal-bench 2.0: mythos 82.0% vs opus 4.6 65.4%