Hao AI Lab (@haoailab)

2025-07-25 03:11

[Lmgame Bench] 🧐 Kimi-k2-0711-preview shows stellar performance on math, coding, and tool-using agentic benchmarks. But we found that gaming environments still pose a challenge for non-reasoning models like Kimi-k2: on Lmgame Bench, it ranks only #18 out of the 19 models evaluated on our leaderboard.

Agents, reasoning, evaluation/benchmarks
