[Lmgame Bench] 🧐 Kimi-k2-0711-preview shows stellar performance on math, coding, and tool-use agentic benchmarks. But we found that gaming environments still pose a challenge for non-reasoning models like Kimi-k2: on Lmgame Bench, it ranks only #18 out of the 19 models evaluated on our leaderboard.