Qwen (@Alibaba_Qwen)

2026-05-22 18:04 · 41 days ago

AI Summary

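A recent test of frontier AI models on a real agentic task shows Qwen 3.7-Max leading on both results and cost. The task required each model to autonomously write, and then iteratively improve, a Tetris bot that trains itself. Over 10 rounds of self-improvement, Qwen 3.7-Max spent only $1.32 and improved the bot's performance by 56%. By comparison, Claude Opus 4.7 spent $12.15 for a 28% improvement, and GPT-5.5 spent $2.85 for a 7% improvement. The results suggest that Qwen Max is highly cost-effective and strong at self-improvement in complex agentic loops that demand long stretches of autonomous reasoning, code reading, and iteration.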
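To make the cost-effectiveness comparison concrete, here is a minimal sketch that divides each reported improvement by its reported cost. The "percentage points per dollar" metric and the names `results` and `gain_per_dollar` are illustrative assumptions, not something the original test defines; only the dollar and percentage figures come from the summary.

```python
# Figures as reported in the summary: (cost in USD, improvement in percent).
results = {
    "Qwen 3.7-Max": (1.32, 56),
    "Claude Opus 4.7": (12.15, 28),
    "GPT-5.5": (2.85, 7),
}


def gain_per_dollar(cost_usd, gain_pct):
    """Hypothetical metric: percentage points of improvement per dollar spent."""
    return gain_pct / cost_usd


# Rank models from most to least improvement per dollar.
ranking = sorted(results, key=lambda m: gain_per_dollar(*results[m]), reverse=True)

for name in ranking:
    cost, gain = results[name]
    print(f"{name}: {gain_per_dollar(cost, gain):.1f} points per dollar")
```

On this rough measure Qwen 3.7-Max comes out at about 42 points per dollar, while GPT-5.5 and Claude Opus 4.7 both land between 2 and 3. The ratio ignores absolute gain, so it should be read alongside the raw percentages rather than in place of them.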

👀👀

atomic.chat: Qwen 3.7-max beats Opus 4.7 and GPT-5.5. We tested three frontier models on a real agentic task: write a Tetris bot that plays the game and trains itself. Each m...

Agents · Reasoning · Evaluation/Benchmarks
