Tmax: A Simple Recipe for Terminal Agents

2026-06-22 08:00 · 11 days ago

AI Summary

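Tmax is the strongest open-source RL training recipe for terminal agents to date. With only 9B parameters, it scores 27% on the downstream benchmark Terminal-Bench 2.0, outperforming much larger models from prior work. The research team generates data by combining difficulty control, personas, and verifier diversification, and open-sources a dataset more than 2.5x larger than any previously released terminal-agent dataset. Using this data, they train open-weight models with a simple, outcome-only RL recipe. The code, data, and models are all open-sourced.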

Original Abstract · Untranslated

Terminal-using agents have quickly become the most popular downstream application of language models (LMs). Despite their prevalence, relatively little academic work has examined RL-based training of these models, likely due to difficult benchmarks, a lack of data, and a lack of simple baseline recipes. We present Tmax, the strongest open RL recipe for terminal agents to date, bringing open data recipes closer to the frontier. While simple, our recipe achieves 27% on Terminal-Bench 2.0 with only 9B parameters, outperforming much larger models from prior work. Concretely, we generate data using a novel taxonomy, combining difficulty control, personas, and verifier diversification, which allows us to cheaply generate large amounts of terminal environments for RL and SFT training. We open-source our terminal dataset, which is over 2.5x larger than previously released terminal-agent datasets. We then train open-weight models using RL with our data, using a simple, outcome-only recipe. We release our data, models, and code as a strong baseline for future open academic work on terminal agents at https://github.com/hamishivi/tmax.
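
The abstract names three data-generation axes (difficulty control, personas, verifier diversification) and an outcome-only training signal, but gives no implementation detail. The following is a minimal illustrative sketch of how such a setup might look, not the authors' code. The axis values, the `TaskSpec` class, the `sample_task_specs` helper, and the binary form of `outcome_reward` are all hypothetical assumptions introduced here for illustration.

```python
import itertools
import random
from dataclasses import dataclass

# Hypothetical axis values: the abstract names the axes, not their contents.
DIFFICULTIES = ["easy", "medium", "hard"]
PERSONAS = ["sysadmin", "data scientist", "student"]
VERIFIERS = ["exit_code", "file_contents", "stdout_match"]


@dataclass(frozen=True)
class TaskSpec:
    """One point in the (difficulty, persona, verifier) grid."""
    difficulty: str
    persona: str
    verifier: str


def sample_task_specs(n: int, seed: int = 0) -> list[TaskSpec]:
    """Draw n distinct combinations from the full cross-product of axes."""
    rng = random.Random(seed)
    grid = [
        TaskSpec(*combo)
        for combo in itertools.product(DIFFICULTIES, PERSONAS, VERIFIERS)
    ]
    # random.sample raises ValueError if n exceeds the grid size.
    return rng.sample(grid, n)


def outcome_reward(verifier_passed: bool) -> float:
    """Outcome-only reward (assumed binary here): score only the final check."""
    return 1.0 if verifier_passed else 0.0
```

In a sketch like this, each sampled `TaskSpec` would seed the generation of one terminal environment, and the reward would depend only on whether that environment's verifier passes at the end of an episode, with no credit assigned to intermediate steps.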

HuggingFace Daily Papers (trending community papers)

Read the original at arxiv.org
