# Spreadsheet-RL: Improving the Agentic Capabilities of Large Language Models on Real-World Spreadsheet Tasks via Reinforcement Learning

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-21 08:00
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmpgadnmr0dqssljw48b4ucrt
- Original link: https://arxiv.org/abs/2605.22642

## AI Summary

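This paper presents Spreadsheet-RL, a reinforcement learning fine-tuning framework for training specialized spreadsheet agents in a realistic Microsoft Excel environment. The framework includes a pipeline that automatically collects paired start-goal spreadsheets from online forums, and it releases the Domain-Spreadsheet benchmark dataset, which covers domains such as finance and supply chain. Its core Spreadsheet Gym environment exposes rich Excel functionality through a Python sandbox and comes with a dedicated tool set and routing rules. Experiments show that Spreadsheet-RL significantly improves model performance: the Qwen3-4B model's Pass@1 rises from 12.0% to 23.4% on SpreadsheetBench and from 8.4% to 17.2% on Domain-Spreadsheet, demonstrating its potential for spreadsheet automation and for broader data-interaction tasks.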

## Main Text

Spreadsheet systems (e.g., Microsoft Excel, Google Sheets) play a central role in modern data-centric workflows. As AI agents grow increasingly capable of automating complex tasks, such as controlling computers and generating presentations, building an AI-driven spreadsheet agent has emerged as a promising research direction. Most existing spreadsheet agents rely on specialized prompting over general-purpose LLMs; while this design shows promise on simple spreadsheet operations, it struggles to manage the complex, multi-step workflows typical of real-world applications. We introduce Spreadsheet-RL, a reinforcement learning (RL) fine-tuning framework designed to train specialized spreadsheet agents within a realistic Microsoft Excel environment. Spreadsheet-RL features an automated pipeline for scalable collection of paired start-goal spreadsheets from online forums, as well as domain-specific evaluation tasks in areas such as finance and supply chain management, which we compile into the new Domain-Spreadsheet benchmark dataset. It also includes a Spreadsheet Gym environment designed for multi-turn RL: Spreadsheet Gym exposes extensive Excel functionality through a Python sandbox, along with a refined harness that incorporates a comprehensive tool set and carefully designed tool-routing rules for spreadsheet tasks. Through comprehensive experiments, we show that Spreadsheet-RL substantially enhances AI agents' performance on both general and domain-specific spreadsheet tasks: it improves Qwen3-4B-Thinking-2507's Pass@1 on SpreadsheetBench from 12.0% to 23.4%, and raises Pass@1 from 8.4% to 17.2% on our curated Domain-Spreadsheet dataset. These results highlight Spreadsheet-RL's strong potential for generalization and real-world adoption in spreadsheet automation and, more broadly, its promise for advancing LLM-based interactions with data interfaces in everyday work.
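To put the reported Pass@1 numbers in perspective, the sketch below computes the absolute and relative gains from the four figures quoted in the abstract. It is an illustration only: the abstract does not describe the evaluation protocol, so `pass_at_1` assumes the common reading of Pass@1 (one sampled attempt per task, scored pass or fail), and the function names `pass_at_1` and `gain` are hypothetical helpers, not part of Spreadsheet-RL.

```python
def pass_at_1(first_attempt_passed):
    """Percentage of tasks whose single sampled attempt passed.

    Assumption: one attempt per task, scored True (pass) or False (fail).
    """
    if not first_attempt_passed:
        raise ValueError("need at least one task result")
    return 100.0 * sum(first_attempt_passed) / len(first_attempt_passed)


def gain(before, after):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    return after - before, 100.0 * (after - before) / before


# Pass@1 figures (before, after RL fine-tuning) as quoted in the abstract.
reported = {
    "SpreadsheetBench": (12.0, 23.4),
    "Domain-Spreadsheet": (8.4, 17.2),
}

for name, (before, after) in reported.items():
    abs_gain, rel_gain = gain(before, after)
    print(f"{name}: +{abs_gain:.1f} points, {rel_gain:.0f}% relative")
```

On these figures, Pass@1 roughly doubles on both benchmarks, although the absolute scores remain below 25%.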
