# A Benchmark and Framework for Evaluating Next-Action Prediction in Spreadsheets

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-11 08:00
- AIHOT score: 38
- AIHOT link: https://aihot.virxact.com/items/cmqjhshag03g9slmhvkfdtzxv
- Original link: https://arxiv.org/abs/2606.13802

## AI Summary

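Spreadsheets lack features that predict a user's next actions, and this study proposes a new evaluation benchmark to address that gap. The authors manually curate 52 action sequences (12K actions in total) that recreate spreadsheets from public corpora, seeded by parametrized heuristics and LLM refinement. The online evaluation expects a prediction from the model after each user action, accepts or rejects that prediction, updates the remaining future actions on acceptance, and repeats until the target spreadsheet is reached. Baseline predictors include zero-shot LLMs, fine-tuned SLMs, and classical models. The experiments analyze key properties such as saved actions and false positives, efficiency, user profiles, triggers, and context.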

## Main Text

Predictive code completion greatly accelerates how quickly developers work. In spreadsheets, despite being much more common, such auto-completion features are virtually non-existent. To address this gap, we introduce a benchmark for systems that observe a sequence of user actions in a spreadsheet and predict future actions. Two challenges are (1) the absence of edit histories in public spreadsheet corpora and (2) the complex space of spreadsheet actions (spatial, temporal, composite). To address (1), we manually curate 52 sequences of 12K actions that recreate spreadsheets from public corpora, seeded by parametrized heuristics and LLM refinement. To address (2), we propose an online evaluation that expects a prediction after each user action, accepts or rejects that prediction, updates the future actions upon acceptance, and repeats this until the target spreadsheet is obtained. We use multiple baseline predictors (including zero-shot LLMs, fine-tuned SLMs, and classical models) and analyze different properties that our benchmark teaches us, including but not limited to: properties of saved actions and false positives, efficiency, effect of user profiles, effect of triggers, and effect of context.
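The online evaluation loop described above can be illustrated with a minimal Python sketch. It is not the paper's implementation. The action model (a column, row, and integer value), the acceptance rule (a prediction is accepted only if it exactly matches the next remaining user actions), and the names `evaluate_online` and `series_predictor` are all assumptions made here for illustration. The benchmark's real action space is richer (spatial, temporal, composite), and its acceptance logic may differ.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified action: (column letter, row number, integer value).
Action = Tuple[str, int, int]
Predictor = Callable[[List[Action]], List[Action]]


def series_predictor(history: List[Action]) -> List[Action]:
    """Toy baseline: continue an arithmetic series down a column."""
    if len(history) < 2:
        return []
    (c1, r1, v1), (c2, r2, v2) = history[-2], history[-1]
    if c1 == c2 and r2 == r1 + 1:
        return [(c2, r2 + 1, v2 + (v2 - v1))]
    return []


def evaluate_online(actions: List[Action], predictor: Predictor):
    """Replay a recorded sequence, asking for a prediction after each user action."""
    remaining = list(actions)            # actions the user still has to perform
    history: List[Action] = []           # everything applied to the sheet so far
    sheet: Dict[Tuple[str, int], int] = {}
    stats = {"user_actions": 0, "saved_actions": 0, "rejected": 0}

    def apply(action: Action) -> None:
        col, row, value = action
        sheet[(col, row)] = value
        history.append(action)

    while remaining:
        # The user performs the next action themselves.
        apply(remaining.pop(0))
        stats["user_actions"] += 1

        prediction = predictor(history)
        if not prediction:
            continue  # the predictor chose not to trigger

        # Assumed acceptance rule: the prediction must equal the next remaining actions.
        if remaining[: len(prediction)] == prediction:
            for action in prediction:
                apply(action)
            # Update the future actions: the accepted ones no longer need to be typed.
            remaining = remaining[len(prediction):]
            stats["saved_actions"] += len(prediction)
        else:
            stats["rejected"] += 1  # counted here as a false positive

    return stats, sheet


demo = [("A", 1, 1), ("A", 2, 2), ("A", 3, 3), ("A", 4, 4), ("B", 1, 10)]
stats, sheet = evaluate_online(demo, series_predictor)
print(stats)  # {'user_actions': 4, 'saved_actions': 1, 'rejected': 1}
```

In this toy run the predictor saves the user one action (filling `A3`) and produces one rejected prediction (guessing `A5` when the user moves on to `B1`). Because accepted predictions are required to match a prefix of the remaining actions, the loop always ends with the target spreadsheet. A more permissive acceptance rule would need extra care to preserve that guarantee.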