# AgentOdyssey: Open-Ended Long-Horizon Text Game Generation for Test-Time Continual Learning Agents

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-29 08:00
- AIHOT score: 43
- AIHOT link: https://aihot.virxact.com/items/cmqzjr3n5007cslqvjgeclmpl
- Original link: https://arxiv.org/abs/2606.24893

## AI Summary

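AgentOdyssey is an evaluation framework that procedurally generates open-ended text games to measure how well agents learn continually at test time. The games feature rich entities, world dynamics, and long-horizon tasks, and require agents to interleave learning and inference during deployment. The evaluation tracks game progress and also diagnoses world knowledge acquisition, episodic memory, exploration diversity, and model cost. Experiments show that even agents driven by the strongest base models fall far below human performance, while short-term memory notably improves multiple agent paradigms.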

## Full Text

For agents to learn continuously from interaction with the world at test time, they must be able to explore effectively, acquire new world knowledge and skills, retain relevant episodic experiences, and plan over long horizons. To evaluate these key abilities of test-time continual learning agents, we introduce AgentOdyssey, a novel evaluation framework that procedurally generates open-ended text games with rich entities, world dynamics, and long-horizon tasks. Critically, AgentOdyssey goes beyond the conventional machine learning assumption that learning does not occur at test time by placing agents in a continuous, long-horizon setting that interleaves learning and inference throughout deployment. We further propose a multifaceted evaluation methodology that measures not only game progress but also offers diagnostic tests on world knowledge acquisition, episodic memory, object and action exploration, action diversity, and model cost. We evaluate diverse agent paradigms in the generated games. Our experimental results reveal critical limits in agents' key abilities, as well as factors that influence their meaningful horizon. Although performance scales with stronger base models, even the top agent remains far below human performance, leaving substantial headroom for improvement. Among agent mechanisms, we find that short-term memory benefits multiple agent paradigms and is an important component of agent test-time training.
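To make the idea of an agent that interleaves acting with a short-term memory more concrete, here is a minimal Python sketch. It is an illustration only, not the AgentOdyssey implementation: `ShortTermMemory`, `run_episode`, the `env_step` and `policy` callables, and the fixed-capacity window are all hypothetical names and assumptions that do not come from the paper.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class ShortTermMemory:
    """Hypothetical bounded buffer of the most recent (observation, action) pairs."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) silently drops the oldest entry once full.
        self._steps: Deque[Tuple[str, str]] = deque(maxlen=capacity)

    def add(self, observation: str, action: str) -> None:
        self._steps.append((observation, action))

    def as_context(self) -> str:
        # Render the remembered steps as plain text a policy could condition on.
        return "\n".join(f"obs: {o} | act: {a}" for o, a in self._steps)

    def __len__(self) -> int:
        return len(self._steps)


def run_episode(
    env_step: Callable[[str], str],
    policy: Callable[[str, str], str],
    first_observation: str,
    memory: ShortTermMemory,
    max_steps: int,
) -> List[str]:
    """Run a toy text-game loop; the policy sees recent memory plus the new observation."""
    actions: List[str] = []
    observation = first_observation
    for _ in range(max_steps):
        action = policy(memory.as_context(), observation)
        memory.add(observation, action)  # record the step before moving on
        actions.append(action)
        observation = env_step(action)   # assumed: the game returns text
    return actions


if __name__ == "__main__":
    # Toy stand-ins for a game and an agent; both are assumptions for the demo.
    def toy_env(action: str) -> str:
        return f"you did {action}"

    def toy_policy(context: str, observation: str) -> str:
        return "look" if "look" not in context else "go north"

    mem = ShortTermMemory(capacity=2)
    print(run_episode(toy_env, toy_policy, "start", mem, max_steps=4))
```

In this sketch the memory window is the only state carried between steps, so changing `capacity` is a simple way to see how much recent history a policy has available. A real agent would replace `toy_policy` with a model call.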