Solving Physics Olympiad Problems with Reinforcement Learning on Physics Simulators

2026-04-13 08:00

AI Summary

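The research team uses physics simulators to generate random scenes and synthetic question-answer data, then trains large language models on that data with reinforcement learning so that they acquire physical reasoning ability. The method achieves zero-shot sim-to-real transfer: training on synthetic data alone raises model accuracy on International Physics Olympiad (IPhO) problems by 5-10 percentage points. The result indicates that physics simulators can serve as a scalable data source, helping models move past the limits of internet question-answer data and gain deep physical reasoning skills.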

Original abstract · untranslated

We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on synthetic simulated data improves performance on IPhO (International Physics Olympiad) problems by 5-10 percentage points across model sizes. These results demonstrate that physics simulators can act as scalable data generators, enabling LLMs to acquire deep physical reasoning skills beyond the limitations of internet-scale QA data. Code available at: https://sim2reason.github.io/.
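The pipeline in the abstract (sample a random scene, simulate it, turn the outcome into a question-answer pair, and score model answers for reinforcement learning) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the projectile scene, the parameter ranges, the toy integrator standing in for a real physics engine, and the names `sample_scene`, `simulate_range`, `make_qa`, and `reward` are hypothetical and are not taken from the authors' code.

```python
import math
import random

G = 9.81  # gravitational acceleration in m/s^2 (assumed constant)


def sample_scene(rng):
    """Draw a random projectile scene. The ranges are illustrative assumptions."""
    return {
        "speed": round(rng.uniform(5.0, 30.0), 1),       # launch speed, m/s
        "angle_deg": round(rng.uniform(15.0, 75.0), 1),  # angle above horizontal
    }


def simulate_range(scene, dt=1e-3):
    """Toy stand-in for a physics engine: step the motion until landing."""
    theta = math.radians(scene["angle_deg"])
    vx = scene["speed"] * math.cos(theta)
    vy = scene["speed"] * math.sin(theta)
    x, y = 0.0, 0.0
    while True:
        vy_next = vy - G * dt
        # Trapezoidal position update (exact for constant acceleration).
        y_next = y + 0.5 * (vy + vy_next) * dt
        if y_next < 0.0:
            # Linearly interpolate the landing point inside the last step.
            frac = y / (y - y_next)
            return x + vx * dt * frac
        x += vx * dt
        y, vy = y_next, vy_next


def make_qa(scene):
    """Turn a simulated scene into a synthetic question-answer pair."""
    question = (
        f"A ball is launched from level ground at {scene['speed']} m/s, "
        f"{scene['angle_deg']} degrees above the horizontal. Neglecting air "
        "resistance, how far from the launch point does it land (in metres)?"
    )
    return {"question": question, "answer": simulate_range(scene)}


def reward(predicted, target, rel_tol=0.01):
    """Binary verifiable reward: 1.0 if the prediction is within rel_tol."""
    return 1.0 if abs(predicted - target) <= rel_tol * abs(target) else 0.0


rng = random.Random(0)  # fixed seed so the sampled scene is reproducible
qa = make_qa(sample_scene(rng))
print(qa["question"])
print(f"simulated answer: {qa['answer']:.2f} m")
```

In a reinforcement learning loop, `reward` would be applied to the final numeric answer parsed from the model's response; that parsing step and the policy update itself are omitted here.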

HuggingFace Daily Papers (community trending papers)

Read the original · arxiv.org
