# AutoResearch AI：面向科学发现的AI驱动科研自动化

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-22 08:00
- AIHOT 分数：55
- AIHOT 链接：https://aihot.virxact.com/items/cmpm0ai190kfasl01d1mhf5g2
- 原文链接：https://arxiv.org/abs/2605.23204

## AI 摘要

本综述探讨AI系统如何将科学研究从提供孤立辅助，推向覆盖文献综述、假设生成、实验、验证和报告等环节的工作流自动化。提出了“AutoResearch”概念，即AI驱动的科研自动化发展谱系，其中“Vibe Research”代表人类主导的提示词辅助与验证阶段，而新兴的AI主导系统则试图协调更多发现环节，但尚未实现稳健自主性。当前系统在自主性、领域覆盖和验证机制上仍显碎片化，并面临证据保存、可复现性等挑战。文章围绕文献基础、假设形成、实验工具使用、反馈验证和报告交流五个工作流条件展开分析，并提出了从新颖性、有效性、影响、可靠性和溯源五个维度进行评估的框架。

## 正文

Scientific research is being reshaped by AI systems that move beyond isolated assistance toward longer-horizon workflows spanning literature grounding, hypothesis generation, experimentation, validation, reporting, and revision. This shift marks a transition from task-level AI for science to workflow-level research automation. Yet current systems remain fragmented, differing in autonomy, domain scope, execution environment, validation mechanism, and human oversight, while still struggling with evidence preservation, reproducibility, weak-direction rejection, provenance tracking, cross-domain robustness, and accountable scientific closure. This survey examines these developments through AutoResearch, defined as the developmental spectrum of AI-powered scientific workflow automation. Within it, Vibe Research denotes the human-steered region of prompt-based assistance and human-verified execution, whereas emerging AI-led systems coordinate larger portions of the discovery loop without achieving robust autonomy. We analyze how research systems redistribute control, evidence, execution, validation, and accountability across workflows and organize the field around five workflow conditions: literature and research grounding; hypothesis formation and planning; experimentation and tool use; feedback, validation, and review; and reporting and knowledge communication. We further synthesize AI scientist systems, mixed-initiative co-research frameworks, benchmarks, domain deployments, and open-source infrastructures. Finally, we propose five evaluation dimensions--novelty, validity, impact, reliability, and provenance--and show that AutoResearch autonomy is domain-conditioned, being more credible in structured, executable, and rapidly verifiable settings but limited in embodied, delayed, heterogeneous, ethical, or institutionally accountable contexts.