# AutoScientists: Self-Organizing Agent Teams for Long-Running Scientific Experimentation

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-27 08:00
- AIHOT score: 63
- AIHOT link: https://aihot.virxact.com/items/cmpovigyt09hyslv4tieo4vs4
- Original link: https://arxiv.org/abs/2605.28655

## AI Summary

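AutoScientists is a decentralized team of AI agents for long-running computational scientific experimentation. Agents interpret a shared experimental state, self-organize into teams around promising hypotheses, critique proposals before spending compute, and share both successes and failures to reduce redundant exploration. Under matched budgets, the system outperforms prior AI agents in three domains: biomedical machine learning, language-model training optimization, and protein fitness prediction. Specifically, it reaches a mean leaderboard percentile of 74.4% across the 24 BioML-Bench tasks, improving over the strongest prior AI agent by +8.33%. In GPT training optimization, it reaches the target 1.9x faster than Autoresearch and finds 7 accepted improvements from a starting champion where the single-agent approach finds none. In ProteinGym fitness prediction, a method it discovered for ACE2-Spike binding improves Spearman correlation by +12.5% over the current state-of-the-art model.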

## Main Text

Scientific research proceeds through iterative cycles of hypothesis generation, experiment design, execution, and revision. AI agents can automate parts of this process, but existing approaches typically follow a single research trajectory or coordinate through a central planner with fixed objectives. As a result, they struggle to sustain parallel exploration, adapt as experimental evidence changes, or preserve knowledge of failed directions over long-running experiments. We introduce AutoScientists, a decentralized team of AI agents for long-running computational scientific experimentation. Agents interpret a shared experimental state, self-organize into teams around promising hypotheses, critique proposals before using experimental compute, and share successes and failures to reduce redundant exploration. Under matched experimental budgets, AutoScientists improves over prior AI agents across biomedical machine learning, language-model training optimization, and protein fitness prediction. On BioML-Bench, spanning biomedical imaging, protein engineering, single-cell omics, and drug discovery, AutoScientists achieves a mean leaderboard percentile of 74.4% across 24 tasks, improving over the strongest AI agent by +8.33%. On GPT training optimization, AutoScientists reaches a target validation bits-per-byte 1.9x faster than Autoresearch and continues discovering improvements from a starting champion where the single-agent approach finds none (7 vs. 0 accepted improvements). On ProteinGym fitness prediction, AutoScientists discovers a method for ACE2-Spike binding that improves over the current state-of-the-art model by +12.5% in Spearman correlation. Applied without modification across all 217 ProteinGym assays, the same method improves over the prior state of the art by +6.5% (Spearman correlation).
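The coordination pattern described above (a shared experimental state, critique before compute is spent, and a record of failed directions) can be illustrated with a small Python sketch. This is not the paper's implementation: `SharedState`, `critique`, `run_round`, the vote quorum, and the rule that a run is accepted only if it beats the current best score are all assumptions made for illustration, and the hypothesis names and scores in the usage lines are made up.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """Hypothetical shared experimental state visible to every agent."""
    results: dict = field(default_factory=dict)  # hypothesis -> score of an accepted run
    failures: set = field(default_factory=set)   # hypotheses that ran but did not improve
    best_score: float = float("-inf")

    def already_explored(self, hypothesis: str) -> bool:
        # A hypothesis is explored if it was ever run, whether it succeeded or failed.
        return hypothesis in self.results or hypothesis in self.failures

    def record(self, hypothesis: str, score: float) -> bool:
        """Store an outcome; return True if it is accepted as an improvement."""
        if score > self.best_score:
            self.results[hypothesis] = score
            self.best_score = score
            return True
        self.failures.add(hypothesis)
        return False


def critique(state: SharedState, hypothesis: str, votes: list, quorum: int = 2) -> bool:
    """Assumed review rule: approve only new proposals with enough peer votes."""
    if state.already_explored(hypothesis):
        return False
    return sum(votes) >= quorum


def run_round(state: SharedState, proposals: list, experiment) -> list:
    """Run approved proposals; return the hypotheses that consumed compute."""
    executed = []
    for hypothesis, votes in proposals:
        if critique(state, hypothesis, votes):
            state.record(hypothesis, experiment(hypothesis))
            executed.append(hypothesis)
    return executed


# Illustrative usage with made-up hypotheses and scores.
scores = {"warmup": 0.6, "bigger-batch": 0.4, "new-optimizer": 0.7}
state = SharedState()
first = run_round(
    state,
    [("warmup", [1, 1, 0]), ("bigger-batch", [1, 1, 1]), ("new-optimizer", [1, 0, 0])],
    scores.get,
)
second = run_round(
    state,
    [("bigger-batch", [1, 1, 1]), ("new-optimizer", [1, 1, 0])],
    scores.get,
)
```

In this toy run, the second round skips `bigger-batch` because its earlier failure is already in the shared state, which is the kind of redundant exploration the abstract says shared failures are meant to prevent.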
