# SciOrch: Training a Lightweight 8B Model to Orchestrate Expert LLMs for Frontier Scientific Reasoning

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-14 23:45
- AIHOT score: 42
- AIHOT link: https://aihot.virxact.com/items/cmqj4xfyn000nslmhttvtib2d
- Original link: https://arxiv.org/abs/2606.15872

## AI Summary

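The SciOrch framework trains a lightweight 8B model to orchestrate multiple frontier large language models for scientific reasoning. Through API calls, it decomposes each question, delegates sub-problems to commercial models, and synthesizes a final answer. Training uses MCTS-based trajectory generation and GRPO-style optimization. On a 240-question test set (SGI-Reasoning and Scientists' First Exam), SciOrch reaches 56.66% average accuracy, exceeding the strongest single commercial model by 3.74% and the strongest multi-agent baseline by 3.33%, while its API cost is less than half that of multi-agent methods.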
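The decompose, delegate, and synthesize flow described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the paper's implementation: `orchestrate`, `Step`, `toy_plan`, `toy_synthesize`, and the expert names `model_a` and `model_b` are all hypothetical, and the experts are local stubs rather than real commercial API calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical type: an "expert" is any callable mapping a prompt to a text answer.
# In a real system this would wrap a commercial model's API; here it is a local stub.
Expert = Callable[[str], str]


@dataclass
class Step:
    sub_problem: str   # the piece of the question that was delegated
    expert_name: str   # which expert it was sent to
    answer: str        # what the expert returned


def orchestrate(
    question: str,
    plan: Callable[[str], List[Tuple[str, str]]],
    experts: Dict[str, Expert],
    synthesize: Callable[[str, List[Step]], str],
) -> Tuple[str, List[Step]]:
    """Decompose a question, delegate each sub-problem, and merge the answers."""
    trace: List[Step] = []
    # 1. Decompose: the planner returns (sub_problem, expert_name) pairs.
    for sub_problem, expert_name in plan(question):
        # 2. Delegate: call the selected expert on its sub-problem.
        answer = experts[expert_name](sub_problem)
        trace.append(Step(sub_problem, expert_name, answer))
    # 3. Synthesize: combine the collected answers into one final answer.
    return synthesize(question, trace), trace


# Toy stand-ins (assumptions) so the sketch runs without any network access.
def toy_plan(question: str) -> List[Tuple[str, str]]:
    return [
        (f"Derive the formula for: {question}", "model_a"),
        (f"Check units for: {question}", "model_b"),
    ]


def toy_synthesize(question: str, trace: List[Step]) -> str:
    return " | ".join(step.answer for step in trace)


toy_experts: Dict[str, Expert] = {
    "model_a": lambda prompt: "A:" + prompt,
    "model_b": lambda prompt: "B:" + prompt,
}

final_answer, trace = orchestrate("q", toy_plan, toy_experts, toy_synthesize)
```

In SciOrch the planning and synthesis roles are played by the trained 8B orchestrator itself; the sketch separates them into plain functions only to make the control flow visible.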

## Main Text

Frontier scientific reasoning remains a major challenge for large language models (LLMs), where even the strongest commercial systems fall short of expert-level performance. A closer look at model behavior reveals substantial complementarity that single-model evaluation hides: different frontier models excel on different question types, and no single model captures the full picture. We present SciOrch, a framework that trains a lightweight 8B model to orchestrate frontier LLMs for scientific reasoning. The orchestrator decomposes each question, delegates sub-problems to selected commercial models through API calls, and synthesizes a final answer. Training such an orchestrator is fundamentally harder than conventional agentic RL: each action triggers an API call that is expensive in both dollar cost and latency, making standard online rollouts infeasible. We address this with an MCTS-based approach, producing diverse orchestration trajectories, extracting per-node single-turn samples, and optimizing the orchestrator with GRPO-style training. On a 240-question test set spanning SGI-Reasoning and Scientists' First Exam, SciOrch reaches 56.66% average accuracy, outperforming the strongest single commercial model by 3.74% and the strongest multi-agent baseline by 3.33%. It also attains the best accuracy on both SGI and SFE with less than half the API cost of typical multi-agent methods.
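The abstract does not spell out the "GRPO-style" objective, so the following is only a sketch of the group-relative advantage normalization commonly associated with GRPO: rewards for a group of candidate responses to the same prompt are centered on the group mean and scaled by the group standard deviation. Treating the candidate actions sampled at one tree node as the group is an assumption made here for illustration, as are the function name `group_relative_advantages`, the `eps` parameter, and the use of the population standard deviation.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize rewards within one group: (r - group mean) / (group std + eps).

    Assumption: `rewards` holds one scalar reward per candidate response
    sampled for the same prompt (for example, the same search-tree node).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std is another common choice
    # eps keeps the division finite when every candidate gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical candidates at one node: two judged correct (1.0), two incorrect (0.0).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# A group where every candidate earns the same reward yields zero advantage throughout.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Candidates that beat their group's average receive a positive advantage and the rest a negative one, so no separate learned value model is needed to provide a baseline.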