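A Stanford paper argues that, under equal reasoning-token budgets, a single LLM usually solves multi-hop problems more effectively than a multi-agent system. The core idea is that a single agent keeps its full internal chain of thought, while a multi-agent system has to split that reasoning into message passing and handoffs; each handoff compresses information and loses some of it, which the paper formalizes with the Data Processing Inequality. Experiments across several models and datasets show that, with budgets matched, the single agent performs as well as or better than a range of multi-agent setups. The gains commonly attributed to multi-agent systems may come from extra compute or evaluation bias rather than from any architectural advantage. The paper recommends starting multi-hop reasoning from a strong single agent by default, and using multi-agent structure only as a repair strategy when the single agent's context degrades under interference.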
New Stanford paper argues that, under equal reasoning budgets, one LLM usually solves multi-hop problems better than many coordinated ones.
The core point is almost embarrassingly simple.
A single agent keeps the whole problem in one internal chain of thought, while a multi-agent system has to slice that chain into messages, summaries, and handoffs.
Every handoff is a compression step.
And once reasoning is compressed, whatever was dropped cannot be recovered by any later step, which is why the paper leans on the Data Processing Inequality as a formal explanation rather than just an empirical hunch.
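For readers who want the inequality itself, here is the standard textbook statement, followed by one way it might map onto handoffs. The symbols are illustrative assumptions, not the paper's notation: X stands for the full problem context and reasoning state, and M_1, ..., M_k stand for the messages passed at successive handoffs. The sketch also assumes each message depends on X only through the previous message.

```latex
% Data Processing Inequality (standard form):
% if X -> Y -> Z is a Markov chain, processing Y cannot add information about X.
\[
  X \to Y \to Z
  \quad\Longrightarrow\quad
  I(X;Z) \le I(X;Y)
\]

% Illustrative application to a chain of k handoffs
% (assumed Markov chain X -> M_1 -> M_2 -> ... -> M_k):
\[
  I(X;M_k) \le I(X;M_{k-1}) \le \cdots \le I(X;M_1) \le H(X)
\]
```

Under these assumptions, the information a downstream agent has about the original problem can only stay flat or shrink with each handoff. The inequality is an upper bound: it does not say how much is lost at any particular step.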