Google 新论文提出“验证债务”概念:AI 加快论文产出,但人工核查成为瓶颈。为此推出智能体验证(agentic verification)方案,并开发 Paper Assistant Tool 原型系统。该系统将论文拆解为多个部分,深入检查难点并汇总审稿意见,聚焦证明错误、实验漏洞、缺失对比等客观错误,而非直接给出接收/拒稿决策。在数学与计算机科学已知错误测试中,该工具比单次模型调用发现更多证明错误;在 STOC 和 ICML 的面向作者试点中,许多作者据此修复了严重理论缺陷或补充了实验。论文指出科学审稿可能需要独立 AI 栈以应对日益自动化的论文生成。
Big new paper release of Google for external agentic verification for science.
Science now needs AI review agents because AI is making papers faster than humans can check them.
The problem is that AI can help produce more research, but the slow part is still checking whether the work is actually correct.
The paper frames this as verification debt, where every faster research workflow creates more claims, proofs, experiments, and comparisons that someone still has to inspect.
Its main proposal is agentic verification, where AI agents help review papers by splitting them into parts, checking difficult sections deeply, and combining the findings into a review.