# Google's LEAP Framework Raises General-Purpose LLM Performance on Formal Math Proofs to 70%

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-06-05 06:09
- AIHOT score: 70
- AIHOT link: https://aihot.virxact.com/items/cmq02cili028usltrs03rag7g
- Original link: https://x.com/rohanpaul_ai/status/2062658115603218493

## AI Summary

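Google's new paper, LEAP, proposes an agentic framework that plans proofs, decomposes them into subgoals, reuses existing lemmas, and uses feedback from the Lean verifier, raising general-purpose LLM performance on formal mathematical proofs from under 10% to 70%. Conventional one-shot generation of a complete proof performs very poorly on long, hard problems, whereas LEAP stores the proof as a directed graph and plans first, then verifies step by step. On the Putnam 2025 competition, LEAP solved all 12 problems; on a Lean benchmark of 60 IMO-style problems, it achieved the jump from under 10% to 70%.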

## Body

Another great paper from Google.

Shows general LLMs can solve formal math by planning proofs and checking each step. Raised general LLM performance from under 10% to 70%.

A general LLM failed badly when asked to write full formal proofs in 1 try, but became much stronger when it planned, split the work into smaller claims, reused past claims, and learned from Lean's feedback.

The paper shows the weakness was not just the model's math ability, but the way it was being used: there was no structured interaction with a verifier.
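To make "structured interaction with a verifier" concrete, here is a minimal Python sketch of a retry loop in which each verifier error is fed into the next attempt. It is an illustration only, not the paper's implementation: `prove_with_feedback`, `toy_propose`, and `toy_verify` are hypothetical names, the proposer stands in for an LLM call, and the verifier stands in for Lean rather than calling it.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces (assumptions, not a real Lean or LLM API):
#   Verifier: proof text -> (accepted?, error message)
#   Proposer: (goal, last error or None) -> candidate proof text
Verifier = Callable[[str], Tuple[bool, str]]
Proposer = Callable[[str, Optional[str]], str]


def prove_with_feedback(goal: str, propose: Proposer, verify: Verifier,
                        max_attempts: int = 4) -> Optional[str]:
    """Retry a proof attempt, passing each verifier error to the next attempt."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        candidate = propose(goal, feedback)
        accepted, message = verify(candidate)
        if accepted:
            return candidate
        feedback = message  # the next attempt sees why this one failed
    return None  # budget exhausted without an accepted proof


def toy_verify(proof: str) -> Tuple[bool, str]:
    # Stand-in for Lean: reject any proof that still contains a placeholder.
    if "sorry" in proof:
        return False, "declaration uses 'sorry'"
    return True, ""


def toy_propose(goal: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM: the first attempt leaves a gap, the retry fills it.
    return "by sorry" if feedback is None else "by simp"


result = prove_with_feedback("n + 0 = n", toy_propose, toy_verify)
print(result)
```

A one-shot setup corresponds to `max_attempts=1` with no feedback; the loop differs only in letting the verifier's message shape the next attempt.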

The key idea is that the model does not try to write one giant perfect proof at once, because that usually fails on long and tricky problems.

Instead, LEAP stores the proof as a graph of goals and subgoals, so useful lemmas can be reused instead of rediscovered every time.
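The graph idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not LEAP's actual data structure: the class and method names are hypothetical, goals are keyed by their statement text, the graph is assumed acyclic, and a goal counts as closed once all of its subgoals are closed (the step that combines them is assumed to have been verified separately).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GoalNode:
    statement: str
    children: List[str] = field(default_factory=list)  # subgoal statements
    proved: bool = False  # True once a verifier has accepted a direct proof


class ProofGraph:
    """Hypothetical goal/subgoal graph; nodes are shared by statement text."""

    def __init__(self) -> None:
        self.nodes: Dict[str, GoalNode] = {}

    def add_goal(self, statement: str) -> GoalNode:
        # Reuse the existing node if this statement was already recorded.
        return self.nodes.setdefault(statement, GoalNode(statement))

    def add_subgoal(self, parent: str, child: str) -> None:
        self.add_goal(child)
        node = self.add_goal(parent)
        if child not in node.children:
            node.children.append(child)

    def mark_proved(self, statement: str) -> None:
        self.nodes[statement].proved = True

    def is_closed(self, statement: str) -> bool:
        # Assumes an acyclic graph; a cycle would recurse forever.
        node = self.nodes[statement]
        if node.proved:
            return True
        return bool(node.children) and all(self.is_closed(c) for c in node.children)


graph = ProofGraph()
graph.add_subgoal("T", "A")
graph.add_subgoal("T", "B")
graph.add_subgoal("A", "L")  # lemma L is needed by A ...
graph.add_subgoal("B", "L")  # ... and by B, but stored only once

closed_before = graph.is_closed("T")
graph.mark_proved("L")       # proving L once closes both branches
closed_after = graph.is_closed("T")
print(len(graph.nodes), closed_before, closed_after)
```

Because `L` is a single shared node, proving it once discharges it everywhere it appears, which is the reuse benefit described above.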

The authors tested LEAP on Putnam 2025 and a new Lean benchmark built from 60 IMO-style problems, where ordinary one-shot proof writing did very poorly.

LEAP solved all 12 Putnam 2025 problems and raised general LLM performance on the Lean IMO benchmark from under 10% to 70%.

----

Link - arxiv.org/abs/2606.03303

Title: "LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks"
