# FVSpec：真实世界基于属性测试的Lean挑战

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-31 08:00
- AIHOT 分数：51
- AIHOT 链接：https://aihot.virxact.com/items/cmqffbx8r00yosl2ag5m7m9hu
- 原文链接：https://arxiv.org/abs/2606.01008

## AI 摘要

从真实世界Python仓库抓取11039个基于属性测试（PBT），自动将其中的2772个翻译为9415个带`sorry`占位符的Lean 4规范（平均每个PBT约3个形式化）。翻译采用三智能体LLM流水线，需建模Python语义、推断命令式PBT中的逻辑属性并处理依赖类型编程。所有抓取代码、智能体代码及数据均已开源，为AI辅助真实软件形式化验证提供基准。

## 正文

We present a benchmark for evaluating AI models and agents on real-world formal software verification tasks. We first scrape 11,039 property-based tests (PBTs) from real-world Python repositories, then automatically translate 2,772 of them (25%) into 9,415 Lean 4 specifications with sorry placeholders (about 3 formalizations/PBT; we retain multiple attempts when none dominates on quality metrics). Translating PBTs into Lean specifications is challenging: it requires modeling Python semantics in Lean, inferring the logical property encoded in an imperative PBT, and handling the inherent difficulties of dependently-typed programming in a seldom-used language. We describe a three-agent LLM pipeline for transpiling PBTs into Lean specifications, evaluate coverage and quality metrics, and provide baselines for proof generation using several automated and model based approaches. All code (scraper and agents) and data (PBTs and Lean specifications) are open source. Our benchmark aims to drive progress on the underexplored problem of AI-assisted formal verification of real-world software, which is of increasing interest as AI produces more and more of the world's code.
