# Mistral AI 发布首个面向 Lean 4 的开源代码代理 Leanstral

- 来源：Mistral AI：News（网页）
- 发布时间：2020-12-05 00:00
- AIHOT 分数：86
- AIHOT 链接：https://aihot.virxact.com/items/cmoejssx5005zslnn33wvqd7o
- 原文链接：https://mistral.ai/news/leanstral

## AI 摘要

Mistral AI 发布了首个面向 Lean 4 证明助手的开源代码代理 Leanstral。该代理采用稀疏架构，仅激活 60 亿参数，专为在真实形式化代码库中操作而训练。在 FLTEval 评估中，其表现优异：仅需两次尝试即以 26.3 分超越多个大型开源模型，同时成本效益显著。例如，达到此分数时，Leanstral 成本为 36 美元，远低于 Claude Sonnet 的 549 美元。其权重基于 Apache 2.0 许可开源，并已集成至 Mistral Vibe 平台提供免费 API。

## 正文

Leanstral: Open-Source foundation for trustworthy vibe-coding | Mistral AI

ProductsSolutionsResearchBlogCustomersCompany

Contact Sales

Try Studio Leanstral: Open-Source foundation for trustworthy vibe-coding

Research

First open-source code agent for Lean 4.

Mar 16, 2026

Mistral AI

AI agents have proven to be highly capable tools at code generation. Yet, as we push these models to high-stakes domains, ranging from frontier research mathematics to mission-critical software, we encounter a scaling bottleneck: the human review. The time and specialized expertise required to manually verify become the primary impedance of engineering velocity.

We envision a more helpful generation of coding agents to both carry out their tasks and formally prove their implementations against strict specifications. Instead of debugging machine-generated logic, humans dictate what they want. Today, we are taking the first major step toward that vision. Introducing Leanstral

We release Leanstral, the first open-source code agent designed for Lean 4. Lean4 is a proof assistant capable of expressing complex mathematical objects such as perfectoid spaces and software specifications like properties of Rust fragments. Unlike existing proving systems that act as wrappers around large generalist models or focus on single math problems, Leanstral is designed to be highly efficient (with 6B active parameters) and trained for operating in realistic formal repositories. Open and accessible: We release Leanstral weights under an Apache 2.0 license, in an agent mode within Mistral vibe, and through a free API endpoint. We will also release a tech report detailing our training approach, and a new evaluation suite FLTEval, to move evaluations beyond their focus on competition math. Efficient and mighty: We use a highly sparse architecture for Leanstral, and optimise it for proof engineering tasks. Leveraging parallel inference with Lean as a perfect verifier, Leanstral is both performant and cost-efficient against existing closed-source competitors. Upgradable via MCP: Leanstral supports arbitrary MCPs through vibe, and was specifically trained to achieve maximal performance with the frequently used lean-lsp-mcp. Evaluation

To reflect usefulness in realistic proof engineering scenarios, we benchmark Leanstral for completing all formal proofs and correctly defining new mathematical concepts in each PR to the FLT project, instead of isolated mathematical problems. We compare Leanstral against leading coding agents (Claude Opus 4.6, Sonnet 4.6, Haiku 4.5) and open-source models (Qwen3.5 397B-A17B, Kimi-K2.5 1T-A32B, GLM5 744B-A40B). Leanstral vs. OSS Models

Leanstral-120B-A6B demonstrates a significant efficiency advantage over its much larger open-source peers. While models like GLM5-744B-A40B and Kimi-K2.5-1T-32B struggle to scale, capping their FLTEval scores at approximately 16.6 and 20.1 respectively, Leanstral outperforms them both with just a single pass.

Even Qwen3.5-397B-A17B, the strongest OSS competitor shown, requires 4 passes to reach a score of 25.4. In contrast, Leanstral achieves a superior score of 26.3 with half that investment (pass@2) and continues to scale linearly, reaching 29.3 at the same cost level. Leanstral vs. Claude Family

Leanstral serves as a high-value alternative to the Claude suite, offering competitive performance at a fraction of the price: Leanstral pass@2 reaches a score of 26.3, beating Sonnet by 2.6 points, while costing only $36 to run, compared to Sonnet’s $549. At pass@16, Leanstral reaches a score of 31.9, comfortably beating Sonnet by 8 points. While Claude Opus 4.6 remains the leader in quality, it carries a staggering cost of $1,650, 92 times higher than running Leanstral.

In our benchmarking, we used Mistral Vibe as the scaffold with no modifications specifically for the evaluation.

| Model | Cost ($) | Score | | --- | --- | --- | | Haiku | 184 | 23.0 | | Sonnet | 549 | 23.7 | | Opus | 1,650 | 39.6 | | Leanstral | 18 | 21.9 | | Leanstral pass@2 | 36 | 26.3 | | Leanstral pass@4 | 72 | 29.3 | | Leanstral pass@8 | 145 | 31.0 | | Leanstral pass@16 | 290 | 31.9 | Case studies Answering stackexchange posts about changes in newest Lean version

When breaking changes hit a new Lean release, migrating code can be a massive headache. We fed Leanstral a real-world question from the Proof Assistants Stack Exchange about a script that mysteriously stopped compiling in Lean 4.29.0-rc6 (which we did not train with due to its recency). The culprit was a rewrite (`rw`) tactic that suddenly failed to match patterns involving a simple type alias, initially written as `def T2 := List Bool`.

Instead of taking a stab in the dark, Leanstral rolled up its sleeves. It successfully built test code to recreate the failing environment and diagnosed the underlying issue with definitional equality. The model correctly identified that because def creates a rigid definition requiring explicit unfolding, it was actively blocking the rw tactic from seeing the underlying structure it needed to match.

The fix it proposed was simple: just swap `def` for `abbrev`. Because `abbrev` creates a transparent alias that is immediately definitionally equal to the original type, the `rw` tactic could once again perfectly match the pattern `(L2 n).length` in the proof. Leanstral completes the job and explains the rationale to the user perfectly. Reasoning about programs

We copied over definitions in Rocq from https://www.cs.princeton.edu/courses/archive/fall10/cos441/sf/Imp.html and asked Leanstral to convert to Lean. It did so successfully, even implementing custom notation. Example snippet:

It could also translate to Lean and then prove some properties about programs in this language when just given the Rocq statement (without proof): Demand Proof. Try Leanstral Today.

Leanstral is available today for everyone to use. Zero-Setup in Mistral Vibe: We’ve integrated Leanstral directly into Mistral Vibe for immediate, zero-setup vibe coding and proving. Use `/leanstall` to active. Then to use Leanstral press `Shift+Tab` until the model displays as Leanstral, Alternately, use `vibe --agent lean`. Labs API: Access the model via our free/near-free API endpoint `labs-leanstral-2603`. We are keeping this endpoint highly accessible for a limited period to gather realistic feedback and observability data to fuel the next generation of verified code models. Own the Weights: Download the Apache 2.0 licensed model and run it on your own metal.

Documentation - Sign Up for Mistral Vibe Share this article

[](https://www.linkedin.com/sharing/share-offsite/?url=https://mistral.ai/news/leanstral)[](https://www.facebook.com/sharer/sharer.php?u=https://mistral.ai/news/leanstral)[](https://twitter.com/intent/tweet?url=https://mistral.ai/news/leanstral) More from Mistral AI News Models AI Services The next chapter of AI is yours.

Try le ChatBuild on AI StudioTalk to an expert

[](https://mistral.ai/)

[](https://apps.apple.com/us/app/le-chat-by-mistral-ai/id6740410176)[](https://play.google.com/store/apps/details?id=ai.mistral.chat)

Mistral AI © 2026 Why Mistral

About usOur customersCareersContact us Explore

AI solutionsPartnersResearchDocumentation Build

StudioLe ChatVibeMistral Compute Legal

Terms of servicePrivacy policyPrivacy choicesData processing agreementLegal noticeBrand

en Mistral AI © 2026

[](https://x.com/mistralai)[](https://www.linkedin.com/company/mistralai/)[](https://discord.gg/mistralai)
