Benchmarks in Leipzig

2026-06-07 04:11 · 26 days ago · root-parent

AI Summary

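An academic paper titled "Benchmarks in Leipzig" was posted to arXiv on June 6, 2026, and received 101 points on Hacker News. The paper concerns benchmark research associated with Leipzig, but its specific methods, datasets, and results are not detailed on the current summary page. This entry comes from buzzing.cc's Chinese translation of popular Hacker News posts and provides links to the original paper (arXiv) and to the HN discussion page.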

Original text · untranslated

Title: Benchmarks in Leipzig

Abstract: Between April 1 and May 15, 2026, a group of 49 mathematicians compiled a dataset of research-level mathematics questions with known answers. Most of the work was done during the 3-day workshop *Benchmarks in Leipzig* with 35 participants at the Max Planck Institute for Mathematics in the Sciences in Leipzig, Germany. We present the resulting collection of 100 questions. We evaluated these questions in three stages: a single attempt by five state-of-the-art LLMs, followed by a 20-runs-per-model evaluation with three of these models, and finally a 3-run attempt with two heavy-thinking models. After Stage 1, 41 questions remained completely unsolved; after Stage 2, this count dropped to 16; and we concluded Stage 3 with only 2 unsolved questions. This demonstrates that the mathematical reasoning capabilities of LLMs are becoming impressive.

Comments: 8 pages including 8 benchmark statistics tables + 20 pages appendix containing the 100 Leipzig Benchmark questions

Subjects: History and Overview (math.HO); Artificial Intelligence (cs.AI); Algebraic Geometry (math.AG); Combinatorics (math.CO); Representation Theory (math.RT)

Cite as: arXiv:2606.05818 [math.HO] (or arXiv:2606.05818v1 [math.HO] for this version)

https://doi.org/10.48550/arXiv.2606.05818 (arXiv-issued DOI via DataCite, pending registration)
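The unsolved counts quoted in the abstract can be turned into cumulative solve rates with a few lines of arithmetic. The sketch below is only an illustration: it uses the four numbers stated in the abstract (100 questions; 41, 16, and 2 unsolved after Stages 1, 2, and 3), and it assumes that "not completely unsolved" means a question was answered correctly at least once by some model. The function and variable names are hypothetical and do not come from the paper.

```python
# Counts taken from the abstract: 100 questions, and the number still
# completely unsolved after each evaluation stage.
TOTAL_QUESTIONS = 100
UNSOLVED_AFTER_STAGE = {1: 41, 2: 16, 3: 2}


def stage_summary(total, unsolved_after):
    """Return (stage, cumulative_solved, newly_solved, solved_fraction) tuples.

    Assumption: a question counts as solved once it is no longer
    "completely unsolved", i.e. at least one model answered it correctly.
    """
    rows = []
    previous_unsolved = total  # before Stage 1 nothing has been solved
    for stage in sorted(unsolved_after):
        unsolved = unsolved_after[stage]
        solved = total - unsolved
        newly_solved = previous_unsolved - unsolved
        rows.append((stage, solved, newly_solved, solved / total))
        previous_unsolved = unsolved
    return rows


SUMMARY = stage_summary(TOTAL_QUESTIONS, UNSOLVED_AFTER_STAGE)

if __name__ == "__main__":
    for stage, solved, newly, frac in SUMMARY:
        print(f"Stage {stage}: {solved} solved (+{newly} new), {frac:.0%} of all questions")
```

Under that reading, 59 questions were solved after Stage 1, 25 more after Stage 2, and 14 more after Stage 3, leaving 2 of the 100 unsolved.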
