Multi-LCB: Extending LiveCodeBench to Multiple Programming Languages

2026-06-18 08:00 · 3 days ago

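AI Summary

Multi-LCB is a new benchmark that extends LiveCodeBench (LCB) from Python to 12 programming languages, preserves LCB's contamination controls and evaluation protocol, and automatically tracks future LCB updates. An evaluation of 24 LLMs, covering instruction following and reasoning, reveals Python overfitting, language-specific contamination, and substantial differences in multilingual performance, directly exposing critical shortcomings of current LLMs in multi-language code generation.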

Original text · untranslated

LiveCodeBench (LCB) has recently become a widely adopted benchmark for evaluating large language models (LLMs) on code-generation tasks. By curating competitive programming problems, constantly adding fresh problems to the set, and filtering them by release dates, LCB provides contamination-aware evaluation and offers a holistic view of coding capability. However, LCB remains restricted to Python, leaving open the question of whether LLMs can generalize across the diverse programming languages required in real-world software engineering. We introduce Multi-LCB, a benchmark for evaluating LLMs across twelve programming languages, including Python. Multi-LCB transforms Python tasks from the LCB dataset into equivalent tasks in other languages while preserving LCB's contamination controls and evaluation protocol. Because it is fully compatible with the original LCB format, Multi-LCB will automatically track future LCB updates, enabling systematic assessment of cross-language code generation competence and requiring models to sustain performance well beyond Python. We evaluated 24 LLMs for instruction and reasoning on Multi-LCB, uncovering evidence of Python overfitting, language-specific contamination, and substantial disparities in multilingual performance. Our results establish Multi-LCB as a rigorous new benchmark for multi-programming-language code evaluation, directly addressing LCB's primary limitation and exposing critical gaps in current LLM capabilities.
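The release-date filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not LCB's or Multi-LCB's actual code: the `Problem` record, its field names, and the `filter_after_cutoff` helper are hypothetical, and the cutoff date is an assumed example value.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    # Hypothetical record; field names are illustrative, not LCB's real schema.
    problem_id: str
    release_date: date


def filter_after_cutoff(problems, training_cutoff):
    """Keep only problems released strictly after the assumed training cutoff."""
    return [p for p in problems if p.release_date > training_cutoff]


# Made-up problems, used only to demonstrate the filter.
problems = [
    Problem("p1", date(2023, 5, 1)),
    Problem("p2", date(2024, 2, 15)),
    Problem("p3", date(2024, 9, 30)),
]

# Assumed cutoff: problems published on or before this date may have been seen in training.
fresh = filter_after_cutoff(problems, training_cutoff=date(2024, 1, 1))
fresh_ids = [p.problem_id for p in fresh]
print(fresh_ids)
```

Scoring a model only on problems released after its training cutoff is what makes the evaluation contamination-aware. Since Multi-LCB is described as preserving LCB's contamination controls, the same kind of date filter would presumably apply to the translated tasks in each target language.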

arXiv · Coding · Papers/Research · Evaluation/Benchmarks

HuggingFace Daily Papers (community trending papers)
