# QuanBench+: A Unified Multi-Framework Benchmark for LLM Quantum Code Generation

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-03-25 08:00
- AIHOT link: https://aihot.virxact.com/items/cmnygovrl0036sl13na2s1hf2
- Original paper: https://arxiv.org/abs/2604.08570

## AI Summary

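The research team has released QuanBench+, a benchmark that for the first time evaluates LLM code generation in a unified way across three major quantum computing frameworks: Qiskit, PennyLane, and Cirq. It contains 42 aligned tasks covering quantum algorithms, gate decomposition, and state preparation. In testing, the best one-shot pass rates reach 59.5% (Qiskit), 54.8% (Cirq), and 42.9% (PennyLane); after feedback-based repair they rise to 83.3%, 76.2%, and 66.7%, respectively. The results indicate that current LLMs still depend heavily on framework-specific knowledge and that reliable cross-framework quantum code generation remains unsolved.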

## Main Text

Large Language Models (LLMs) are increasingly used for code generation, yet quantum code generation is still evaluated mostly within single frameworks, making it difficult to separate quantum reasoning from framework familiarity. We introduce QuanBench+, a unified benchmark spanning Qiskit, PennyLane, and Cirq, with 42 aligned tasks covering quantum algorithms, gate decomposition, and state preparation. We evaluate models with executable functional tests, report Pass@1 and Pass@5, and use KL-divergence-based acceptance for probabilistic outputs. We additionally study Pass@1 after feedback-based repair, where a model may revise code after a runtime error or wrong answer. Across frameworks, the strongest one-shot scores reach 59.5% in Qiskit, 54.8% in Cirq, and 42.9% in PennyLane; with feedback-based repair, the best scores rise to 83.3%, 76.2%, and 66.7%, respectively. These results show clear progress, but also that reliable multi-framework quantum code generation remains unsolved and still depends strongly on framework-specific knowledge.
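To make the two scoring ideas in the abstract concrete, here is a minimal Python sketch of a Pass@k estimate and a KL-divergence acceptance check for sampled measurement outcomes. The abstract does not say which Pass@k estimator, KL direction, smoothing, or threshold QuanBench+ uses, so everything below is an assumption: the estimator is the commonly used unbiased form `1 - C(n-c, k) / C(n, k)`, the divergence is taken from the reference distribution to the measured one, and the names `pass_at_k`, `kl_divergence`, `accept`, the `eps` smoothing, and the `0.05` threshold are hypothetical.

```python
import math


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate Pass@k from n samples of which c passed.

    Uses 1 - C(n-c, k) / C(n, k); assumed here, not confirmed by the source.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples include a pass.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def counts_to_probs(counts: dict) -> dict:
    """Turn raw shot counts (bitstring -> count) into probabilities."""
    shots = sum(counts.values())
    if shots <= 0:
        raise ValueError("counts must contain at least one shot")
    return {outcome: n / shots for outcome, n in counts.items()}


def kl_divergence(p: dict, q: dict, eps: float = 1e-9) -> float:
    """KL(p || q) in nats; q is floored at eps to avoid log(0)."""
    total = 0.0
    for outcome, p_x in p.items():
        if p_x > 0.0:
            q_x = max(q.get(outcome, 0.0), eps)
            total += p_x * math.log(p_x / q_x)
    return total


def accept(reference: dict, counts: dict, threshold: float = 0.05) -> bool:
    """Accept if the measured distribution is close to the reference.

    The threshold is a hypothetical value chosen for illustration.
    """
    return kl_divergence(reference, counts_to_probs(counts)) <= threshold


# Usage: a Bell-state task expects "00" and "11" with equal probability.
bell_reference = {"00": 0.5, "11": 0.5}
close_counts = {"00": 510, "11": 490}   # plausible shot noise
wrong_counts = {"00": 1000}             # missing the "11" branch entirely
```

Under these assumptions, `accept(bell_reference, close_counts)` passes while `accept(bell_reference, wrong_counts)` fails, because the floored probability for the missing `"11"` outcome makes the divergence large. A distribution-level check like this is what allows the same functional test to be reused across Qiskit, PennyLane, and Cirq, since it only looks at outcome frequencies and not at framework-specific circuit objects.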
