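Research from the University of Luxembourg and LIH reveals a key weakness in how LLMs reason under structured constraints. When tested on Optimal Power Flow (OPF) problems, models of every type had constraint satisfaction rates stuck at 55% to 60%. The main bottleneck was their inability to satisfy the power system's physical constraint equations. The study shows that the models learn only the "shape of a solution" and never actually perform a constrained search. As a result, their outputs look plausible (correctly formatted, small error) yet are physically infeasible. Supervised fine-tuning improves surface metrics but does not improve physical feasibility, and reinforcement learning also has limited effect. The study's warning: fluent approximation is not constrained optimization, and "looks reasonable" is a dangerous standard.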
A new paper from the University of Luxembourg and LIH reveals a critical gap in LLMs' ability to handle structured reasoning under constraints.
It tests whether LLMs can solve Optimal Power Flow (OPF) problems end to end, and finds that they mostly cannot produce physically coherent solutions.
Across models and sizes, constraint satisfaction stayed stuck at about 55 to 60 percent.
The interesting result here is not that LLMs miss a hard engineering problem.
It is that they miss it in a very specific way.
Optimal Power Flow is a brutal test of real reasoning because it is not just about getting numbers close to a target, but about satisfying a web of physical constraints at the same time, from generator limits to bus voltages to the power-flow equations themselves.
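The post doesn't say exactly how the paper scores constraint satisfaction. The following is only a hypothetical sketch of what checking an OPF answer involves, on a made-up 2-bus network. It assumes the standard AC formulation, where the complex power injected at bus i is S_i = V_i · conj(Σ_k Y_ik V_k). The names `Bus`, `injections` and `check_candidate`, the 1e-4 tolerance, and all network numbers are illustrative assumptions, not the paper's setup.

```python
import cmath
from dataclasses import dataclass


@dataclass
class Bus:
    """Per-bus limits and demand, all in per-unit (p.u.). Illustrative data only."""
    v_min: float
    v_max: float
    p_load: float = 0.0
    q_load: float = 0.0
    pg_min: float = 0.0  # pg_min == pg_max == 0 means "no generator here"
    pg_max: float = 0.0
    qg_min: float = 0.0
    qg_max: float = 0.0


def injections(ybus, vm, va):
    """AC complex power injection at each bus: S_i = V_i * conj(sum_k Y_ik V_k)."""
    v = [cmath.rect(m, a) for m, a in zip(vm, va)]
    n = len(v)
    return [v[i] * sum(ybus[i][k] * v[k] for k in range(n)).conjugate() for i in range(n)]


def check_candidate(buses, ybus, vm, va, pg, qg, tol=1e-4):
    """Hypothetical checker: pass/fail per constraint plus the fraction satisfied."""
    s = injections(ybus, vm, va)
    checks = {}
    for i, b in enumerate(buses):
        # Box constraints: voltage magnitude and generator output limits.
        checks[f"vm_bounds_{i}"] = b.v_min - tol <= vm[i] <= b.v_max + tol
        checks[f"pg_bounds_{i}"] = b.pg_min - tol <= pg[i] <= b.pg_max + tol
        checks[f"qg_bounds_{i}"] = b.qg_min - tol <= qg[i] <= b.qg_max + tol
        # Power-flow equations: generation minus demand must equal the injection.
        checks[f"p_balance_{i}"] = abs(pg[i] - b.p_load - s[i].real) <= tol
        checks[f"q_balance_{i}"] = abs(qg[i] - b.q_load - s[i].imag) <= tol
    return checks, sum(checks.values()) / len(checks)


# Made-up 2-bus system: generator at bus 0, load at bus 1, one line between them.
y = 1 / complex(0.01, 0.1)          # line admittance from an assumed impedance (p.u.)
ybus = [[y, -y], [-y, y]]
vm, va = [1.0, 0.98], [0.0, -0.05]  # chosen operating point (magnitudes, radians)
s = injections(ybus, vm, va)

buses = [
    Bus(0.95, 1.05, pg_max=1.0, qg_min=-0.5, qg_max=0.5),
    Bus(0.95, 1.05, p_load=-s[1].real, q_load=-s[1].imag),  # demand consistent with vm, va
]

# Physically consistent candidate: generator covers demand plus line losses.
checks_ok, frac_ok = check_candidate(buses, ybus, vm, va, [s[0].real, 0.0], [s[0].imag, 0.0])

# "Plausible" candidate: generator output copied from demand, ignoring losses.
pg_bad = [buses[1].p_load, 0.0]
qg_bad = [buses[1].q_load, 0.0]
checks_bad, frac_bad = check_candidate(buses, ybus, vm, va, pg_bad, qg_bad)
print(frac_ok, frac_bad)  # 1.0 0.8
```

The second candidate's real-power output is within 1% of the consistent one and respects every limit. It still fails both power-balance checks at the generator bus because it ignores line losses. That is the kind of "close but physically infeasible" answer that a fraction-of-constraints-satisfied score exposes and a simple error metric hides.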