# Geometric Latent Reasoning Leads LLMs to Generate Shorter Outputs

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-01 08:00
- AIHOT score: 67
- AIHOT link: https://aihot.virxact.com/items/cmpwib79x0368slsnxp1hc372
- Original link: https://arxiv.org/abs/2606.02248

## AI Summary

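The study proposes Geometric Latent Reasoning, which models reasoning as a geometric path-approximation problem in the model's pretrained embedding space and uses a lightweight transition head to predict direction updates. Evaluations on Qwen3 models show that the method induces substantially shorter outputs: once early explicit reasoning is replaced with continuous latent steps, models often reach correct answers in fewer total steps. The study shows that continuous trajectories serve as compact intermediate reasoning states, exposing a new tradeoff between latent computation budget, output length, and accuracy.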

## Main Text

Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens. While effective, this makes reasoning expensive, length-sensitive, and constrained to (discrete) natural language. While latent reasoning offers a continuous alternative, determining useful structures for intermediate latent states is an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations on mathematical reasoning benchmarks using Qwen3 models reveal an emergent phenomenon: geometric latent reasoning induces substantially shorter generations without an explicit length objective. By replacing early explicit reasoning with continuous latent steps, models often reach correct answers using substantially fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, exposing a new tradeoff between latent computation budget, output length, and accuracy.
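To make the idea of iterative direction updates concrete, here is a minimal toy sketch in plain Python. It is not the paper's implementation: the abstract does not specify the head's architecture, the update rule, or the training loss. The single tanh layer, the unit-normalized step, the `step_size` value, the squared-distance loss against anchor embeddings, and every function and variable name below are assumptions chosen for illustration.

```python
import math
import random


def transition_head(state, weights, bias):
    """Hypothetical one-layer head: maps a latent state to a raw direction."""
    return [
        math.tanh(sum(w * s for w, s in zip(row, state)) + b)
        for row, b in zip(weights, bias)
    ]


def latent_step(state, weights, bias, step_size=0.5):
    """Move the state along the unit-normalized predicted direction (assumed rule)."""
    direction = transition_head(state, weights, bias)
    norm = math.sqrt(sum(d * d for d in direction)) or 1.0  # guard against zero norm
    return [s + step_size * d / norm for s, d in zip(state, direction)]


def rollout(start, weights, bias, num_steps, step_size=0.5):
    """Apply the head repeatedly and collect the continuous latent trajectory."""
    states, state = [], start
    for _ in range(num_steps):
        state = latent_step(state, weights, bias, step_size)
        states.append(state)
    return states


def path_loss(trajectory, anchors):
    """Mean squared distance between latent states and anchor embeddings.

    The anchors stand in for embeddings of chain-of-thought tokens; the loss
    pulls the path toward them without forcing an exact match.
    """
    total = sum(
        sum((z - a) ** 2 for z, a in zip(state, anchor))
        for state, anchor in zip(trajectory, anchors)
    )
    return total / len(anchors)


random.seed(0)
dim = 4  # toy embedding width; real token embeddings are far larger
weights = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]
bias = [0.0] * dim
start = [0.1, -0.2, 0.3, 0.05]
anchors = [[random.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]

trajectory = rollout(start, weights, bias, num_steps=3)
loss = path_loss(trajectory, anchors)
print(f"latent steps: {len(trajectory)}, path loss: {loss:.4f}")
```

In this sketch the number of rollout steps plays the role of the latent computation budget. In a real system the head would be trained to reduce such a loss and the final latent state would be handed back to the language model for explicit decoding; both of those parts are omitted here.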