# NF-CoT：基于归一化流的潜在推理框架

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-06-04 08:00
- AIHOT 分数：68
- AIHOT 链接：https://aihot.virxact.com/items/cmq0bomkt04pfsltrt43xa8py
- 原文链接：https://arxiv.org/abs/2606.06447

## AI 摘要

NF-CoT 在大语言模型骨干内实例化 TARFlow 风格的归一化流，为从显式 CoT 蒸馏的紧凑连续思想定义可处理概率模型。连续思想位置由 NF head 生成，文本位置由同一因果流中的标准 LM head 生成。该设计保留因果自回归生成、概率采样、KV 缓存兼容性和精确似然估计，并支持潜在推理空间的直接策略梯度优化。在代码生成基准上，NF-CoT 相比显式 CoT 和先前潜在推理方法提高了通过率，同时显著降低了中间推理成本。

## 正文

Large language models often improve reasoning by generating explicit chain-of-thought (CoT), demonstrating the importance of intermediate computation. However, textual CoT forces this computation through a discrete, serial, and communication-oriented token stream: each reasoning step must be verbalized before the model can proceed, even when the underlying update is semantic, uncertain, or only partially formed. Latent reasoning offers a higher-bandwidth alternative by performing intermediate computation in compact continuous states before committing to text. Yet existing latent-reasoning methods often sacrifice key advantages that make CoT effective in autoregressive language models, including native left-to-right generation, probabilistic sampling, compatibility with KV-cache decoding, and tractable likelihood estimation. We propose NF-CoT, a latent reasoning framework that preserves these advantages by modeling continuous thoughts with normalizing flows. NF-CoT instantiates a TARFlow-style normalizing flow inside the LLM backbone, defining a tractable probability model over compact continuous thoughts distilled from explicit CoT. Continuous-thought positions are generated by an NF head, while text positions are generated by the standard LM head within the same causal stream. This design provides exact likelihoods for latent thoughts, enables probabilistic left-to-right decoding with the original KV cache, and supports direct policy-gradient optimization in the latent reasoning space. On code-generation benchmarks, NF-CoT improves pass rates over explicit-CoT and prior latent-reasoning baselines while substantially reducing intermediate-reasoning cost.