Thinking Before Constraining: A Unified Decoding Framework for Large Language Models
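This paper proposes In-Writing, a hybrid decoding framework that addresses the tension between free-form generation, which is rich in reasoning but lacks structure, and constrained decoding, which yields a uniform format but may restrict reasoning too early. The framework combines the two in a single call: the model first reasons without constraints and applies structured decoding only after it generates a specific trigger token, explicitly separating reasoning from formatting. This approach effectively eliminates the "premature triggering" failure mode. In evaluations on multiple datasets covering classification and reasoning tasks, In-Writing improves accuracy by up to 27% over natural generation and outperforms existing methods. The code is open source: https://github.com/Nokia-Bell-Labs/InWriting.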
Natural generation allows Large Language Models (LLMs) to produce free-form responses with rich reasoning, yet the lack of structure makes outputs difficult to verify. Conversely, constrained decoding ensures standardized formats but can inadvertently restrict reasoning capabilities by imposing constraints too early in the generation process. We propose a hybrid approach, In-Writing, that combines free-form reasoning and structured generation in a single call. The model first performs unconstrained reasoning and applies structured decoding only after a trigger token is generated, explicitly decoupling reasoning from formatting. We show that our trigger-token strategies virtually eliminate premature triggering, a failure mode in which constrained decoding interrupts ongoing reasoning. Evaluations across diverse datasets covering classification and reasoning tasks demonstrate that our approach outperforms the state of the art, achieving accuracy gains of up to 27% over natural generation. Our code is available at: https://github.com/Nokia-Bell-Labs/InWriting.
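The trigger-gated idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function `hybrid_decode`, the scripted `toy_model`, the `<answer>` trigger string, and the label set are all hypothetical names chosen for illustration, and greedy selection over a small score dictionary stands in for a real LLM's logits. The sketch only shows the control flow: decoding is unconstrained until the trigger token appears, and the candidate set is restricted only afterwards.

```python
from typing import Callable, Dict, List, Set

Scores = Dict[str, float]  # token -> score (stand-in for model logits)


def hybrid_decode(
    next_scores: Callable[[List[str]], Scores],
    trigger: str,
    allowed_after_trigger: Set[str],
    max_steps: int = 32,
) -> List[str]:
    """Greedy decoding that is unconstrained until `trigger` is emitted,
    then restricted to `allowed_after_trigger` for a single answer token."""
    tokens: List[str] = []
    for _ in range(max_steps):
        scores = next_scores(tokens)
        if tokens and tokens[-1] == trigger:
            # Constrained phase: mask out every token outside the allowed set.
            scores = {t: s for t, s in scores.items() if t in allowed_after_trigger}
            if not scores:
                raise ValueError("no allowed token has a score")
            tokens.append(max(scores, key=lambda t: scores[t]))
            break
        # Free phase: pick the best-scoring token with no restriction.
        tokens.append(max(scores, key=lambda t: scores[t]))
    return tokens


def toy_model(prefix: List[str]) -> Scores:
    """Hypothetical scripted 'model': reasons for a few tokens, emits the
    trigger, then prefers an off-format token that the constraint must block."""
    script = ["the", "review", "sounds", "happy", "<answer>"]
    if len(prefix) < len(script):
        return {script[len(prefix)]: 1.0, "positive": 0.5, "negative": 0.1}
    return {"probably": 2.0, "positive": 1.0, "negative": 0.2}


if __name__ == "__main__":
    output = hybrid_decode(toy_model, "<answer>", {"positive", "negative"})
    print(" ".join(output))
```

In this toy run the reasoning tokens are chosen freely, and the constraint only takes effect once `<answer>` has been produced, so the off-format token `probably` is masked out in favor of a valid label. A real system would presumably replace the single-token label set with a grammar or schema constraint and would need a trigger strategy robust to the trigger appearing mid-reasoning, which is the premature-triggering issue the abstract refers to.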