Learning Unmasking Policies for Diffusion Language Models

2026-07-02 08:00

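AI Summary

Researchers propose using reinforcement learning to train sampling strategies for diffusion language models (dLLMs). The method formalizes masked diffusion sampling as a Markov decision process with the dLLM as the environment, and uses a single-layer transformer policy network to map token confidences to unmasking decisions. Experiments show that in semi-autoregressive (block) generation the policy matches state-of-the-art heuristics, while in the full-diffusion setting it outperforms them.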


Research areas: Methods and Algorithms; Speech and Natural Language Processing | Conference: ICML

Content type: paper | Published: July 2026

Learning Unmasking Policies for Diffusion Language Models

Authors: Metod Jazbec*†, Theo X. Olausson*‡, Louis Béthune, Pierre Ablin, Michael Kirchhof, João Monteiro, Victor Turrisi, Jason Ramapuram, Marco Cuturi


Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One critical design aspect of dLLMs is the sampling procedure that selects which tokens to unmask at each diffusion step. Indeed, recent work has found that heuristic strategies such as confidence thresholding improve both sample quality and token throughput compared to random unmasking. However, such heuristics have downsides: they require manual tuning, and we observe that their performance degrades with larger block sizes. In this work, we instead propose to train sampling procedures using reinforcement learning. Specifically, we formalize masked diffusion sampling as a Markov decision process in which the dLLM serves as the environment, and propose a lightweight policy based on a single-layer transformer that maps dLLM token confidences to unmasking decisions. Our experiments show that these trained policies match the performance of state-of-the-art heuristics when combined with semi-autoregressive (block) generation, while outperforming them in the full-diffusion setting.
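The sampling loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_dllm` is a hypothetical stand-in that returns random confidences instead of running a real model, the threshold value of 0.9 is arbitrary, and the fallback of unmasking the single most confident token when nothing clears the threshold is an assumption made so that the loop always terminates. The sketch shows the Markov-decision-process view (state: the partially masked sequence; action: the set of positions to unmask; environment: the dLLM) with the confidence-thresholding heuristic as the policy. A learned policy, such as the single-layer transformer proposed in the paper, would take the place of `threshold_policy`.

```python
import random

MASK = None  # sentinel for a position that is still masked


def toy_dllm(tokens, rng):
    """Hypothetical stand-in for a dLLM forward pass (assumption).

    Returns {position: (predicted_token, confidence)} for every masked
    position. A real model would produce a distribution over the vocabulary.
    """
    return {
        i: (f"tok{i}", rng.random())
        for i, t in enumerate(tokens)
        if t is MASK
    }


def threshold_policy(confidences, threshold=0.9):
    """Confidence-thresholding heuristic.

    Unmask every position whose confidence clears the threshold. If none
    does, unmask the single most confident position (an assumed fallback
    that guarantees progress at every step).
    """
    chosen = [i for i, c in confidences.items() if c >= threshold]
    if not chosen:
        chosen = [max(confidences, key=confidences.get)]
    return chosen


def sample(length, policy, seed=0):
    """Roll out one episode of masked diffusion sampling.

    State: the partially masked sequence. Action: positions to unmask.
    Environment: the (toy) dLLM, queried once per diffusion step.
    """
    rng = random.Random(seed)
    tokens = [MASK] * length
    steps = 0
    while any(t is MASK for t in tokens):
        predictions = toy_dllm(tokens, rng)
        confidences = {i: conf for i, (_, conf) in predictions.items()}
        for i in policy(confidences):
            tokens[i] = predictions[i][0]
        steps += 1
    return tokens, steps
```

Calling `sample(8, threshold_policy)` returns the fully unmasked sequence together with the number of model calls it took. Fewer steps means higher token throughput, which is the quantity an unmasking policy trades off against sample quality.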

* Equal Contributors
† University of Amsterdam
‡ Massachusetts Institute of Technology
** Work done while at Apple

Related readings and updates.

Residual Context Diffusion Language Models

July 2, 2026 | Research area: Speech and Natural Language Processing | Conference: ICML

Diffusion Large Language Models (dLLMs) have emerged as a promising alternative to purely autoregressive language models because they can decode multiple tokens in parallel. However, state-of-the-art block-wise dLLMs rely on a “remasking” mechanism that decodes only the most confident tokens and discards the rest, effectively wasting computation. We demonstrate that recycling computation from the discarded tokens is beneficial, as these tokens…

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

January 21, 2026 | Research area: Speech and Natural Language Processing | Conference: ICLR

Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are particularly useful for code generation. However, current training and inference mechanisms for dLLMs in coding are still under-explored. To demystify the decoding behavior of dLLMs and unlock their potential for coding,…


Source: Apple Machine Learning Research (RSS)
