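Introspective Diffusion Language Models (I-DLM) has been released. The architecture adds an introspection mechanism to conventional diffusion language models, supporting self-evaluation and refinement during generation. Technical details and code are open-sourced at introspective-diffusion.github.io. The work was made public on April 14, 2026 and received 100 points on Hacker News.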
Introspective Diffusion Language Models
Abstract
Diffusion language models (DLMs) offer a compelling promise: parallel token generation could break the sequential bottleneck of autoregressive (AR) decoding. Yet in practice, DLMs consistently lag behind AR models in quality.
We argue that this gap stems from a fundamental failure of introspective consistency: AR models agree with what they generate, whereas DLMs often do not. We introduce the Introspective Diffusion Language Model (I-DLM), which uses introspective strided decoding (ISD) to verify previously generated tokens while advancing new ones in the same forward pass.
Empirically, I-DLM-8B is the first DLM to match the quality of its same-scale AR counterpart, outperforming LLaDA-2.1-mini (16B) by +26 on AIME-24 and +15 on LiveCodeBench-v6 with half the parameters, while delivering 2.9-4.1x throughput at high concurrency. With gated LoRA, ISD enables bit-for-bit lossless acceleration.
Why Introspective Consistency?
We identify three fundamental bottlenecks in current DLMs:
The I-DLM Method
Introspective-Consistency Training
Convert pretrained AR models via causal attention, logit shift, and an all-masked objective.
Introspective Strided Decoding
Generate N tokens per forward pass while verifying prior tokens via the p/q acceptance criterion.
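The p/q criterion is not spelled out at this point in the text. One common reading, borrowed from speculative sampling, accepts a proposed token with probability min(1, p/q), where q is the probability the token had when it was proposed and p is the probability assigned to it on verification. The sketch below assumes that reading; the function names, the injectable `rng` argument, and the stop-at-first-rejection behaviour are hypothetical and are not taken from the I-DLM code.

```python
import random
from typing import Callable, Sequence


def accept_token(p: float, q: float,
                 rng: Callable[[], float] = random.random) -> bool:
    """Accept a proposed token with probability min(1, p / q).

    Assumption: p is the verification probability, q the proposal probability.
    """
    if q <= 0.0:
        raise ValueError("q must be positive for a token that was proposed")
    return rng() < min(1.0, p / q)


def accepted_prefix_length(p_probs: Sequence[float],
                           q_probs: Sequence[float],
                           rng: Callable[[], float] = random.random) -> int:
    """Count leading tokens accepted before the first rejection (assumed policy)."""
    count = 0
    for p, q in zip(p_probs, q_probs):
        if not accept_token(p, q, rng):
            break
        count += 1
    return count
```

Passing a fixed `rng` such as `lambda: 0.5` makes the rule deterministic, which is convenient for checking individual cases by hand.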
AR-Compatible Serving
Strict causal attention enables direct integration into SGLang with no custom infrastructure.
Results
I-DLM is the first DLM to match same-scale AR quality while surpassing all prior DLMs across 15 benchmarks.
End-to-End Quality
Blue = best non-AR.
[…] 1 means parallel decoding actually saves total compute vs. AR. This is why I-DLM's throughput scales with concurrency while SDAR and LLaDA plateau in the throughput figure above.
Per-Position Acceptance Breakdown
Acceptance compounds geometrically: position k has probability $p^{k-1}$. Position 1 is always accepted (logit shift).
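As a rough illustration of this geometric compounding, the sketch below sums the per-position probabilities to get the expected number of accepted tokens per forward pass, which for $p < 1$ equals $(1 - p^N)/(1 - p)$. It assumes a single constant acceptance probability p and a stride of N positions; the function name and the example values (p = 0.8, N = 4) are illustrative assumptions, not figures from the paper.

```python
def expected_accepted_tokens(p: float, stride: int) -> float:
    """Expected accepted tokens when position k is accepted with probability p**(k-1).

    Position 1 contributes p**0 = 1, matching "position 1 is always accepted".
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if stride < 1:
        raise ValueError("stride must be at least 1")
    # Linearity of expectation: add up the acceptance probability of each position.
    return sum(p ** (k - 1) for k in range(1, stride + 1))


if __name__ == "__main__":
    # Illustrative values only: 1 + 0.8 + 0.64 + 0.512
    print(expected_accepted_tokens(0.8, 4))
```

Under these assumptions, raising the stride gives diminishing returns because the later positions are rarely reached unless p is close to 1.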
Documentation & Resources
Everything you need to train, serve, and deploy I-DLM.
Setup, dependencies, and environment
Run I-DLM inference in 5 minutes
Introspective-consistency training recipe
Strided decoding algorithm and config
Production deployment with SGLang
Gated LoRA for bit-for-bit output
Available models and weights
Reproduce our evaluations
Installation
git clone https://github.com/Introspective-Diffusion/I-DLM.git
cd I-DLM/inference
bash install.sh
See inference/README.md for detailed environment setup.
@article{yu2026introspective,
  title={Introspective Diffusion Language Models},
  author={Yu, Yifan and Jian, Yuqing and Wang, Junxiong and Zhou, Zhongzhu and Zhuang, Donglin and Fang, Xinyu and Yanamandra, Sri and Wu, Xiaoxia and Wu, Qingyang and Song, Shuaiwen Leon and Dao, Tri and Athiwaratkun, Ben and Zou, James and Lai, Fan and Xu, Chenfeng},
  journal={arXiv preprint arXiv:2604.11035},
  year={2026}
}