Introspective Diffusion Language Models

2026-04-14 21:13 · 79 days ago · zagwdt

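AI Summary

Introspective Diffusion Language Models have been released. The architecture adds an introspection mechanism to conventional diffusion models, supporting self-evaluation and refinement during generation. The project's technical details and code are open-sourced at introspective-diffusion.github.io. The work received 100 points on Hacker News and was made public on April 14, 2026.

Original text · untranslated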

Introspective Diffusion Language Models

Abstract

Diffusion language models (DLMs) offer a compelling promise: parallel token generation could break the sequential bottleneck of autoregressive (AR) decoding. Yet in practice, DLMs consistently lag behind AR models in quality.

We argue that this gap stems from a fundamental failure of introspective consistency: AR models agree with what they generate, whereas DLMs often do not. We introduce the Introspective Diffusion Language Model (I-DLM), which uses introspective strided decoding (ISD) to verify previously generated tokens while advancing new ones in the same forward pass.

Empirically, I-DLM-8B is the first DLM to match the quality of its same-scale AR counterpart, outperforming LLaDA-2.1-mini (16B) by +26 on AIME-24 and +15 on LiveCodeBench-v6 with half the parameters, while delivering 2.9-4.1x throughput at high concurrency. With gated LoRA, ISD enables bit-for-bit lossless acceleration.

Why Introspective Consistency?

We identify three fundamental bottlenecks in current DLMs:

The I-DLM Method

Introspective-Consistency Training

Convert pretrained AR models via causal attention, logit shift, and an all-masked objective.

Introspective Strided Decoding

Generate N tokens per forward pass while verifying prior tokens via the p/q acceptance criterion.
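
The page does not spell out the p/q acceptance criterion. A minimal sketch, assuming it follows the standard speculative-sampling rule (accept a token with probability min(1, p/q), where q is the probability under the pass that proposed the token and p is the probability under the pass that verifies it). The function name `accept_prefix` and its arguments are hypothetical and are not taken from the I-DLM code:

```python
import random


def accept_prefix(p_probs, q_probs, rng=random.random):
    """Count how many leading proposed tokens survive verification.

    p_probs: probability of each proposed token under the verifying pass.
    q_probs: probability of each proposed token under the proposing pass
             (assumed > 0, since the token was sampled from that distribution).
    rng:     callable returning a float in [0, 1); injectable for testing.
    """
    accepted = 0
    for p, q in zip(p_probs, q_probs):
        # Assumed rule: accept with probability min(1, p / q).
        if rng() < min(1.0, p / q):
            accepted += 1
        else:
            # The first rejection invalidates every later token.
            break
    return accepted


if __name__ == "__main__":
    print(accept_prefix([0.9, 0.5, 0.2], [0.5, 0.5, 0.4]))
```

Under this rule a token that the verifying pass finds at least as likely as the proposing pass did (p >= q) is always kept, so rejections only occur where the two passes disagree.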

AR-Compatible Serving

Strict causal attention enables direct integration into SGLang with no custom infrastructure.

Results

I-DLM is the first DLM to match same-scale AR quality while surpassing all prior DLMs across 15 benchmarks.

End-to-End Quality

Blue = best non-AR.

[…] 1 means parallel decoding actually saves total compute vs. AR. This is why I-DLM's throughput scales with concurrency while SDAR and LLaDA plateau in the throughput figure above.

Per-Position Acceptance Breakdown

Acceptance compounds geometrically: position k has probability $p^{k-1}$. Position 1 is always accepted (logit shift).
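
To make the geometric compounding concrete, here is a small sketch that tabulates the per-position probabilities and sums them into an expected number of accepted tokens per forward pass. It assumes a single constant acceptance rate p for every position and a stride of n positions; both function names are illustrative and not part of the I-DLM code:

```python
def position_acceptance(p, n):
    """Acceptance probability for positions 1..n, i.e. p**(k-1) for position k."""
    return [p ** (k - 1) for k in range(1, n + 1)]


def expected_tokens_per_pass(p, n):
    """Expected accepted tokens: the sum of per-position probabilities.

    For p < 1 this equals the closed form (1 - p**n) / (1 - p).
    """
    return sum(position_acceptance(p, n))


if __name__ == "__main__":
    print(position_acceptance(0.5, 3))       # [1.0, 0.5, 0.25]
    print(expected_tokens_per_pass(0.5, 3))  # 1.75
```

Because the series is geometric, the expectation is bounded by 1 / (1 - p) however large the stride is, so under this assumption increasing n yields diminishing returns unless p is close to 1.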

Documentation & Resources

Everything you need to train, serve, and deploy I-DLM.

Setup, dependencies, and environment

Run I-DLM inference in 5 minutes

Introspective-consistency training recipe

Strided decoding algorithm and config
