联合图像-特征扩散中的协同进化表示
阅读原文· arxiv.org针对联合图像-特征扩散模型中语义表示空间固定不变的问题,CoReDi 框架通过协同进化机制,在训练过程中联合优化轻量级线性投影与扩散模型,动态调整表示空间以适应生成任务。该方法结合停止梯度目标、归一化和针对性正则化防止特征崩溃,增强了语义特征与图像潜变量的互补性。在 VAE 潜变量扩散和像素空间扩散的实验表明,相比固定表示空间的方法,CoReDi 实现了更快的收敛速度和更高的样本质量。
Joint image-feature generative modeling has recently emerged as an effective strategy for improving diffusion training by coupling low-level VAE latents with high-level semantic features extracted from pre-trained visual encoders. However, existing approaches rely on a fixed representation space, constructed independently of the generative objective and kept unchanged during training. We argue that the representation space guiding diffusion should itself adapt to the generative task. To this end, we propose Coevolving Representation Diffusion (CoReDi), a framework in which the semantic representation space evolves during training by learning a lightweight linear projection jointly with the diffusion model. While naively optimizing this projection leads to degenerate solutions, we show that stable coevolution can be achieved through a combination of stop-gradient targets, normalization, and targeted regularization that prevents feature collapse. This formulation enables the semantic space to progressively specialize to the needs of image synthesis, improving its complementarity with image latents. We apply CoReDi to both VAE latent diffusion and pixel-space diffusion, demonstrating that adaptive semantic representations improve generative modeling across both settings. Experiments show that CoReDi achieves faster convergence and higher sample quality compared to joint diffusion models operating in fixed representation spaces.