领域特定潜在表征提升扩散模型医学图像超分辨率保真度
阅读原文· arxiv.org医学图像超分辨率模型多沿用自然图像设计的通用变分自编码器(VAE),研究发现这是重建质量的主要瓶颈。在控制实验中,将在160万张医学图像上预训练的MedVAE替换Stable Diffusion VAE后,膝关节MRI、脑部MRI和胸部X光的PSNR提升2.91至3.29 dB(p<10^{-20}),优势集中于高频解剖细节。消融实验证实性能差距稳定且幻觉率无显著差异。自编码器重建质量可预测下游性能(R²=0.67),表明领域特定VAE的选择应优先于扩散架构优化。
Latent diffusion models for medical image super-resolution universally inherit variational autoencoders designed for natural photographs. We show that this default choice, not the diffusion architecture, is the dominant constraint on reconstruction quality. In a controlled experiment holding all other pipeline components fixed, replacing the generic Stable Diffusion VAE with MedVAE, a domain-specific autoencoder pretrained on more than 1.6 million medical images, yields +2.91 to +3.29 dB PSNR improvement across knee MRI, brain MRI, and chest X-ray (n = 1,820; Cohen's d = 1.37 to 1.86, all p < 10^{-20}, Wilcoxon signed-rank). Wavelet decomposition localises the advantage to the finest spatial frequency bands encoding anatomically relevant fine structure. Ablations across inference schedules, prediction targets, and generative architectures confirm the gap is stable within plus or minus 0.15 dB, while hallucination rates remain comparable between methods (Cohen's h < 0.02 across all datasets), establishing that reconstruction fidelity and generative hallucination are governed by independent pipeline components. These results provide a practical screening criterion: autoencoder reconstruction quality, measurable without diffusion training, predicts downstream SR performance (R^2 = 0.67), suggesting that domain-specific VAE selection should precede diffusion architecture search. Code and trained model weights are publicly available at https://github.com/sebasmos/latent-sr.