Cross-Scale Aligned Supervision for Training GANs
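Source: arxiv.org
Modern GANs are often interpreted as multi-stage coarse-to-fine generators, but the paper argues that standard scale-wise adversarial supervision does not build this hierarchy: each stage's output is pushed toward the real distribution independently, so outputs across stages may not correspond to the same generated sample. The paper calls this the "cross-scale trajectory misalignment" problem. To address it, the paper proposes the Cross-scale Aligned Transformer (CAT), which keeps the discriminator's scale-wise evaluation while adding a generator-side consistency regularization that aligns intermediate outputs with the final output. On class-conditional ImageNet-256, CAT-H/2 reaches a one-step-inference FID-50K of 1.56 after only 60 training epochs, outperforming several one-step GAN and diffusion/flow baselines.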
Modern GANs often introduce adversarial supervision on intermediate generator outputs and interpret the resulting multi-stage synthesis as coarse-to-fine hierarchical generation. In this work, we challenge this interpretation. We argue that standard scale-wise adversarial supervision does not construct a proper coarse-to-fine hierarchy: each intermediate image is independently pushed toward the real distribution at its own resolution, but this scale-wise realism does not ensure that outputs across stages represent the same generated sample. Moreover, the scale-specific image produced at each stage is not used as an explicit refinement target for the subsequent stage. As a result, the adversarial loss at a given scale can improve that scale's output without constraining later stages to preserve the same sample trajectory, allowing them to move toward a different sample rather than refine the previous output. We refer to this as the cross-scale trajectory misalignment problem. To resolve it, we propose CAT, a Cross-scale Aligned Transformer for multi-scale adversarial generation. CAT keeps the discriminator scale-wise, so each intermediate output is evaluated at its own resolution, while adding a simple generator-side consistency regularization that aligns intermediate outputs with the final output. On class-conditional ImageNet-256, CAT-H/2 achieves an FID-50K of 1.56 with one-step inference after only 60 training epochs, outperforming strong one-step GAN and diffusion/flow baselines.
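The abstract does not give the exact form of the generator-side consistency regularization, so the following is only a minimal sketch of one plausible reading: downsample the final output to each intermediate resolution and penalize the mean squared difference. The helper names `avg_pool_to` and `consistency_loss`, the choice of average pooling, and the use of mean squared error are all assumptions for illustration, not the paper's stated method. Whether gradients flow through the final output is likewise not specified in the abstract.

```python
import numpy as np


def avg_pool_to(image, size):
    """Downsample a (C, H, W) image to (C, size, size) by block averaging.

    Hypothetical helper: the paper may use a different resampling operator.
    """
    c, h, w = image.shape
    assert h % size == 0 and w % size == 0, "size must divide H and W"
    fh, fw = h // size, w // size
    # Split each spatial axis into (size, factor) and average over the factors.
    return image.reshape(c, size, fh, size, fw).mean(axis=(2, 4))


def consistency_loss(intermediates, final):
    """Assumed regularizer: MSE between each intermediate output and the
    final output downsampled to that resolution, averaged over stages."""
    losses = []
    for x in intermediates:
        target = avg_pool_to(final, x.shape[-1])
        losses.append(np.mean((x - target) ** 2))
    return float(np.mean(losses))


rng = np.random.default_rng(0)
final = rng.standard_normal((3, 8, 8))   # final-stage output
other = rng.standard_normal((3, 8, 8))   # a different generated sample

# Intermediates that are coarse versions of the same sample.
aligned = [avg_pool_to(final, 2), avg_pool_to(final, 4)]
# Intermediates that are coarse versions of a different sample.
misaligned = [avg_pool_to(other, 2), avg_pool_to(other, 4)]

aligned_loss = consistency_loss(aligned, final)
misaligned_loss = consistency_loss(misaligned, final)
```

In this toy setup the aligned intermediates incur zero penalty, while intermediates derived from a different sample incur a positive one, even though both sets could look equally realistic to a scale-wise discriminator. In training, such a term would presumably be added to the generator's adversarial loss with some weighting coefficient, which is also an assumption here.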