# Cross-Scale Alignment Supervision for Training GANs

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-26 08:00
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmpndm5rx0wn2sl01rw18vdi4
- Original link: https://arxiv.org/abs/2605.26449

## AI Summary

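Modern GANs are often interpreted as performing multi-stage coarse-to-fine generation, but the paper argues that standard scale-wise adversarial supervision does not build such a hierarchy: each stage's output is pushed toward the real distribution independently, so outputs across stages may not correspond to the same generated sample. The paper calls this the "cross-scale trajectory misalignment" problem. To address it, the paper proposes the Cross-scale Aligned Transformer (CAT), which keeps the discriminator's scale-wise evaluation while adding a generator-side consistency regularization that aligns intermediate outputs with the final output. On class-conditional ImageNet-256, CAT-H/2 reaches an FID-50K of 1.56 with one-step inference after only 60 training epochs, outperforming several one-step GAN and diffusion/flow baselines.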

## Abstract

Modern GANs often introduce adversarial supervision on intermediate generator outputs and interpret the resulting multi-stage synthesis as coarse-to-fine hierarchical generation. In this work, we challenge this interpretation. We argue that standard scale-wise adversarial supervision does not construct a proper coarse-to-fine hierarchy: each intermediate image is independently pushed toward the real distribution at its own resolution, but this scale-wise realism does not ensure that outputs across stages represent the identical generated sample. Moreover, the scale-specific image produced at each stage is not used as an explicit refinement target for the subsequent stage. Therefore, its adversarial loss can improve a scale-specific output without constraining later stages to preserve the same sample trajectory, allowing them to move toward a different sample rather than refine the previous output. We refer to this problem as a cross-scale trajectory misalignment problem. To resolve it, we propose CAT, a Cross-scale Aligned Transformer for multi-scale adversarial generation. CAT keeps the discriminator scale-wise, so each intermediate output is evaluated at its own resolution, while adding a simple generator-side consistency regularization that aligns intermediate outputs with the final output. On class-conditional ImageNet-256, CAT-H/2 achieves an FID-50K of 1.56 with one-step inference after only 60 training epochs, outperforming strong one-step GAN and diffusion/flow baselines.
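The abstract describes the generator-side consistency regularization only at a high level, so the following is a minimal sketch of one plausible form, not the paper's actual loss. It assumes the final output is downsampled to each intermediate resolution by block averaging and compared with a mean squared error. The names `avg_pool_to` and `consistency_loss`, the choice of L2 distance, the `weight` parameter, and the uniform averaging over stages are all illustrative assumptions.

```python
import numpy as np


def avg_pool_to(image, size):
    """Downsample a square (H, W, C) image to (size, size, C) by block averaging.

    Assumption: the paper's downsampling operator is not specified in the abstract.
    """
    h, w, c = image.shape
    if h != w or h % size != 0:
        raise ValueError("sketch assumes square images and integer scale factors")
    f = h // size
    # Split each spatial axis into (blocks, pixels-per-block) and average the pixels.
    return image.reshape(size, f, size, f, c).mean(axis=(1, 3))


def consistency_loss(intermediates, final, weight=1.0):
    """Hypothetical alignment term: MSE between each intermediate output and
    the final output downsampled to that intermediate's resolution."""
    total = 0.0
    for x in intermediates:
        target = avg_pool_to(final, x.shape[0])
        total += float(np.mean((x - target) ** 2))
    return weight * total / len(intermediates)


rng = np.random.default_rng(0)
final = rng.standard_normal((8, 8, 3))

# Intermediates that are coarse views of the same sample incur no penalty.
aligned = [avg_pool_to(final, 2), avg_pool_to(final, 4)]

# Intermediates drawn from a different sample are penalized, even though each
# could look realistic at its own scale to a scale-wise discriminator.
other = rng.standard_normal((8, 8, 3))
misaligned = [avg_pool_to(other, 2), avg_pool_to(other, 4)]

loss_aligned = consistency_loss(aligned, final)
loss_misaligned = consistency_loss(misaligned, final)
```

In a training loop, a term of this kind would be added to the generator's scale-wise adversarial losses. Whether gradients should flow through the final output or only through the intermediate ones is a design choice the abstract does not settle.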
