SEGA: Resolution Extrapolation for Diffusion Transformers via Spectral-Energy Guided Attention

2026-05-21 08:00

AI Summary

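The research team proposes SEGA, a training-free method that addresses the performance drop of diffusion Transformers when they generate images beyond their training resolution. Guided by the spatial-frequency structure of the latent during denoising, the method dynamically and adaptively scales attention for the different frequency components of the Rotary Position Embedding (RoPE). This improves the global structural coherence of the image while recovering fine detail more faithfully. Experiments show that SEGA consistently improves high-resolution image synthesis across multiple target resolutions and outperforms current state-of-the-art training-free baselines.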
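One simple way to quantify a latent's "spatial-frequency structure" is the share of its spectral energy that lies above a radial frequency cutoff. The sketch below is illustrative only and is not SEGA's published procedure: the function name `high_freq_energy_ratio`, the `(C, H, W)` latent layout, and the 0.25 cutoff (as a fraction of the Nyquist frequency) are all assumptions.

```python
import numpy as np


def high_freq_energy_ratio(latent: np.ndarray, cutoff: float = 0.25) -> float:
    """Fraction of spectral energy above a normalized radial frequency cutoff.

    Hypothetical helper. latent: array of shape (C, H, W) or (H, W).
    cutoff: threshold as a fraction of the Nyquist frequency (0.5 cycles/pixel).
    """
    # 2D FFT over the two spatial axes (applied per channel if C is present)
    spectrum = np.fft.fft2(latent, axes=(-2, -1))
    energy = np.abs(spectrum) ** 2

    # Frequency coordinates in cycles per pixel, each in [-0.5, 0.5)
    fy = np.fft.fftfreq(latent.shape[-2])
    fx = np.fft.fftfreq(latent.shape[-1])
    # Radial frequency normalized so that 1.0 equals the Nyquist frequency
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2) / 0.5
    high = radius > cutoff  # boolean mask of shape (H, W)

    total = energy.sum()
    if total == 0:
        return 0.0
    # Boolean mask over the trailing (H, W) axes selects high-frequency bins in every channel
    return float(energy[..., high].sum() / total)


# Usage: a smooth latent concentrates energy at low frequencies, white noise spreads it out
rng = np.random.default_rng(0)
smooth = np.ones((4, 32, 32))
noisy = rng.standard_normal((4, 32, 32))
print(high_freq_energy_ratio(smooth), high_freq_energy_ratio(noisy))
```

A per-step signal like this could, in principle, drive a content-dependent attention schedule; how SEGA actually maps spectral energy to attention scales is not specified in this summary.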

Original abstract

Diffusion transformers (DiTs) have emerged as a dominant architecture for text-to-image generation, yet their performance drops when generating at resolutions beyond their training range. Existing training-free approaches mitigate this by modifying inference-time attention behavior, often through Rotary Position Embeddings (RoPE) extrapolation combined with attention scaling. However, these strategies apply a uniform and content-agnostic scaling across RoPE components with distinct frequency characteristics, inducing a trade-off between preserving global structure and recovering fine detail. We introduce SEGA, a training-free method that dynamically scales attention across RoPE components according to the latent's spatial-frequency structure at each denoising step. This adaptive scaling improves both structural coherence and fine-detail fidelity. Experiments show that SEGA consistently improves high-resolution synthesis across multiple target resolutions, outperforming state-of-the-art training-free baselines.
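To make the contrast between uniform and component-wise scaling concrete, here is a toy NumPy sketch using 1D RoPE (DiTs typically use multi-axis 2D RoPE). It is a sketch under stated assumptions, not SEGA's algorithm: the log-ratio factor in `uniform_scale`, the hypothetical `per_component_scales` schedule, and its `high_freq_ratio` input are illustrative choices that the abstract does not specify. The one exact piece is the mechanism: scaling both entries of RoPE pair i in the query and key by sqrt(s_i) multiplies that pair's contribution to the attention logit by s_i.

```python
import numpy as np


def rope_frequencies(dim: int, base: float = 10000.0) -> np.ndarray:
    """RoPE angular frequencies theta_i = base**(-2i/dim), i = 0 .. dim/2 - 1.

    Index 0 is the highest frequency (fastest-rotating pair); the last is the lowest.
    """
    return base ** (-np.arange(0, dim, 2) / dim)


def apply_rope(x: np.ndarray, positions: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Rotate each feature pair (2i, 2i+1) of x (shape (N, dim)) by positions * freqs[i]."""
    angles = positions[:, None] * freqs[None, :]  # (N, dim/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x_even, x_odd = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x)
    out[:, 0::2] = x_even * cos - x_odd * sin
    out[:, 1::2] = x_even * sin + x_odd * cos
    return out


def uniform_scale(n_train: int, n_test: int) -> float:
    """Content-agnostic factor log(n_test)/log(n_train), shared by every RoPE component."""
    return float(np.log(n_test) / np.log(n_train))


def per_component_scales(num_pairs: int, high_freq_ratio: float, s_max: float) -> np.ndarray:
    """HYPOTHETICAL content-dependent schedule (not SEGA's published rule).

    w runs from 0 (highest-frequency pair) to 1 (lowest-frequency pair). A large
    high_freq_ratio shifts the extra scaling toward high-frequency pairs; a small
    one shifts it toward low-frequency pairs.
    """
    w = np.linspace(0.0, 1.0, num_pairs)
    weight = (1.0 - w) * high_freq_ratio + w * (1.0 - high_freq_ratio)
    return 1.0 + (s_max - 1.0) * weight


def scaled_logits(q, k, positions, freqs, scales):
    """Attention logits in which pair i's contribution to q.k is multiplied by scales[i].

    Scaling a 2D pair by a scalar commutes with its rotation, so applying
    sqrt(scales[i]) to pair i of both q and k scales that dot-product term by scales[i].
    """
    gain = np.repeat(np.sqrt(scales), 2)  # one gain per feature, matching pairs (2i, 2i+1)
    qr = apply_rope(q, positions, freqs) * gain
    kr = apply_rope(k, positions, freqs) * gain
    return qr @ kr.T / np.sqrt(q.shape[1])


# Usage: uniform scaling vs. a hypothetical spectrum-driven per-component schedule
rng = np.random.default_rng(0)
dim, n = 16, 8
q, k = rng.standard_normal((n, dim)), rng.standard_normal((n, dim))
pos = np.arange(n, dtype=float)
freqs = rope_frequencies(dim)
s = uniform_scale(n_train=4096, n_test=16384)
uniform = scaled_logits(q, k, pos, freqs, np.full(dim // 2, s))
adaptive = scaled_logits(q, k, pos, freqs, per_component_scales(dim // 2, 0.7, s))
print(uniform.shape, adaptive.shape)
```

With uniform scales, every component is sharpened by the same factor regardless of content, which is the content-agnostic behavior the abstract attributes to prior baselines; the per-component variant only illustrates what "scaling RoPE components differently" can mean mechanically.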

HuggingFace Daily Papers (community trending papers)

Read the original: arxiv.org
