# SpheRoPE: Zero-Shot, Optimization-Free 360-Degree Panorama Generation via Spherical RoPE

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-30 08:00
- AIHOT score: 44
- AIHOT link: https://aihot.virxact.com/items/cmr2j2gvl08hgsl8zyx5z10o5
- Original link: https://arxiv.org/abs/2606.32033

## AI Summary

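The paper proposes SpheRoPE, a framework that needs no fine-tuning or optimization: it injects spherical priors into pre-trained diffusion transformers directly through Spherical Rotary Position Embeddings (Spherical RoPE), enabling zero-shot, training-free generation of 360-degree panoramic images and videos. Low-frequency channels are re-parameterized as 3D Cartesian coordinates to encode the spherical manifold, while high-frequency channels are harmonically quantized to enforce strict periodicity. A complementary Semantic Distortion classifier-free guidance (CFG) explicitly steers the geometry. Text-to-panorama generation is demonstrated on Flux.1, Flux.2, and LTX-Video backbones, with performance competitive with baselines and no training of any kind.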

## Main Text

We present a zero-shot, training-free and optimization-free framework for generating 360 panoramic images and videos by directly injecting spherical priors into pre-trained diffusion transformers. Existing methods either rely on costly fine-tuning on scarce panoramic data that limits generalization, or leverage multi-step optimization that incurs prohibitive inference latency. We observe that contemporary generative models natively exhibit some panoramic priors from large-scale training. However, these emergent capabilities are insufficient, as the models fundamentally fail to satisfy the rigorous topological constraints imposed by equirectangular projection (ERP). We introduce a zero-shot and optimization-free approach that resolves these constraints at inference time. Spherical RoPE replaces standard rotary position embeddings: low-frequency channels are re-parameterized as 3D Cartesian coordinates to natively encode the spherical manifold, while high-frequency channels are harmonically quantized to enforce exact periodicity. Coupled with complementary Semantic Distortion classifier-free guidance (CFG) that explicitly steers geometry, we avoid retraining and inherit the full creative breadth of state-of-the-art models. Our approach generalizes across diverse backbones and 360 generation modalities. We demonstrate this across text-to-panorama using Flux.1, Flux.2, and LTX-Video backbones, achieving competitive performance against baselines, all while remaining training-free. Project page: https://orhir.github.io/SpheRoPE
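
The two position-encoding ideas in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `erp_to_unit_sphere` and `quantize_to_harmonics`, the pixel-center convention, the axis orientation, and the choice to clamp every channel to at least one full cycle are all assumptions made for illustration. The first function maps each ERP pixel to a 3D Cartesian point on the unit sphere, so the left and right image edges become neighbors. The second snaps per-token rotary frequencies to integer multiples of `2*pi / width`, so the rotation repeats exactly after one trip around the panorama.

```python
import numpy as np


def erp_to_unit_sphere(height, width):
    """Map ERP pixel centers to 3D unit vectors, shape (height, width, 3).

    Assumed convention: columns span longitude [-pi, pi), rows span
    latitude from +pi/2 (top) down to -pi/2 (bottom).
    """
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)  # both (height, width)
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    return np.stack([x, y, z], axis=-1)


def quantize_to_harmonics(freqs, width):
    """Snap angular frequencies (radians per token) to whole harmonics.

    After snapping, freq * width is an integer multiple of 2*pi, so the
    rotation at column `width` equals the rotation at column 0.
    Frequencies that would round to zero are clamped to one cycle;
    that clamp is an assumption of this sketch.
    """
    base = 2.0 * np.pi / width
    k = np.maximum(np.rint(np.asarray(freqs) / base), 1.0)
    return k * base


# Illustrative sizes only; real latent grids depend on the backbone.
H, W = 32, 64
coords = erp_to_unit_sphere(H, W)

# Standard RoPE-style frequency ladder (hypothetical 8 channels).
rope_freqs = 10000.0 ** (-np.arange(8) / 8.0)
periodic_freqs = quantize_to_harmonics(rope_freqs, W)
```

With these coordinates, the distance between the first and last column of a row equals the distance between the first and second column, which is the wrap-around property a flat 2D position grid lacks. How the paper splits channels between the two schemes, and how the coordinates enter the rotary embedding, is not specified in the abstract and is left out here.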