Geometry-Aware Image Flow Matching
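Source: arxiv.org

The study finds that the semantic information of natural images is encoded mainly in their directional component, while the norm component can be approximated by a global average. This suggests that image data can be modeled on a hypersphere. Building on this, the paper proposes two geometry-aware methods: Spherical Optimal Transport Flow Matching (SOT-CFM), which uses angular distance, and Spherical Flow Matching (SFM), which constrains the dynamics to the manifold. Experiments show that both methods outperform Euclidean baselines, linking Riemannian-manifold-based modeling with natural image generation.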
Recent advances in generative models highlight the power of geometry-aware modeling in manifold-constrained settings. Yet for natural images, the field remains confined to Euclidean assumptions and leaves the intrinsic geometric structure of the data unexploited. In this work, we investigate the geometry of natural images and observe that semantic information is predominantly encoded in directional components, while norm components can be approximated by the global average. This property holds in both RGB and latent spaces, suggesting that natural images can be effectively modeled on a hypersphere. Building on this finding, we introduce Spherical Optimal Transport Flow Matching (SOT-CFM), which uses angular distance, and Spherical Flow Matching (SFM), which constrains the dynamics directly to the manifold. Our experiments show that these geometry-aware methods outperform Euclidean baselines. Ultimately, this work offers a new perspective that bridges Riemannian manifold-based modeling and natural image generation.
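The direction/norm observation can be made concrete with a small decomposition. The NumPy sketch below flattens each image to a vector, splits it into its L2 norm and its unit direction, and rebuilds it with every norm replaced by the average norm. The helper names are hypothetical, and using the batch mean as a stand-in for the dataset-wide average is an assumption, not the paper's protocol.

```python
import numpy as np


def split_norm_direction(x: np.ndarray, eps: float = 1e-8):
    """Split a batch of images of shape (N, ...) into per-image L2 norms and unit directions."""
    flat = x.reshape(x.shape[0], -1)                      # (N, D): one vector per image
    norms = np.linalg.norm(flat, axis=1, keepdims=True)   # (N, 1): radial component
    directions = flat / np.maximum(norms, eps)            # (N, D): points on the unit sphere
    return norms[:, 0], directions.reshape(x.shape)


def replace_norm_with_global_mean(x: np.ndarray) -> np.ndarray:
    """Keep each image's direction but give every image the same (average) norm.

    Assumption: the batch mean norm stands in for the dataset-wide average norm.
    """
    norms, directions = split_norm_direction(x)
    return directions * norms.mean()


# Usage sketch on random data; real images (or VAE latents) would replace x.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3, 8, 8))
y = replace_norm_with_global_mean(x)
```

If the finding holds, images rebuilt this way should stay semantically recognizable, since only the radial component was discarded.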
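OT-CFM (minibatch optimal-transport conditional flow matching) re-pairs noise and data samples inside a batch by solving an assignment problem, usually with a squared Euclidean cost. A minimal sketch of a spherical variant, assuming SOT-CFM keeps this minibatch recipe and only swaps in the angular (great-circle) distance between directions, could look like the following. The function names and the `power` parameter are hypothetical; SciPy's `linear_sum_assignment` serves as the exact solver.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def angular_distance(x0: np.ndarray, x1: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pairwise great-circle (angular) distance between the directions of two batches."""
    a = x0.reshape(len(x0), -1)
    b = x1.reshape(len(x1), -1)
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), eps)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), eps)
    cos = np.clip(a @ b.T, -1.0, 1.0)  # clip guards arccos against rounding error
    return np.arccos(cos)              # (N, M) angles in [0, pi]; norms are ignored


def spherical_ot_pairs(x0: np.ndarray, x1: np.ndarray, power: float = 2.0):
    """Re-pair a noise batch x0 with a data batch x1 by exact minibatch OT.

    With equal batch sizes and uniform weights an optimal plan can be taken to be
    a permutation, so a linear assignment solver returns an exact coupling.
    Assumption: cost = angle ** power (power=2 mirrors squared-Euclidean OT-CFM).
    """
    cost = angular_distance(x0, x1) ** power
    rows, cols = linear_sum_assignment(cost)
    return x0[rows], x1[cols]
```

Because the cost depends only on directions, two samples that differ only in norm are treated as identical. This matches the observation that the norm carries little semantic information.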
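Constraining the dynamics to the hypersphere can be illustrated with the standard Riemannian flow-matching ingredients on the unit sphere: a geodesic (slerp) interpolant between noise and data, a velocity target tangent to the sphere, and an exponential-map update for sampling. This is a sketch under the assumption that SFM follows that recipe; it is not the paper's implementation. The function names are hypothetical, and rescaling to a sphere whose radius is the global-average norm is omitted.

```python
import numpy as np


def slerp_path(x0: np.ndarray, x1: np.ndarray, t: np.ndarray, small: float = 1e-4):
    """Point and velocity on the great circle from x0 to x1 at time t.

    x0, x1: (N, D) unit vectors (assumed not antipodal); t: (N,) times in [0, 1].
    Returns x_t on the sphere and u_t = d x_t / dt, which is tangent at x_t.
    """
    cos = np.clip(np.sum(x0 * x1, axis=1, keepdims=True), -1.0, 1.0)
    theta = np.arccos(cos)                        # (N, 1) geodesic distance
    t = t[:, None]
    sin_theta = np.maximum(np.sin(theta), small)  # avoid dividing by ~0
    xt = (np.sin((1 - t) * theta) * x0 + np.sin(t * theta) * x1) / sin_theta
    ut = theta * (np.cos(t * theta) * x1 - np.cos((1 - t) * theta) * x0) / sin_theta
    # For nearly coincident endpoints the great circle is, to first order, the chord.
    near = theta < small
    xt = np.where(near, (1 - t) * x0 + t * x1, xt)
    ut = np.where(near, x1 - x0, ut)
    return xt, ut


def project_to_tangent(x: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Remove the radial component of v at points x on the unit sphere."""
    return v - np.sum(v * x, axis=1, keepdims=True) * x


def exp_map(x: np.ndarray, v: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Follow the great circle leaving x with tangent velocity v for unit time."""
    speed = np.linalg.norm(v, axis=1, keepdims=True)
    direction = v / np.maximum(speed, eps)        # zero velocity leaves x unchanged
    return np.cos(speed) * x + np.sin(speed) * direction


def sphere_euler_step(x: np.ndarray, v: np.ndarray, h: float) -> np.ndarray:
    """One sampling step: project a predicted velocity, then move along the sphere."""
    return exp_map(x, h * project_to_tangent(x, v))
```

In training, a network's velocity prediction at (x_t, t) would be regressed onto u_t. At sampling time, repeated `sphere_euler_step` calls with step size 1/K move normalized Gaussian noise toward data without ever leaving the sphere, unlike a Euclidean Euler step.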