Beyond Prompts: Unconditional 3D Inversion for Out-of-Distribution Shapes
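Source: arxiv.org
Current state-of-the-art text-to-3D generative models suffer from a latent "sink trap": in certain regions the model becomes insensitive to changes in the text prompt, so the output geometry cannot be adjusted by changing the input text. This is not a lack of geometric expressivity but a loss of sensitivity to out-of-distribution text guidance. The work proposes leveraging the model's unconditional generative prior, decoupling geometric representation from linguistic sensitivity in order to bypass the trap. This enables high-fidelity semantic editing of out-of-distribution 3D shapes and overcomes limitations of existing 3D pipelines.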
Text-driven inversion of generative models is a core paradigm for manipulating 2D or 3D content, unlocking numerous applications such as text-based editing, style transfer, and inverse problems. However, it relies on the assumption that generative models remain sensitive to natural language prompts. We demonstrate that for state-of-the-art native text-to-3D generative models, this assumption often collapses. We identify a critical failure mode in which generation trajectories are drawn into latent "sink traps": regions where the model becomes insensitive to prompt modifications. In these regimes, changes to the input text fail to change the internal representations in a way that affects the output geometry. Crucially, we observe that this is not a limitation of the model's geometric expressivity; the same generative models can produce a vast diversity of shapes but, as we demonstrate, become insensitive to out-of-distribution text guidance. We investigate this behavior by analyzing the sampling trajectories of the generative model, and find that complex geometries can still be represented and produced by leveraging the model's unconditional generative prior. This leads to a more robust framework for text-based 3D shape editing that bypasses latent sinks by decoupling a model's geometric representation power from its linguistic sensitivity. Our approach addresses the limitations of current 3D pipelines and enables high-fidelity semantic manipulation of out-of-distribution 3D shapes. Project webpage: https://daidedou.sorpi.fr/publication/beyondprompts
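To make the notion of prompt insensitivity concrete, the following is a minimal toy sketch, not the paper's model or method. It assumes a hypothetical 2D velocity field (`toy_velocity`) that ignores its conditioning vector inside a ball (the "sink") and drifts toward the conditioning vector outside it. The helper names `euler_sample` and `prompt_sensitivity`, the sink geometry, and the use of plain Euler integration are all illustrative assumptions. The sketch measures how far apart two samples end up when only the conditioning changes and the starting point is held fixed.

```python
import numpy as np

def euler_sample(velocity, x0, cond, steps=50):
    """Integrate dx/dt = velocity(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity(x, i * dt, cond)
    return x

def prompt_sensitivity(velocity, x0, cond_a, cond_b, steps=50):
    """Distance between endpoints obtained from the same start under two conditions."""
    xa = euler_sample(velocity, x0, cond_a, steps)
    xb = euler_sample(velocity, x0, cond_b, steps)
    return float(np.linalg.norm(xa - xb))

def toy_velocity(x, t, cond, sink_radius=1.0):
    """Hypothetical toy field (an assumption for illustration only).

    Inside a ball of radius `sink_radius` around the origin, the field pulls
    toward the origin and ignores `cond`. Outside, it drifts toward `cond`.
    """
    if np.linalg.norm(x) < sink_radius:
        return -x            # conditioning has no effect inside the sink
    return cond - x          # conditioning steers the trajectory outside

cond_a = np.array([3.0, 3.0])
cond_b = np.array([3.0, -3.0])

# Same pair of conditions, two different starting points.
inside = prompt_sensitivity(toy_velocity, [0.5, 0.0], cond_a, cond_b)
outside = prompt_sensitivity(toy_velocity, [3.0, 0.0], cond_a, cond_b)
print("sensitivity starting inside the sink: ", inside)
print("sensitivity starting outside the sink:", outside)
```

In this toy setting, a trajectory that starts inside the sink never leaves it, so both runs are identical and the measured sensitivity is zero, while a trajectory that starts outside ends at clearly different points for the two conditions. A real text-to-3D model would replace `toy_velocity` with its learned denoiser or velocity network and `cond` with text embeddings; how the paper itself quantifies sensitivity is not specified in the abstract.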