OneHOI: Unified Human-Object Interaction Generation and Editing

2026-04-15 08:00 · 79 days ago

AI Summary

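This paper proposes OneHOI, a diffusion transformer framework that unifies human-object interaction (HOI) generation and editing, folding both tasks into a single conditional denoising process built on shared structured interaction representations. Its core, R-DiT, models verb-mediated relations and disentangles multi-interaction scenes through role- and instance-aware HOI tokens, spatial Action Grounding, Structured HOI Attention, and HOI RoPE. Jointly trained on HOI-Edit-44K and other datasets, it supports layout-guided, arbitrary-mask, and mixed-condition control, and achieves state-of-the-art performance on both generation and editing.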

Original abstract · untranslated

Human-Object Interaction (HOI) modelling captures how humans act upon and relate to objects, typically expressed as triplets. Existing approaches split into two disjoint families: HOI generation synthesises scenes from structured triplets and layout, but fails to integrate mixed conditions like HOI and object-only entities; and HOI editing modifies interactions via text, yet struggles to decouple pose from physical contact and scale to multiple interactions. We introduce OneHOI, a unified diffusion transformer framework that consolidates HOI generation and editing into a single conditional denoising process driven by shared structured interaction representations. At its core, the Relational Diffusion Transformer (R-DiT) models verb-mediated relations through role- and instance-aware HOI tokens, layout-based spatial Action Grounding, a Structured HOI Attention to enforce interaction topology, and HOI RoPE to disentangle multi-HOI scenes. Trained jointly with modality dropout on our HOI-Edit-44K, along with HOI and object-centric datasets, OneHOI supports layout-guided, layout-free, arbitrary-mask, and mixed-condition control, achieving state-of-the-art results across both HOI generation and editing. Code is available at https://jiuntian.github.io/OneHOI/.
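To make the idea of a shared structured representation with optional layout more concrete, here is a minimal sketch. It is not taken from the OneHOI code: the `HOITriplet` container, the human/verb/object fields, the box format, and the `drop_layout` helper are all hypothetical names chosen for illustration. It assumes that modality dropout means randomly withholding one condition (here, the layout boxes) during training so that a single model sees both layout-guided and layout-free samples.

```python
import random
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical layout format: (x0, y0, x1, y1) in normalised image coordinates.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class HOITriplet:
    """One interaction: a human performs `verb` on `obj` (illustrative container)."""
    human: str
    verb: str
    obj: str
    human_box: Optional[Box] = None   # None means no layout was given
    object_box: Optional[Box] = None


def drop_layout(triplet: HOITriplet, p_drop: float, rng: random.Random) -> HOITriplet:
    """With probability p_drop, strip the boxes so the sample becomes layout-free."""
    if rng.random() < p_drop:
        return HOITriplet(triplet.human, triplet.verb, triplet.obj)
    return triplet


rng = random.Random(0)
sample = HOITriplet(
    "person", "ride", "bicycle",
    human_box=(0.2, 0.1, 0.6, 0.9),
    object_box=(0.1, 0.4, 0.7, 1.0),
)

# Simulate a training stream in which roughly half the samples lose their layout.
batch = [drop_layout(sample, 0.5, rng) for _ in range(1000)]
layout_free = sum(t.human_box is None for t in batch)
print(f"{layout_free} of {len(batch)} samples are layout-free")
```

The triplet itself is never dropped in this sketch, only the spatial condition, so the text-level interaction stays available in both modes. Which conditions the authors actually drop, and at what rates, is not stated in the abstract.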

HuggingFace Daily Papers (community trending papers)



Read the original · arxiv.org
