# Code-as-Room: Generating 3D Rooms from Top-Down Views via Agentic Code Synthesis

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-18 08:00
- AIHOT score: 59
- AIHOT link: https://aihot.virxact.com/items/cmpc80hvh00ajsl6ko8ir6kgy
- Original link: https://arxiv.org/abs/2605.18451

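## AI Summary

This paper proposes Code-as-Room, an agentic framework built on multimodal large language models that aims to generate accurate and stable 3D indoor scenes from top-down reference images. The framework represents a room as executable Blender code: a multi-stage pipeline parses the spatial relationships in the image and synthesizes code for geometry, materials, and lighting. To counter the context forgetting seen in existing multi-agent frameworks, it introduces a cross-stage memory module. The authors also establish a dedicated benchmark for code-based 3D room synthesis, and the experimental results demonstrate the effectiveness of the proposed execution framework.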

## Main Text

Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. Recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images. However, text-based methods struggle to capture precise spatial information, and existing image-conditioned agents suffer from instability and infinite looping when tasked with holistic room generation from top-down views. To address these limitations, we propose Code-as-Room, an MLLM-based agentic framework equipped with a structured execution harness, which represents 3D rooms as Blender code. Given a top-down room image, the framework parses the reference image to extract scene elements and their spatial relationships, then synthesizes executable Blender code for geometry, materials, and lighting in a principled, multi-stage pipeline. A cross-stage memory module is maintained throughout to mitigate the context forgetting inherent to existing agent-based frameworks. We further introduce a dedicated benchmark for code-based 3D room synthesis, encompassing various evaluation protocols. Using this benchmark, we conduct comprehensive comparisons against existing agent-based methods to validate the effectiveness of the proposed execution harness.
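To make the staged idea concrete, here is a minimal Python sketch of a geometry, materials, and lighting pipeline that emits a Blender script as text while sharing a small memory object between stages. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not describe the scene schema, the memory format, or the prompts, so `SceneObject`, `StageMemory`, and all function names below are hypothetical. The MLLM parsing step is replaced by a hand-written object list, and every object is approximated by a scaled cube. The sketch only generates script text and does not import `bpy` itself; the emitted calls (`bpy.ops.mesh.primitive_cube_add`, `bpy.data.materials.new`, `bpy.ops.object.light_add`) are part of Blender's Python API.

```python
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    """One parsed scene element (hypothetical schema, not taken from the paper)."""
    name: str
    location: tuple  # (x, y, z) centre, in metres
    size: tuple      # (width, depth, height), in metres
    color: tuple     # RGBA, each channel in [0, 1]


@dataclass
class StageMemory:
    """Toy cross-stage memory: facts recorded by one stage and reused by later ones."""
    object_names: list = field(default_factory=list)
    top_z: float = 0.0  # highest point reached by any generated object


def geometry_stage(objects, memory):
    """Emit one scaled unit cube per object and record names and extent in memory."""
    lines = ["import bpy"]
    for obj in objects:
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size=1.0, location={obj.location!r})"
        )
        lines.append("obj = bpy.context.active_object")
        lines.append(f"obj.name = {obj.name!r}")
        # A unit cube scaled by (w, d, h) has exactly those dimensions.
        lines.append(f"obj.scale = {obj.size!r}")
        memory.object_names.append(obj.name)
        memory.top_z = max(memory.top_z, obj.location[2] + obj.size[2] / 2)
    return "\n".join(lines)


def material_stage(objects, memory):
    """Emit one flat-colour material per object, using the names stored in memory."""
    colors = {obj.name: obj.color for obj in objects}
    lines = []
    for name in memory.object_names:
        lines.append(f"mat = bpy.data.materials.new(name={(name + '_mat')!r})")
        lines.append(f"mat.diffuse_color = {colors[name]!r}")
        lines.append(f"bpy.data.objects[{name!r}].data.materials.append(mat)")
    return "\n".join(lines)


def lighting_stage(memory, clearance=1.0):
    """Emit a point light placed above the tallest object recorded by geometry."""
    light_z = memory.top_z + clearance
    lines = [
        f"bpy.ops.object.light_add(type='POINT', location=(0.0, 0.0, {light_z!r}))",
        "bpy.context.active_object.data.energy = 500.0",
    ]
    return "\n".join(lines)


def synthesize_room_script(objects):
    """Run the three stages in order and return (script_text, memory)."""
    memory = StageMemory()
    parts = [
        geometry_stage(objects, memory),
        material_stage(objects, memory),
        lighting_stage(memory),
    ]
    return "\n".join(parts), memory


if __name__ == "__main__":
    # Stand-in for the parsed top-down image: a single bed.
    demo = [SceneObject("bed", (1.0, 0.5, 0.25), (2.0, 1.6, 0.5), (0.8, 0.8, 0.9, 1.0))]
    script, _ = synthesize_room_script(demo)
    print(script)
```

The printed script is meant to be run inside Blender, for example from its scripting editor. Passing object names and the scene's vertical extent through `StageMemory` is only a toy analogue of the cross-stage memory module: it shows how later stages can reuse earlier decisions rather than re-deriving them, without claiming anything about how the paper stores or retrieves that memory.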
