# 3DCodeBench：基于代码的程序化3D建模智能体评测基准

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-31 08:00
- AIHOT 分数：53
- AIHOT 链接：https://aihot.virxact.com/items/cmpw5g4y1007fsl1u99u6x3lh
- 原文链接：https://arxiv.org/abs/2606.01057

## AI 摘要

本文提出了3DCodeBench，一个系统性基准，用于评估视觉语言模型（VLM）智能体在3D建模软件中通过生成代码进行程序化3D建模的能力。该基准评估了12个先进VLMs将文本和图像参考转换为程序化代码的效果，并建立了基于人类偏好的排名平台3DCodeArena。研究发现，主要失败源于API不匹配，而测试时扩展（如提高思考预算和多轮精炼）能提升性能。研究强调了高质量程序化编码数据和稳健执行环境对推进VLM能力的重要性。该工作公开发布了基准数据集、评估协议与3DCodeArena平台。

## 正文

Procedural 3D modeling through code is emerging as a versatile paradigm, offering deterministic, engine-ready, and precisely editable assets that neural 3D generators inherently lack. Authoring such procedural content, however, demands deep expertise in 3D software APIs, parametric design, and code-level geometric reasoning. In this paper, we propose 3DCodeBench, a systematic benchmark for evaluating vision-language model (VLM) agents for procedural 3D generation in 3D modeling software. Specifically, 3DCodeBench evaluates how effectively 12 advanced VLMs can serve as procedural 3D modelers by translating text and image references into procedural code for 3D modeling software. Recognizing that automated metrics may not fully capture the perceptual quality of 3D shapes, we build 3DCodeArena, a ranking platform based on pairwise human preferences over generated 3D outputs. From extensive evaluations and results, we observe that: (1) Failures mostly arise from API mismatches, while successful renders still suffer from disconnected or floating 3D geometric components. (2) Test-time scaling, such as higher thinking budgets and multi-turn refinement, improves performance overall. Our findings highlight a critical need for high-quality procedural coding data to advance commercial VLMs. Furthermore, effective procedural 3D modeling requires a robust execution environment that provides high-fidelity feedback for iterative refinement. We release 3DCodeBench, including the curated large-scale dataset of multimodal (text/image) prompts, procedural code, 3D object triplets, evaluation protocol, and the public 3DCodeArena platform as a foundational toolkit for exploring VLM-based procedural 3D modelers.
