GenClaw: Code-Driven Agentic Image Generation

2026-05-28 08:00 · 36 days ago

AI Summary

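GenClaw proposes a code-driven agentic image generation paradigm in which an AI agent creates in stages, as a human artist does: it first builds the concept through search and reasoning, then uses code such as SVG, HTML, or Three.js to render an executable visual sketch, and finally calls an image generation model to add textures, materials, and photorealism. The paradigm treats code as a controllable intermediate canvas between linguistic reasoning and pixel synthesis, turning image generation from a black-box process into a staged workflow resembling human creation, as a step toward more controllable and interpretable visual generation systems.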

Original abstract

Image generation models have evolved from text-conditioned pixel synthesis toward multimodal agents endowed with visual comprehension and tool invocation capabilities. Yet, existing agents remain at the mercy of underlying black-box image models. Their workflow is trapped in a repetitive cycle of prompt rewriting for generation refinement, leaving them with no mechanism to directly manipulate the canvas. In essence, the potential of LLMs to serve as a genuine "brush" for precise visual construction remains largely untapped. In this paper, we propose GenClaw, a code-driven agentic image generation paradigm that empowers the agent to create like a human artist: first conceptualizing, then sketching, and finally coloring. Specifically, the agent first constructs the conceptual knowledge and context through search and reasoning. It then utilizes code (e.g., SVG, HTML, Three.js) to render executable visual sketches. Finally, it employs an image generation model to supplement textures, materials, and photorealism. In this workflow, code serves as a controllable intermediate canvas bridging linguistic reasoning and pixel synthesis, seamlessly integrating programmatic logic with the visual expressiveness of generative models. By transforming image generation from a black-box paradigm into a staged process akin to authentic human creation, GenClaw offers a step toward highly controllable and interpretable visual generation systems.
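
The three stages named in the abstract (conceptualize, sketch, color) can be illustrated with a minimal Python sketch. This is not GenClaw's implementation: the function names conceptualize, sketch_svg, and color, the plan format, and the request dictionary are all hypothetical, and the first and third stages are stubs standing in for the agent's search and reasoning and for the image generation model. Only the middle stage does real work, emitting an SVG that can be inspected and edited before any pixels are synthesized.

```python
import xml.etree.ElementTree as ET


def conceptualize(prompt):
    """Stage 1 stand-in (hypothetical): a real agent would search and reason.

    Returns a fixed layout plan so the example is deterministic.
    """
    return [
        {"tag": "rect", "label": "grass",
         "attrs": {"x": 0, "y": 120, "width": 200, "height": 80, "fill": "#7cb342"}},
        {"tag": "circle", "label": "sun",
         "attrs": {"cx": 150, "cy": 40, "r": 25, "fill": "#fdd835"}},
    ]


def sketch_svg(plan, width=200, height=200):
    """Stage 2: turn the plan into an SVG string, the editable code canvas."""
    root = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(width),
        "height": str(height),
    })
    for item in plan:
        attrs = {key: str(value) for key, value in item["attrs"].items()}
        element = ET.SubElement(root, item["tag"], attrs)
        # A <title> child keeps the semantic label attached to each shape.
        ET.SubElement(element, "title").text = item["label"]
    return ET.tostring(root, encoding="unicode")


def color(svg, prompt):
    """Stage 3 stand-in (hypothetical): build the request an image model
    would receive. No real model or API is called here."""
    return {"sketch_svg": svg, "prompt": prompt}


prompt = "a sun over a grassy field"
svg = sketch_svg(conceptualize(prompt))
request = color(svg, prompt)
```

Because the sketch is ordinary markup, moving the sun means changing its cx and cy values directly rather than rewriting the prompt, which is the kind of canvas control the abstract says prompt-only agents lack.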

HuggingFace Daily Papers (community trending papers)



Read the original · arxiv.org
