# MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-04-16 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo2bd9a6022uslbazzfgp3aj
- Original link: https://arxiv.org/abs/2604.15309

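## AI Summary

MM-WebAgent is a hierarchical agent framework for multimodal webpage generation. It coordinates the generation of AIGC elements through hierarchical planning and iterative self-reflection, addressing the style inconsistency and poor global coherence that arise when AIGC tools are integrated directly. The framework jointly optimizes global layout, local multimodal content, and their integration, and is released together with a multimodal webpage generation benchmark and a multi-level evaluation protocol. Experiments show that MM-WebAgent outperforms code-generation and agent-based baselines on multimodal element generation and integration.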

## Main Text

The rapid progress of Artificial Intelligence Generated Content (AIGC) tools enables images, videos, and visualizations to be created on demand for webpage design, offering a flexible and increasingly adopted paradigm for modern UI/UX. However, directly integrating such tools into automated webpage generation often leads to style inconsistency and poor global coherence, as elements are generated in isolation. We propose MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates AIGC-based element generation through hierarchical planning and iterative self-reflection. MM-WebAgent jointly optimizes global layout, local multimodal content, and their integration, producing coherent and visually consistent webpages. We further introduce a benchmark for multimodal webpage generation and a multi-level evaluation protocol for systematic assessment. Experiments demonstrate that MM-WebAgent outperforms code-generation and agent-based baselines, especially on multimodal element generation and integration. Code & Data: https://aka.ms/mm-webagent.
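
The abstract names hierarchical planning and iterative self-reflection but does not describe how they are implemented. The following is a minimal illustrative sketch of how such a control loop might look, not the authors' method: a global plan fixes a shared style, each element is generated against that style, and a critique step triggers regeneration when an element drifts from it. Every name here (`PagePlan`, `ElementSpec`, `plan_page`, `generate_element`, `critique`, `build_page`) is hypothetical, and the generator and critic are plain string stand-ins for what would be model or AIGC tool calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ElementSpec:
    """One planned multimodal element (hypothetical structure)."""
    name: str
    kind: str    # e.g. "image", "video", "chart"
    prompt: str


@dataclass
class PagePlan:
    """Global plan: a shared style plus the elements to generate."""
    style: str
    elements: List[ElementSpec] = field(default_factory=list)


def plan_page(request: str) -> PagePlan:
    # Stand-in planner: a real system would ask a model for layout and style.
    return PagePlan(
        style="minimal, blue palette",
        elements=[
            ElementSpec("hero", "image", f"hero banner for {request}"),
            ElementSpec("stats", "chart", f"key figures for {request}"),
        ],
    )


def generate_element(spec: ElementSpec, style: str, feedback: str = "") -> str:
    # Stand-in for an AIGC tool call; returns a text description of the asset.
    note = f" [revised: {feedback}]" if feedback else ""
    return f"<{spec.kind} name={spec.name} style='{style}'>{spec.prompt}{note}"


def critique(asset: str, style: str) -> str:
    # Toy reflection step: empty string means the asset matches the global style.
    return "" if style in asset else f"match style '{style}'"


def build_page(
    request: str,
    max_rounds: int = 3,
    generate: Callable[..., str] = generate_element,
) -> Dict[str, str]:
    """Plan globally, then generate each element with bounded reflection."""
    plan = plan_page(request)
    assets: Dict[str, str] = {}
    for spec in plan.elements:
        asset = generate(spec, plan.style)
        for _ in range(max_rounds):
            feedback = critique(asset, plan.style)
            if not feedback:
                break  # element is consistent with the global plan
            asset = generate(spec, plan.style, feedback)
        assets[spec.name] = asset
    return assets
```

Calling `build_page("a coffee shop")` returns a dictionary with one entry per planned element. The point of the sketch is the shape of the loop: style is decided once at the top level, and the bounded retry keeps a single bad element from stalling the whole page.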
