# Chat2Workflow: A Benchmark for Generating Executable Visual Workflows from Natural Language

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-04-21 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo9h6xfu02h6sls2uria4abm
- Original link: https://arxiv.org/abs/2604.19667

## AI Summary

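The research team introduces Chat2Workflow, a benchmark for evaluating how well large language models generate executable visual workflows directly from natural language. The benchmark is built from real-world business scenarios, and the generated workflows can be deployed to industrial platforms such as Dify and Coze. Experiments show that current state-of-the-art models can grasp high-level intent but struggle to produce stable, executable workflows under complex requirements. The team's agentic framework raises the resolve rate by up to 5.34%, yet a significant gap to industrial-grade automation remains. The code is open source.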

## Main Text

At present, executable visual workflows have emerged as a mainstream paradigm in real-world industrial deployments, offering strong reliability and controllability. However, in current practice, such workflows are almost entirely constructed through manual engineering: developers must carefully design workflows, write prompts for each step, and repeatedly revise the logic as requirements evolve, making development costly, time-consuming, and error-prone. To study whether large language models can automate this multi-round interaction process, we introduce Chat2Workflow, a benchmark for generating executable visual workflows directly from natural language, and propose a robust agentic framework to mitigate recurrent execution errors. Chat2Workflow is built from a large collection of real-world business workflows, with each instance designed so that the generated workflow can be transformed and directly deployed to practical workflow platforms such as Dify and Coze. Experimental results show that while state-of-the-art language models can often capture high-level intent, they struggle to generate correct, stable, and executable workflows, especially under complex or changing requirements. Although our agentic framework yields up to 5.34% resolve rate gains, the remaining real-world gap positions Chat2Workflow as a foundation for advancing industrial-grade automation. Code is available at https://github.com/zjunlp/Chat2Workflow.
