# 通过判别性文本表征将单步图像生成从类别标签扩展到文本

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-04-20 08:00
- AIHOT 链接：https://aihot.virxact.com/items/cmo8db1qc04z1slmlrpain7s2
- 原文链接：https://arxiv.org/abs/2604.18168

## AI 摘要

研究人员针对MeanFlow单步生成框架难以有效整合大语言模型文本编码器的问题，提出采用高判别性文本表征的解决方案。通过适配基于LLM的文本编码器并优化生成流程，首次实现高效的文本条件单步图像合成。实验表明，该方法在主流扩散模型上显著提升了生成性能，突破了原有类别标签条件的局限。相关代码已开源。

## 正文

Few-step generation has been a long-standing goal, with recent one-step generation methods exemplified by MeanFlow achieving remarkable results. Existing research on MeanFlow primarily focuses on class-to-image generation. However, an intuitive yet unexplored direction is to extend the condition from fixed class labels to flexible text inputs, enabling richer content creation. Compared to the limited class labels, text conditions pose greater challenges to the model's understanding capability, necessitating the effective integration of powerful text encoders into the MeanFlow framework. Surprisingly, although incorporating text conditions appears straightforward, we find that integrating powerful LLM-based text encoders using conventional training strategies results in unsatisfactory performance. To uncover the underlying cause, we conduct detailed analyses and reveal that, due to the extremely limited number of refinement steps in the MeanFlow generation, such as only one step, the text feature representations are required to possess sufficiently high discriminability. This also explains why discrete and easily distinguishable class features perform well within the MeanFlow framework. Guided by these insights, we leverage a powerful LLM-based text encoder validated to possess the required semantic properties and adapt the MeanFlow generation process to this framework, resulting in efficient text-conditioned synthesis for the first time. Furthermore, we validate our approach on the widely used diffusion model, demonstrating significant generation performance improvements. We hope this work provides a general and practical reference for future research on text-conditioned MeanFlow generation. The code is available at https://github.com/AMAP-ML/EMF.