# HP-Edit：面向图像编辑的人类偏好后训练框架

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-04-21 08:00
- AIHOT 链接：https://aihot.virxact.com/items/cmo9rx7qe0496sls2qp8izup8
- 原文链接：https://arxiv.org/abs/2604.19406

## AI 摘要

本文提出HP-Edit图像编辑人类偏好后训练框架，发布涵盖8类真实任务的RealPref-50K数据集。通过预训练视觉大语言模型和少量偏好数据构建HP-Scorer评估器，用于高效扩展偏好数据集并作为奖励函数优化扩散模型。同步推出RealPref-Bench基准。实验表明，该方法显著提升Qwen-Image-Edit-2509等模型编辑质量，使输出更贴合人类偏好。

## 正文

Common image editing tasks typically adopt powerful generative diffusion models as the leading paradigm for real-world content editing. Meanwhile, although reinforcement learning (RL) methods such as Diffusion-DPO and Flow-GRPO have further improved generation quality, efficiently applying Reinforcement Learning from Human Feedback (RLHF) to diffusion-based editing remains largely unexplored, due to a lack of scalable human-preference datasets and frameworks tailored to diverse editing needs. To fill this gap, we propose HP-Edit, a post-training framework for Human Preference-aligned Editing, and introduce RealPref-50K, a real-world dataset across eight common tasks and balancing common object editing. Specifically, HP-Edit leverages a small amount of human-preference scoring data and a pretrained visual large language model (VLM) to develop HP-Scorer--an automatic, human preference-aligned evaluator. We then use HP-Scorer both to efficiently build a scalable preference dataset and to serve as the reward function for post-training the editing model. We also introduce RealPref-Bench, a benchmark for evaluating real-world editing performance. Extensive experiments demonstrate that our approach significantly enhances models such as Qwen-Image-Edit-2509, aligning their outputs more closely with human preference.