# ByG: A Flow-Matching Image/Video Editing Framework That Needs No Paired Data

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-02 08:00
- AIHOT score: 60
- AIHOT link: https://aihot.virxact.com/items/cmpy6e0l20262slaxc2p8pzxc
- Original paper: https://arxiv.org/abs/2606.03911

## AI Summary

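The paper proposes the Bootstrap Your Generator (ByG) framework for unpaired training of flow-matching image/video editing models. The method extracts instruction-following cues from the frozen base model and combines them with cycle consistency to preserve structure. Gradient routing backpropagates downstream losses to the noisy training states, bridging the train-inference gap. ByG reaches state-of-the-art results on data-scarce image and video editing tasks, generalizes to unseen domains, and outperforms supervised baselines trained on millions of paired samples. Experiments indicate that semantic cues extracted from the base model provide a robust training signal, with no need for an external reward model.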

## Main Text

Modern generative models possess a deep understanding of visual content, yet training them for image editing typically requires massive datasets of paired examples. This limits scalability, especially for video editing where collecting paired data is prohibitively expensive. We propose Bootstrap Your Generator (ByG), a general framework for unpaired training of flow matching editing models. It leverages the base model's knowledge without any external signal. Our approach pairs instruction-following cues extracted from the frozen model with cycle-consistency for structure preservation. To make this tractable, we propose to route gradients from downstream losses over clean predictions to noisy training states. We demonstrate state-of-the-art results on challenging data-scarce image and video editing scenarios. Extensive evaluations and user studies show that our method effectively generalizes to unseen domains and outperforms supervised baselines trained on millions of samples. Analysis reveals that our gradient routing bridges the train-inference gap, and extracting semantic cues from a base model provides a robust training signal that obviates the need for external reward models.
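To give a rough sense of how a loss on a clean prediction can be tied back to a noisy training state, the sketch below assumes the common rectified-flow convention `x_t = (1 - t) * x0 + t * noise` with velocity target `noise - x0`. The abstract does not state which parameterization ByG uses, so this is an assumption. All function names are hypothetical, and the cycle loss is a generic round-trip reconstruction error rather than the paper's actual objective.

```python
# Illustrative sketch only; this is NOT the ByG implementation.
# Assumption: rectified-flow convention x_t = (1 - t) * x0 + t * noise.

def noisy_state(x0, noise, t):
    """Noisy training state on the straight path between data and noise."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]

def straight_path_velocity(x0, noise):
    """Velocity of the straight path, d x_t / d t = noise - x0."""
    return [b - a for a, b in zip(x0, noise)]

def clean_prediction(x_t, velocity, t):
    """One-step estimate of the clean sample: x0_hat = x_t - t * velocity."""
    return [a - t * v for a, v in zip(x_t, velocity)]

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

def cycle_loss(source, edit_forward, edit_backward):
    """Generic cycle consistency: edit, undo the edit, compare with the source."""
    return mse(edit_backward(edit_forward(source)), source)

# Toy vectors standing in for image/video latents.
x0 = [0.2, -1.0, 0.5]
noise = [1.3, 0.4, -0.7]
t = 0.6

x_t = noisy_state(x0, noise, t)
x0_hat = clean_prediction(x_t, straight_path_velocity(x0, noise), t)
recon_error = mse(x0_hat, x0)  # close to zero when the velocity is exact

# Hypothetical edit pairs: an exact inverse and an imperfect one.
perfect_cycle = cycle_loss(
    x0, lambda x: [v + 1.0 for v in x], lambda x: [v - 1.0 for v in x]
)
lossy_cycle = cycle_loss(
    x0, lambda x: [v + 1.0 for v in x], lambda x: [v - 0.5 for v in x]
)
```

Because `clean_prediction` is linear in the velocity (its derivative with respect to the velocity is `-t`), any differentiable loss computed on `x0_hat` in an autodiff framework yields a gradient on the network output at the noisy state `x_t`. Whether ByG routes gradients in exactly this way is not specified in the abstract.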
