FID彩票：量化生成式模型评估中的隐藏随机性

2026-06-18 08:00·3天前

AI 摘要

FID是图像生成的事实标准评估指标，但大多数论文仅报告单一种子下的单个数值。本研究将FID视为训练种子和生成种子两个轴上的随机变量，在数百个SiT网络上直接测量方差。发现：重新训练模型使FID变化幅度是固定网络重新采样的3.2倍，差距来自随机初始化、数据顺序和流匹配损失的高斯噪声；增加计算或模型大小几乎不缩小分散度，FID变异系数稳定在1-2%；每格无分类器引导调优使分散度减半，但重新洗牌最优种子。建议：在每格最优引导下评估，将低于~1.3% CoV的FID差距视为无结论，报告多个训练种子的误差条。

原文 · 未翻译

The Frechet Inception Distance (FID) is the de facto arbiter of image generation, yet most papers report just a single number from a single trained model using a single sampling seed. How reproducible is that number if we retrain the model, or merely resample from it? In this paper, we treat FID as a random variable on a two-axis panel of training and generation seeds, and measure its variance directly on several hundred SiT networks trained on class-conditional ImageNet 256x256. We report surprising findings: (a) Retraining the model using the same recipe with a different seed moves FID 3.2x more (in Inception feature space) than redrawing samples from a fixed network. (b) That gap is driven by three factors: random initialisation, data ordering, and the per-step Gaussian noise of the flow-matching loss. (c) Increasing compute or model size barely tightens the spread, holding the FID coefficient of variation (CoV) inside a 1-2% band. (d) Per-cell classifier-free-guidance tuning halves the spread but reshuffles which seeds work best, and a lucky training seed reaches the same FID with up to 2x less compute than an unlucky one. Based on these findings, we recommend a new FID evaluation protocol: evaluate under per-cell optimal guidance, treat any FID gap below the empirically measured ~1.3% CoV as inconclusive, and report an error bar over several training seeds rather than a single FID number.

图像生成论文/研究评测/基准

HuggingFace Daily Papers（社区热门论文）

FID彩票：量化生成式模型评估中的隐藏随机性

2026-06-18 08:00·3天前

AI 摘要

原文 · 保持原样，未翻译

图像生成论文/研究评测/基准

阅读原文arxiv.org