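RAEv2 substantially simplifies the architecture and improves generality. On tasks such as text-to-image (T2I) generation and world models, it converges more than 10× faster while also improving reconstruction and generation quality. Across extensive experiments, the team found that strong representation encoders are critical for pixel decoders. Conventional metrics such as FID are no longer sufficient to fully capture model performance. New metrics such as ep@fid-k/fdr^k show that generative modeling still has a great deal of room for further research.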
check out RAEv2 led by Jas. through extensive exps, we found some really intriguing behaviors showing why strong representation encoders are key for pixel decoders. spoiler: it's not about hillclimbing fid; new metrics like ep@fid-k/fdr^k show there's a lot more left to explore!