PixVerve: Advancing Native Ultra-High-Resolution Image Generation to 100MP
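Source: arxiv.org

This paper introduces PixVerve-95K, a high-quality, open-source ultra-high-resolution (UHR) text-to-image dataset containing 95K images (each with at least 100M pixels) and seven-dimensional annotations. Building on it, the research team explores three training schemes and successfully extends existing text-to-image foundation models to native 100MP image generation. The team also proposes the PixVerve-Bench evaluation benchmark, which comprehensively assesses the visual quality and semantic alignment of UHR images. The experiments and exploration provide key insights and practical strategies for future breakthroughs in this field.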
Text-to-Image (T2I) models have recently made notable progress at 1K and 2K resolution. Driven by the desire for a better visual experience and the rapid development of imaging technology, demand for Ultra-High-Resolution (UHR) image generation has grown significantly. However, UHR image generation remains highly challenging because high-resolution content is both scarce and complex. In this paper, we first introduce PixVerve-95K, a high-quality, open-source UHR T2I dataset curated with a carefully designed data pipeline. It contains 95K images across diverse scenarios, each with at least 100M pixels, together with seven-dimensional annotations. Building on this large-scale image-text dataset, we take a pioneering step and extend various T2I foundation models to native 100MP generation using three training schemes. Finally, we propose PixVerve-Bench, a benchmark that combines conventional metrics with multimodal large language model-based assessments to establish a comprehensive evaluation protocol for UHR images, covering visual quality and semantic alignment. Extensive experimental results on our benchmark, together with our exploration of training strategies, provide valuable insights for future breakthroughs.
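
To give a sense of scale for the 100M-pixel minimum mentioned above, the following minimal sketch compares a few common resolutions against that threshold. The helper names (`pixel_count`, `meets_uhr_threshold`) and the example resolutions are illustrative assumptions; the abstract does not describe how the PixVerve-95K pipeline actually filters images.

```python
# Illustrative arithmetic only: the names below are hypothetical and are
# not taken from the PixVerve data pipeline.

MIN_PIXELS = 100_000_000  # the 100M-pixel minimum stated in the abstract


def pixel_count(width: int, height: int) -> int:
    """Return the total number of pixels in a width x height image."""
    return width * height


def meets_uhr_threshold(width: int, height: int,
                        min_pixels: int = MIN_PIXELS) -> bool:
    """Return True if the image has at least `min_pixels` pixels."""
    return pixel_count(width, height) >= min_pixels


# Example resolutions chosen for comparison (assumed, not from the paper).
examples = {
    "1K (1024x1024)": (1024, 1024),
    "2K (2048x2048)": (2048, 2048),
    "4K UHD (3840x2160)": (3840, 2160),
    "10000x10000": (10000, 10000),
}

for name, (w, h) in examples.items():
    megapixels = pixel_count(w, h) / 1e6
    print(f"{name}: {megapixels:.1f} MP, "
          f"meets threshold: {meets_uhr_threshold(w, h)}")
```

Under these assumptions, a square image needs a side length of 10,000 pixels to reach 100MP, which is roughly 95 times the pixel count of a 1024x1024 image.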