SciIR:面向科学图像推理生成的大规模训练数据集与基准
阅读原文· arxiv.org针对文本到图像模型在科学图像中语义对齐与逻辑推理的不足,本文提出SciIR框架,基于皮尔斯符号学三元组,涵盖实体结构、科学过程、科学定律三个维度。创建了SciIR-82k数据集,含超8万高质量科学图像-文本对,来自前沿论文,并引入科学推理思维链Sci-RCoT建模视觉逻辑。评估基准SciIR-Bench使用原子检查表将科学准确性转为可验证细粒度问题。实验表明当前模型推理能力不足;在SciIR-82k上微调的Qwen-Image-SciIR模型将Bench分数从35%提升至43%。
While Text-to-Image (T2I) models have shown remarkable success in generating photorealistic visual content, they still struggle with the rigorous semantic alignment and logical reasoning required for scientific imagery. Inspired by Peirce's Semiotic Triad, we introduce Scientific Image Reasoning (SciIR), a comprehensive resource for training and evaluation of scientific image generation. We formalize scientific reasoning into three core dimensions: Entity Structure (Icon), Scientific Process (Index), and Scientific Law (Symbol). Specifically, to overcome the scarcity of training data in scientific image generation, we elaborately create SciIR-82k, a large-scale dataset containing over 80,000 high-quality scientific image-text pairs from cutting-edge publications. The dataset is hierarchically organized according to the semiotic dimensions and incorporates a Scientific Reasoning Chain-of-Thought (Sci-RCoT) to explicitly model underlying visual logic. For evaluation, we propose SciIR-Bench, which aligns with these three semiotic levels and employs an Atomic Checklist to convert the outcome-oriented scientific accuracy into process-oriented, verifiable, fine-grained questions. Our extensive experiments reveal significant deficiencies in current models' scientific reasoning capabilities. Furthermore, by fine-tuning on the SciIR-82k dataset, we developed the Qwen-Image-SciIR model, which achieves a substantial improvement on the SciIR-Bench, increasing the final score from 35\% to 43\%, laying a solid foundation for future advances in scientific image generation.