# BloomBench: A Cognitively Grounded English-Arabic Bilingual Multimodal Benchmark

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-04 08:00
- AIHOT score: 54
- AIHOT link: https://aihot.virxact.com/items/cmq5qxfd301jtsl5ioyok4dem
- Original link: https://arxiv.org/abs/2606.05531

## AI Summary

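The research team introduces BloomBench (part of the Almieyar benchmark series), the first English-Arabic bilingual multimodal benchmark grounded in human cognition. Using Bloom's Taxonomy as its framework, it systematically evaluates vision-language models across six cognitive levels: Remember, Understand, Apply, Analyze, Evaluate, and Create. It is built with a semi-automated pipeline and a stratified hybrid quality assurance protocol to ensure scalability and cultural inclusivity. Tests on current SOTA models show strong semantic understanding but serious deficits in factual recall and creative synthesis, along with a significant performance gap between Arabic and English. The benchmark framework and dataset are open source.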

## Full Text

Despite the rapid progress of Vision-Language Models (VLMs), the field lacks benchmarks that rigorously diagnose their true reasoning abilities and chart meaningful progress toward human-like multimodal intelligence. Most existing evaluations focus on piecemeal or disconnected tasks, obscuring critical cognitive weaknesses and providing little insight for targeted improvement. To address this gap, we introduce BloomBench, part of the Almieyar benchmarking series and the first bilingual (English-Arabic) multimodal benchmark for VLMs grounded in human cognition. Built on Bloom's Taxonomy, BloomBench systematically evaluates six levels of cognition (Remember, Understand, Apply, Analyze, Evaluate, Create) through carefully designed image-question-answer tasks. It is constructed with a semi-automated pipeline and validated through a stratified hybrid quality assurance protocol to ensure scalability, cultural inclusivity, and linguistic fidelity. Leveraging this framework, we conduct a comprehensive study of state-of-the-art VLMs to diagnose their cognitive profiles. Our analysis reveals a sharp cognitive asymmetry: while state-of-the-art models achieve strong performance ceilings in semantic understanding, they struggle substantially with factual recall and creative synthesis. This demonstrates that current general multimodal proficiency masks deeper limitations in specific cognitive layers. Furthermore, our study highlights a critical performance gap between Arabic and English, exposing limitations in current cross-lingual multimodal reasoning. These findings establish a foundation for developing more cognitively aligned and inclusive VLMs. The benchmark framework and dataset are available at: https://github.com/qcri/Almieyar-Oryx-BloomBench.
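The per-level, per-language diagnosis described in the abstract can be illustrated with a small aggregation sketch. It is not the authors' evaluation code. The record fields (`language`, `level`, `correct`), the language codes `en` and `ar`, and both function names are assumptions made for illustration, and plain accuracy is assumed as the metric, which may differ from what BloomBench actually reports.

```python
from collections import defaultdict

# The six Bloom's Taxonomy levels named in the abstract.
BLOOM_LEVELS = ["Remember", "Understand", "Apply", "Analyze", "Evaluate", "Create"]


def cognitive_profile(records):
    """Return {language: {level: accuracy}} from per-item results.

    Each record is a dict with hypothetical keys:
    "language" (e.g. "en" or "ar"), "level" (a Bloom level), "correct" (bool).
    """
    totals = defaultdict(lambda: [0, 0])  # (language, level) -> [correct, total]
    for r in records:
        key = (r["language"], r["level"])
        totals[key][0] += int(r["correct"])
        totals[key][1] += 1

    profile = defaultdict(dict)
    for (lang, level), (correct, total) in totals.items():
        profile[lang][level] = correct / total
    return {lang: dict(levels) for lang, levels in profile.items()}


def language_gap(profile, lang_a="en", lang_b="ar"):
    """Per-level accuracy difference (lang_a minus lang_b).

    Only levels with results in both languages are included.
    """
    scores_a = profile.get(lang_a, {})
    scores_b = profile.get(lang_b, {})
    return {
        level: scores_a[level] - scores_b[level]
        for level in BLOOM_LEVELS
        if level in scores_a and level in scores_b
    }
```

Keeping the six levels separate, rather than averaging them into one score, is what would make an asymmetry such as strong Understand but weak Remember or Create visible. The same holds for reporting the English-Arabic difference per level instead of overall.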