MetaphorVU: Towards Metaphorical Video Understanding

2026-05-25 08:00 · 39 days ago

AI Summary

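To systematically evaluate how well multimodal large language models (MLLMs) understand metaphorical videos, the research team proposes MetaphorVU-Bench, the first benchmark dedicated to this task. Experiments show that current MLLMs perform poorly on metaphorical video understanding and fall far short of human level, mainly because of weak cross-domain mapping. In response, the team builds a metaphor knowledge graph for mapping augmentation and proposes MetaphorBoost, an inference-time enhancement framework that delivers consistent performance gains.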

Original text · untranslated

Metaphorical videos are prevalent across various real-world scenarios to convey complex ideas, and understanding them typically requires high-order cognitive capabilities. The lack of systematic studies on metaphorical video understanding not only constrains the real-world applicability of MLLMs but also impedes the thorough assessment of their high-order cognitive capabilities. To bridge this gap, we propose MetaphorVU-Bench, the first systematic and comprehensive benchmark dedicated to metaphorical video understanding. Through experiments, we find current MLLMs struggle with accurate metaphorical video understanding, lagging far behind human level, primarily due to defective cross-domain mapping. Motivated by this finding, we construct a metaphor knowledge graph as mapping augmentation and propose MetaphorBoost, an inference-time enhancement framework achieving consistent performance improvement. Our benchmark, analysis, and method provide useful insights and a foundation for future research on advancing MLLMs.

HuggingFace Daily Papers (community trending papers)

Read the original: arxiv.org
