# TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-Training

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-04-12 08:00
- AIHOT link: https://aihot.virxact.com/items/cmnygovrm003rsl134lk10n3d
- Original link: https://arxiv.org/abs/2604.10784

## AI Summary

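The research team has released TorchUMM, the first open-source codebase to support comprehensive evaluation, analysis, and post-training of unified multimodal models (UMMs). The framework is compatible with models spanning multiple architectural paradigms and scales, covers three core task dimensions (understanding, generation, and editing), and integrates established and new datasets to systematically evaluate perception, reasoning, compositionality, and instruction following. By providing a unified interface and standardized evaluation protocols, TorchUMM enables fair, reproducible comparisons across heterogeneous models, helps developers understand model strengths and weaknesses, and speeds up iteration on unified multimodal systems. The code is open-sourced on GitHub.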

## Main Text

Recent advances in unified multimodal models (UMMs) have led to a proliferation of architectures capable of understanding, generating, and editing across visual and textual modalities. However, developing a unified framework for UMMs remains challenging due to the diversity of model architectures and the heterogeneity of training paradigms and implementation details. In this paper, we present TorchUMM, the first unified codebase for comprehensive evaluation, analysis, and post-training across diverse UMM backbones, tasks, and datasets. TorchUMM supports a broad spectrum of models covering a wide range of scales and design paradigms. Our benchmark encompasses three core task dimensions: multimodal understanding, generation, and editing, and integrates both established and novel datasets to evaluate perception, reasoning, compositionality, and instruction-following abilities. By providing a unified interface and standardized evaluation protocols, TorchUMM enables fair and reproducible comparisons across heterogeneous models and fosters deeper insights into their strengths and limitations, facilitating the development of more capable unified multimodal systems. Code is available at: https://github.com/AIFrontierLab/TorchUMM.
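
To make the idea of "a unified interface and standardized evaluation protocols" concrete, here is a minimal Python sketch of how heterogeneous models might be scored under one shared protocol. It is not TorchUMM's actual API: `UnifiedModel`, `Sample`, `exact_match`, `evaluate`, and the two toy models are hypothetical names introduced only for illustration, and only the understanding task is modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Protocol


class UnifiedModel(Protocol):
    """Hypothetical shared interface; NOT TorchUMM's real API."""

    name: str

    def understand(self, image: bytes, question: str) -> str:
        """Answer a question about an image."""
        ...


@dataclass
class Sample:
    """One hypothetical understanding-benchmark item."""

    image: bytes
    question: str
    answer: str


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings match after trimming and lowercasing."""
    return float(prediction.strip().lower() == reference.strip().lower())


def evaluate(
    models: Iterable[UnifiedModel],
    samples: List[Sample],
    metric: Callable[[str, str], float] = exact_match,
) -> Dict[str, float]:
    """Score every model on the same samples with the same metric."""
    if not samples:
        raise ValueError("samples must not be empty")
    scores: Dict[str, float] = {}
    for model in models:
        total = sum(
            metric(model.understand(s.image, s.question), s.answer)
            for s in samples
        )
        scores[model.name] = total / len(samples)
    return scores


class AlwaysYes:
    """Toy stand-in for a real backbone (assumption, for the demo only)."""

    name = "always-yes"

    def understand(self, image: bytes, question: str) -> str:
        return "yes"


class AlwaysNo:
    """Second toy stand-in with different behavior behind the same interface."""

    name = "always-no"

    def understand(self, image: bytes, question: str) -> str:
        return "no"


if __name__ == "__main__":
    demo = [
        Sample(b"", "Is there a cat?", "Yes"),
        Sample(b"", "Is there a dog?", "no"),
    ]
    print(evaluate([AlwaysYes(), AlwaysNo()], demo))
```

The point of the sketch is that the evaluation loop never branches on model type: any backbone exposing the same method is fed identical samples and scored by an identical metric, which is one plausible way to keep comparisons fair and reproducible.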