# EMMA: Extracting Multiple Physical Parameters from Multimodal Data

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-05-21 08:00
- AIHOT score: 51
- AIHOT link: https://aihot.virxact.com/items/cmq687vre062esl5io5wal82c
- Original paper: https://arxiv.org/abs/2605.24047

## AI Summary

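EMMA is a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time series. It uses a Liquid Time-Constant (LTC) network to learn latent dynamics from heterogeneous modalities, while a physics-constrained loss enforces consistency with the governing differential equations. Across more than 100 scenarios, including five standard dynamical benchmarks (75 Delfys videos) and real-world rover and quadrotor systems, EMMA achieves robust multi-parameter recovery and significantly outperforms existing single-modality and equation-discovery baselines. The code and data are open source.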
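For readers unfamiliar with Liquid Time-Constant networks, the sketch below shows a single LTC cell in NumPy. It follows the LTC formulation of Hasani et al. (2021), dx/dt = -[1/τ + f(x, u)]·x + f(x, u)·A, integrated with that paper's fused semi-implicit Euler step. The layer sizes, initialization, input features, and the `LTCCell`/`run` names are hypothetical; this illustrates the cell type only and is not EMMA's architecture or training code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LTCCell:
    """Single Liquid Time-Constant cell (after Hasani et al., 2021); sizes and init are hypothetical."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.normal(0.0, 0.5, size=(n_hidden, n_in))       # input weights
        self.W_rec = rng.normal(0.0, 0.5, size=(n_hidden, n_hidden))  # recurrent weights
        self.b = np.zeros(n_hidden)
        self.tau = np.ones(n_hidden)                                  # base time constants (> 0)
        self.A = rng.normal(0.0, 1.0, size=n_hidden)                  # per-neuron bias vector A

    def step(self, x, u, dt):
        # f(x, u) >= 0 (sigmoid), so 1/tau + f acts as an input-dependent ("liquid") decay rate
        f = sigmoid(self.W_rec @ x + self.W_in @ u + self.b)
        # Fused semi-implicit Euler step for dx/dt = -(1/tau + f) * x + f * A
        return (x + dt * f * self.A) / (1.0 + dt * (1.0 / self.tau + f))

def run(cell, inputs, dt, x0=None):
    """Unroll the cell over a (T, n_in) input sequence; returns (T, n_hidden) states."""
    x = np.zeros(cell.W_rec.shape[0]) if x0 is None else np.asarray(x0, dtype=float)
    states = []
    for u in inputs:
        x = cell.step(x, u, dt)
        states.append(x)
    return np.stack(states)

# Usage: 50 steps of a hypothetical 3-dim fused feature stream (e.g. video + audio features)
rng = np.random.default_rng(1)
features = rng.normal(size=(50, 3))
cell = LTCCell(n_in=3, n_hidden=8)
states = run(cell, features, dt=0.05)
print(states.shape)  # (50, 8)
```

Because f is nonnegative, each fused update is a weighted average of the previous state, A, and 0 (weights 1, Δt·f, and Δt/τ), so a state that starts at zero stays between 0 and A. This built-in stability is one reason the fused step is used instead of plain explicit Euler.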

## Main Text

We introduce EMMA, a physics-informed multimodal framework that recovers all identifiable dynamical parameters of a system directly from raw video, audio, and image-based time-series observations. Unlike prior video-only approaches, which struggle with occluded states and hidden actuation inputs or assume known initial conditions and coordinate frames, EMMA performs joint inference of explicit parameters, implicit dynamical components, and calibration invariants within a unified continuous-time model. EMMA leverages a Liquid Time-Constant (LTC) network to learn latent dynamics from heterogeneous modalities, while a physics-constrained loss enforces consistency with the governing differential equations. A unified feature pipeline enables consistent alignment across video trajectories, acoustic signatures, and chart-derived measurements, allowing EMMA to estimate parameters under forced, implicit, and multivariate dynamics without requiring segmentation masks, differentiable rendering, or specialized sensors. Across 100+ scenarios, including five standard dynamical benchmarks (75 Delfys videos), real-world rover and quadrotor systems with hidden inputs, and simulation-chart case studies spanning biological and chaotic systems, EMMA delivers robust multi-parameter recovery and significantly outperforms existing single-modality and equation-discovery baselines. Our results establish EMMA as a general, scalable solution for physics-consistent model extraction from opportunistic multimodal data. Code and data are available at: https://github.com/ImpactLabASU/EMMA-CVPR2026
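The idea of a physics-constrained loss can be illustrated with a much simpler stand-in. The sketch below assumes a damped pendulum, θ'' + c·θ' + k·sin θ = 0 with k = g/L, whose angle is observed directly (no video, no LTC network, no noise). It scores candidate parameters by the mean squared ODE residual computed from finite-difference derivatives; because that residual is linear in (c, k), minimizing the loss reduces to ordinary least squares. The pendulum, the parameter values, and all function names are hypothetical and are not taken from EMMA.

```python
import numpy as np

# Hypothetical test system: damped pendulum  theta'' + c*theta' + k*sin(theta) = 0,
# with c = damping coefficient and k = g/L.

def pendulum_rhs(state, damping, stiffness):
    theta, omega = state
    return np.array([omega, -damping * omega - stiffness * np.sin(theta)])

def simulate(theta0, damping, stiffness, dt, n_steps):
    """Integrate with classic RK4 and return the angle trajectory (n_steps + 1 samples)."""
    s = np.array([theta0, 0.0])
    angles = [s[0]]
    for _ in range(n_steps):
        k1 = pendulum_rhs(s, damping, stiffness)
        k2 = pendulum_rhs(s + 0.5 * dt * k1, damping, stiffness)
        k3 = pendulum_rhs(s + 0.5 * dt * k2, damping, stiffness)
        k4 = pendulum_rhs(s + dt * k3, damping, stiffness)
        s = s + dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        angles.append(s[0])
    return np.array(angles)

def _derivatives(theta, dt, trim):
    """Central finite differences; drop `trim` samples at each end where np.gradient is one-sided."""
    d1 = np.gradient(theta, dt)
    d2 = np.gradient(d1, dt)
    sl = slice(trim, len(theta) - trim)
    return theta[sl], d1[sl], d2[sl]

def physics_loss(theta, dt, damping, stiffness, trim=2):
    """Mean squared ODE residual: how badly (damping, stiffness) violate the pendulum equation."""
    th, d1, d2 = _derivatives(theta, dt, trim)
    residual = d2 + damping * d1 + stiffness * np.sin(th)
    return float(np.mean(residual ** 2))

def fit_physics_params(theta, dt, trim=2):
    """Minimize physics_loss; the residual is linear in (c, k), so least squares finds the exact minimizer."""
    th, d1, d2 = _derivatives(theta, dt, trim)
    design = np.column_stack([d1, np.sin(th)])
    (c_hat, k_hat), *_ = np.linalg.lstsq(design, -d2, rcond=None)
    return float(c_hat), float(k_hat)

# Usage: simulate 10 s of motion, then recover the parameters from the angle alone.
dt = 1e-3
theta = simulate(theta0=1.0, damping=0.3, stiffness=9.81, dt=dt, n_steps=10_000)
c_hat, k_hat = fit_physics_params(theta, dt)
print(f"damping ~ {c_hat:.3f}, g/L ~ {k_hat:.3f}")  # should land close to 0.3 and 9.81
```

In the framework described above, the latent dynamics are learned by an LTC network from raw modalities rather than observed directly, and noisy real measurements would need smoothing before numerical differentiation. The sketch only shows how an ODE residual turns parameter recovery into loss minimization.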