# CRONOS: A Benchmark for Counterfactual Physical Consistency in Video Models

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-22 08:00
- AIHOT score: 54
- AIHOT link: https://aihot.virxact.com/items/cmpmfbf3y0o7bsl012yzle784
- Original link: https://arxiv.org/abs/2605.23699

## AI Summary

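CRONOS is an intervention-based benchmark for evaluating the counterfactual physical consistency of video generation models, that is, whether a model responds appropriately to controlled visual changes in its input (such as viewpoint, scene, object category, and appearance). It is built in a high-fidelity Unreal Engine environment and can systematically intervene on these four factors while keeping the physical event (such as a collision or occlusion) unchanged. An evaluation of recent open-source video generation models shows that they fail substantially on this test: prediction quality for the same physical event is strongly affected by object appearance, environment, and especially viewpoint changes. The benchmark provides a controlled and reproducible test environment for diagnosing how the quality of generated videos changes under different intervention conditions.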

## Full Text

Video prediction is increasingly viewed as a path toward generalizable world models, yet it remains unclear whether these systems learn underlying causal structure or merely exploit superficial visual correlations for future prediction. We introduce CRONOS, an intervention-based benchmark designed to evaluate counterfactual physical consistency: whether a model's predictions of physical events respond appropriately to controlled changes in the visual input, such as variations of scene context, viewpoint, object appearance, and object category. Built in a photorealistic Unreal Engine environment, CRONOS enables controlled, high-fidelity generation of videos across diverse scenes and dynamics. In contrast to previous benchmarks, CRONOS systematically intervenes on four key factors (viewpoint, scene, object category, and object appearance) while keeping the underlying physical event type, such as a collision, occlusion, or fall, fixed. Our evaluation of recent open-source video generators reveals substantial failures in counterfactual physical consistency: prediction quality for the same physical event type is affected by appearance, environment, and, particularly, viewpoint changes. CRONOS provides a controlled and reproducible testbed for diagnosing how the quality of generated videos changes under different interventions, establishing a concrete target for developing models that perform consistently across changes in multiple conditions. The dataset and code are available at our project page.
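
The abstract does not say how prediction quality is scored or aggregated, so the following is only a minimal sketch of one way to read the idea operationally: hold the event type fixed, vary one factor, and measure how much a per-video quality score moves. The record layout, the placeholder scores, and the `consistency_gaps` helper are all hypothetical and are not taken from the CRONOS code or results.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (event_type, intervened_factor, variant, quality_score).
# The scores are made-up placeholders, NOT results reported by the paper.
RECORDS = [
    ("collision", "viewpoint", "front", 0.80),
    ("collision", "viewpoint", "top", 0.50),
    ("collision", "appearance", "red", 0.78),
    ("collision", "appearance", "blue", 0.74),
    ("occlusion", "viewpoint", "front", 0.70),
    ("occlusion", "viewpoint", "top", 0.40),
]


def consistency_gaps(records):
    """Return {factor: mean over event types of (max - min) variant score}.

    A gap near 0 means quality barely changes when that factor is varied
    while the physical event type stays fixed (an assumed, illustrative
    definition of consistency).
    """
    # Collect all scores for each (event, factor, variant) cell.
    cell_scores = defaultdict(list)
    for event, factor, variant, quality in records:
        cell_scores[(event, factor, variant)].append(quality)

    # Average repeated runs so each variant contributes one score.
    per_variant = defaultdict(dict)  # (event, factor) -> {variant: mean score}
    for (event, factor, variant), scores in cell_scores.items():
        per_variant[(event, factor)][variant] = mean(scores)

    # For each fixed event type, the gap is the spread across variants.
    gaps = defaultdict(list)
    for (event, factor), by_variant in per_variant.items():
        values = list(by_variant.values())
        gaps[factor].append(max(values) - min(values))

    # Average the per-event gaps to get one number per factor.
    return {factor: mean(values) for factor, values in gaps.items()}


if __name__ == "__main__":
    for factor, gap in sorted(consistency_gaps(RECORDS).items()):
        print(f"{factor}: mean quality gap = {gap:.2f}")
```

With these placeholder scores the viewpoint gap comes out larger than the appearance gap, which mirrors the qualitative pattern the abstract describes. A max-minus-min spread is just one simple choice; variance or pairwise differences would serve the same illustrative purpose.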
