# 基于泰勒级数的时间突变帧选择算法

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-21 08:00
- AIHOT 分数：65
- AIHOT 链接：https://aihot.virxact.com/items/cmpgadnms0dqusljwmxxo9qkb
- 原文链接：https://arxiv.org/abs/2605.22678

## AI 摘要

该研究提出Swift Sampling，一种免训练的视频帧选择算法。其灵感源自人脑的预测编码机制，将视频建模为视觉潜在空间中的可微轨迹，计算特征的速度与加速度，并通过泰勒展开预测后续帧的预期路径。算法识别出大幅偏离预测轨迹的帧，即“时间信息突变帧”，作为包含关键信息的帧进行采样。该方法极其轻量，仅增加0.02倍计算开销，比主流方法低30倍。在长视频问答的多个基准测试中，它均优于均匀采样等方法，在帧预算有限时尤为有效，准确率最高可提升12.5个百分点。

## 正文

While most frames in long-form video are redundant, the critical information resides in temporal surprises: moments where the actual visual features deviate from their predicted evolution. Inspired by the human brain's predictive coding, we introduce Swift Sampling, an elegant, training-free frame selection algorithm that automatically identifies high-information moments in a video. Specifically, we model a video as a differentiable trajectory in the visual latent space and compute the velocity and acceleration of its features. Then, we apply Taylor expansion to project the expected path of subsequent frames. Frames that diverge sharply from this predicted manifold are identified as temporally surprising frames and selected for sampling. Unlike prior training-free methods that rely on auxiliary networks or video-specific hyperparameter tuning, Swift Sampling is incredibly lightweight, adding only 0.02x additional computational cost over baseline making it 30x cheaper overhead than leading baselines. Across three long-video question answering benchmarks and 10 different downstream tasks, Swift Sampling outperforms uniform sampling and prior query-agnostic baselines. It is especially powerful for long videos with limited frame budgets improving accuracy by up to +12.5 points.