# Small but Trustworthy: Efficient Vision-Language Reasoning for Time-Series Anomaly Detection

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-28 08:00
- AIHOT score: 69
- AIHOT link: https://aihot.virxact.com/items/cmpr5152v0aymslno91iycx5x
- Original link: https://arxiv.org/abs/2605.30344

## AI Summary

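To address the poor performance of large language models and multimodal models on time-series anomaly detection, the researchers built VisAnomBench, a high-quality benchmark. It is based on public time-series datasets and augmented with outputs from multiple large vision-language models. Building on it, they developed VisAnomReasoner, a parameter-efficient vision-language model specialized for this task. Experiments show that VisAnomReasoner localizes anomalies more accurately on VisAnomBench, with precision and F1 exceeding all baselines by at least 21.23 and 23.87 percentage points, respectively. Additional experiments on the TSB-AD-U benchmark also confirm strong cross-benchmark generalization, with precision and F1 improving by 9.57 and 13.39 percentage points, respectively.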

## Main Text

Recent advances in Vision-Language Models (VLMs) have achieved impressive performance across many tasks, yet prior studies report unsatisfactory performance when applying large language or multimodal models to finding abnormal patterns in sequential data. Public anomaly detection benchmarks typically provide interval annotations but not natural-language rationales, making it difficult to fine-tune VLMs to produce grounded, interpretable decisions. To address this gap, we construct VisAnomBench, a curated benchmark built from public time-series datasets and augmented with high-quality anomaly explanations selected from multiple large VLMs using fine-grained, task-specific rewards. Through fine-tuning on this benchmark, we develop VisAnomReasoner, a parameter-efficient VLM for time-series anomaly detection. Experimental results on VisAnomBench show that VisAnomReasoner achieves more accurate anomaly localization and consistently outperforms all baselines, with improvements of at least 21.23 and 23.87 percentage points in precision and F1, respectively. Additional experiments on the TSB-AD-U benchmark demonstrate strong cross-benchmark generalization, with VisAnomReasoner improving precision and F1 by 9.57 and 13.39 percentage points, respectively.
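
The abstract reports precision and F1 for anomaly localization against interval annotations, but does not state the exact scoring protocol. As a rough illustration only, the sketch below assumes a simple point-wise scheme: predicted and labeled intervals are expanded into sets of time indices and compared. The function names (`intervals_to_points`, `pointwise_prf`) and the inclusive-interval convention are hypothetical and not taken from the paper.

```python
from typing import List, Set, Tuple

# Assumption: an interval is an inclusive (start, end) pair of time indices.
Interval = Tuple[int, int]


def intervals_to_points(intervals: List[Interval]) -> Set[int]:
    """Expand inclusive intervals into the set of time indices they cover."""
    points: Set[int] = set()
    for start, end in intervals:
        points.update(range(start, end + 1))
    return points


def pointwise_prf(
    predicted: List[Interval], labeled: List[Interval]
) -> Tuple[float, float, float]:
    """Point-wise precision, recall, and F1 between two interval lists.

    This is one common convention, assumed here for illustration; the
    benchmark may use a different (e.g. range-based) protocol.
    """
    pred_points = intervals_to_points(predicted)
    true_points = intervals_to_points(labeled)
    true_positives = len(pred_points & true_points)

    precision = true_positives / len(pred_points) if pred_points else 0.0
    recall = true_positives / len(true_points) if true_points else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # A model flags indices 10..19; the annotation marks 15..24.
    p, r, f1 = pointwise_prf(predicted=[(10, 19)], labeled=[(15, 24)])
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Under this convention, a gain stated in percentage points is the absolute difference between two scores times 100: moving from a precision of 0.50 to 0.60 is a gain of 10 percentage points, not 10 percent.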