# Learning Adaptive Reasoning Paths for Efficient Visual Reasoning

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-04-16 08:00
- AIHOT link: https://aihot.virxact.com/items/cmo6ofjdx052ssl4rryk6q74s
- Original link: https://arxiv.org/abs/2604.14568

## AI Summary

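The research team proposes AVR, an adaptive visual reasoning framework that decomposes the reasoning process into three cognitive functions: visual perception, logical reasoning, and answer application. This lets the model choose dynamically, according to question difficulty, among three output formats: full reasoning, perception only, or direct answer. The framework is trained with FS-GRPO, an adapted variant of GRPO that encourages the model to pick the most efficient reasoning path while preserving accuracy. Experiments show that AVR cuts token usage by 50–90% on multiple vision-language benchmarks while maintaining overall accuracy, effectively mitigating the "overthinking" problem of visual reasoning models.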

## Full Text

Visual reasoning models (VRMs) have recently shown strong cross-modal reasoning capabilities by integrating visual perception with language reasoning. However, they often suffer from overthinking, producing unnecessarily long reasoning chains for any task. We attribute this issue to Reasoning Path Redundancy in visual reasoning: many visual questions do not require the full reasoning process. To address this, we propose AVR, an adaptive visual reasoning framework that decomposes visual reasoning into three cognitive functions: visual perception, logical reasoning, and answer application. It further enables models to dynamically choose among three response formats: Full Format, Perception-Only Format, and Direct Answer. AVR is trained with FS-GRPO, an adaptation of Group Relative Policy Optimization that encourages the model to select the most efficient reasoning format while preserving correctness. Experiments on multiple vision-language benchmarks show that AVR reduces token usage by 50–90% while maintaining overall accuracy, especially in perception-intensive tasks. These results demonstrate that adaptive visual reasoning can effectively mitigate overthinking in VRMs. Code and data are available at: https://github.com/RunRiotComeOn/AVR.
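
To make the training idea more concrete, here is a minimal sketch of how a group-relative advantage could favor shorter response formats once correctness is secured. It is not the paper's FS-GRPO objective, which the abstract does not spell out. The format names, the bonus values in `FORMAT_BONUS`, and the helper `shaped_reward` are all hypothetical. Only the within-group normalization (reward minus group mean, divided by group standard deviation) follows the standard GRPO recipe.

```python
from statistics import mean, pstdev

# Hypothetical efficiency bonuses per response format (assumed values,
# not taken from the paper): shorter formats earn a larger bonus.
FORMAT_BONUS = {"full": 0.0, "perception_only": 0.1, "direct": 0.2}


def shaped_reward(correct: bool, fmt: str) -> float:
    """Assumed reward: 1.0 for a correct answer plus a format bonus.

    The bonus is granted only when the answer is correct, so a short
    but wrong response is never preferred over a correct one.
    """
    if not correct:
        return 0.0
    return 1.0 + FORMAT_BONUS[fmt]


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style normalization within one group of sampled responses."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# One group of sampled responses to the same question: (correct?, format).
group = [
    (True, "full"),
    (True, "perception_only"),
    (True, "direct"),
    (False, "direct"),
]
rewards = [shaped_reward(c, f) for c, f in group]
advantages = group_relative_advantages(rewards)

for (correct, fmt), adv in zip(group, advantages):
    print(f"{fmt:16s} correct={correct!s:5s} advantage={adv:+.3f}")
```

Under these assumptions, the correct Direct Answer sample receives the highest advantage and the incorrect one the lowest, so the policy update pushes toward the shortest format that still answers correctly.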