# FlowTracer: A Reinforcement Learning Framework for Large Language Models That Traces Attention-Induced Information Flow

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-09 17:56
- AIHOT score: 59
- AIHOT link: https://aihot.virxact.com/items/cmq7h8hbj033lsl5wbsh1rodt
- Original link: https://arxiv.org/abs/2606.10646

## AI Summary

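FlowTracer is a reinforcement learning framework for large language models that traces reasoning flow from the question to the correct answer on an attention-induced directed acyclic graph. Edge capacities come from aggregated attention weights and are reweighted to retain only the influence that can reach the answer region, with local flow conservation enforced. The framework extracts an information-flow backbone and scores tokens by flow throughput, revealing high-impact hubs. The importance scores are used to shape token-level rewards, focusing the learning signal on the key tokens that route information, and yield consistent performance gains across multiple reasoning tasks.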

## Main Text

Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs): RL recipes typically treat all tokens equally and fail to distinguish decisive reasoning steps from routine formatting or fluent filler. Recent attempts leverage model-internal signals to assign finer-grained credit, but these are often point-wise heuristics that ignore the global structure of information propagation. We propose FlowTracer, an RL framework that traces answer-targeted reasoning flow on an attention-induced directed acyclic graph, in which nodes correspond to tokens and edge capacities come from aggregated attention weights, and that derives token credit from this global structure. The edge capacities are reweighted to retain only the influence that can reach the answer region, while local flow conservation is enforced so that intermediate tokens neither lose nor gain effective mass due to path length or irrelevant branches. On this graph, FlowTracer extracts an information-flow backbone connecting the question to the answer and scores tokens by flow throughput, revealing high-impact hubs and aggregation checkpoints that mediate long-range dependencies. These derived importances are used to shape token-level rewards, so that learning signals focus on the tokens that route information toward (or away from) correct answers. The approach delivers consistent performance gains across a range of reasoning tasks.
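
To make the graph construction concrete, here is a minimal Python sketch of one way such a flow score could be computed. It is not the authors' implementation: the abstract does not specify the reweighting rule, so the sketch assumes a reachability-based normalization (each edge is scaled by how much of its target's influence eventually reaches the answer), which is one simple way to keep only answer-reaching influence while conserving flow locally. The function names `flow_throughput` and `shape_token_advantages`, the blending parameter `alpha`, and the use of a single pre-aggregated causal attention matrix are all hypothetical choices made for illustration.

```python
import numpy as np


def flow_throughput(attn, question_idx, answer_idx):
    """Score tokens by the flow passing through them (illustrative sketch).

    attn[i, j] is the aggregated attention of token i to an earlier token j,
    so information moves along edges j -> i with j < i, which forms a DAG.
    """
    n = attn.shape[0]
    # cap[j, i] is the capacity of edge j -> i; self-attention is dropped.
    cap = np.tril(attn, k=-1).T
    is_answer = np.zeros(n, dtype=bool)
    is_answer[answer_idx] = True

    # Backward pass: reach[j] is the influence of token j that can arrive
    # at the answer region. Dead-end tokens get reach 0.
    reach = np.zeros(n)
    reach[is_answer] = 1.0
    for j in range(n - 1, -1, -1):
        if not is_answer[j]:
            reach[j] = cap[j] @ reach

    # Assumed reweighting: scale each edge by the reach of its target and
    # normalize, so every useful non-answer token passes on exactly the
    # mass it receives (local flow conservation).
    trans = np.zeros((n, n))
    for j in range(n):
        if not is_answer[j] and reach[j] > 0:
            trans[j] = cap[j] * reach / reach[j]

    # Forward pass: inject unit mass at the question tokens and push it
    # through the graph in token order (a valid topological order).
    flow = np.zeros(n)
    flow[question_idx] = 1.0 / len(question_idx)
    for i in range(n):
        flow[i] += flow[:i] @ trans[:i, i]
    return flow


def shape_token_advantages(advantage, flow, response_idx, alpha=0.5):
    """Blend a uniform sequence-level advantage with flow-based weights.

    The weights are normalized to mean 1 over the response tokens, so the
    average advantage is unchanged (a hypothetical shaping rule).
    """
    w = flow[response_idx]
    mean = w.mean()
    w = w / mean if mean > 0 else np.ones_like(w)
    return advantage * ((1.0 - alpha) + alpha * w)
```

In this sketch, a token that nothing downstream attends to receives zero flow, while a token that every question-to-answer path passes through receives the full unit of mass. That matches the abstract's description of irrelevant branches and high-impact hubs. Setting `alpha` to 0 recovers the uniform per-token credit that the abstract criticizes.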