# A Local Perturbation Theory of Interference and Recovery in Multi-Domain Reinforcement Learning for Large Language Models

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-01 08:00
- AIHOT score: 61
- AIHOT link: https://aihot.virxact.com/items/cmpxkya4604y2slckcquhtcy2
- Original paper: https://arxiv.org/abs/2606.02398

## AI Summary

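The study finds that reinforcement learning post-training of a large language model on a single domain (such as math or code) interferes with other domains, and that this happens even when full-model gradients are nearly orthogonal. The paper proposes a local perturbation model to explain the phenomenon: interference arises mainly through a second-order damage term concentrated in a low-dimensional shared conflict subspace. The theory shows that a brief domain refresh contracts the harmful component in that subspace, enabling selective recovery. In experiments, after sequential training on Code → Math → QA → creative writing, a Re-Math refresh recovers Math performance (from 57.66 to 66.04) while largely preserving performance on the other domains.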

## Main Text

Reinforcement learning (RL) post-training improves large language models (LLMs) on individual domains such as mathematical reasoning, code generation, question answering, and creative writing (CW), but training on one domain often degrades performance on others. Existing explanations based on catastrophic forgetting or global gradient conflict are incomplete: substantial interference can occur even when full-model gradients are nearly orthogonal. We show that single-domain RL produces sparse, small-magnitude parameter edits with weak overlap among top-changed neurons, while different domains still share substantial active computation routes on which update directions determine whether they act synergistically or conflict. Guided by this observation, we prove under a local perturbation model of multi-domain RL that later-domain training harms an earlier domain mainly through a second-order damage term, which under the observed sparse route structure concentrates in a low-dimensional shared conflict subspace. Moreover, a short domain refresh contracts the harmful component on this subspace, enabling selective recovery with limited collateral damage. Consistent with the theory, a brief Re-Math refresh after Code → Math → QA → CW recovers Math from 57.66 to 66.04 while largely preserving performance on the other domains, yielding the best average score of 66.39. Beyond refresh, a training-free rollback on a sparse proxy conflict coordinate set for the Math-QA pair partially restores Math, providing direct proxy-level evidence for localized damage. These results provide a localized mechanistic account of interference and recovery in multi-domain RL.
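The second-order damage and refresh mechanism described in the abstract can be illustrated with a toy quadratic model. The sketch below is not the paper's construction: it assumes the earlier domain sits at a local optimum (so the first-order term vanishes), that its loss has a diagonal Hessian that is nonzero only on a small "conflict" coordinate set, and that a refresh is a few plain gradient steps on that loss. The names `damage`, `refresh`, `hessian_diag`, and all numeric values are hypothetical.

```python
# Toy illustration (assumed setup, not the paper's model):
# the earlier domain's loss near its optimum is 0.5 * sum(h_i * d_i^2),
# where d is the parameter displacement caused by later-domain training.

def damage(hessian_diag, delta):
    """Second-order loss increase on the earlier domain for displacement delta."""
    return 0.5 * sum(h * d * d for h, d in zip(hessian_diag, delta))

def refresh(hessian_diag, delta, lr, steps):
    """A few gradient steps on the earlier domain's quadratic loss.

    The gradient along coordinate i is h_i * d_i, so each step scales d_i
    by (1 - lr * h_i). Coordinates with h_i == 0 are left untouched.
    """
    for _ in range(steps):
        delta = [d - lr * h * d for h, d in zip(hessian_diag, delta)]
    return delta

# Hypothetical values: curvature only on the first two coordinates,
# which play the role of the low-dimensional shared conflict subspace.
hessian_diag = [2.0, 1.0, 0.0, 0.0]
# Displacement produced by training on a later domain.
delta = [0.3, -0.2, 0.5, 0.4]

before = damage(hessian_diag, delta)
refreshed = refresh(hessian_diag, delta, lr=0.4, steps=3)
after = damage(hessian_diag, refreshed)

print("damage before refresh:", before)
print("damage after refresh: ", after)
print("displacement after refresh:", refreshed)
```

In this toy setting the refresh shrinks only the displacement components that carry curvature for the earlier domain, while the components outside that coordinate set (here the last two, standing in for what the later domain learned elsewhere) are unchanged. That is the sense in which a short refresh can be selective, under the stated assumptions.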
