# 基于角度-范数分解的激活干预几何分析

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-06-04 08:00
- AIHOT 分数：49
- AIHOT 链接：https://aihot.virxact.com/items/cmq6l2wnb09ensl5id1z8fo4p
- 原文链接：https://arxiv.org/abs/2606.06735

## AI 摘要

本研究通过控制实验解耦隐藏状态的径向与角度分量，发现不同激活干预方法的主要差异在于如何耦合 token 与概念方向的角度对齐及隐藏状态范数变化。在七个语言模型上，概念主要编码于角度结构，但范数对干预稳定性和下游效果仍至关重要。结果解释了概念效果相似的干预可能表现不同的原因，建议将激活干预参数化为可解释的角度和径向分量，而非单一加性系数。

## 正文

Linear activation steering has gained popularity as a simple and empirically effective way to control language model behavior. More recently, spherical steering paradigms have been proposed to address limitations of additive interventions, often motivated by the assumption that hidden-state norm does not carry concept-relevant information. In this work, we revisit this assumption through a controlled empirical study designed to disentangle the roles of angular and radial components. We show that steering methods differ mainly in how they couple two geometric effects: changing a token's angular alignment with a concept direction and changing its hidden-state norm. Across seven language models, we find that concepts are represented primarily in angular structure, supporting the motivation for spherical methods, but that norm remains important for the stability and downstream effects of steering. Our results explain why interventions with similar concept-level effects can behave differently, and suggest that activation steering should be parameterized by interpretable angular and radial components of the intervention, rather than by a single additive coefficient that entangles these two effects.
