# CHIAR-Former: Chiaroscuro Attention -- Allocating Compute in the Dark

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-06 08:00
- AIHOT score: 59
- AIHOT link: https://aihot.virxact.com/items/cmq6ci8g4075nsl5i4wkvuwto
- Original link: https://arxiv.org/abs/2606.08327

## AI Summary

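CHIAR-Former is a 4-layer hybrid Transformer that routes each token, based on its spectral entropy, to either DCT spectral mixing or full self-attention (RBF kernel mixing was rejected in ablations). The DCT+Attention-only variant reaches Val PPL 36.54 on WikiText-103, a 45% improvement over the full-attention baseline (PPL 66.62), while using 62.5% fewer attention FLOPs. Evaluations on WikiText-2, IMDB sentiment classification, and ListOps show that the model benefits from token diversity in large-scale natural text, while full attention retains the advantage on small datasets and synthetic tasks.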
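
As a quick check on the headline figure, the relative perplexity reduction follows directly from the two reported values (lower perplexity is better, so "improvement" is read here as a relative reduction):

$$
\frac{66.62 - 36.54}{66.62} = \frac{30.08}{66.62} \approx 0.4515 \approx 45\%
$$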

## Main Text

Standard transformers apply self-attention uniformly at every layer and token, regardless of whether the input requires dynamic cross-token interaction. We propose CHIAR-Former (Chiaroscuro Attention), a 4-layer hybrid transformer that routes each token to one of three operators - DCT spectral mixing, RBF kernel mixing, or full self-attention - based on per-token spectral entropy, a theoretically justified complexity signal. Through systematic ablation on WikiText-103, we discover routing collapse: the router consistently rejects RBF in favour of DCT and attention, revealing that spectral mixing and dynamic attention are complementary and sufficient. A purpose-designed DCT+Attention-only variant achieves Val PPL 36.54 on WikiText-103 - a 45% improvement over a full-attention baseline (PPL 66.62) at 62.5% fewer attention FLOPs. We extend evaluation to WikiText-2, IMDB sentiment classification, and synthetic ListOps operations, establishing a clear operating regime: CHIAR-Former excels on large-scale naturalistic text where token diversity supports spectral specialisation, while full attention retains an edge on small datasets and synthetic pattern-matching tasks. These findings - both the wins and the losses - together define when and why spectral routing earns its keep.
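
To make the routing signal concrete, here is a minimal sketch of one plausible reading of "per-token spectral entropy": the normalised Shannon entropy of the DCT power spectrum of a token's feature vector. The abstract does not give the exact definition, so everything here is an assumption for illustration: the function names, the unnormalised DCT-II, the hard threshold of 0.5, and the direction of the rule (low entropy goes to cheap DCT mixing, high entropy to attention). The paper's actual router may well be learned rather than thresholded.

```python
import math

def dct2(x):
    """Unnormalised DCT-II of a 1-D sequence (O(n^2), fine for a sketch)."""
    n = len(x)
    return [
        sum(x[j] * math.cos(math.pi * (j + 0.5) * k / n) for j in range(n))
        for k in range(n)
    ]

def spectral_entropy(x, eps=1e-12):
    """Shannon entropy of the DCT power spectrum, scaled to [0, 1].

    Assumes len(x) >= 2. Values near 0 mean the energy sits in a few
    frequencies; values near 1 mean it is spread across all of them.
    """
    power = [c * c for c in dct2(x)]
    total = sum(power) + eps          # eps guards against an all-zero input
    probs = [p / total for p in power]
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(x))       # divide by the maximum possible entropy

def route(x, threshold=0.5):
    """Hypothetical hard router (assumed rule, not taken from the paper)."""
    return "attention" if spectral_entropy(x) > threshold else "dct"

if __name__ == "__main__":
    smooth = [1.0] * 8                # all energy in the DC coefficient
    spiky = [1.0] + [0.0] * 7         # energy spread over many frequencies
    print(route(smooth), route(spiky))
```

Under these assumptions a constant vector has near-zero entropy and goes to the DCT branch, while a single spike spreads its energy across the spectrum and goes to attention. This matches the intuition in the abstract that only inputs needing dynamic cross-token interaction should pay for full self-attention.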
