# Comparing Subquadratic Architectures: xLSTM Outperforms Mamba-2 and Gated DeltaNet on Code Pre-training and Time-Series Tasks

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-06-11 01:33
- AIHOT score: 65
- AIHOT link: https://aihot.virxact.com/items/cmq9g2yz40bksslldte00kwxv
- Original link: https://arxiv.org/abs/2606.12364

## AI Summary

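Across three tasks (code-model pre-training, distillation of code models from large language models, and pre-training of time-series foundation models), xLSTM achieves the best overall performance among the three subquadratic architectures compared: xLSTM, Mamba-2, and Gated DeltaNet. A unified formulation and mechanistic analysis show that xLSTM's gating scheme enables more flexible and stable memory correction. Its advantages in state tracking and memory accumulation are corroborated on synthetic length-generalization tasks.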

## Main Text

Transformers dominate modern sequence modeling, but their quadratic attention incurs substantial computational cost. Subquadratic architectures offer a scalable alternative. However, it remains unclear which designs yield the most effective sequence models. We compare three leading approaches: xLSTM, Mamba-2, and Gated DeltaNet. We evaluate these models on tasks with complex dependencies: (1) code-model pre-training, (2) distillation of code models from large language models, and (3) pre-training of time-series foundation models. Across these settings, xLSTM delivers the strongest overall performance. To explain xLSTM's advantage, we present a unified formulation and analyze the underlying architectural mechanisms, focusing on state tracking and memory dynamics. Our results show that xLSTM enables more flexible and stable memory correction via its gating scheme. We corroborate these findings on controlled synthetic length-generalization tasks. Overall, our findings indicate that xLSTM's gains on complex tasks stem from robust state tracking and accumulation.
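
The abstract does not spell out its unified formulation, so the following is only a rough illustration of what "memory correction" can mean in a matrix-memory recurrence. It is not the paper's equations. It assumes two generic update rules commonly described in the literature on linear recurrent models: a gated additive write, `S_t = f * S_{t-1} + i * v k^T`, and a gated delta-rule write, `S_t = a * S_{t-1} + b * (v - a * S_{t-1} k) k^T`. The function names, the scalar gates, and the toy values are hypothetical, and details such as normalizer states or exponential gating are omitted.

```python
# Toy comparison of two matrix-memory update rules (illustrative assumption,
# not the formulation used in the paper). Pure Python, no dependencies.

def outer(v, k):
    """Outer product v k^T as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]

def matvec(S, k):
    """Read from memory: S k."""
    return [sum(s * kj for s, kj in zip(row, k)) for row in S]

def gated_additive_step(S, k, v, forget=1.0, write=1.0):
    """S_t = forget * S_{t-1} + write * v k^T (new value is added on top)."""
    vk = outer(v, k)
    return [[forget * s + write * u for s, u in zip(rs, ru)]
            for rs, ru in zip(S, vk)]

def gated_delta_step(S, k, v, decay=1.0, beta=1.0):
    """S_t = decay * S_{t-1} + beta * (v - decay * S_{t-1} k) k^T.

    The write is the prediction error for key k, so an old value stored
    under the same key is corrected rather than accumulated.
    """
    pred = matvec(S, k)
    err = [vi - decay * pi for vi, pi in zip(v, pred)]
    ek = outer(err, k)
    return [[decay * s + beta * e for s, e in zip(rs, re)]
            for rs, re in zip(S, ek)]

# Store v_old under key k, then write v_new under the same unit-norm key.
k = [1.0, 0.0]
v_old = [1.0, 2.0]
v_new = [3.0, -1.0]
zeros = [[0.0, 0.0], [0.0, 0.0]]

S_add = gated_additive_step(gated_additive_step(zeros, k, v_old), k, v_new)
S_delta = gated_delta_step(gated_delta_step(zeros, k, v_old), k, v_new)

read_add = matvec(S_add, k)      # v_old + v_new: both values accumulate
read_delta = matvec(S_delta, k)  # v_new: the old value is overwritten
print(read_add, read_delta)
```

With all gates fixed at 1, the additive rule reads back the sum of both values, while the delta rule reads back only the latest one. In practice the gates are input-dependent, and how each architecture sets them is what the paper's analysis examines. This toy fixes them only to make the two behaviors visible.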