# A New Paradigm for Efficient Pretraining: The HRM-Text Model

- Source: HuggingFace Daily Papers (trending community papers)
- Published: 2026-05-20 08:00
- AIHOT score: 67
- AIHOT link: https://aihot.virxact.com/items/cmpez2xg101ylsljwginf0rid
- Original link: https://arxiv.org/abs/2605.20613

## AI Summary

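This paper proposes HRM-Text, a new pretraining paradigm inspired by biological systems. It replaces the standard Transformer with a hierarchical recurrent model that decouples computation into a slow strategic layer and a fast execution layer, and it is trained on instruction data. An HRM-Text model with only 1 billion parameters, trained on 40 billion tokens within a $1,500 budget, achieves results on MMLU and several other benchmarks that are competitive with 2-7B open-source models. Compared with standard approaches, it uses far less training data and compute, demonstrating that co-designing the architecture and the objective can substantially lower the barrier to pretraining.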

## Body

The current pretraining paradigm for large language models relies on massive compute and internet-scale raw text, creating a significant barrier to foundational research. In contrast, biological systems demonstrate highly sample-efficient learning through multi-timescale processing, such as the functional organization of the frontoparietal loop. Taking this as inspiration, we introduce HRM-Text, which replaces standard Transformers with a Hierarchical Recurrent Model (HRM) that decouples computation into slow-evolving strategic and fast-evolving execution layers. To stabilize this deep recurrence for language modeling, we introduce MagicNorm and warmup deep credit assignment. Furthermore, instead of standard raw-text pretraining, we train exclusively on instruction-response pairs using a task-completion objective and PrefixLM masking. Serving as an empirical existence proof of efficient pretraining, a 1B-parameter HRM-Text model trained from scratch on only 40 billion unique tokens and a $1,500 budget achieves 60.7% on MMLU, 81.9% on ARC-C, 82.2% on DROP, 84.5% on GSM8K, and 56.2% on MATH. Despite utilizing roughly 100-900x fewer training tokens and 96-432x less estimated compute than standard baselines, HRM-Text performs competitively with 2-7B parameter open models. These results demonstrate that co-designing architectures and objectives can radically reduce the compute-to-performance ratio, making pretraining from scratch accessible to the broader research community.
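The abstract mentions PrefixLM masking over instruction-response pairs without giving details. As a rough illustration of the general technique, not the paper's implementation, the sketch below builds a PrefixLM attention mask in plain Python: every position may attend to the whole prefix (here assumed to be the instruction), while response positions are otherwise restricted causally. The function names `prefix_lm_mask` and `response_loss_mask` are hypothetical, and the assumption that the task-completion objective scores only response tokens is ours, not something the abstract states.

```python
def prefix_lm_mask(prefix_len: int, total_len: int) -> list:
    """Return mask[i][j] == True when position i may attend to position j.

    Generic PrefixLM rule (illustrative, not taken from the paper):
    prefix positions are visible to everyone; other positions are causal.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must lie in [0, total_len]")
    mask = []
    for i in range(total_len):
        row = []
        for j in range(total_len):
            in_prefix = j < prefix_len  # prefix tokens are visible to every position
            causal = j <= i             # otherwise only the current and earlier positions
            row.append(in_prefix or causal)
        mask.append(row)
    return mask


def response_loss_mask(prefix_len: int, total_len: int) -> list:
    """Assumed loss mask: True only for response positions (index >= prefix_len)."""
    if not 0 <= prefix_len <= total_len:
        raise ValueError("prefix_len must lie in [0, total_len]")
    return [i >= prefix_len for i in range(total_len)]


if __name__ == "__main__":
    # Example: a 2-token instruction followed by a 2-token response.
    for row in prefix_lm_mask(prefix_len=2, total_len=4):
        print("".join("1" if allowed else "0" for allowed in row))
```

With `prefix_len=0` the mask reduces to an ordinary causal mask, and with `prefix_len=total_len` it becomes fully bidirectional, which is why PrefixLM is often described as sitting between the two.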
