# The three biggest hps for stable training in everything are lr, bs, and beta2. We've built up good i…

- Source: Saining Xie (@sainingxie)
- Published: 2025-07-11 07:33
- AIHOT link: https://aihot.virxact.com/items/cmnxjn7y800fpsl9os9hfw3yh
- Original link: https://x.com/sainingxie/status/1943453528099258529

## AI Summary

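The three most important hyperparameters for stable training across all tasks are learning rate (lr), batch size (bs), and beta2. Over time we have built up good intuitions for how to tune them, but this post lays it all out analytically and convincingly.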

## Full Text

The three biggest hps for stable training in everything are lr, bs, and beta2. We've built up good intuitions on how to tune them over time, but this lays it all out analytically and convincingly.

this is definitely my new handbook for training big models on small gpus.

### Quoted Tweet

> Micah Goldblum: 🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretr...
