A super long overdue (3+ years?) post on scaling laws.
Compute is expensive. Scaling laws are a way to help us reason about the optimal compute allocation between data and model size before committing to a large run.
The post covers what scaling laws predict, how compute-optimal allocation works, why Kaplan et al. and Chinchilla disagree, and how data limits + fitting details make extrapolation tricky.
https://lilianweng.github.io/posts/2026-06-24-scaling-laws/