Rohan Paul @rohanpaul_ai

2026-04-30 04:31 · 64 days ago

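AI Summary

Top labs including Harvard, Stanford, and UC Berkeley jointly argue that deep learning is moving from empirical optimization toward an explainable scientific theory. Although a neural network's architecture, data, and other ingredients are fully open, their complex interactions mean that predicting the training process still depends on extensive experimentation. The authors advocate building "learning mechanics," which, like physics, focuses on macroscopic regularities and uses five routes, including solvable toy models, infinite-width limits, and scaling laws, to reveal holistic laws of training dynamics and performance evolution. This theory complements mechanistic interpretability research, which focuses on local circuits; together they explore the global laws of learning.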

Beautiful new paper from Harvard, Stanford, UC Berkeley and other top labs.

Shows that deep learning is finally becoming the kind of thing science can explain, not just optimize.

That matters because we still do not have a compact, predictive theory that tells us ahead of time how a neural network will learn, scale, and respond to training choices without mostly testing it first.

The claim is not that we will soon explain every weight, but that we may learn the coarse laws governing training, representation, and performance.

That shift matters because neural nets are not hidden systems. We know the architecture, the data, the objective, and the update rule. The obstacle is not secrecy. It is the complexity of many simple parts interacting at once.

So the authors propose "learning mechanics," a physics-like program that studies the motion of learning itself.

"Learning mechanics" is their name for a hoped-for set of broad laws that explain the overall behavior of neural nets instead of just describing one model at a time, similar to how physics explains gases without tracking every molecule.

Physics became useful by ignoring microscopic detail when the right aggregate variables were enough, and this paper says deep learning theory is maturing in exactly that direction through solvable toy models, infinite limits, scaling laws, hyperparameter theories, and universal behaviors.

The claim is that training a neural net may be less like recipe tweaking and more like physics, where you stop tracking every tiny part and instead predict the large patterns that keep showing up.

That means studying how gradients move parameters, how representations form, and why behavior changes in regular ways as model size, data, and compute grow.
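To make "changes in regular ways" concrete, here is a minimal sketch of how a power-law scaling relation could be fit and extrapolated. It is not taken from the paper: the functional form loss = a * N^(-alpha), the helper name fit_power_law, and all numbers are illustrative assumptions, and the data is synthetic.

```python
import numpy as np

def fit_power_law(n, loss):
    """Fit loss ~ a * n**(-alpha) by least squares in log-log space.

    Hypothetical helper for illustration; returns (a, alpha).
    """
    # log(loss) = log(a) - alpha * log(n), a straight line in log-log space
    slope, intercept = np.polyfit(np.log(n), np.log(loss), 1)
    return float(np.exp(intercept)), float(-slope)

# Synthetic "model size vs. loss" points generated from an assumed law.
true_a, true_alpha = 5.0, 0.3
sizes = np.array([1e6, 1e7, 1e8, 1e9])
losses = true_a * sizes ** (-true_alpha)

a, alpha = fit_power_law(sizes, losses)

# Extrapolate one order of magnitude beyond the largest measured size.
predicted_loss = a * 1e10 ** (-alpha)
print(f"a={a:.3f}, alpha={alpha:.3f}, predicted loss at 1e10: {predicted_loss:.5f}")
```

Real measurements are noisy and often need an added irreducible-loss term, so a straight-line fit like this is only a starting point.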

The paper says this theory is taking shape through 5 routes: solvable toy models, simplifying limits like infinite width, simple laws like scaling laws, theories of hyperparameters, and behaviors that look universal across many systems.
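As a rough illustration of what a "solvable toy model" can look like, the sketch below runs gradient descent on noiseless linear regression and compares it with a closed-form prediction. This example is an assumption of mine, not one from the paper: for a quadratic loss, the error along each eigenvector of the Hessian shrinks by a factor (1 - lr * lambda) per step, so the whole trajectory can be predicted without running training.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star  # noiseless targets, so w_star is the exact minimizer

# Hessian of the loss 0.5 * mean((X @ w - y) ** 2)
H = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(H)

lr = 0.5 / eigvals.max()  # small enough for stable convergence
steps = 50

# Simulated training: plain gradient descent from w = 0.
w = np.zeros(d)
for _ in range(steps):
    grad = X.T @ (X @ w - y) / n
    w = w - lr * grad

# Closed-form prediction: each eigenmode of the error decays geometrically.
e0 = eigvecs.T @ (np.zeros(d) - w_star)     # initial error in the eigenbasis
e_t = (1.0 - lr * eigvals) ** steps * e0    # error after `steps` updates
w_pred = w_star + eigvecs @ e_t

print("max |simulated - predicted|:", np.abs(w - w_pred).max())
```

The simulated and predicted weights agree to numerical precision here because the loss is exactly quadratic; deep networks are not, which is why such models serve as starting points rather than full descriptions.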

The central bet is that useful laws can exist even when full microscopic detail is hopeless, just like thermodynamics explains gases without tracking every molecule.

This also fits neatly beside mechanistic interpretability， because one tries to find local circuits while the other tries to find global laws of learning.

----

Paper Link - arxiv.org/abs/2604.21691

Paper Title: "There Will Be a Scientific Theory of Deep Learning"

