# 深度学习的Hamilton-Jacobi理论

- 来源：HuggingFace Daily Papers（社区热门论文）
- 发布时间：2026-05-27 08:00
- AIHOT 分数：55
- AIHOT 链接：https://aihot.virxact.com/items/cmpwib79x0367slsn7thjinqu
- 原文链接：https://arxiv.org/abs/2605.28983

## AI 摘要

该论文将神经网络训练过程重新解释为对Hamilton-Jacobi初值问题的搜索。每次梯度下降都为粘性Hamilton-Jacobi方程选择初始数据，使得其Hopf-Cole传播器最佳拟合观测数据。此对应关系在log-sum-exp层中是精确的，对残差网络、Transformer及各类循环架构（RNN、LSTM、SSM）等更广泛的网络结构则是结构性的。一个变形参数ε统一了神经网络、热带代数、粘性偏微分方程与凸优化四个视角。定量结论包括：泛化率下界、由ε控制的对抗鲁棒性、将反向传播解释为残差网络Hamilton系统的共态方程，以及具有闭式解O(N)的影响函数。

## 正文

In this paper, training a neural network is identified, exactly, as a search through Hamilton--Jacobi initial-value problems: each gradient step selects the initial data of a viscous Hamilton--Jacobi equation whose Hopf--Cole propagator best fits the observations; at inference, the input is the spatial point at which that solution is evaluated and the initial condition is already encoded in the weights. The correspondence is exact for log-sum-exp layers and structural for broader architectures: residual networks, transformers, and recurrent architectures (RNNs, LSTMs, SSMs) each discretize the same class of Hamilton--Jacobi equations, with architecture-dependent Hamiltonian and viscosity. A single deformation parameter varepsilon unifies all four perspectives (network, tropical algebra, viscous PDE, convex optimization) in a commutative diagram closed under Lipschitz conditions. Quantitative consequences include: the minimax optimal generalization rate O(n^{-1/(d+2)}) for fixed t; adversarial robustness controlled by varepsilon; backpropagation as the co-state equation of the Hamiltonian system for residual networks (Pontryagin Maximum Principle); scaling exponents consistent with data intrinsic dimension via PDE quadrature; and a closed-form O(N) influence function (softmax attribution weights π_j) whose entropy landscape undergoes fold bifurcations as varepsilon increases, each merging attribution basins.