Entropy as a Structural Prior: A Log-Barrier on DiT Belief Space Drives Musical Diversity and Development
Source: arxiv.org
Confidence-based loss weighting is usually avoided in generative models because it amplifies errors when the model is confidently wrong, but this intuition breaks down in supervised diffusion training. We introduce the Eisbach log-barrier, a parameter-free weight derived from the entropy of the spatial energy distribution of the DiT output: high entropy damps the gradient, while low entropy preserves it. Applied to LoRA fine-tuning of Stable Audio 3 Medium on MusicCaps, it unexpectedly yields stronger thematic development, clearer acoustic differentiation, and higher textural diversity than unweighted training, the opposite of mode collapse. It works for two reasons. First, in supervised diffusion the gradient direction is locked to the ground truth, so confidence only scales the step size. Second, temporal entropy downweights flat samples while preserving high-contrast ones. The result is an online, self-referential data curriculum that emerges purely from the forward pass. We also analyze the resulting dynamics across noise levels and state testable predictions.
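The abstract does not give the exact formula for the Eisbach weight, so the following is only a minimal NumPy sketch under stated assumptions. It assumes one sample's DiT output has shape (positions, channels). It takes the "energy" at each position to be the squared L2 norm over channels and normalizes entropy by log N to obtain h in [0, 1]. It uses a hypothetical barrier w = clip(-log h, 0, 1), treated as a constant during backpropagation. The function names and the clip are illustrative, not the paper's definitions. The gradient is written out by hand rather than using autodiff, which makes two claims from the abstract visible. Flat (high-entropy) outputs get a weight near 0 while high-contrast ones keep weight 1. And because w is a per-sample scalar, the weighted gradient points in the same direction as the unweighted one, so only its length changes.

```python
import numpy as np


def normalized_spatial_entropy(x: np.ndarray) -> float:
    """Entropy of the energy distribution over positions, scaled to [0, 1].

    x: one sample's DiT output, shape (positions, channels) (assumed layout).
    1.0 means energy is spread evenly; 0.0 means it sits in a single position.
    """
    n_positions = x.shape[0]
    assert n_positions >= 2, "need at least two positions to normalize"
    energy = np.sum(x ** 2, axis=-1)      # squared L2 norm per position
    p = energy / (energy.sum() + 1e-12)   # energy distribution over positions
    nz = p[p > 0]                         # convention: 0 * log 0 = 0
    h = -np.sum(nz * np.log(nz))
    return float(h / np.log(n_positions))


def barrier_weight(h: float) -> float:
    """Hypothetical log-barrier weight: w = clip(-log h, 0, 1).

    h -> 1 (flat output) drives w -> 0, damping the update;
    h <= 1/e leaves w = 1, preserving the full gradient.
    """
    return float(np.clip(-np.log(max(h, 1e-12)), 0.0, 1.0))


def weighted_mse_and_grad(pred: np.ndarray, target: np.ndarray):
    """Per-sample MSE scaled by the entropy weight of pred.

    w is treated as a constant (no gradient flows through it), so
    d(loss)/d(pred) = w * 2 * (pred - target) / pred.size: the direction
    is fixed by the target and only the step length depends on w.
    """
    w = barrier_weight(normalized_spatial_entropy(pred))
    diff = pred - target
    loss = w * np.mean(diff ** 2)
    grad = w * 2.0 * diff / diff.size
    return loss, grad, w


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.standard_normal((64, 8))

    flat = np.ones((64, 8))      # energy spread over all 64 frames
    half = np.zeros((64, 8))
    half[:32] = 1.0              # energy spread over 32 of 64 frames
    peaky = np.zeros((64, 8))
    peaky[10] = 5.0              # all energy in a single frame

    for name, pred in [("flat", flat), ("half", half), ("peaky", peaky)]:
        loss, grad, w = weighted_mse_and_grad(pred, target)
        h = normalized_spatial_entropy(pred)
        print(f"{name:>5}: h={h:.3f} w={w:.3f} loss={loss:.3f}")
```

In the three toy cases, the flat output gets w ≈ 0, the half-filled one gets w = -log(5/6) ≈ 0.18 (since h = log 32 / log 64 = 5/6), and the single-frame one keeps w = 1. Because w is recomputed from the model's own output at every step, the effective sample weighting shifts as training proceeds. This is one way to read the abstract's "online, self-referential data curriculum". In an autodiff framework, the assumed constant-weight behavior corresponds to detaching w before multiplying the loss.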