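Ant Ling (蚂蚁百灵) has published the UFP4 paper, which proposes a uniform-grid FP4 training recipe. In long-run pretraining of Dense 1.5B, MoE 7.9B, and MoE 124B models, the recipe shows lower loss degradation relative to BF16 than strong E2M1 baselines. The paper argues that once fine-grained scaling and RHT (random Hadamard transform) are in place, the bottleneck in FP4 training shifts from dynamic range to local resolution: E1M2/INT4 formats make better use of the bucket allocation that RHT improves, whereas E2M1 can make RHT harmful. Paper: https://arxiv.org/abs/2606.20381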
Great breakdown from Qian. In our recent UFP4 paper, we show that a uniform-grid FP4 recipe achieves lower BF16-relative loss degradation than strong E2M1 baselines across Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining. Full paper: https://arxiv.org/abs/2606.20381
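For readers who want a feel for the "uniform grid vs. E2M1" distinction, here is a minimal sketch and not the paper's recipe. It compares the spacing of the E2M1 magnitude grid with an evenly spaced 8-level grid and runs a toy block-wise absmax quantizer on Gaussian data. The uniform grid (magnitudes 0 to 7), the block size of 16, the absmax scaling rule, and the names `gaps`, `quantize_block`, and `block_mse` are all assumptions made for illustration; RHT is not modeled here.

```python
import random

# Positive magnitudes representable in E2M1 (2 exponent bits, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
# Assumed uniform grid: 8 evenly spaced magnitudes, as in sign-magnitude INT4.
UNIFORM_GRID = [float(i) for i in range(8)]


def gaps(grid):
    """Distance between neighbouring grid points, i.e. the local resolution."""
    return [b - a for a, b in zip(grid, grid[1:])]


def quantize_block(block, grid):
    """Scale so the block's absmax maps to the grid maximum, round every
    magnitude to the nearest grid point, then scale back."""
    absmax = max(abs(x) for x in block)
    if absmax == 0.0:
        return list(block)
    scale = absmax / grid[-1]
    out = []
    for x in block:
        mag = min(grid, key=lambda g: abs(abs(x) / scale - g))
        out.append((mag if x >= 0 else -mag) * scale)
    return out


def block_mse(values, grid, block_size=16):
    """Mean squared quantization error with one scale per block."""
    total = 0.0
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        quantized = quantize_block(block, grid)
        total += sum((a - b) ** 2 for a, b in zip(block, quantized))
    return total / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.gauss(0.0, 1.0) for _ in range(4096)]
    print("E2M1 gaps:   ", gaps(E2M1_GRID))
    print("uniform gaps:", gaps(UNIFORM_GRID))
    print("E2M1 MSE:    ", block_mse(data, E2M1_GRID))
    print("uniform MSE: ", block_mse(data, UNIFORM_GRID))
```

The gap lists make the trade-off visible: E2M1 spends its levels on small magnitudes and leaves a coarse step at the top of the range, while the uniform grid keeps the same step everywhere. Which grid gives lower error on real training tensors depends on the distribution after scaling and rotation, so treat the toy MSE numbers as illustrative only.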