Ant Ling @AntLingAGI

2026-06-24 20:40

AI Summary

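Ant Ling published the UFP4 paper, which proposes a uniform-grid FP4 training recipe. In long-run pretraining of Dense 1.5B, MoE 7.9B, and MoE 124B models, the recipe achieves lower BF16-relative loss degradation than strong E2M1 baselines. The paper argues that once fine-grained scaling and RHT are applied, the bottleneck in FP4 training shifts from dynamic range to local resolution: E1M2/INT4 formats make better use of the improved bucket allocation that RHT provides, whereas with E2M1, RHT may actually be harmful. Paper: https://arxiv.org/abs/2606.20381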
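To make the "dynamic range vs. local resolution" trade-off concrete, here is a minimal sketch that enumerates the non-negative values of each 4-bit grid. It assumes IEEE-style subnormals, an exponent bias of 2^(e-1) - 1, and no Inf/NaN codes. Under those assumptions E2M1 reproduces the OCP MX FP4 values {0, 0.5, 1, 1.5, 2, 3, 4, 6}, while E1M2 comes out evenly spaced and matches symmetric INT4 up to a scale factor. The helper names (`fp4_magnitudes`, `int4_magnitudes`, `spacings`) are hypothetical; this is not code from the paper.

```python
# Hypothetical helpers for illustration only (not from the UFP4 paper).
# Assumptions: IEEE-style subnormals, bias = 2**(exp_bits - 1) - 1,
# and no Inf/NaN encodings (as in OCP MX E2M1).


def fp4_magnitudes(exp_bits: int, man_bits: int) -> list:
    """Non-negative values of a 4-bit float: 1 sign + exp_bits + man_bits."""
    assert 1 + exp_bits + man_bits == 4, "FP4 has 3 non-sign bits"
    bias = 2 ** (exp_bits - 1) - 1
    values = set()
    for e in range(2 ** exp_bits):
        for m in range(2 ** man_bits):
            frac = m / 2 ** man_bits
            if e == 0:
                # Subnormal: no implicit leading 1, exponent fixed at 1 - bias.
                values.add(2.0 ** (1 - bias) * frac)
            else:
                # Normal: implicit leading 1.
                values.add(2.0 ** (e - bias) * (1 + frac))
    return sorted(values)


def int4_magnitudes() -> list:
    # Symmetric INT4 (-7..7): -8 is dropped so both signs have the same range.
    return [float(q) for q in range(8)]


def spacings(grid: list) -> list:
    # Gap between neighbouring representable magnitudes.
    return [b - a for a, b in zip(grid, grid[1:])]


if __name__ == "__main__":
    for name, grid in [
        ("E2M1", fp4_magnitudes(2, 1)),
        ("E1M2", fp4_magnitudes(1, 2)),
        ("INT4", int4_magnitudes()),
    ]:
        # Normalize so the largest magnitude is 1, as a per-block absmax scale would.
        top = grid[-1]
        norm = [v / top for v in grid]
        print(name, [round(v, 3) for v in norm],
              "gaps:", [round(g, 3) for g in spacings(norm)])
```

Read this way, E2M1 spends its eight codes on range (raw gaps grow from 0.5 to 2), while E1M2 and INT4 keep one constant gap, which is one plausible reading of "local resolution" in the summary.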

Great breakdown from Qian. In our recent UFP4 paper, we show that a uniform-grid FP4 recipe achieves lower BF16-relative loss degradation than strong E2M1 baselines across Dense 1.5B, MoE 7.9B, and MoE 124B long-run pretraining. Full paper: https://arxiv.org/abs/2606.20381

Qian: Should FP4 training still default to E2M1? 🤔 With fine-grained scaling + RHT, the bottleneck may shift from dynamic range to local resolution. E1M2/INT4 better ...
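The quoted claim concerns per-block ("fine-grained") scaling combined with RHT, read here as a random Hadamard transform: random sign flips followed by an orthonormal Walsh-Hadamard transform. The toy sketch below quantizes 16-element blocks with an absmax scale kept in full precision, optionally rotating each block first, and reports the mean squared reconstruction error for an E2M1 grid and a uniform E1M2/INT4-style grid. The block size, the synthetic outlier-heavy data, the unquantized scales, and all function names are assumptions for illustration. It is not the UFP4 recipe, and its output should not be read as reproducing the paper's results.

```python
import math
import random

# Hypothetical toy setup; not the UFP4 paper's recipe.
E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # OCP MX FP4 magnitudes
UNIFORM = [0.5 * k for k in range(8)]            # E1M2 / symmetric-INT4-style grid


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    s = 1 / math.sqrt(n)
    return [v * s for v in x]


def rht(x, signs):
    # Random Hadamard transform: random sign flip, then orthonormal Hadamard.
    return fwht([v * s for v, s in zip(x, signs)])


def inverse_rht(y, signs):
    # The orthonormal Hadamard matrix is its own inverse; sign flips undo themselves.
    return [v * s for v, s in zip(fwht(y), signs)]


def quantize_block(block, grid):
    # Per-block absmax scaling: map the block's largest |x| to grid[-1],
    # then round each element to the nearest representable magnitude.
    amax = max(abs(v) for v in block)
    if amax == 0:
        return list(block)
    scale = amax / grid[-1]
    out = []
    for v in block:
        q = min(grid, key=lambda g: abs(abs(v) / scale - g))
        out.append(math.copysign(q * scale, v))
    return out


def block_mse(x, grid, block=16, use_rht=False, seed=0):
    # Mean squared error after quantize -> dequantize, block by block.
    rng = random.Random(seed)
    err = 0.0
    for i in range(0, len(x), block):
        blk = x[i:i + block]
        if use_rht:
            signs = [rng.choice((-1.0, 1.0)) for _ in blk]
            deq = inverse_rht(quantize_block(rht(blk, signs), grid), signs)
        else:
            deq = quantize_block(blk, grid)
        err += sum((a - b) ** 2 for a, b in zip(blk, deq))
    return err / len(x)


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic activations: mostly Gaussian with rare large outliers (an assumption).
    x = [rng.gauss(0, 1) * (20.0 if rng.random() < 0.01 else 1.0) for _ in range(4096)]
    for name, grid in [("E2M1", E2M1), ("uniform (E1M2/INT4)", UNIFORM)]:
        for use_rht in (False, True):
            tag = "on" if use_rht else "off"
            mse = block_mse(x, grid, use_rht=use_rht)
            print(f"{name:20s} RHT={tag:3s} MSE={mse:.5f}")
```

Because the result depends heavily on the synthetic distribution, the block size, and the choice to leave scales unquantized, it is more informative to vary the outlier rate or `block` and watch how the comparison moves than to treat any single printout as evidence.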

arXiv · Data/Training · Papers/Research
