We recently released a paper showing that UFP4, our uniform-grid FP4 training recipe, stays closer to BF16 than strong E2M1 baselines in long-run pretraining of Dense 1.5B, MoE 7.9B, and MoE 124B models. The key insight: FP4 training quality depends not only on bit width but also on grid geometry.
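To make "grid geometry" concrete, here is a minimal Python sketch comparing the standard E2M1 value set with an evenly spaced 4-bit grid. The E2M1 magnitudes are the format's standard representable values. The uniform grid is only an illustrative assumption (15 symmetric levels scaled to the same maximum of 6.0), not UFP4's actual definition, which this post does not specify. All function names are hypothetical.

```python
# Standard E2M1 magnitudes (1 sign, 2 exponent, 1 mantissa bit).
# Spacing grows with magnitude: 0.5 near zero, 2.0 at the top.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]


def e2m1_grid():
    """All distinct E2M1 values; +0 and -0 collapse to a single entry."""
    return sorted({s * m for s in (-1.0, 1.0) for m in E2M1_MAGNITUDES})


def uniform_grid(max_abs=6.0, levels_per_side=7):
    """ASSUMPTION: a symmetric, evenly spaced 15-level grid for illustration.

    This is not taken from the UFP4 paper; it only shows what a uniform
    4-bit grid looks like next to E2M1.
    """
    step = max_abs / levels_per_side
    return [k * step for k in range(-levels_per_side, levels_per_side + 1)]


def gaps(grid):
    """Distances between neighbouring grid points (grid must be sorted)."""
    return [b - a for a, b in zip(grid, grid[1:])]


def quantize(x, grid):
    """Round x to the nearest grid point; ties go to the lower point."""
    return min(grid, key=lambda g: abs(g - x))


if __name__ == "__main__":
    for name, grid in (("E2M1", e2m1_grid()), ("uniform", uniform_grid())):
        g = gaps(grid)
        print(name, "points:", len(grid), "min gap:", min(g), "max gap:", max(g))
```

Both grids hold 15 distinct values, but E2M1 concentrates resolution near zero and gets coarse near its maximum, while the uniform grid spreads rounding error evenly across the range. Which layout suits a given tensor depends on its value distribution after scaling.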