Floating-point math is not associative! Many of the highest-performance kernels split the workload across SMs (streaming multiprocessors) and accumulate the partial results in a nondeterministic order, so the same inputs can produce slightly different outputs from run to run. Many AI labs either accept this or pay a huge performance penalty for determinism. DeepSeek decided to do neither. (1/4) 🧵
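To make the point concrete, here is a minimal CPU-side sketch in Python of why accumulation order matters. It is not a GPU kernel and not DeepSeek's method. The `accumulate` helper and the `partials` values are hypothetical, chosen only so the rounding effect is easy to see.

```python
def accumulate(partials, order):
    """Sum partial results in the given order, as a reduction might
    if work units finished in that order (hypothetical helper)."""
    total = 0.0
    for i in order:
        total += partials[i]
    return total

# Illustrative partial sums; imagine each one came from a different SM.
partials = [1e16, 1.0, -1e16]

# Order A: the 1.0 is absorbed by rounding when added to 1e16.
sum_a = accumulate(partials, [0, 1, 2])

# Order B: the large values cancel first, so the 1.0 survives.
sum_b = accumulate(partials, [0, 2, 1])

print(sum_a, sum_b)    # 0.0 1.0
print(sum_a == sum_b)  # False

# The same effect with everyday values: regrouping changes the result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False
```

Same three numbers, two orders, two answers. On a GPU the order can depend on which SM finishes first, which is why results can differ between runs unless the reduction order is fixed.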