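Tensordyne has announced Napier, an AI inference rack, claiming 363,000 tokens/s on DeepSeek-R1 based on internal simulations (at a user speed of 210 tokens/s), 13x the 27,400 tokens/s of NVIDIA's NVL72 GB300. Napier computes in log space, turning multiplication into addition, which reduces chip area and power, leaves more transistors for SRAM, and gives lower energy per token and higher inference density. The move changes the economics of AI inference: the contest is no longer purely about FLOPS but about power, memory locality, interconnect latency, and the cost of serving each token.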
Quite a massive inferencing rack breakthrough from @TensordyneInc.
They just announced an AI-inference rack, claiming 13x the rack throughput of NVIDIA's NVL72 GB300 in a DeepSeek-R1 comparison based on internal simulations.
What makes this a big deal is that Tensordyne is attacking inference at the math level.
AI chips spend huge amounts of energy moving and multiplying numbers.
Napier (its AI inference rack) works in log space, where multiplication becomes addition, and addition is cheaper to build, switch, cool, and repeat billions of times per token.
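To make the "multiplication becomes addition" idea concrete, here is a minimal Python sketch of log-space arithmetic. It is only an illustration of the general math, not Tensordyne's design: the `(sign, log2 magnitude)` encoding and the helper names `to_log`, `log_mul`, and `from_log` are assumptions made up for this example.

```python
import math

# Hypothetical encoding for illustration: a nonzero real is stored as
# (sign, log2 of its magnitude). This is not Napier's actual number format.

def to_log(x: float) -> tuple[int, float]:
    """Encode a nonzero real as (sign, log2(|x|))."""
    if x == 0:
        raise ValueError("zero has no finite logarithm")
    return (1 if x > 0 else -1, math.log2(abs(x)))

def log_mul(a: tuple[int, float], b: tuple[int, float]) -> tuple[int, float]:
    """Multiply two encoded values: signs combine, log magnitudes add."""
    return (a[0] * b[0], a[1] + b[1])

def from_log(v: tuple[int, float]) -> float:
    """Decode back to an ordinary float."""
    return v[0] * 2.0 ** v[1]

# 8.0 * -0.5 computed with one addition on the log magnitudes (3 + -1 = 2)
product = from_log(log_mul(to_log(8.0), to_log(-0.5)))
print(product)  # -4.0
```

The only arithmetic inside `log_mul` is an addition of the two log magnitudes, which is the operation the post says is cheaper in hardware. The sketch does not cover summing values in log space, which needs extra work beyond a plain add.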