Rohan Paul @rohanpaul_ai

2026-06-18 00:52

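AI summary

Tensordyne has released Napier, an AI inference rack. Based on internal simulations, it claims 363,000 tokens/s on DeepSeek-R1 (at a user speed of 210 tokens/s), 13x NVIDIA's NVL72 GB300 (27,400 tokens/s). Napier computes in log space, turning multiplications into additions. This reduces chip area and power and frees more transistors for SRAM, giving lower energy per token and higher inference density. The result shifts AI inference economics away from a pure FLOPS contest toward power, memory locality, interconnect latency, and the cost of serving each token.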

Quite a massive inference-rack breakthrough from @TensordyneInc.

They just announced an AI inference rack, claiming 13x the rack throughput of NVIDIA's NVL72 GB300 in a DeepSeek-R1 comparison based on internal simulations.

What makes this a big deal is that Tensordyne is attacking inference at the math level.

AI chips spend huge amounts of energy moving and multiplying numbers.

Napier (its AI inference rack) works in log space, where multiplication becomes addition, and adders are cheaper to build, switch, cool, and repeat billions of times per token.
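To make the log-space idea concrete, here is a minimal Python sketch of a hypothetical log-number format. It is not Tensordyne's format, which the post does not describe. Each nonzero value is stored as a sign plus a fixed-point integer approximating log2|x|, with an assumed 8 fractional bits (`FRAC_BITS`, `encode`, `decode`, and `log_mul` are illustrative names). Multiplying two values then needs only an integer addition of the log fields and a sign flip.

```python
import math

FRAC_BITS = 8            # hypothetical precision of the log field (assumption)
SCALE = 1 << FRAC_BITS   # 256 fixed-point steps per power of two


def encode(x: float) -> tuple[int, int]:
    """Encode a nonzero real as (sign, round(log2|x| * SCALE))."""
    if x == 0:
        # Real log formats reserve a special code for zero; omitted here.
        raise ValueError("zero has no logarithm")
    sign = 1 if x > 0 else -1
    return sign, round(math.log2(abs(x)) * SCALE)


def decode(v: tuple[int, int]) -> float:
    """Convert (sign, fixed-point log2 magnitude) back to a float."""
    sign, log_fixed = v
    return sign * 2.0 ** (log_fixed / SCALE)


def log_mul(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Multiply two log-encoded values with an integer add plus a sign product.

    No multiplier is needed for the magnitudes: x * y in linear space is
    log2|x| + log2|y| in log space. The sign product would be an XOR of
    sign bits in hardware.
    """
    return a[0] * b[0], a[1] + b[1]


if __name__ == "__main__":
    p = log_mul(encode(3.0), encode(-4.0))
    print(p, decode(p))
```

Precision is set by `FRAC_BITS`. Each encoding rounds log2|x| to within 1/512, so in this sketch a product is off by at most about 0.27%.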

So instead of spending tons of transistor budget on heavy multiply circuits， Napier tries to shrink the math itself.

That means less chip area goes to compute and more to SRAM, resulting in less power per token and far more inference packed into the same rack.

If they have made log math accurate and fast enough for real inference, then Napier is not just pushing more power into a rack; it is changing the cost of the basic operation behind model serving.
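The catch behind "accurate and fast enough" is that addition, cheap in linear space, becomes the hard operation in log space: log2(2^a + 2^b) = max(a, b) + log2(1 + 2^-|a-b|). The sketch below uses the same kind of hypothetical 8-fractional-bit format. It handles only positive values and approximates the correction term with a precomputed lookup table, one common textbook approach. It illustrates the technique under those assumptions and does not describe how Napier does it. All names here are illustrative.

```python
import math

FRAC_BITS = 8            # hypothetical log-field precision (assumption)
SCALE = 1 << FRAC_BITS


def encode_pos(x: float) -> int:
    """Fixed-point log2 of a positive real."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    return round(math.log2(x) * SCALE)


def decode_pos(log_fixed: int) -> float:
    """Convert a fixed-point log2 value back to a positive float."""
    return 2.0 ** (log_fixed / SCALE)


def build_correction_table() -> list[int]:
    """Table of round(SCALE * log2(1 + 2**(-d / SCALE))) for d = 0, 1, 2, ...

    Stops at the first gap d where the correction rounds to zero. The term
    decreases monotonically, so every larger gap also contributes nothing.
    """
    table = []
    d = 0
    while True:
        c = round(SCALE * math.log2(1.0 + 2.0 ** (-d / SCALE)))
        if c == 0:
            return table
        table.append(c)
        d += 1


CORRECTION = build_correction_table()


def log_add(a: int, b: int) -> int:
    """Add two positive log-encoded values: the larger one plus a table lookup."""
    hi, lo = max(a, b), min(a, b)
    gap = hi - lo
    return hi + (CORRECTION[gap] if gap < len(CORRECTION) else 0)


if __name__ == "__main__":
    s = log_add(encode_pos(3.0), encode_pos(5.0))
    print(len(CORRECTION), decode_pos(s))
```

Even at this low precision the table has a few thousand entries. Adding values of opposite sign also needs a second function, log2(1 - 2^-d), which blows up as d approaches 0. This is one reason addition, not multiplication, is generally the costly operation in log-number systems.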

AI inference is no longer just a FLOPS race. It is a rack-level fight over power, memory locality, interconnect latency, and how many paying tokens can be served before the economics break.

They report that, per internal simulation, their TDN Rack reaches 363,000 tokens per second on DeepSeek-R1 at a user speed of 210 tokens per second, compared with 27,400 tokens per second for NVIDIA's NVL72 GB300.
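Here is a quick check of the quoted figures. The throughput ratio is 363,000 / 27,400 ≈ 13.2, which matches the "13x" headline. Dividing rack throughput by per-user speed gives the number of simultaneous 210 tokens/s streams the simulated TDN Rack would serve. The NVL72 stream count assumes its 27,400 tokens/s figure was also measured at 210 tokens/s per user, which the post does not state explicitly.

```python
TDN_RACK_TPS = 363_000      # simulated aggregate tokens/s on DeepSeek-R1 (from the post)
NVL72_GB300_TPS = 27_400    # aggregate tokens/s quoted for NVIDIA's NVL72 GB300
USER_TPS = 210              # per-user tokens/s quoted for the TDN Rack

ratio = TDN_RACK_TPS / NVL72_GB300_TPS
tdn_streams = TDN_RACK_TPS / USER_TPS
# Assumption: the NVL72 figure was taken at the same per-user speed.
nvl72_streams = NVL72_GB300_TPS / USER_TPS

print(f"throughput ratio: {ratio:.1f}x")
print(f"TDN Rack streams at {USER_TPS} tok/s: {tdn_streams:.0f}")
print(f"NVL72 streams at {USER_TPS} tok/s (assumed): {nvl72_streams:.0f}")
```

Under that assumption, the claim amounts to roughly 1,700 concurrent users per rack versus roughly 130.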

🧵 1.
