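Cerebras has reached an inference speed of 981 tokens per second on its wafer-scale chip while serving Kimi K2.6, a model with 1 trillion parameters. The figure has been validated by Artificial Analysis and is 6.7 times the speed of the fastest current GPU cloud offering. The advantage comes from integrating the design on a single wafer, which sharply reduces inter-chip communication latency and removes the bottleneck that conventional GPU clusters hit when moving data across chips. The speedup matters for large AI applications such as enterprise-grade coding agents, because it can significantly shorten testing, debugging, and iteration cycles.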
Cerebras reported 981 tokens/sec on the 1T-parameter Kimi K2.6 model. That is 6.7× faster than the next-fastest GPU cloud, a result validated by Artificial Analysis.
The hard part is moving model weights and activations fast enough: conventional GPU clusters split the model across many chips and spend much of their time passing data between them.
Cerebras uses wafer-scale chips, meaning one processor is built across a full silicon wafer, so more of the routing happens on-chip with much higher bandwidth and lower delay.
The real business claim is not just speed, but speed on a model big enough for enterprise coding agents, where every extra second slows testing, debugging, and iteration.
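To put the "every extra second" point in concrete terms, here is a rough back-of-the-envelope sketch in Python. It uses only the two reported figures (981 tokens/sec and the 6.7× ratio); the 2,000-token response length is an illustrative assumption, and the calculation ignores time to first token, network latency, and any variation in throughput.

```python
# Figures reported in the text above.
CEREBRAS_TOKENS_PER_SEC = 981.0
SPEEDUP_OVER_NEXT_GPU_CLOUD = 6.7


def implied_baseline_tokens_per_sec(fast_rate: float, speedup: float) -> float:
    """Throughput of the next-fastest GPU cloud implied by the reported ratio."""
    return fast_rate / speedup


def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a constant output rate (a simplification)."""
    return num_tokens / tokens_per_sec


def compare(num_tokens: int) -> dict:
    """Compare generation time at the reported rate and at the implied baseline."""
    baseline_rate = implied_baseline_tokens_per_sec(
        CEREBRAS_TOKENS_PER_SEC, SPEEDUP_OVER_NEXT_GPU_CLOUD
    )
    fast = generation_seconds(num_tokens, CEREBRAS_TOKENS_PER_SEC)
    slow = generation_seconds(num_tokens, baseline_rate)
    return {
        "baseline_tokens_per_sec": baseline_rate,
        "fast_seconds": fast,
        "baseline_seconds": slow,
        "seconds_saved": slow - fast,
    }


if __name__ == "__main__":
    # Assumption: one agent step emits about 2,000 tokens (hypothetical value).
    for key, value in compare(2000).items():
        print(f"{key}: {value:.2f}")
```

Under these assumptions the implied baseline is about 146 tokens/sec, so a 2,000-token step takes roughly 2.0 seconds instead of roughly 13.7 seconds. For a coding agent that chains many such steps, the difference compounds with each step.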
---