Up to 750 tokens/sec for GPT 5.6 Sol.
The current GPT-5.5 priority and scale-tier service advertises 99% >50 tokens/sec, so Sol on Cerebras is claiming up to 15x that rate (750 / 50 = 15).
The number comes from specialized inference hardware: Sol is served on Cerebras, whose wafer-scale chip is designed to move model data with far less memory and networking delay than a typical multi-GPU setup.
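To make the 15x figure concrete, here is a minimal Python sketch of the arithmetic. The two rates are the ones quoted above; the 1,500-token response length and the function names are assumptions chosen only for illustration. It also ignores time to first token and compares a peak rate (750) against an advertised floor (>50), so the result is best read as an upper bound on the speedup.

```python
def speedup(new_rate: float, baseline_rate: float) -> float:
    """Ratio of two decode rates, both in tokens/sec."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return new_rate / baseline_rate


def generation_seconds(num_tokens: int, rate: float) -> float:
    """Seconds to stream num_tokens at a steady rate (tokens/sec)."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return num_tokens / rate


SOL_RATE = 750.0       # claimed peak for Sol on Cerebras (from the text)
BASELINE_RATE = 50.0   # advertised floor for the GPT-5.5 tiers (from the text)
RESPONSE_TOKENS = 1500  # hypothetical response length, not from the text

if __name__ == "__main__":
    print(f"speedup: {speedup(SOL_RATE, BASELINE_RATE):.0f}x")
    print(f"baseline: {generation_seconds(RESPONSE_TOKENS, BASELINE_RATE):.0f} s")
    print(f"Sol:      {generation_seconds(RESPONSE_TOKENS, SOL_RATE):.0f} s")
```

Under these assumptions, a response that takes 30 seconds to stream at the baseline floor would take about 2 seconds at the claimed peak.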