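StepFun has released Step 3.7 Flash, its inference-optimized model. It is a 196B MoE model built for inference efficiency from the start. Its Multi-Matrix Factorization Attention (MFA) brings KV-cache cost down to roughly 22% of that of DeepSeek models, while Attention-FFN Disaggregation (AFD) enables efficient, hardware-optimized serving. The model is available through Fireworks AI under the Apache 2.0 license and can be used to build agentic applications.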
This is exactly the philosophy: don't bolt on efficiency, design for it from day one.
MFA + AFD aren't tricks. They're what lets Step 3.7 Flash serve at a fraction of the KV-cache cost.
Huge thanks to @FireworksAI_HQ for making Step 3.7 Flash one-click to run.
Go build something agentic with it.