StepFun@StepFun_ai

2026-06-29 23:19 · 3 days ago

AI summary

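Step 3.7 Flash ranks second on the Claw-Eval General evaluation for autonomous agents. We are seeing strong performance in multi-step execution and in robustness on long-horizon tasks, ranking just behind Claude Opus 4.6. This is a promising signal for real-world agent workloads.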

Step 3.7 Flash hits #2 on Claw-Eval General for autonomous agents.

We're seeing strong performance across multi-step execution and robustness in long-horizon tasks, ranking just behind Claude Opus 4.6.

Promising signals for real-world agent workloads.

StepFun@StepFun_ai · X