多项基准显示 Claude Opus 是长时间运行工作的最佳模型。SWE-Marathon 基准评估编码智能体在 10 亿 token 预算下自主完成长期软件任务(如重写 JAX 代码为 PyTorch、用 Rust 构建 C 编译器)。Opus 在此类任务上领先。Boris Cherny 给出 5 个技巧:使用自动权限模式避免审批;用动态工作流协调数百/数千个智能体;用 /goal 或 /loop 推动持续执行;在云端使用 Claude Code(桌面/移动端)以便关闭笔记本;确保 Claude 能端到端自验证——Chrome 扩展验证网页、iOS/Android 模拟 MCP、启动完整后端服务。
Seeing a number of benchmarks showing Opus is the best model for long-running work.
Five tips for running Opus autonomously for hours/days:
- Use auto mode for permissions, so Claude doesn't ask for approval
- Use dynamic workflows, to have Claude orchestrate hundreds/thousands of agents to get a task done
- Use /goal or /loop, to nudge Claude to keep going until it's done
- Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app)