Boris Cherny @bcherny

2026-06-08 09:16 · 25 days ago

AI summary

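Multiple benchmarks show Claude Opus is the best model for long-running work. The SWE-Marathon benchmark evaluates coding agents that autonomously complete long-horizon software tasks (such as rewriting JAX code in PyTorch or building a C compiler in Rust) under a 1 billion token budget. Opus leads on these tasks. Boris Cherny gives 5 tips: use auto mode for permissions to avoid approval prompts; use dynamic workflows to coordinate hundreds or thousands of agents; use /goal or /loop to drive continued execution; use Claude Code in the cloud (desktop or mobile app) so you can close your laptop; and make sure Claude can self-verify end to end, using the Chrome extension for web pages, an iOS/Android simulator MCP for mobile, and a way to start the full backend service.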

Seeing a number of benchmarks showing Opus is the best model for long-running work.

Five tips for running Opus autonomously for hours/days:

Use auto mode for permissions, so Claude doesn't ask for approval
Use dynamic workflows, to have Claude orchestrate hundreds/thousands of agents to get a task done
Use /goal or /loop, to nudge Claude to keep going until it's done
Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app)
Make sure Claude has a way to self-verify its work end to end: Claude in Chrome browser extension for web, iOS/Android sim MCP for mobile, a way to start the full web server or service for backend work
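
For the backend case in the last tip, one possible shape for "a way to start the full web server" is a single function that boots the service, waits for it to answer, and reports pass or fail. This is a minimal sketch, not something from the post: the `/health` endpoint, the `HealthHandler` stand-in service, and the function names are all hypothetical, and a real project would launch its own server instead of this stub.

```python
import json
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real backend: answers GET /health."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep verification output quiet


def start_server():
    """Start the stand-in service on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def wait_until_healthy(url, timeout=10.0, interval=0.2):
    """Poll `url` until it returns HTTP 200 with status "ok", or time out."""
    # Bypass any proxy settings so the request really goes to localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=2) as resp:
                if resp.status == 200 and json.load(resp).get("status") == "ok":
                    return True
        except (OSError, ValueError):
            pass  # not reachable yet, HTTP error, or invalid JSON: retry
        time.sleep(interval)
    return False


def verify():
    """Boot the service, run the end-to-end check once, then shut down."""
    server = start_server()
    try:
        port = server.server_address[1]
        return wait_until_healthy(f"http://127.0.0.1:{port}/health")
    finally:
        server.shutdown()
        server.server_close()
```

Wiring `verify()` to the process exit code (for example `sys.exit(0 if verify() else 1)`) would give an agent a single command whose result says whether the service actually came up, rather than relying on the code merely compiling.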

Rishi Desai: Can coding agents stay coherent over a 1 billion token budget? Can they build Slack from scratch? Rewrite a JAX codebase in PyTorch? Build a C compiler in Rust?...

