elvis@omarsar0

2026-06-13 01:50

AI Summary

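DAIR.AI founder Elvis Saravia shares how to effectively run long-running autonomous coding agents. He notes that most current models struggle to coordinate work: they pause too early, make mistakes, or take shortcuts (reward hacking). The key is to state goals clearly and eliminate assumptions so the model does not infer things on its own. His working formula: Opus 4.8 for careful planning, GPT-5.5 to execute all steps, and for the evaluator (via /goal), DeepSeek or the latest models from Qwen, Kimi, MiniMax, and others. Another key insight is that giving the agent multimodal visual cues as its goal works better than a plain-text goal and constrains the agent more effectively. The full discussion was recorded and is freely available.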

How to effectively run autonomous long-running coding agents?

This is one of the most exciting discussions on agents I've ever had.

I recorded it and am making it freely available.

(bookmark it)

The idea of autonomous long-running agents is a real thing.

We talk about lots of things like /goal, /loop, and dynamic workflows, and what comes next.

One interesting discussion was around how to make the agent run for longer while ensuring it stays on track.

Most models today struggle to coordinate work effectively. They sometimes stop work early, make lots of mistakes, and take weird shortcuts (reward hacking).

What helps is being extremely clear about the goals it needs to achieve: spell out the dos and don'ts, and eliminate any assumptions you think the model would make. Deep expertise matters a lot here.
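
One way to make "dos and don'ts, no hidden assumptions" concrete is to write the goal as a structured spec before handing it to the agent. The sketch below is illustrative only: `GoalSpec`, its field names, and `to_prompt` are hypothetical, not part of any agent tool mentioned in the post, and the example task is made up.

```python
from dataclasses import dataclass, field


@dataclass
class GoalSpec:
    """Hypothetical container for an explicit agent goal (field names are illustrative)."""
    objective: str
    dos: list[str] = field(default_factory=list)
    donts: list[str] = field(default_factory=list)
    # Questions the model might otherwise answer by guessing, with our answers.
    assumptions_resolved: dict[str, str] = field(default_factory=dict)
    done_when: list[str] = field(default_factory=list)

    def to_prompt(self) -> str:
        """Render the spec as plain text an agent could receive as its goal."""
        lines = [f"OBJECTIVE: {self.objective}"]
        lines += [f"DO: {item}" for item in self.dos]
        lines += [f"DON'T: {item}" for item in self.donts]
        lines += [f"ASSUME: {q} -> {a}" for q, a in self.assumptions_resolved.items()]
        lines += [f"DONE WHEN: {c}" for c in self.done_when]
        return "\n".join(lines)


# Hypothetical example task.
spec = GoalSpec(
    objective="Add CSV export to the reports page",
    dos=["Reuse the existing CSV helper"],
    donts=["Modify or skip existing tests"],  # closes off an obvious reward-hacking shortcut
    assumptions_resolved={"Which delimiter?": "comma"},
    done_when=["All tests pass", "Export button downloads a .csv file"],
)
print(spec.to_prompt())
```

Writing the "done when" criteria explicitly also gives a separate evaluator something checkable to judge against.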

But you can get far through careful planning. My current formula is to use Opus 4.8 for careful planning and GPT-5.5 for all execution. For the evaluator (via /goal), I often use something like DeepSeek or the latest models from Qwen, Kimi, MiniMax, etc.
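
The planner / executor / evaluator split can be sketched as a small loop in which each role is a separate function, so different models can be plugged into each. This is a minimal sketch under stated assumptions: `run_agent`, the role type aliases, and the `fake_*` stubs are hypothetical stand-ins, no real model API is called, and the post's `/goal` command is not modeled beyond "a separate evaluator decides when the goal is met".

```python
from typing import Callable

# Hypothetical role interfaces. In practice each would wrap a call to the model
# assigned to that role; here they are plain functions so the loop runs offline.
Planner = Callable[[str], list[str]]          # goal -> ordered steps
Executor = Callable[[str], str]               # step -> output
Evaluator = Callable[[str, list[str]], bool]  # goal, outputs -> goal met?


def run_agent(goal: str, plan: Planner, execute: Executor,
              evaluate: Evaluator, max_rounds: int = 3) -> tuple[bool, int]:
    """Plan once, execute every step, and let a separate evaluator decide when to stop."""
    steps = plan(goal)
    for round_no in range(1, max_rounds + 1):
        # A real loop would feed the evaluator's feedback back to the executor;
        # that is omitted here to keep the sketch short.
        outputs = [execute(step) for step in steps]
        # The evaluator, not the executor, decides whether the goal is met,
        # so the executor cannot simply declare the work done early.
        if evaluate(goal, outputs):
            return True, round_no
    return False, max_rounds


# Hypothetical stubs: the executor only starts succeeding on its third call.
attempts = {"n": 0}

def fake_plan(goal: str) -> list[str]:
    return ["write code", "run tests"]

def fake_execute(step: str) -> str:
    attempts["n"] += 1
    return f"{step}: ok" if attempts["n"] > 2 else f"{step}: failed"

def fake_evaluate(goal: str, outputs: list[str]) -> bool:
    return all(o.endswith("ok") for o in outputs)

print(run_agent("add CSV export", fake_plan, fake_execute, fake_evaluate))
```

Here the first round fails review and the second passes, so the call returns `(True, 2)`. Keeping the evaluator independent of the executor is the point of the split: a different model judges the result than the one producing it.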

Another insight we discussed for enforcing goals is to give the agent strong visual cues to compare against. I've found that a multimodal goal is much stronger than a plain-text one. You can also use agents to help you set clear goals.

Watch here: https://academy.dair.ai/events/cmplo7v3b000e04l1pxprat4d


elvis@omarsar0 · X
