Human in the /loop

What I like most about coding with agents right now is the room to leave a few runs going and still get on with other work. When something finishes or needs a call， I show up.

This post is a short explainer of the setup I use， a definition of done the agent can score， a loop that keeps going until it should stop， pings so I know when to lean in.

Find something the agent can verify

Before kicking off a longer running task， I lock a definition of done. Examples I actually use：

Model or eval work. Target is a score. Change the approach， run the eval， keep the change only if the number moved the right way. Closest to Karpathy's autoresearch for ML training loops.

Web app or UI. Target is a QA pass. Load the page or run Playwright， screenshot it， make sure it still does the thing.

Backend or refactor. Target is the test suite. Failing tests first， then green， and it has to stay green.

Speed or flakiness. Target is a number （p95， a benchmark）. Change and measure until you're under the line you set.

Data or content cleanup. Target is a count. Loop until zero rows fail validation， or every item passes the check.

Writing the loop is mostly writing how you'd check the work yourself. Some tasks need every step on the page. Others I give a goal and a rough direction and let the model fill in the middle. I start more explicit than I think I need， then loosen it once I see what it can infer.

Wrap it in a loop

Definition of done in hand， I tell the agent to loop on it. Change something， measure， keep or revert， go again. Doesn't have to be one tiny edit each time. The step just has to be measurable against the target. I care most about the stop conditions， which might be

eric zakariasson@ericzakariasson · X

68导出 Markdown

2026-06-26 21:04·6天前

在 X 看原推· x.com

AI 摘要

Eric Zakariasson 分享其AI智能体编程工作流：先设定可验证的完成标准（如模型评估分、测试全绿、p95阈值等），再将任务包装成循环——智能体反复修改、测量、保留或回退，直到达标、多轮无改进、思路用尽或遇阻。通过MCP和/notify向Slack发送通知，需要决策时主动联系人类。循环在云端运行，可同时启动多个长循环，并穿插PR、一次性调查等短任务。提示词模板用/loop驱动迭代、/notify保持更新。

http://x.com/i/article/2070417295810166784

Human in the /loop

What I like most about coding with agents right now is the room to leave a few runs going and still get on with other work. When something finishes or needs a call， I show up.

This post is a short explainer of the setup I use， a definition of done the agent can score， a loop that keeps going until it should stop， pings so I know when to lean in.

Find something the agent can verify

Before kicking off a longer running task， I lock a definition of done. Examples I actually use：

Model or eval work. Target is a score. Change the approach， run the eval， keep the change only if the number moved the right way. Closest to Karpathy's autoresearch for ML training loops.

Human in the /loop

Find something the agent can verify

Wrap it in a loop

Human in the /loop

Find something the agent can verify

Get pinged instead of babysitting

Run it in the cloud

Then start the next one

Prompting

Wrap it in a loop

Get pinged instead of babysitting

Run it in the cloud

Then start the next one

Prompting