Rohan Paul @rohanpaul_ai · May 26
This Meta + Stanford + Illinois survey paper argues that AI agents work better when code becomes their main working layer.
The problem is that an LLM by itself is mostly a text predictor, so long tasks can lose state, hide mistakes, and turn plans into actions in fragile ways.
The real advance is not “AI writes code,” but “AI uses code as the environment it thinks inside.”
The authors call the surrounding system an agent harness, meaning the tools, memory, sandboxes, checks, and feedback loops that turn a model into an agent.
Their core idea is that code should sit at the center of that harness, because code can be run, inspected, checked, saved, edited, and shared.
Tests become sensors.
Repositories become memory.
Logs become history.
Sandboxes become boundaries.
A generated script is no longer merely an answer; it is a handle the system can run, check, revise, share, and roll back.
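To make the "handle" idea concrete, here is a minimal sketch in Python. It is not from the paper: the `ScriptHandle` class, its method names, and the `demo` function are all hypothetical, and a temporary directory plus a subprocess timeout only stands in for a real sandbox. The version list plays the role of the repository, the log list plays the role of history, and the stdout check plays the role of a test acting as a sensor.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ScriptHandle:
    """Hypothetical wrapper: a generated script the harness can act on."""
    workdir: Path                                   # stand-in for a sandbox, NOT real isolation
    versions: list = field(default_factory=list)    # "repository": saved revisions
    log: list = field(default_factory=list)         # "history": what happened, in order

    def revise(self, source: str) -> None:
        """Save a new revision of the script."""
        self.versions.append(source)
        self.log.append(f"revise -> v{len(self.versions)}")

    def run(self, timeout: float = 10.0) -> subprocess.CompletedProcess:
        """Execute the latest revision in a separate process inside workdir."""
        path = self.workdir / "script.py"
        path.write_text(self.versions[-1])
        result = subprocess.run(
            [sys.executable, str(path)],
            cwd=self.workdir,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        self.log.append(f"run v{len(self.versions)} exit={result.returncode}")
        return result

    def check(self, expected_stdout: str) -> bool:
        """The 'sensor': run the script and compare its output to an expectation."""
        result = self.run()
        ok = result.returncode == 0 and result.stdout.strip() == expected_stdout
        self.log.append(f"check v{len(self.versions)} ok={ok}")
        return ok

    def roll_back(self) -> None:
        """Discard the latest revision, keeping at least one."""
        if len(self.versions) > 1:
            self.versions.pop()
            self.log.append(f"roll_back -> v{len(self.versions)}")


def demo() -> tuple:
    """Good revision, broken revision, roll back, re-check."""
    with tempfile.TemporaryDirectory() as tmp:
        handle = ScriptHandle(Path(tmp))
        handle.revise("print(2 + 2)")
        first_ok = handle.check("4")
        handle.revise("print(2 +)")      # deliberately broken revision
        second_ok = handle.check("4")
        if not second_ok:
            handle.roll_back()
        final_ok = handle.check("4")
        return first_ok, second_ok, final_ok, handle.log


if __name__ == "__main__":
    for entry in demo()[3]:
        print(entry)
```

The point of the sketch is that a failed check leaves a trace in the log and a known-good revision to return to, which a plain text answer would not give you.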
The main finding is a pattern across many fields: code helps agents reason through executable steps, act through tool calls or control programs, and model environments through tests, traces, logs, repositories, and simulators.
----
Paper Link – arxiv.org/abs/2605.18747
Paper Title: "Code as Agent Harness"