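Folks, let's be real! The AI tools, Bloomberg terminals, trading systems, and video studios you pay for every month are really mechanisms capital uses to keep its revenue steady. Are you still subscribed to OpenAI, HeyGen, and Bloomberg? GitHub already has 10 open-source projects that can fully replace these paid products. They are free, powerful, self-hostable, and leave you in full control of your own data.
1. AutoHedge: four AI agents forming an autonomous hedge fund that trades on Solana in real time; runs after a pip install → https://github.com/The-Swarm-Corporation/AutoHedge
2. Vibe-Trading: 64 financial skills plus a swarm of 29 expert agents that debate strategy in real time using a DAG model, with liquidation heatmaps and token-unlock tracking → https://github.com/HKUDS/Vibe-Trading
3. Fincept Terminal: a Bloomberg alternative that runs on a laptop, offering CFA-level analysis and AI agents modeled on 20+ well-known investors such as Buffett, Dalio, and Soros, connected to 100+ data sources → https://github.com/Fincept-Corporation/FinceptTerminal
4. LibreChat: self-hosted ChatGPT + Claude + Gemini + DeepSeek and 20+ other models; your data and history all stay local → https://github.com/danny-avila/LibreChat
5. Open Higgsfield AI: a self-hosted film studio that integrates Flux, Midjourney, Sora, Kling, Veo, and GPT-4o, supports text-to-image and image-to-video, and runs locally → https://github.com/Anil-matcha/Open-Higgsfield-AI
6. Open-LLM-VTuber: an open-source AI VTuber you can deploy locally → https://github.com/Open-LLM-VTuber/Open-LLM-VTuber
7. Claude Ads: a tool for generating ad creatives with Claude in one click → https://github.com/AgriciDaniel/claude-ads
8. Agentic Inbox: AI manages your mailbox and processes email automatically → https://github.com/cloudflare/agentic-inbox
9. Camofox Browser: a headless browser that lets AI agents operate fully undetected → https://github.com/jo-inc/camofox-browser
10. Hyperframes: AI writes HTML to generate professional video → https://github.com/heygen-com/hyperframes
We always assumed the AI revolution would come from big companies valued in the tens of billions. Yet these open-source projects on GitHub are putting institutional-grade tools directly into ordinary people's hands.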

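Summary: The post argues that many paid AI tools and professional software products (such as Bloomberg) are mechanisms for capital to sustain revenue. GitHub already hosts 10 open-source projects that can replace them with free, powerful, self-hostable options. Examples include AutoHedge (autonomous trading agents), Vibe-Trading (financial skills and an agent system), Fincept Terminal (a Bloomberg alternative), LibreChat (multi-model chat), and Open Higgsfield AI (a film studio). These projects give users full control of their own data and offer ordinary people, for free, what used to be expensive subscription services.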

StepFun@StepFun_ai · 6月1日58

A thoughtful take on Step 3.7 Flash and the new frontier of agent efficiency, from @FrankYouChill 👇

Quoted article from @FrankYouChill: http://x.com/i/article/2060950736851316737

Peter Steinberger 🦞@steipete · 6月1日58

Been teaching codex to be my QA assistant. For every commit it creates a user-test scenario and uses webVNC (crabbox), computer/browser use (peekaboo/mcporter) to test OpenClaw like a user/QA person would. This runs in the background and opens PRs with fixes.

Ethan Mollick@emollick · 6月1日67

/goal and other fully automated AI agents are cool, but not a great model for the future of work with people. Instead you want your AI to know when to ask you GOOD questions, maybe because it is stuck, maybe because your taste matters, maybe because you would find it interesting.

elvis@omarsar0 · 6月1日58

As we target more complex use of coding agents (e.g., dynamic workflows and /goals) on long-horizon tasks, you will start to see all kinds of bizarre issues like this. This is just about user experience; it's even more insane what happens behind the scenes (ridiculous use of tokens, infinite loops, inefficient agent-to-agent interactions). You really want to own that harness and be in more control of it as we target more advanced use cases of coding agents. Multi-agent systems are just another beast to deal with.

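Summary: The post notes that as coding agents take on more complex, long-horizon tasks, problems appear at every level, from user experience to backend behavior. On the surface there are all kinds of bizarre issues; behind the scenes there is heavy token waste, infinite loops, and inefficient agent-to-agent interaction. The author stresses that for these advanced use cases it becomes essential to own and control the harness, and adds that multi-agent systems are yet another challenge to deal with.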

AYi@AYi_AInotes · 6月1日63

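This line from Paul Graham could shake awake the CEOs of 90% of companies. He said: "The only thing worse than a CEO being deeply, personally involved in building things with AI is a CEO not being deeply, personally involved in building things with AI at all." I see many people attacking him for not understanding management, saying a CEO should do strategy and stay out of execution, but they have missed the real weight of the sentence. He did not say CEOs should become full-time engineers writing production code. The core message is: stop only reading slide decks and listening to reports, and stop handing all of AI to that "head of AI transformation". You have to write prompts yourself, build agents, use AI to automate your own workflows and run into walls, and feel where AI is elegant, where it breaks down, and where it needs human judgment.
AI is a cognitive revolution that iterates every week. What is impossible today may be a 10x efficiency gain tomorrow; a moat that looks solid today may be flattened by AI tomorrow. A strategy built on secondhand information is essentially driving by the rearview mirror, and by the time you react, your company has been left so far behind by founders who live in AI every day that you cannot even see their taillights. Many say over-involvement makes a CEO lose sight of the big picture, but Paul Graham is clear that the two mistakes are not of the same order of magnitude: over-involvement at worst costs some efficiency, while zero involvement is a death sentence for the company. So my advice to all CEOs:
1. Block out 1 hour every day to do nothing but use AI on your own real work
2. Skip the fancy demos and do the dirtiest, most tedious work: handling email, writing documents, analyzing data
3. Build at least one small, genuinely usable tool with AI every week
4. Do not ask your team "what can AI do"; first work out for yourself "what AI cannot do"
In the industrial era, factory owners who never touched the machines were eliminated; in the internet era, bosses who did not use the internet were eliminated; in the AI era, CEOs who do not use AI hands-on may end up watching their own company get eliminated.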

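Summary: Paul Graham warns CEOs that the one thing worse than being deeply involved in building with AI is not being involved at all. The core point is that CEOs cannot rely only on reports and slide decks; they must write prompts, build agents, and automate workflows with AI themselves to feel its capabilities and limits firsthand. Understanding of AI shifts every week, so a strategy based on secondhand information is like driving by the rearview mirror, and the company will be left behind by founders who live in AI every day. The post advises CEOs to spend 1 hour a day using AI on real work, build one usable small tool a week, and first work out what AI cannot do. In the AI era, CEOs who do not practice hands-on may watch their company be eliminated.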

elvis@omarsar0 · 6月1日37

😂PewDiePie building his own agent orchestrator and releasing it was not on my 2026 bingo card. Own the agent. Own the harness. It's not that hard, folks.

StepFun@StepFun_ai · 6月1日32

Intelligence got us here. Efficiency is what gets real work done. At ClawCon Macao, our GM of Developer Business @EileenTal laid out the next frontier for agents — and the thinking behind Step 3.7 Flash. 👏

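Summary: StepFun has released the Step 3.7 Flash model. The company's GM of Developer Business argues that the new frontier of model competition is no longer raw intelligence but agent efficiency. The new goal is for AI agents to complete real-world work reliably, efficiently, and at scale.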

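Summary: At the BEYOND ClawCon Macao event, StepFun argued that the new frontier of model competition is agent efficiency, meaning the ability to complete real work reliably, efficiently, and at scale, rather than intelligence alone. The Step 3.7 Flash model was built on that thinking.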

elvis@omarsar0 · 6月1日60

// The Efficiency Frontier // Cool paper on context management. As agents reuse the same documents and histories across many turns, the cheapest context strategy is not fixed. This work describes a principled rule for picking one per deployment instead of defaulting to whatever topped a benchmark in isolation. Retrieval and compression methods are almost always benchmarked on accuracy and cost separately, so you never learn when one actually beats another under real load. The Efficiency Frontier models context strategy selection as a single cost-performance problem, with a log-utility term for diminishing returns from extra context and a reuse parameter N that amortizes preprocessing across repeated queries. Sweep N and the optimal strategy changes, exposing crossover regions where retrieval, compression, or full context each wins. On 5,000 HotpotQA instances, deployment-aware selection cuts effective token usage about 25 percent at the same performance, and amortized memory compression runs over 50 percent cheaper than full-context prompting in higher-performance settings. Paper: https://arxiv.org/abs/2605.23071 Learn to build effective AI agents in our academy: https://academy.dair.ai/

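Summary: The paper shows that when AI agents reuse the same documents and histories across many turns, no single fixed context strategy is optimal. It proposes the "Efficiency Frontier" framework, which models context-strategy selection as a cost-performance trade-off. Sweeping the reuse parameter N reveals crossover regions where retrieval, compression, or full context each wins. On 5,000 HotpotQA instances, deployment-aware selection cuts effective token usage by about 25% at the same performance, and amortized memory compression runs more than 50% cheaper than full-context prompting in higher-performance settings.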
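The selection rule described in the post can be sketched roughly as follows. This is a minimal illustration, not the paper's formulation: the objective `log(1 + quality) - price * amortized_cost`, the `price` weight, the three strategy names, and every number below are assumptions chosen only to show how the winner changes as the reuse count N grows.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    preprocess_tokens: float  # one-off cost (indexing, compressing); hypothetical
    per_query_tokens: float   # tokens sent with every query; hypothetical
    quality: float            # task score in [0, 1]; hypothetical


def amortized_cost(s: Strategy, n_reuse: int) -> float:
    """Per-query token cost when preprocessing is shared across n_reuse queries."""
    return s.preprocess_tokens / n_reuse + s.per_query_tokens


def utility(s: Strategy, n_reuse: int, price: float = 1e-4) -> float:
    """Assumed objective: log utility of quality minus priced amortized cost."""
    return math.log1p(s.quality) - price * amortized_cost(s, n_reuse)


def best_strategy(strategies, n_reuse: int) -> Strategy:
    """Pick the strategy with the highest utility for a given reuse count."""
    return max(strategies, key=lambda s: utility(s, n_reuse))


# Illustrative numbers only, not taken from the paper.
STRATEGIES = [
    Strategy("full_context", 0, 8000, 0.80),
    Strategy("retrieval", 10_000, 2500, 0.72),
    Strategy("compression", 20_000, 1200, 0.70),
]

for n in (1, 5, 50):
    print(n, best_strategy(STRATEGIES, n).name)
```

With these made-up inputs, full context wins at N=1, retrieval at N=5, and compression at N=50, which mirrors the crossover regions the post mentions: strategies with heavy preprocessing only pay off once that cost is spread over enough repeated queries.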

🚨 AI News | TestingCatalog@testingcatalog · 5月31日56

BUILD 🔥: Exclusive, an early look at Copilot Code and Copilot Cowork tabs of the upcoming super app from Microsoft, following earlier leaked screenshot of the Scout 24/7 Agent. Developers, Developers, Developers 👏

Baidu Inc.@Baidu_Inc · 5月31日59

http://x.com/i/article/2060155258350014464
# Meet DAA: A New Metric Built for Results in The Agent Era
Dear friends, welcome to the May edition of AI Pulse. A lot happened at Baidu Create 2026 (our annual developer conference) this month: new agents, new infrastructure, new product launches. But the idea we keep coming back to is one our CEO, Robin, put on the table during his keynote: DAA, Daily Active Agents. A new way of measuring the value AI is delivering. In this issue, we'll take a deep dive into what DAA is and why it matters to the industry.
Token consumption is at record highs and climbing. According to Goldman Sachs Research, agentic AI is expected to drive a 24-fold increase in token consumption by 2030. It has become one of the industry's most-watched indicators of AI scale and adoption. But tokens only tell half the story. They measure input: how much the machine consumed, how hard it worked. The other half of the equation (what actually got produced, what tasks were completed) remains uncounted.
At Baidu Create 2026, Robin gave his answer about that missing half: DAA, or Daily Active Agents. It shifts the question from "how much was consumed?" to "how many agents are actively working and delivering results?"
## The Agent Era Has Arrived, and It Needs a Different Scoreboard
Something has shifted in the past year. AI is no longer sitting inside a chat window waiting to be asked questions. It is out in the world, completing tasks, making decisions, and running operations. Agents are handling customer inquiries, optimizing port logistics, generating marketing content, and scheduling factory floors, autonomously, continuously, and at scale. This is a different kind of AI activity from anything we've measured before. And it requires a different kind of metric to capture it.
Two metrics are commonly used in the AI industry to measure success today: DAU and token consumption. Both offer a starting point, but they weren't built for what AI is becoming.
DAU was built in the mobile internet era for the attention economy: whoever captures more of users' time wins. That logic worked for apps. It doesn't work for agents. An agent doesn't "open" anything. It either finishes the job or it doesn't. And the numbers bear this out. According to Counterpoint Research's Q1 2026 data, some of the world's most-used AI products, measured by DAU, are not necessarily the ones generating the most revenue. User scale and business value have been quietly decoupling.
Token consumption gets closer, but it still isn't sufficient. Tokens measure input: how much compute was consumed, how hard the machine worked. Not what it produced. Gartner noted in a recent report that token consumption doesn't effectively reflect business value, efficiency, or sustainability. A system can burn through billions of tokens and deliver nothing of consequence. As the industry matures, the gap between "how much the AI consumed" and "how much the AI delivered" becomes impossible to ignore.
DAA closes that gap. It counts completed task loops: agents that took on work and actually finished it. It measures output, not activity. Delivery, not consumption.
Robin's prediction at Create 2026: global DAA could eventually exceed 10 billion. That number reflects something important about how agents scale differently from users. One person can run many agents simultaneously. Agents multiply capacity rather than compete for attention. The ceiling on what becomes countable is unlike anything DAU ever imagined.
## Three Things Are Evolving at Once
To understand why DAA matters right now, it helps to look at what Robin described as an "AI evolution theory": three simultaneous shifts, all pointing in the same direction.
The first shift is in the agents themselves. Early agents answered questions. Current agents complete tasks. The next wave does something more interesting: agents evolve, learning from every task they run without anyone intervening. In the mobile era, software improved on a schedule set by developers. In the agent era, improvement is continuous and self-directed.
The second shift is at the individual level. Someone working alongside a team of agents can now accomplish what used to require a full team of people. Builder, founder, creator, all in one person. The productive capacity of a single individual is being fundamentally re-scaled.
The third shift is organizational. The basic unit of a company is moving from "people coordinating with people" to what Robin calls "mixed human-agent formations." Agents are now embedded in the middle of a workflow instead of sitting at its edge, handling tasks that used to require dedicated headcount.
None of this happens in isolation. Smarter agents make individuals more capable. More capable individuals change how organizations are built. And as organizations restructure, the appetite for better agents grows. DAA is the number that captures all of this in motion: how much of this new productive capacity is actually being used, every single day.
## The Agents Behind the Number: DuMate, Miaoda, Yijing, and Famou
A metric is only as meaningful as what it measures. At Create 2026, Baidu introduced a new generation of agents that are already putting DAA to work.
DuMate is Baidu's general-purpose agent. It doesn't answer questions one at a time; it runs tasks in parallel. Handle the inbox, analyze the sales data, draft the marketing copy. Simultaneously. In the international agent benchmark PinchBench, DuMate ranked first globally with a 93.3% task completion rate. It's also a single entry point: one prompt can route through Baidu Search, Miaoda coding agent, Famou agent, and many other capabilities, all at once.
Miaoda and MeDo are the coding agents. Over a million applications built. More than 10 million users. 81% of them with no coding background. Robin's framing was blunt: development costs are collapsing toward zero. "Disposable software", built for a single purpose, a single moment, is now a real idea. Global developers can already access MeDo, Miaoda's international version, at medo.dev.
Baidu Yijing is a digital human platform integrated with live streaming, video production, and real-time interaction across 12 languages, with native-level lip-sync. As Robin put it: "A digital human is simply an agent you can see. Equipped with voice, facial expressions, and gestures, a digital human is more expressive and more trustworthy." This year, a Coca-Cola World Cup TVC was produced through Yijing: five characters, five city styles, all directed and edited by AI. Production time was cut by more than half.
Famou Agent 2.0 is the self-evolving decision agent. It works in operations environments: manufacturing scheduling, logistics planning, and process optimization. At one of the world's most automated terminals, Famou delivered a 10.21% efficiency improvement on top of an already optimized baseline. That translates to roughly one million additional standard containers processed per year.
Underneath all of this is Baidu's full-stack infrastructure (computing, cloud, models, and agents) rebuilt for the agent era. Robin was direct: "AI is not just a model. It is a system. It is a new generation of computing." Baidu AI Cloud has been repositioned as a full-stack AI cloud purpose-built for large-scale agent workloads.
## From Consumption to Delivery: What DAA Is Really Asking the Industry to Do
As agents move from novelty to necessity, running in ports, factories, classrooms, and boardrooms, how we measure AI starts to matter a great deal. That's why DAA asks the harder questions: did the agent actually deliver? Was the task completed? Did something real happen as a result?
Metrics shape behavior. If you measure token consumption, you build for scale. If you measure daily active agents, you build for outcomes. That is what DAA is really challenging the industry to do: to reorient around a different definition of what success looks like.
Here's a quick look at what else has been happening at Baidu this month:
> Q1 2026: AI Business Crosses a New Threshold
- For the first time, Baidu Core AI-powered Business represented more than half of Baidu General Business revenue, bringing in over RMB 13.6 billion in Q1 2026, up 49% year-over-year.
- Growth was broad-based across AI Cloud infrastructure, AI applications, and Apollo Go. Full report here.
> Apollo Go: 3.2 Million Fully Driverless Rides Delivered in Q1
- Apollo Go had a strong start to the year. In Q1 alone, we delivered 3.2 million fully driverless rides, with total rides continuing to grow at a triple-digit rate year-over-year. Over 22 million cumulative rides had been provided to the public as of April 2026.
- It also continued expanding internationally, as the global footprint reached 27 cities as of May 2026. The driverless operations are now running across multiple zones in Dubai, with the Apollo Go App launched in March, while it is on track to commence open-road testing in Switzerland, and to begin testing in London with Uber and Lyft soon.
> Baidu AI Cloud Ranked No. 1 in Autonomous Driving R&D Solutions in China
- According to IDC China's H2 2025 report, Baidu AI Cloud captured over a third of China's autonomous driving R&D solutions market, ranking first.
- It now serves the top 15 auto brands by sales and the top 10 NEV companies in China, helping automakers move autonomous driving from R&D into mass production.
> ERNIE 5.1 Is Now Available
- We launched our latest foundation model that builds on ERNIE 5.0, with upgrades across search, reasoning, knowledge Q&A, creative writing, and agentic capabilities at around 6% of the pre-training cost of comparable models.
- On LMArena's Search Leaderboard, ERNIE 5.1 scored 1,223 to rank 4th globally and 1st among Chinese models. Try it at ernie.baidu.com.
> MSCI ESG Rating Upgraded to AA
- We released our annual ESG report. Our MSCI ESG rating was raised to AA, and we were included in S&P Global's Sustainability Yearbook 2026.
- From accessible mobility to closing the AI skills gap, we're working to make sure more people can share in the progress AI brings. Full report here.
Have a question about building with MeDo, or something you'd love us to cover next? Leave a comment or DM us! Until our next roundup, keep up with our latest AI developments and innovations by following us on LinkedIn and X.

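Summary: At Baidu Create 2026, CEO Robin Li proposed a new metric, DAA (Daily Active Agents), to measure how much work AI agents actually complete. It targets the limits of the existing metrics: DAU reflects only user scale, and token consumption reflects only model input. Goldman Sachs Research predicts that agentic AI will drive a 24-fold increase in token consumption by 2030, but input is not output. DAA instead counts agents that successfully complete task loops, measuring delivered results rather than activity. Li predicts that global DAA could eventually exceed 10 billion.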
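The article does not give a formula for DAA, so the following is only one possible reading, sketched under stated assumptions: an agent counts as "active" on a day if it completed at least one task that day, and each agent is counted once per day. The event tuple layout and the agent names are hypothetical.

```python
from collections import defaultdict


def daily_active_agents(events):
    """Count distinct agents with at least one completed task per day.

    events: iterable of (day, agent_id, completed) tuples (assumed layout).
    """
    agents_by_day = defaultdict(set)
    for day, agent_id, completed in events:
        if completed:  # unfinished work does not count toward DAA in this reading
            agents_by_day[day].add(agent_id)
    return {day: len(agents) for day, agents in sorted(agents_by_day.items())}


events = [
    ("2026-05-30", "agent-a", True),
    ("2026-05-30", "agent-a", True),   # same agent, counted once per day
    ("2026-05-30", "agent-b", False),  # started but did not finish
    ("2026-05-31", "agent-b", True),
    ("2026-05-31", "agent-c", True),
]
print(daily_active_agents(events))
```

Counting completions rather than invocations is what separates this from a token or DAU style metric: the unfinished `agent-b` task on the first day consumes resources but contributes nothing to the count.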

🚨 AI News | TestingCatalog@testingcatalog · 5月31日57

Anthropic is planning to further expand into the consumer and bioscience sectors. The biggest things to watch for 👀 - Conway agent - Orbit assistant - Knowledge-based memory - Multilingual Voice Mode - Operon for bioscience researchers and more! Which one do you think will drop next?

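Summary: Anthropic plans to expand further into the consumer and bioscience sectors and has several products reportedly on the way, including the Conway agent, the Orbit assistant, knowledge-based memory, multilingual Voice Mode, and Operon for bioscience researchers. The quoted view notes that Anthropic chose to focus on coding first, but that as Claude grows more intelligent its applications will extend to every field where human intelligence is useful.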

Peter Steinberger 🦞@steipete · 5月31日61

The idea of OpenClaw is always that it should be yours. It's modular and lean, only add what you need. Fewer skills, fewer tools = your agent can work more efficiently.

meng shao@shao__meng · 5月31日74

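Agent: OpenAI Codex + Tools: Google suite, WhatsApp, Telegram, browser automation, etc. + Data: Google Drive, Notion, AGENTS.md, etc. + Skills: inbox-zero, contacts, etc. == a personal life-automation agent stack
Two typical workflows described by @nicbstme
1. Intro email ("glue work" across 5 tools): a friend asks for help on WhatsApp → search WhatsApp/Gmail for the email address → look up the company's funding on the web → draft the intro → wait for approval → send the email → confirm on WhatsApp. Done by hand this takes about 20 minutes with heavy context switching; on the user side it takes about 10 seconds to state the request. The agent is doing cross-app orchestration, not answering questions.
2. License plate update (administrative continuity): send a photo to the agent → update the Markdown vehicle record in Drive → preserve fields such as VIN and insurance → upload back to Drive → when needed, use browser automation to sync to systems without APIs such as FasTrak, parking apps, and insurance portals. The point is administrative continuity: the same information stays consistent in many places, rather than a one-off Q&A.
The most important architectural decision: Drive as the Source of Truth
Nicolas deliberately moved his knowledge from Notion to Google Drive, for practical reasons:
· Notion is friendly to people but unfriendly to agents (nested pages, database properties, permissions, UI-native structure)
· Drive + Markdown/CSV: searchable, diffable, editable, uploadable, referenceable by file ID
· gogcli provides a unified CLI surface (Gmail, Drive, Calendar, Docs, Sheets, etc.)
Knowledge should be organized not only for human UIs but for the agent's tool paths. Stable file IDs, plain text, tables, and commands that return JSON are the agent-friendly data shapes. The author calls the contacts CSV (phone, email, LinkedIn, etc.) "one of the best investments" because it is the hub for cross-channel lookup.
Tool priority (reliability tiers)
API / CLI > local files > browser automation > screen/UI automation
An agent's reliability ceiling is set by its tool surface. gog gmail messages list --json is more stable, more retryable, and easier to reason about than having the model click around a web page. Browser and screen automation are a fallback for when they are needed, not the main path.
Skills: the agent's "habits" and "taste"
A Skill is not fancy architecture; it is an operating manual you can iterate on. Take inbox-zero:
· List the inbox → separate auto-archive from needs-human-review
· Show important emails, quote the original text, suggest archive or reply
· After drafting, wait for explicit approval before sending
· Keep all recipients, keep replies short, do not proactively suggest phone calls, sign as "Nicolas"
Without a Skill you re-prompt every preference each time; with one, you just say "run inbox zero". A personal agent's personalization comes from accumulated operating taste, not a cute voice.
Feedback loop:
· Tool fails → fix the tool or add a guardrail
· Judgment error → update the Skill
· Forgotten preference → write it to memory / AGENTS.md
· Workflow repeats → the system's improvements compound
Approval gates: graded trust is the product
Nicolas explicitly opposes "YOLO full automation":
· Low stakes can be sent directly (e.g. "tell Hugo I'm in Seattle next week")
· High stakes must go: read context → draft → show → wait for approval → execute → confirm
The line between useful and scary is whether the agent asks a human at the right moment.
The "killer" workflow: What did I miss?
More important than any single email is triage of the whole life inbox:
· Every few hours ask "what did I miss?" → the agent scans WhatsApp, Telegram, Gmail, SMS, Calendar, and Drive changes → summarizes: who needs a reply, what is urgent, what is stale, what can be ignored, what should go on the calendar, what documents need checking
Characteristics: context-heavy, repetitive, cross-tool, full of small decisions. People hate doing the first pass; agents are good at the first pass, and the judgment stays with the human.
Reproduction checklist (the path Nicolas lays out)
1. Install the agent runtime plus CLIs/connectors for each channel
2. Centralize data: Drive as the source of truth, a contacts CSV, important documents made searchable
3. Grant permissions carefully: Full Disk Access, Screen Recording, and Accessibility must be paired with approval gates of matching strength
4. Write operating rules (AGENTS.md): draft before send, tool routing, privacy boundaries, etc.
5. Write Skills for repeated processes, and update them after every mistake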
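The reliability tiers and the approval gate described in this post can be sketched as below. This is a minimal illustration under assumptions, not Nicolas's implementation: the names `ToolTier`, `pick_tier`, and `run_action`, and the `deny_all` stand-in for a human prompt, are all hypothetical.

```python
from enum import IntEnum


class ToolTier(IntEnum):
    """Lower value = more reliable, following the ordering in the post."""
    API_OR_CLI = 0
    LOCAL_FILE = 1
    BROWSER_AUTOMATION = 2
    SCREEN_AUTOMATION = 3


def pick_tier(available):
    """Choose the most reliable tier among those that can do the job."""
    return min(available)


def run_action(description, high_stakes, execute, ask_approval):
    """Run low-stakes actions directly; gate high-stakes ones on approval."""
    if high_stakes and not ask_approval(description):
        return "skipped"
    execute()
    return "done"


sent = []
deny_all = lambda description: False  # stand-in for a real human approval prompt

print(pick_tier({ToolTier.BROWSER_AUTOMATION, ToolTier.LOCAL_FILE}).name)
print(run_action("tell Hugo I'm in Seattle next week", False,
                 lambda: sent.append("hugo"), deny_all))
print(run_action("send intro email", True,
                 lambda: sent.append("intro"), deny_all))
print(sent)
```

Here the low-stakes message goes out without asking, while the high-stakes email is held because approval was not given. Keeping the gate in one function means the "draft, show, wait for approval" rule cannot be bypassed by an individual Skill.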

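Summary: The post describes a personal life-automation agent stack built around OpenAI Codex. It integrates tools such as the Google suite, WhatsApp, Telegram, and browser automation, with Google Drive as the "source of truth" data layer. The core is cross-app orchestration and judgment, with key decisions requiring human approval. Skills (such as inbox-zero) are operating manuals that can be iterated on and that lock in preferences. The typical "intro email" orchestration shows the agent's efficiency on multi-tool tasks with heavy context switching. Tool priority is API/CLI > local files > browser automation.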

Greg Brockman@gdb · 5月31日33

codex computer use is viscerally compelling

宝玉@dotey · 5月31日55

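The general-purpose agent is the operating system of the future. Just as we need an operating system to work with a computer today, we will communicate with AI through an Agent OS.
Apps will meet one of several fates:
- Die out: the agent has the capability itself, so no standalone app is needed
- Become a CLI or MCP server: paired with a Skill so the agent can call it; the user does not operate the app directly, the agent calls it on their behalf
- Agent GUI plugin, or Agent App: for capabilities the Agent OS cannot cover and that must be done by hand through a GUI, the app still needs to be packaged as a plugin that the agent summons on demand for a person to use briefly
For some time to come, the trend for SaaS will be to ship a CLI + Skill so that agents learn to use the product; that is how to keep customers and avoid being eliminated.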

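Summary: The post argues that general-purpose AI agents will become the operating system of the future, and that today's apps will end up in one of three forms: dying out because the agent's built-in abilities replace them, turning into a CLI or MCP server that the agent calls through Skills, or serving as GUI plugins that cover graphical operations the agent cannot do. SaaS products therefore need to ship a CLI plus Skills to keep up with the trend.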

宝玉@dotey · 5月31日51

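Kimi Code and DeepSeek Harness would do well to build a GUI early, support office tasks well early, and become general-purpose agents. Competing on TUI and on coding alone has no future. Of course, coding is the foundational capability: if an agent cannot code well, it will not do other tasks well either.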

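Summary: The post urges AI coding tools such as Kimi Code and DeepSeek Harness to ship a graphical interface (GUI) early and extend support to general office tasks so they can evolve into general-purpose agents. The author argues that competing only on terminal interfaces (TUI) and coding ability has no future, even though coding is the core foundation. The post also quotes and tracks another newcomer, Grok Build, noting its rapid updates and considerable potential.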

Orange AI@oran_ge · 5月31日48

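Something as essential as AI is something WeChat should have supported officially much earlier. At the very least their agent should support it, right? I hear Zhang Xiaolong is working on it personally; if it does not support md rendering… that would be hard to justify.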

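Summary: The post criticizes WeChat, a mainstream messaging tool, for not rendering or conveniently opening Markdown and HTML files, which keeps file sharing closed off and is especially painful on mobile. The author says WeChat should have taken this basic capability seriously sooner and, since AI is such an essential need, should at least support it well in its agent feature. The quoted post voices the same pain point: people around the author constantly send files as Markdown and HTML, but WeChat cannot handle them and remains closed.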

宝玉@dotey · 5月31日61

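Day to day I use several agents together: Codex, Claude Code, Cursor, and GitHub Copilot. Each has its strengths, and sometimes you want the best of all of them. Matt's Sandcastle orchestrates workflows with TypeScript scripts: it can put these agents into the same workflow to complete tasks together, and it can run in a virtual machine. It is too geeky for ordinary users, though, and most scenarios really do not need it; it suits cases that push for the extreme. For example, a "cyber gu jar" (pitting agents against each other): take a technical design task, have each agent produce its own proposal, then have them score and refine one another's.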

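Summary: Sandcastle is an open-source TypeScript tool from @mattpocockuk that lets users script workflows in which multiple AI agents, such as Codex, Claude Code, Cursor, and GitHub Copilot, work together inside a virtual machine on complex tasks. It is positioned as a tool for power users chasing maximum efficiency, suited to multi-agent collaboration or "cyber gu jar" style tasks, for example having each agent draft a technical design and then review and refine one another's.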
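The "each agent drafts, then they score one another" pattern can be sketched generically as follows. This does not use Sandcastle's API (which is TypeScript and not shown in the post); the agents are stub callables, and `run_tournament` and the word-count `score` function are hypothetical stand-ins for real agent calls and real peer review. The refinement step is left out.

```python
from statistics import mean


def run_tournament(agents, task, score):
    """Collect one proposal per agent, then average the scores from the other agents."""
    proposals = {name: propose(task) for name, propose in agents.items()}
    scores = {
        author: mean(score(reviewer, text) for reviewer in agents if reviewer != author)
        for author, text in proposals.items()
    }
    winner = max(scores, key=scores.get)
    return winner, scores


# Stub agents: each would be a call into a real coding agent in practice.
agents = {
    "agent_a": lambda task: f"{task}: single service",
    "agent_b": lambda task: f"{task}: queue plus workers plus retries",
    "agent_c": lambda task: f"{task}: cron job",
}


def score(reviewer, proposal):
    """Placeholder review: counts words; a real reviewer would be another agent call."""
    return len(proposal.split())


winner, scores = run_tournament(agents, "Design sync", score)
print(winner, scores)
```

Excluding each author from scoring its own proposal is the one design choice that matters here; everything else is scaffolding that a real orchestrator would replace.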

Nathan Lambert@natolambert · 5月31日62

Given that Claude seems so lazy in chat (especially with technical search topics), it seems pretty telling about how a harness can make a model far more independent and thorough. GPT 5.5, and many of OpenAI's recent models, seem incredibly thorough -- like they won't give up -- and the codex harness is a much lighter change on the model. Of course I have a lot of uncertainty here, but it's surprising to me how weak Claude's search is when I try the Claude app again. I only use ChatGPT for research, but Claude Code can do wonderful things like getting exactly the right figures from papers I know and insert them into a slide deck. Interesting times ahead!

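Summary: The author notes that Claude comes across as lazy in ordinary chat (especially on technical search), yet through the Claude Code coding agent it can pull exactly the right figures from papers and finish the job. By contrast, GPT 5.5 and OpenAI's recent models are extremely thorough and persistent, and the Codex harness changes the model comparatively little. The core comparison is how different models paired with different harnesses perform on search and research tasks.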

AYi@AYi_AInotes · 5月31日50

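holy shit, be honest, everyone: is your first instinct AI or a real person? Without a label, could you tell this is AI? Those popularity-driven celebrities with terrible acting are going to be out of work!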

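Summary: The post discusses two paradigms for using AI: the "agent" type (such as Claude Code and Codex), which executes autonomously, and the "intern" type (such as Cursor), which needs a human to collaborate and judge. The author argues the latter is the real path of reaching mastery through practice, because it sharpens personal judgment. To remove the bottleneck that tools like Cursor require you to be present, the author recommends NetEase's UU Remote, which connects a phone to a Mac remotely with a smooth 4K 144 fps experience and a native terminal. The core view is that the key to AI progress is not a more powerful model but a connection that lets you think together with AI at any time, ultimately making you a better asker of questions.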

elvis@omarsar0 · 5月31日67

Increasingly, HTML Artifacts are becoming a core part of how I work with AI agents. Long-horizon agent sessions need a better way to surface insights about what work it has done. This may not be obvious right now, but as you start to let your agent work on dynamic workflows, large codebases, long-running loops (e.g., using /goal), and deep research tasks, you need a good way to present results. Chat window is not it. You also don't want to just trust everything the agents do. Artifacts help provide an important verification layer, which in turn enables important decision-making. I like HTML artifacts because I can just ask the agent to produce as many of them (and in whatever form) as I need to verify the work and make sense out of everything. I even built a nice tab system for my artifacts. They are great for continual learning and research. I use HTML artifacts for logging, tracking experiments, brainstorming, managing my inbox, code reviews, agent session management, deep research, writing, reading, and so much more. I believe @karpathy wrote about this somewhere: As we move on to more advanced applications of AI agents and outputs get more complex, we will start to find the need for even more advanced forms of interactions with AI, including interactive neural videos/simulations.

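Summary: For long-running dynamic workflows, large codebases, or deep research tasks, a chat window is not enough to present results. HTML Artifacts provide the necessary verification and decision layer and have become the author's core interface for working with AI agents. The author uses them widely for logging, experiment tracking, brainstorming, code reviews, agent session management, deep research, and writing, and built a tab system to manage them. The post closes by citing Karpathy's view that as agent applications become more advanced and outputs more complex, we will need more advanced forms of interaction, including interactive neural videos/simulations.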

elvis@omarsar0 · 5月30日63

Increasingly, HTML Artifacts are becoming a core part of how I work with AI agents. Long-horizon agent sessions need a better way to surface insights about what work it has done. This may not be obvious right now, but as you start to let your agent work on dynamic workflows, large codebases, long-running loops (e.g., using /goal), and deep research tasks, you need a good way to present results. Chat window is not it. You also don't want to just trust everything the agents do. Artifacts help provide an important verification layer, which in turn enables important decision-making. I like HTML artifacts because I can just ask the agent to produce as many of them -- and in whatever form -- as I need to verify the work and make sense out of everything. I even built a nice tab system for my artifacts. They are great for continual learning and research. I use HTML artifacts for logging, tracking experiments, brainstorming, managing my inbox, code reviews, agent session management, deep research, writing, reading, and so much more. I believe @karpathy wrote about this somewhere: As we move on to more advanced applications of AI agents and outputs get more complex, we will start to find the need for even more advanced forms of interactions with AI, including interactive neural videos/simulations. I did a talk on LLM Wikis and HTML artifacts recently, if you are curious to learn more on the topic: https://academy.dair.ai/events/cmovobp97000904l5h0n9a2yz

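Summary: The author says HTML artifacts are increasingly the core medium for working with AI agents, especially when the results of long-horizon tasks need to be presented. As agents handle dynamic workflows, large codebases, and deep research tasks, the traditional chat window falls short. HTML artifacts provide a key verification layer that lets users review the agent's work and make decisions. The author uses them for logging, experiment tracking, brainstorming, code reviews, and many other tasks, and mentions Karpathy's view that more advanced forms of AI interaction (such as interactive neural simulations) will be needed.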

StepFun@StepFun_ai · 5月30日67

Step 3.7 Flash, free for 30 days for Hermes Agent users. What could possibly go wrong? 🍿 Thanks @NousResearch for making it happen. Can’t wait to see what Hermes users build!

AYi@AYi_AInotes · 5月30日57

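Damn, the crowning moment of Tesla's Full Self-Driving in China! This one really deserves a "holy crap": it is so damn impressive, who would not want a Tesla like this 😭 If I had not seen it with my own eyes, I would never have believed Tesla FSD is already unbeatable. That pass with oncoming traffic: never mind a novice driver, even I, with ten years behind the wheel, would not be confident. This is what being truly "far ahead" looks like; from now on, do not brag about being "far ahead" unless you have been tested in real conditions 🐶 The video is from Douyin user 大胡L5, who has been hyping FSD like crazy lately; I suspect he is sponsored by Tesla 😂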

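Summary: The post marvels at how Tesla FSD handles oncoming traffic on roads in China, calling it truly "far ahead". The quoted post goes on to discuss the nature of AI tool use, dividing tools into an "agent" type that thinks for you and an "intern" type that thinks with you (Cursor being the example); the latter is how users reach mastery through practice and hone their judgment. Its key bottleneck is that you must be present, which the author solved by using the free tool UU Remote (4K 144 fps, native terminal support) to control a Mac running Cursor from a phone.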

Peter Steinberger 🦞@steipete · 5月30日51

With GPT 5.5, /goal, autoreview and crabbox my prompts moved from ~30-60min to often 4-10h tasks and my confidence that it’s ready is much much higher. Yielding agents is a skill.

Rohan Paul@rohanpaul_ai · 5月30日69

This survey suggests over 80% of companies have seen no productivity gains from AI so far, despite billions in spending. Among 6,000 executives, 1/3 of leaders said they use AI, but only for 90 minutes a week. This is even though most respondents believe AI will increase productivity by 1.4%, cut staff by 0.7%, and boost output by 0.8% in the next 3 years. Of the executives, a third said they use AI at work, but only around 1.5 hours per week on average. Meanwhile, 25% of those surveyed have not used AI yet. --- nber.org/papers/w34836

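Summary: A survey of 6,000 executives shows that, despite heavy spending, more than 80% of companies have yet to see productivity gains from AI. Only one third of leaders use AI, and for only about 90 minutes a week on average. Most respondents nonetheless expect AI to raise productivity over the next three years. Meanwhile, Goldman Sachs forecasts that token use by AI agents will grow 24-fold by 2030, since an agent's task loop can consume far more tokens than an ordinary conversation. The balance between agent productivity and token consumption is becoming a new cost test for companies; Microsoft's recent tightening of access to Claude Code is one example.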

Chubby♨️@kimmonismus · 5月30日63

A team of former DeepMind researchers just raised $50M to build an AI lab built around recursive self-improvement at the level of the whole research organization, not only a single model. Index and Radical co-led, NVIDIA's venture arm is in, and angels like Dwarkesh Patel, Thomas Wolf and Max Jaderberg are on the cap table. The founders have the track record to back it up. Louis Kirsch comes out of the Schmidhuber lineage on self-improving systems. Edward Hughes has argued that open-endedness is essential for artificial superhuman intelligence. Tantum Collins worked on AI policy in the Biden White House. Their idea is simple and big at the same time. Today's models are great at answering questions, but real discovery also depends on knowing which questions are worth asking. Inherent wants AI that works right next to humans inside that loop, as a collaborator and not only a tool. They call it living within the experiment. They also set it up as a Public Benefit Corporation, so the mission is written into the company from day one. This is the direction a lot of us have been hoping for, and one of the more credible attempts at recursive self-improvement I've seen so far. Really excited for it.

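Summary: Inherent, an AI lab founded by former DeepMind researchers, has closed a $50 million seed round co-led by Index Ventures and Radical, with NVIDIA's venture arm NVentures participating. The founding team includes Louis Kirsch, Edward Hughes, and Tantum Collins. The company aims to build AI agents that can actively discover new knowledge; its core idea is "recursive self-improvement" of an entire research organization, making AI a collaborator in human research. Inherent is set up as a Public Benefit Corporation and is headquartered in London.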

Rohan Paul@rohanpaul_ai · 5月30日54

Goldman Sachs: "Token use by AI agents is expected to multiply 24 times by 2030" AI agents are now creating the first serious cost test for the AI boom. As was reported this week, Uber and Microsoft are already rethinking expensive agent usage. A chatbot may answer once, but an agent plans, calls tools, checks results, edits mistakes, and repeats the loop. That loop can make one user request consume 10x, 50x, or even far more tokens than a normal answer. Goldman’s bullish case is that monthly token use could reach 120 quadrillion by 2030, while inference cost per token keeps falling 60%-70% per year. The fight is now between agent productivity and token waste. Earlier this month, Microsoft began revoking developer access to Claude Code, with plans to move them to its in-house Copilot Command Line Interface tool by June 30. The company has framed this as consolidating teams around its own tools, but the timing at the fiscal year’s end hints it may also be about lowering costs.

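Summary: Goldman Sachs expects token use by AI agents to grow about 24-fold by 2030, with monthly consumption reaching 120 quadrillion in its bullish case. The main reason is that, to complete a single user request, an agent makes multiple rounds of tool calls, result checks, and corrections, so its token consumption can be 10x, 50x, or more that of an ordinary answer. The trend has raised cost concerns, and companies such as Uber and Microsoft have begun to rethink expensive agent usage. The forecast also notes that inference cost per token is falling 60%-70% per year, so the contest is now between the productivity agents deliver and the tokens they may waste.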
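The two figures in this post pull in opposite directions, and a back-of-the-envelope calculation shows how they combine. This sketch assumes a 4-year horizon (the post does not state the base year) and assumes the 60%-70% annual decline holds every year; both are illustrative assumptions, and the function names are hypothetical.

```python
def spend_multiplier(token_growth: float, annual_cost_decline: float, years: int) -> float:
    """Total spend relative to today: more tokens times a cheaper price per token."""
    return token_growth * (1.0 - annual_cost_decline) ** years


def breakeven_decline(token_growth: float, years: int) -> float:
    """Annual price decline at which total spend stays flat."""
    return 1.0 - token_growth ** (-1.0 / years)


YEARS = 4  # assumption: the post does not state the base year
for decline in (0.60, 0.70):
    print(f"{decline:.0%} decline/yr -> spend x{spend_multiplier(24, decline, YEARS):.2f}")
print(f"break-even decline/yr: {breakeven_decline(24, YEARS):.1%}")
```

Under these assumptions, 24x the tokens at a 60% yearly price decline costs roughly 0.61x today's spend, and about 0.19x at 70%; the break-even decline is near 55% per year. So aggregate spend could fall even as usage grows, which suggests the cost pressure the post describes comes from per-request token waste today rather than from the long-run trend alone.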

🚨 AI News | TestingCatalog@testingcatalog · 5月30日51

COPILOT 🔥: Microsoft is working on its own super app along with a new always-on agent called Scout according to Sources. This UI looks familiar 👀

Elon Musk@elonmusk · 5月30日40

Grok Build is moving fast

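Summary: xAI keeps updating its agentic coding tool Grok Build; the latest version is v0.2.11. Highlights of this update include integrated 𝕏 search and faster web search, plus several new commands such as `/export` and `/login`. Platform support now extends to Windows ARM64 and macOS x86_64. On the agent side, subagents can now share a terminal backend and scheduler, and proactive system reminders were added. For user experience, in-terminal video playback was raised to 30fps, and link interaction and plan mode were improved. For stability, the default retry budget was increased and several rendering issues were fixed. The tool is moving quickly from an early CLI toward a serious agentic coding environment.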

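AYi@AYi_AInotes · May 30 · 71

Holy, OpenAI's ambition really is huge. Codex may be the best productivity tool ordinary people get their hands on next 🤔 Last night OpenAI released Computer Use for Codex on Windows, and calling it a paradigm-level shift in how humans work is hardly an exaggeration, haha. A lot of people are screenshotting the goblin in the Codex on Windows video, but I think the 3 seconds before the goblin appears are more worth watching. In those 3 seconds, Codex received one instruction: "Test the WinUI app I'm working on". Note, folks, this is not "open a browser and search" or "write some code"; it is testing a native Windows desktop app. Then it opened Paint, picked a brush, adjusted the color, and dragged out a complete drawing stroke by stroke, and the whole process was remarkably smooth! This implies at least two things:
① Codex's vision-action loop can already handle pixel-level GUI operations: not clicking buttons, but actually drawing. This had previously only been seen on macOS, and now the Windows version has gotten there in one step!
② Opening with a WinUI test is a deliberate hint from OpenAI: this is not a toy, it is meant for the real production environments of developers and enterprises.
In other words, from now on you can send "organize my Slack sections" from your phone and the Windows machine starts working inside the desktop app. This "phone commands, desktop executes" logic may be the real backbone of this update. To put it in a metaphor, the goblin is the fireworks, while WinUI + Slack are the ammunition. I made full Chinese-English bilingual subtitles for this video, enjoy it!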

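Summary: OpenAI released Computer Use for Codex on Windows. According to the demo video, Codex can now handle pixel-level operations on native Windows GUIs: after receiving the instruction "Test the WinUI app I'm working on", it opens Paint, smoothly selects a tool, adjusts the color, and draws a pattern stroke by stroke, which demonstrates the maturity of its vision-action loop. By featuring scenarios such as WinUI testing, OpenAI signals that the feature targets real production environments. In addition, through the ChatGPT mobile app, users can start, review, and steer Codex tasks on a Windows machine from their phone, forming a cross-device "phone commands, desktop executes" workflow.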

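宝玉@dotey · May 30 · 70

Today I added a small feature to my group-chat summary Skill: when you @bot in a group to summarize the chat history, it can now draw on the context of the conversation and answer questions as part of the summary. Details: https://github.com/JimLiu/baoyu-skills/commit/a85c81e8db8a19a633e30dda0823e8a9c686263d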

elvis@omarsar0 · May 30 · 65

In a few months, people will start to realize how fundamentally important MCP for agents is. It's not even about connecting tools. There are many ways to do that. It's about the types of abstraction it already enables. My new self-improving system, enabled through agent-to-agent interaction, is all powered by MCPs. This was not an accident. I ran my entire orchestrator through a self-improving loop with clear criteria/goal, and it came up with all kinds of interesting ways (mostly powered by MCP tools) on how to enable complex interactions, versioning, eval workflows, communications, tools, etc. Something new could always emerge, but I think the protocol itself will be crucial and necessary for all the advancements ahead. MCP is the future. And I am glad a lot of it is built in the open.

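Summary: The author argues that MCP (Model Context Protocol) is crucial to the future of AI agents, and that its core value lies less in connecting tools than in the abstractions it enables. As an example, the author cites his own self-improving system, which is driven entirely by MCP and shows how MCP supports agent-to-agent interaction, complex coordination, versioning, eval workflows, and tool integration. The author stresses that although something new could always emerge, the protocol itself will be a necessary foundation for the advances ahead.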

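meng shao@shao__meng · May 30 · 59

How did Salesforce engineering move from Copilot to Agentic? A write-up from Salesforce describes how its engineering team evolved from "engineers + a stronger Copilot" to "Agentic engineering", where the execution layer of the SDLC is gradually handed to agents while humans own goals, rules, acceptance, and compounding: https://www.salesforce.com/news/stories/how-engineering-became-agentic/
The team went through two stages:
1. AI embedded in the old process: high adoption (theirs was once >90%).
2. Using AI to remove handoffs and low-value process: agents drive coding, PR review, testing, docs, and deployment.
Three levers that drove the change:
1. Tool convergence + zero friction: Claude Code across the whole organization, with token caps removed → the signal is "deep agent use is allowed and expected".
2. Rules as code: Markdown rules + reference implementations; PR feedback is written back into the rules → precision compounds, instead of re-prompting every time.
3. Autonomy + parallelism: a build/fix/validate loop with little human intervention; PRs produced in parallel in isolated environments.
Case study (33 APIs / 231 person-days → 13 days): what it proves applies to tasks that are "rule-expressible + automatically verifiable", not to all R&D.
Data from the change:
· PRs +79%, effective output +151% → throughput and "effective value" are both rising.
· Incidents -5% → they are arguing that "fast ≠ sloppy"; but the metric is built in-house (Engineering 360) and the causality has not been published.
The real signal: downstream (review/testing/release) was not crushed by upstream acceleration, because agents picked up the downstream work too; otherwise the result is just a "code flood".
The core skill shifts from writing code to three things:
· Breaking a problem into structures and acceptance criteria that an agent can execute;
· Judging what to delegate vs. what to keep in the loop;
· Building up Skills / CLAUDE.md / rule libraries (the team's compounding assets).
Engineers are becoming designers and owners of agent workflows.
Three takeaways that help with our own engineering work:
1. Start with work that has "clear rules + automatically verifiable tests" (migrations, backfilling tests, doc sync); do not start by having agents implement vague requirements.
2. Build a "PR feedback → rules" loop; it is the only core of the 18x case (231 person-days → 13 days) that can be replicated cheaply.
3. Change metrics and safety at the same time: without an Effective Output style metric + governance of agent execution permissions, upstream acceleration only creates review and incident debt.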

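Summary: Salesforce shared how its engineering team moved from "Copilot assistance" to "Agentic engineering", in which agents take over the execution layer of the software development lifecycle and engineers focus on goals, rules, and acceptance. Key changes include adopting Claude Code across the organization with token caps removed, practicing "rules as code" (Markdown rules + reference implementations), and autonomy with parallelism. An API migration originally estimated at 231 person-days was completed in 13 days. Reported results: PR count up 79%, effective output up 151%, incidents down 5%. The real signal is that agents also absorbed the downstream process, which avoided a "code flood". The engineer's core skill becomes designing agent workflows and building up compounding assets such as rule libraries.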
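The "PR feedback → rules" loop described above can be sketched in a few lines. This is a minimal illustration, not Salesforce's implementation: the `RuleBook` class, its method names, and the in-memory storage are all hypothetical stand-ins for a Markdown rules file such as CLAUDE.md.

```python
from dataclasses import dataclass, field


@dataclass
class RuleBook:
    """Hypothetical in-memory stand-in for a Markdown rules file."""

    rules: list[str] = field(default_factory=list)

    def add_from_feedback(self, feedback: str) -> bool:
        """Turn one piece of PR feedback into a rule, skipping duplicates.

        Returns True if a new rule was recorded.
        """
        rule = " ".join(feedback.split())  # collapse stray whitespace
        if not rule or rule.lower() in (r.lower() for r in self.rules):
            return False
        self.rules.append(rule)
        return True

    def to_markdown(self) -> str:
        """Render the rules as a Markdown list that agents can load."""
        lines = ["# Team rules"] + [f"- {r}" for r in self.rules]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    book = RuleBook()
    book.add_from_feedback("Use  parameterized SQL")
    book.add_from_feedback("use parameterized sql")  # duplicate, ignored
    print(book.to_markdown(), end="")
    # Speedup implied by the case study figures (231 person-days -> 13 days).
    print(f"Speedup: {231 / 13:.1f}x")
```

The point of the design is that each review comment is captured once and reused on every later run, so precision compounds instead of depending on a fresh prompt each time. In practice the rendered Markdown would be committed alongside the code so that rule changes are themselves reviewable.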

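歸藏(guizang.ai)@op7418 · May 30 · 59

Is Windows saved? Codex shipped a large batch of experience updates last night, many of them extremely useful, especially for Windows users. The most anticipated ones are Computer Use support on Windows and remote control of Codex on Windows from the ChatGPT mobile app. However, with Computer Use on Windows you cannot use the machine yourself while Codex is controlling it, which differs from the Mac version.
Chat history control: Codex can manage your chat history, including pinning, searching, creating new categories, archiving, and similar operations.
New profile page: shows the tokens you have consumed, your login streak in days, and your longest task.
Codex control in ChatGPT now supports the following:
Side conversations: without interrupting the main conversation, you can open a branch and keep talking to the AI about the current project or task; type /side to start one.
One-tap model switching: long-press to switch models quickly.
iPad-specific shortcut: enter Codex directly from a shortcut, with no need to switch through the sidebar.
Git Diff display: after a conversation ends, the system shows the Git diff, that is, the parts of your code that were changed.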

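Summary: Codex shipped experience updates, adding Computer Use on Windows, although the user cannot operate the machine while Codex is in control. It adds chat history management (pinning, searching, categorizing) and a profile page (token consumption, login streak, longest task). Codex control in ChatGPT gains side conversations (the /side command), one-tap model switching, an iPad shortcut, and a Git diff shown after each conversation.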

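宝玉@dotey · May 30 · 57

Codex can now manage its own sessions. Creating sessions, searching sessions, organizing and archiving them, pinning important ones, and even spinning up independent worktrees for parallel tasks can all be done through conversational instructions. Codex is starting to operate its own interface.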

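meng shao@shao__meng · May 30 · 36

How do you build your own Agent Harness? Start with a few questions:
· Can a production-grade harness be handled by simply "picking a framework"?
· What are the 15 real responsibilities a production-grade harness must take on?
· How can each responsibility be built as an installable, versionable, language-swappable worker?
· How does a single turn run end to end?
· Why do policy, approvals, budgets, and traces matter in a production-grade harness?
@mfpiccolo gives complete answers in his "How to Build Your Own Agent Harness"; reading the original is strongly recommended: https://iii.dev/blog/how-to-build-your-own-agent-harness/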

[Quoting @mfpiccolo]: http://x.com/i/article/2060024515619397638

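宝玉@dotey · May 30 · 71

Q: I connected my database to an AI Agent. I message it from my phone to query data and export files, but token consumption is very high. I have already written the workflow into the Agent's Memory, yet it does not follow the process. What should I do?
A: This is a very typical problem. The root cause is that Memory is only "background information", not "execution instructions". On every conversation the Agent re-interprets the intent and re-plans the steps, and that thinking process is itself the bulk of the token consumption.
Solution: replace the workflow in Memory with an Agent Skill + Script. The core idea is to split the task into two parts:
- The LLM does only what it is good at: translating natural language into a SQL query.
- All deterministic steps go into scripts: executing the SQL, formatting results, and uploading files need no AI reasoning, so write them as Python/Shell scripts and run them directly.
Going one step further, embed your table schema notes and common SQL templates in the Skill, so the Agent only fills in blanks instead of reasoning from scratch. After this change, token consumption can drop by an order of magnitude.
In one sentence: whatever a script can do, do not hand to the LLM; the LLM is responsible for translation, not execution.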

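Summary: The post points out that the fundamental problem with writing a workflow into Memory is that the Agent must re-interpret the intent every time, which makes token consumption high and behavior unstable. The recommended practice is an "Agent Skill + script" architecture: the LLM only translates natural language into SQL, and all deterministic steps are executed by scripts. This can reduce token consumption substantially.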
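The split between "LLM translates" and "script executes" can be sketched as follows. This is a minimal illustration under stated assumptions: the `orders` table, the `TEMPLATES` dictionary, and the `llm_choice` value are hypothetical, and the model call itself is replaced by a hard-coded stand-in. Only standard-library `sqlite3` and `csv` calls are used.

```python
import csv
import io
import sqlite3

# Hypothetical SQL templates a Skill might embed. The model only picks a
# template name and supplies parameters; it never runs anything itself.
TEMPLATES = {
    "orders_by_status": (
        "SELECT id, customer, total FROM orders WHERE status = ? ORDER BY id"
    ),
}


def run_template(conn, name, params):
    """Deterministic step: run a whitelisted template with bound parameters."""
    sql = TEMPLATES[name]  # raises KeyError for any template not listed above
    cur = conn.execute(sql, params)
    header = [col[0] for col in cur.description]
    return header, cur.fetchall()


def to_csv(header, rows):
    """Deterministic step: format query results as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def demo_connection():
    """Build a small in-memory database so the sketch is self-contained."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders "
        "(id INTEGER PRIMARY KEY, customer TEXT, total REAL, status TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(1, "alice", 30.0, "paid"), (2, "bob", 12.5, "pending"),
         (3, "carol", 99.0, "paid")],
    )
    return conn


if __name__ == "__main__":
    # Stand-in for the LLM step: in a real Skill the model would produce this
    # small structure from a request like "export all paid orders".
    llm_choice = {"template": "orders_by_status", "params": ["paid"]}
    conn = demo_connection()
    header, rows = run_template(conn, llm_choice["template"], llm_choice["params"])
    print(to_csv(header, rows), end="")
```

Because the model's output is reduced to a template name plus parameters, each request needs only a short completion, and the execution path is the same on every run. Binding parameters with `?` placeholders also keeps model-supplied values out of the SQL text itself.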