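codex thursday~ boy is it a bad day to be a manual workflow that crosses application boundaries on your computer

Summary: After you demonstrate a workflow to Codex once, it can be saved as a reusable skill. Record & Replay lets Codex learn recurring tasks (such as expense reports or time-off requests) and turns them into inspectable, editable skills. Users control when recording starts and stops. Jason Liu remarks that it is a bad day to be a manual workflow that crosses application boundaries.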

Claude Code can now upload and edit HTML artifacts that you can share with your team or other Claudes! Starting with teams so you can share internally with your team, coming to Pro and MAX plans soon!


OpenAI Developers @OpenAIDevs · June 19 · 57

Show Codex a workflow once. Reuse it as a skill. Record & Replay lets you show Codex a recurring task, like filing an expense report or submitting a time-off request. Codex turns that demo into an inspectable, editable skill. You control when recording starts and stops.


François Chollet @fchollet · June 19 · 48

When I was playing RTSes, I generally thought about strategy in terms of resource utilization. For instance, in any game that has a unit hp passive regeneration mechanic, any unit that is full-hp represents a wasted resource (you could be gaining hp during that time, so you are net behind). Today, if you are paying for a fixed-price agentic coding subscription, any week you end below your weekly token quota represents a wasted resource. Utilize your token regeneration mechanic.


swyx @swyx · June 19 · 37

completely unprompted wow moment from today - asked @DevinAI to make us a @tbpn style breaking news style announcement card for our AIEWF speakers drop tmr, FULLY expecting it to fail at a heavily visual task and it oneshotted the WHOLE DAMN THING


Deedy @deedydas · June 19 · 66

Pretty neat that with one URL change, you can now replicate and iterate on AI papers without having to even provision your own GPUs


xAI @xai · June 19 · 38

Grok models are now available on Databricks Agent Bricks. Bring SpaceXAI's latest models to your enterprise data to power capable AI agents. https://x.ai/news/grok-databricks


Lee Robinson @leerob · June 19 · 37

The Cursor Slack has bots solving customer issues, followed by other bots reproducing and confirming fixes. All built on our SDK!


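AYi @AYi_AInotes · June 19 · 69

When you write code with Codex, the most expensive move is to start writing right away. Move the review step one stage earlier and the rework rate is cut in half. Three tiers, take what you need:
1️⃣ Zero-cost, ready to use: paste this at the very top of your request: "Don't write code yet. First restate your understanding of the task, what problem I most want solved, where ambiguity remains, and what you would most likely misunderstand if you started writing immediately. Finish with an execution plan."
2️⃣ Official built-in: type /plan or press Shift+Tab. Codex gathers context on its own, asks clarifying questions, and outputs a complete execution plan before acting; the vaguer the requirement, the more useful this is.
3️⃣ Set-and-forget persistent: write a mandatory pre-flight rule into AGENTS.md so that on every task it first thinks deeply, restates the requirement, identifies risks, and then executes, with no need to paste the instruction again.
A good agent is never about fast reactions or fast typing. Remember, folks: get the direction right first, then go for speed.

Summary: When writing code with Codex, moving review up front can significantly cut rework. The author lays out three tiers: a zero-cost version (paste a prompt that asks for a restatement of the task before execution), the official built-in version (/plan or Shift+Tab triggers planning), and a persistent version (a pre-flight rule written into AGENTS.md). UCSD professor Biwei Huang has worked on causal AI for 12 years and proposes four generations of AI: correlational small models → causal small models → correlational large models (LLMs) → causal large models. The team's causal-learn was selected for Apple Scholar. Aether AI closed its first funding round today, which is seen as a signal of a shift from stacking parameters toward the next AI paradigm.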
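The zero-cost tier above amounts to prepending a fixed preamble to every request. A minimal sketch of that idea follows; the helper name `with_plan_first` is hypothetical, and the code only builds a prompt string, it does not call Codex or any other API.

```python
# Preamble text paraphrased from the post's zero-cost tier.
PLAN_FIRST_PREAMBLE = (
    "Don't write code yet. First restate your understanding of the task, "
    "what problem I most want solved, where ambiguity remains, and what you "
    "would most likely misunderstand if you started writing immediately. "
    "Finish with an execution plan."
)


def with_plan_first(task: str) -> str:
    """Return the task prompt with the review-first preamble placed at the top."""
    return f"{PLAN_FIRST_PREAMBLE}\n\n{task.strip()}"


if __name__ == "__main__":
    print(with_plan_first("Add pagination to the /orders endpoint."))
```

The same text could presumably be kept in AGENTS.md for the persistent tier, so that it does not have to be pasted per request.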

elvis @omarsar0 · June 18 · 68

Microsoft Teams just got its first AI employee. I tested it. A real AI employee that lives in the channel, does the work, and proposes the next move. Not another prompt box. Worth a look. @viktor__com

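Summary: The Viktor AI agent has officially arrived in Microsoft Teams. It lives directly in the channel, and users get finished work by @mentioning it, with no learning or prompting required. Viktor has already reached $20M in annualized recurring revenue on Slack and is now expanding to Teams, which has 320 million users. New users get $100 in free credits with no credit card required. Its goal is to give every working professional value from AI with zero barrier to entry.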

elvis @omarsar0 · June 18 · 40

Cool paper on Skill routing for LLM agents. Real tasks rarely map to a single skill. They need several composed together, but most skill routing still treats the problem as picking one tool from a library. This work formalizes Compositional Skill Routing, decomposes a complex query into atomic sub-tasks, retrieves the right skill for each, and then composes an executable plan. The system, SkillWeaver, pairs an LLM decomposer with a bi-encoder FAISS retriever and a dependency-aware DAG planner. It comes with CompSkillBench, 300 compositional queries over 2,209 real skills, so the multi-skill case gets measured directly. Why does it matter? As skill libraries grow, single-skill retrieval quietly caps what an agent can do. The DAG planner turns retrieved skills into an ordered, dependency-respecting plan. Paper: https://arxiv.org/abs/2606.18051 Learn to build effective AI agents in our academy: https://academy.dair.ai/

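Summary: Conventional skill routing for LLM agents picks a single skill from a tool library and struggles with real tasks that need several skills combined. The paper formalizes Compositional Skill Routing: decompose a complex query into atomic sub-tasks, retrieve the matching skill for each, and compose an executable plan. The system, SkillWeaver, consists of an LLM decomposer, a bi-encoder FAISS retriever, and a dependency-aware DAG planner. It ships with the CompSkillBench benchmark, 300 compositional queries over 2,209 real skills, which measures multi-skill routing directly. The DAG planner turns retrieved skills into an ordered plan that respects dependencies.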
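The decompose, retrieve, and plan pipeline described above can be sketched in a few lines. This is not SkillWeaver's code: the decomposition is supplied by hand in place of the LLM decomposer, a word-overlap scorer stands in for the bi-encoder FAISS retriever, and the skill names are hypothetical. Only the dependency-respecting ordering is shown faithfully, using the standard library's `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical skill library: name -> short description.
SKILLS = {
    "fetch_csv": "download a csv file from a url",
    "clean_table": "clean and normalize a data table",
    "plot_chart": "plot a chart from a data table",
}


def retrieve_skill(sub_task: str) -> str:
    """Toy retriever: pick the skill whose description shares the most words.

    Stand-in for an embedding-based retriever; word overlap is an assumption.
    """
    words = set(sub_task.lower().split())
    return max(SKILLS, key=lambda name: len(words & set(SKILLS[name].split())))


def plan(sub_tasks: dict[str, str], depends_on: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Order sub-tasks so each runs after its dependencies, with one skill each."""
    graph = {task_id: depends_on.get(task_id, set()) for task_id in sub_tasks}
    order = TopologicalSorter(graph).static_order()
    return [(task_id, retrieve_skill(sub_tasks[task_id])) for task_id in order]


if __name__ == "__main__":
    # Hand-written decomposition of "chart the cleaned sales data from this URL".
    tasks = {
        "t1": "download the sales csv from a url",
        "t2": "clean the data table",
        "t3": "plot a chart of the table",
    }
    deps = {"t2": {"t1"}, "t3": {"t2"}}
    print(plan(tasks, deps))
```

The point of the DAG step is that a retriever alone returns an unordered set of skills; the planner is what makes the result executable.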

Rohan Paul @rohanpaul_ai · June 18 · 66

Microsoft Teams is becoming interesting again. Viktor, the AI employee, has just launched inside Microsoft Teams. It sits inside the channel, reads context, remembers prior work, and sends back completed output. Viktor also says it crossed $20M ARR inside Slack before reaching Teams. And Teams has 320M users. AI Interface should disappear into the place where work already happens. @viktor__com

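Summary: Viktor, the AI employee, has landed in Microsoft Teams. It sits in the channel, reads context, remembers prior work, and returns completed output. Users collaborate simply by @mentioning Viktor, with no learning or prompting. Viktor previously reached $20M ARR on Slack, and Teams has 320 million users. New users get a $100 credit with no card required.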

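🚨 AI News | TestingCatalog @testingcatalog · June 18 · 71

Microsoft Teams users can now hire Viktor as an AI employee to get support with their goals. Viktor can read from and write to more than 3,000 tools and maintain persistent memory across sessions, so it picks up where a team left off instead of starting over each day. Zeta Labs reports SOC 2 Type 1 certification and says @viktor__com is officially approved by Microsoft for Teams.

Summary: Viktor, the AI employee from Zeta Labs, has landed in Microsoft Teams. Viktor can read from and write to more than 3,000 tools and keeps persistent memory across sessions so teams can pick up where they left off. The product has SOC 2 Type 1 certification and official Microsoft approval. Earlier, on Slack, Viktor passed $20M in annual recurring revenue with a single app, no sales team, and no promotion. For the 320 million Teams users, Viktor takes a zero-barrier approach: users just @mention it to get tasks done, with no need for training, prompts, or an understanding of AI capabilities. New users receive $100 in credits with no credit card required.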

Chubby♨️ @kimmonismus · June 18 · 67

Viktor is now showing up in Microsoft Teams. Agents are thus entering the next significant work environment. The promise is exciting: one AI the whole team shares, that remembers your work and ships finished output, not just answers. They proved that on Slack and reached $20M+ annualized revenue run rate since launching. Now we get to see if it holds up at Teams scale.

Summary: Viktor previously reached $20M ARR on Slack with no sales team and no large-scale rollout. It emphasizes zero barrier to entry: no learning and no prompt writing; users @mention Viktor as they would a colleague and get finished work, and can even get it without mentioning it. The team says the goal is for the 320 million Teams users to get AI output directly, without training. New users receive $100 in credits with no card required.

Kimi.ai @Kimi_Moonshot · June 18 · 43

Introducing Goal Mode in Kimi Work. Goal lets your desktop agent run 24/7 until the task is done, built for long-horizon tasks and complex multi-step workflows.


Alibaba Cloud @alibaba_cloud · June 18 · 54

🇯🇵 Expanding AI infrastructure for Japan's agentic AI future. Alibaba Cloud has launched its 5th data center in Tokyo and brought Model Studio to Japan, enabling enterprises to build next-generation AI agents with the latest Qwen models. Building the foundation for the agentic AI era. Get API: https://int.alibabacloud.com/m/1000414648/


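小互 @xiaohu · June 18 · 56

Apodex: a self-evolving heavy-duty solver built for deep research. It targets hard problems that have no ready-made answer and need extensive investigation. It can dispatch up to 150 sub-agents in parallel in one run, for a total of 15,000 steps. It beats GPT-5.5-pro on BrowseComp and beats Claude-Opus-4.8 and Kimi-K2.6 on DeepSearchQA... It has strong research capability in science and finance...

Its workflow is: deep research, self-verification, writing.

Main features:

1. Multi-agent teamwork: after the lead agent receives a task, it breaks it into sub-questions and dispatches them asynchronously to specialized sub-agents, each with its own context, prompt, and toolset. Sub-agent reports flow into a shared report pool that the orchestrator reads asynchronously, so it is never blocked by the slowest one. A single task can schedule up to 150 sub-agents and execute more than 15,000 steps.

2. Built-in three-layer self-verification: when sub-agent reports disagree, a conflict reviewer steps in; when a specific claim needs grounding, a fact checker steps in; when the draft is complete, a draft reviewer goes over it. Finally, a global verifier performs a final review of all pooled evidence. The verifier is structurally independent of the reasoner, is prompted to "evaluate" rather than "keep reasoning", and can overturn earlier conclusions.

3. Driven by a dedicated AgentOS: strictly separated from task execution, it handles only generic low-level concerns:
- Agent scheduling: which of the 150 sub-agents run first and how resources are allocated.
- Model and tool routing: which model and which tool (search engine, code executor, database, etc.) a sub-task should use.
- Event stream: how sub-agents pass messages and status updates to one another.
- Checkpoints and tracing: how far the run has progressed and whether it can roll back after an error.
- Cost accounting: how many API calls the task made and how much it cost.
- Permission management: which tools may be used and which data may be accessed.

The benefit of this design: adding a new application takes only one plugin code folder; scheduling, routing, accounting, and tracing are already in place, with no change to a single line of kernel code.

Summary: Apodex is designed for hard problems with no ready-made answer. It can dispatch up to 150 sub-agents in parallel, for more than 15,000 total steps. It beats GPT-5.5-pro on BrowseComp and beats Claude-Opus-4.8 and Kimi-K2.6 on DeepSearchQA. The workflow has three stages: deep research, self-verification, and writing. It has a built-in three-layer self-verification mechanism (conflict reviewer, fact checker, draft reviewer) plus an independent global verifier. AgentOS handles low-level concerns such as scheduling, routing, the event stream, checkpoints, cost accounting, and permission management; adding a new application requires only plugin code and no kernel changes.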
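The "shared report pool that is never blocked by the slowest sub-agent" pattern can be illustrated with `asyncio`. This is a sketch under stated assumptions, not Apodex's implementation: `sub_agent` is a hypothetical stand-in that only sleeps, and the pool is a plain list filled in completion order via `asyncio.as_completed`.

```python
import asyncio


async def sub_agent(name: str, question: str, delay: float) -> dict:
    """Hypothetical sub-agent: sleeps to simulate research, then files a report."""
    await asyncio.sleep(delay)
    return {"agent": name, "question": question, "finding": f"notes on {question}"}


async def orchestrate(sub_questions: dict[str, float]) -> list[dict]:
    """Dispatch every sub-question at once and consume reports as they arrive."""
    tasks = [
        asyncio.create_task(sub_agent(f"agent-{i}", question, delay))
        for i, (question, delay) in enumerate(sub_questions.items())
    ]
    pool: list[dict] = []  # shared report pool, filled in completion order
    for finished in asyncio.as_completed(tasks):
        pool.append(await finished)  # a slow agent does not hold back fast ones
    return pool


def run(sub_questions: dict[str, float]) -> list[dict]:
    """Synchronous entry point; maps each sub-question to a simulated duration."""
    return asyncio.run(orchestrate(sub_questions))


if __name__ == "__main__":
    for report in run({"market size": 0.10, "key competitors": 0.01}):
        print(report["agent"], "->", report["finding"])
```

A real orchestrator would likely start verification on early reports while later ones are still running; the loop body is where that would hook in.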

Alibaba Cloud @alibaba_cloud · June 18 · 31

Disrupting GenAI Costs: Alibaba Cloud's Strategy. Takahito Naito (Managing Executive Officer, CyberAgent) and Takeshi Kurita (Regional Manager of Japan and Korea, Alibaba Cloud) discuss the strategic utilization and future of enterprise AI models. 👉 https://xtech.nikkei.com/atcl/nxt/special/18/00001/060300084/ #AlibabaCloud #CyberAgent #CloudComputing #GenerativeAI #Qwen #AgenticCloud


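meng shao @shao__meng · June 18 · 82

Vercel has open-sourced its agent framework, "Eve". An agent is a directory, with production-grade capabilities out of the box. It abstracts recurring agent shapes into a framework so developers write only "what to do", not "how to get it running". https://vercel.com/blog/introducing-eve

Core design: an agent is a directory

agent/
  agent.ts          # model and configuration
  instructions.md   # system prompt / persona
  tools/            # executable capabilities
  skills/           # domain knowledge (Markdown)
  subagents/        # sub-agent delegation
  channels/         # entry points such as Slack and Discord
  schedules/        # scheduled tasks
  connections/      # MCP / OpenAPI external connections

Built-in production capabilities
· Durable sessions: each conversation turn is a checkpointable durable workflow (built on the open-source Workflow SDK) that can pause and resume after a crash or deploy
· Sandbox: agent-generated code is isolated from the main app; Docker/microsandbox locally, Vercel Sandbox when deployed, with custom adapters possible
· Human-in-the-loop: set needsApproval on a tool; pausing consumes no compute, and execution continues from the breakpoint after approval
· Connections: MCP / OpenAPI declared as files; auth is proxied by the framework so the model never touches URLs or credentials; Vercel Connect handles OAuth
· Multi-channel: one agent serves HTTP, Slack, Discord, Teams, and more; handoff between channels is supported
· Tracing & Evals: standard OpenTelemetry traces; eve eval runs locally or in CI as a deploy gate

Development and deployment flow
· Local: eve dev → a TUI shows every step (load_skill, tool call, etc.); underneath is an HTTP API, so CI and scripts can drive it too.
· Deploy: vercel deploy; the agent is an ordinary Vercel project; deploys do not interrupt in-flight sessions (they finish on the version they started on). Sandbox and similar pieces switch via adapters with no code change.
· Team onboarding: eve channels add slack generates the channel file; approvals are a button click in Slack; schedules/ fires on cron (deployed as a Vercel Cron Job).
· Engineering practice: the agent lives in Git (prompts, tools, and skills all get diffs and review); Preview deployments let you test the Slack bot ahead of time; eve eval in CI guards against regressions.

Vercel internal validation
· d0: 30,000+ questions per month, with permissions aligned to the asker
· Lead Agent: autonomous SDR, about $5k per year in cost, roughly 32x return
· Athena: RevOps, built in 6 weeks without engineers, connected to Snowflake/Salesforce
· Vertex: about 92% of tickets resolved automatically
· draft0: content review pipeline
· V: routing agent, a single entry point that dispatches to a fleet of hundreds of agents

Summary: Vercel released Eve, an open-source agent framework built around "an agent is a directory": behavior is declared through files such as agent.ts, instructions.md, tools, skills, subagents, channels, schedules, and connections. Built in are durable sessions (checkpointable), sandbox isolation (local Docker / Vercel Sandbox), human-in-the-loop approvals (no compute used while paused), MCP/OpenAPI connections (auth proxied by the framework), multi-channel support (HTTP/Slack/Discord), OpenTelemetry tracing, and eve eval as a gate. Locally there is the eve dev TUI; deployment is an ordinary Vercel project and does not interrupt in-flight sessions. Internal validation: d0 handles 30,000+ queries per month, Lead Agent costs about $5k per year with a 32x return, and Vertex resolves about 92% of tickets automatically.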

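meng shao @shao__meng · June 18 · 52

The inner and outer loops of Codex Automations

Two kinds of context
· Pre-task context: history, facts, constraints, relationships, prior decisions; sources are retrieval, tools, and memory
· Post-task context: kept, edited, deleted, sent, shelved; the source is human review behavior

Pre-task context determines whether the first attempt can be right; post-task context reveals what "right" actually means. The double-loop architecture systematizes these two kinds of information separately.

# Inner loop: bring context into the task
The inner loop handles: whether to reply → what to look up → how to write → how to verify → produce a reviewable draft. Three points:
1. Retrieval is writing. A good reply depends on similar emails, a decision from six months ago, project status, authoritative sources, and so on. The goal is not exhaustive search but the smallest set of information sufficient to make the reply accurate and specific.
2. The workflow can be fixed or agentic. It can be a fixed pipeline of "fetch mail → filter → classify → draft → verify", or a natural-language instruction such as "every morning at 9, create drafts for the emails I need to answer", with Codex deciding the steps. What matters is not the form but whether retrieval is embedded in writing.
3. Actions are reversible. Only create drafts; never send automatically. Before review, save the proposed reply, the sources, and the versions of the prompt and writing guidelines. Without this record, a review is only an anecdote; with it, a review is reusable evidence.

# Outer loop: recover context from review
The outer loop starts after review and first looks at the outcome type:
· Sent as-is → the draft worked
· Sent after edits → the most valuable before/after
· Deleted → the draft may have been wrong, or no reply was needed (hard to tell)
· Shelved → not enough information; do not over-interpret
Even when a draft was sent, record only "what you accepted"; it does not mean the recipient was satisfied or the task is done, and the real effect may only show in later exchanges. Still, the review itself is evidence that did not exist at drafting time. The difference between draft and final is evidence, not necessarily a lesson.
· Shorter opening → possibly a writing preference
· Added facts → possibly searched in the wrong place
· Removed a commitment → possibly a new verification rule is needed
· Whole paragraph rewritten → possibly human judgment that should be preserved
The real work of the outer loop: understand what the diff means, rather than writing every edit into the prompt.

# How the outer loop improves the inner loop
The outer loop asks one question: how can the next draft be closer to a version you would accept the first time? The answer may be writing guidelines, a new data source, a new retrieval step, a check for "unsupported commitments", or handing off to you earlier. Not every edit needs to become a rule. In practice:
· Lessons you approve go into a simple markdown file
· The outer loop proposes updates; you decide whether to adopt them
· The inner loop reads that file before the next draft
What is corrected today becomes context for tomorrow's run: this is how the outer loop closes the loop on the inner loop.

# Two loops, two clocks
· Inner loop: fast (e.g. every 2 hours), quick response, low latency
· Outer loop: slow (end of day / after N reviews / weekly); too frequent → overfits to individual cases; never run → corrections are forgotten
The two speeds are deliberately staggered: the inner loop serves immediate efficiency, the outer loop serves patterns and steady improvement. The same structure applies to email, decks, reports, briefings, issue triage, and anything else with a "draft → human review → send/edit/discard" flow.

Summary: meng shao walks through the double-loop architecture of Codex Automations. The inner loop brings context into the task and quickly produces reviewable drafts, following principles such as "retrieval is writing" and reversible actions (create drafts only, never auto-send). The outer loop starts after human review, extracts evidence from the diff between draft and final, distinguishes edit types (writing preference, missing facts, removed commitments, and so on), and writes approved lessons into Markdown for the inner loop to use next time. The loops run at staggered speeds: inner fast (e.g. every 2 hours), outer slow (end of day / after N reviews / weekly), balancing immediate efficiency against pattern-level improvement. It applies to any "draft → human review → send/edit" flow.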
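The first two outer-loop steps, classifying the review outcome and capturing the draft-versus-final diff as evidence, can be sketched with the standard library. The function names and outcome labels below are hypothetical and do not come from Codex Automations; deciding what a diff means is deliberately left to a later, human-approved step.

```python
import difflib
from typing import Optional


def review_outcome(draft: str, final: Optional[str], sent: bool) -> str:
    """Classify what happened to a draft after human review."""
    if final is None:
        return "deleted"   # draft was wrong, or no reply was needed
    if not sent:
        return "shelved"   # not enough information; do not over-interpret
    return "sent_as_is" if final == draft else "sent_after_edit"


def draft_diff(draft: str, final: str) -> list[str]:
    """Return only the removed (-) and added (+) lines between draft and final."""
    lines = list(
        difflib.unified_diff(
            draft.splitlines(),
            final.splitlines(),
            fromfile="draft",
            tofile="final",
            lineterm="",
        )
    )
    # The first two entries are the ---/+++ file headers; skip them.
    return [line for line in lines[2:] if line.startswith(("+", "-"))]


if __name__ == "__main__":
    draft = "Hi Sam,\nWe will ship Friday.\nThanks"
    final = "Hi Sam,\nWe aim to ship next week.\nThanks"
    print(review_outcome(draft, final, sent=True))
    print(draft_diff(draft, final))
```

In this example the removed line is a firm commitment, which in the post's terms might suggest a new check rather than a style rule.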

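宝玉 @dotey · June 18 · 50

This article is written in a somewhat mystifying way, probably to ride the currently popular Loop Engineering concept. The core is two loops: an inner loop and an outer loop. The inner loop is simply a scheduled job that does the work: every 2 hours it checks for new email; if there is any, it retrieves relevant context and drafts a reply for you, but does not send it, leaving you to edit and send. The outer loop is a self-evolving, self-learning Skill: each time you edit an AI-written draft, the edit record is used to optimize your Skill so it writes better next time. This is a bit like the writing-style Skill I introduced before: distill a writing-style Skill from your own articles, generate with it, edit the result, then have the agent refine the Skill based on your edits so it understands you better and better. I did that by hand back then; turning it into an automated loop as he does should work a bit better.

Summary: An article describing an "inner loop" and "outer loop" design for AI email replies. The inner loop is a scheduled job that checks for new mail every 2 hours, retrieves relevant context, and generates a draft without sending it, leaving the user to edit and send. The outer loop is a self-evolving Skill: each user edit to a draft is recorded by the agent and used to keep refining a writing-style Skill so output better matches the user's habits. The author compares this to an earlier practice of distilling a writing-style Skill by hand and notes that this approach automates the iteration into a continuously improving closed loop.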

eric zakariasson @ericzakariasson · June 18 · 28

mobilemaxxing cursor (app is soon GA)

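Summary: It is now easier to move local agents to the cloud, where they keep working after you close your laptop. You can also send prompts to Cursor from your phone, run multiple agents in parallel, and receive pull requests with demos. The Cursor mobile app is about to reach general availability.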

Chubby♨️ @kimmonismus · June 18 · 58

Email is one of the last martech layers still stuck in its own dashboard while the rest of the stack moved into the agent. Nitrosend is a bet that it doesn't have to be: one MCP install and the whole email layer runs from inside Codex, Claude, or ChatGPT. The signal worth noting is the team behind it, the Hartley brothers, who built SmartrMail into a platform that sent billions of emails before selling it in 2022.

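Summary: With a single MCP install, Nitrosend lets the email system run directly inside Codex, Claude, or ChatGPT, with no traditional dashboard. Its team, the Hartley brothers, founded SmartrMail, which sent billions of emails and was sold in 2022. They argue the dashboard was the bottleneck rather than the product, and Nitrosend is what removing that bottleneck looks like.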

Rohan Paul @rohanpaul_ai · June 18 · 56

Genspark's newly launched AgentBase feels like a serious step toward the “build your own internal software” era. Take the data already sitting in your inboxes, files, apps, and databases, then turn it into a CRM, HR system, project tracker, dashboard, or internal tool in minutes. Once the data is structured, Genspark Super Agent can help draft emails, run research, build decks, create dashboards, and set up workflows.

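Summary: Genspark has released AgentBase (preview), which turns existing data in email, files, apps, and databases into internal tools such as a CRM, HR system, project tracker, or dashboard in minutes. It is compatible with existing systems such as Salesforce and HubSpot, and dashboards and workflows can be customized with a one-sentence prompt. Together with Genspark Super Agent it can also draft emails, run research, build decks, and create workflows. The goal is to replace 30+ SaaS tools with one platform.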

elvis @omarsar0 · June 18 · 70

You can only truly get this level of output when using orchestrator agents that can coordinate multiple agents across projects. Build your own orchestration layer now. And own it.

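Summary: Fintech company Block built an internal AI system, Builderbot, that coordinates multiple agents across its entire codebase. After an engineer tags it in Slack, the system researches, plans, and delivers on its own. It currently handles 200,000 operations per day and merges 1,500 pull requests per week, accounting for 15% of all of Block's production code changes and shortening processes that took months to days. DAIR.AI founder Elvis Saravia stresses that this level of output is only possible with an orchestration layer coordinating multiple agents, and advises teams to build their own.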

Ethan Mollick @emollick · June 18 · 43

Big issue with AI strategies at big companies which realized the importance of AI last year (which is only a small subset, most are still not moving fast) is that, in the best case, they developed their strategy in late 2025, before the agentic revolution. Things changed since...


ChatGPT @ChatGPTapp · June 18 · 50

New in ChatGPT: a better way to schedule tasks. Scheduled tasks are faster, more reliable, and easier to manage from the new Scheduled page. The new scheduled tasks experience is rolling out to Go, Plus, Pro, Business, and Enterprise users on web and mobile.


🚨 AI News | TestingCatalog @testingcatalog · June 18 · 50

Apodex has released Apodex 1.0, a verification-centric deep research agent that searches the web, synthesizes evidence, and generates reports in which every claim is backed by an auditable chain of evidence. In heavy-duty mode, Apodex 1.0-H runs an async team of up to 150 sub-agents, with a global verifier checking the assembled evidence before any answer is committed. Evidence over generation 👀

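Summary: Apodex has launched Apodex 1.0, a verification-centric deep research agent that searches the web on its own, synthesizes evidence, and generates reports in which every claim carries a complete, auditable evidence chain. The heavy-duty mode, Apodex 1.0-H, runs an asynchronous team of up to 150 sub-agents, and a global verifier checks all collected evidence before delivery. The company claims state-of-the-art results.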

elvis @omarsar0 · June 18 · 60

Highly-recommended reading! After using /loops & /goal throughout my projects, I believe that verifiers and robust guardrails are imperative to get current/future coding agents to work right. You can't just YOLO your way with blind autonomous loops. It doesn't work!

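Summary: Rahul argues that fable+ class models are essentially English-to-code interpreters, with Fable 5 the weakest. Diff size should be governed by risk: small diffs in high-risk areas (identity, data, network, money), large diffs for code that can be verified empirically. Software delivery speed depends on review and merge capacity, not PR generation; the bottlenecks are lint, tests, CI, and shadow validation. Agents need a deep understanding of the full stack, with risk priority ordered security > correctness > performance. The cost of complexity is changing, so maintaining 50% more code for a 5% performance gain may be worthwhile. At low risk, treat a code block as a black box and verify it only empirically. Line-by-line logic review is expensive and should be reserved for critical sections. Faster iteration needs guardrails such as permission opt-in and shadow mode.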

Jim Fan @DrJimFan · June 18 · 81

I made Physical AutoResearch sound simple (conceptually), but it took a village to pull off and lots of design thinking into the robot /loopcraft. The hardest part is everything we need to setup *before* pressing Enter. Here's a behind-the-scene tour:

1. Safety harness
Letting 8 robots run unattended overnight means safety has to be more than a hint in the system prompt. ENPIRE hardwires it in 2 layers: (1) hard kinematic limit that trips an immediate task failure and auto-resets as soon as a robot leaves its safety envelope, and (2) a torque-limited compliant gripper so a bad contact or misaligned insertion ends in a safe stall, instead of crushing the robot or the object at hand. We make safety more conservative than usual so humans can sleep tight. In reality, we still need a few human operators to watch over the "robots of loving grace".

2. Definition of /done
An agent that can edit its own reward will game it for sure. ENPIRE fixes the goalposts before the fleet can move them. Here's the recipe: Collect a few minutes of success & failure demos -> Ask agent to write code using computer vision tools to classify success and measure against groundtruth -> Agent hill-climbs on classifier until reliably good -> This classifier becomes the real-time reward function that directly computes on sensor streams -> *Freeze* the reward function before AutoResearch. It's sacred, enshrined in a Gym env that no one can touch.

3. System telemetry design
Robot-seconds is by far the scarcest resource, followed by GPU-seconds, and finally tokens. We instrument all three and surface them to ENPIRE for live resource awareness rather than letting it hill-climb in a vacuum. We define:
- Mean Robot Utilization ("MRU"): the fraction of wall-clock time when the robot is actively executing an experiment. Otherwise the hardware is sitting idle and waiting for the next code commit.
- Mean Token Utilization ("MTU"): tokens consumed per minute, our proxy for how hard the agent is actually thinking. A low MTU means the agent is stalled, waiting on a robot rollout to finish instead of doing research.
- GPU utilization: fraction of wall-clock time when GPU is active.

... and evaluate on two budget-to-outcome metrics:
1. Tokens-to-Success: token budget the fleet burns to complete /goal.
2. Time-to-Success: wall-clock time to /goal

Summary: NVIDIA's GEAR lab introduced the ENPIRE system, the first to carry out autonomous research in the physical world. Eight Codex agents control eight robots, with GPU and token budgets. Safety relies on two hard-wired layers, a hard kinematic limit cutoff and a torque-limited gripper, allowing unattended overnight runs. The reward function is fixed offline via a vision classifier and then frozen so agents cannot game it. The system monitors robot utilization (MRU), token utilization (MTU), and GPU utilization in real time, and evaluates efficiency with Tokens-to-Success and Time-to-Success. ENPIRE autonomously completed high-precision tasks such as fastening zip ties, sorting fine needles, and installing a GPU, and found that exploring with 8 robots in parallel was markedly faster. The system will be open-sourced.
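The two utilization metrics defined in the post reduce to simple ratios. The sketch below follows the wording of those definitions; the function names, the interval representation, and the assumption that busy intervals do not overlap are mine, not ENPIRE's.

```python
def mean_robot_utilization(
    busy_intervals: list[tuple[float, float]], wall_clock_s: float
) -> float:
    """Fraction of wall-clock time the robot was executing an experiment.

    busy_intervals holds (start_s, end_s) pairs, assumed non-overlapping.
    """
    busy_s = sum(end - start for start, end in busy_intervals)
    return busy_s / wall_clock_s


def mean_token_utilization(tokens_consumed: int, wall_clock_s: float) -> float:
    """Tokens consumed per minute of wall-clock time."""
    return tokens_consumed / (wall_clock_s / 60.0)


if __name__ == "__main__":
    # One hour of wall-clock time with two 15-minute rollouts.
    print(mean_robot_utilization([(0, 900), (1800, 2700)], 3600))
    print(mean_token_utilization(120_000, 3600))
```

Read together, a low MRU with a high MTU would suggest the agent is thinking while the robot waits, and the reverse would suggest the agent is stalled on rollouts.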

Chubby♨️ @kimmonismus · June 18 · 30

I've been working with Tavus for a while now, but this is simply amazing. I'm serious, it's like science fiction. An avatar that I can collaborate with on the PC feels exactly like the future from Star Trek, just as I've always imagined it. It's fantastic. Computer use + voice model + avatar – that's the way forward!

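Summary: Knowledge Navigator, the AI assistant Apple showed in 1987 that could see the user, control the computer, and look and sound human, has become reality nearly 40 years later through Tavus, with support from Cerebras. The newly launched Dom combines computer use, a voice model, and a digital avatar, and users can work alongside it on a PC. The post's author says it feels like science fiction come true, like the future in Star Trek.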

Yuchen Jin @Yuchenj_UW · June 17 · 77

The future of coding is not one agent. It's a whole AI team. Omnigent lets you run a team of agents in one live session: Claude Code, Codex, Cursor, Pi, and your own agents. It is a meta-harness for AI agents, built from our internal Databricks dev tools, and now open-sourced for everyone. Built by the legendary @matei_zaharia and the Databricks AI team. And yes, Matei still writes a lot of code, even the frontend code for Omnigent and our products.


elvis @omarsar0 · June 17 · 42

eve looks like a very promising agent framework. Built-in: - Durable execution - Sandboxed compute - Human-in-the-loop approvals - Subagents - Evals - and more I like the emphasis on evals right away. Should I do a tutorial on it?


Rohan Paul @rohanpaul_ai · June 17 · 61

Every workflow is now shifting toward AI agents. Nitrosend just made agent-powered email automation. It lets Codex, ChatGPT, Claude, Cursor, Gemini, or any MCP agent build and send branded email campaigns from one prompt. It continuously learns from sends, so subject lines, timing, and content can improve from a company’s own data instead of generic email advice. As I run my own newsletter, I know nobody enjoys setting up email flows. Now email can live in Codex. I don't leave, I don't switch tabs, I just type what I need and keep working. Here's an example, just one prompt in ChatGPT

Summary: Nitrosend has launched an email automation tool driven by AI agents. It lets Codex, ChatGPT, Claude, Cursor, Gemini, or any MCP agent build and send branded email campaigns from a single prompt. The system continuously learns from send data and optimizes subject lines, send timing, and content instead of relying on generic advice. It quotes @gthartley as saying that traditional email dashboards have run for twenty years, but the dashboard itself is the bottleneck, and Nitrosend removes it.

🚨 AI News | TestingCatalog @testingcatalog · June 17 · 54

Nitrosend has launched an AI-native email platform that users can run from within Codex, ChatGPT, Claude, Cursor, or any other MCP agent. It can create newsletters, transactional messages, and branching sequences, all in editable markup. Vibe emailing 👀


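meng shao @shao__meng · June 17 · 62

Exa has officially released "Exa Agent": a hosted Web Research Agent API that packages frontier models and Exa's in-house search toolchain into a single interface, aimed at three kinds of tasks: deep research, list building, and entity enrichment. https://exa.ai/blog/exa-agent

# Technical approach: three stacked layers
1. Task decomposition + parallel sub-agents. For large datasets or broad research, the system splits the task into sub-tasks and spawns sub-agents in parallel by domain. This is a typical Map-Reduce style research architecture, suited to WideSearch-type "many entities × many fields" tasks.
2. Model Fusion. Rather than fixing on a single strongest model, it dynamically mixes frontier models and cost-efficient models per task, routing between quality and cost. The blog does not disclose the routing strategy, but the direction is clear: spend compute on the hard parts and use cheap models for simple sub-tasks.
3. Token efficiency: the Highlights model. The blog again emphasizes Exa Highlights, which is said to cut token usage by up to 94%. For agent workflows this directly determines how many pages can be read and how many retrieval rounds can be run on the same budget, and it is one important source of the cost advantage.

# Evaluation design: WideSearch and Row-F1
The blog highlights the WideSearch benchmark (released August 2025), where the task is to aggregate atomic information from across the web and output a structured table (entities + multiple enrichment columns). Exa scores with Row-F1:
· A row counts as a success only if the entity matches correctly and all required columns are valid
· They tried cell-level F1 but considered it too lenient: a row with one correct column and the wrong entity would still score
The choice itself is reasonable: it is closer to the real needs of B2B scenarios (CRM enrichment, competitor tables, funding lists) than the "partial credit" of academic QA. The blog's chart compares Exa Agent High with Perplexity Agent Pro, Parallel Task Ultra, Opus 4.8, and GPT 5.5 on Row-F1 versus cost per query, with Exa ahead on the Pareto frontier.

# Use cases
1. Finance Agent: pull financial, funding, and product updates from across the web in real time and aggregate them into a custom format
2. GTM / Sales: enrich your own account list, or have the agent generate lists of tens to hundreds of entities
3. Company Research: multi-dimensional company briefs (funding, product, partnerships, executives, GitHub, etc.)
4. Literature / Code Review: literature surveys and code-related research

Summary: Exa has officially released Exa Agent, a hosted API that packages frontier models and its in-house search toolchain into a single interface for deep research, list building, and entity enrichment. Core techniques: task decomposition + parallel sub-agents (a Map-Reduce architecture); Model Fusion, which dynamically mixes frontier and economical models per task; and the Highlights model, which can cut token usage by up to 94%. On the WideSearch benchmark, scored with Row-F1, Exa Agent costs less than half as much as GPT 5.5 and Opus 4.8 and sits on the Pareto frontier. Use cases span finance, GTM/Sales, company research, and literature/code review.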
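The Row-F1 rule described above (a row scores only when the entity matches and every required column is valid) can be written down directly. This is a sketch of that rule as stated in the post, not Exa's or WideSearch's scoring code: "valid" is simplified to exact equality with the gold value, and each gold entity is counted at most once.

```python
def row_f1(
    predicted: list[dict], gold: list[dict], key: str, required: list[str]
) -> float:
    """Row-level F1 over table rows.

    A predicted row is correct only if its entity (the `key` column) matches a
    gold row and every required column equals the gold value (an assumption
    standing in for the benchmark's validity check).
    """
    gold_by_entity = {row[key]: row for row in gold}
    matched: set = set()
    for row in predicted:
        entity = row.get(key)
        target = gold_by_entity.get(entity)
        if target is None or entity in matched:
            continue  # unknown entity, or this entity was already credited
        if all(row.get(col) == target[col] for col in required):
            matched.add(entity)
    if not matched:
        return 0.0
    precision = len(matched) / len(predicted)
    recall = len(matched) / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [
        {"company": "A", "ceo": "x", "hq": "p"},
        {"company": "B", "ceo": "y", "hq": "q"},
    ]
    predicted = [
        {"company": "A", "ceo": "x", "hq": "p"},
        {"company": "B", "ceo": "y", "hq": "WRONG"},
    ]
    print(row_f1(predicted, gold, key="company", required=["ceo", "hq"]))
```

In the demo, the second row has three of four cells right yet earns nothing, which is the strictness the post contrasts with cell-level F1.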

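🚨 AI News | TestingCatalog @testingcatalog · June 17 · 41

Capafy ❤️ Hermes. Capafy has added support for Hermes Agent from @NousResearch, letting users who build Hermes skills publish them on the Capafy marketplace. It keeps the logic closed-source, and the skill creator earns a commission every time someone uses it.

Summary: Capafy is a skill-based marketplace for AI agents and now supports Hermes Agent from NousResearch. Users can publish Hermes skills while keeping them closed-source, and the skill creator earns a commission each time someone else uses one. Capafy provides a platform for publishing, running, and monetizing skills, letting developers earn ongoing revenue from closed-source skills.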

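Berryxia.AI @berryxia · June 17 · 71

Whoa! WeChat's first AI agent is here. A few days ago Alipay's AI agent came out, and now WeChat's is right behind it! WeChat has issued AI agents a dedicated payment card, so an agent can go from recommendation to order to payment in one conversation. Users can now buy things directly inside supported AI agent conversations through WeChat's new "AI-exclusive card". The card is completely separate from your main WeChat Pay account: the agent can spend only the card's balance, and you decide at any time how much to top up or withdraw. It is currently live through the Meituan life assistant in WorkBuddy (WeCom for Mac), where the AI can buy group-deal vouchers for you to redeem in store. Previously an AI agent could at most recommend and plan; now it can actually spend the money for you. The key is the isolation design: users need not worry about the AI misusing the main account, and control stays entirely with them. This embeds payment into the agent workflow ahead of time and makes "chat, then buy" real. The move pushes AI from a chat tool toward a stage where it can close the commercial loop. WeChat has once again staked out infrastructure early: by the time agents are truly widespread, it will already hold the payment layer. Agent-to-agent automatic negotiation, ordering, and settlement may also run first inside the WeChat ecosystem. People used to think AI agents were far from really changing daily life; now WeChat has opened up even the spending step. So here is the takeaway: sure enough, what the big players rush to grab first is the payment entry point and nothing else. It is the closest to the money, after all.

Summary: Following Alipay, WeChat has launched an "AI-exclusive card" that lets an AI agent complete the whole flow from recommendation to order to payment inside a conversation. The card is fully isolated from the main WeChat Pay account; the agent can use only the card's balance, and the user can top up or withdraw at any time. The feature is currently available through the Meituan life assistant in WorkBuddy (WeCom for Mac v5.1.1 and above), where the AI can buy group-deal vouchers for in-store redemption. The design embeds payment into the agent workflow, protecting funds while closing the commercial loop and paving the way for automatic negotiation and settlement between agents.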

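karminski-牙医 @karminski3 · June 17 · 73

GLM-5.2 has just been officially released, and here is a hands-on test! Conclusion first: in this round of testing the biggest improvement is agent capability, and it is a qualitative change. In the test, GLM-5.2 went straight to where it wanted to go without ever searching for nearby locations. It turns out it had memorized the map at the start! None of the 20-plus models I tested before could do this. For example, when earlier models wanted to reach a battery-swap station, they all had to search for nearby stations first (which wastes a tool_call), while GLM-5.2 simply knew where the stations were and never used the search function. This ability to internalize the needed data into context at the start and reason over it across the full 1M context is remarkable. Beyond that, Agentic Coding on backend code also improved in this test, reaching second place on the overall leaderboard. The biggest weakness exposed this time is spatial understanding. Its strength is also its downfall: it memorized the battery-swap station locations, but the station it went to was not the nearest one. It remembered the locations but did not reason from its current position before using them. That is the biggest shortcoming, and I strongly recommend the team work on it. #GLM52 #智谱 #智谱AI #AgenticCoding #长上下文能力

Summary: GLM-5.2 has been officially released, and hands-on testing shows a qualitative change in its agent capability. The model internalizes map data into its 1M context and knows the battery-swap station locations directly, never calling the search function; it is the only one of the 20-plus models tested that can do this. Backend Agentic Coding rose to second place on the overall leaderboard. The weakness is spatial understanding: it remembers the station locations but does not reason about which one is nearest to its current position.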

Alibaba Cloud @alibaba_cloud · June 17 · 47

Bring Qwen into the physical world! 🤖 Welcome to the EdgeAgent Arena! Build robots & IoT devices that perceive via edge sensors and act locally to win your share of the $70,000+ prize pool. 🔗 Register now: https://click.qwencloud.com/m/20000000281/
