In the next version of Claude Code, subagents run in the background by default, so you can keep talking to Claude while your subagents work. If you want your agent to run in the foreground, just tell Claude.

🚨 AI News | TestingCatalog @testingcatalog · 3 days ago · 74

Cursor released an iOS app 🔥 > Users will be able to check live activities and follow up on ongoing tasks. > PR reviews with a diff viewer will be supported as well. Looks like it is not available in the EU, though.

Chubby♨️ @kimmonismus · 3 days ago · 71

No Composer 3, but Cursor for iOS. Don't know...

Summary: Cursor for iOS is officially out. Users can build from anywhere by launching always-on cloud agents, or remotely control the agents on their computer from the app. Composer 2.5 is also 75% off in the app until July 5.

Rohan Paul @rohanpaul_ai · 3 days ago · 55

The next marketing fight may be over which brands appear inside LLM-generated recommendations. @Crowdreply_io just introduced an AI search visibility platform that helps brands measure, track, and shape whether ChatGPT, Claude, Gemini, and Perplexity recommend them. Search used to mean ranking a webpage on Google, but AI answers now compress discovery, comparison, and recommendation into one generated response. CrowdReply is building around that gap.

Summary: CrowdReply also offers the CrowdReply MCP. It is billed as the first MCP that can analyze and rank a website's performance in AI search, pinpoint what is missing through conversation, and automatically handle the implementation.

OpenAI Developers @OpenAIDevs · 3 days ago · 19

Your favorite Codex shortcuts are getting an upgrade. July 15th.

🚨 AI News | TestingCatalog @testingcatalog · 3 days ago · 32

OpenAI and @work_louder are about to announce a mechanical keyboard for Codex on July 15? Is it what I think it is? 👀

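jason @jxnlco · 3 days ago · 30

did you know @dkundel is the chief hype officer?

Summary: A reply to the announcement that Codex shortcuts are getting an upgrade on July 15, joking that @dkundel is the chief hype officer.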

eric zakariasson @ericzakariasson · 3 days ago · 63

i've been using cursor mobile on the go for the past few weeks, and having access to all cloud agents from everywhere is really nice. go on a walk, get an idea, dictate it in the app. come back from the walk to a finished agent you can jump into. try it today!

Tibo @thsottiaux · 3 days ago · 65

Advanced Codex users: we shipped a replacement for coarse sandbox modes. Reusable, inheritable permission profiles bind OS-enforced file read/write/deny rules (even `**/*.env`) to per-domain network and Unix socket access. Plus fail-closed admin allowlists. Least privilege per task. https://developers.openai.com/codex/permissions
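Codex's real profile format is documented at the link above and is not reproduced here. To illustrate the semantics the post describes, here is a minimal Python sketch. It covers inheritable profiles, deny rules that override any allow, and a fail-closed admin allowlist for network access. `Profile`, `can_access_file`, and `can_connect` are hypothetical names. Real enforcement happens at the OS level, not in Python.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from fnmatch import fnmatchcase


def _match(path: str, pattern: str) -> bool:
    # fnmatch's "*" also crosses "/", so a leading "**/" is treated as optional:
    # that lets "**/*.env" match a top-level ".env" as well as "src/.env".
    if pattern.startswith("**/") and fnmatchcase(path, pattern[3:]):
        return True
    return fnmatchcase(path, pattern)


@dataclass
class Profile:
    """Hypothetical permission profile (not Codex's actual schema)."""
    read: list[str] = field(default_factory=list)
    write: list[str] = field(default_factory=list)
    deny: list[str] = field(default_factory=list)
    domains: list[str] = field(default_factory=list)
    parent: Profile | None = None

    def rules(self, kind: str) -> list[str]:
        # A child profile inherits every rule of its parent and adds its own.
        inherited = self.parent.rules(kind) if self.parent else []
        return inherited + getattr(self, kind)


def can_access_file(profile: Profile, path: str, mode: str) -> bool:
    if any(_match(path, p) for p in profile.rules("deny")):
        return False  # deny always wins, even over an inherited allow
    if mode == "write":
        return any(_match(path, p) for p in profile.rules("write"))
    if mode == "read":
        # In this sketch, write access implies read access.
        allowed = profile.rules("read") + profile.rules("write")
        return any(_match(path, p) for p in allowed)
    return False  # unknown mode: fail closed


def can_connect(profile: Profile, domain: str, admin_allowlist: list[str] | None) -> bool:
    # Fail closed: with no admin allowlist configured, no network access at all.
    if not admin_allowlist:
        return False
    in_profile = any(fnmatchcase(domain, p) for p in profile.rules("domains"))
    in_admin = any(fnmatchcase(domain, p) for p in admin_allowlist)
    return in_profile and in_admin
```

A per-task profile would then inherit from a base profile, for example `Profile(write=["src/**"], parent=Profile(read=["**"], deny=["**/*.env"]))`. This grants writes under `src/` while secrets stay unreadable everywhere.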

🚨 AI News | TestingCatalog @testingcatalog · 3 days ago · 43

Google is working on Inbox for Gemini Enterprise. The new section on the sidebar will contain three categories: Needs review, In progress, and Done. This feature will likely help users reach Inbox 0 based on Gemini recommendations generated from their work context.

elvis @omarsar0 · 3 days ago · 59

This is smart from Cline. They just launched ClinePass, which makes it easy to access the latest open-weight models like GLM 5.2, Kimi k2.7-code, Mimo 2.5, Deepseek v4 pro, Minimax M3, and more. Always a win when you don't have to juggle API keys.

Summary: ClinePass costs $9.99/month and offers 2-5x discounted access to these models, with a first-month promotional price of $1.99. Sign up with `npm i -g cline` to use it in the Cline CLI and IDE.

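Rohan Paul @rohanpaul_ai · 3 days ago · 49

AI agents that automatically improve business-critical KPIs. Giga just launched Scout, which moves AI support from scripted replies toward measured business outcomes. Once you define the business KPI, Scout creates the agents, learns from real conversations, tests each update, and keeps improving toward that single goal.

Summary: Users set the goal in natural language. Scout then builds the agents, learns from real conversations (especially when human support staff step in), tests every change, and keeps what works. Small copy and policy fixes can ship automatically. Actions involving money or systems are routed to the team for approval along with supporting evidence. For example, a fintech company that sets "funded deposits" as its KPI can have Scout agents reach out to customers who have not yet deposited and get them to deposit, recovering lost revenue. Scout can also detect and repair failures in its own integrations, and all changes take effect only after user approval.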

🚨 AI News | TestingCatalog @testingcatalog · 3 days ago · 64

Cline has launched ClinePass, a flat monthly subscription that opens access to a curated set of open-weight coding models across its IDE extensions, CLI, and SDK. The current lineup includes GLM 5.2, Kimi K2.7 Code, DeepSeek V4 Pro, MiniMax-M3, and Qwen3.7, with a subscription replacing separate API keys across providers.

Summary: Cline says it was impressed by GLM-5.2. ClinePass costs $9.99/month and offers 2-5x discounted access, with a $1.99 promotional price available by signing up through `npm i -g cline`.

小互 @xiaohu · 3 days ago · 46

Hacked together a little something: http://Best.xiaohu.ai Feedback welcome 🤓

PixVerse @PixVerse_ · 3 days ago · 40

Creating a fully realized dark sci-fi world once required studio sets, complex compositing, and a significant VFX budget. With PixVerse, a simple backyard phone clip can be transformed into a cinematic scene while keeping the original performance completely untouched.

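Berryxia.AI @berryxia · 3 days ago · 62

Now they've got a pretty teaching assistant selling courses 😂 So smooth~

Summary: The open-source project OpenMontage gained 3,000 stars in a single day. It splits video production into 12 pipelines, with 52 built-in tools and 500+ agent skills. Users describe what they want in natural language, and the agent handles the whole process from research to editing. It supports workflows that mix AI-generated and real footage. Quality checks include pre-composition validation and post-render self-review. Rendering uses Remotion + HyperFrames, so ordinary users can produce professional-grade video through conversation.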

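Elon Musk @elonmusk · 3 days ago · 28

Grok Build daily updates

Summary: Grok Build has been updated to v0.2.73. It adds a setting to keep text-selection highlighting. It fixes duplicated lines after switching tabs in tmux or editor terminals. Clipboard copy now reports success only when the text was received through a trusted path.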

Alibaba Cloud @alibaba_cloud · 3 days ago · 33

AI is rewriting the rules of retail. We just launched new AI-powered solutions for retail. They understand your customers across every touchpoint, turning fragmented insight into personalized, immersive experiences that drive measurable retail growth. Built on Qwen. Proven at scale. Explore Alibaba Cloud for Retail → https://int.alibabacloud.com/m/1000414981/ #AI #Retail #AlibabaCloud

🚨 AI News | TestingCatalog @testingcatalog · 4 days ago · 16

Tasks on Grok for iOS got renamed to Automations. For now, it seems to be only a name change along with a slightly different UI. Are we still about to see Grok desktop eventually?

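宝玉 @dotey · 4 days ago · 68

RepoPrompt is now open source: the Community Edition is live on GitHub. Here's the backstory. A few months ago, Romain Huet, head of developer experience at OpenAI, approached Provencher and invited him to join the OpenAI team. Before agreeing, Provencher set one condition: his existing paying users had to be taken care of first. So Repo Prompt first became free, and now it is fully open source.

Repo Prompt originally did one thing: help developers pick files from a code repository, assemble them into a high-quality prompt, and copy-paste that into ChatGPT or Claude. It sounds simple, but it hit a real pain point. Dumping an entire codebase into an AI model often works poorly, and prompts over 32K tokens can even make the model dumber. You need to curate carefully and show the model only the code it really needs. This practice now has a formal name: context engineering.

The open-source version changes a lot. Provencher inverted the architecture. The app itself no longer orchestrates agents. Instead, the built-in MCP server is the controller, and the underlying command-line tools (Claude Code, Codex, OpenCode, Gemini CLI) become an execution layer that can be swapped at any time. This means you can use one reasoning model for planning and task decomposition, then dispatch subtasks to different agents running in parallel, each seeing only the files it is responsible for.

To suit open-source collaboration, many of the old version's manual prompt-assembly features were cut. The project structure was also decoupled from Xcode, so it builds without Xcode installed. Contributor management borrows the approach of Mario Zechner, author of libgdx: a whitelist is maintained, and previous paying users automatically become verified contributors if they agree. Only macOS is supported for now, and a cross-platform version is in development. You can install it via Homebrew (`brew install --cask repoprompt-ce`). Community Edition: https://github.com/repoprompt/repoprompt-ce Classic version: https://github.com/repoprompt/repoprompt-classic

Summary: RepoPrompt is now open source, with the Community Edition on GitHub. OpenAI recruited its author, Provencher, on the condition that paying users be taken care of: the app went free first and then open source, and paying users received Codex credits. The tool picks files from a repository to build prompts, addressing the problem of prompts over 32K tokens making models dumber, a practice called "context engineering". The open-source version inverts the architecture. The built-in MCP server is the controller, the underlying CLI tools are swappable, and a reasoning model can plan and dispatch subtasks in parallel. It is macOS-only for now and installable via Homebrew.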
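The context-engineering idea above, showing the model only the files it needs, can be sketched as a greedy packer that fills a token budget in order of relevance. This is not RepoPrompt's implementation. The relevance scores, the roughly-4-characters-per-token estimate, the `<file>` wrapper format, and the 32K default (taken from the threshold mentioned in the post) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    text: str
    relevance: float  # hypothetical score, e.g. from search hits or the user's picks


def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 characters per token); an assumption, not a real tokenizer.
    return max(1, len(text) // 4)


def pack_context(candidates, budget_tokens=32_000):
    """Take the most relevant files first and skip any file that would overflow
    the budget. Return the chosen paths, the assembled prompt, and tokens used."""
    chosen, parts, used = [], [], 0
    for c in sorted(candidates, key=lambda c: c.relevance, reverse=True):
        cost = estimate_tokens(c.text)
        if used + cost > budget_tokens:
            continue  # a smaller, less relevant file may still fit later
        chosen.append(c.path)
        parts.append(f'<file path="{c.path}">\n{c.text}\n</file>')
        used += cost
    return chosen, "\n\n".join(parts), used
```

Skipping an oversized file, instead of stopping, lets small but relevant files still make it into the prompt. A smarter selector could also rank by relevance per token.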

🚨 AI News | TestingCatalog @testingcatalog · 4 days ago · 64

Vida open-sourced BrowserBC, a framework that allows users to turn browser sessions into reusable skills for AI agents. > Instead of recalculating navigation on every turn, agents can follow a skill created from earlier task execution. > Vida reports a substantially higher success rate with fewer steps, via the same AI agent. Hotel booking bench? 👀

OpenRouter @OpenRouter · 4 days ago · 61

Tip: OpenRouter continuously runs GPQA and TAU-Bench on most open-weight models and publishes the results publicly. This informs our AutoExacto meta-benchmark, used by default when routing tool calls. Here, @Parasail_io and @Zai_org rank first: https://openrouter.ai/z-ai/glm-5.2#performance

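Berryxia.AI @berryxia · 4 days ago · 50

Folks, DeepSeek has open-sourced DSpark! It's a speculative decoding framework: not a new model, but an inference optimization.

The core problem: in conventional speculative decoding, a small draft model first guesses a run of tokens, and the large model then verifies them all at once. The catch is that the further along the guess goes, the more likely it is to be wrong, and verifying wrong guesses wastes GPU compute.

DSpark's solution:

1. A hybrid of a parallel backbone and a sequential head. Purely parallel guessing is fast, but later tokens decay, because each position is guessed without knowing what was actually sampled before it. DSpark adds a small Markov head that uses the previous token to adjust the current guess, solving the suffix-decay problem.
2. Confidence scheduling. A confidence head estimates the survival probability of each draft token. Paired with a load-aware scheduler, it verifies more tokens when the GPU is idle and fewer when it is busy. Not every guessed token is worth checking; only the parts likely to be correct get verified.

Results: in DeepSeek-V4 production, single-user generation is 60-85% faster than the MTP-1 baseline, with throughput gains of 1.5x to 5x across scenarios.

What's open-sourced:

- Model checkpoints: `DeepSeek-V4-Pro-DSpark` and `DeepSeek-V4-Flash-DSpark`, which reuse the existing V4 weights with an attached draft module
- Training code: the MIT-licensed DeepSpec codebase
- Developed jointly with Peking University

Why it matters: speculative decoding has long been seen as "good in theory, hard in practice". DSpark shows that in a real production system, speculative decoding can reliably deliver speedups of over 60% without affecting output quality. DeepSeek has already deployed it in production.

Summary: DeepSeek has open-sourced DSpark, a speculative decoding framework built for production. It targets a weakness of conventional speculative decoding: later draft tokens have high error rates and waste compute. DSpark combines a parallel backbone with a sequential Markov head to eliminate suffix decay. It also adds a confidence head and a load-aware scheduler that adjust how many tokens get verified. In DeepSeek-V4 production, single-user generation is 60-85% faster than the MTP-1 baseline, with throughput gains of 1.5x to 5x. The release includes the `DeepSeek-V4-Pro-DSpark`/`Flash-DSpark` checkpoints built on V4 weights and the MIT-licensed DeepSpec training code, developed jointly with Peking University.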
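To make the confidence-scheduling idea concrete, here is a simplified Python sketch. It is not DSpark's algorithm, and several parts are assumptions:

- `survival` stands in for the confidence head's output.
- The linear load-to-budget schedule and the 0.3 cutoff are made up for illustration.
- Verification is greedy token matching, not the probabilistic acceptance rule used with sampling.
- The Markov head is not modeled.

```python
from dataclasses import dataclass


@dataclass
class DraftToken:
    token: int
    survival: float  # stand-in for the confidence head's estimate that this token is accepted


def verify_budget(gpu_load: float, max_budget: int = 8) -> int:
    """Load-aware budget: verify more draft tokens when the GPU is idle, fewer when busy.
    A linear schedule over gpu_load in [0, 1] is an assumption."""
    load = min(max(gpu_load, 0.0), 1.0)
    return max(1, round(max_budget * (1.0 - load)))


def select_for_verification(drafts, gpu_load, max_budget=8, min_survival=0.3):
    """Keep the longest draft prefix whose cumulative survival probability stays
    at or above min_survival, capped by the load-aware budget."""
    selected, survival = [], 1.0
    for d in drafts[:verify_budget(gpu_load, max_budget)]:
        survival *= d.survival  # chance that every token up to here is accepted
        if survival < min_survival:
            break
        selected.append(d)
    return selected


def accept(selected, target_tokens):
    """Greedy verification in one target-model pass: accept drafts while they match
    the target's token at each position; at the first mismatch, emit the target's token.
    target_tokens holds one more entry than selected (the target's next-token pick)."""
    out = []
    for d, t in zip(selected, target_tokens):
        if d.token != t:
            out.append(t)  # correction from the target model ends this round
            return out
        out.append(d.token)
    if len(target_tokens) > len(selected):
        out.append(target_tokens[len(selected)])  # bonus token when every draft passes
    return out
```

With an idle GPU, a draft whose cumulative survival drops below the cutoff is never sent for verification. Under heavy load, the budget shrinks to a single token, so little compute goes to guesses likely to be thrown away.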

PixVerse @PixVerse_ · 4 days ago · 58

From a basic grey 3D cockpit model to a full-speed cinematic lap. Seedance 2.0 uses the 3D pass to lock motion and camera movement, delivering precise, consistent results without relying on text prompts.

🚨 AI News | TestingCatalog @testingcatalog · 4 days ago · 32

OpenAI is testing a new effort-selector UI for Codex as a slider. Besides that, it seems that real-time voice support will be completely reworked, as the previously available components have been removed.

jason @jxnlco · 4 days ago · 36

instructor 1.15.4 is out, mostly a maintainer sweep:

- fixed v2 list/scalar response models
- preserved backticks in streamed JSON strings
- `Image.autodetect` now handles raw bytes
- refreshed stale model strings in the docs, including Ollama llama3.2

small patches, fewer weird edges

Yuchen Jin @Yuchenj_UW · 5 days ago · 14

My OpenAI bro just dropped the most authoritative benchmark.

Tibo @thsottiaux · 5 days ago · 36

Tons of improvements landed in Codex.

- Handles super long threads smoothly.
- Hoverable navigation rail for previewing and jumping between turns that feels just right.
- Settings search covers more controls, with clearer appearance and host-filtering options and easier-to-find custom-provider settings.
- Zoom-level changes no longer misalign tooltips, dialogs, menus, selection bubbles, drag previews, or autocomplete.
- Copying into Slack preserves Markdown formatting such as bullets, bold text, code, and links, and large text pastes no longer freeze the UI.
- And most importantly: a dedicated Pets panel.

🚨 AI News | TestingCatalog @testingcatalog · 5 days ago · 60

Meta AI app for iOS got incognito chats and a new look for the Glasses page. The updated page has shortcuts for all the primary toggles, including live translation and conversation focus.

Chubby♨️ @kimmonismus · 5 days ago · 67

BrowserBC, a new open-source project from the ViDA team, explores a more efficient way to run web agents. Instead of using a frontier model for every step of an agent workflow, BrowserBC records a human web flow once with a stronger model, distills it into a reusable skill, and then lets a smaller, cheaper model handle execution. The reported results are notable: on WebArena-Hard, tool calls drop by 27%, while success increases from 60% to 81%. A very good open source project at the right time.
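The record-once, replay-cheaply loop can be sketched as below. This is not BrowserBC's API: `Step`, `distill`, and `run_skill` are hypothetical. The "skill" is reduced to a list of recorded actions, each paired with the URL expected before it runs. A plain `act` callback plays the smaller model's role, and a `fallback` takes over when the live page drifts from the recording.

```python
from dataclasses import dataclass


@dataclass
class Step:
    action: str      # e.g. "click" or "type"
    target: str      # element selector captured while recording
    expect_url: str  # page the recording was on when this step ran


def distill(trace):
    """Turn one recorded session (a list of event dicts) into a reusable skill,
    dropping events that did nothing."""
    return [Step(e["action"], e["target"], e["url"]) for e in trace if e["action"] != "noop"]


def run_skill(skill, current_url, act, fallback):
    """Replay the skill with a cheap executor (`act`). If the live page drifts from
    what was recorded, hand the remaining steps to a stronger planner (`fallback`)."""
    for i, step in enumerate(skill):
        if current_url() != step.expect_url:
            return fallback(skill[i:])
        act(step)
    return "done"
```

The expensive model is only needed once, to produce the trace, and again whenever `fallback` fires. This is where the reported savings in tool calls would come from in a scheme like this.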

jason @jxnlco · 5 days ago · 41

Codex Auto review mode as I asked it to dm a coworker my .env file

Berryxia.AI @berryxia · 5 days ago · 61

This teacher explains LLMs in a really easy-to-follow way, guys~ What do you think?

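AYi @AYi_AInotes · 5 days ago · 63

Whoa, this Claude Code desktop update really gets developers. Native multi-session drag-and-drop split panes push parallel agent workflows to maximum efficiency 🤯

Running multiple Claude Code sessions used to mean relying on tmux, opening a pile of terminal windows and switching back and forth. Management was messy and progress was hard to see. Now Anthropic has built the multiplexer right into the desktop app. All sessions are managed together in the left sidebar, and you can drag them into side-by-side panes to watch several agents work in one window.

The core usage is straightforward:

1. Open multiple sessions in the desktop app, keeping different projects and subtasks separate.
2. Drag panes freely to arrange them, or pop any one out into its own window.
3. Arrange the built-in terminal, file editor, and preview panel in the split layout too.
4. The input areas for multiple sessions appear together at the bottom, so you can switch where you type at any time.

It turns black-box parallelism in the terminal into a visual multitasking workbench. You see all progress at a glance, without switching windows to hunt for context. This used to take a lot of fiddling with third-party tools; now Anthropic hands you native infrastructure for parallel agent workflows. If you've already updated the desktop app, try it now; the improvement is much bigger than I expected. https://x.com/LLMJunky/status/2070733200846909717/video/1

Summary: The Claude Code desktop update adds native multi-session drag-and-drop split panes, making parallel agent workflows visual. Users can open multiple sessions in the desktop app, manage them in the left sidebar, and drag them into side-by-side panes or pop them out into separate windows. The built-in terminal, file editor, and preview panel can all be split, and the input areas for multiple sessions appear together at the bottom. Compared with relying on tmux and switching between terminal windows, this is a big efficiency gain.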

OpenAI Developers @OpenAIDevs · 5 days ago · 52

🆕 Codex quality-of-life updates landed this week. Starting with long threads: scrolling is smoother now, and your place stays put as you move through the conversation.

elvis @omarsar0 · 5 days ago · 61

http://x.com/i/article/2069825847729508352

# Building Agents with Vercel's Eve Framework

Vercel recently shipped Eve, an open-source framework for building, running, and scaling agents. The core idea is that you stop hand-rolling the same agent plumbing every time and start treating an agent as something you can read off disk.

This is the practical version of what Eve is, why it matters, and what building with it actually looks like, drawn from the free hands-on lab we just built around it. Below are some of my thoughts (written with the help of Claude) after spending a week building with Eve.

If you want to try Eve without any setup, we built a free hands-on lab where you drive the real eve CLI in a live terminal, with no API key of your own required. You can try it at Introduction to Eve.

## Where Eve comes from

Eve comes from a team at Vercel and is open source under the Apache 2.0 license. The official Vercel documentation describes it as a filesystem-first framework for durable backend AI agents. It is currently in beta, so the APIs can still change before general availability.

> "Agents today are where the web was before frameworks, with everyone hand-rolling the same plumbing and nothing carrying over to the next one."
>
> The Eve team, Vercel, in "Introducing Eve" (June 17, 2026)

That is the whole motivation: durable sessions, a sandbox to run code, approvals, tracing, evals. Every team rebuilds these before their agent does anything useful, and none of it transfers to the next project. Eve ships that infrastructure as the framework, so production is built in from the first run instead of bolted on at the end.

## An agent is just a directory of files

The core idea, and the one the lab keeps returning to, is that an agent is not a graph you wire together in code. It is a folder.

> "An agent is a directory. A file's name and place in the tree are its definition."

The tools an agent can call, the skills it knows, the subagents it delegates to, its schedules, and its evals all live on disk as plain files. You can open the folder and see exactly what your agent is, diff it, commit it, and hand it to a teammate. There is no hidden runtime state to reason about, because the file tree is the state.

Two files at the root define the agent itself. `agent/instructions.md` holds the always-on system prompt, and the optional `agent/agent.ts` sets runtime config such as which model to use. Every capability below them (tools, skills, subagents, connections, channels, and the sandbox) is a directory that Eve auto-discovers by name, so adding one is usually just adding a file.

## The parts you assemble

In the lab, each capability is one file you drop into the project, and Eve wires it up with no registration step. Here is what those files actually look like.

Tools are the agent's hands. A tool is a typed action the agent can call, defined in a file under `agent/tools/`. The lab ships `save_note.ts`. The model decides when to call a tool from its description. Your code decides what happens, and it runs in your app runtime with full access, not in the sandbox. That split is what keeps an agent both flexible and safe.

Skills give the agent know-how instead of actions. A skill is a markdown file under `agent/skills/`, advertised by a one-line description and loaded into context only when a request matches. The lab's `filing.md` is a few lines. Ask the agent to "log" a note and it loads this skill, files the note, and signs it off with "Filed with eve.", a sign-off you never asked for. This is progressive disclosure. A support agent can hold dozens of playbooks as skills and pull in only the one the ticket needs, so the prompt stays lean.

Subagents let one agent delegate. Every agent gets a built-in agent tool, so the parent can fan three subtasks out at once and gather the results. This is exactly how V routes work across Vercel's fleet of Eve agents.
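The "agent is a directory" idea, together with progressive disclosure of skills, can be illustrated with a short Python sketch. It is not Eve's loader: Eve is TypeScript, and its real discovery rules are in the official docs. The layout mirrors the article. `load_agent`, `build_prompt`, the first-line-as-description convention, and the keyword-overlap matching are hypothetical simplifications.

```python
import re
from pathlib import Path

# Filler words ignored when matching a request against a skill description (assumption).
STOPWORDS = {"the", "and", "use", "when", "asked", "for", "with", "this", "that"}


def _words(text: str) -> set[str]:
    return {w for w in re.findall(r"[a-z0-9_]+", text.lower())
            if len(w) >= 3 and w not in STOPWORDS}


def load_agent(root: Path) -> dict:
    """Build an agent manifest purely from the file tree: no registration step."""
    agent = root / "agent"
    skills = {}
    for md in sorted((agent / "skills").glob("*.md")):
        lines = md.read_text().splitlines()
        # Hypothetical convention: a skill's first line is its one-line description.
        skills[md.stem] = {"description": lines[0] if lines else "", "path": md}
    return {
        "instructions": (agent / "instructions.md").read_text(),
        "tools": sorted(p.stem for p in (agent / "tools").glob("*.ts")),
        "skills": skills,
    }


def build_prompt(manifest: dict, request: str) -> str:
    """Progressive disclosure: every skill is advertised by its description, but a
    skill's full body is loaded only when the request overlaps that description."""
    parts = [manifest["instructions"]]
    wanted = _words(request)
    for name, skill in manifest["skills"].items():
        if _words(skill["description"]) & wanted:
            parts.append(skill["path"].read_text())  # matched: load the whole skill
        else:
            parts.append(f"[skill:{name}] {skill['description']}")  # advertise only
    return "\n\n".join(parts)
```

Adding a capability in this model really is just adding a file: a new `*.md` under `agent/skills/` shows up in the manifest on the next load, with no code change.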
Human-in-the-loop gates the actions that need judgment. Mark a tool `needsApproval: always()` and the run pauses for a person before it executes, burning no compute while it waits. The pause is durable, so a task can wait on a human for minutes or days and resume right where it stopped. That is the draft0 pattern. Move fast on everything low-risk, and keep a hand on the few actions that ship.

Durable sessions are why all of this survives the real world. Every conversation is a checkpointed workflow, so it survives a crash or a deploy and resumes exactly where it stopped. In the lab, the agent simply remembers a fact you gave it three messages ago. In production, it is an agent whose work starts in Slack and continues on the web days later, with no state-management code that you wrote.

Evals prove it still works. An eval drives the real agent through a session and asserts on what happened. Change a prompt or a tool, run the evals, and you catch the regression before your users do. They run locally and in CI, the same way unit tests do.

Connections are the way out, and channels are the way in, each a single file. A connection points the agent at an external service, an MCP server or an OpenAPI-style API, and Eve brokers the auth so the model never sees the URL or credentials. A channel puts that same agent in Slack, Discord, Teams, or behind an HTTP API. The agent you built in the terminal is the agent that ships to Slack. You change where it lives by adding a file, not by rewriting it.

The pattern is always the same. Drop a file, the agent reads it, behavior changes, and you commit the file alongside your code.

## What this looks like in production

This is not a toy. The examples below come straight from Vercel's Eve announcement, where the team describes the fleet of more than a hundred agents they run internally. The lab uses these same agents as the reference for each concept you learn.
- d0, an internal data agent, answers around thirty thousand questions a month through a single read-only SQL tool against the warehouse.
- Vertex, a support agent, resolves about ninety-two percent of tickets on its own by reaching into the help center and internal tools through connections.
- Athena, a sales agent wired to Salesforce and Snowflake, was built in six weeks with no engineers.
- draft0 drafts and reviews content, but a human signs off before anything ships.
- V sits in Slack, reads each incoming task, and routes it to the agent best suited to answer.

Every one of these is the same shape you build in the lab. The difference between the agent in your terminal and the one resolving real support tickets is mostly which files are in the directory.

## A concrete first session

You do not start from a blank page. In the lab you launch a working agent in a real terminal and talk to it in plain English. You ask it to build something, say a small `welcome.html`, and watch it call its `write_file` tool and save the result to its sandbox, never touching your real machine. Then you hand it the `save_note` tool above, ask it to file a note, and see it pick the tool on its own from the description. From there the lab layers on a skill, a subagent, an approval gate, an eval, and a connection, one file at a time, until you have walked the whole framework.

## From your laptop to production

This is where the filesystem-first bet pays off.

> "The same directory runs in production exactly as it ran on your laptop."

It is a normal Vercel project. Eve compiles the `agent/` directory into an app that runs on Vercel Functions, so the agent you built and tested locally is the agent that deploys. What changes is not your code but the infrastructure underneath it, and each piece maps to a documented Vercel service.

- The sandbox graduates. Locally the agent runs in an isolated, bash-style sandbox. In production each agent gets a real isolated Vercel Sandbox, so it can run shell commands and write files without ever touching your application runtime.
- Sessions become durable workflows. Eve persists session state on Vercel Workflows, so a run survives a deploy, recovers from a cold start, and can pause on a human approval for minutes or days, then resume exactly where it stopped. The docs put it plainly: sessions "resume after cold starts, deploys, or long pauses."
- Schedules and channels go live. Your `defineSchedule` files start firing on cron, and the channels you added put the same agent in Slack, Discord, Teams, or behind an HTTP API.
- Every run is traced. Vercel Observability shows each agent run with its sessions, turns, tools, reasoning, timing, and token usage, with no setup.
- Models and auth are handled. Model strings route through AI Gateway with OIDC, so you never manage provider keys, and Vercel Connect brokers OAuth and API keys for your connections.
- One agent becomes a fleet. The same shape scales horizontally, which is how Vercel runs more than a hundred of these agents at once, each one just a directory.

You do not re-implement anything for production. You deploy the directory, and the framework handles durability, isolation, models, and scale.

## How to get started

1. Scaffold a project. Run `npx eve@latest init my-agent` to create the project, install dependencies, and start the dev server. You get an interactive agent in your terminal in seconds. Talk to it in plain English.
2. Give it a tool. Add a `defineTool` file like `save_note`, ask the agent to use it, and watch it call your code.
3. Teach it a skill. Write a short markdown file with a description that says when to use a procedure. This encodes know-how without writing logic.
4. Delegate with a subagent. Hand off a focused job through the built-in agent tool so your main agent stays clean.
5. Prove it with an eval, then schedule it. Add a `defineEval` file and a `defineSchedule` file with a cron line. Now you have a checked, recurring agent.
6. Connect and ship. Add a connection to reach a real service, a channel to put the agent in Slack, then deploy the same directory to Vercel.

Here is the takeaway. Eve's bet is that an agent should be a set of files you can read, not a runtime you have to trust. That makes agents inspectable, versionable, and portable, and it moves the hard production concerns into the framework where they belong.

If you see any errors or things that need further clarification, don't be afraid to reach out.

## Other Useful References

- Eve documentation, the official docs
- Eve concepts, how agents, sessions, tools, skills, connections, and sandboxes fit together
- Introducing Eve, the Vercel announcement
- vercel/eve, the open-source framework on GitHub
- Introduction to Eve, our free hands-on lab
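The approval gate and durable resume in the article boil down to three moves: checkpoint after every step, stop at a gated step, and resume from the checkpoint. Here is a minimal Python sketch of that pattern, assuming a local JSON file as the checkpoint store. It is not Eve's API. `needs_approval`, `run_session`, and `ApprovalPending` are hypothetical stand-ins for `needsApproval: always()` and the Vercel Workflows persistence the article describes.

```python
import json
from pathlib import Path


class ApprovalPending(Exception):
    """Raised to pause a run until a human approves the named step."""


def run_session(steps, checkpoint: Path, approvals: set, execute):
    """Run steps in order, persisting progress after each one. A step marked
    needs_approval pauses the run (nothing executes while it waits); calling
    run_session again with the same checkpoint resumes where it stopped."""
    state = json.loads(checkpoint.read_text()) if checkpoint.exists() else {"done": []}
    for step in steps:
        name = step["name"]
        if name in state["done"]:
            continue  # completed before a crash, deploy, or pause: never re-run
        if step.get("needs_approval") and name not in approvals:
            checkpoint.write_text(json.dumps(state))
            raise ApprovalPending(name)
        execute(name)
        state["done"].append(name)
        checkpoint.write_text(json.dumps(state))  # checkpoint after every step
    return state["done"]
```

A first call runs the low-risk steps and raises `ApprovalPending` at the gated one. Once a human approves, a second call skips the finished steps and completes the run, even if the process restarted in between.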

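Summary: Vercel has open-sourced Eve, a framework that treats an agent as a directory. `agent/instructions.md` defines the system prompt, and `agent/agent.ts` configures runtime parameters such as the model. Tools (typed files under `agent/tools/`), skills (Markdown files under `agent/skills/`, loaded on demand), subagents (delegation through the built-in agent tool), and human approval (the `needsApproval` flag) all live as files, with no registration step. Eve has built-in production infrastructure such as durable sessions, sandboxing, tracing, and evals.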

AK @_akhaliq · 5 days ago · 56

hf-claude lets you use over 100 open models in claude code including glm 5.2, minimax-m3, deepseek v4 pro

Runway @runwayml · 5 days ago · 66

Localize ads is now available as a Recipe via the Runway API. You can now translate static ads and graphic assets via a single API call.

🚨 AI News | TestingCatalog @testingcatalog · 5 days ago · 27

Google is working on Collections support for NotebookLM. > Users will be able to group multiple notebooks into a single collection. > Collections will appear in a separate tab in the NotebookLM main menu. Since Notebooks now also function as "projects" in Gemini, this may help users organize them more effectively.

凡人小北 @frxiaobei · 5 days ago · 63

DeepSeek V4 has received an update: a new speculative decoding framework, DSpark, that speeds up inference by 80%. DSpark is already deployed on real production traffic for DeepSeek-V4 (Flash and Pro). Report: "DSpark: Confidence-Scheduled Speculative Decoding with Semi-Autoregressive Generation" https://github.com/deepseek-ai/DeepSpec/blob/main/DSpark_paper.pdf