Uber is limiting AI coding tool spending to $1,500 per employee each month. Uber’s CEO said last month that AI agents now submit and build roughly 10% of its code, while legal and marketing teams are also warming up fast to generative-AI tools. --- finance.biggo.com/news/M36WiZ4BLfE1EzqPygfr


ginobefun @hongming731 · Jun 3 · 70

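http://x.com/i/article/2061947122350751744

# BestBlogs Morning Brief · 06-03 | Dynamic Workflows, Copilot Desktop, the AI Engineering Paradigm

Read and listen online: https://www.bestblogs.dev/explore/brief/2026-06-03

> EP76 · 2026-06-03: The paradigm of AI engineering is being rewritten. Claude Code breaks out of the single context window and generates an orchestration script for each task on the fly, and GitHub Copilot launches an agent-centric desktop control center as commits pass 1.4 billion per month. Meanwhile, a Tencent Cloud engineer argues from a cybernetics perspective that large models are the first "cognitive engine" in history, and that the software engineer's core responsibility is shifting from "writing code" to "designing AI systems that can correct themselves." This issue also covers the task-fidelity scaling law, the open-source MiniMax M3 model, NVIDIA Cosmos 3, and a deep teardown of the robot supply chain, to give the full picture of this shift.

## Introduction

Today is June 3, 2026, and the underlying logic of the AI toolchain is going through a structural upgrade.

Anthropic has officially launched dynamic workflows for Claude Code. Claude is no longer limited to planning and executing inside a single context window. It can generate a dedicated JavaScript orchestration script for each task on the fly, deciding for itself how many subagents to launch, which model to use, and whether to run them in isolated worktrees. The trigger is a single word: ultracode.

At the same time, GitHub released the Copilot desktop app at Microsoft Build, a unified control center built for parallel agent development. The My Work view lets you supervise several in-progress issues and PRs at once, the Canvas panel shows agents' progress in real time, and Agent Merge handles CI and code review end to end. With all of these tools rolling out, GitHub's monthly commits have passed 1.4 billion, nearly doubling year over year.

Beyond the deep dives, this issue has 7 quick takes covering the task-fidelity scaling law, building an AI-native engineering organization, the open-source MiniMax M3 model, NVIDIA Cosmos 3, a deep teardown of the robot supply chain, a storage-compute separation architecture for agents, and the full account of how Tieba's AI code review cut bug density by 66.87% ten weeks after rollout.

This issue's three deep dives:

- Deep dive 1: Anthropic explains how Claude Code dynamic workflows work and the best practices for them
- Deep dive 2: GitHub launches the agent-centric Copilot desktop app at Microsoft Build
- Deep dive 3: A Tencent Cloud engineer re-examines fifty years of software engineering and the AI paradigm revolution through a cybernetics framework

## Deep dive 1: A harness for every task: dynamic workflows in Claude Code | Claude

Claude Code is being applied to increasingly complex tasks, but the default harness has an inherent limitation: planning and execution must happen in the same context window. As tasks get longer and more structured, that window gets crowded and two failure modes appear: "agentic laziness", where Claude starts cutting corners, and "goal drift", where Claude strays from the original objective. Last week Anthropic released dynamic workflows, which address the problem at its root.

### How dynamic workflows work

The core idea is to have Claude write a JavaScript orchestration script itself and then execute that script to complete the task. The script can use a few special functions to spawn and coordinate subagents, and it can also call standard JavaScript utilities such as JSON, Math, and Array.

There are two key differences from static workflows. First, a dynamic workflow can decide which model each subagent uses. Claude can hand complex reasoning to a stronger model and simple information gathering to a faster one, trading off cost against quality as it goes. Second, subagents can run in their own worktrees, which gives real environment isolation and keeps subtasks from polluting each other's working state.

If the user interrupts a workflow (for example by closing the terminal), resuming the session lets the workflow continue from where it stopped rather than starting over.

### Which failure modes it addresses

Anthropic's article explicitly lists the kinds of failure scenarios dynamic workflows target:

- Context pollution in long tasks: when a single window handles a long task, early planning information and later execution information get mixed together and Claude starts to lose its way.
- Massively parallel tasks: for example grading 80 resumes at once, or pulling data from several Slack channels at once. These tasks are naturally suited to concurrency, but the default harness cannot support it natively.
- Highly structured tasks: for example having several agents play an investor, a user, and a competitor, each tearing a business plan apart from a different angle.
- Adversarial tasks: having two subagents challenge each other, creating a feedback mechanism that improves the quality of the result.

The example prompts in the article are instructive: "This test fails about once in every 50 runs. Use a workflow to reproduce it, propose competing hypotheses, and don't stop until you find the one that survives the evidence"; "Take my last 50 sessions, dig out the mistakes I keep correcting, and write the recurring ones into CLAUDE.md rules." Both show the typical dynamic workflow scenario: complex multi-step tasks that need repeated iteration, parallel comparison, or structured collaboration.

### Common workflow patterns

Anthropic summarizes several basic patterns that Claude combines when building workflows:

- Classify-and-act: use one agent to classify the input, then assign each category of task to a specialized downstream agent.
- Sorting: order a large list (say 1,000 support tickets) by a qualitative criterion. The quality of a single prompt degrades as the list grows, whereas a workflow can process it in batches and then merge.
- Adversarial check: have one agent generate and another agent look specifically for holes, looping until the conclusion holds up.

### Usage advice

Dynamic workflows consume more tokens and are not suited to simple everyday tasks. They fit best when the task is complex enough (quality degrades in a single context), valuable enough (the extra token cost is worth paying), and has a need for structured parallelism (multiple angles, multiple data sources, multiple competing hypotheses).

To trigger one, use the keyword ultracode in the prompt or explicitly ask to "use a workflow to do this." Anthropic notes that best practices are still evolving and suggests starting with relatively simple parallel tasks to build intuition, then moving on to more complex, higher-value scenarios. Dynamic workflows are fully compatible with the default harness: when they are not needed you can fall back seamlessly with no extra configuration.

For engineers using Claude Code on complex multi-step tasks, this official introduction is worth a careful read: View original

## Deep dive 2: The GitHub Copilot app: an agent-centric desktop experience

Once agents become a normal part of the development workflow, managing several parallel agents becomes a new problem in its own right. You open your laptop in the morning and three pieces of work are already under way: one agent is investigating a production bug, one is implementing a backlog item, and a third is handling code review feedback. You need one place where you can see all three, step in, redirect, test, and merge. Existing development tools were not designed for this way of working.

At Microsoft Build 2026, GitHub released the Copilot desktop app to fill exactly this gap.

### My Work: one place for all in-progress work

The main entry point of the Copilot desktop app is the My Work view. It gathers everything currently in progress across connected repositories: active agent sessions, issues, PRs, and background automation tasks. Developers no longer need to switch between tabs to track different agents; one view shows the whole picture.

### Worktree isolation: agent sessions do not interfere with each other

Every agent session runs in its own git worktree. This closely matches the design of Claude Code dynamic workflows: isolation is the foundation of parallel agent development. Different agents' working state does not pollute each other, and there are clear boundaries at merge time.

### Canvas: a two-way collaboration panel

Canvas is a visual, two-way collaboration area. While an agent works you can watch its progress in Canvas in real time, and you can insert feedback or adjust direction at any point. This "asynchronous intervention" model differs from the traditional "wait for the agent to finish, then review" approach. It is closer to working with a real collaborator that happens to run asynchronously in the background, whose progress you can check and comment on at any time.

### Agent Merge: automated CI and code review end to end

Agent Merge manages the whole path from an agent committing code to the merge, including triggering CI checks, handling code review feedback, and completing the merge. Developers can spend more of their attention on judging direction and reviewing quality rather than managing process.

### Customizable Copilot code review

GitHub also extended Copilot code review. Developers can now use custom agent skills, MCP server connections, and configurable Actions workflows so that every review reflects their team's standards, internal systems, and engineering context. Code review also gains a "medium tier review" option, giving finer-grained control between quick and deep reviews.

### Scale: 1.4 billion commits per month

GitHub disclosed some figures in the announcement: monthly commits on the platform have passed 1.4 billion, nearly doubling year over year, and GitHub Actions runs for more than 2 billion minutes per week. That growth rate explains why GitHub is launching an agent-native control center now: the design assumptions of existing tools can no longer keep up with how workflows are actually evolving.

For teams integrating multiple Copilot agents into their development workflow, the announcement is a first-hand source on GitHub's agent-native direction. The Copilot desktop app is currently in technical preview for existing Copilot Pro, Pro+, Business, and Enterprise users, and interested teams can apply to join directly: View original

## Deep dive 3: Reflections on the paradigm revolution in AI software engineering

This long essay from Tencent Cloud Developer is the most systematic and historically grounded piece I have read recently on the relationship between AI and software engineering. The author is not discussing a particular tool or technique. From the perspective of engineering history, he re-characterizes what software engineering has really been over the past fifty years.

### Software engineering is the least complete engineering discipline of the past fifty years

Using a cybernetics lens, the author traces how the classical engineering disciplines succeeded. Mechanical, chemical, electrical, and automation engineering all industrialized through the same paradigm: "consume energy to harden the low-order cognitive loops that used to involve a human brain into physical devices." The steam engine's centrifugal governor, the chemical plant's thermostat, and the power grid's dispatch equipment are all the same thing at heart: work that once needed a person to watch, adjust, and judge is done by a device that burns coal or draws power. Uncertainty is eliminated at scale, and the same input yields a stable, predictable result.

Software engineering got stuck on this road. Software development deals with abstraction, decomposition, reasoning, and creation. These are high-order cognition and cannot be hardened into a physical loop the way a governor can. For fifty years, Agile, Scrum, and DevOps have all solved the same problem in the same way: they optimize how human labor is stacked up without changing the fact that it has to be stacked up.

This is what the author means by calling software engineering "the least complete engineering": at the metaphysical level of engineering it is unfinished. Every sibling discipline completed the move of "replacing low-order intelligence with energy"; software alone did not.

### Large models are the first "cognitive engine" in history

Large language models do what classical engineering never did: take compute as input and output high-order cognitive products that understand requirements, generate code, and reason logically. In the coordinates of engineering history:

- Classical engineering: energy → low-order intelligence (mechanical regulation, automatic control)
- Large models: energy → high-order intelligence (understanding, reasoning, generation, decision-making)

The author's judgment is that large models and the steam engine hold parallel places in engineering history. The steam engine made it possible for the first time to power "doing work" with energy; large models do the same for "cognition". The moment software engineering "truly arrives" is not when Scrum became popular or when DevOps became widespread, but the moment large models made "energy in exchange for high-order intelligence" possible. Everything called "software engineering" before that was, strictly speaking, an optimized software workshop.

### But this is only the entry ticket, not the endgame

Large models bring new uncertainty: hallucination (output that looks plausible but is quietly wrong), drift (the same input gives different results today and tomorrow), and opacity (you cannot see into the decision process). So large models have not eliminated uncertainty. They have swapped "human uncertainty" for "model uncertainty".

What is needed is a whole new set of engineering principles: no longer "eliminate every small deviation by hand" but "design a system that can correct itself, and handle the residual deviation the system cannot correct on its own."

The author brings in the second-order cybernetics that von Foerster proposed in the 1970s. First-order cybernetics is "observing and controlling the controlled object"; second-order cybernetics is "observing and controlling the act of observing and controlling itself." Projected onto AI software engineering:

- Classical software engineering: people write code
- AI software engineering: people design "the system in which AI writes code"

This is a change of identity, not just a change of tools.

### The more thorough the automation, the larger the industrial workforce

Using data spanning 150 years, the author points out that the more thorough the automation, the more people end up working in industry-related roles. After the steam engine spread in the 1850s, manufacturing as a whole grew explosively; after automation in the 1950s, the number of engineers, designers, and process technicians surged. Every expansion of system capability exposes new boundaries, and a boundary is a new "deviation zone" that needs a new wave of people to guard it.

The conclusion: people are not eliminated, they migrate. The boundary is expanding and needs more guards. But fewer and fewer people can work on that boundary, because formalization eats the low-order cognition and what remains is increasingly high-order.

### How this relates to today's other deep dives

This essay provides a theoretical foundation that complements deep dives 1 and 2. Claude Code dynamic workflows and the GitHub Copilot desktop app are both tool-level embodiments of the new engineering principle of "designing AI systems that can correct themselves". Worktree isolation, subagent collaboration, and two-way intervention in Canvas all address the core question of how to design systems that handle AI's own uncertainty.

### What this means for engineers

The author offers a relatively optimistic but also sobering judgment: in the AI era, the unified function of people is "handling the deviations the system cannot yet handle." This rule holds across all engineering disciplines. Mechanical failures are pulled back by people, grid load deviations are arbitrated by people, and now cognitive deviations are corrected by people. The difference is that in AI engineering the types of deviation are no longer enumerable, the deviation signals are no longer observable, and there is no SOP for pulling things back. The people guarding the boundary therefore need stronger judgment, not just more knowledge.

At the end of the essay the author discusses organizational forms and a rollout path, and where he thinks "the hardest hurdle" of this transformation lies. Engineers and technical managers responsible for AI adoption should read this part carefully: View original

## Quick takes

1. The task-fidelity scaling law: why data quality determines agent performance (AI Engineer). Snorkel's experiments show that with the same compute and the same number of tasks, changing only training data quality, high-fidelity tasks bring a 6% performance gain while low-quality tasks bring only 1%, a gap of five percentage points. High-quality tasks must meet four criteria: containerization (cleanly isolated rollback and parallelization), reachability (goals that are non-trivial but achievable), functional correctness (predictable logic), and environment stability (stable execution infrastructure). Only tasks that meet all four produce clean failure signals that let the model climb effectively during RL training. A common defect of low-quality tasks is the "degenerate failure state": the environment itself is unstable, the model cannot extract a meaningful learning signal from failure, and the extra compute budget is wasted on noise. For engineers building agent fine-tuning datasets, these figures offer direct strategic guidance. View original
2. Building an AI-native engineering organization | Claude (Claude Blog). The Claude Code team shares how they redesigned their engineering process for AI-native work. Code generation, test writing, and refactoring are no longer the bottleneck; the real bottlenecks are now verification, code review, and security assessment. They rewrote how they plan (from long-term roadmaps to just-in-time planning), their code review process, how they gather context, and the logic of how the team is composed. This is not a tool guide but a first-hand record from a fully transformed engineering organization of how to redesign process, suited to engineering leaders thinking about an AI-native team transition. View original
3. MiniMax M3: the first open-weight model to combine three frontier capabilities (MiniMax official). MiniMax has officially released M3, which it claims is the first open-weight model to combine three frontier capabilities at once: coding and agentic performance (SWE-Bench Pro 59.0%, Terminal Bench 2.1 66.0%), a 1M-token context window enabled by MiniMax Sparse Attention (MSA), and native multimodality built from scratch. The MiniMax Code product and a new token plan launched alongside it. Weights and a technical report are due in about 10 days. Notably, M3 is one of the most complete attempts so far by a Chinese team in the open-source model race to match GPT 4o-class coding ability, and developers who follow the open-source model ecosystem should keep watching it. View original
4. NVIDIA launches Cosmos 3: a fully open omnimodel for physical AI (NVIDIA AI). NVIDIA released Cosmos 3, positioned as the world's first fully open "omnimodel" for physical AI, with native support for visual reasoning, world generation, and action generation. Two versions were released, Super (32B) and Nano (8B), aimed at robotics and autonomous systems. Taken together with deep dive 3 and the robot supply chain analysis in quick take 5, the foundation model layer for physical AI is maturing quickly. View original
5. Taking apart the robot "body", mass production, and the supply chain: after the backflip, it still has to learn to catch a falling leaf (Silicon Valley 101). Silicon Valley 101 takes a deep look at humanoid robot hardware architecture: frame materials (the evolution from steel to aluminum, magnesium, and titanium alloys and the lightweighting trade-offs), joint actuators (the technical progress behind the shift from hydraulics to electric motors), the sensor stack, electrical and compute systems, and the cost structure and mass-production thresholds of the whole supply chain. The piece also quotes specific judgments from front-line leaders at top companies such as AgiBot and Unitree. Unitree's STAR Market IPO has just passed review at the Shanghai Stock Exchange, so this systematic teardown is timely and suits readers who want to understand hardware moats in robotics. View original
6. A deep look at storage-compute separation architecture for agents (idoubi). Using FastClaw as the example, the author systematically breaks down a storage-compute separation architecture for cloud agents: the pros and cons of three run modes (local bare metal, local with sandbox, cloud with multiple replicas), four storage-layer choices (Redis for hot state, Postgres for conversation records, pgvector/Milvus for long-term memory, S3/OSS for work artifacts), and the full run flow built on this architecture, while pointing out the challenge of distributed data consistency. Compared with the worktree isolation in deep dive 1, both pieces resonate on "separating compute from state", and this one is a direct reference for engineers designing cloud agent infrastructure. View original
7. Let the data speak: ten weeks after Tieba's AI CR (Xiaomage) rollout, bug density fell 66.87% (Baidu Geek Talk). The Tieba Server team's AI code review rollout: through rule customization, automated evaluation, and a three-layer feedback loop (handling flows for high-, medium-, and low-priority comments), they raised the AI CR share of reviews from 33% to 84% and cut bug density from 0.332 to 0.11, a drop of 66.87%. The article records the ten-week rollout pace, the pitfalls, and the methodology in full. Teams with many codebases, frequent commits, and uneven human review quality can adapt it directly. The practice also corroborates the theoretical framework in deep dive 3: AI CR is itself a self-correcting code quality system. View original

## Today's reading path

If you are short on time, read these three first:

1. A harness for every task: dynamic workflows in Claude Code (deep dive 1). If you use Claude Code, this is the most directly useful piece today. It takes 10 minutes and covers how dynamic workflows work, how to trigger them, and which tasks are most worth enabling them for.
2. Reflections on the paradigm revolution in AI software engineering (deep dive 3). The piece with the most long-term value today. Its reconstruction of software engineering history under a cybernetics framework, and the new engineer identity of "designing AI systems that can correct themselves", form the underlying frame for understanding where all current AI tools are heading.
3. The GitHub Copilot app: an agent-centric desktop experience (deep dive 2). A full introduction to the control center for parallel agent development, covering GitHub's systematic positioning in the agent-native direction and how worktree isolation, Canvas collaboration, and Agent Merge are used in practice.

Still have time? We recommend the task-fidelity scaling law (required reading for engineers building agent fine-tuning datasets; the gap between the 6% and 1% gains has direct strategic value) and the robot supply chain teardown (a systematic review of hardware architecture timed to the Unitree IPO, suited to readers following embodied intelligence).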
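The percentages quoted in quick takes 1 and 7 above can be sanity-checked with a few lines of arithmetic. This is only an illustrative check using the figures as reported in the brief; the function and variable names are made up for the example.

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional reduction when a metric falls from `before` to `after`."""
    return (before - after) / before


# Tieba AI code review: bug density 0.332 -> 0.11 (figures as quoted in the brief).
bug_density_drop = relative_drop(0.332, 0.11)

# Snorkel experiment: 6% gain (high-fidelity tasks) vs 1% gain (low-quality tasks).
gain_gap_points = 6 - 1   # difference in percentage points
gain_ratio = 6 / 1        # how many times larger the high-fidelity gain is

# AI code review share of reviews: 33% -> 84%.
review_share_gain_points = 84 - 33

print(f"bug density drop: {bug_density_drop:.2%}")
print(f"gain gap: {gain_gap_points} points, ratio: {gain_ratio:.0f}x")
print(f"AI review share: +{review_share_gain_points} points")
```

The bug-density figure reproduces the 66.87% quoted in the brief. The 6% versus 1% comparison is a five-point gap, which is a sixfold ratio between the two gains.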

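Summary: Anthropic has launched dynamic workflows for Claude Code, letting the model generate a JavaScript orchestration script for each task, choose models dynamically, and launch multiple subagents that run in parallel in isolated environments, in order to get around the limits of handling complex tasks in a single context window. GitHub, meanwhile, released the agent-centric Copilot desktop app at Microsoft Build, offering a unified view, a collaboration panel, and automated flows for managing parallel agent development. The article reports that monthly commits on GitHub have passed 1.4 billion.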

ginobefun @hongming731 · Jun 3 · 49

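#BestBlogs Morning Brief 06-03. Today's recommended reading from the BestBlogs Morning Brief: an Anthropic blog post explains Claude Code dynamic workflows, in which Claude generates a dedicated orchestration script for each task on the fly, doing away with "agentic laziness" and "goal drift". GitHub unveiled the Copilot desktop app at Build at the same time, with each agent getting its own worktree and commits passing 1.4 billion per month. A Tencent Cloud engineer argues from a cybernetics perspective that large models are the first "cognitive engine" in history, and that the engineer's core responsibility is shifting from "writing code" to "designing AI systems that can correct themselves."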

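Summary: Anthropic explains Claude Code dynamic workflows, which generate a dedicated orchestration script for each task on the fly to address agentic laziness and goal drift. GitHub released the Copilot desktop app, which gives each agent its own worktree; monthly commits on GitHub have passed 1.4 billion. Separately, one view holds that large models are the first "cognitive engine" in history and that the engineer's role is shifting from writing code to designing AI systems that can correct themselves.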

ClaudeDevs @ClaudeDevs · Jun 3 · 66

We've updated /fork in Claude Code. /fork now runs a background agent with your exact context (system prompt, tools, history, model) and prompt cache. The result gets returned to your session. /branch (the old /fork) still copies the transcript to a new session you drive.


MiniMax (official) @MiniMax_AI · Jun 3 · 74

We wrapped a live session on M3 yesterday with the @togethercompute team & our researchers @zpysky1125 and @HaohaiSun A few highlights 🧵 1. MSA (MiniMax Sparse Attention) is the star ⭐️. Unlike CSA/HCA, which compress the KV cache, MSA keeps the real, uncompressed KV and does block-level selection with a small top-K. That's how the 1M context window stays tractable. 2. The efficiency win is huge. In our previous generation, ~30% of per-decode wall-clock time went to the attention kernel. With MSA that now drops to ~5%. Big gains for long-context generation. 3. M3 isn't just a coding model. Natively multimodal (image + video in), ability to handle long-horizon agentic tasks, and even operate a desktop computer. People are already throwing game-dev + Minecraft-style builds at it (Unity included) and it's holding its own. 4. M3 can self-evaluate on vision-coding tasks: it builds a website or SVG, browses and inspects its own rendered output, judges it, and iterates - grading work visually. 5. We're also seeing junior-analyst-level performance on finance tasks; something we haven't even showcased publicly yet. 6. What's next: harder long-horizon / multi-file tasks in future releases, scaling data + post-training (RL) compute toward pre-training scale, and going deeper into finance, legal & bio. Thanks to everyone who joined 🙏 Try M3 link in the comments👇
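To make point 1 above concrete, here is a minimal pure-Python sketch of block-level top-K selection over an uncompressed KV cache. It is not MiniMax's implementation. The post does not say how MSA scores blocks, so scoring each block by the dot product of the query with the block's mean key is an assumption made purely for illustration, as are all function names.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def block_topk_attention(query, keys, values, block_size, top_k):
    """Attend only over the keys in the top_k highest-scoring blocks.

    ASSUMPTION: a block is scored by dot(query, mean key of the block).
    The K/V entries themselves stay uncompressed; selection only decides
    which blocks the softmax reads.
    """
    n = len(keys)
    blocks = [range(s, min(s + block_size, n)) for s in range(0, n, block_size)]

    def block_score(block):
        mean_key = [
            sum(keys[i][d] for i in block) / len(block) for d in range(len(query))
        ]
        return dot(query, mean_key)

    # Rank blocks by score and keep the best top_k (in original order).
    ranked = sorted(range(len(blocks)), key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    selected = [i for b in chosen for i in blocks[b]]

    # Standard scaled dot-product softmax, restricted to the selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)
    output = [
        sum(w * values[i][d] for w, i in zip(weights, selected)) / total
        for d in range(len(values[0]))
    ]
    return output, selected
```

With `top_k` blocks of `block_size` tokens, the softmax reads `top_k * block_size` keys per decode step instead of the full context, which is the intuition behind a long window staying tractable.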

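Summary: MiniMax shared key details about M3 in a live session held with the Together team. Its MSA technique performs block-level top-K selection while keeping the real, uncompressed KV cache, which keeps the 1M-token context window tractable. The attention kernel's share of per-decode wall-clock time falls from about 30% in the previous generation to about 5%, a significant efficiency gain for long-context generation. M3 is natively multimodal, accepts image and video input, can handle long-horizon agentic tasks and desktop operation, and can visually self-evaluate and iterate. The model shows junior-analyst-level performance on finance tasks. Future releases will focus on harder long-horizon tasks and go deeper into finance, legal, and bio.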
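The drop from about 30% to about 5% of per-decode time can be turned into rough speedup figures, but only under an assumption the post does not state: that the non-attention part of each decode step takes the same time in both generations. Under that assumption, a short calculation (function name hypothetical) gives the implied numbers.

```python
def implied_speedups(old_attn_share: float, new_attn_share: float) -> dict:
    """Speedups implied by a change in attention's share of decode time.

    ASSUMPTION: non-attention time per decode step is unchanged between
    generations. Time is measured in units of the old step (old step = 1.0).
    """
    other = 1.0 - old_attn_share                 # non-attention time, unchanged
    new_total = other / (1.0 - new_attn_share)   # other is (1 - share) of new step
    new_attn = new_total * new_attn_share
    return {
        "step_speedup": 1.0 / new_total,
        "attention_speedup": old_attn_share / new_attn,
    }


result = implied_speedups(0.30, 0.05)
print(f"per-step speedup: {result['step_speedup']:.2f}x")
print(f"attention kernel speedup: {result['attention_speedup']:.2f}x")
```

If the rest of the decode step also changed between generations, these figures would not hold; treat them as a reading aid, not a benchmark.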

elvis @omarsar0 · Jun 3 · 38

Code is all you need! Search as Code. Harness as Code. What's next?


Thariq @trq212 · Jun 3 · 81

http://x.com/i/article/2061850535708483585

# A harness for every task: dynamic workflows in Claude Code

Last week, we released dynamic workflows in Claude Code. Claude can now write its own harness on the fly, custom-built for the task at hand.

While the default Claude Code harness is built for coding, it is also useful for many other types of tasks because, as it turns out, many tasks resemble coding tasks. But there are certain classes of tasks where we have had to build custom harnesses on top of Claude Code to achieve peak performance, such as research, security analysis, agent teams, or code review.

Workflows allow you to dynamically create harnesses that enable Claude to solve all of those problems and more natively inside of Claude Code. You can also share and re-use these workflows with others.

In this article, I’ll cover my initial experiences and learnings with workflows so you can take full advantage. That said, best practices are still developing! Dynamic workflows often use more tokens, so think carefully about when and how to use them.

Note: this post is also available on the Claude Blog

## Example prompts

Before diving into the technical details, I’d like to start with some example prompts to get you thinking about the possibilities with workflows:

- "This test fails maybe 1 in 50 runs. Set up a workflow to reproduce it, form theories and adversarially test them in worktrees /goal don't stop until one theory works."
- "Using a workflow, go through my last 50 sessions and mine them for corrections I keep making and turn the recurring ones into CLAUDE.md rules"
- "Use a workflow to dig through #incidents in Slack for the past six months and find recurring root causes where nobody has filed a ticket."
- "Take my business plan and run a workflow where different agents tear it apart from an investor's, a customer's, and a competitor's perspective."
- "Here's a folder of 80 resumes, use a workflow to rank them for the backend role and double-check the top ten.
Interview me using the AskUserQuestion tool for a rubric."
- "I need a name for this CLI tool. Use a workflow to brainstorm a bunch of options and run a tournament to pick the top 3."
- "Use a workflow to rename our User model to Account everywhere."
- "Go through my blog post draft and using a workflow verify every technical claim against the codebase, I don't want to ship anything wrong."

## How dynamic workflows work

Dynamic workflows execute a JavaScript file with a few special functions that help spawn and coordinate subagents.

Dynamic workflows also include standard JavaScript built-ins like JSON, Math, and Array to help process data.

It’s particularly useful to know that dynamic workflows can decide which models an agent uses and whether subagents are run in their own worktree, allowing Claude to choose the intelligence level and isolation needed.

If a workflow is interrupted, for example by user action or quitting the terminal, resuming the session will allow the workflow to pick up where it left off.

## Why dynamic workflows

When you ask the default Claude Code harness to do a task, it needs to both plan and execute in the same context window. For many coding tasks, this is highly effective, but it can sometimes break down over long-running, massively parallel and/or highly structured adversarial tasks.

This is because the longer Claude works on a complex task in a single context window, the more it becomes susceptible to a few specific failure modes:

- Agentic laziness refers to when Claude stops before finishing a particularly complex, multi-part task and declares the job done after partial progress, for example addressing 20 of the 50 items in a security review.
- Self-preferential bias refers to Claude’s tendency to prefer its own results or findings, especially when asked to verify or judge them against a rubric.
- Goal drift refers to the gradual loss of fidelity to the original objective across many turns, especially after compaction.
Each summarization step is lossy, and details like edge-case requirements or "don't do X" constraints can get lost.

Creating a workflow helps combat these by orchestrating separate Claudes with their own context windows and focused, isolated goals.

## Dynamic vs static workflows

You may have previously created a static workflow using the Claude Agent SDK or claude -p to coordinate multiple instances of Claude Code together. But because static workflows need to work for all edge cases, they are usually more generic. With Claude Opus 4.8 and dynamic workflows, Claude is now intelligent enough to write a custom harness tailor-made for your use case.

# Helpful patterns when using dynamic workflows

You can start using dynamic workflows just by asking Claude to make one, or by using the trigger word "ultracode" to ensure that Claude Code creates a workflow. But building a mental model for how dynamic workflows work will help you understand when to use them and how you might nudge Claude via prompts.

There are a few common patterns that Claude might use and compose together when building workflows:

## Classify-and-act

Use a classifier agent to decide on the type of task, and then route to different agents or behavior based on the task. Or, use a classifier at the end to determine output.

## Fan-out-and-synthesize

Split up a task into many smaller steps, run an agent on each step and then synthesize those results. This is particularly useful for when there are a large number of smaller steps, or when each step benefits from its own clean context window so they don't interfere or cross-contaminate. The synthesize step is a barrier: it waits for all the fan-out agents, then merges their structured outputs into one result.

## Adversarial verification

For each spawned agent, run a separate spawned agent to adversarially verify its output against a rubric or criteria.
## Generate-and-filter

Generate a number of ideas on a topic and then filter them by a rubric or by verification, dedupe them, and return only the highest quality, tested ideas.

## Tournament

Instead of dividing the work, have agents compete on it. Spawn N agents that each attempt the same task using different approaches, prompts, or models, then judge the results pairwise with a judging agent until you have a winner.

## Loop until done

For tasks with an unknown amount of work, loop spawning agents until a stop condition is met (no new findings, or no more errors in the logs) instead of a fixed number of passes.

# Use cases

Think creatively of when and how to ask Claude Code to make dynamic workflows. I’ve found that workflows are sometimes even more useful for non-technical work.

## Migrations and refactors

Bun was rewritten from Zig to Rust using workflows. You can read more about how that was done in Jarred’s X thread.

The key is to break down the task into a series of steps that need to be operated on, for example call sites, failing tests, or modules. Spin off a subagent for every fix in a worktree to make the fix, then have another agent adversarially review, and merge them. Consider telling the agent not to use resource-intensive commands so that you can maximally parallelize without running out of resources on your machine.

## Deep research

We published a deep research skill (/deep-research) inside Claude Code that uses dynamic workflows. Specifically, it fans out web searches, fetches sources, adversarially verifies their claims, and synthesizes a cited report.

But you may do this sort of research for more than just web searches. For example, asking Claude to compile a status report from context in Slack or to research how a feature works by exploring a codebase in depth.
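The fan-out-and-synthesize and adversarial verification patterns, which the deep research skill above combines, can be sketched in a few lines. This is not the workflow API: real dynamic workflows are JavaScript files that use special functions the article does not list, so `run_agent` below is a hypothetical stub standing in for spawning a subagent, and the example is written in Python only to show the control flow.

```python
from concurrent.futures import ThreadPoolExecutor


def run_agent(role: str, task: str) -> dict:
    """HYPOTHETICAL stand-in for spawning a subagent with its own context.

    A real agent would call a model; this stub returns canned results so the
    orchestration logic can be run and inspected.
    """
    if role == "worker":
        return {"task": task, "finding": f"checked {task}"}
    if role == "verifier":
        # The stub verifier rejects blank tasks and approves everything else.
        return {"task": task, "approved": bool(task.strip())}
    raise ValueError(f"unknown role: {role}")


def fan_out_and_synthesize(tasks, max_workers=4):
    """Fan out one worker per task, verify each result, then merge."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Fan-out: each task gets its own worker (its own "clean context").
        findings = list(pool.map(lambda t: run_agent("worker", t), tasks))
        # Adversarial verification: a separate agent checks each finding.
        verdicts = list(pool.map(lambda f: run_agent("verifier", f["task"]), findings))
    # Barrier: both stages have fully completed before synthesis starts.
    kept = [f for f, v in zip(findings, verdicts) if v["approved"]]
    return {"total": len(tasks), "verified": [f["finding"] for f in kept]}


print(fan_out_and_synthesize(["auth.py", " ", "db.py"]))
```

The synthesis step only runs after every worker and verifier has returned, which mirrors the barrier behavior described under fan-out-and-synthesize.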
## Deep verification

On the other hand, if you have a report where you want to check and source every factual claim that it references, you may want to generate a workflow which has one agent identify all of the factual claims and then spin off a subagent to check each one in detail. You could also have a verification agent check the source subagent to make sure its source is high quality.

## Sorting

You may have a list of items that you want to sort by some qualitative measurement that you believe that Claude Code is good at evaluating, for example: support tickets sorted by severity of the bug. But if you try to sort 1000+ rows in one prompt, quality degrades and it won't fit in context.

Instead run a tournament, a pipeline of pairwise-comparison agents (comparative judgment is more reliable than absolute scoring), or bucket-rank in parallel then merge. Each comparison is its own agent, so the deterministic loop holds the bracket and only the running order stays in context.

## Memory and rule adherence

If you have a particular set of rules that you find Claude misses or struggles with, even when put into the CLAUDE.md files, create a workflow with a list of rules that must be checked by verifier agents, one verifier per rule. Creating a skeptic persona subagent to review the rules to make sure they are in line will help avoid too many false positives.

The reverse direction works too: mine your recent sessions and code review comments for corrections you keep making, cluster them with parallel agents, adversarially verify each candidate (would this rule have prevented a real mistake?), and then distill the survivors back into a CLAUDE.md.

## Root-cause investigation

Debugging works best when you come up with several independent hypotheses and test them, but if you’re only using one context window, Claude can run into self-preferential bias. A workflow can structurally prevent this by spinning up agents to generate hypotheses from disjoint evidence.
For example, separate agents for logs, files, and data. Each hypothesis can then face a panel of verifiers and refuters.

This isn't just for code. Workflows can be used for sales (why did sales drop in March?), data engineering (why did this pipeline fail?), or any post-mortem exercise.

## Triaging at scale

Every team has a support queue, bug reports, or some other backlog that cannot be fully processed by humans. A triage workflow classifies each item, dedupes against what's already tracked, and takes action. This could mean attempting the fix or escalating to a human user.

A useful pattern for triage workflows is quarantine. This involves barring the agents that read untrusted public content from taking high-privilege actions, which are instead done by the agents in charge of acting on the information. Pair triage workflows with /loop to have Claude do this continuously.

## Exploration and taste

Workflows can be useful when exploring different approaches to a solution, especially when it is taste based, like design or naming, and would benefit from a rubric. Try asking Claude to explore a bunch of solutions, and give a review agent a rubric for what a good solution looks like. The task is complete when the review agent feels like it has met the criteria. Solutions can also be ordered or selected via a tournament based on the rubric.

## Evals

You can run lightweight evals for particular tasks by spinning off separate agents in a worktree and then spinning off comparison agents to compare and grade the specific outputs against a rubric. For example, evaluating and then refining a skill you’ve created against particular criteria.

## Model and intelligence routing

Create a classifier agent tuned to your tasks that decides which model to use. This can be helpful when your task will involve many tool calls and conducting research prior to execution can identify the best model for the job.
For example, the best model for the task "explain how the auth module works" depends on how many files in the auth module there are and the shape of the codebase. A classifier agent can do this research and then route to Sonnet or Opus based on the expected complexity of the task.

## When not to use dynamic workflows

Workflows are new. While there are many use cases where they will create outsized results, they are not needed for every task and may end up using significantly more tokens. It’s best to use workflows creatively to push Claude Code in ways that you haven’t previously. For regular coding tasks, ask yourself: does it really need more compute? For example, most traditional coding tasks do not need a panel of 5 reviewers.

# Tips for building dynamic workflows

## Prompting

Detailed prompting, using the specific techniques we described above, creates the best results with dynamic workflows. Workflows are not just for large tasks. You can prompt the model to use a "quick workflow." For example, you can create a quick adversarial review of an assumption.

## Combine with /goal and /loop

When using workflows that can be repeated, for example triage, research, or verification, pair them with /loop to be run at regular intervals, and /goal to set a hard completion requirement.

## Token usage budgets

You can set explicit token usage budgets for dynamic workflows to limit how many tokens a task uses. You can prompt it with a budget like "use 10k tokens," which will set the cap.

## Saving and sharing dynamic workflows

You can save workflows by pressing "s" in the workflow menu. You can check these into ~/.claude/workflows or distribute them via a skill.

To share them via a skill, put your JavaScript workflow files in the skill folder and reference them in the SKILL.MD. To allow for more flexibility, you may want to prompt Claude to think of the workflows in the skill as a template instead of a script that needs to be run verbatim.
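Returning to the model and intelligence routing use case above, the routing decision itself can be sketched as a classify-then-route step. In a real workflow the classifier would be an agent that explores the codebase; here it is replaced by a deterministic heuristic so the example runs on its own. The line-count threshold, the function names, and the lowercase model labels are all assumptions for illustration, not values from the article.

```python
# ASSUMPTION: 5,000 estimated lines is an arbitrary cut-off for "complex".
COMPLEXITY_THRESHOLD_LINES = 5000

# Labels only; the article names Sonnet and Opus as the two routing targets.
ROUTES = {"low": "sonnet", "high": "opus"}


def classify_complexity(file_count: int, avg_lines_per_file: int) -> str:
    """Stand-in for a classifier agent: estimate how much code must be read."""
    estimated_lines = file_count * avg_lines_per_file
    return "high" if estimated_lines > COMPLEXITY_THRESHOLD_LINES else "low"


def route_model(file_count: int, avg_lines_per_file: int) -> str:
    """Pick a model label from the classifier's verdict."""
    return ROUTES[classify_complexity(file_count, avg_lines_per_file)]


# A small auth module versus a sprawling one.
print(route_model(3, 200))
print(route_model(40, 300))
```

Keeping the classifier separate from the routing table makes it easy to swap the heuristic for an agent-backed classifier later without touching the callers.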
## A whole new world

Workflows are a helpful new way to extend Claude Code. I encourage you to think of this as a starting point; there's still much to discover in how to use them best. Let us know what you find.

Thariq Shihipar and Sid Bidasaria (@sidbid) are members of technical staff at Anthropic, working on Claude Code.

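Summary: Claude Code adds dynamic workflows, which let Claude create a custom harness for each task on the fly. The feature coordinates subagents by executing a JavaScript file and can choose the model and worktree isolation for each agent. It suits complex tasks such as research, security analysis, and code review, and workflows can be shared and reused. Note that dynamic workflows often use more tokens.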

Thariq @trq212 · Jun 3 · 69

Workflows are the biggest upgrade to Claude Code’s capabilities since skills and subagents. I dove deep into it with @sidbid to figure out best practices, examples and more. I’m particularly excited about the non-technical tasks it enables for Claude Code.


ClaudeDevs @ClaudeDevs · Jun 3 · 73

How do you get Claude Code to check its own work before handing it back? Watch how you can encode your manual checks so Claude closes its own feedback loop:


Artificial Analysis @ArtificialAnlys · Jun 3 · 49

We’re hosting a Coding Agent Benchmarks event on Thursday, June 11 in San Francisco with lightning talks and a panel discussion with leading AI researchers, builders, and engineers. If you're building coding agents, LLM tooling, or AI infrastructure, we’d love to see you there! Request to join 👇 https://luma.com/i5zotp6c


OpenAI Developers @OpenAIDevs · Jun 3 · 69

Role-specific plugins in Codex are built around the work teams actually do. Plugins for Data Analytics, Creative Production, and Product Design give Codex the tools and context to create reports, creative directions, and prototypes. Built and used by OpenAI teams.


向阳乔木 @vista8 · Jun 3 · 66

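Whoa, you can build a website from a single sentence and share it for others to view. Note that it requires the Enterprise plan: after updating Codex on Enterprise, invoke it with @ site. This Codex update is pretty strong! Anthropic only does Design; OpenAI goes a step further and covers both design and website generation.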

向阳乔木 @vista8 · Jun 3 · 65

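http://x.com/i/article/2061873460926943233

# Codex evolves: writing code is only the first puzzle piece. What is the next one?

Many companies have already run into an awkward situation. It is easy for AI to help an employee write a piece of code. But once the task becomes gathering context from Slack, digging through Google Docs, pulling from the CRM, preparing an executive deck, or turning data into a dashboard, things get complicated fast.

That is the focus of OpenAI's latest Codex update: pushing a developer tool further toward a general-purpose work system.

OpenAI disclosed that Codex now has more than 5 million weekly users. Non-developers, including analysts, marketers, operations staff, designers, researchers, investors, and bankers, already make up about 20% of all users and are growing more than 3 times as fast as developers. That number is interesting. Codex is turning from "an assistant that can write code" into "a workbench that understands what each role has to deliver."

## Six role plugins push Codex into real job roles

The most substantial change this time is role plugins. OpenAI launched 6 role-oriented plugins at once: data analytics, creative production, sales, product design, public equity investing, and investment banking.

Each plugin does more than connect a few extra apps. More importantly, it bundles the relevant apps, skills, instructions, and workflows together. According to OpenAI, the plugins together cover 62 popular apps and 110 skills.

The product thinking behind this is clear. Analysts need to ask why a metric changed, sales teams need to turn customer signals into follow-up actions, and investment banking teams need to turn research and due diligence into client-ready material. Different roles have different default context and different delivery standards. If Codex wants to enter these roles, it cannot just wait for users to teach it "how to do it" one sentence at a time. It needs to know in advance the common materials, tools, and ways of judging for that role.

The original post shows a wall of plugin ecosystem icons (partial). The icons look like a partner list, but what they really reveal is something else: OpenAI does not want Codex to stay inside the boundary of its own product. It wants to get into the toolchains companies already have.

## With Sites, AI answers start to become workspaces

The second key capability is called Sites. It is a preview feature for Business and Enterprise customers. Codex can turn ideas, analyses, and plans into shareable interactive websites or small apps, which members of the same workspace can access via URL.

This matters more than "generating a web page." Much knowledge work never lacked a written summary. What it lacks is a place that can be revisited, updated collaboratively, and used to pool judgment: customer reviews, financial scenario planning, product launch hubs, project boards, creative brief libraries. When these are scattered across documents, spreadsheets, and chat logs, teams keep asking the same questions. Where is the latest version? Who owns the next step? Which assumption has changed?

Sites aims to turn this kind of one-off output into a page for ongoing collaboration. The original post shows a revenue forecast planner as an example. It suggests that Sites is positioned beyond static articles, closer to a lightweight work interface that can carry data, state, and decisions.

OpenAI also said that early partners including Wix, Base44, Replit, Lovable, Figma, Webflow, and Emergent will take part in building the Sites ecosystem. This is significant. Once AI-generated pages can be shared, updated, and collaborated on, they start to touch traditional SaaS territory.

## Annotations make AI revisions feel more like working with a colleague

The third change is annotations. Developers already use annotations in Codex to revise code, Markdown, and websites. The capability now extends to documents, spreadsheets, slides, and other content.

A user can select the navigation bar on a website and ask Codex to change the font, highlight a claim in an investment thesis and ask Codex to check the source, or circle a chart on a slide and ask for clearer labels.

The value of this capability is that it turns "regenerate everything" into "edit locally." Producing a first draft is not hard for AI; the second and third rounds are. Human feedback is rarely abstract: "this sentence is too stiff," "this chart is hard to read," "where does this metric definition come from," "this button doesn't look like our brand." Annotations pin the feedback to a specific location, which makes it easier for the AI to change only what needs changing. From a workflow perspective, this is closer to real collaboration than one-shot generation.

## The real change: Codex is starting to understand role-level deliverables

OpenAI also gave several internal and customer examples. Non-technical teams inside OpenAI use Codex to build internal apps, executive materials, and dashboards, and to turn creative briefs into work that fits brand and design constraints. The Zapier team uses Codex to extract knowledge from tools such as Slack, Google Docs, and Coda and organize it into incident reviews, response plans, and feature tickets. NVIDIA researchers use Codex to speed up their experiment workflow, from finding research ideas to writing machine learning infrastructure scripts.

These cases share one trait: Codex is handling a chain of work that carries context, and an isolated task is only one small segment of it.

That is also why role plugins, Sites, and annotations are presented together. Plugins bring in the context of the role, Sites turns output into a collaboration space, and annotations close the feedback loop. If the question for early Codex was "can it help developers write code," the question for this update is "can it help a team get work done."

On availability, role plugins will roll out gradually to Codex users in supported regions. Admins can control the underlying app permissions in workspace settings. Sites is currently in preview in the Codex app for Business and Enterprise teams, and Enterprise admins can enable it in the admin console. OpenAI also said more role plugins will follow, including corporate finance, private equity investing, marketing strategy, strategy consulting, and legal.

This is not a small feature update. It reads more like OpenAI saying that the next stop for AI tools is a move from a smarter chat window to a work system that better understands how organizations divide labor. Code is only the first puzzle piece. The next is the work nobody wants to organize by hand but every company depends on.

Original post: Codex for every role, tool, and workflow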

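Summary: OpenAI disclosed that Codex has more than 5 million weekly users, with non-developers making up about 20% and growing more than 3 times as fast as developers. The update aims to push Codex from a developer tool toward a general-purpose work system and introduces three capabilities: 1) role plugins for roles such as data analytics and sales, covering 62 apps and 110 skills; 2) Sites, for Business and Enterprise customers, which turns plans into collaborative interactive websites; 3) annotations extended to documents, spreadsheets, and more, supporting local edits. The updates are meant to help Codex understand role context and fit into companies' existing toolchains.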

Tibo @thsottiaux · Jun 3 · 67

Tons of goodies for use of codex for day to day work. If you are on a business plan you can now host and share websites, we launched vastly improved plugins and skills for broad roles and you can give feedback to your agent through visual annotations in docs, slides, sheets and more.


🚨 AI News | TestingCatalog @testingcatalog · Jun 3 · 70

MICROSOFT 🔥: New MAI Code 1 Flash and MAI Thinking 1 models have been revealed on the official MAI website! Also, MAI Image 2.5, MAI Voice 2, and MAI Transcribe 1.5 are there too. > MAI-Code-1-Flash plans and reasons through complex coding tasks from start to finish, so you spend less time debugging and more time building. > MAI-Thinking-1 (35B active, ~1T total parameters, MoE) has a smaller inference footprint than much larger models, yet is competitive with Claude Opus 4.6 on SWE-Bench Pro. h/t @MeetPatelTech

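Summary: Microsoft updated the MAI model lineup on its official MAI website, headlined by MAI Code 1 Flash and MAI Thinking 1. MAI Thinking 1 has 35B active parameters and about 1T total parameters in an MoE architecture; it has a smaller inference footprint than much larger models yet is competitive with Claude Opus 4.6 on SWE-Bench Pro. MAI Code 1 Flash focuses on planning and reasoning through complex coding tasks from start to finish. MAI Image 2.5, MAI Voice 2, and MAI Transcribe 1.5 are also listed.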

Chubby♨️ @kimmonismus · Jun 3 · 54

GitHub copilot app revealed


Rohan Paul @rohanpaul_ai · Jun 3 · 72

Factory just introduced Factory Router, a coding-agent model selector. Claude Opus-class results while cutting AI session spend by 20-25%. Reports 99% of Claude Opus 4.7’s Terminal-Bench 2. Basically it works by treating each coding-agent run as a routing decision: it first sends the task to the cheapest model class that should be strong enough for that kind of work, then escalates to a stronger frontier model if the session starts failing or needs deeper reasoning. Frontier AI should be reserved for frontier work.
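The routing behavior described above (start with the cheapest model class expected to be strong enough, escalate on failure) can be sketched as a simple cascade. This is not Factory's implementation. The tier names, costs, and the boolean success signal are hypothetical, and a real router would also pick the starting tier per task type rather than always starting from the cheapest.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Tier:
    """HYPOTHETICAL model tier: a name, a per-session cost, and a runner."""
    name: str
    cost_per_session: float
    run: Callable[[str], bool]  # returns True if the session succeeded


def route_with_escalation(task: str, tiers: List[Tier]) -> Tuple[Optional[str], float]:
    """Try tiers from cheapest to most expensive, escalating on failure.

    Returns the name of the tier that succeeded (or None) and the total spend,
    including the cost of failed cheaper attempts.
    """
    spent = 0.0
    for tier in sorted(tiers, key=lambda t: t.cost_per_session):
        spent += tier.cost_per_session
        if tier.run(task):
            return tier.name, spent
    return None, spent


# Stub runners: the small tier fails on anything that mentions "refactor".
tiers = [
    Tier("frontier", 1.0, lambda task: True),
    Tier("small", 0.25, lambda task: "refactor" not in task),
]
print(route_with_escalation("fix typo in README", tiers))
print(route_with_escalation("refactor auth module", tiers))
```

Note the trade-off the sketch makes visible: an escalated session costs more than going straight to the frontier tier, so savings depend on how often the cheap tier succeeds.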

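Summary: Factory introduced Factory Router, a model selector for coding agents. It treats each coding-agent run as a routing decision, first using the cheapest model class that should be strong enough and escalating to a stronger frontier model only when the session starts failing or needs deeper reasoning. The aim is to stay close to Claude Opus 4.7 performance (a reported 99% of its Terminal-Bench 2 score) while cutting AI session spend by 20-25%. The core idea is that "frontier AI should be reserved for frontier work."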

jason@jxnlco · 6月3日41

10 takeaways from OpenAI’s new report on knowledge work and Codex. Codex isn't just about coding anymore, but all knowledge work!

译OpenAI关于知识工作与Codex新报告的10个要点。 Codex不再仅限于编码，而是面向所有知识工作！

jason@jxnlco · 6月3日66

You can now observe Codex with Logfire and also query Logfire in Codex with their new plugins! https://pydantic.dev/articles/codex-logfire-plugins

译你现在可以通过 Logfire 观察 Codex，也可以在 Codex 中通过他们的新插件查询 Logfire！

Rohan Paul@rohanpaul_ai · 6月3日62

Kombai 2.0 just announced an AI design engineer, so product taste, UI design, and real code become one shared workflow. Most AI coding tools can produce working code, but they often miss spacing, hierarchy, motion, visual polish, and the tiny interface choices. With AI, the time has come for the design-to-engineering handoff to vanish, because one AI design engineer understands both the interface and the code.

译Kombai 2.0 被定位为首个AI设计工程师，旨在将产品品味、UI设计和真实代码整合到一个共享工作流中。它指出，现有AI编码工具常忽略视觉细节和交互质感，而设计工具不理解代码库，根源在于设计与工程的传统割裂。Kombai 2.0 致力于弥合这一鸿沟，让设计师能交付代码，工程师无需交接，共同构建富有品味的用户体验。

AYi@AYi_AInotes · 6月3日73

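Damn, an indie developer used 23.5 hours plus Codex to tear down the Whoop 5.0 subscription wall. No crack, no jailbreak, just an open-source app: connect it to your Whoop and read heart rate, blood oxygen, and recovery data directly, with zero subscription fee. This is probably the story subscription hardware least wants to see this year. The app is called Goose and is fully open source on GitHub. The author, Bennett, posted a timeline: going from zero to connecting to Whoop 5.0 and reading HR, SpO2, skin temperature, and recovery score took 23.5 hours in total, and a large share of the code was written by Codex. This was possible not because Whoop is weak but because its BLE broadcast protocol was never locked down. Judes Club had already published a complete Whoop BLE analysis; Goose essentially builds a bridge in Rust on top of that public protocol groundwork, puts a SwiftUI skin on it, and keeps data that used to have to go through Whoop's servers on the local device. Many people think Whoop's moat is hardware accuracy. It is not. Whoop's real moat is that once you have worn it for half a year, your history, recovery curves, and sleep trends are all locked on its servers, and you can't be bothered to leave. What Goose tore open is not a technical gap but the thinnest layer protecting subscription hardware: user inertia. It is like the difference between staying in a luxury hotel and buying a house. Whoop charges you an annual fee and hands you a room key; the room is indeed kept clean, but all your luggage, photos, and habits have to stay in the room the day you check out. Goose did not build another hotel; it tells you the house always had a back door, and you can let yourself in with your own key and take everything with you without answering to the front desk. For the past two years, the AI coding story has been about who writes faster. But the real signal in Bennett's project is that AI has pushed the cost of one person challenging a closed hardware ecosystem down to under a day. Reverse-engineering hardware used to take a team, months, and firmware extraction; now one developer plus Codex can make a subscription wall look like a joke in 23.5 hours. Of course, I am not a lawyer; this is only my technical observation as a developer. Hardware companies' moats will not disappear tomorrow, but the definition is already changing. Yesterday's moat was sensor accuracy and a closed app; tomorrow's may be whether you are willing to hand over sovereignty over your data. When one person's weekend can take down a wall, the endgame for subscription hardware may not be losing to another hardware company but being dismantled one by one, one person and one day at a time, by developers who don't want to pay a monthly fee. So Whoop's real rival is not Apple Watch; it is every developer with a free weekend, Codex at hand, and a feeling that a $30 monthly fee is a bit steep. I put the GitHub repo in the first reply; help yourself.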

译独立开发者Bennett利用Codex AI编程工具，在23.5小时内开发出开源App Goose。该应用可直接通过蓝牙读取Whoop 5.0的健康数据，无需订阅。实现基于公开的BLE协议分析，使用Rust和SwiftUI将数据本地化存储。此举暴露了依赖用户数据锁定和惯性构成的订阅制硬件护城河的脆弱性，并展示了AI工具如何降低个人挑战封闭生态的成本。

🚨 AI News | TestingCatalog@testingcatalog · 6月3日68

OPENAI 🔥: New Sites, role-specific Plugins, and Annotations features are rolling out in preview for Business and Enterprise plans. > Today, we’re introducing new ways to do more of your work with Codex: plugins that adapt Codex to your role and tools, annotations that help you refine the result in place, and a preview of the ability to create interactive websites and apps you can share with your workspace using a URL.

译OPENAI 🔥：新站点、角色专属插件和注释功能正面向商业和企业计划用户推出预览版。 > 今日，我们推出使用 Codex 的新方式：可适配您角色和工具的插件、帮助您就地优化结果的注释，以及通过 URL 创建可与工作区共享的交互式网站和应用的预览功能。

Berryxia.AI@berryxia · 6月3日50

Lots of people dug this video up and reshared it again today. Why is that? 🤔 A video published half a month ago is starting to get traction…

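译A 40-minute video by Moonshot AI founder Yang Zhilin has been widely reshared recently. In it he breaks down the Kimi K2 training process in detail; the core breakthrough is completing training at the very low cost of only $4.6 million. In a recent live coding contest among 8 models, Kimi K2 took first place. Throughout the talk, Yang Zhilin stresses the importance of extreme optimization and architecture design.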

Chubby♨️@kimmonismus · 6月3日62

New update: Codex Sites turns your ideas, plans, and work into interactive websites or apps your team can use and share, rolling out first to Business and Enterprise users. https://x.com/OpenAI/status/2061845949170045346/video/1

译新更新：Codex Sites 将您的想法、计划和工作转化为团队可以使用和分享的交互式网站或应用，首先向 Business 和 Enterprise 用户推出。

Rohan Paul@rohanpaul_ai · 6月3日81

OpenAI just gave Codex a major upgrade, turning it from a coding assistant into a workspace builder that can create interactive sites, apps, dashboards, planners, and review tools from plain work instructions. The most important new feature they released is "Sites", i.e., Codex can generate a hosted interactive workspace instead of only producing a document, spreadsheet, slide, or code file. OpenAI is also adding plugins for different jobs, so Codex knows how to help analysts, marketers, sales teams, product designers, investors, and bankers using the tools they already use. A data analyst might ask Codex to explain why sales dropped, then Codex could pull from data tools and create a dashboard. A sales team might ask Codex to prepare for a customer meeting, then Codex could collect account history, risks, follow-ups, and next steps into one shared page. The third feature is annotations, which means you can click a specific part of the result and ask Codex to fix only that part. Codex already reaches 5M weekly users, and OpenAI says 20% of them are now non-developers, with that group growing over 3x faster than developers.

译OpenAI 为 Codex 带来重大升级，将其从编码助手转变为可构建交互式工作空间的“空间构建器”。核心新功能“Sites”能生成托管的交互式工作区，而不仅是文档或代码文件。同时新增插件以适配不同职业，并推出“标注”功能允许用户对结果的特定部分进行修复。Codex 目前拥有500万周活跃用户，其中20%为非开发者，该群体增长速度是开发者的3倍以上。“Sites”功能正面向 Business 和 Enterprise 计划推出。

向阳乔木@vista8 · 6月3日75

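This is pretty impressive: Codex now has a Python SDK. Install command: pip install openai-codex. Integrate it into your own code, and you effectively have a top-tier coding and image-generation agent built in? Most importantly, it can reuse your Codex login state.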

Replit ⠕@Replit · 6月3日63

Replit Canvas has a few new updates! ⭐️ Learn more at: http://replit.com/canvas Open thread 🧵 ↓

译Replit Canvas 有一些新更新！⭐️ 了解更多请访问：http://replit.com/canvas 展开讨论 🧵 ↓

Replit ⠕@Replit · 6月3日70

Using Parallel Agents to Move Faster in Replit https://x.com/i/broadcasts/1NxarrEMVOnKj

译在 Replit 中使用并行智能体来提升速度 https://x.com/i/broadcasts/1NxarrEMVOnKj

Chubby♨️@kimmonismus · 6月3日47

Incredible. The best design work doesn’t happen in a chat box. The fact that you can generate motion assets inside the canvas skips your image-gen hops. Super cool that changes here get synced back to your codebase.

译Kombai 2.0 被定位为首个AI设计工程师，旨在融合设计与工程。该工具允许用户在画布内直接生成动画素材，跳过了传统图像生成的中间环节，并能将设计变更同步回代码库。其目标是打破设计和工程分属不同工作流的旧模式，服务于一个设计师能交付代码、工程师寻求无缝集成、所有人都想构建优秀用户体验的新世界。

Rohan Paul@rohanpaul_ai · 6月3日65

Another brilliant launch removing friction from front-end development. Kombai just launched a frontend-specific AI coding agent and it beats general coding agents on real repo tasks. The problem with generic agents is that they often fail frontend work because UI code mixes visual judgment, component reuse, CSS behavior, browser bugs, accessibility, etc. Kombai is attacking that problem with specialization: it reads design context, browser state, existing components, hooks, design tokens, and DevTools data so the agent can edit the product the way a frontend engineer would. Check out their demo, where it adds a complex feature to an OSS codebase with 500K+ lines of code. They also open-sourced a dataset that anyone can use to benchmark agents on complex front-end tasks.

译Kombai 推出了首个专用于前端开发的AI编程智能体。针对通用智能体在处理前端任务时的不足，Kombai 通过读取设计上下文、浏览器状态、组件等数据，像前端工程师一样进行代码编辑。推文称，Kombai 在真实代码库任务上的表现超越了 SOTA 模型和通用编程助手，并在一个超过 50 万行的开源代码库中演示了添加复杂功能。此外，Kombai 还开源了一个可用于评测复杂前端任务的基准数据集。

Chubby♨️@kimmonismus · 6月2日56

OpenAI is on a winning streak: Codex passed 4M weekly users, 5x since February. Knowledge workers are now a fifth of them, growing 3x faster than developers. The tool OpenAI built for coders is being adopted fastest by people who don't code. All figures from OpenAI's own report, shared first with Axios.

译OpenAI 势头正盛： Codex 周活用户突破400万，自2月以来增长5倍。知识工作者现占用户总数的五分之一，其增速是开发者的3倍。这款 OpenAI 为程序员打造的工具，正被非编程人群最快地采用。所有数据均来自 OpenAI 自己的报告，该报告首先分享给了 Axios。

StepFun@StepFun_ai · 6月2日73

Open weights are moving from model cards into real coding workflows. Step 3.7 Flash is designed for fast agentic coding, reliable tool calling, and multimodal understanding. Big thanks for the blog from the @kilocode team: https://blog.kilo.ai/p/new-models-from-stepfun-and-minimax

译阶跃星辰发布 Step 3.7 Flash 模型，强调其为快速智能体编程设计，具备可靠的工具调用与多模态理解能力。该模型采用开放权重。同期，MiniMax 也开源了 M3 模型。两者已均在 Kilo 中上线。此次发布凸显了开放权重模型正从模型卡片走向实际编程工作流的趋势。

ginobefun@hongming731 · 6月2日55

From Markdown draft to video: Cursor + Remotion + FFmpeg

MiniMax (official)@MiniMax_AI · 6月2日72

Watch M3 reach the frontier 🚀

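译MiniMax has released the M3 model, claiming it is the first open-weight model to combine three frontier capabilities: coding and agentic ability, a 1M context length, and native multimodality. Its coding and agentic results stand out on several benchmarks: SWE-Bench Pro 59.0%, Terminal Bench 2.1 66.0%, SWE-fficiency 34.8%, KernelBench Hard 28.8%, and MCP Atlas 74.2%. The model supports 1M context through MiniMax Sparse Attention. MiniMax offers API access and a new MiniMax Code service; the model weights and technical report are expected in about 10 days.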

Ethan Mollick@emollick · 6月2日70

Big paper on AI coding agents using GitHub & other data. The auto-complete tools (Copilot) led to 2.2x more code, local agents like the original Claude Code led to 7.4x, & current remote coding agents 17.3x(!) But human bottlenecks in coding mean actual releases "only" went up 30%.

译关于使用Github及其他数据的AI编程智能体的重要论文自动补全工具（如Copilot）使代码量增加2.2倍，本地智能体（如初版Claude Code）增加7.4倍，而当前远程编程智能体增加17.3倍（！）但编程中的人类瓶颈意味着实际发布量“仅”增加了30%

Tibo@thsottiaux · 6月2日27

You can just codex ... a farm https://chatgptpro.substack.com/p/hiroki-tomiyasu

译你只需用 Codex 就能……生成一个农场 https://chatgptpro.substack.com/p/hiroki-tomiyasu

MiniMax (official)@MiniMax_AI · 6月2日78

Watch open source reach the frontier. 🚀

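译MiniMax announced M3, described as the first open-weight model to combine three frontier capabilities: in coding and agentic work, it reports scores on benchmarks such as SWE-Bench Pro; through MiniMax Sparse Attention, its context window extends to 1M tokens; and the model supports multimodality natively from scratch. The weights and technical report will be released in about 10 days.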

小互@xiaohu · 6月2日28

Codex is getting a big update tonight…

Chubby♨️@kimmonismus · 6月2日53

OpenAI is releasing a major Codex update tomorrow. Months in development, something quite special. It certainly sounds different from GPT-5.6.

译OpenAI 将于明天发布一次重要的 Codex 更新。经过数月开发，这将是一个相当特别的功能。它听起来与 GPT-5.6 确实不同。

meng shao@shao__meng · 6月2日54

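My $10K in Cursor credits has expired, and I miss it 😄 I used Cursor freely in May and spent about $2K. A rough summary of the experience: · 100% of my time was in Agent Windows; I never opened the traditional IDE interface · I rarely switched models: mostly GPT-5.5 when I had a 🪜, mostly Composer 2.5 when I did not. Composer 2.5's Fast mode really is fast, and it loves to output Diagram flowcharts · The Context usage breakdown is genuinely handy · Cursor's Agent output view is not Markdown by default and cannot be copied as Markdown, which is a bit inconvenient; I basically always tell it to write to a Markdown file · The extension panes on the right of the Agent interface work well; Terminal, Browser, File, and Canvas each have their use. And finally, one small surprise: besides the $10K in credits, May apparently also came with two months of Ultra subscription?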

译用户邵猛在获得 Cursor 团队赠送的 $10K 额度及两个月 Ultra 订阅后，实际使用了约 $2 千美元。其使用体验显示，100% 的时间都在使用 Agent Windows 模式，很少进行多模型切换。模型选择上，有外网访问时偏好 GPT-5.5，否则常用 Composer 2.5，因其 Fast 模式速度快且喜欢输出 Diagram 流程图。优点在于 Context 使用明细清晰，但 Agent 输出界面默认非 Markdown 且不支持拷贝为 Markdown。此外，Agent 界面右侧的扩展窗口（Terminal、Browser、File、Canvas）被认为比较实用。