The capabilities of Claude Code and Codex have expanded a lot in recent months, they added many ways to approach work (subagents, skills, goal, workflows, plugins, etc). Given the AI labs can use their own AI to help documentation, a surprising amount is effectively undocumented

译近几个月来，Claude Code和Codex的能力大幅扩展，增加了许多工作方式（子智能体、技能、目标、工作流、插件等）。考虑到AI实验室可以用自己的AI来辅助文档编写，令人惊讶的是，大量功能实际上没有文档。

karminski-牙医@karminski3 · 6月4日64

给大家带来 MiniMax-M3 实测! 本次测试包含了复杂前端, 后端 Agentic Coding, Agent 能力测试, 以及我的使用经验总结. 来看结论: 前端能力上, 可以完全适配 KCORES2026p2 的前端测试题目, 无论是空间理解, 建模精确度, 场景美学都十分在线, 其中我最满意的是美学部分, 它的颜色运用非常好. 不足的地方主要体现在复杂需求不能一次性写对(比如光追引擎), 需要迭代一下就可以了. 后端能力测试这次也是突飞猛进, 得分超过了 deepseek-v4-pro 和其他一众国产大模型, 略逊于 GPT-5.4-Pro(xhigh). Agent 能力上表现同样亮眼, 达成了榜单第二的接单量, 证明它的规划能力特别强。下面是我在测试和实际使用中, 总结出来的 M3 使用经验, 供大家参考: 我的体感是 M3 特别喜欢推理, 它可以单次执行超长的推理. 在咱们的这些前端测试中, 它最长的输出甚至达到了我规定的 64k token上限, 所以, 不要上来就写一个超级复杂的 prompt 让它执行, 而是需要先把需求形成 plan, 然后让 agent 蜂群去执行, 这样才能得到理想的效果, 所以 M3 先天适合放在带 plan 模式的 Coding Agent 中使用. 如果把它嵌入到 Agent 框架中使用, 那么 prompt 编排就一定要做好, 不要一股脑把大量的 tool call 或者超大的 system prompt 丢给它. 还是需要下功夫好好编排一下的. 本次 M3 相比之前的 2.7 版本有了大幅度的提升, 模型偏好上来看, M3 是一个规划能力极强的模型, 所以特别适合用在一些规划性质的 Agent 框架中, 比如任务拆分, 日程管理, 流程设计等. 而本次暴露出来的不足则是执行过程中约束不够强, 比如 prompt 中设置的复杂规则, 一定要增加代码级别的 harness 闭环流程来进行约束, 而不能只靠模型本身来管理自己的行为. #minimaxm3 #minimax #agenticcoding #aiagent #harness

译MiniMax-M3 实测：前端适配 KCORES2026p2，空间理解、建模精度、美学表现优秀，颜色运用佳；复杂需求如光追引擎需迭代。后端得分超 deepseek-v4-pro 及国产模型，略逊 GPT-5.4-Pro (xhigh)。Agent 能力达榜单第二接单量，规划突出。使用经验：M3 偏好长推理，单次输出可达 64k token，适合嵌入带 plan 模式的 Coding Agent，需做好 prompt 编排，避免大量 tool call；执行约束不足，需增加代码级 harness 闭环。

AYi@AYi_AInotes · 6月4日64

关于Codex的使用，分享下我的一些思考，如果从前几天我分享的使用AI的底层心法是以道御术的角度看，省额度是术，看清生产力归谁管是道。如果你也在用 Codex，并且习惯把额度省着点用—— 我劝你看完这条再决定要不要继续省，昨天那次 reset，可能正好把你攒的 buffer 覆盖掉了。 OpenAI Codex 负责人 Tibo（@thsottiaux）发帖，说过去 24 小时出了 3 次小可靠性事故，给所有付费计划统一重置了额度，配了一句 May the tokens flow again。评论区一片感谢，刷屏 Saint Tibo、he did it again，我翻了一圈，想说一句可能没人爱听的话，你这几天精打细算省下来的额度，大概率是白省了。先讲讲 Codex 这套额度怎么算的，没按 token，而是按推理时间（reasoning time）算的。一个 5 小时窗口，本地和云任务共用。据社区实测，Plus 计划下 GPT-5.4 大概跑 40 分钟推理就把这 5 小时额度烧到 100%，GPT-5.3 大概 60 分钟。也就是说你开个 /goal 让它自己 plan→act→test→iterate 连轴转，额度掉得比你想象快得多，你只看到一个百分比，看不见它每分钟在烧多少。现在把 reset 叠进来看，据社区讨论，这个 reset 很多时候不是凭空补额度，是把你下一个计费周期的起点往前提了。所以那些 reset 后立刻开跑的人，有人一口气跑了 11 小时＋推理；而你为了周末大项目辛辛苦苦攒的 buffer，一次 reset 直接被覆盖归零。省的人亏，冲的人赚。 4、5 月到这次，Tibo 已经 reset 好几轮了，这不是偶发，属于常态了。所以在现在这套规则下，精打细算反而是次优解。不是让你瞎浪费，是这系统在奖励立刻消耗的人，你得顺着它的规则走。但真正让我在意的，不是怎么省额度，是这件事意味着什么。把 Codex、把额度、把 reset 这几个词去掉，这是所有用云端 AI 干活的人的同一个故事，你的生产力，不在你手里，在一套你看不透、也补偿得不透明的系统手里。今天 Tibo 心情好给你 reset，明天他换岗了呢。靠一个好心负责人的 goodwill 续命的信任，他在的时候特别暖，他一走，账要一次性集中还。所以这事真正的解法，不是蹲着等下一次 reset，是别把生产力全押在一个你控制不了的池子里。本地模型兜底 + 云端冲峰值，自己记一份 burn rate（40 分钟≈100%，倒推 4 分钟≈10%），把节律攥回自己手里。我觉得AI 工具的下一道分水岭，已经不在模型多强了，关键看是我们的生产力到底归谁管。

译OpenAI Codex 负责人 Tibo 因 24 小时内 3 次可靠性事故，重置所有付费计划额度。Codex 按推理时间计费：Plus 下 GPT-5.4 约 40 分钟耗尽 5 小时窗口，GPT-5.3 约 60 分钟。重置常将下个计费周期提前，导致精打细算攒的额度被覆盖，立即消耗者反而获得更多推理时间。作者认为系统奖励即刻消耗，建议本地模型兜底、云端冲峰值，夺回生产力控制权。

meng shao@shao__meng · 6月4日59

ChatGPT App (Android) Add Codex shortcut.

译ChatGPT App (Android) 新增 Codex 快捷方式。

meng shao@shao__meng · 6月4日17

人肉总结: 选计算机专业！

译一条推文围绕“是否该选计算机专业”展开：引用称AI导致程序员大量失业，劝别选；主推文则坚持选。引用虚构一家startup全员AI native，CTO带中文系硕士用AI半个月写操作系统并自建微信、飞书、Office，卖数千万，导致腾讯、字节VP上门求停，飞书总裁承认6000员工五年成果被AI三个月颠覆。

宝玉@dotey · 6月4日57

最近 Codex GPT-5.5 给我的感觉是干活不如 Claude Opus 4.8，当然可能是因为我在开发 Mac 应用，Opus 更擅长一些

译宝玉 (@dotey) 表示，Codex GPT-5.5 在干活上不如 Claude Opus 4.8，尤其在开发 Mac 应用时 Opus 更擅长。@jesselaunz 也反馈 Codex 突然“降智”，原本预期 2 天的目标仅 20 分钟就交付，用户给出了评分以来最低的 5/10 分。

meng shao@shao__meng · 6月4日50

2026 年 6 月 18 日起 Gemini CLI 和 Gemini Code Assist 的部分免费/消费级接入将停止服务，但企业版和付费 API key 接入并不受这次变化影响。 Gemini CLI 我都还没用过 🤦🏻‍♀️

译2026年6月18日起，Gemini CLI和Gemini Code Assist的部分免费/消费级接入将停止服务，但企业版和付费API key接入不受影响。推文作者表示尚未使用过Gemini CLI。

meng shao@shao__meng · 6月4日65

Cursor Debug Mode 有什么用？核心问题：AI Agent 修复 Bug 的局限性 @ericzakariasson 指出，传统的 AI Agent 在处理 Bug 时通常依赖静态推理： · 阅读代码 → 形成理论假设 → 直接修改代码 → 期望修复成功 · 这种方式经常产生“看起来自信但实际掩盖了真正 Bug”的假修复这反映了当前 AI 编码工具的普遍痛点：缺少运行时真实证据，只能靠模型的先验知识和代码上下文“猜”。 Debug Mode 的解决方案与工作原理 Debug Mode 的核心理念是：让 Agent 通过运行时日志获取证据，而不是纯猜测。具体循环流程： · Agent 对 Bug 提出多个假设，并优先处理最合理的那个。不直接修改实现代码，而是先添加临时日志来验证假设。 · 通过一个轻量级的调试服务器，将程序运行时的输出收集到 .cursor/debug.log 文件中。 · 用户手动复现 Bug，Agent 随后读取日志，基于真实运行数据理解问题根源。 · Agent 定位根因后，进行真正修复，并自动移除之前添加的临时日志。这个过程将“猜测”转变为“基于证据的诊断”，显著提升了修复的可靠性和透明度。帖中附带了一个真实 Bug 的演示视频，直观展示了整个流程：Agent 添加日志 → 用户复现 → 读取日志 → 精准修复。实际应用案例（Cursor 团队内部使用） · 概率性 Race Condition（1/20 概率出现，破坏 Git 元数据）：传统方式极难复现，Debug Mode 在不到一小时内定位。 · 内存泄漏：通过日志一次追踪到前端框架误用，修复仅需一行代码。 · C++ 原生崩溃（Electron 崩溃）：原本大家倾向于绕过，日志让问题变得可定位。 · SSR 闪烁/渲染 Bug：长期被放弃的顽疾，通过运行时页面行为观察得以修复。这些案例覆盖了并发、内存、本地崩溃、UI 渲染等不同类型问题，显示 Debug Mode 对难以通过静态分析诊断的 Bug 特别有效。

译Cursor 推出 Debug Mode，解决传统 AI Agent 依赖静态推理易产生“假修复”的问题。其核心是通过添加临时日志、让用户复现 Bug，收集运行时证据进行诊断，再自动清除日志。Cursor 团队内部案例显示，该模式能高效定位概率性竞态条件、内存泄漏、C++ 原生崩溃及 SSR 渲染等难以静态分析的 Bug，将“猜测”转为“基于证据的诊断”。

jason@jxnlco · 6月4日52

He actually pressed the reset rate button three times in quick succession.

译过去24小时内，Codex 发生了三次独立的小事故，影响其可靠性。团队已重置所有付费计划的使用限制，希望 token 再次顺畅流动。对此，Jason Liu 评论说，他实际上连续按了三次重置速率按钮。

Tibo@thsottiaux · 6月4日59

Hi. Over the last 24 hours we had three separate small incidents that affected Codex reliability. Those are three too many and we are taking active steps for them to not reproduce. I have reset usage limits for Codex across all paid plans. May the tokens flow again.

译嗨。过去24小时内，我们发生了三起独立的小事故，影响了Codex的可靠性。这三次太多，我们正在采取积极措施以防再次发生。我已重置所有付费计划的Codex使用限制。愿token再次流动。

Berryxia.AI@berryxia · 6月4日37

卧槽！这下Codex真的要起飞了……

宝玉@dotey · 6月4日26

请教：Claude Code （Desktop）总是弹窗要确认权限，有没有办法避免总是要 Allow，很烦人，已经启用了 Bypass Permissions

Greg Brockman@gdb · 6月4日25

fly with codex

译是时候起飞了。与 Codex 一起飞翔。

OpenCode@opencode · 6月4日59

Qwen3.7 Plus now available in Go text · image · 1M context cheaper than 3.6

译Qwen3.7 Plus 现已在 Go 中可用，支持文本和图像，1M 上下文，比 3.6 更便宜。

StepFun@StepFun_ai · 6月4日44

Great demo by @atomic_chat_hq. Step 3.7 Flash was designed for real-world agentic coding tasks — not just generating code fast, but keeping logic, visuals, and execution coherent across complex outputs. Love seeing builders test it in creative ways!

译阶跃星辰（StepFun）称其 Step 3.7 Flash 在与 DeepSeek V4-Flash 的物理编程测试中全面胜出。测试要求在不使用库的情况下，生成一个包含高尔顿板、旋转六边形弹球和同步节拍器三个场景的自包含 HTML5 canvas 动画，并实现真实物理。Step 3.7 Flash 输出 59.6k tokens（耗时 9分57秒），DeepSeek V4-Flash 输出 52.5k tokens（耗时 6分21秒）。尽管 DeepSeek 更快，但 StepFun 模型在物理模拟、视觉效果和逻辑渲染上均占优。主推文指出 Step 3.7 Flash 专为真实世界 agentic 编码任务设计，能保持复杂输出中逻辑、视觉和执行的一致性。

eric zakariasson@ericzakariasson · 6月4日74

http://x.com/i/article/2061967596568875008 # Don't let your agent guess, give it runtime context If you've ever watched an agent try to fix a bug, you've watched it guess. It reads the code, comes up with a theory, makes an edit, and hopes. Sometimes it's right. A lot of the time you get a fix that looks confident and quietly hides the real bug. Debug Mode is what we built for that. Instead of sitting there reasoning about the code, the agent goes and gets evidence about what the code does when it runs. Here's the loop 1. Agent comes up with multiple hypotheses, and starts to work on the most plausible first 1. Then, logging is added to test one hypothesis (without touching implementation) 1. A little debug server collects the runtime output to .cursor/debug.log while your program runs. 1. You reproduce the bug, and agent can now read the logs and understand what happened instead of having to guess 1. Cursor finds the root cause in the logs, makes the fix, and pulls out the logging it added. Here it is on a real bug, sped up to about a minute: ## How the team uses it Some interesting things that we've solved internally with debug mode: - A race condition that hit 1 in 20 runs. It was corrupting git metadata in our best-of-N runs. Debug Mode pinned it down in under an hour - A memory leak, traced in one pass. It came down to a misuse of our frontend framework. The fix was a single line. - A native crash deep in C++. An Electron crash people would normally route around. The logs made it findable. - An SSR flicker that had been given up on. A rendering bug nobody wanted to touch, fixed once the agent could see what the page was doing at runtime. Try it with Shift+Tab (it's in the CLI too, via /debug). I'm sure people are using it in ways I haven't thought of, so let me know!

译Cursor 发布 Debug Mode，解决 AI 智能体靠猜测修 Bug 的问题。工作流程：Agent 先生成多个假设，为最可能的假设添加日志（不修改代码）；调试服务器在程序运行时收集输出到 `.cursor/debug.log`；用户重现 Bug 后，Agent 读取日志而非猜测；最后 Cursor 从日志找到根因并修复，自动移除添加的日志。内部案例：追踪 1/20 概率出现的 git 元数据竞争条件（1 小时内定位）；一次单趟追踪内存泄漏（修复仅一行）；定位 Electron 中 C++ 原生崩溃；修复此前无人敢碰的 SSR 闪烁问题。用户可通过 Shift+Tab 或在 CLI 中使用 `/debug` 触发。

向阳乔木@vista8 · 6月3日63

GPT 5.5 Pro 调研生成了一份关于 Codex 的Goal指令如何用的文档。仔细阅读学到了两个技巧： 1. 觉得写不好goal时，先用plan模式，让AI反问自己一些问题，让AI帮收敛写Goal指令。提示词模板： /plan Help me turn this vague task into a strong Codex goal. Interview me for missing success criteria, verification commands, constraints, boundaries, iteration policy, and blocked stop conditions. Then draft a final `/goal ...` command. 2. 写好Goal的六要素：结果、验证、约束、边界、迭代和阻塞条件官方标准模板如下： /goal [Outcome]. Verification: [commands/artifacts/evidence]. Constraints: [what must not change]. Boundaries: [allowed writes / forbidden paths]. Iteration policy: [one focused change, rerun checks, log progress]. Stop when: [evidence proves completion]. Pause if: [blocked conditions / human decisions / budget cap]. 详细调研报告见评论区，有不少模板可直接用。

译GPT 5.5 Pro 调研生成了一份 Codex 的 Goal 指令使用文档，分享两个技巧：1. 写不好 Goal 时先用 /plan 模式，让 AI 反问用户来完善命令，提示词模板为 `/plan Help me turn this vague task into a strong Codex goal...`；2. 写好 Goal 的六要素：结果、验证、约束、边界、迭代策略、阻塞条件。官方标准模板为 `/goal [Outcome]. Verification: [...] Constraints: [...] Boundaries: [...] Iteration policy: [...] Stop when: [...] Pause if: [...]`。详细报告含多个可直接使用的模板。

OpenRouter@OpenRouter · 6月3日58

The Pareto Router is now processing almost 1B tokens per day: https://openrouter.ai/openrouter/pareto-code The Auto Router is processing 12B: https://openrouter.ai/openrouter/auto See the @theinformation's article below 👇

译OpenRouter 宣布其智能路由系统处理量大幅增长：Pareto Router（编程专用）每日处理近 10 亿 tokens，Auto Router 每日处理 120 亿 tokens。Pareto Router 让用户设定智能等级和成本上限，系统自动选择最佳模型执行编程任务，以节省 AI 编程成本。此外，工作区功能允许设置最大使用量，进一步控制支出。

Lee Robinson@leerob · 6月3日61

"Engineering, product, and design are all merging into a 'builder' role" Yeah... I'm not so sure. This feels like an oversimplification and podcast talking point. Reality is a lot more complex. Even with 1000 "Member of Technical Staff" titles, someone still has to wake up and care 100x more about Product or Design than anyone else. It is their Main Thing™ That's not to say MTS titles are universally bad, but I think they're an example of this 'builder' talking point that's become bastardized. AI and coding agents have made generating code easy and yet... you're in for a world of pain if non-engineers ship a bunch of slop and don't have great engineers to tame the complexity. The SF hivemind has a tendency to overfit what works at startups for every company. And to be fair, sometimes this is true! Startups can be a leading indicator for how the industry is changing and often cause disruption. However, it is going to be incredibly hard to disrupt the extremely human parts of corporate jobs. You really think there's going to be a PM who also does some engineering and design on the side at JPMorgan Chase? This is true for the simple parts of most jobs, like people wanting to have ownership over something and do good work, move up a career ladder, support their family, get paid well, make an honest living... And also the hard parts: internal politics, some critical business system that has a bus factor of 1 which has been running for 15 years and isn't documented anywhere because it's that guy's job security. The real world has a lot of this stuff. It's easy to pontificate about all roles collapsing but it's actually really nice to have a specific person or team who is an expert in one thing that you can work with. I don't expect that to change. Further, I think AI disruption to knowledge work will take decades to play out because it is more fundamental to the human condition (e.g. sociological/organizational) than pure intelligence.

译Lee Robinson 认为该说法是过度简化的播客话术。现实更复杂：即便大量“技术专家”存在，仍需要有人百分百专注产品或设计；AI 虽让生成代码变易，但缺乏优秀工程师会导致灾难。硅谷常把创业公司经验套用于大公司，却难以颠覆内部政治、遗留系统等极度人性化的部分。他判断 AI 颠覆知识工作需要数十年，因为本质是社会/组织问题，而非纯智力问题。

向阳乔木@vista8 · 6月3日63

Codex 小技巧：一台电脑远程指挥另一台写代码如果你多台电脑都安装了 Codex，且登录ChatGPT账号。可以在设置 -> 连接 -> 控制其他设备，添加其他电脑。这样设置后，本机创建项目时，能选添加远程项目。比如远程控制家里电脑中的Codex工作。

译若多台电脑均安装 Codex 并登录同一 ChatGPT 账号，可在设置 -> 连接 -> 控制其他设备中添加其他电脑。之后本机创建项目时即可选择添加远程项目，例如远程控制家中电脑上的 Codex 进行代码编写。该功能无需额外配置，利用账号同步实现跨设备协作。

小互@xiaohu · 6月3日16

Codex 成瘾患者正在接受治疗... 😅

Alibaba Cloud@alibaba_cloud · 6月3日44

What if you could code faster, spend less, and ship predictably without compromising your stack? 🚀 Whether you're shipping solo or scaling as a team, Agentic Coding helps you to: ⚡ Accelerate development cycles with AI that handles the heavy lifting 💰 Lock in predictable costs — fixed monthly quotas, zero surprise bills 🔌 Integrate instantly — connect your favorite AI tools with zero friction, zero downtime 🎯 Ship faster, smarter — focus on innovation while Qwen handles the complexity 👉 See the Agentic Coding stack in action and save up to 70%: https://int.alibabacloud.com/m/1000413949/ #AlibabaCloud #Qwen #AI #Coding #Programming

译阿里云推出基于 Qwen 的 Agentic Coding，帮助开发者加速开发周期、锁定可预测成本（固定月配额，零意外账单），并能无缝集成主流 AI 工具。官方称使用该方案可节省高达 70% 的成本，同时保持技术栈不变。

Greg Brockman@gdb · 6月3日73

codex for computer work is growing very fast

译Codex 的计算机工作应用增长非常快。

jason@jxnlco · 6月3日43

Love cloudflare

译喜欢 Cloudflare。

AYi@AYi_AInotes · 6月3日68

哇偶，Claude 官方这个 ant CLI 有点意思啊，把 Claude Platform 全套 API 塞进终端，每个端点都能通过命令行直接跑。 ant 是 Claude Platform 的原生命令行工具，Messages API、hosted agents，结果直接 pipe 进 shell，不用翻文档拼 curl。 Ant能解决什么问题？以前调 Claude API 要：翻文档 → 拼 HTTP → 处理 JSON → 写脚本封装，现在：终端里直接调，输出直接进你的 pipeline，agent 也能从命令行启动。怎么用Ant？ ant CLI 被设计成 coding agent 友好型，Claude Code 用 claude-api skill 就能读懂它，你的 agent 不光能写代码，还能直接调用 Claude 官方 API 干活。一些实用场景： 1. 批量处理本地文件，直接 pipe 给 Claude 分析 2. shell 脚本里自动化调用，省掉 Python 胶水代码 3. CI/CD 流水线里集成 Claude 能力 4. Claude Code 里让 agent 自己调 API，闭环更深说白了，Claude 正在从网页聊天工具往终端基础设施切。对于写代码的人，终端就是主场，那么它这次直接切进了你的主场。视频 30 秒，建议先扫一眼 👇

译Claude 推出了名为 ant 的 CLI 原生工具，它将 Claude Platform 的 Messages API、托管 Agent 等全部 API 端点集成到了命令行中。用户现在可以直接在终端调用这些功能，并将结果通过管道（pipe）输出到 shell，省去了以往翻阅文档、拼接请求和处理 JSON 的步骤。该工具对 coding agent 友好，Claude Code 能通过 claude-api skill 理解并使用 ant，从而更直接地调用官方 API。这标志着 Claude 正从网页工具延伸向终端基础设施。

宝玉@dotey · 6月3日60

Codex 这个小功能我很喜欢，直接一键 commit changes，自动生成 commit message

Berryxia.AI@berryxia · 6月3日29

Codex 刚刚遇到多次 exceed retry limit 429的错误，好像挺多人遇到，这又是闹哪出？

Ethan Mollick@emollick · 6月3日54

Had Claude Code build a snake game where the snake becomes aware it is in the game and then... stuff happens. Some impressive creative decisions by the AI (& also some very AI ones), I just gave a first prompt and some feedback on the game as it went. https://snake-awakening.netlify.app/

译让 Claude Code 构建了一个贪吃蛇游戏，其中蛇意识到自己身处游戏之中，然后……事情发生了。AI 做出了一些令人印象深刻的创意决策（也有一些非常“AI”的决策），我只给了第一个提示词，并在游戏进行中提供了一些反馈。https://snake-awakening.netlify.app/

宝玉@dotey · 6月3日52

虽然很多人吐槽 Opus 4.8，但是写 Mac App UI 真的强，Claude Design 设计出来，用 Opus 4.8 去实现，还原度相当不错。感觉我要发布一个 Mac App for X 了

译推文指出，尽管有人批评 Opus 4.8，但它在编写 Mac App UI 时能力很强，配合 Claude Design 使用，界面还原度相当不错。作者同时引用了对 Cursor Agent 的评价作为对比：在常用 GUI Agent 中排名为 Codex App、Cursor 和 Claude Desktop。Cursor 的亮点包括支持多任务并行和灵活选择模型，Plan 模式步骤详细稳定；不足是暂不支持 /goal、手机版，且调试功能仅有内置浏览器。

jason@jxnlco · 6月3日39

We’re aware of another codex issue with too many requests.

译我们已知悉另一个 Codex 问题，即请求过多。

AYi@AYi_AInotes · 6月3日57

Damn，Codex真的要杀疯了😭 最近1-2年爆发的上千家初创公司都得完蛋了，尤其是vibe coding、prompt-to-app工具和无代码内部工具平台这些

译天哪，Codex 真的要大杀四方了😭 最近1-2年爆发的上千家初创公司都得完蛋了，尤其是 vibe coding、prompt-to-app 工具和无代码内部工具平台这些

SemiAnalysis@SemiAnalysis_ · 6月3日64

OPINION: Codex Desktop App UX & in-app browser is so good for vibing now. Once the OpenAI base model gets better at design, I can imagine codex beating Claude Code CLI soon on SemiAnalysis VibeMAX benchmark just due to better UX. Right now Claude is S tier on VibeMAX & Codex is A+ tier on VibeMAX. Anthropic over investing in Claude Code terminal CLI & underinvesting in Claude Code Desktop App is a fork in the road in the wrong direction.

译观点：Codex桌面应用UX和内置浏览器现在非常适合“氛围编程”。一旦OpenAI基础模型在设计能力上提升，我预计Codex凭借更好的UX，很快就能在SemiAnalysis VibeMAX基准上超越Claude Code CLI。目前Claude在VibeMAX上是S级，Codex是A+级。Anthropic过度投资Claude Code终端CLI，而对Claude Code桌面应用投入不足，这是走错了岔路。

向阳乔木@vista8 · 6月3日26

越来越喜欢用Codex了，身边朋友也是。今天让朋友写个新书推荐语，发了书稿样章，朋友说待会我丢给 Codex 😂 查看最近Codex的Token统计，已不间断连续用了11天，最长任务8小时。欢迎留言晒数据，打开 Codex -> 个人资料能查看。

译推文表达了对 Codex 工具的喜爱。用户提到让朋友用 Codex 撰写新书推荐语，并分享了自己的使用数据：已不间断连续使用 11 天，单次最长任务时长为 8 小时。推文最后邀请其他用户在 Codex 个人资料中查看并分享自己的 Token 使用统计。

歸藏(guizang.ai)@op7418 · 6月3日55

Codex 昨晚上线的这个 Site 插件非常厉害。它本质上感觉类似于 Claude Design，帮你设计和生成网页，同时还帮你部署好了，可以直接给别人访问。比较遗憾的是 Pro 用户不能用，只有那些 Business 和有组织的用户可以用。

译Codex平台近日上线了名为Site的新插件。该插件功能类似于Claude Design，能够帮助用户设计并生成网页，并自动完成部署，生成可直接访问的链接。目前此功能的使用权限受限，Pro用户无法使用，仅向Business及组织类用户开放。

MiniMax (official)@MiniMax_AI · 6月3日71

Day-0 on SiliconFlow and 50% off 🔥 the first week frontier coding, 1M context, and native multimodal, all in one open-weights model. This is what we built M3 for. Go try it 👇

译MiniMax 官方宣布，其开源权重模型 M3 已在 SiliconFlow 平台上线，并提供为期 7 天的 50% 限时折扣。该模型号称是首个结合编程与智能体能力（在 SWE-Bench Pro 上超越 GPT-5.5 和 Gemini 3.1 Pro）、通过 MiniMax Sparse Attention 支持 100 万 token 上下文窗口、并原生支持多模态（涵盖图像、视频与计算机使用）的三大前沿能力的开源模型。SiliconFlow 当前优惠价为：缓存 $0.06、输入 $0.30、输出 $1.20 每百万 token（原价 $0.12/$0.60/$2.40）。

Greg Brockman@gdb · 6月3日61

Build and launch apps to your team, using Codex:

译使用 Codex 为你的团队构建并发布应用： [引用 @OpenAI]：构建应用从未如此简单。借助 Sites，Codex 可以将你的工作、想法和计划转化为一个交互式网站或应用，你的团队可以通过一个 URL 进行探索、使用和分享。该功能将首先向 Business 和 Enterprise 计划用户推出，之后再逐步扩大范围。

Berryxia.AI@berryxia · 6月3日74

老树开新花了，这个老大哥微软今天发布新模型了😄 刷一波存在感哈哈哈，不然都没有人记得了~ Microsoft AI今天直接甩出七个全新MAI模型。官方说：不是简单迭代，而是从零开始、干净数据血统、零蒸馏训练的一整个家族。 MAI-Thinking-1主推理、MAI-Code-1-Flash主编码、MAI-Image-2.5主图像、MAI-Transcribe-1.5主转录、MAI-Voice-2主语音，还有各自的Flash版本。最狠的是MAI-Code-1-Flash，直接在SWE-Bench Verified上干到71.6，比Claude Haiku 4.5高5分，Pro榜单高16分，还省60% token，现在已经在Copilot里逐步上线。 MAI-Image-2.5在Arena图像编辑排第二、文本生图排第三，精准保留人脸、logo和细节，已经直接塞进PowerPoint和OneDrive。 MAI-Transcribe-1.5在43种语言上同时拿准度和速度第一，一小时音频15秒搞定。 MAI-Voice-2能控情绪、支持多语言code-switching，长内容说话人身份也稳。它们不是各自为战，而是设计成一个能无缝协作的家族。Microsoft这次没玩“一个大模型通吃”，而是把每个任务拆开，用干净数据从头训，公开所有技术细节和学习心得。这其实把行业当前最主流的路径反过来了。大家都在卷参数规模、卷蒸馏别人家的输出，Microsoft却在说：真正长期有竞争力的，是从零构建、血统干净、任务专精、还能互相配合的模型家族。实际效果如何，其实还有待大家的测试~~期待看看实际表现！

译微软在Build大会宣布推出七个全新的MAI模型家族。该家族以“干净数据血统”从零开始训练，旨在任务专精并能无缝协作。其中，MAI-Code-1-Flash在SWE-Bench Verified上得分71.6，比Claude Haiku 4.5高出5分，并能节省60% token。MAI-Transcribe-1.5处理一小时音频仅需15秒，在43种语言上实现速度与准度领先。微软此次发布旨在展示其从零构建、专精且能协同工作的模型发展路径。

meng shao@shao__meng · 6月3日61

Windsurf is DEAD, long live Devin Desktop ? 😠 标题党了：Windsurf → Devin Desktop https://devin.ai/blog/windsurf-is-now-devin-desktop @cognition 收购 Windsurf 一年后，终于把「IDE + 自主 Agent」两条产品线彻底合并为一！ One Devin, every surface · Devin Desktop → 桌面 IDE + Agent 管理 · Devin Cloud → 云端长时自主 Agent · Devin CLI → 终端 · Devin Review → 每次 diff 的代码审查新 Devin Desktop 三项新功能 1. Agent Command Center（指挥中心） 2. ACP 开放协议 3. Devin Local（Cascade 继任者）

译Cognition 在收购 Windsurf 一年后，将 Windsurf 与 Devin 两条产品线整合为统一的 Devin 平台。新推出的 Devin Desktop 被定位为下一代产品，集成了桌面 IDE 与智能体管理功能，使用户能从单一界面管理本地与云端的智能体舰队。完整的平台还包括 Devin Cloud（云端长期自主智能体）、Devin CLI（终端）和 Devin Review（代码审查）三个组件。此次更新引入了三项新功能：Agent Command Center（智能体指挥中心）、ACP 开放协议以及 Devin Local（作为 Cascade 的继任者）。

meng shao@shao__meng · 6月3日75

Agentic Engineering 实战窍门全录（2026年6月版）来自 @mvanhorn 的分享 👏🏻，他三个月内从「高中后没发布过有价值软件」到 last30days（27K stars）、Printing Press、Agent Cookie，以及对 Python、Go 等主流项目的实质贡献（结尾列出作者推荐全部工具）看看 Agentic Engineering 给软件开发带来了什么变化 · 80% 编码，20% 规划 -> 规划交给 agent，人做方向与品味 · 人在键盘前执行 -> 人做 signal（信号），agent 做 volume（产出量） · IDE 是中心 -> 终端 + plan.md + 语音是中心方法论骨架：Research → Plan → Work /last30days（社区现况调研） ↓ /ce-plan（结构化 plan.md，含验收标准） ↓ /ce-work（机械执行，可跨 session 续跑） ↓ Human Signal（品味、取舍、纠偏） Compound Engineering 是使这套循环落地的插件（/ce-plan、/ce-work、/ce-brainstorm）。plan.md 的价值不在于给人读，而在于约束 agent 不偷懒——有研究、有方案、有 checkbox，执行才完整。 # 22 条 Hack 的精简归类一、规划层（最重要） 1. 有想法立刻 /ce-plan，不先想、不先写代码；模糊时用 /ce-brainstorm 再 plan。 2. plan 给人看，但作者几乎不读——plan 是 agent 的作业；人只 skim 标题，有疑问 inline 问（TLDR / eli5 / why this approach）。 3. 非工程任务同样适用：「make a plan for the plan」——先规划如何产出 deliverable，再执行，避免 LLM 直接写成品时偷工减料。 4. plan.md 也是协作介质：Proof 把 plan 变成可评论文档，非终端用户也能 review。二、执行与并行 5. cmux 多 tab（4–6 个）：plan 一个、build 一个、测 bug 一个……research 和 build 并行，cycle 回来第一个已完。 6. 新 terminal tab 默认进 Claude/Codex，不是 shell——降低开 session 成本。 7. YOLO 权限：bypassPermissions + skipDangerousModePermissionPrompt；多 session 无法逐条点确认。配合 Stop hook 音效，知道哪个 session 结束。 8. Claude 规划 + Codex 构建：Claude xhigh 关 fast mode；Codex xhigh 开 fast mode。通过 IDE 扩展、/ce-work --codex、Printing Press 委托，不必切 CLI。三、输入方式 9. 语音优先：Monologue / Wispr Flow（Mac）+ 鹅颈麦；手机用 Apple 听写即可——LLM 能补全转写错误。共享办公室仍是痛点。 10. Granola raw transcript 直接丢进 /ce-plan，不先摘要；配合 Printing Press Granola CLI 检索历史会议。 11. last30days 在 plan 前跑：Reddit/X/HN/YouTube 等并行搜，让 plan 基于「社区当下认知」而非训练数据 cutoff。四、随处可达 12. Remote control 常开：桌面 session 手机续接。 13. 给 Claude 一个邮箱（AgentMail + agentmail-to-claude-code）：邮件/附件触发新 session；Hermes 的 cc <task> 从手机派活。 14. Mac mini 远程：Mosh（低延迟 SSH）、tmux（断网续跑）、Hermes/OpenClaw 自治、Agent Cookie 同步 cookie/.env。五、产出扩展 15. HyperFrames：视频 = HTML composition → MP4；与代码 loop 同构（script.md → render）。 16. 笔记即 RAG：Bear CLI、Obsidian、gbrain、supermemory——agent 可读写的个人知识库，plan 质量随历史 compound。 17. 自写 Skills：重复两次以上的 workflow 固化；抄 Compound Engineering skill 的结构让 agent 脚手架。 18. 开源贡献：同一 /ce-plan + /ce-work loop；Discord 建人脉，PR 是入场券。六、Printing Press 与现实 errands 19. Agent-native CLI 舰队：Tesla 预热、Instacart、ESPN 盯赛、Alaska 订票——agent 跑生活琐事，不只是写代码。 20. Agent Cookie：把真实浏览器 session 交给 CLI，解决 auth 痛点。七、硬件与诚实反思 21. M5 Max 64GB + 禁 sleep + Anker 充电宝——多 agent 并行极耗电。 22. AI Psychosis：构建 loop 像最好玩的游戏，容易沉迷、忽视用户与身边人；允许「只为自己 build」；要 audience 则走长期积累路径。 # 工具栈一览（可执行清单） · 规划执行：Compound Engineering, Proof · 终端：cmux, Ghostty（读同一 config） · 语音：Monologue / Wispr Flow · 调研：last30days (+ ScrapeCreators key) · 会议：Granola, Printing Press Granola CLI · 远程：Mosh, tmux, AgentMail, Hermes, OpenClaw, Agent Cookie · 视频：HyperFrames · 笔记：Bear CLI, gbrain, supermemory · 生活 CLI：Printing Press, Agent Cookie · 第二引擎：Codex (xhigh + fast)

译该内容源自@mvanhorn的分享，介绍了“智能体工程”如何重塑软件开发。其核心是从“人主导编码”转向“人主导方向、智能体执行”，中心从IDE变为终端与计划文件。方法论遵循Research → Plan → Work循环，核心是让plan.md约束智能体行为。分享者总结了22条实战技巧，涵盖规划、并行执行、输入方式、远程控制等方面，并列出了完整的工具栈。

Chubby♨️@kimmonismus · 6月3日61

OpenAI is merging ChatGPT, Codex and its Atlas browser into one desktop app and recasting Codex from a coding tool into a productivity app it says anyone can use. The figures it has been handing out to support that: 5 million weekly Codex users, enterprise revenue up 50% week over week, usage growing 5% a day. Those come from an all-hands and an internal staff note, relayed by people familiar with the remarks. Codex is increasingly evolving into a true work platform. And GPT-5.6 is also on the horizon. Great things are expected from OpenAI in the near future. Via the information

译OpenAI计划将ChatGPT、编程工具Codex及Atlas浏览器整合为一个桌面应用，并将Codex从纯编码工具转型为面向所有人的生产力平台。公司内部数据显示，Codex周活跃用户达500万，企业收入周环比增长50%，用量每日增长5%。此外，GPT-5.6模型也即将推出。