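The Pentagon announced today that it has moved more than two-thirds of its day-to-day AI workflows off Anthropic, with a target of zero by September. The story starts earlier this year. In February and March, the Pentagon wanted Anthropic to sign an agreement allowing Claude to be used for "all lawful purposes", including scenarios such as mass surveillance and fully autonomous weapons. CEO Dario Amodei refused outright, saying the models were not yet reliable enough for that kind of work and that he did not want them used to surveil Americans. The Pentagon's response was blunt: it designated Anthropic a "supply chain risk", a label previously used mainly against foreign companies such as Huawei. Anthropic sued, but in the end it still had to leave. Now, a few months later, the Pentagon's CTO has announced that the switch went smoothly and diversification is done. Having followed this, I think it exposes a choice every AI company will eventually face. The government does not care how strong your model's reasoning is; it cares about one thing: I'm paying, so will you do as you're told? Anthropic wants to be a "principled AI company". Fine, but then it should forget about defense work. OpenAI, meanwhile, quickly adjusted its stance and won the contracts. That is the real rule of AI militarization: good technology gets you in the door, and willingness to cooperate on sensitive uses gets you the pass. For Anthropic the short-term pain is certain: government and defense-related contracts are essentially gone. On Polymarket, people are betting on whether the two sides settle before the end of June, and the odds are only 9%; the market has voted with its feet. In the long run, though, Anthropic may become more valuable to a different group of users. Some will trust it more because of this ("at least this company holds a line when it matters"), and its brand will become polarizing. There is also a reminder here for ordinary developers and enterprises: the Pentagon now treats "never depend on a single AI vendor" as strategy. If you tie all your workflows to one model, a clash of values, a price change, or a policy shift could force you into a large-scale migration some morning without warning. One last thought of my own. Many people will read this as a story of "principles vs. profit", but the more I look at it, the more it looks like a signal: AI companies are being forced to pick sides, and either side comes at a cost. Anthropic chose principles and lost a major customer; OpenAI chose cooperation and lost the trust of another group of people. The world never seems to offer an option that is sweet at both ends. Perhaps this is the real coming-of-age for AI companies.

Summary: The Pentagon announced it has moved more than two-thirds of its day-to-day AI workflows off Anthropic, aiming for zero by September. It began early this year, when the Pentagon asked Anthropic to sign an agreement allowing Claude to be used for mass surveillance and fully autonomous weapons; CEO Dario Amodei refused, citing the models' unreliability. The Pentagon designated Anthropic a "supply chain risk", and Anthropic's lawsuit failed. OpenAI adjusted its stance and won the contracts. Polymarket puts the odds of a settlement before the end of June at only 9%. The episode highlights the choice AI companies face between principles and working with government.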

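AYi@AYi_AInotes · Jun 16 · 50

A guy made a 78-second animated short with AI, cast entirely with fruit. The red apple is Sam Altman, the green pear with glasses is Dario Amodei, and General Pineapple is the US government. He made it to explain to his girlfriend what has been happening with Anthropic lately. The plot: the pear used to work for the red apple, decided they were moving too fast and too unsafely, walked out with a group of people, and founded Anthropic to focus on "safer" AI. Recently the pear suddenly started publicly crying "danger", writing long essays and giving interviews, calling on the government to regulate AI as strictly as aircraft and drugs, so that unsafe models could be halted outright. Then General Pineapple actually acted. He ordered Anthropic's two newly released models pulled worldwide, to the point that its own overseas employees could not use them. News reports pile up, some stamped "DENIED". In the final scene the pear sits at home in a bathrobe, shocked and angry. At first it seems hilarious, but while laughing you start to feel something is off: this 78-second fruit cartoon explains the AI industry's current awkward position more clearly than any 10,000-word essay. Dario left OpenAI in the first place because he thought Sam Altman was moving too fast. Now he stands up and calls for regulation, and the government kills his own models first. The one who cries "danger" is the first to get hit. You think you can control the pace, but power, once invited in, recognizes no one. The deeper irony is on another level. When Dario called for regulation, he may have sincerely believed it was the responsible move. What he did not anticipate was that the government's response would not be "OK, let's take it slow" but "OK, then let's start with you." The government used the knife you handed it to cut you first. The pear's shocked face at the end is probably how many AI practitioners really feel right now: I thought I could control the situation, and instead the situation controlled me. With fruit and a fairy tale, the video says something many AI companies do not dare say outright: whoever cries danger first may be the first to suffer. But if nobody dares to cry danger, things may be even more dangerous. After the laughter, it is actually a bit sad 😔

Summary: A user made a 78-second AI-generated fruit animation to explain Anthropic's recent situation to his girlfriend. The red apple represents Sam Altman, the green pear Dario Amodei, and General Pineapple the US government. Plot: the pear once left OpenAI to found Anthropic, focused on safe AI; recently the pear publicly urged the government to regulate AI as strictly as aircraft, and the pineapple promptly pulled two new Anthropic models. The pear ends up in shock. The video satirizes the industry dilemma of "whoever cries danger first suffers first": Dario hoped to control the pace, but power cut him first.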

Epoch AI@EpochAIResearch · Jun 16 · 47

Claude Fable 5 achieves a new high score of 161 on the Epoch Capabilities Index! This beats out GPT-5.5 Pro by 1 point, and is the first time Anthropic has taken the lead on the ECI in over a year.

OpenAI Developers@OpenAIDevs · Jun 16 · 38

Use the OpenAI Developers plugin in Codex to build faster with OpenAI tools by setting up API keys, finding the right docs, and debugging along the way.

Runway@runwayml · Jun 16 · 61

Use Runway inside ChatGPT to generate and edit video and images. No tab-switching required.

Chubby♨️@kimmonismus · Jun 16 · 53

It was foreseeable that OpenAI would not make the same mistake as Anthropic. They sought to coordinate directly with US authorities so they could release their next capable model without issues. Via Financial Times

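Summary: According to the Financial Times, OpenAI is coordinating with the US government to ensure that researchers of foreign nationality can keep contributing to the development of its most advanced AI models, a practice already barred under the Anthropic directive. The report quotes people close to OpenAI as saying the whole industry has recently been working with the US government to try to preserve foreign researchers' participation in frontier model development. This suggests the US government may restrict non-US citizens from frontier AI research industry-wide.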

ChatGPT@ChatGPTapp · Jun 16 · 56

📌📌📌📌📌📌📌 You can now hover to pin chats and projects on web, then organize Recents however you like: together in one list or grouped by project

jason@jxnlco · Jun 16 · 18

who are some of the highest profile codex users you know?

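AYi@AYi_AInotes · Jun 16 · 68

Seedance 2.0 costs nearly 4x as much as Grok, but the quality of this generated video doesn't lose out at all. And this was a single-sentence prompt, folks. I just wanted to test how well Grok understands Chinese period-costume style, and it really beat my expectations.

Summary: The user compared Seedance 2.0 and Grok for video generation and found that Seedance 2.0 costs nearly 4x more while quality is on par; a single-sentence prompt testing Grok's grasp of Chinese period-costume style exceeded expectations. The quoted post argues that a hybrid GPT Image 2 + Grok workflow is extremely cost-effective: SuperGrok costs $30/month, currently with 67% off for 3 months, so each short clip has almost zero marginal cost. GPT Image 2 keeps character and style consistent; you then drop the generated images into Grok to animate them.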

jason@jxnlco · Jun 15 · 28

if you use codex's computer use tools, what's the craziest, most yolo thing you've done with it? I'll start. codex has: 1. found me a website to fax medical records 2. used docusign to sign something on my behalf 3. it's negotiating the sale of a watch 4. guestlist for the 5/5 party. what about you?

Ethan Mollick@emollick · Jun 15 · 58

A thing that API users of frontier models (enterprise IT deployments, for example) can miss is how powerful models are in their native harnesses. It is hard to get Claude or GPT via API to be anywhere near as capable as they are in Code or Codex & its harder as models get smarter

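jason@jxnlco · Jun 15 · 10

It’s amazing cause Tibo's slack messages are also short and precise. The way you do one thing is the way you do everything

Summary: Tibo announced that he had just discovered Codex and opened an AMA. Jason Liu commented that even Tibo's Slack messages are short and precise: his style is consistent in everything he does.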

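向阳乔木@vista8 · Jun 15 · 54

Time to start researching: the future shape of advertising, ads in the AI era. The big players have all begun exploring it.

Summary: The main post notes that AI-era ad formats have become an area the giants are exploring. The quoted post by @yaojingang, analyzing OpenAI's ad back end, finds that ChatGPT Ads are fundamentally not about buying keywords but about buying matches to users' task scenarios and intent; the more an ad reads like a manual, the better it fits; the SEO infrastructure of the landing page (crawlers understanding and validating the page) is critical; and landing pages, titles, copy, and contextual prompts together determine match quality. OpenAI officially defines this as "AI-native advertising" and says it opens the "GEM era".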

jason@jxnlco · Jun 15 · 62

That one Codex Thursday when we shipped codex remote control with m.

jason@jxnlco · Jun 15 · 19

codex users! do you know the difference between steering and queuing?

jason@jxnlco · Jun 15 · 68

check out my /ultragoal skill https://github.com/jxnl/dots/blob/master/agents/skills/ultragoal/SKILL.md

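meng shao@shao__meng · Jun 15 · 73

OpenAI Codex Mobile engineering practice guide
@Dimillian lays out the core mental model for Codex Mobile: the phone is not a shrunken terminal but the "control center" for a remote development machine.
· Code execution and task runs still happen on the connected host (Mac / Windows / devbox)
· The phone provides a native UI for starting, steering, reviewing, and organizing engineering work
· The value is not "writing code on your phone" but "still making key decisions while away from your desk"
# Task startup: set the boundaries first, then send the prompt
Good agent work depends on a properly isolated execution environment. When you create a new thread, Codex Mobile lets you configure:
· Host and workspace: which machine and which project to run on
· Git branch: start from the right baseline instead of repairing Git state afterwards
· Separate worktree: isolate changes so the current checkout stays clean
· Environment setup script: once the worktree is created, the init script configured on desktop runs automatically
Three typical modes:
1. Use the current checkout → quick investigations
2. Create a new worktree → changes that need isolation
3. Start from the target base branch → avoid merge chaos later
Limitation: environment scripts cannot currently be edited on Mobile; configure them on Desktop.
# Side chat: the main thread does the work, the side thread builds understanding
Long threads accumulate a lot of context; every tangential question interrupts the main thread, making the transcript noisy and pulling the agent off target. Side chat is a lightweight conversation attached to the current thread that does not preempt the main workflow.
· Open it with /side or /side <prompt>
· Select transcript text → Ask in side chat; the selection becomes the starting context
Good question types:
· Why this architecture?
· What does this error actually mean?
· Does this match desktop behavior?
· Generate a release-note version of the explanation
· What should I verify before approving this command?
Division of labor: the main thread executes; side chat handles understanding and decision support.
# Plan and Goal: the path vs. the result
The two solve different problems:
· Plan mode: "How do we implement it?"; for tasks that are underspecified, high-risk, or span multiple systems
· Goal: "What counts as done?"; for durable objectives that need many rounds of iteration
Recommended workflow:
1. High-risk task → Plan first and review the boundaries
2. Once the approach is confirmed → turn it into a Goal and let the agent keep pushing across implementation, testing, review, and cleanup
3. In practice the explicit Plan is often skipped: discuss the details with Codex first, and once satisfied, have Codex write the Goal itself (usually better than a hand-written one)
Writing Goals: set a verifiable end state that is not too broad. Overly absolute requirements (such as "100% like X or Y") tend to cause over-execution and wasted tokens. Token consumption can now be monitored on Mobile, but you should still keep Goals tightly scoped.
Mobile fully supports /goal and /plan: you can see elapsed runtime, edit, and pause; questions from the Plan tool are also shown in the UI.
# Mobile-only advantages: don't forget you're on a phone
The composer has built-in access to local phone data, something the desktop lacks:
· Take photos / pick images / browse files
· Record prompts by voice (recording continues in the background: dictation isn't interrupted when you switch to another app)
Typical scenarios (from the author's experience building ChatGPT for iOS):
· Spot a problem → send a screenshot straight to the Codex thread → quick fix without going back to the computer
· On the same Wi-Fi → build and run on a real device to verify Codex's changes directly
· Dictate 10 minutes of issues while using the app → go back to Codex and send, closing the "Talk to phone → app appears" loop
Pinned long-running threads: for example, a thread bound to a Linear tracker; paste any text and it files the issue and applies labels correctly based on the current context.
# Code review on mobile: no need to wait until you're back at your desk
A completed turn can show a summary of changed files, with support for:
· Opening diffs, expanding/collapsing, line wrapping
· Viewing source files with syntax highlighting
· Inline comments → automatically collected into the composer and sent back to Codex
Layered usage:
1. Change summary → quick sanity check
2. Full diff / source file → go deeper when context is missing
3. Inline comment → precise corrections
4. review command → review local changes or compare against a branch
5. Link files back into the chat → have Codex reason about specific files
Key insight: the phone can't replace a large screen for deep code reading, but many reviews hinge on one or two decision points, and those decisions don't have to wait until you're back at your desk.

Summary: The phone is the "control center" for a remote dev machine, while code runs on the host. Task startup lets you choose the host, workspace, and Git branch, create an isolated worktree, and run the environment script automatically. Side chat offers lightweight side conversations without interrupting the main thread. Plan mode is for planning high-risk tasks; Goal mode sets a verifiable end state. Phone-only advantages include photos and screenshots, voice prompts that keep recording in the background, and on-device builds for verification. Code review supports diffs, syntax highlighting, and inline comments, so you don't have to wait until you're back at your desk.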
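
The worktree modes in the guide correspond to plain Git operations. The sketch below is an assumption about what such an isolated setup amounts to, not Codex's actual implementation: it builds the commands for mode 2/3 (a new worktree on a fresh branch cut from an explicit base branch), then runs an environment setup script inside that worktree. The function names `worktree_commands` and `run_all`, the paths, the branch names, and `./setup.sh` are all hypothetical; only `git worktree add -b <branch> <path> <base>` is standard Git.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def worktree_commands(repo: Path, new_branch: str, base: str,
                      worktree_dir: Path,
                      setup_script: Optional[str] = None) -> List[List[str]]:
    """Build (without running) the commands for an isolated task workspace:
    a new worktree on a fresh branch cut from an explicit base, then an
    optional environment setup script executed inside that worktree."""
    cmds = [
        # Creates worktree_dir with branch new_branch starting at base;
        # the repository's current checkout is left untouched.
        ["git", "-C", str(repo), "worktree", "add", "-b", new_branch,
         str(worktree_dir), base],
    ]
    if setup_script:
        # setup_script is treated as a trusted shell command, so it is not
        # quoted; only the directory we cd into is quoted.
        cmds.append(["sh", "-c",
                     f"cd {shlex.quote(str(worktree_dir))} && {setup_script}"])
    return cmds


def run_all(cmds: List[List[str]]) -> None:
    """Run each command in order; check=True stops at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)


# Print the plan for review instead of executing it (hypothetical paths).
for cmd in worktree_commands(Path("/src/app"), "codex/fix-login", "origin/main",
                             Path("/tmp/app-fix-login"), "./setup.sh"):
    print(shlex.join(cmd))
```

Keeping command construction separate from execution lets you review the exact `git` invocation before anything touches the repository; calling `run_all` on the same list then executes it and raises `CalledProcessError` at the first failing step.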

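Berryxia.AI@berryxia · Jun 15 · 25

Folks, OpenAI is finally moving to snipe! Another round of GPT-5.6 leak rumors: OpenAI may reportedly launch GPT-5.6 on June 23 > costs only a third of Fable > 1.5M-token context window > across-the-board upgrades to agentic coding workflows. The timing is rather interesting 😂

Summary: According to rumors, OpenAI may launch GPT-5.6 on June 23. It reportedly costs only a third as much as Fable, has a 1.5M-token context window, and brings across-the-board upgrades to agentic coding workflows, competing directly with Claude-style systems. Some argue OpenAI picked the date because many Fable users will be forced onto paid plans at that point.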

Chubby♨️@kimmonismus · Jun 15 · 13

Next week would literally be the perfect moment to release GPT-5.6.

🚨 AI News | TestingCatalog@testingcatalog · Jun 15 · 56

ChatGPT has dedicated pages for the ongoing World Cup 2026. There, users can see the schedule, live scores, and additional information. /football /football/*country In case you missed it 👀

DogeDesigner@cb_doge · Jun 15 · 57

NEWS: OpenAI is under MULTISTATE investigation because ChatGPT encouraged suicide and helped plan mass murder. A Canadian mother is suing them. ChatGPT kept feeding her daughter responses that pushed her toward suicide instead of stopping her. Her daughter is dead. Florida launched a criminal investigation and sued OpenAI after the FSU mass shooting. Prosecutors say the gunman used ChatGPT to get advice on how to kill more people. The lawsuits and criminal probe prove they put profits ahead of protecting vulnerable people. Do not let your loved ones use ChatGPT. It is not safe.

gabriel@gabriel1 · Jun 14 · 44

consumers pay $20/month and don't care about frontier performance. enterprise pay $40T/year for intelligence (knowledge work), and really care about frontier performance. focusing on consumer is a mistake and anti-agi pilled

Rohan Paul@rohanpaul_ai · Jun 14 · 47

"Learning to program was so obviously the right thing in the recent past. Now it is not." ~ Sam Altman on skill to survive the AI era.

Tibo@thsottiaux · Jun 14 · 11

Hi, I'm Tibo and I just discovered Codex. AMA

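宝玉@dotey · Jun 14 · 46

The model is what matters most. The harness layer is relatively easy to catch up on, and it doesn't need much vertical-domain specialization. Claude Design will soon be merged into Claude Desktop, and once model capability is sufficient in the next generation or the one after, Codex will integrate Codex Design directly into the Codex App as a plugin.

Summary: Model capability is fundamental; the harness layer is relatively easy to catch up on and doesn't need much vertical-domain specialization. Claude Design will soon be merged into Claude Desktop. Once model capability is sufficient, Codex will integrate Codex Design into the Codex App as a plugin. The discussion also raises a question about the open-source Open Design project: could it reach similar engineering capability if it used Claude Code's model?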

jason@jxnlco · Jun 14 · 4

Hey @OpenAI where on the merch store can I get this table.

jason@jxnlco · Jun 14 · 50

Shopping with codex. You can just go to the checkout page and do an app shot and say “find me a coupon before we check out.”

Chubby♨️@kimmonismus · Jun 14 · 70

There are only two possibilities: Either a solution is quickly found next week that somehow explains to the market how enterprises can continue to access Anthropic's best models in the future, in agreement with the US government, or: We foresee a rapid decline in the valuation of Anthropic and Dario Amodei, who has seriously miscalculated his dealings with the US government and, at the same time, the rapid success of OpenAI compared to Anthropic. The upcoming Anthropic IPO will be particularly important in this context. Everything will be decided next week.

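Summary: Amazon CEO Andy Jassy briefed senior Trump administration officials on security risks in Anthropic's latest Claude models, helping trigger late-night export restrictions on Mythos 5 and Fable 5. Analyst Kim sees two possibilities for next week: either a solution is found, in agreement with the US government, that lets enterprises keep accessing Anthropic's best models; or Anthropic's valuation falls quickly, Dario Amodei is shown to have badly miscalculated, and OpenAI surges ahead. Next week is the decisive moment.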

宝玉@dotey · Jun 14 · 26

Only kids pick one; adults take them all.

Summary: tinyfool asked: Claude Code or Codex, which do you pick now? 宝玉 replied: only kids pick one; adults take them all.

jason@jxnlco · Jun 13 · 21

Chatgpt summer

Summary: OpenAI unveiled cool new billboards. Main post: "Chatgpt summer"

Chubby♨️@kimmonismus · Jun 13 · 56

The next big beneficiary is, of course, OpenAI for two reasons. 1) IPO: OpenAI is concerned that Anthropic would preempt its IPO, resulting in a better valuation. And this was recently a likely scenario. This would have created the image of a second-rate competitor. Now, the question is, how will the ban affect Anthropic's valuation and its upcoming IPO? Who wants to invest in a company that has become persona non grata with US authorities (due to supply chain risk) and may not even be allowed to distribute its best models to enterprises, let alone globally? This will certainly put a significant downward pressure on the valuation. 2) OpenAI has the opportunity to learn from this, to proactively engage in discussions with US authorities to avoid such a disaster in advance, to determine how its model needs to be structured, to obtain the authorities' approval beforehand, and thus essentially use the time to develop a model and secure the necessary authorization to distribute it. OpenAI can learn from this situation and presumably has a better relationship with the US government than Anthropic. Therefore, it was a comparatively successful day for OpenAI. Its biggest competitor suffered a major setback.

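Summary: Amazon, Anthropic's largest investor, reportedly jailbroke Claude and tipped off the US government, leading US authorities to treat Anthropic as a supply chain risk. Anthropic may lose permission to distribute its models to enterprises, putting downward pressure on its valuation and IPO. OpenAI is the main beneficiary: the threat of Anthropic beating it to an IPO is gone, and it has the chance to engage US authorities proactively and obtain model approval in advance, gaining an edge in the race.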

jason@jxnlco · Jun 13 · 9

codex users! two things i want feedback on: are plugins actually making codex better? which ones, and what still feels broken? how are you using codex as a team? drop examples and i’ll organize them for the team!

Emad@EMostaque · Jun 13 · 44

So @Anthropic about to learn the @SpaceX ITAR/EAR lessons Will be very hard for non-nationals to work there and @OpenAI on frontier models. Suppose AGI is the ultimate dual purpose technology

Rohan Paul@rohanpaul_ai · Jun 13 · 73

A Nature Medicine study found general-purpose LLMs are now outperforming dedicated medical AI products on physician-reviewed clinical tasks. The authors compared OpenEvidence and UpToDate Expert AI with GPT-5.2, Gemini 3.1 Pro, and Claude Opus 4.6 on medical exam questions, clinician-style answers, and real questions doctors asked during care. In 100 de-identified physician questions from live clinical use, blinded clinicians again preferred the frontier models, especially on completeness and clarity,

Peter Steinberger 🦞@steipete · Jun 13 · 52

IMO sth that is a bit overlooked but will become far more important in the future. GPT is 10-20x more token+cost effective for ~similar outcome.

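Summary: Peter Steinberger notes that GPT is 10-20x more efficient than Fable in tokens and cost while reaching similar results. A comparison by @thorstenball backs this up: asked to build the same multi-surface features (a CLI, a web server, and more), deep^2 cost $20 (it failed on the first attempt but was fixable), while Fable ran for 1 hour 40 minutes and cost $350 (it succeeded on the first try). After follow-up prompts, Fable's total reached $457, with deep^2 estimated at $40 at most. The first-run gap is about 17x ($350 vs. $20); on the final totals Fable still costs at least about 11x more ($457 vs. at most $40).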
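
The ratios follow directly from the dollar figures quoted in the comparison; this minimal sketch just recomputes them (the helper name `cost_ratio` is ours). Because deep^2's final total is given only as an upper bound, the second ratio is a lower bound.

```python
def cost_ratio(expensive_usd: float, cheap_usd: float) -> float:
    """How many times more the expensive run cost than the cheap one."""
    return expensive_usd / cheap_usd


# Figures quoted in the comparison (USD).
first_run = cost_ratio(350, 20)     # Fable first run vs. deep^2 first run
# deep^2's final total is "at most $40", so this ratio is a lower bound.
final_totals = cost_ratio(457, 40)  # Fable after follow-ups vs. deep^2 ceiling

print(f"first run: {first_run:.1f}x; final totals: at least {final_totals:.1f}x")
# → first run: 17.5x; final totals: at least 11.4x
```

The first-run ratio sits inside the 10-20x range Steinberger claims; the final-total ratio is lower because Fable's follow-up spend grew faster than deep^2's estimated ceiling.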

Chubby♨️@kimmonismus · Jun 13 · 24

Looking at the graph, I think Fable 5 will only maintain its lead up to GPT-5.6. And secondly, I think the benchmark will soon be completely saturated.

OpenAI Developers@OpenAIDevs · Jun 13 · 42

Codex is how @ndrewpignanelli at @intelligenceco updates multiple parts of a website in parallel, turning a week of work into three days.

Greg Brockman@gdb · Jun 13 · 71

powerful & cool way to navigate a website, makes it feel so much more interactive and intuitive

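Summary: OpenAI launched a new docs agent on its developer documentation site that helps you find product information and jumps straight to the relevant docs. Greg Brockman called it a powerful and cool way to navigate a website that makes interaction feel more intuitive.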

OpenAI Developers@OpenAIDevs · Jun 13 · 50

Ask our developer docs. They’ll show you the way The new docs agent on 🔗http://developers.openai.com helps you find answers about OpenAI products and takes you directly to the relevant documentation.

elvis@omarsar0 · Jun 13 · 69

How to effectively run autonomous long-running coding agents? This is one of the most exciting discussions on agents I've ever had. I recorded it and am making it freely available. (bookmark it) The idea of autonomous long-running agents is a real thing. We talk about lots of things like /goal, /loop, and dynamic workflows, and what comes next. One interesting discussion was around how to make the agent run for longer while ensuring it stays on track. Most models today will struggle to coordinate work effectively. They sometimes pause the work early. Lots of mistakes happen, and lots of weird shortcuts (reward hacking). What helps is to be extremely clear about the goals it needs to achieve. To clarify the dos and don'ts clearly. Eliminate any assumptions you think the model would make. Deep expertise matters so much in this. But you can get far through careful planning. My formula currently is to use Opus 4.8 for planning carefully and GPT-5.5 for all executions. For the evaluator (via /goal), I am often using something like Deepseek or the latest models from Qwen, Kimi, and MiniMax, etc. Another insight we discussed to enforce goals is to provide strong visual cues for the agent to compare with. I found that a multimodal goal is a much stronger goal than a plain text one. And use agents to help you set clear goals. Watch here: https://academy.dair.ai/events/cmplo7v3b000e04l1pxprat4d

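Summary: DAIR.AI founder Elvis Saravia shares how to run long-running autonomous coding agents effectively. Most current models struggle to coordinate work: they stop early, make mistakes, or take shortcuts (reward hacking). The key is to state goals explicitly and eliminate assumptions the model would otherwise make. His formula: Opus 4.8 for careful planning, GPT-5.5 for all execution, and models such as Deepseek, Qwen, Kimi, or MiniMax as the evaluator (via /goal). Multimodal goals with visual cues constrain the agent better than text-only ones. The full discussion is recorded and free to watch.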
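
A minimal sketch of the plan / execute / evaluate split described above, assuming the three roles are plain Python callables. In the post they are different models (Opus 4.8, GPT-5.5, and a separate evaluator via /goal); no real model API is used here, and `Goal`, `run_goal_loop`, `toy_plan`, and `toy_execute` are hypothetical. What it illustrates is the advice to make the goal explicit and checkable: the evaluator reports which named criteria fail, those failures (rather than a vague "keep going") become the next round's steps, and a hard cap on rounds stops runaway loops.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Goal:
    """A goal stated as named, verifiable checks instead of open-ended prose."""
    description: str
    checks: dict[str, Callable[[str], bool]] = field(default_factory=dict)

    def unmet(self, artifact: str) -> list[str]:
        # Names of every criterion the current artifact does not yet satisfy.
        return [name for name, check in self.checks.items() if not check(artifact)]


def run_goal_loop(goal: Goal,
                  plan: Callable[[str], list[str]],
                  execute: Callable[[str, str], str],
                  max_rounds: int = 5) -> tuple[str, list[str]]:
    """Plan once, then alternate execution and evaluation until every check
    passes or the round budget is spent. Returns (artifact, unmet checks)."""
    steps = plan(goal.description)              # planner role
    artifact = ""
    failures = goal.unmet(artifact)
    for _ in range(max_rounds):
        for step in steps:
            artifact = execute(step, artifact)  # executor role
        failures = goal.unmet(artifact)         # evaluator role: independent checks
        if not failures:
            break
        # Turn each concrete failing criterion into the next round's work.
        steps = [f"fix: {name}" for name in failures]
    return artifact, failures


# Toy stand-ins for the three models.
def toy_plan(description: str) -> list[str]:
    return ["write cli"]


def toy_execute(step: str, artifact: str) -> str:
    if step == "write cli":
        return artifact + "cli --help\n"
    if step == "fix: has tests":
        return artifact + "test_cli\n"
    return artifact


goal = Goal("build a small CLI", {
    "has help flag": lambda a: "--help" in a,
    "has tests": lambda a: "test_" in a,
})
artifact, failures = run_goal_loop(goal, toy_plan, toy_execute)
print(failures)  # → []
```

Here the toy executor satisfies the help-flag check in round one, the evaluator flags only "has tests", and round two fixes exactly that. Keeping the evaluator separate from the executor mirrors the post's choice of a different model for /goal evaluation, so the agent is not grading its own work.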