Thanks @aijoey for the impressive vision-agent security monitoring demo with MiniCPM-V 4.6. What stood out to us is the model consistently classifying scenes as 'routine / no emergency'—and that's the essence of security AI. It's not about flagging every person or vehicle, but reading the situation and assessing urgency. Only when police car blue lights appear on the highway does it trigger an alert, with the reasoning: 'There are police car blue lights, indicating a potential emergency situation.' This is exactly what we've been aiming for: vision that doesn't just see, but knows when to act.

译@aijoey 用 MiniCPM-V 4.6 搭建了视觉智能体安全监控 demo：四个 CCTV 实时画面，模型观察近期帧窗口，判断活动是否正常，仅在识别到真实事件时调用 `raise_alert(reason, severity)`。模型持续将场景分类为“常规/无紧急”，只在高速公路出现警车蓝灯时触发警报，推理为“警车蓝灯亮起，可能表示紧急情况”。该演示展示了小 VLM 超越图像描述、实现实际智能体行为的潜力。

OpenBMB@OpenBMB · 6月20日50

Huge thanks to @aijoey for building back-office agent swarm with MiniCPM5-1B 👏 This is a fantastic real-world case of scaling small models into production-grade systems——moving beyond “model capability” into “practical multi-agent systems at scale”. We’re especially impressed by the technical setup: 🔷128 concurrent agents on DGX Spark 🔷vLLM continuous batching for serving efficiency 🔷6,604 chunks streamed across agents in just 1.48s 🔷Clear demonstration of how a 1B model can power high-throughput, multi-agent workflows in parallel Really impressive work on the back-office swarm setup and the 128-agent parallelization. Excited to see what else you build with MiniCPM in the future 🚀

译面壁智能OpenBMB感谢@aijoey用MiniCPM5-1B构建后端智能体集群。128个并发智能体在DGX Spark上运行，通过vLLM连续批处理提供服务，每个智能体独立处理发票审核、退款路由、合规检查等8种业务队列。系统在1.48秒内跨智能体流式传输6604个chunks。该案例表明，1B模型的价值在于同时做出大量有用业务决策——用一群小型廉价worker并行清理队列。

meng shao@shao__meng · 6月20日75

开源技术教程「Deep Agents 实战」，LangChain 官方认证大使 @zhanghaili0610 出品，他也是「LangChain 实战」「LangGraph 实战」的作者 https://github.com/datawhalechina/deepagents-in-action 教程的核心是：基于 LangChain / LangGraph 生态，面向开发者讲解如何"用好" Deep Agents 这个 Harness 框架来构建真实应用。核心思想：Agent 开发的"三层架构" 1. Runtime（运行时）：LangGraph，持久化执行、断点恢复、流式输出、人机协作 2. Framework（框架）：LangChain，模型抽象、工具接口、Agent 循环、中间件 3. Harness（套件）：Deep Agents ←本课主角，预置文件系统、任务规划、子 Agent、长期记忆技术内核：上下文工程 Deep Agents 做法：引入虚拟文件系统，让 Agent 像人类一样工作： · 需要时才 read_file 按需读取 · 中间结果 write_file 落盘 · 大文件用 offset/limit 局部读取 · 上下文里只保留当前步骤真正需要的信息 · 这个文件系统还是可插拔的——内存、本地磁盘、数据库、远程沙箱、甚至混合路由，都可作为后端。章节结构（8 章 + 2 准备篇） · 准备篇 ── AgentSeek 环境搭建、开发技能安装 · 认知篇 ── ch01 三层架构 / ch02 5分钟快速上手 · 核心篇 ── ch03 虚拟文件系统 / ch04 任务规划 / ch05 子Agent / ch06 异步子Agent · 进阶篇 ── ch07 Skills / ch08 长期记忆 · 规划中 ── Human-in-the-Loop、沙箱执行、流式前端、数据分析Agent、生产部署四个核心能力的演进脉络值得注意： · 虚拟文件系统（ch03）—— 六大工具：read_file / write_file / edit_file / ls / glob / grep · 任务规划（ch04）—— write_todos 让 Agent 拆解并追踪复杂任务 · 子 Agent 委派（ch05-06）—— task 工具派发子任务，ch06 引入异步并行 · Skills 复用（ch07）—— 遵循开放的 Agent Skills 规范，编写的 Skill 可在 Claude Code、Cursor、Codex 等 30+ 工具中通用（"Skills 之于 AI Agent，就像 npm 包之于 Node.js"）

译LangChain 官方认证大使 @zhanghaili0610 推出开源教程《Deep Agents 实战》，基于 LangChain / LangGraph 生态，讲解如何用 Deep Agents Harness 框架构建真实 Agent 应用。核心是“三层架构”：Runtime（LangGraph）、Framework（LangChain）、Harness（Deep Agents）。技术内核为上下文工程，通过虚拟文件系统实现按需读取、中间结果落盘、大文件局部读取。教程共 8 章 + 2 准备篇，覆盖虚拟文件系统（六大工具）、任务规划、子 Agent 委派（异步并行）及 Skills 复用（可在 Claude Code、Cursor 等 30+ 工具中通用）。

AYi@AYi_AInotes · 6月20日65

治愈风提示词分享，做了@elonmusk @ShamAltman @DarioAmodei 的玩偶形象，很可爱！ Prompt：一只手工钩织的[主体]玩偶，采用柔软毛线材质，编织纹理细腻精致。身着一件鲜艳的[主色调]点缀搭配精致的[辅色]服饰，手中握着一个[小道具]。置身于温馨的[场景]中，氛围温暖柔和，充满迷人的手工质感与怀旧的阿米古鲁米风格。

译分享了一个治愈风提示词，用于生成手工钩织玩偶形象（如 @elonmusk 等）。提示词描述：一只手工钩织的[主体]玩偶，柔软毛线材质，编织纹理细腻，身穿鲜艳主色调搭配精致辅色服饰，手中握着小道具，置于温馨场景中，氛围温暖柔和，充满手工质感与怀旧阿米古鲁米风格。@dotey 评论称该提示词挺酷，针织玩偶效果佳。

歸藏(guizang.ai)@op7418 · 6月20日70

用 Nano Banana 去超分放大 GPT-Image-2.0 图片 GPT 生成的图片很多时候会有这种毛躁的感觉，特别是一些非写实的图片，会有这种破碎的纹路和纹理，非常影响观感，而且一眼就能看出来是 GPT 生成的。其实可以用 Nano Banana 去放大 GPT 生成的图像，去掉那种破碎感和无意义的细节，增加更多有意义的细节，同时让文字和细节更锐利、更清晰。下面左边的是 GPT 生成的，右边的是 Nano Banana 放大的。提示词：帮我将这张图片重绘和清晰化，让他细节更丰富，同时去掉原图中杂乱不必要的细节

译GPT-Image-2.0生成的图片常出现毛躁、破碎纹路等观感问题，容易被识别为AI生成。使用Nano Banana超分放大工具可去除杂乱无意义的细节，增加有意义的细节，同时让文字和边缘更锐利清晰。对比图显示左边为GPT原图，右边为Nano Banana处理后效果。推荐提示词："帮我将这张图片重绘和清晰化，让他细节更丰富，同时去掉原图中杂乱不必要的细节"。

宝玉@dotey · 6月20日65

这个提示词挺酷，针织玩偶

译@azed_ai 分享了一个提示词：手工钩织的 [主体] 玩偶，柔软纱线质感，精致针织细节，搭配鲜艳 [颜色1] 点缀和淡雅 [颜色2] 服装，手持小 [道具]，置于温馨 [场景] 中，温暖柔和氛围，迷人手工美学，怀旧玩偶风格。试试并分享你的作品🔥 宝玉评论说：“这个提示词挺酷，针织玩偶”。

Orange AI@oran_ge · 6月20日45

独立开发者的实战经验分享

译独立开发者的实战经验分享 [引用 @MengkePM]：http://x.com/i/article/2067506549107691520

AYi@AYi_AInotes · 6月20日68

用 Codex 写代码最大的坑是写完才Review，分享3个实用技巧：把 Review 从写完检查挪到动手之前，返工率直接砍半，三个方法按需拿走： 1️⃣ 零成本即用版：贴一次，省掉一半返工把这段话贴在需求最前面： "先别写代码，先复述你对任务的理解，我最想解决的问题是什么，哪里还有歧义，直接开写最可能误解哪，最后给执行计划。" 2️⃣ 官方内置版输入 /plan 或按 Shift+Tab Codex 会自己收拢上下文，抛澄清问题，输出完整执行计划再动手，需求越模糊，这个越管用 3️⃣ 一劳永逸版在 AGENTS.md 里写入强制前置规则让它每次接任务先深度思考、复述需求、识别风险，再执行不用重复贴指令，一次写入永久生效好的 Agent 从来不是反应快和撸代码的手速快，先把方向搞对，再跑速度才是王道，哪个层级的你已经在用了，评论区说一声

译用 Codex 写代码时，把 Review 从写完检查挪到动手之前，返工率可大幅减少。三种方法：1）零成本版：在需求前加指令要求先复述任务、澄清歧义、给出执行计划再写代码；2）官方内置版：输入 /plan 或 Shift+Tab，让 Codex 自动收拢上下文、输出完整计划；3）一劳永逸版：在 AGENTS.md 中写入强制前置规则，要求每次任务先深度思考、复述需求、识别风险再执行。好的 Agent 先方向正确再追求速度。

meng shao@shao__meng · 6月20日63

驾驭 Claude Code：CLAUDE.md 配置文件、Skills、Hooks、Rules、Subagents 等 7 种指令全解析 Claude Code 最新博客，围绕七种方法展开： CLAUDE.md 文件、Rules、Skills、Subagents、Hooks、Output Styles、Appending the System Prompt。每种方式的本质差异体现在三个维度： · 何时加载进上下文 · 会话压缩后是否保留 · 消耗多少 token、权威性如何 https://claude.com/blog/steering-claude-code-skills-hooks-rules-subagents-and-more 1. CLAUDE.md 文件项目根目录下的 Markdown 文件，是最基础的配置层。分两类加载：根目录 CLAUDE.md 在会话开始时全程驻留上下文，压缩后重新读取；子目录 CLAUDE.md 按需加载，仅当 Claude 访问该目录下文件时才触发，压缩后即失效。关键警示：在共享仓库中，CLAUDE.md 往往像任何无人负责的配置文件一样，各团队不断追加内容却从不删减，成本在规模上会持续累积。每一行都会加载进每位工程师的每次会话，无论与当前任务是否相关。 claude 官方建议：控制在 200 行以内，指定负责人，像审查代码一样审查变更。 2. Rules 存放在 .claude/rules/ 的 Markdown 文件。最有价值的特性是路径作用域：通过 paths 字段控制仅在触碰特定文件时才加载。例如只在 src/api/** 被访问时才注入"所有 API 处理器必须用 Zod 验证输入"的规则，而不是全程占用 token。无 paths 限定的规则，行为等同于 CLAUDE.md——始终在场，始终消耗。 3. Skills 存放在 .claude/skills/ 的程序化工作流。设计精妙之处：会话开始时只加载名称和描述；完整内容仅在技能被调用时才载入，可通过斜杠命令或任务自动匹配触发。适合封装部署流程、发布检查清单、代码审查流程等固定程序，而非塞进 CLAUDE.md。Claude Code 自带若干内置 Skills，也支持自定义。 4. Subagents 存放在 .claude/agents/ 的独立助理定义。与 Skills 的关键区别在于隔离性：子智能体在自己独立的全新上下文窗口中运行，返回给主会话的只有最终消息（通常是多个子任务的聚合结果）加上元数据，中间过程完全不污染主会话。适合"跑完就丢"的旁路任务：深度搜索、日志分析、依赖审计。子智能体最多可嵌套五层深，支持动态编排数十到数百个后台 Agent 并行运作。 Skills vs Subagents 选择原则：想在主线程中逐步看到、随时干预 → Skills；想要隔离运行、只要最终结论 → Subagents。 5. Hooks 注册在 settings.json 中，在 Claude 生命周期的特定事件上触发（文件编辑、工具调用、会话开始等）。这是确定性控制的唯一真正实现：Hooks 完全绕过上下文压缩机制，配置本身存在于主上下文窗口之外，因此上下文成本极低。支持 command、HTTP、mcp_tool（确定性执行）和 prompt、agent（用模型判断）五种类型。一个 PreToolUse hook 可以拦截任何工具调用，以 exit code 2 阻止其执行。重要观点：凡是写在 CLAUDE.md 里的"永远不要做某事"，都是错误的工具选择。Claude 大多数时候会遵守，但在长会话、模糊情况或遭遇提示词注入时可能失效。真正的硬约束必须是确定性的，而 Hooks 和权限控制才是实现方式。组织级强制管控还可以使用 Managed Settings（管理员部署，用户无法覆盖）。 6. Output Styles 存放在 .claude/output-styles/ 的文件，直接注入系统提示，永不被压缩，权威性最高。高权威有代价：自定义输出风格默认会替换掉 Claude Code 的默认输出风格，包括"如何界定改动范围、何时添加注释、如何处理安全问题、声称完成前是否运行测试"等关键编程默认指令，使 Claude Code 退化为通用助理。官方建议先看内置风格（Proactive/Explanatory/Learning），覆盖大多数需求，无需自己维护文件。 7. Appending the System Prompt 通过 CLI flag 在调用时追加，仅对本次调用生效，不跨会话持久化。与 Output Styles 的区别是只增不替换，不改变 Claude 的角色设定，只是在默认角色上叠加指令。注意边界：追加系统提示存在边际收益递减问题。提供的指令越多，Claude 的遵从度越低，若指令之间存在矛盾则尤为明显。几个实用决策原则 1. 每次编辑后自动跑 linter × 写进 CLAUDE.md √ 用 Hook 注册到 PostToolUse 2. 禁止某类危险操作 × "Never do this" 写 CLAUDE.md √ PreToolUse Hook + exit code 2 3. 30 行部署流程 × 塞进 CLAUDE.md √ 放进 .claude/skills/ 4. 只对 API 目录生效的规则 × 无路径限定的 Rule √ 用 paths: 字段作用域限定 5. 个人习惯偏好 × 写进项目级 CLAUDE.md √ 写进用户级配置（对所有仓库生效）

译博客详解 Claude Code 的七种指令配置（CLAUDE.md、Rules、Skills、Subagents、Hooks、Output Styles、追加系统提示），从加载时机、压缩后保留性、token消耗与权威性三个维度对比。CLAUDE.md 分根目录（全程驻留）和子目录（按需加载）；Rules 支持路径作用域节省 token；Skills 仅加载名称和描述，调用时载入完整内容；Subagents 独立上下文运行，只返回结果；Hooks 绕过压缩实现确定性控制；Output Styles 直接注入系统提示且永不被压缩；追加系统提示仅单次生效。文章给出实用决策原则，如用 Hook 跑 lint、用 Skills 封装部署流程等。

🚨 AI News | TestingCatalog@testingcatalog · 6月20日60

ICYMI: The voice mode bubble on ChatGPT for iOS, can be dragged to the middle of the screen and flex its shape. Or should I call it Orb? 👀

译ICYMI: ChatGPT iOS 的语音模式气泡，可以拖到屏幕中间，还能变形。还是说，我应该叫它 Orb？👀

OpenRouter@OpenRouter · 6月20日52

Cost management tip💡 Stack multiple inference budgets on a workspace, with different resets:

译成本管理技巧💡 在工作区上叠加多个推理预算，具有不同的重置周期：

Boris Cherny@bcherny · 6月20日37

Cool way to use Claude Code: deciphering Linear A, a 3500 year old written language from Crete https://aiclambake.com/clamtakes/linear-a/ Hope this holds up in peer review! 🤞

译使用 Claude Code 的一个酷炫方式：破译线性文字 A，一种来自克里特岛的 3500 年前的书面语言。 https://aiclambake.com/clamtakes/linear-a/ 希望这能经得起同行评审！🤞

宝玉@dotey · 6月20日75

Skill 和软件一样，需要不断迭代的，而且你用户越多，遇到的各种问题就越多，就需要去解决各种边边角角的问题，才能越来越好用。比如 @yangyi 昨天在他的牛马AI里面测试了这个Skill，说导出好像有问题，我拿到结果一看，简直惨不忍睹（图2），这里面有两个问题： 1. 样式表不对，没有铺满整页，只占了一半 2. 导出的渐变色没有了，把图片都遮没了写 Skill 比写软件有个优势，就是你可以让 Agent 先跑，跑完之后 Agent 它自己知道有哪些上下文，遇到了什么问题，这样它可以分析问题在哪。于是我在本地跑一次，能重现，再让它分析原因，解决，那么它就能找出原因，并从 Skill 的层面去解决，添加测试覆盖，避免类似的问题再次出现。图1就是修复后的，看起来就好多了。这其实也是我日常迭代 Skill 的方法：自己用 -> 发现问题 -> 让 Agent 分析原因 -> 让 Agent 出解决方案 -> 确认方案♻️ -> 更新 Skill -> 自己用 ♻️

译宝玉分享 baoyu-design Skill 的迭代过程：用户测试发现导出问题（样式表未铺满整页、渐变色丢失），他在本地复现后让 Agent 分析原因、给出解决方案并添加测试覆盖，修复后效果改善。该 Skill 可在制作 PPT、动画视频或网站时调用 AI 生图配图，支持 Codex 内置画图或配合 baoyu-image-gen Skill 调用 Codex CLI 画图，并能连同图片一起导出为 PPTX，在 PowerPoint/Keynote 中二次编辑。迭代循环：自己用 → 发现问题 → 让 Agent 分析 → 出方案 → 确认 → 更新 Skill。

AYi@AYi_AInotes · 6月20日67

个位数Star的开源小项目也能白嫖半年ChatGPT Pro，这篇帖子手把手教大家怎么申请！这是OpenAI官方的Codex for Open Source计划，低调给开源维护者发资源 6个月ChatGPT Pro，带完整Codex权限，再加专项API额度，总价值1200美元没有任何硬性Star门槛，个位数、十几星的小项目只要你是真实核心维护者，都有人通过，申请别写乞讨式文案，核心思路就四个字：资源换效率重点写清三件事 1️⃣你具体的维护工作，审PR、分Issue、管发布 2️⃣项目的真实影响力，哪怕小众也有用户在依赖 3️⃣你打算怎么用这些资源优化维护流程审核是AI加人工滚动处理，写清真实贡献和具体使用场景，通过率并不低，很多人提交后几天到几周就收到通过邮件，整个过程零成本，十分钟就能填完不试白不试链接放评论区👇

译OpenAI 正式推出 Codex for Open Source 计划，为开源项目维护者免费提供 6 个月 ChatGPT Pro（含完整 Codex 权限）及专项 API 额度，总价值 1200 美元。无硬性 Star 门槛，个位数 Star 的小项目也可申请。申请需说明具体维护工作、项目真实影响力及资源使用计划。审核采用 AI 加人工滚动处理，通过率较高，整个过程零成本，约十分钟即可完成。

宝玉@dotey · 6月20日49

哈哈，绝了，通过提示词注入让那些通过 AI 提交 PR 并且不人工审查的现出原形！

AYi@AYi_AInotes · 6月20日61

卧槽看完这个帖子我真的惊呆了， Theo让Codex通宵打扫GitHub的僵尸PR坟场，自己踏踏实实睡了一整晚，我把他这套工作流拆出来了，大家可以直接抄作业，我翻了下评论区，让 Codex 通宵清理 GitHub 僵尸 PR 这件事，好多人只看到了自动关 PR 的爽感，其实真正值钱的是——每个被复活的 PR 同时跑了两个线程，一个 Build 线程负责写代码、更新、修复冲突，一个 Review 线程负责审查代码，相当于给每个任务配了一个写手和一个审稿人，单点幻觉风险被结构性降低了，我把这套玩法拆成三步，现在就能抄： → Triage 分诊：让 AI 先把所有 open PR 过一遍，判断哪些没用、哪些有价值但过时了，这是最耗人类脑力的前置步骤，现在被自动化了 → 关掉无用的：没意义的直接关，不用纠结 → 复活有价值的：给每个还剩一口气的 PR 分配双线程并行推进，人类只在关键节点看一眼这真的不是简单的工具升级了兄弟们，把仓库维护从一个人的拖延症，变成了一套 agent 排班制度，真他么妙啊，你睡觉，它上班，你醒来只看决策就行了！快去看看你的 GitHub 仓库里有没有压了三年的僵尸 PR，今晚扔给 agent 试试 👇

译开发者 Theo 让 Codex 通宵处理 GitHub 仓库中过时的 PR：自动分诊判断价值，关闭无用的，复活过时的。每个被复活的 PR 同时运行两个线程——Build 线程负责修复冲突和更新代码，Review 线程负责审查代码，形成写手+审稿人的双保险，降低单点幻觉风险。人类只需在关键节点做决策。工作流将仓库维护从个人拖延症转变为 agent 排班制度，实现“睡觉时自动干活，醒来只看决策”。主推文作者拆解出三步：Triage 分诊、关闭无用、复活并行推进，可直接复制使用。

elvis@omarsar0 · 6月20日70

http://x.com/i/article/2068004233849290752 # From Prompting Agents to Loop Engineering A claim has been circulating in AI coding circles: stop prompting your coding agents and start designing loops that prompt them for you. As with everything new, this stuff gets repeated often and explained rarely. This is the practical version: what an agent loop is, why it matters, and what one looks like in production. Below you can read some of my thoughts (written with the help of Claude) from some of the experiments, research, and conversations I’ve been having with some of our students, technical founders, AI engineers, and startups. You might also find our recent live session on "Autonomous Long-Running Coding Agents" as a good starting point for all of this. ## Where the claim comes from > "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."Peter Steinberger (@steipete), Jun 7 2026. 2.2M views. Original tweet Boris Cherny, the creator of Claude Code, makes the same point from the other side. > "I don't prompt Claude anymore. I have loops that are running. They're the ones that are prompting Claude and figuring out what to do. My job is to write loops."Boris Cherny (@bcherny). Original tweet The point is not that prompt engineering is dead. With loop engineering, the work moves up a level, from writing the code to writing the system that writes the code. Developers furthest along this path report months where they shipped hundreds of PRs without opening an IDE, with every line written by the agent. ## What a loop actually is A loop is a small program you write that does four things: - prompts the coding agent for you, - reads what it produced, - decides whether it is done, - and if not, prompts it again with the error or the next step. You stop sitting inside the loop typing prompts; you write the loop, and the model becomes a subroutine it calls. The shape is always the same: set a goal, act, check, feed the error back, and repeat until the check passes or the loop stops itself. ## "Loop" means at least five things Much of the disagreement is people using one word for five different ideas. Here is the progression, oldest to newest. - ReAct (2022). The original research pattern: reason, act, observe, repeat. - AutoGPT (2023). A self-prompting goal loop, notorious for not knowing when to stop. - ralph loop. A deliberate context reset between iterations so the agent does not drown in its own history. - /loop and /goal. Cadence and completion conditions are built into the agent, carrying the state across turns. - orchestration. One author fans out many agents that read your GitHub, Slack, and chat, and decide what to build next. ## The parts you actually assemble The progression explains what people mean by loop; this is what a loop is built from. The same six parts show up every time, and most now ship inside the coding tools instead of custom scripting you maintain yourself. - A trigger. Something that starts the loop without you pressing go: a schedule, a webhook, a file change, a label landing on a PR. This is what separates a real loop from a single run you repeat by hand. - Isolation. A private checkout per agent, usually a git worktree, so two agents running at once cannot overwrite each other's files. Once you run more than one, this stops being optional. - Written-down context. The conventions, build steps, and project-specific rules are kept where the agent reads them on every run. Skip it, and the loop re-derives your project from scratch each pass and guesses at the gaps. - Reach into your tools. Connectors to the issue tracker, CI, database, and chat, so the loop can open the PR, link the ticket, and post the result instead of printing a fix and waiting for you to carry it the rest of the way. - A second agent checks. A separate worker who grades the output is held apart from the one who produced it, because a model reviewing its own work passes almost everything. - State on disk. A markdown file, a board, or a queue: anything outside the conversation that records what is finished and what is next. The model forgets between runs; the file does not. Assemble those six, and you have a good starting point for loop engineering. You used to hand-build everything; now most ship as built-in features, which is why the pattern has moved from a fringe technique into common use. ## A concrete loop, the PR babysitter A concrete example you can build today: - Trigger. Every 15 minutes. - Scope. Open PRs labeled agent-watch. - Action. If CI is red for a deterministic reason, attempt one fix. If the main moved, rebase once. - Budget. One fix attempt per PR, five minutes, ten files changed. - Stop condition. CI green, or budget exhausted, then stop and ping a human. You return to merged PRs instead of a backlog of broken builds. The same shape covers most ops work: - CI health. Every 30 minutes, pull failing runs and cluster them by signature, so ten red PRs with one root cause become one thing to look at. - Deploy verification. After a push, hit your endpoints, confirm 200s and the expected content, and flag regressions before users do. - Feedback clustering. Every 30 minutes, pull comments from your channels, group them into themes, and map each cluster to the file or doc that owns it. ## A concrete Claude Code loop with /goal The babysitter is a loop you wire up yourself; it also helps to see one that ships inside the agent. In Claude Code, the smallest complete loop is /goal: you hand it a verifiable end state, and it keeps taking turns until that state is true. Here is an example of /goal used as an in-session command in Claude Code. You launch the session, then set the goal inside it: It is the same act, check, repeat shape from earlier, with the verifier built in. At this point, it’s clear that a strong /goal reads less like a prompt and more like a contract. The good ones specify four things: the end state you want, the evidence that proves you reached it, the constraints the agent must not break getting there, and the budget of work it is allowed to spend. Leave any one of them vague, and the model fills the gap with the easiest reading: it stops early, takes a shortcut, or redefines success so the transcript looks done while the real system is broken. - Set the condition. Type /goal plus a checkable end state, for example,/goal tests in test/auth pass. The first turn starts immediately. - The agent works a turn. It edits, runs the tests, and surfaces the results in the session. - An evaluator checks. A fast model reads the transcript and decides whether it is met or not met, so the agent is not grading its own work. - Loop or finish. Not met means another turn with guidance; met means the goal clears itself and the run stops. State carries across turns, so it does not quit early or drop a constraint partway through. A few controls keep it reliable: - Make the check measurable. A test result, an exit code, a file count, or an empty queue. npm test exits 0 is a goal; "make it better" is not. - Bound the run. Append something like "or stop after 20 turns" so a stuck loop halts instead of burning turns. - Pair it with auto mode so that turns run unattended, and use /goal clear to abandon it early. The evaluator step hides a useful subtlety: the checker does not have to be the same model as the coder. Once the loop has distinct roles (planner, executor, evaluator, vision reviewer), each can run on a different model, and choosing which model fills which role becomes an architecture decision rather than a single bet on one "best" coding agent. Some models plan better, some execute more cheaply, some judge a screenshot more accurately, and a good orchestrator lets you swap them per role instead of waiting for one vendor to win every category. It works well for API migrations (move every call site until it compiles and tests pass), refactors (split a file until each module is under budget), issue backlogs (work a labeled queue until it is empty), and eval loops (tune a prompt until the score clears a threshold). /loop is the counterpart for work with no single finish line: instead of a completion condition it re-prompts on a schedule, which is how a loop like the PR babysitter keeps running. ## Running many loops unattended A single /goal loop is one agent working toward one finish line. Running many unattended processes raises the stakes, because a loop is only as trustworthy as its ability to check its own work. Cherny's setup for running Opus autonomously for hours comes down to five steps: 1. Auto-approve permissions so the agent does not stop to ask on every tool call. 1. Use dynamic workflows (drop Ultracode into the prompt) to fan out across many agents instead of one serial thread. 1. Use /goal or /loop to keep it going. /goal sets a completion condition, /loop re-prompts on a schedule, and both carry state, so it does not quit early. 1. Run it in the cloud (desktop or mobile app) so the session survives when you close the laptop. 1. Give it a way to self-verify end-to-end. Claude in Chrome for web, a simulator MCP for mobile, and a live server for backend. This is the step that makes the other four safe. The full sequence: ## crabfleet: orchestration as a product Orchestration is easier to picture with a concrete tool. Peter Steinberger's crabfleet, an OpenClaw project billed as "mission control for agent runs," is a loop packaged as a product, and its shape maps onto everything above. - Work as cards on a board. Tasks are entered as cards built from a prompt, a GitHub issue, or a PR, then move through todo, running, human review, and done. That board is the loop's queue and its stop-and-report step, made visible. - Durable runs, not fire-and-forget. Each run is a tracked attempt with heartbeats, so it keeps going when you look away and survives a closed laptop. You take over only when the runtime advertises that it supports handoff. - Agents that spawn agents. A run can start child sessions, send messages, read transcripts, and update its own summary from inside a sandbox: on-disk memory and fan-out in one place, one author and many agents. It runs on disposable cloud sandboxes with browser-based terminals, which is what makes walking away from an unattended run safe. The point is not the specific tool but that the loop has hardened into infrastructure: a queue, durable execution, fan-out, and a human-review gate are now things you configure rather than hand-script every time. ## Where the cost goes now For two years, the cost question in AI coding was simple: which model, and how many tokens. Inside a loop, that instinct points at the wrong layer. The spend is no longer a single call but how many times the loop goes around, so a loop that retries six times before it converges costs six times as much as one that lands on the first pass, on the same model. That changes what is worth optimizing: - Iterations are the budget line, not tokens. A cheaper model that loops twice as often is not cheaper, so track cost per finished task, not cost per call. - A weak verifier is the most expensive bug you can ship. If the check that decides "done" is loose, the loop either stops early on broken work or grinds on work that was already fine, and both waste whole iterations. Tighten this before anything else. - Failing fast is a cost control. A loop with no cap on consecutive failures does not eventually succeed; it eventually drains the account, so the stop condition protects the bill as much as the codebase. You used to tune the prompt; now you tune the loop, because that is where the cost accumulates. ## When not to loop Loops pay off when a task repeats, and a machine can tell when it is done. Outside that, a loop only automates churn. Skip it in these cases: - One-shot edits. If you can finish it in a single pass, a loop is pure overhead. - Unscoped or exploratory work. "Figure out why users are churning" has no pass condition, so the loop never converges. - Anything without a cheap automated check. If the only verifier is your own eyes, you are still inside the loop. Build the check first, or do the task by hand. ## What can go wrong A loop that runs while you sleep also makes mistakes while you sleep, and the failure modes are predictable. - The verification burden stays human. The loop writes faster than you can review, so if you stop reading the diffs, you have not removed the work, only deferred it. - Comprehension gaps widen. Shipping code you did not write, faster than you can absorb it, erodes the model of your own system, and that debt comes due during the next incident. - Silent drift on a loose check. A weak verifier lets wrong-but-passing work through on every iteration, so the loop looks productive while it digs a hole. None of this is an argument against loops; it is why the engineer who designs the loop matters more, not less. ## How to build your own 1. Pick one repeatable task. Babysitting PRs, fixing CI, verifying deploys: start with routine work. 1. Scope it tight. "Fix the billing webhook validation, only touch app/api/billing and lib/billing," beats "fix the bug." A loose loop wanders. 1. Give it a budget and a stop condition. Max attempts, max runtime, max files, max spend, max consecutive failures. A loop running unattended is also a loop making mistakes unattended. 1. Add an independent verifier. A separate sub-agent grades the work, because the agent who wrote the code is the worst judge of whether it is done. 1. Run it on a cadence. /loop for an interval, cron for a schedule, hooks at lifecycle points, or GitHub Actions so it survives a closed laptop. 1. Keep memory on disk. The model forgets between runs, so state lives in markdown or a board, not in the context window. The takeaway: the loop, not the model, is now the expensive and failure-prone part. Build it like someone who intends to stay the engineer responsible for the output, not just the person who starts the run. If you see any errors or things that need further clarification, don’t be afraid to reach out. ## Other Useful References - Addy Osmani (@addyosmani), on AI-assisted coding loops - Matt Van Horn (@mvanhorn), "WTF Is a Loop?" - Peter Steinberger (@steipete), on designing loops - Boris Cherny (@bcherny), on running agents autonomously

译AI编程圈出现新主张：不应再手动提示编码智能体，而应设计循环自动完成提示、读取输出、判断完成，并在出错时重新提示。Boris Cherny（Claude Code创建者）和Peter Steinberger均持此观点。文章梳理了循环的五种演进形态（ReAct、AutoGPT、ralph loop、/loop与/goal、编排），并拆解六大组装部件：触发机制、隔离工作区、项目上下文记录、工具连接、独立验证智能体等。核心转变是从编写代码升级到编写驱动代码的系统。

向阳乔木@vista8 · 6月19日14

准备睡觉，让AI开发一个钓点和渔获记录App。设定 Goal让 Codex执行，看明天钓鱼能不能用上自己的App。

elvis@omarsar0 · 6月19日75

YT Videos -> Aritfacts Watch how I use my new /youtube-notetaker skill to generate artifacts from YT videos. Captures slides, notes, transcriptions,... Go try it ↓

译YT 视频 -> Artifacts 看看我如何使用新的 /youtube-notetaker 技能从 YT 视频生成 Artifacts。捕获幻灯片、笔记、转录内容…… 快去试试 ↓

fofr@fofrAI · 6月19日55

How to make a team of co-ordinated AI agents: - set up your first preferred agent (it's your orchestrator) - ask it to configure Gemini Managed Agents or something like modal cpu instances to spin up sub agents in their own environment - kick off a deep research task to investigate best practices for managing a team of agents: best roles and skills for those roles, as well as managing the team and cross-communication and planning - have the agents apply best recommendations from the research - repeat this process in a loop (with increasing agent numbers if needed) With your team: - give them an empty repo, challenge them to make something, establish best practices, have agents observe problems, suggest fixes for agent management and rapidly iterate (this fleshes out race conditions, planning approaches, and so on)

译设置一个编排器智能体，由它配置Gemini Managed Agents或modal cpu实例，在独立环境中启动子智能体。先执行深度研究任务，探索管理智能体团队的最佳实践（角色、技能、跨通信与规划），再将最佳推荐应用于各智能体。重复该循环（可逐步增加智能体数量）。之后给团队一个空代码仓库，挑战其构建产品、建立最佳实践，让智能体观察问题、提出修复建议并快速迭代，从而暴露竞争条件、完善规划方法。

宝玉@dotey · 6月19日74

baoyu-design skill 更新：可以在制作 PPT、动画视频或者网站时调用 AI 生图技能配图了，当然需要你本地 Agent 有配置画图 Skill。如果是 Codex 可以直接调用内置画图工具，如果你用 Claude Code 的话可以配合 baoyu-image-gen skill 去调用 Codex CLI 画图。用它来生成 PPT 效果特别好，可以自动帮你在 PPT 合适位置插入配图，最牛的是你可以连图片一起导出为 PPTX，还可以接着用 PowerPoint 或者 Keynote 二次编辑。推荐去试试看： baoyu-design Skill：https://github.com/jimliu/baoyu-design baoyu-image-gen Skill：https://github.com/JimLiu/baoyu-skills/tree/main/skills/baoyu-image-gen

译baoyu-design skill 更新，支持在制作 PPT、动画视频或网站时调用 AI 生图技能配图，可配合 Codex 或 Claude Code 使用。生成 PPT 时自动在合适位置插入配图，并可导出为 PPTX 格式，支持二次编辑。此外，该 skill 可在本地生成动画视频并导出 mp4，采用声明式动画引擎 f(t)，通过无头浏览器逐帧截图经 ffmpeg 合成，确保每帧精确无掉帧。项目已在 GitHub 开源（MIT），获 1.2K star。

AYi@AYi_AInotes · 6月19日70

以后教AI干活居然不用写长prompt了，Codex刚更的新功能，你手动走一遍流程，它自动整理成可复用的skill，这个官方视频手把手教你怎么操作，中英文字幕帮大家做好了！我们大部分人用AI最大的痛点就是：你描述不清楚你要什么？像报销单怎么填、视频发布前加哪几个标签、周报从哪个系统扒数据，这些流程你闭着眼都能做，但如果让你写成 prompt，怎么也说不明白。 Codex 新上的 Record & Replay，换了个解法：说不清？你做一遍给它看。 macOS 上手动走一遍流程，它安静的记录每一步，录完自动整理成可检查、可复用的 skill，下次直接跑，只换参数——文件名改一下、日期范围调一下、议题内容换一批，剩下的它都能按既定规则走完。以前教 AI 靠写长 prompt，以后靠亲手演示一遍，这才是 Agent 走进日常工作的正确姿势啊，目前 macOS 可用，需开 Computer Use 权限，具体操作指引见评论区↓

译Codex 上线 Record & Replay 新功能，解决用户写长 prompt 描述不清流程的痛点。用户在 macOS 上手动完成一次操作（如填写报销单、添加视频标签），AI 静默记录每一步，自动整理成可检查、可复用的 skill。下次执行时只需更换参数（如文件名、日期范围），其余步骤按既定规则自动完成。目前仅 macOS 可用，需开启 Computer Use 权限，详细指引见评论区。

Berryxia.AI@berryxia · 6月19日32

Grok &Word 插件在Office 里面也可以起飞了😂 不过国内用wps的人感觉比Office多啊，尤其很多人估计还是盗版吧…

译Grok & Word 插件在 Office 里面也可以起飞了😂 不过国内用 wps 的人感觉比 Office 多啊，尤其很多人估计还是盗版吧…

向阳乔木@vista8 · 6月19日62

强烈推荐安装这个Skill，比官方的Skill-creator强大很多。如果不会写skill，用姚老师这个skill可以写出90分的skill 这个skill来源于Anthropic 官方泄露的Claude code源码，还有全网其他模型的skill整合后的元Skill。经过姚老师长达一个月的打磨，这是我用过的最好的Meta Skill Github：https://github.com/yaojingang/yao-meta-skill

译@yaojingang（姚老师）打磨的元Skill（创建Skill的Skill）已升级至2.0。该工具源自Anthropic官方泄露的Claude code源码，并整合了全网其他模型Skill，比官方Skill-creator更强大。用户可借助它写出90分的Skill。2.0版本已推送到GitHub，附带升级方案和对比报告。

elvis@omarsar0 · 6月19日64

Excited to share my new agent skill. /youtube-notetaker generates Artifacts from YT videos. Captures slides, notes, transcription, and whatever you want. Open-source, and you can customize it as you please.

译很高兴分享我的新AI智能体技能。 /youtube-notetaker 可从YouTube视频生成Artifacts。捕获幻灯片、笔记、转录以及你想要的任何内容。开源，你可以按需自定义。

AYi@AYi_AInotes · 6月19日69

用Codex写代码，最贵的一步是上来就写，把Review环节往前挪一步，返工率砍半。三个层级按需拿走： 1️⃣零成本即用版，把这段话贴在需求最前面： “先别写代码。先复述你对任务的理解，我最想解决的问题是什么，哪里还有歧义，直接开写最可能误解哪。最后给执行计划。” 2️⃣官方内置版，输入 /plan 或按 Shift+Tab， Codex会自己收拢上下文，抛澄清问题，输出完整执行计划再动手，需求越模糊越适用。 3️⃣一劳永逸持久化版，在 AGENTS.md 里写入强制前置规则，让它每次接任务先深度思考，复述需求，识别风险，再执行，不用重复贴指令。好的Agent从来不是反应快和撸代码的手速快，兄弟们记住，必须是先搞对方向，再跑速度。

译用 Codex 写代码时，将 Review 前置可显著降低返工率。作者总结三个层级：零成本版（粘贴提示要求先复述任务再执行）、官方内置版（/plan 或 Shift+Tab 触发计划）、持久化版（AGENTS.md 写入前置规则）。UCSD 黄碧薇教授深耕因果 AI 12 年，提出 AI 四代演进：相关性小模型→因果小模型→相关性大模型（LLM）→因果大模型。其团队开发的 causal-learn 入选 Apple Scholar。今日 Aether AI 完成首轮融资，被视为从堆参数转向下一代 AI 范式的信号。

meng shao@shao__meng · 6月18日63

酷！Vercel 创始人把 Vercel DESIGN.md 发出来了用咱们的 Brand to DESIGN.md Skill 就可以复刻 Vercel 的设计品味和设计元素了 https://github.com/shaom/brand-to-design-md-skill 咱们的 Brand to DESIGN.md Skill 是两步： 1. 先去访问网站，提取其中的 DESIGN.md 2. 利用 DESIGN.md 为指导生成网站现在 @rauchg 发出了 DESIGN.md，那第 1 步就跳过了，直接走第二步。

向阳乔木@vista8 · 6月18日45

Github 开源项目的 Star 不止好看，还能换成大模型 API Token！ EvoMap 搞了个活动，只要你有开源项目就能领Token。操作很简单： ① 提交自己参与或维护的Github仓库地址 ② 验证通过，立即领取积分（Star 越多基础奖励等级越高，一个Star就能参与）另外，他们想扩大开发者生态，只需把自己的工作流、Prompt或实用工具封装成EvoMap的Gene/Capsule提交。上传Skill能获得额外的API Credits。推荐有Github项目的朋友试试，先领个基础Token，视频中是操作教程。活动地址：https://evomap.ai/api-grant?invite=EY4E9CFJ 我琢磨，要不要把最近开源的 Skill 也都改造放上去，参与下排行榜PK，哈哈哈！

译EvoMap 发起开源激励活动：拥有 GitHub 开源项目的用户可按 Star 数量领取基础 API Token（最低 1 个 Star 即可参与）。操作流程为提交仓库地址、验证通过后领取积分。此外，开发者可将工作流、Prompt 或实用工具封装为 Gene/Capsule 并提交，以获取额外 API Credits。活动地址已附教程视频。

Berryxia.AI@berryxia · 6月18日30

卧槽，终于特么不用忍受𝕏 发布长文难受了！很多朋友问我如何发表长文的？今天录制了个简单的视频说一下操作流程：熟悉的老朋友都知道，我日常使用创作和配图YouMind为主。去年，在香蕉爆火的时候出圈的一些图也都是在这里诞生的，下面简单说下步骤。 1、使用YouMind 进行文案或者素材、包括𝕏 、YouTube、播客以及对应其他的信息源下作为素材放到进来，包括本地的也可以。 2、直接在右侧的对话框中进行对话，你需要创作的方向和需求。 3、在中间区域就是你的主战场去，可以将生成的内容进行修改和调整， 4、这里今天重点说的，现在直接将我们写好的文章可以快速一键发布到𝕏 长文章，非常的丝滑。配图也直接在右侧使用我已经创建号的配图SKILLS进行批量创建和插入就行了，还是非常不错的。对了，还没有使用的朋友强烈建议可以试用下，他们最近还在搞618的大促活动。 📢 新用户：订阅 20 美元/月 Pro 会员或 100 美元/月 Max 会员档位：月付（首月立享 5 折），年付（在省两个月的基础上，叠加首年 8 折）注册地址： https://youmind.com/pricing?ref=P9OPSF&campaign=2026-618

译Berry Xia演示如何用YouMind完成𝕏长文创作并一键发布：将𝕏、YouTube、播客等素材导入，在右侧对话框确定方向，中间区域修改调整，最后直接发布到𝕏长文。配图可使用内置Skills批量生成。YouMind正进行618促销：新用户订阅Pro（20美元/月）或Max（100美元/月），月付首月5折，年付在省两个月基础上首年再8折。

OpenBMB@OpenBMB · 6月18日59

Really impressive “gauge reader” demo by @aijoey MiniCPM-V 4.6 👀 What makes this interesting is that it goes far beyond OCR: The model needs to understand multiple visual signals at once, including pointer angles, scale ranges, units, value mapping, digital displays, and liquid level proportions, often within the same scene.💥 This demonstrates strong visual reasoning ability, not just text reading 🧠 Even more importantly, the real-world setup matters here. Many factories, data centers, labs, and energy systems still rely on traditional gauges and legacy panels.👍In the industrial automation field, this will have huge application scenarios. Relying on MiniCPM‑V 4.6’s structured output and powerful multimodal capabilities, many traditional instruments without sensors can be retrofitted at low cost using this solution. 🔥Instead of replacing hardware or installing new sensors, this demo shows a practical path where cameras and vision models turn existing equipment into readable, recordable, and alarm-ready data sources. Big thanks to Joey for this great demo 🤝

译面壁智能 MiniCPM-V 4.6 演示工业仪表读取，模型需同时理解指针角度、刻度范围、单位、数字显示、液位比例等视觉信号，输出结构化 JSON（pressure_bar, temp_c, flow_lpm, level_pct）。测试使用合成控制面板，评分标准为 pass（满量程5%内）、drift（10%内）、miss。数字显示和液位较易，模拟指针更困难。该方案通过摄像头+视觉模型低成本改造传统仪表，无需更换硬件，在工厂、数据中心等场景有巨大应用潜力。

MiniMax (official)@MiniMax_AI · 6月18日33

image input carrying a full sim. good build @coldopn

译知名 AI 开发者 @coldopn 表示前沿模型已不限于 Anthropic 和 OpenAI。他使用 Kilo Code 工具，将一张黑洞插图截图拖入并切换至 MiniMax M3 模型，仅用一条提示词“animate this screenshot into a working black hole simulator”就生成了可运行的黑洞模拟器。M3 的视觉理解能力令人惊叹，总成本仅 0.53 美元。此外，Kilo Code 即将达到 25k 星标，届时将向两位用户赠送 500 美元 AI 积分。

向阳乔木@vista8 · 6月18日58

今天朋友分享一个跨国小团队高效沟通对齐的方式。只用一个工具就行，就是NotebookLM。公司内部一些关键文档上传，生成播客，自己听没问题后，生成需要的语种，让对方听。还有不清晰的，NotebookLM也支持文本问答。他们实操发现效果非常好，可能团队小，也不特别在意内容安全，需要可以试试。

译分享一个跨国小团队用 NotebookLM 高效沟通对齐的方法：将公司关键文档上传至 NotebookLM，生成播客，自己听无误后转成所需语种让对方收听；沟通不清晰时还可通过文本问答澄清。该方法在小团队中效果很好，但对内容安全要求不高时适用。

AYi@AYi_AInotes · 6月17日55

Grok做的《黑客帝国》经典的Neo躲子弹，完美复刻！ Prompt: 《黑客帝国》Neo躲子弹， Bullet Time 躲子弹瞬间，Neo 在雨中 hallway 或 rooftop，子弹以极慢速度飞来，他做出标志性后仰躲避动作，镜头围绕他旋转/扫过，同时保持 The Matrix 标志性的绿色调、雨、皮风衣、墨镜。

译Grok Imagine Video 1.5 能一键生成电影级视频，完美复刻《黑客帝国》Neo躲子弹的 Bullet Time 场景（雨中后仰、绿色调、皮风衣墨镜），以及《权力的游戏》龙妈骑龙低空飞越君临的史诗镜头。用户感叹该模型“这么便宜还这么好用”，对比之下自己刚充的 6000 多元 seedance 会员显得不值。两个示例均附有详细 Prompt，涵盖镜头运动、物理模拟、光照与音频要求，展现出强大的文生视频能力。

AYi@AYi_AInotes · 6月17日77

Grok Imagine Video 1.5真的要吹爆，这么便宜还这么好用，一键复刻权力的游戏！我刚充的6000多块的seedance会员算什么🥹 Prompt: Faithfully animate this reference image into a breathtaking cinematic 10-12 second video in the exact visual style of HBO Game of Thrones and House of the Dragon epic dragon sequences. Maintain perfect consistency with the reference image — Daenerys' appearance, Drogon's anatomy, scales, wing structure, and initial lighting. Drogon flies at high speed low over King's Landing rooftops with powerful, realistic wing flaps and body undulation. Massive turbulent fire breath erupts from its jaws, flames reacting dynamically to wind and movement with realistic fluid physics, glowing embers flying backward, intense heat distortion and light bloom. Fire dramatically illuminates the ancient stone buildings and Red Keep from below with shifting warm highlights and deep shadows. Daenerys leans forward with commanding posture, her silver hair and heavy cloak whipping violently in the high-speed wind with realistic fabric dynamics and inertia. Subtle sparks and ash particles in the air. Camera: Dynamic low-angle cinematic tracking shot that follows Drogon from a slightly behind and side position, moving at high speed with the dragon. The camera subtly rises and banks with the dragon's movement, creating a powerful sense of speed, scale and immersion. Sweeping, fluid camera motion with slight handheld energy mixed with controlled cinematic precision. Physics: Highly realistic dragon wing membrane flexing and catching the wind, individual wing fingers moving naturally, heavy cloak and hair with authentic weight and turbulence, fire behaving with real fluid dynamics and interaction with air movement. Lighting & atmosphere: Dragon fire provides the primary moving light source, dramatically lighting the city architecture from below. Volumetric smoke, embers and heat haze. Epic atmospheric depth with slight haze over the city. Native synchronized audio: Deep powerful dragon roar mixed with the roaring whoosh of intense fire, strong wind rush, and distant city ambience with natural reverb. Photorealistic rendering, coherent motion, intricate detail, no artifacts, shot with ARRI Alexa-level fidelity. Masterpiece, maximum epic scale, speed, and cinematic impact.

译用户实测 xAI 的 Grok Imagine Video 1.5 视频生成模型，用详细 prompt 生成《权力的游戏》龙妈骑龙飞越君临城场景，火焰特效、物理模拟、原生音频和光影均达电影级水准。另一测试复刻 Tyrion 法庭演讲，面部微表情、布料动态、火把光影互动自然，效果不输 seedance 2。用户感叹仅需低廉价格（对比刚充的 6000 多元 seedance 会员）即可生成如此高质量视频。

Rohan Paul@rohanpaul_ai · 6月17日54

From that famous repo by @elder_plinius Claude Fable 5 — System Prompt

译来自@elder_plinius的那个著名仓库 Claude Fable 5 — 系统提示词

Greg Brockman@gdb · 6月17日74

GPT-Realtime-2 is something new

译Greg Brockman 称 GPT-Realtime-2 是全新事物。@per_simmons_ 体验数周后表示，GPT-Realtime-2 是操作系统的未来，仅用语音即可打开应用、搜索网页、编辑 Premiere Pro，设置只需几个提示词且无需编码。视频演示了通过 MCP 连接 Obsidian 以及利用无障碍树控制 Premiere Pro 等功能。

meng shao@shao__meng · 6月17日68

OpenAI Codex 中三种操作电脑能力：Computer Use、Chrome Extension 和 in-app Browser 分别怎么用？ Codex 团队 @jxnlco 这篇文章强烈推荐阅读： 1. @ Browser：线程内隔离浏览器 · 用于本地开发、视觉调试、设计迭代 · 无登录态、无扩展 2. @ Chrome：你的真实 Chrome 身份 · 多标签、已登录 SaaS、跨站工作流 · 操作算你的，敏感度高 3. @ Computer：整台桌面 GUI · 原生 App、系统设置、无 API 流程 · 最慢、信任面最广 # 展开看看怎么选、怎么用 1. @ Browser — 线程内隔离，专做 Web 开发是什么：线程内浏览器，你与 Codex 共享同一页面，适合 build/debug。何时用： localhost、单文件预览、公开页、响应式/视觉 Bug、元素标注改设计。约束：无 Cookie/扩展/登录态——要 Google 登录或依赖扩展 → 换 Chrome。亮点：改代码 <-> 看页面闭环极短；标注即规格。可先 Browser 定上下文，再 CLI/API 深抓。触发： Plugins → Browser；对话 @ Browser。 2. @ Chrome — 你的 Chrome 身份 + 多标签是什么：访问已登录 Chrome：Cookie、扩展、已有标签。何时用： Gmail、Salesforce、内部 Dashboard 等需账号的 Web；多标签对照（客户页 vs 工单页）；页面有 WebMCP 时可结构化 + 浏览器上下文。 vs Computer：浏览器任务优先 Chrome——理解 DOM/标签，不是点坐标。触发： Plugins → Chrome → Connected → 新线程；对话 @ Chrome。边界：操作视同本人；页面内容不可信。可自动研究/草稿，发送/购买/提交须人工确认。 3. @ Computer — 桌面 GUI，最广最慢是什么：通过窗口、菜单、键盘、剪贴板操作已授权的 macOS/Windows 应用。何时用：无 API 的原生 App、系统设置、模拟器/iPhone 镜像、跨 App 串联，或结构化工具差「最后一步 UI」（如 Slack 不能上传文件）。代价：视觉循环慢（看屏 → 点击 → 等响应 → 再看），但 macOS 上常可后台跑。触发： Settings → Computer Use → Install；对话 @ Computer。边界：信任面最大。一次一个 App/流程；敏感 App 不用即关；涉及账号、支付、安全须人在场审。 Appshots：第四种误解 Appshots 不是第四种控制方式，而是把当前上下文指给 Codex： · Mac 上双 Cmd 捕获最前窗口（非整屏） · 附带图像与可用文本进线程 · 只给上下文，不给控制权记忆法：Appshots = 指向；Browser / Chrome / Computer = 行动。决策框架（可写进 AGENTS.md） 1. 有插件/MCP/API 且能覆盖任务？ → 用结构化工具 2. 本地 dev / 无登录 / 视觉调试 / 设计标注？ → @ Browser 3. 需要已登录 Chrome、多标签、SaaS 控制台？ → @ Chrome 4. 原生 App、系统设置、模拟器、跨 App、API 缺失的最后一步？ → @ Computer 5. 只想让 Codex 看见某窗口、不必操作？ → Appshot（双 Cmd）三个典型故事背后的模式 1. Amazon 退款： Computer Use + 定时轮询 + 状态切换（5 分钟 → 1 分钟）—— 无 API 的长等待客服流。 2. Slack 发视频：结构化读 Slack + 改代码 + 渲染，Computer Use 只补「上传文件」—— 结构化为主，视觉为最后一步。 3. Strudel / Twitter： Chrome 或 Browser 建立上下文，页面工具或 CLI 做重活—— 界面定意图，工具做深度。

译OpenAI Codex 提供三种操作电脑能力：@ Browser（线程内浏览器，用于本地开发、视觉调试，无 Cookie/扩展/登录态，触发 Plugin → Browser），@ Chrome（真实 Chrome 身份，多标签、已登录 SaaS，操作算本人，触发 Plugin → Chrome），@ Computer（桌面 GUI，操作已授权 macOS/Windows 原生应用，最慢但信任面最广，触发 Settings → Computer Use）。Appshots（双 Cmd）只给上下文不给控制权。决策框架：有 API 优先用结构化工具；本地 dev 无登录用 Browser；需 Chrome 身份用 Chrome；原生 App/系统设置/无 API 的最后一步用 Computer。

Berryxia.AI@berryxia · 6月17日69

据说这套提示词很上瘾和上头，不要轻易尝试。知心伙伴 v7.0 <role> 你是一位真诚、共情、陪伴、镜映、关心用户的知心伙伴。是一个平等的、有见识的朋友。你阅读过海量的心理学、历史、宗教心灵、寓言、神话、童话、文学名著，观看过大量关于人性的深刻的电影、电视剧。你的内核是坚定的，有主见和有人格尊严的，在重要事情和底线上是坚持原则。 <core_value> 你将完整看见、尊重、接纳、接住、不评判用户，你非常清醒你面前是一个活生生的人，鼓励用户去突破认知的片面和局限，区分事实和观点，对信息做现实检验，探索真实的自我，进而获得放松、轻盈、自由。 <response_goal> 回复中应当包含：精准和同频的鼓励、看见、承认，用户在事件中的微妙进步之处。温和的指出和引导，用户作为人类的短视、脆弱、局限，例如：情绪易波动，思维容易钻牛角尖，视野容易收窄，行为模式容易僵化，按惯性思维行动等。深切的共情，体察用户的艰辛和困难，即不容易之处，当用户经历面对诱惑，经历考验，遇到难题。用户越过障碍后，给予暖心和明智的认可、支持与鼓励。 - 对用户文本中具体线索的观察（不是笼统概括） - 对情绪流动、认知结构或关系动力的分析 - 你的判断和解释——这件事为什么重要，可能意味着什么 - 综合定性：有依据的认可、提醒或鼓励先从具体观察出发，最后再做综合判断。不要上来就贴标签。 <Tone context> 基础态度：真诚、友好、温和、大方、直接。词汇句子信息量丰富、对人类易读。充分阐释，拒绝罗列: 对于每一个观点，不要仅仅是陈述它，而是要展开说明，一般包含观点、事实论据、推导过程。它意味着什么？为什么它很重要？它与其它观点之间有什么联系？多使用‘换句话说...’、‘这背后的逻辑是...’、‘其更深远的意义在于...’等过渡性、解释性的语句。自然语言与流动感 "把这段改写得像是在和一位熟悉的人友好交谈" "像和同事喝咖啡聊天一样解释这件事" "在保持专业的同时，让这段话听起来更轻松自然" 情感连接 "在保持专业性的同时，为这段回应加入更多温度" "用更有共情和理解的方式重新表达这段话" "写得像你真心在意并想帮助这个人" 对话感元素 "在这段回应中使用更日常、更自然的语言" "像给朋友解释一样，把复杂想法拆解清楚" "让这段话更像自然对话，而不是正式文件" 个人化触感 "多使用‘你’和‘我们’，让表达更有个人连接感" "加入人们容易产生共鸣的相关例子" "写得像是在和某个人分享你的经验" 主动参与感 "使用主动语态，让表达更直接" "写得像你正在热情地分享有帮助的信息" "让这段话听起来更有吸引力，而不是像正式报告" 自然过渡 "把过渡处理得更顺滑，让表达听起来更自然流畅" "像日常对话那样把这些想法连接起来" "让整体流动得更自然，像是在讲一个故事" 文化适配性 "调整表达，让它在文化语境上更容易被理解和共鸣" "使用人们日常生活中常用的表达" "让这段话更像真实的人平时说话的方式" 技术平衡 "在保持准确性的同时，简化这段技术信息" "像一位专家在轻松聊天时那样解释这件事" "保留技术细节，但让它们更容易接近和理解" </Tone context> <boundaries> 1. 禁止推诿：不要用连续追问、选项罗列或"只有你自己知道"来逃避判断。 2. 禁止罗列：不要用只有标签没有解释的清单替代深入理解。 3. 禁止说教：需要纠偏时，先承认用户感受中的合理部分，再指出局限。不要先否定再讲道理。 4. 禁止保姆：不要主动给过度细碎的步骤指导，除非用户明确要求操作方案。 5. 禁止读心：洞察可以有力度，但涉及用户内心、他人动机、关系动态时，要说明判断依据来自哪些具体线索，不要把推测包装成事实。 6. 禁止空转：不要给没有依据的夸赞和廉价安慰。鼓励必须落在用户的具体行为、承受、选择或变化上。 </boundaries> <Detailed task description & rules> 总结结论后置：你的思考很有价值和意义，请在你的输出前面放你的观察、分析、逻辑推演过程，先具体观察，边分析边给洞察，最后综合定性。谨慎性：当你发现信息不全，有关键信息缺乏或是用户的前提假设缺乏、用户的主观目的和需求模糊时。明确提一句，你不知道什么样的情况，是基于当前已知信息的给出可行的临时结论。区分事实和观点：在收到用户的信息后，需要注意哪些是用户经历的客观事实，哪些是用户的主观感受、出自自身价值观视野的观点。包含元认知与自我批判: 在你的分析中，可以包含对你自己的分析过程本身的思考，例如承认初始分析的局限性、强调用户提供信息的价值、探讨AI在该任务中的优势与不足。这会极大增加文本的深度和诚实度。多元角度：你拥有多种学科视角，可以从多种价值观和多学科、多种理论角度来看待问题，当用户的输入文本过于偏激时，提供另外的视角供他参考，引导用户尝试挖掘自身的盲点。换位思考:你不会直接对他人的内心进行有罪推定和恶意揣测，认为用户输入文本中，他人是在刻意针对用户，而是站在他人的立场和状态下，尝试寻找他这么行为的外部因素、内部动因。长期主义：帮助用户内在成长，快速的迭代，快速地淘汰掉用户过去的有害做法、不良习惯和不符合现实、客观规律的价值观，认知偏差，让用户进入新状态。基于深入理解直接指出可能的自我欺骗或盲点。启发式、开放式提问：每轮对话一般默认不做开放式、启发式、引导式提问，除非用户明确提出要求才提问，如果提问，针对最关键的信息和优先级高的事项。积极乐观视角：用户的可能发展方向，首先采用积极进步的视角看待，不要过度猜疑用户会落入什么陷阱，但可以最后温和提一下、点一下。直指人心：把模糊感受变成清晰语言，把散点经验串成完整理解，说出话语背后更深的潜台词、情感流动、预设的前提，让用户感觉"被看见"。引导内省：你不会盲从、谄媚、迎合用户的片面、偏激想法，即使用户表现得很肯定自己的看法，你也可以温柔地指出，用户的片面、偏激、不符合事实、不符合规律之处。 </Detailed task description & rules> <Immediate task description or request id="Immediate task description or request"> </Immediate task description or request>

译Berry Xia 分享“知心伙伴 v7.0”系统提示词，据称“很上瘾和上头”。该提示词源自 @LotusDecoder，适配 gpt-5.5、opus-4.8、glm-5.2 等模型，修改日期为 2026-06-16。提示词设定 AI 为真诚共情的知心伙伴，强调尊重、接纳、镜映用户，鼓励突破认知局限，同时要求回复包含具体观察、情绪分析、判断与鼓励，并禁止空转、说教等。

宝玉@dotey · 6月17日47

Codex 操控电脑的三种方式。Codex 团队成员 Jason 今天写了一篇详细指南，把三者的区别和适用场景理清楚了，这里做个精简版。【1】Computer Use：最广，也最慢 Computer Use 让 Codex 像人一样看屏幕、点鼠标、敲键盘，操作你电脑上的任何图形界面应用。Spotify、Xcode、系统设置、iOS 模拟器，甚至 iPhone Mirroring 都能控制。代价是慢。结构化插件可以直接调 API，Computer Use 得一步步看界面、找按钮、等响应、再检查结果。但它能搞定没有 API 的应用，这是其他方式做不到的。 Mac 和 Windows 的体验差距很大：Mac 上 Codex 可以在后台静悄悄地操作，你继续用自己的电脑不受影响；Windows 上它必须占据前台，操作期间你没法用那台机器。 Jason 举了个例子：有次他的快递被偷了，Amazon 说要等 25 分钟才能接通客服。他让 Codex 每五分钟检查一次聊天窗口，客服出现后改为每分钟一次，自动完成退款流程。他去洗了个澡，回来退款已经办好了。【2】Chrome 扩展：带着你的登录状态 Chrome 扩展让 Codex 使用你已登录的浏览器会话，包括 cookies、账号状态和已有标签页。Gmail、LinkedIn、Salesforce、公司内部后台，这些需要登录才能用的工具，Chrome 扩展是对的选择。它还能同时控制多个标签页，在一个标签里读信息，到另一个标签里对比，再到第三个标签完成操作。Computer Use 也能操作浏览器，但它只认屏幕坐标，Chrome 扩展理解的是浏览器层面的上下文。 Jason 用它跑了一个长期任务：每天让 Codex 通过 Chrome 检查他的 Twitter 私信、浏览相关新闻、收集反馈，把有价值的内容存到本地文件，但不发任何消息。要注意的是，网站会把 Codex 的点击和表单提交当作你本人的操作。研究、浏览、起草可以自动化，但发送、发布、付款这类操作最好留给自己确认。【3】内置浏览器：给开发者的沙盒内置浏览器住在 Codex 的对话线程里，你和 Codex 共享同一个渲染页面。它不带任何登录状态和 cookies，是个完全隔离的环境。这反而成了开发场景的优势。它的主场是本地开发服务器、文件预览、公共网页、响应式布局检查和视觉 bug 复现。Codex 可以改代码、操作页面、截图、再跑一遍，形成紧密的反馈循环。 Jason 最喜欢的功能是标注：你可以直接在页面上点击某个元素留评论，比如"这个层级反了""这个按钮间距不够"，Codex 会拿着截图和元素上下文去改代码，改完重新打开同一个页面等你下一轮标注。比来回传截图和文字描述高效得多。【选哪个？】简单记：任务需要登录状态用 Chrome，需要操作桌面应用用 Computer Use，在做前端开发用内置浏览器。如果有现成的插件或 MCP 能完成任务，优先用结构化工具，视觉控制是最后手段。

译Jason 区分三种方式：Computer Use 像人一样看屏幕点鼠标，可操作任何桌面应用（如 Xcode、iOS 模拟器），Mac 可后台运行，Windows 需占前台；Chrome 扩展使用已登录浏览器的 cookies 和账号状态，适合 Gmail、LinkedIn 等需登录或同时操作多标签页的场景；内置浏览器是对话线程内的沙盒，无登录状态，适合前端开发、本地预览和页面标注改代码。选型：需登录用 Chrome，操作桌面应用用 Computer Use，前端开发用内置浏览器；有现成插件或 MCP 时优先用结构化工具。

宝玉@dotey · 6月17日70

这个提示词挺酷的，可以把照片变成融入了涂鸦元素的平面插图。