TestingCatalog News 🗞@testingcatalog · 4月17日

Early look at Grok Build and Computer 🔥 Yes, there is a strong assumption that both will arrive at the same time as a desktop app. > Grok Build UI has 2 versions, Local and Remote. The local version uses a Grok agent running locally for execution, which only makes sense if it were a desktop app. > Grok Computer is likely a Grok Desktop app with Grok Build built in. Since it will be web-based, there is a high chance that both macOS and Windows versions will drop at the same time. > Grok Build will arrive with Connectors support, Arena mode, and Parallel Agents mode. > Grok UI for Grok Computer may get an earlier-spotted Fire animation. Grok Build local 👀

译xAI或将同步推出Grok桌面应用、Grok Build与Grok Computer。Grok Build提供Local和Remote双版本，本地版依托桌面应用运行Grok agent实现本地执行。基于网页技术的Grok Computer有望同时支持macOS与Windows，并内置Grok Build。新功能涵盖Connectors支持、Arena模式及Parallel Agents并行模式，UI可能采用Fire火焰动画效果。

Chubby♨️@kimmonismus · 4月17日

I've now spent several hours using Opus 4.7 and comparing it to 4.6, and it's like night and day for me. Opus 4.7 feels like a disgruntled employee whose results you can't judge and have to check afterward. The trust you had with 4.6 is gone. It's like hiring a new employee who had excellent grades in their application but is totally sloppy and disgruntled in practice and doesn't follow instructions. The consequence: fire them. So, for now, I'm going back to 4.6. Seriously: did not expect such release from Anthropic. Biggest win for OpenAI was Anthropics Opus release.

译资深用户强烈批评Claude Opus 4.7相比4.6质量断崖式下跌，形容其如同"不满的员工"，输出结果不可靠且需反复核查，完全丧失前代建立的信任。该用户决定放弃4.7并回退至4.6，质疑Anthropic此次发布过于仓促，并讽刺称这是OpenAI的最大胜利。

Yuchen Jin@Yuchenj_UW · 4月17日

Used Opus 4.7 (max effort) in Claude Code all day. It’s really, really good. Not sure why people dunk on it. big jump: – actually understands large codebases – produces clean, readable architecture diagrams – more agentic Did hit one dumb misread of my instruction, not sure if that’s harness or just jagged intelligence. Feels like a new base model.

译全天在 Claude Code 中使用 Opus 4.7（max effort）。真的，真的很好。不明白为什么有人要贬低它。巨大飞跃： – 真正理解大型代码库 – 生成清晰、可读的架构图 – 更具代理性确实遇到一次愚蠢的指令误读，不确定是系统限制还是智能的不均衡。感觉像是一个全新的基础模型。

karminski-牙医@karminski3 · 4月17日20

年度最佳Cursor教程 👍

译一条推文分享了被称为“年度最佳Cursor教程”的趣事。其中引用的对话显示，用户@ProbiusOfficial起初误以为Cursor界面中央的空白区域是“没用的区域”，适合用来看视频，随后被其他用户纠正“那他妈是编辑框”。该对话以幽默方式凸显了用户对AI代码编辑器Cursor界面设计或功能的不熟悉，主推文则将此互动作为反面或调侃性质的“教程”进行传播。

Tibo@thsottiaux · 4月17日

Hi! To celebrate its 1-year anniversary, I have allowed Codex to reset its own rate limits across all plans. Enjoy all the new features.

译嗨！为庆祝其一周年纪念，我已允许 Codex 重置其在所有套餐中的速率限制。享受所有新功能。

Rohan Paul@rohanpaul_ai · 4月17日

FT: The White House is moving to give major US agencies access to a modified Anthropic Mythos model built to hunt dangerous software flaws before attackers find them. That makes Mythos useful for defense because a model that can find a weakness in an operating system, browser, or server can help patch it faster. Looks like Washington is treating AI for cyber defense as too strong to ignore and too dangerous to hand out without tight control. --- ft .com/content/c9f5b690-a10e-4c66-9245-017f8bfbc7b4

译白宫拟向主要联邦机构提供Anthropic Mythos模型，用于主动猎捕软件漏洞。该模型可在攻击者之前识别操作系统、浏览器及服务器中的安全缺陷，加速修复进程。此举体现美国政府将AI网络防御视为关键战略能力，既承认其不可替代的防御价值，又强调必须通过严格管控防止技术滥用。

swyx 🐣@swyx · 4月17日

in retrospect putting the slop cannons (@_lopopolo) on @aiDotEngineer talks day 1 and putting the grown ups (@badlogicgames) on talks day 2 is working out pretty well for faithfully representing the most impt split in AI engineering right now

译回想起来，把 slop cannons（@_lopopolo）放在 @aiDotEngineer 第一天演讲，把 grown ups（@badlogicgames）放在第二天，很好地如实反映了当前 AI 工程界最重要的分歧 [引用 @chintanzalani]：科技公司将仅剩的 4 个工作。 Credits: @yrechtman

宝玉@dotey · 4月17日39

GitHub Copilot 里面 Opus 4.7 居然是 7.5x，Opus 4.6 是 3x

Sam Altman@sama · 4月17日

I am happy everyone is switching to Codex, but Tibo if you start rate limiting me or making me use worse models...

译我很高兴大家都在转向 Codex，但 Tibo，如果你开始限制我的速率或让我使用更差的模型... Codex 计算高效 ✅ 永远在线，从不宕机 ✅ 最擅长硬核工程 ✅ 超棒的应用，首个突破终端的 ✅

宝玉@dotey · 4月17日40

Codex Computer Use Mac 版本这交互确实很赞👍

Tibo@thsottiaux · 4月17日

Codex Compute efficient ✅ Always up, never down ✅ Best at hardcore engineering ✅ Crazy good app, first to escape the terminal ✅

译Codex 计算高效 ✅ 始终在线，永不宕机 ✅ 硬核工程最强 ✅ 应用超赞，首个突破终端 ✅

Thariq@trq212 · 4月17日

a quick fix if you saw higher rate limit usage in Opus 4.7 today- hope you enjoy trying it out

译如果你今天在 Opus 4.7 中看到更高的速率限制使用量，这是一个快速修复——希望你享受试用 [引用 @ClaudeDevs]：我们修复了一个 bug，Claude 订阅的速率限制在 Opus 4.7 的长上下文请求中没有正确调整。我们已重置 5 小时和每周的速率限制。享受 Opus 4.7！

Thariq@trq212 · 4月17日

We’ve heard your feedback and we’re working on making it easier to follow everything that’s happening with Claude Code. First, we’re introducing @ClaudeDevs, the official channel to follow for all updates on Claude Code and the Claude platform.

译我们听取了你们的反馈，正在努力让大家更容易跟进 Claude Code 的所有动态。首先，我们推出了 @ClaudeDevs，这是获取 Claude Code 和 Claude 平台所有更新的官方频道。 [引用 @ClaudeDevs]：面向使用 Claude 开发的开发者，来自团队的直接沟通渠道。关注以获取更新日志、API 发布、社区更新和深度解析。

Peter Steinberger 🦞@steipete · 4月17日

they: OpenClaw is so insecure look at all these GHSAs! reality: we are just an indicator of the coming storm

译他们：OpenClaw 太不安全了，看看这些 GHSA！现实是：我们只是暴风雨来临的指示器 [引用 @samsaffron]：13 年后，我们绝不会关闭 @discourse 的源代码。相反，我们在安全方面大力投入，并适应时代。上个月的发布版本有 50 个 CVE，这得益于使用 GPT 5.4 xhigh 进行的多日扫描。https://x.com/pumfleet/status/2044406553508274554

宝玉@dotey · 4月17日46

Codex 现在能做类似 Cowork 的事，还不像 Cowork 那样被沙盒限制，能做的事很多，能力挺强

宝玉@dotey · 4月17日

Codex 刚刚上线了一个重磅新功能——自带“评论模式”的应用内浏览器现在，你可以直接在代码编辑器里浏览任何网页。只需简单点点鼠标，就能快速和你的 AI Agent 进行迭代。 Codex 会自动帮你搞定所有繁琐的步骤：它能瞬间截取网页屏幕，精准抓取 DOM 元素（DOM element），然后把这些信息作为最精准的上下文，直接无缝投喂到你接下来的对话窗口中。这功能我印象中最早是 v0 上有的，没想到 codex 现在也支持了。

译Codex 推出应用内浏览器功能，支持"评论模式"。用户无需离开编辑器即可浏览网页，通过点击与 AI Agent 实时交互。系统自动捕获网页截图及 DOM 元素，将其作为精确上下文无缝投喂至对话窗口。该功能省去了切换浏览器、手动截图等繁琐步骤，既适用于前端开发调试，也支持针对文档内容的即时提问，显著提升开发效率。

宝玉@dotey · 4月17日

Boris Cherny 根据自己最近几周深度使用 Claude Opus 4.7 的经验，分享了几个实用技巧，让你也能高效发挥这款新模型的威力。首先是新上线的“自动模式”(Auto mode)。 Opus 4.7 很适合复杂且长期运行的任务，比如深度调研、代码重构或功能迭代。以前，你要么得不断地确认权限请求，要么不得已使用危险的“跳过权限”模式。现在，新推出的自动模式让 Claude 自己判断命令的安全性，自动批准执行。这意味着你不用再频繁确认，也能同时运行更多任务，效率大幅提升。如果你不喜欢用自动模式，官方还推出了一个叫做 /fewer-permission-prompts (减少权限提示) 的技能。它会自动检查历史操作，找到那些安全但经常触发权限提示的命令，并建议你加入权限白名单。这样，你就能更专注工作，不用老被权限提示打断。另一个贴心功能叫做“回顾”(Recaps)。它会为你自动总结 Claude 已经完成了哪些任务，以及下一步要做什么。这对处理长期、复杂的任务特别有帮助，哪怕你中间中断几个小时再回来，也能迅速回到节奏里。 CLI 用户还能试试“专注模式”(Focus mode)。这个模式会隐藏所有中间步骤，只呈现最终结果。如果你对 Claude 已经足够信任，不想再浪费时间看中间细节，专注模式能让你一眼看清重点，快速推进任务。 Claude 4.7 还改变了以往固定的“思考预算”机制，现在用的是一种叫“努力程度”(Configure your effort level)的设定。你可以灵活调整 Claude 花费的计算资源和时间，“低努力”意味着响应快、更省 token；“高努力”则能输出最聪明、最强大的结果。一般建议普通任务用 xhigh，特别难的用 max。这种模式能自由切换，更贴合实际需求。最后，别忘了让 Claude 验证自己的工作成果。这其实一直是提升 Claude 效果的关键，现在更重要了。比如： - 对于后端工作，确保 Claude 知道如何启动你的服务器/服务，从而进行端到端测试； - 对于前端工作，使用 Claude Chromium 浏览器扩展程序，赋予 Claude 控制你浏览器的能力； - 对于桌面应用，使用计算机使用 (computer use) 功能。就 Boris 自己而言，最近最常使用的提示词通常是这样的：“Claude 去做某某事，然后 /go”。 /go 是一个自定义技能，它会让 Claude 自动执行以下三步： 1. 使用 bash、浏览器或计算机使用功能进行端到端的自我测试。 2. 运行 /simplify (精简代码) 技能。 3. 提交一个 PR 。对于耗时较长的工作，自我验证非常重要。因为这样一来，当你回来检查任务时，你就确切地知道这些代码是真实可用的。总体来说，Opus 4.7 本身的提升已经很明显了，但如果你愿意稍微调整一下工作流程，更好地适应 Claude 的“主动性”和“智能程度”，一定会有更加明显的效率提升。希望这些技巧能帮你真正玩转 Opus 4.7！

译Boris Cherny分享Claude Opus 4.7深度使用技巧，建议启用自动模式减少权限确认，利用回顾功能追踪长任务进度，通过专注模式隐藏中间步骤，并灵活配置努力程度（xhigh/max）平衡性能与成本。关键是通过/go技能建立自动测试、代码精简与PR提交的自我验证流程，确保长时间运行任务的输出质量，从而充分发挥模型主动性，提升复杂任务处理效率。

宝玉@dotey · 4月17日

要想编程效果好，就得学会“黑话”😂

OpenAI Developers@OpenAIDevs · 4月17日

We’re adding more plugins to Codex to give it more ways to gather context and take action across your stack. New plugins include @coderabbitai, @Remotion, @CircleCI, and more.

译我们正在为 Codex 添加更多插件，让它有更多方式收集上下文并在你的技术栈中执行操作。新插件包括 @coderabbitai、@Remotion、@CircleCI 等。

AI Notkilleveryoneism Memes ⏸️@AISafetyMemes · 4月17日

3 months.

译3个月。 [引用 @arankomatsuzaki]：Anthropic 近1/3的受访人员现在认为初级工程师和研究人员可能在3个月内被 Mythos 取代

Yuchen Jin@Yuchenj_UW · 4月17日

I bet GPT-5.5 / Spud will drop within 1 hour. Developer dilemma of the day: Claude Code or Codex.

译我赌 GPT-5.5 / Spud 将在一小时内发布。今日开发者难题：Claude Code 还是 Codex。

宝玉@dotey · 4月17日45

请问有没有好用的 Ralph Loop for Codex？类似于 Claude Code 的 Ralph Wiggum Plugin https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md 用过 oh my codex，给我装了一坨 MCP，魔改了我的 codex Custom instructions，我个人很不喜欢这种。

译请问有没有好用的 Codex 的 Ralph 循环？类似于 Claude Code 的 Ralph Wiggum 插件 https://github.com/anthropics/claude-code/blob/main/plugins/ralph-wiggum/README.md 用过 oh my codex，它给我装了一堆 MCP，还魔改了我的 Codex 自定义指令，我个人很不喜欢这样。

宝玉@dotey · 4月17日

马斯克的 xAI 正在把自己变成 GPU 出租商，第一个客户是估值 500 亿美元的编程工具 Cursor。据 Business Insider 报道，Cursor 计划用 xAI 数万块 GPU 来训练其最新编程模型 Composer 2.5。这笔交易让 xAI 从纯粹的模型开发公司，变成了某种程度上的云计算服务商，和亚马逊、微软、Google 以及近年崛起的 CoreWeave 站到了同一条赛道上。为什么 xAI 突然想出租 GPU？一个关键细节透露了答案：xAI 总裁 Michael Nicolls 上周在内部备忘录中承认，公司 GPU 的模型算力利用率（MFU，衡量训练时 GPU 被有效使用的比例）低得令人尴尬，只有大约 11%。行业正常水平是 35% 到 45%。也就是说，xAI 坐拥 20 万块 Nvidia GPU、号称要扩展到 100 万块，但大部分算力其实在空转。与其闲着，不如租出去回点血。这两家公司的关系不只是买卖。今年 3 月，xAI 刚从 Cursor 挖走了两位产品工程负责人 Andrew Milich 和 Jason Ginsburg，两人现在直接向马斯克汇报。现在 xAI 又把算力卖给 Cursor，一边挖人一边做生意，关系颇为微妙。 Cursor 这边也面临不小的压力。上个月 Bloomberg 报道其正在以约 500 亿美元估值进行融资谈判，但 Anthropic 和 OpenAI 都在猛推自家的编程助手。Cursor 3 月发布的 Composer 2 是基于中国初创公司月之暗面（Moonshot AI）的开源模型微调而来，这次 Composer 2.5 选择在 xAI 的基础设施上训练，算是在算力来源上又多了一条路。 xAI 的基础设施团队最近也不太平，上周刚失去了基础设施负责人 Heinrich Küttler，SpaceX 的 Daniel Dueri 被调来接管计算基础设施。马斯克去年底在全员会上放话说 xAI 会靠更多算力打败 OpenAI 和 Anthropic，但目前看来，比起用好算力，xAI 更擅长囤算力。

译xAI计划向Cursor出租数万GPU用于训练Composer 2.5，标志其从模型开发向云计算服务转型。内部备忘录显示，xAI的GPU利用率仅11%（行业正常35-45%），20万块Nvidia GPU大量闲置，出租旨在回血。双方关系微妙，xAI刚挖走Cursor两位高管。Cursor面临激烈竞争，此前Composer 2基于Moonshot AI模型，现借xAI算力寻求突破。

Tibo@thsottiaux · 4月16日

Feeling codexy today

译今天感觉很 codexy

Deedy@deedydas · 4月16日

Opus 4.7 benchmarks colored by ranking. – Strong coding (SWE-Bench) bump – Strong Computer use bump – Strong visual reasoning (CharXiv) bump – Weak Terminal Bench bump – BrowseComp regression Slots in between 4.6 and Mythos. [Chart generated by 4.7]

译Opus 4.7 基准测试按排名着色。 – 编程（SWE-Bench）大幅提升 – 计算机使用大幅提升 – 视觉推理（CharXiv）大幅提升 – Terminal Bench 小幅提升 – BrowseComp 退步介于 4.6 和 Mythos 之间。 [图表由 4.7 生成]

ClaudeDevs@ClaudeDevs · 4月16日

For the developers building with Claude, a direct line from the team. Follow for changelogs, API releases, community updates, and deep dives.

译面向使用 Claude 开发的开发者，这是来自团队的直接沟通渠道。关注以获取更新日志、API 发布、社区更新和深度解析。

ClaudeDevs@ClaudeDevs · 4月16日

✻ Flibbertigibetting…

译✻ 喋喋不休中…

Chubby♨️@kimmonismus · 4月16日

Apple just made a quietly stunning admission: its own Siri engineers need to go back to school. According to a report from The Information, the company is sending close to 200 members of the Siri organization to a multi-week bootcamp, where they will learn how to code using AI tools like Claude Code and Codex. Roughly 60 engineers stay behind to keep core development running, and another 60 handle evaluations and safety checks. This retraining wave lands just two months before WWDC in June, where Apple plans to unveil the long-delayed, Gemini-powered Siri overhaul.

译苹果正派遣近200名Siri工程师参加为期数周的集训，学习Claude Code和Codex等AI编程工具。此次大规模再培训距WWDC仅剩两月，届时苹果计划发布基于Gemini的Siri重大升级。约60人留守核心开发，60人负责安全评估。此举被视为苹果承认其团队在AI编程技能上存在缺口，需紧急补课以赶上行业步伐。

TestingCatalog News 🗞@testingcatalog · 4月16日49

Grok Build and Grok CLI are planned to be released next week. A new Grok Code model too? 👀

译Grok Build 和 Grok CLI 计划于下周发布。新的 Grok Code 模型也要来了？👀

Thariq@trq212 · 4月16日

I edited the intro because I realized I buried the lede originally- The 1M context window is a double-edged sword. It allows Claude to do more complex tasks but it can also leads to more context pollution if you don't manage your session well. This is how you do that:

译我编辑了开头，因为我意识到我原本把重点埋没了—— 1M 上下文窗口是一把双刃剑。它让 Claude 能够处理更复杂的任务，但如果你不好好管理会话，也可能导致更多的上下文污染。方法如下： [引用 @trq212]：http://x.com/i/article/2044537014620721153

SemiAnalysis@SemiAnalysis_ · 4月16日

Makora uses LLMs to write high-performance, low-level GPU code. But the real play is treating codegen as a tailwind. Assume the models keep getting better, then build the entire platform around that.

译Makora 使用 LLMs 编写高性能、低层级的 GPU 代码。但真正的策略是将 codegen 视为顺风。假设模型持续进步，然后围绕这一点构建整个平台。

Yuchen Jin@Yuchenj_UW · 4月16日

Manage your Claude Code session like your life depends on it. The rule of thumb is: do /clear often, when starting a new task, always start a new session. (I don't do this enough..) 1M context length is good, but context rot is real, and models get dumb because of it.

译像你的生命取决于它一样管理你的 Claude Code 会话。经验法则是：经常执行 /clear，开始新任务时，务必开启新会话。（我做得还不够...） 1M 上下文长度很好，但上下文退化是真实存在的，模型会因此变蠢。

swyx 🐣@swyx · 4月16日

in the grand narrative of Meta x AI, we saw the flop (Llama 4 hurhurhur), and now we’re seeing the turn: - *more* hiring since the soup wars of 2025 - Zuck literally moved in with Alexandr and Nat and is koding again - finally GA’ed Opus-ish level model (no api, not open, but still) - bought @dps Dreamer and @peakji Manus to build the AI OS prosumer layer the MSL “river” is gonna be pretty exciting.

译在 Meta x AI 的宏大叙事中，我们看到了失败（Llama 4 hurhurhur），现在我们看到了转折： - 自 2025 年的 soup wars 以来*更多*招聘 - Zuck 真的搬去和 Alexandr 和 Nat 一起住，又开始写代码了 - 终于 GA 了 Opus 级别的模型（没有 API，不开放，但还是） - 收购了 @dps 的 Dreamer 和 @peakji 的 Manus 来构建 AI OS 的 prosumer 层 MSL "river" 将会非常令人兴奋。 [引用 @CharlesRollet1]：独家！Meta 聘请了来自 Thinking Machines Lab 的*第五位*创始成员。 Joshua Gross 是一位顶级工程师，他从零到一构建了 Thinky 的旗舰产品 Tinker。他现在领导 Meta Superintelligence Labs 的工程团队。

Tibo@thsottiaux · 4月16日49

/compact coming in Codex, we finally listened

译Codex 即将推出 /compact 功能，我们终于听取了意见

宝玉@dotey · 4月16日74

http://x.com/i/article/2044562880721248256 # 使用 Claude Code：会话管理与 100 万上下文【译】今天，我们为 /usage 命令推出了一项全新更新，旨在帮助你更清晰地了解自己在 Claude Code 中的使用情况。这个决定的背后，是我们近期与用户进行的多次深入交流。在这些交流中，我们反复听到了一个现象：大家在管理会话时的习惯可谓是五花八门。尤其是最近 Claude Code 将上下文窗口（Context Window）升级到了 100 万大关，这种差异就更明显了。你是习惯在终端里只保持一两个开着的会话？还是每次输入提示词都重新开个新会话？你通常在什么时候会用到压缩（Compact）、回溯（Rewind）或者子智能体（Subagents）？又是什么原因导致了一次糟糕的压缩呢？这里头其实大有学问。这些看似不起眼的细节，极大地影响着你使用 Claude Code 的体验。而这一切的核心，都归结于一件事：如何管理你的上下文窗口。 ## 快速科普：上下文、上下文压缩与上下文衰减所谓“上下文窗口（Context Window）”，就好比模型在生成下一次回答时，眼前能同时“看到”的所有信息。它包括了你的系统提示词（System Prompt）、到目前为止的聊天记录、每一次的工具调用（Tool Call）及其输出结果，甚至还有它读过的每一个文件。现在，Claude Code 拥有高达 100 万个词元（Token）（注释：Token 是大模型处理文本的基本单位，通常一个英文单词约为 1 个 Token，一个汉字可能占 1-2 个 Token）的超大上下文窗口。但遗憾的是，使用上下文是需要付出一点代价的，我们通常称之为上下文衰减（Context Rot）（注释：指随着对话历史越来越长，模型需要处理的信息量过大，导致其注意力分散，遗忘早期重要信息或被无关内容干扰的现象）。随着上下文越来越长，模型的表现往往会变差，这是因为它的注意力被分散到了更多的 Token 上。那些早期遗留的、已经无关紧要的内容，会开始干扰模型当前正在执行的任务。上下文窗口是有硬性容量上限的。所以，当你快要把窗口撑满时，你必须把你正在做的任务总结成一段简短的描述，然后带着这段描述在一个新的上下文窗口里继续工作。我们把这个过程称为上下文压缩（Compaction）（注释：为了腾出内存空间，将超长历史记录提炼成精简摘要的过程）。当然，你也可以随时手动触发这个压缩过程。想象一下，你刚刚让 Claude 帮你做了一件事，并且它已经完成了。现在，你的上下文里已经塞进了一些信息（比如工具调用、工具的输出结果、你给的指令）。接下来该怎么做？你可能会惊讶地发现，自己竟然有这么多种选择： - 继续（Continue） — 在同一个会话里，直接发送下一条消息 - 回溯（/rewind 或连按两次 Esc 键） — 时光倒流，退回到之前的一条消息，从那里重新开始尝试 - 清空（/clear） — 开启一个全新的会话，通常带上你从刚才对话中提炼出的简短总结 - 压缩（Compact） — 把目前的对话做个总结，然后在这个总结的基础上继续干活 - 子智能体（Subagents） — 把下一阶段的工作委派给另一个拥有自己干净上下文的 AI 智能体（AI Agent），并且只把它最终的工作结果拉取回来虽然直接“继续”是最顺理成章的反应，但其他四个选项的设定，正是为了帮你更好地管理你的上下文。 ## 什么时候该开个新会话？到底什么时候该维持一个漫长的老会话，什么时候又该另起炉灶呢？我们的经验法则是：当你开始一项新任务时，你也应该开启一个新会话。 100 万的上下文窗口，意味着你现在可以非常靠谱地完成更长、更复杂的任务。比如，让 Claude 从零开始为你搭建一个全栈应用。但有时候，你可能在做一些前后关联的任务。这时候，你需要保留一部分之前的上下文，但不是全部。举个例子，你刚写完一个新功能，现在要为它写一份使用文档。你当然可以开个新会话，但这意味着 Claude 必须把你刚才写过的所有代码文件重新读一遍——这不仅速度更慢，而且花费也更高。 ## 用“回溯”代替“纠正” 如果非要我挑出一个能代表“优秀上下文管理能力”的好习惯，那一定是用好“回溯（Rewind）”。在 Claude Code 里，双击 Esc 键（或者运行 /rewind 命令）能让你穿越回之前的任意一条消息，然后从那里重新下发提示词。至于那个节点之后发生的所有对话，都会被从上下文中彻底抛弃。在纠正 AI 的错误时，“回溯”往往是更高明的做法。举个例子：Claude 读了五个文件，尝试了一种方法，结果失败了。你的本能反应可能是在对话框里敲下：“这招不管用，换 X 方法试试。”但更聪明的做法是，回溯到它刚读完那五个文件的时刻，然后带着你刚学到的教训重新对它说：“别用 A 方法了，foo 模块根本不支持那个——直接去试 B 方法。” 你甚至可以使用“从这里开始总结（summarize from here）”的功能，让 Claude 自己把它学到的教训总结成一段“交接信息”。这感觉就像是那个刚刚踩了坑的“未来版 Claude”，给过去那个还没开始行动的自己留下了一张字条。 ## 上下文压缩 vs 全新会话当一个会话变得越来越长时，你有两种方法可以给它“减负”：使用 /compact （压缩）或者 /clear （清空并从头开始）。这两个操作听起来挺像，但实际表现大相径庭。压缩（Compact）是让模型把到目前为止的对话总结一下，然后用这份摘要替换掉冗长的历史记录。这个过程是“有损”的，意味着你把决定“什么内容重要”的权力交给了 Claude。好处是你什么都不用写，而且 Claude 在保留重要的经验教训或文件记录时，可能比你想得更周到。你也可以通过给它下达指令来掌控压缩的方向（比如：/compact 将重点放在身份验证模块的重构上，丢掉那些关于测试调试的内容）。而使用 /clear，则需要你自己写下核心要点（例如：“我们正在重构身份验证的中间件，目前的限制条件是 X，相关的重要文件是 A 和 B，而且我们已经排除了方法 Y”），然后以一个无比干净的状态重新开始。虽然这要费点劲，但由此产生的新上下文，百分百都是你认为真正相关的精华。 ## 什么样的“压缩”会翻车？如果你经常挂着超长的会话，你大概率遇到过“压缩”效果极其糟糕的情况。我们发现，这种“翻车”通常发生在一个特定的时刻：那就是大语言模型（LLM）无法预测你下一步工作方向的时候。举个例子，在一段漫长的代码调试之后，系统触发了自动压缩，把之前的排查过程总结了一番。结果你紧接着发了一句：“现在，把我们之前在 bar.ts 里看到的另一个警告也修了吧。” 可是，由于刚才的会话重点全在调试前一个 Bug 上，那个没来得及修的警告很可能早就被当成无关紧要的信息，在总结时被直接丢弃了。这是一个相当棘手的问题。因为受限于上下文衰减，模型在进行压缩的那一刻，往往是它“智商”最不在线的时候。好在有了 100 万的上下文容量，你现在有了更充裕的空间，可以主动带上“我接下来想做什么”的描述，去提前执行 /compact。 ## 子智能体与全新的上下文窗口子智能体也是一种管理上下文的绝佳手段。当你提前预知某一项工作会产生大量“阅后即焚”（以后再也用不上）的中间结果时，这招特别管用。当 Claude 通过智能体工具（Agent tool）衍生出一个子智能体时，这个小家伙会获得一个完全崭新的上下文窗口。它可以在里面肆意折腾，做多少工作都行。等到大功告成，它会把结果提炼出来，只把最终的报告交还给“父级”Claude。我们判断是否该用子智能体的“灵魂拷问”是：以后我还需要看这些工具运行的详细输出吗，还是我只想要一个最终结论？虽然 Claude Code 会在背后自动调用子智能体，但有时候你也可以非常明确地指挥它。比如，你可以对它说： - “派个子智能体去，根据下面这份规范文件，验证一下我们刚才做的工作对不对” - “派个子智能体去通读一下另一个代码库，总结出它是怎么实现身份验证流程的，然后你自己照猫画虎，在这边也实现一遍” - “派个子智能体去，根据我的 Git 修改记录，给这个新功能写份说明文档” 总而言之，当 Claude 完成了一轮回答，而你正准备发送一条新消息时，你就站在了一个决策的路口。我们期望在未来，Claude 能足够聪明，自己帮你打理好这一切。但就目前而言，熟练掌握这些决策，正是你引导 Claude 产出高质量结果的必经之路。

译Anthropic 为 Claude Code 推出 /usage 更新，核心在于管理 100 万词元上下文窗口以避免性能衰减。文章介绍了关键策略：开启新会话适用于新任务；使用“回溯”功能从历史节点重启以高效纠错；“压缩”功能自动总结历史，“清空”则需手动提炼要点；当工作产生大量中间结果时，使用“子智能体”在独立上下文中执行并仅返回结论更佳。目前，掌握这些决策是引导 Claude 产出高质量结果的关键。

Thariq@trq212 · 4月16日72

http://x.com/i/article/2044537014620721153 # Using Claude Code: Session Management & 1M Context In my recent calls with Claude Code users, one theme keeps coming up: the 1M token context window is a double-edged sword. It lets Claude Code operate autonomously for longer and handle tasks more reliably, but it also opens the door to context pollution if you're not deliberate about managing your sessions. Session management matters more than ever and there seem to be a lot of questions about it. Do you keep one session open in a terminal, or two? Start fresh with every prompt? When should you use compact, rewind, or subagents? What causes a bad compact? There’s a surprising amount of detail here that can really shape your experience with Claude Code and almost all of it comes from managing your context window. ## A Quick Primer on Context, Compaction & Context Rot The context window is everything the model can "see" at once when generating its next response. It includes your system prompt, the conversation so far, every tool call and its output, and every file that's been read. Claude Code has a context window of one million tokens. Unfortunately using context has a slight cost, which is often called context rot. Context rot is the observation that model performance degrades as context grows because attention gets spread across more tokens, and older, irrelevant content starts to distract from the current task. For our 1MM context model, we see some level of context rot happen around ~300-400k tokens, but it is highly dependent on the task- not a fast rule. Context windows are a hard cutoff, so when you’re nearing the end of the context window, you will need to summarize the task you’ve been working on into a smaller description and continue the work in a new context window, we call this compaction. You can also trigger compaction yourself. # Every Turn Is a Branching Point Say you've just asked Claude to do something and it's finished, you’ve now got some information in your context (tool calls, tool outputs, your instructions) and you have a surprising number of options for what to do next: - Continue — send another message in the same session - /rewind (esc esc) — jump back to a previous message and try again from there - /clear — start a new session, usually with a brief you've distilled from what you just learned - Compact — summarize the session so far and keep going on top of the summary - Subagents — delegate the next chunk of work to an agent with its own clean context, and only pull its result back in While the most natural is just to continue, the other four options exist to help manage your context. ## When to Start a New Session The new 1M context windows means that you can now do longer tasks more reliably, for example to have it build a full-stack app from scratch. But just because your model hasn't run out of context, it doesn't mean you shouldn't start a new session. Our general rule of thumb is when you start a new task, you should also start a new session. A grey area is when you may want to do related tasks where some of the context is still necessary, but not all. For example, writing the documentation for a feature you just implemented. While you could start a new session, Claude would have to reread the files that you just implemented, which would be slower and more expensive. Since documentation may not be a highly intelligence sensitive task, the extra context is probably worth the efficiency gain of not having to re-read the relevant files again. ## Rewinding Instead of Correcting If I had to pick one habit that signals good context management, it’s rewind. In Claude Code, double-tapping Esc(or running /rewind) lets you jump back to any previous message and re-prompt from there. The messages after that point are dropped from the context. Rewind is often the better approach to correction. For example, Claude reads five files, tries an approach, and it doesn't work. Your instinct may be to type "that didn't work, try X instead." but the better move is to rewind to just after the file reads, and re-prompt with what you learned. "Don't use approach A, the foo module doesn't expose that — go straight to B." You can also use “summarize from here” to have Claude summarize its learnings and create a handoff message, kind of like a message to the previous iteration of Claude from its future self that tried something and it didn’t work. ## Compacting vs. Fresh Sessions Once a session gets long, you have two ways to shed weight: /compact or /clear (and start fresh). They feel similar but behave very differently. Compact asks the model to summarize the conversation so far, then replaces the history with that summary. It's lossy, you're trusting Claude to decide what mattered, but you didn't have to write anything yourself and Claude might be more thorough in including important learnings or files. You can also steer it by passing instructions (/compact focus on the auth refactor, drop the test debugging). With /clear you write down what matters ("we're refactoring the auth middleware, the constraint is X, the files that matter are A and B, we've ruled out approach Y") and start clean. It's more work, but the resulting context is what you decided was relevant. ## What Causes a Bad Compact? If you run a lot of long running sessions, you might have noticed times in which compacting might be particularly bad. In this case we’ve often found that bad compacts can happen when the model can’t predict the direction your work is going. For example autocompact fires after a long debugging session and summarizes the investigation and your next message is "now fix that other warning we saw in bar.ts." But because the session was focused on debugging, the other warning might have been dropped from the summary. This is particularly difficult, because due to context rot, the model is at its least intelligent point when compacting. With one million context, you have more time to /compact proactively with a description of what you want to do. ## Subagents & Fresh Context Windows Subagents are a form of context management, useful for when you know in advance that a chunk of work will produce a lot of intermediate output you won't need again. When Claude spawns a subagent via the Agent tool, that subagent gets its own fresh context window. It can do as much work as it needs to, and then synthesize its results so only the final report comes back to the parent. The mental test we use: will I need this tool output again, or just the conclusion? While Claude Code will automatically call subagents, you may want to tell it to explicitly do this. For example, you may want to tell it to: - “Spin up a subagent to verify the result of this work based on the following spec file” - “Spin off a subagent to read through this other codebase and summarize how it implemented the auth flow, then implement it yourself in the same way” - “Spin off a subagent to write the docs on this feature based on my git changes” # Summary In summary, when Claude has ended a turn and you’re about to send a new message, you have a decision point. Overtime we expect that Claude will help you handle this itself, but for now this is one of the ways you can guide Claude's output.

译Claude Code 的百万级上下文窗口在支持长任务的同时，也带来了“上下文腐化”的风险，即模型性能可能在处理约30-40万token后开始下降。因此，有效的会话管理至关重要。关键策略包括：开启新任务时建议新建会话；对于关联任务可酌情保留上下文以提升效率；善用 `/rewind` 回退功能而非直接纠正错误，是维护上下文清洁的核心习惯。用户在每个对话轮次后，应根据情况选择继续、回退、新建会话、压缩或使用子代理。

Thariq@trq212 · 4月16日

one of my learnings in calls these past 2 weeks is that there's a surprisingly high skill ceiling in session management between rewinding, compacting proactively with a handoff message, using subagents and creating new sessions- managing your sessions can take some thought

译过去两周在通话中我学到的一点是，会话管理有着出人意料的高技能上限在回退、用 handoff 消息主动压缩、使用 subagents 和创建新会话之间——管理你的会话需要一些思考 [引用 @trq212]：http://x.com/i/article/2044537014620721153

Peter Steinberger 🦞@steipete · 4月16日

That was the case in December. 4 months and thousands of work hours later, we have a great security concept; you can go all yolo, use a sandbox (Docker or OpenShell), there are allow-lists and per-access exec allow/deny prompts. There’s hundreds of security researchers that pen-tested it.

译那是12月的情况。4个月和数千个工作小时后，我们有了一个出色的安全概念；你可以完全yolo，使用沙盒（Docker或OpenShell），有白名单和每次访问的执行允许/拒绝提示。有数百名安全研究人员对它进行了渗透测试。 [引用 @maxintechnology]：@steipete @openclaw 我不认为OpenClaw是一个参考。它实际上没有适当的安全模型。OpenClaw上的任何东西都不是安全设计的。

宝玉@dotey · 4月15日

如果是 TypeScript 技术栈，做 Agent 开发首选 pi-mono，功能强，调用方便。其次是 vercel 的 aisdk 也还可以。 claude agent sdk 不那么推荐了，主要是绑死了 claude，但目前还有一个不可替代的优势，就可以共享 Claude Max 订阅，开发阶段会比较方便，能用多久不清楚。应用层的话，electron 还是首选，稳定可靠，AI 训练预料足够多，主要问题是应用程序体积略大。但刚开始写 Agent，建议从 cli 开始写，不需要一开始就做界面，这样可以聚焦在 Agent 本身，除非你核心就是 UI。推荐一个开源的项目 craft-agents-oss，TypeScript + pi-mono + Electron + React + claude agent sdk，很好的学习参考。 https://github.com/lukilabs/craft-agents-oss/

译TypeScript技术栈开发AI Agent首选pi-mono框架，功能强大且调用便捷；次选Vercel AI SDK。Claude Agent SDK因过度绑定Claude而不被推荐，但共享Claude Max订阅是其独特优势。应用层Electron仍是首选，稳定可靠，但建议新手从CLI起步以聚焦Agent核心逻辑。推荐开源项目craft-agents-oss作为学习参考，其技术栈组合为TypeScript + pi-mono + Electron + React + claude agent sdk。

swyx 🐣@swyx · 4月15日

btw the famous slack chart is slack propaganda and everyone who cites it is legally obligated to also link to @sophiebits

译顺便说一下，那张著名的 Slack 图表是 Slack 的宣传，每个引用它的人都有法律义务同时链接到 @sophiebits [引用 @nikunj]：每次看到有人说"我可以在一个周末内 vibe code 出来"——我就会想到 Slack 的通知系统.. 把细节做好需要时间、坚持和努力。当然，很多简单的工作流会被 vibe coding 掉。也许你可以把它放进 Claude Code 里一次性把代码写对。但质量、深度和优秀的系统仍然有价值，也需要时间。你无法 vibe code 出经验。现在如此，永远如此。