1 year ago · Vibe Coding Daily

Boris Cherny @bcherny · 3 hours ago · 57

Artifacts in Claude Code have been life changing. Excited to expand to Pro and Max!


OpenAI Developers @OpenAIDevs · 2 hours ago · 45

http://x.com/i/article/2072717544570728450

# June for OpenAI Developers

June brought a lot into the loop. Here's what's new for developers building with OpenAI:

- You said building with Codex feels like flying. We took that literally: DevDay 2026 applications are here. Submit by July 10.
- Record and Replay plugin
- Codex plugins now have role-specific context
- The Build iOS Apps plugin brings app previews into Codex
- Build with OpenAI APIs, Agents SDK, and ChatGPT Apps from Codex
- Spin up a persistent cloud dev environment with the @digitalocean plugin in Codex
- Codex in ChatGPT mobile is generally available
- More Codex capabilities rolled out in the EEA, UK, and Switzerland
- Give Codex more browser context
- Codex profiles give your build stats a home
- Bring OpenAI models and Codex into your AWS workflows
- New docs agent on developers.openai.com
- The OpenAI API got moderation scores, image results, and more
- Developers are building voice-first apps with the Realtime API
- We’re continuing to support OSS maintainers and the open-source ecosystem
- Codex in our community workflows

For builders who want the under-the-hood details behind OpenAI products, here are a few deep dives from our team:

That was June. We can't wait for you to see what's compiling for July. Follow @OpenAIDevs on X to stay up to date.

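Summary: OpenAI recaps its June updates for developers: DevDay 2026 applications are open (deadline July 10); Codex gains the Record and Replay plugin, role-specific context for plugins, and the Build iOS Apps plugin (with app previews); OpenAI APIs, the Agents SDK, and ChatGPT Apps can be used from Codex; a DigitalOcean integration provides cloud dev environments; Codex in ChatGPT mobile is generally available and has expanded to the EEA, UK, and Switzerland; Codex gets more browser context and developer-stats profiles; AWS workflow integration; a new docs agent; the API adds moderation scores and image results; the Realtime API is driving voice app development; and support for open-source maintainers continues.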

🚨 AI News | TestingCatalog @testingcatalog · 3 hours ago · 61

Claude Pro and Max users can now access Artifacts on Claude Code as well. > Interactive pages built from your session, like a PR walkthrough or a living project dashboard, shared with your team at a private link. Available in beta on Team and Enterprise plans. Everything is an Artifact 👀

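Summary: Artifacts in Claude Code are now open to Pro and Max plan users. Users ask for an artifact, and Claude writes the code, publishes it live to claude.ai, and keeps updating it. Pages are private to the account and fully self-contained. Previously the feature was available only in beta on Team and Enterprise plans.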

Epoch AI @EpochAIResearch · 3 hours ago · 61

AI appears to be finding software vulnerabilities at scale. In June 2026, 21 notable organizations disclosed ~1,500 high- and critical-severity CVEs, over 3.5× the previous monthly record set before Claude Mythos Preview's release.


Yuchen Jin @Yuchenj_UW · 3 hours ago · 60

I predicted this months ago: The highest-paying jobs today may be first in line for AI disruption. GPU kernel engineers used to get million-dollar offers. Now AI agents can self hill climb, write better kernels, and top the leaderboard. (We didn’t even use Fable or GPT-5.6)

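Summary: Yuchen Jin predicted months ago that the highest-paying jobs would be the first disrupted by AI: GPU kernel engineers used to get million-dollar offers, and now AI agents can self hill climb, write better kernels, and top the leaderboard. Databricks used the KDA (Kernel Design Agents) framework to rank #1 on the L1 single operation track of NVIDIA's SOL-ExecBench kernel leaderboard. The core frameworks are KDA, Humanize, and Omnigent: Claude writes code, Codex reviews, and the agents run autonomously for long stretches. The work was done by Databricks in collaboration with NVIDIA and MIT HAN Lab.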

ClaudeDevs @ClaudeDevs · 3 hours ago · 59

Artifacts in Claude Code are now also available on Pro and Max plans. Ask for an artifact, Claude writes the code, publishes it live to claude.ai, and updates it in real time while it keeps working. Pages are private to your account and fully self-contained.


jason @jxnlco · 3 hours ago · 54

Let’s fucking go

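Summary: Developer @vig_xyz described automating several workflows with Codex: reading email and drafting proposals in Google Drive based on the content; generating suggested contract revisions that, once a lawyer confirms them, are entered into DocuSign via computer use; watching a Slack feedback channel to fix bugs automatically; writing unit tests overnight to reach 100% code coverage; and launching 6 parallel threads on worktrees so that the PRs can be merged independently. He says it is hard to imagine going back to an IDE, or even vim.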

Rohan Paul @rohanpaul_ai · 4 hours ago · 51

This may be an extreme case but it still shows how quickly Fable 5 classifiers can reroute routine coding to Opus. The session routed 75% of its work to Opus because the new classifiers kept misreading the coding prompts here as a cybersecurity issue.

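Summary: User @bridgemindai reported a coding session that cost $321, of which Fable 5 handled only $78 (about 25%) while fallback calls to Opus 4.8 handled $242 (about 75%). The cause was Fable 5's new classifiers misreading routine coding prompts as cybersecurity risks, which automatically routed most of the work to the more expensive Opus model. Anthropic had said that only a very small share of tasks would trigger the fallback, which does not match this user's experience.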
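
The split quoted above can be checked with a few lines of arithmetic. This is a minimal sketch that only reuses the dollar figures from the post; the variable names are illustrative and nothing here comes from a billing API.

```python
# Dollar figures as quoted in the post; variable names are illustrative.
fable_cost = 78.0   # work completed by Fable 5
opus_cost = 242.0   # work completed by the Opus 4.8 fallback

# The two parts sum to $320, while the post quotes the session at $321,
# so the shares below are approximate.
total = fable_cost + opus_cost

fable_share = fable_cost / total
opus_share = opus_cost / total

print(f"Fable 5 share:  {fable_share:.1%}")
print(f"Opus 4.8 share: {opus_share:.1%}")
```

The result lands close to the "about 25% / about 75%" split in the post, with the small gap explained by rounding in the quoted dollar amounts.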

Rohan Paul @rohanpaul_ai · 6 hours ago · 65

Feels like the end of an era: ordinary people will probably never again get upgraded frontier models. Fable 5’s return shows how safety routing can downgrade a frontier model. Now we only get permissioned intelligence. That is the cost of putting a gatekeeper inside intelligence. To note, that safeguard is not a simple refusal layer; it is a classifier that sends flagged Fable 5 requests to Opus 4.8. Fable 5 came back, but the old promise did not. End of an era. ☹️

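Summary: After its return, Anthropic's Claude Fable 5 (July 1 version) scored far lower in a BridgeBench retest: Debugging plunged from 86.2 to 25.9, Refactoring fell from 73.6 to 38.4, and Hallucination slipped from 75.9 to 61.7. The reason is that the new safeguard is not a simple refusal layer; it routes flagged requests to the weaker Opus 4.8, so many tasks fall back. Rohan Paul comments that this marks the point where ordinary people may never again get upgraded frontier models, and that only "permissioned intelligence" remains.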

Claude @claudeai · 6 hours ago · 35

A conversation with Boris Cherny and Cat Wu on the path from Claude Code to Claude Tag, and how it spread from engineering to the rest of Anthropic. Claude Fable 5 is now available in Claude Tag.


Ethan Mollick @emollick · 6 hours ago · 49

Fable in Claude Code is capable of really amazing things, including for non-coders, but the interface is not really designed for managing 5+ hour long autonomous tasks. Really hard to observe what is happening and intervene in real time, you often have to wait until the outputs.


jason @jxnlco · 6 hours ago · 15

About to use codex computer use to control my iPhone via screen mirroring, check Find My to see who’s around me, and text them.


Replit ⠕ @Replit · 7 hours ago · 33

New in Replit: Everything We Shipped in June! https://x.com/i/broadcasts/1yGBeeAYwQMKN


Replit ⠕ @Replit · 7 hours ago · 56

Fable 5 is back on Replit! Especially great for longer, harder projects. Toggle on High effort mode in Replit Agent and try it today on your toughest builds!


AYi @AYi_AInotes · 10 hours ago · 54

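Damn! Another piece of the web designer's moat collapsed overnight. With just Claude Code plus Sonnet 5, you can build a complete, award-worthy website in eighteen minutes, with both the design sense and the code completeness maxed out. We used to say that AI output had a template feel and was not presentable. Now that models' agent capabilities have improved, complex multi-step design tasks land reliably, and the polish and completeness have crossed the professional bar. I don't think the future is that AI will necessarily replace designers. Rather, designers who can't use AI will first be left far behind by peers who can, because the value of the execution layer is collapsing fast, and taste and judgment are the real hard currency from here on. https://x.com/viktoroddy/status/2072290912085123326/video/1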

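Summary: The tweet says that Claude Code plus Sonnet 5 can produce a complete, award-worthy website in just 18 minutes, with design and code completeness both at a professional level. With stronger agent capabilities, complex multi-step design tasks land reliably and the polish crosses the professional bar. Designers who can't use AI will be left behind by peers who can, the value of the execution layer is collapsing faster, and taste and judgment become the real hard currency.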

eric zakariasson @ericzakariasson · 12 hours ago · 66

http://x.com/i/article/2072636402521583616

# Fable is back, here's how I use it in Cursor

Fable is back in Cursor, and here's a pattern I've been exploring and some other ways I've been getting the most out of the model.

## Fable as orchestrator, Composer as workers

It's easy to put everything on a smart model. Most of an agent run is reading files, writing patches, and running checks, and you don't need Fable rates for that. Instead, let Fable decide the subtasks, the order, and whether the result is done. Composer 2.5 does the scoped pieces, cheaper and faster, and can run them in parallel. Most of my chats are short Composer runs. Fable shows up less often, but those runs go longer. You can put the routing in AGENTS.md or a .cursor/rules-rule so the orchestrator agent can use it.

A good brief has:

- one concern
- enough context that the worker doesn't re-explore the whole repo
- a definition of done it can check on its own
- a short report so the orchestrator can decide quickly

Fable alone still makes sense when the judgment is the work, whether that's a hard design call, a gnarly bug that needs one coherent thread, or a plan that has to stay coupled. If you can't name the subtasks, keep it one agent!

## Long horizon cloud agents

The use case I reach for most is ultra long horizon work on Cloud Agents. A long refactor, a multi-surface feature with a real definition of done, an investigation across a big codebase. I hand it to Fable, give it something it can check itself against, and let it run. I check in from the iOS app for status, a look at what it's doing, a nudge if it's drifted.

## Keeping up with the frontiers

If you only ever run one model, you start mistaking its habits for the ceiling of what agents can do. The frontier also moves every few weeks. Rotating is how I keep "what good looks like" current, from real work rather than benchmarks.

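Summary: Eric Zakariasson shares two ways he uses Fable in Cursor. The first is as an orchestrator that dispatches subtasks to Composer 2.5 for parallel execution, with Fable used alone only when the work needs holistic judgment, such as design decisions or complex bugs. An effective brief has a single concern, enough context, a definition of done, and a short report. The second is long-horizon Cloud Agents, used for long refactors, multi-surface features, or investigations across a codebase, monitored through the iOS app with a nudge when needed. The author also recommends rotating between models to keep a current sense of frontier capability.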
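
The four parts of a good brief listed in the article can be captured in a small data structure. The sketch below is one hypothetical way to do that in Python; `WorkerBrief`, its field names, and the sample values are all assumptions made for illustration and are not part of Cursor, Fable, or Composer.

```python
from dataclasses import dataclass, fields


@dataclass
class WorkerBrief:
    """Hypothetical container for the four parts of a brief listed in the article."""

    concern: str             # the one concern this worker owns
    context: str             # enough context to avoid re-exploring the repo
    definition_of_done: str  # a check the worker can run on its own
    report_format: str       # the short report the orchestrator reads

    def missing_parts(self) -> list[str]:
        """Return the names of any parts that were left blank."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        """Format the brief as plain text to hand to a worker agent."""
        return (
            f"Concern: {self.concern}\n"
            f"Context: {self.context}\n"
            f"Definition of done: {self.definition_of_done}\n"
            f"Report: {self.report_format}"
        )


# Sample values are invented for the example.
brief = WorkerBrief(
    concern="Rename the config loader module",
    context="Only files under src/config import it",
    definition_of_done="The existing test suite passes",
    report_format="List changed files and the test result in three lines",
)
print(brief.missing_parts())
print(brief.render())
```

Checking `missing_parts()` before dispatching is one way to follow the article's advice to keep a task on a single agent when the subtasks cannot be named clearly.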

Berryxia.AI @berryxia · 17 hours ago · 46

Fable 5 can now be used in Cursor too, with no need to worry about getting banned 😄

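Summary: Fable 5 is now available in Cursor, and users say they no longer worry about account bans. Earlier, group members found that the Claude desktop app also supports Fable 5, though some users do not see the option yet.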

数字生命卡兹克 @Khazix0918 · 17 hours ago · 30

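Possibly the highest-value use of your time over these 7 days: use Claude Fable 5 to optimize and iterate on all of your workflows, SOPs, Skills, project plans, and project code. I can already clearly feel that a $200 Max account is not enough to burn through; it bottomed out in an hour and a half... So I signed up for a second $200 Max account to pedal hard through these 7 days...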

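Summary: Khazix (卡兹克) recommends using Claude Fable 5 to iterate on and optimize all workflows, SOPs, Skills, project plans, and code. He says a $200 Max account ran out in just an hour and a half, so he registered a new account to make full use of the 7 days.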

Ethan Mollick @emollick · 18 hours ago · 60

Fable, one prompt: "build an elaborate game that makes it feel like I'm a brilliant chess player without knowing anything at all about chess. It should make me feel like a grand master. Feel free to go as meta as you want but the more chess-y the better." https://game-seven-chess.netlify.app/


向阳乔木 @vista8 · 18 hours ago · 59

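Raycast's newly developed Glaze is finally open to everyone, with no invite-only beta required. Glaze can build desktop software from a single sentence; this looks like a challenge to the App Store. The download link is in the comments.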

Rohan Paul @rohanpaul_ai · 19 hours ago · 64

Godot is banning vibe coding after AI-made PRs turned review time into the bottleneck. Substantial AI-generated code will also be barred, while small aids like completion remain allowed. It's an open-source game engine, so outside contributors constantly send proposed code changes. Every PR still needs a maintainer who understands the engine deeply enough to spot risk. AI changed the cost balance because generating code got cheaper, but reviewing code stayed expensive. Reviewer capacity was already too thin, and AI-made submissions made it harder. Contributors must disclose any AI help used while writing code for a PR. Godot is also banning AI-generated text in PR discussions, issues, and proposals. imo, enforcement will be impractical, they probably will never know with certainty what was vibe coded and what was not, and that is the whole weakness of the rule. --- godotengine .org/article/contribution-policy-2026/

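Summary: The open-source game engine Godot published a new contribution policy that bans "vibe coding" (substantial AI-generated code), because AI-generated PRs turned review time into the bottleneck. Substantial AI-generated code is barred, while small aids such as code completion remain allowed. Contributors must disclose any AI help used in writing code, and AI-generated text is banned in PR discussions, issues, and proposals. The tweet's author considers the rule impractical to enforce, since it is hard to tell which code was AI-generated.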

Yuchen Jin @Yuchenj_UW · 21 hours ago · 38

Databricks ranks #1 on NVIDIA’s SOL-ExecBench kernel leaderboard, in the L1 single operation track, powered by KDA (Kernel Design Agents) 🎉 What’s crazy is: we 100% leveraged AI agents to beat the competition. This is a sneak peek at recursive self-improvement. The core frameworks we used were KDA, Humanize, and Omnigent: Claude writes code, Codex reviews. Together, they enabled agents to run autonomously for as long as possible. The key is setting up the right framework to let the agents cook. This work was driven by @leshenj15 at Databricks, in collaboration with NVIDIA and MIT HAN Lab’s @LigengZhu and @DongyunZou03 . Databricks AI is like a neolab. Join us if you’re cracked!

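Summary: Databricks ranks #1 on the L1 single operation track of NVIDIA's SOL-ExecBench kernel leaderboard, relying entirely on AI agents running autonomously. The frameworks used were KDA, Humanize, and Omnigent: Claude writes the code and Codex reviews it, which the post describes as a preview of recursive self-improvement. The work was driven by @leshenj15 at Databricks, in collaboration with NVIDIA and MIT HAN Lab's Ligeng Zhu and Dongyun Zou.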

小互 @xiaohu · 21 hours ago · 56

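This is roughly the effect. Claude Code second screen: the pain point is that whenever CC replies with a large block of text, it is too dense and tiring to read, or a plan it proposes is hard to follow. The second screen converts CC's reply directly into an intuitive page, so you can grasp and preview the answer instantly, and it also supports interaction that sends data back.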

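Summary: @xiaohu built a second-screen tool for Claude Code that addresses how dense and hard to read CC's long text replies are. The second screen converts CC's reply directly into an intuitive page so users can quickly understand and preview the answer, and it supports interactive data return.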

meng shao @shao__meng · 22 hours ago · 77

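Skills for Design Engineers

The author, @emilkowalski, is a well-known design engineer who worked at Vercel and Linear and created popular components such as Sonner and Vaul. He has distilled a set of UI and animation principles built up over many years into design-taste Skills for design engineers, so that coding agents such as Codex, Claude Code, and Cursor have aesthetic judgment close to a senior design engineer's when writing UI and animation! https://github.com/emilkowalski/skills

Repository structure: three complementary Skills
1. First establish a decision framework (emil-design-eng). Main Skill: design engineering philosophy + animation decision framework + component-building principles
2. Then review the code (review-animations)
· SKILL.md reviews animation/motion code against strict standards and outputs a "Before/After/Why" table
· STANDARDS.md is the reference table of values and curves used in review (easing, duration, spring, etc.)
3. Finally, help users describe motion precisely (animation-vocabulary). Glossary: translates "that little bounce effect" into professional terms such as "Pop in"

Core claim: animation is not about "making it move" but about "making it feel right"
1. Animation needs a reason. Every animation must answer one question: "Why does it move?"
Valid reasons:
· Spatial consistency (toasts enter and exit from the same direction)
· State indication (a button morphs to show that loading is complete)
· Explaining relationships (guiding the user to understand a state change)
· Preventing abruptness (elements suddenly appearing/disappearing)
· Feedback (scale(0.97) when a button is pressed)
Invalid reasons:
· "It looks cool" + high frequency → should be removed
2. Set animation intensity by usage frequency
· 100+ times a day (shortcuts, command palette): no animation
· Dozens of times a day (hover, list navigation): remove or greatly simplify
· Occasional (modals, drawers, toasts): standard animation
· Rare/first-time (onboarding, feedback): a little "delight" is fine

Most practical technical principles
Easing: don't trust the defaults; use strong curves
· UI elements entering/exiting → ease-out
· Elements already on screen moving → ease-in-out
· hover / color → ease
· Constant-speed motion → linear
· Never use ease-in for UI animation (it starts slow, so users feel a delay)
Duration: keep UI animations within 300ms
· Button press feedback: 100–160ms
· Tooltip / small popover: 125–200ms
· Dropdown/select: 150–250ms
· Modal/drawer: 200–500ms
Physical correctness
· Never start from scale(0): nothing appears from nowhere in the real world. Use scale(0.95) + opacity: 0.
· Popovers scale from the trigger: transform-origin should point at the trigger button, not the element's center (modals are the exception).
· Button presses must give feedback: transform: scale(0.97) is the default.
Performance rules
· Animate only transform and opacity (GPU layer).
· Don't animate width/height/margin/top/left.
· Framer Motion's x/y/scale shorthands are not hardware-accelerated; use the full transform string.
· Don't drive child transforms with CSS variables on a parent (it triggers a storm of style recalculation).
· Use CSS for predetermined animations; use JS or springs for dynamic/interruptible ones.
Interruption and symmetry
· CSS transitions are interruptible and retargetable; @keyframes restart from the beginning.
· Long-press/delete scenarios: slow on press (2s linear), fast on release (200ms ease-out), with asymmetric timing.
Accessibility
· Respect prefers-reduced-motion: not "turn everything off", but keep opacity/color and remove movement animations.
· hover animations must be wrapped in @media (hover: hover) and (pointer: fine) to avoid accidental triggering on touch devices.

Review Skill: how code is checked
review-animations sets ten "non-negotiable" standards and strictly formats the output as:
· transition: all 300ms > transition: transform 200ms ease-out -- specify properties precisely, so that all does not trigger non-GPU animations
· transform: scale(0) > transform: scale(0.95); opacity: 0 -- nothing should appear from nowhere

animation-vocabulary: translating vague feelings into professional terms
This Skill is essentially a reverse lookup table for motion terms. When a user says "that feeling on iOS when you pull to the bottom and it bounces back", it can answer "Rubber-banding"; when a user says "the element grows out of the button", it can answer "Origin-aware animation". It covers:
· Enter/exit, sequence, transform, and state transitions
· Scroll and interaction feedback
· Easing / Spring / looping and ambient animation
· Polish effects (Blur, Clip-path, Skeleton, Number ticker)
· Performance terms and animation principles
This is valuable both for communication between designers and engineers and for giving AI precise instructions.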

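Summary: Emil Kowalski distilled years of UI and animation principles into three Skills, giving coding agents such as Codex, Claude Code, and Cursor the aesthetic judgment of a senior design engineer. Core rules: animation must have a reason; no animation for high-frequency actions performed 100+ times a day; keep UI animations within 300ms; animate only transform and opacity; entrances start from scale(0.95) + opacity: 0; respect prefers-reduced-motion (remove only movement animations). review-animations reviews animation code against strict standards and outputs a Before/After/Why table. animation-vocabulary turns vague descriptions (such as "that little bounce effect") into professional motion terms.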
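
The duration ranges and easing rules listed in the post lend themselves to a simple lookup-and-check. The following is a minimal Python sketch of that idea only; `review_transition`, the dictionary keys, and the message wording are hypothetical and are not taken from the emilkowalski/skills repository.

```python
# Duration ranges (milliseconds) and easing choices as listed in the post.
# The dictionary keys are illustrative names, not identifiers from the repo.
DURATION_MS = {
    "button_press": (100, 160),
    "tooltip": (125, 200),
    "dropdown": (150, 250),
    "modal": (200, 500),
}
RECOMMENDED_EASING = {
    "enter_exit": "ease-out",
    "move_on_screen": "ease-in-out",
    "hover_color": "ease",
    "constant_speed": "linear",
}


def review_transition(component: str, motion: str, duration_ms: int, easing: str) -> list[str]:
    """Return a list of rule violations for one transition (empty if none)."""
    issues = []
    if easing == "ease-in":
        # The post forbids ease-in for UI animation outright.
        issues.append("ease-in is not allowed for UI animation")
    elif easing != RECOMMENDED_EASING[motion]:
        issues.append(f"{motion} should use {RECOMMENDED_EASING[motion]}, got {easing}")
    low, high = DURATION_MS[component]
    if not low <= duration_ms <= high:
        issues.append(f"{component} duration {duration_ms}ms is outside {low}-{high}ms")
    return issues


print(review_transition("tooltip", "enter_exit", 300, "ease-in"))
print(review_transition("button_press", "enter_exit", 120, "ease-out"))
```

The first call reports two problems (forbidden easing and an out-of-range duration), while the second returns an empty list. A real reviewer would need to parse CSS or component code first; this sketch assumes the values have already been extracted.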

Rohan Paul @rohanpaul_ai · 1 day ago · 53

Fable 5 absolutely crushed the HTML5 physics contest, but cost 6x more than Opus 4.8 and 39× more than GLM 5.2 in that test. Test was done on atomic[.]chat, a desktop app that runs LLMs locally. The test asked 4 models to generate self-contained canvas demos with believable motion and collisions. The scenes were not simple animations because every crash needed gravity, force, timing, and contact handling. Outputs: - Fable 5: 62,158 tokens, $3.12 - GPT 5.5: 37,753 tokens, $1.14 - Opus 4.8: 22,280 tokens, $0.56 - GLM 5.2: 36,246 tokens, $0.08

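Summary: In an HTML5 physics contest run on atomic.chat (a desktop app for local LLMs), Fable 5 completed all three scenes (train derailment, mid-air car collision, monster truck crush) with an A+ grade, using 62,158 tokens at a cost of $3.12. By comparison, Opus 4.8 used 22,280 tokens/$0.56, GPT 5.5 used 37,753 tokens/$1.14 (narrowly beating Fable in the monster truck scene), and GLM 5.2 used 36,246 tokens/$0.08 but won no scene. Fable 5 had the best quality and the highest cost.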
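
The "6x" and "39×" multiples in the post follow directly from the listed costs. The sketch below recomputes them from the quoted token counts and dollar amounts; the per-1,000-token figure is a derived number that assumes each quoted cost covers exactly the quoted tokens, which the post does not state.

```python
# (tokens, cost in USD) per model, as listed in the post.
runs = {
    "Fable 5": (62_158, 3.12),
    "GPT 5.5": (37_753, 1.14),
    "Opus 4.8": (22_280, 0.56),
    "GLM 5.2": (36_246, 0.08),
}

fable_cost = runs["Fable 5"][1]

for model, (tokens, cost) in runs.items():
    ratio = fable_cost / cost        # how many times more Fable 5 cost
    per_1k = cost / tokens * 1000    # derived: USD per 1,000 tokens
    print(f"{model:9s} ${cost:.2f}  Fable 5 cost {ratio:4.1f}x this  ${per_1k:.4f} per 1k tokens")

ratio_vs_opus = fable_cost / runs["Opus 4.8"][1]
ratio_vs_glm = fable_cost / runs["GLM 5.2"][1]
```

On these figures Fable 5 cost roughly 5.6 times the Opus 4.8 run, which the post rounds to 6x, and 39 times the GLM 5.2 run.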

ginobefun @hongming731 · 1 day ago · 43

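BestBlogs Morning Brief · 07-02 # LongCat-2.0 / Local AI / Meitu's AI Methodology / Google ADK 2.0 / Claude Fable 5

[1] ★ In depth | Ahmad Osman on why local AI is catching up. http://Latent.Space interviews Ahmad Osman, founder of Osmantic, a long-time local AI advocate who ran two packed workshops at AIEWF. He judges that the gap between open-source and closed-source frontier models keeps narrowing and currently stands at about 4 to 8 months. He uses the example of a friend who bought an RTX 5090 to run Qwen 3.5 and failed to change RGB lighting to show that what local AI lacks is not the model but the full stack of search, tools, and agents. From the angle of enterprise sovereign compute and hybrid architectures, he explains why local AI is being taken seriously as infrastructure. Source: http://Latent.Space https://www.bestblogs.dev/article/a6371e93

[2] ★ In depth | Meituan officially releases LongCat-2.0: a trillion-parameter model trained and served end to end on a domestic compute cluster. Meituan's tech team officially disclosed the LongCat-2.0 trillion-parameter MoE model (1.6T total parameters, about 48B activated on average), with end-to-end training and inference completed on a 50,000-card cluster of Chinese-made accelerators. Pretraining data exceeds 30T tokens, the monthly average daily failure rate fell by more than 70%, and training MFU improved 1.5×. It scores 59.5 on SWE-bench Pro, above GPT-5.5 and Claude Opus 4.6. It natively supports 1M context and is already among the top three on OpenRouter by global usage. Worth reading because it is one of the few engineering write-ups that thoroughly covers the whole chain of domestic compute, trillion-parameter MoE, and agentic coding. Source: Meituan Tech Team https://www.bestblogs.dev/article/ad5a0b93

[3] ★ In depth | Interview with Meitu CEO Wu Xinhong: building AI products is a game that is hard to plan in advance. 智能涌现 interviews Meitu CEO Wu Xinhong. Meitu's 2025 revenue was RMB 3.858 billion and net profit RMB 965 million (+64.7% year over year), and the revenue share of AI-rebuilt imaging and design rose from 35% to 76.6%. He set rules: a new product must go from kickoff to launch within 1 month, must reach $100,000 ARR within six months, and may not receive traffic from existing products. MVLAND reached $100,000 ARR after two to three months of closed beta and is now near $500,000. Worth reading because, through a methodology of organic growth rather than planning and passion rather than pure PMF, he explains how a 2,000-person company keeps winning at the AI application layer. Source: 智能涌现 https://www.bestblogs.dev/article/8b6cc4f7

[4] Why we built ADK 2.0. Google ADK 2.0 introduces a structured workflow runtime that combines deterministic code execution with LLM agents to address reliability problems in production such as loops, hallucinations, and high cost. Source: Google Developers Blog https://www.bestblogs.dev/article/76d5c422

[5] Context engineering for RAG: the four typed inputs behind every RAG answer. The article revisits single-document RAG through the lens of context engineering, showing how to have each component in the pipeline emit typed inputs that converge into one auditable, cost-efficient LLM call. Source: Towards Data Science https://www.bestblogs.dev/article/33fa6204

[6] Amap GrowLoop: building a rational benchmark for emotional conversation. The Amap (高德) team proposes the GrowLoop system, which uses heuristic learning and a dual-loop co-evolution mechanism to turn the judging criteria for emotional conversation into a rational benchmark that can grow, addressing the difficulty of evaluating human-likeness in open-domain dialogue. Source: AI 前线 https://www.bestblogs.dev/article/4bedb1a9

[7] Our team's experience migrating from AWS to PaaS. A 7-person team quantified the hidden cost of maintaining AWS infrastructure, migrated to Sevalla PaaS in 3 weeks, and saves 10 engineer hours every week. Source: freeCodeCamp https://www.bestblogs.dev/article/49006840

[8] 39 principles of human-AI interaction design. A comprehensive framework of 39 human-AI interaction design principles organized under nine themes: probabilistic foundations, expectation setting, calibrated trust, transparency, control, graceful failure, co-creation, responsible autonomy, and sustained reliance. Source: UX Collective https://www.bestblogs.dev/article/2cba6a6e

[9] How do you turn the output of super-individuals into organizational capability? | AI 跃迁者 survey. This in-depth interview with Mobvoi (出门问问) CEO Li Zhifei lays out the path from super-individual to super-organization: the in-house CodeBanana system unifies communication and execution, and a full-stack transition plus a system-designer mechanism turn AI output into organizational capability. Source: Tencent Research Institute https://www.bestblogs.dev/article/e3a487f9

[10] AI UITester: a new AI-native paradigm for UI test automation | Dewu Tech. The article introduces ai_uitester, an AI-native UI testing tool built in-house by the Dewu (得物) tech team, which uses VLM-driven vision, LLM-generated test cases, and AI self-healing debugging to unify operations across three platforms and sharply cut test maintenance cost. Source: Dewu Tech https://www.bestblogs.dev/article/694f9d01

--- http://BestBlogs.dev · Discover the high-quality content that truly suits you. BestBlogs is an AI-driven personal reading assistant that helps you discover the high-quality content that truly suits you; give it a try. Read online: https://www.bestblogs.dev/explore/brief/2026-07-02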

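Summary: Meituan released the LongCat-2.0 trillion-parameter MoE model (1.6T total parameters, 48B activated), trained end to end on a 50,000-card domestic compute cluster; it scores 59.5 on SWE-bench Pro, above GPT-5.5 and Claude Opus 4.6, natively supports 1M context, and is among the top three on OpenRouter by global usage. Local AI advocate Ahmad Osman says the gap between open-source and closed-source frontier models has narrowed to 4-8 months, but the full stack of search, tools, and the like is still missing. Meitu's 2025 revenue was RMB 3.858 billion and net profit RMB 965 million (+64.7% year over year), with the AI revenue share rising to 76.6%; new products must launch within 1 month and reach $100,000 ARR within six months. Google ADK 2.0 introduces a structured workflow runtime that combines code execution with LLM agents to address reliability problems.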

Peter Steinberger 🦞 @steipete · 1 day ago · 13

How did I ever function without AI? cc chefcook @theo


Peter Steinberger 🦞 @steipete · 1 day ago · 47

Pointed codex at some Twitter feedback on the OpenClaw iOS app and it did a first improvement pass. It's still not good, but for two prompts it aint bad. Especially cool how it uses computer use to add before/after screenshots, as there's no GitHub API. https://github.com/openclaw/openclaw/pull/98452


elvis @omarsar0 · 1 day ago · 33

I really wish GPT-5.5 had a bit more "taste" in design and planning. For everything else related to code, it's the best model. I hope GPT-5.6 closes the gap. It would feel more complete then. For now, I switch to Opus 4.8/GLM-5.2 to fix design issues or when I plan.


Peter Steinberger 🦞 @steipete · 1 day ago · 50

Asked codex to download+transcribe all sessions from @aiDotEngineer and tailor them to my interests.


elvis @omarsar0 · 1 day ago · 46

Great paper on managing agent skills. Skill libraries keep growing, and picking the right skills has become a bottleneck for coding agents. The defaults are to expose the agent to the whole skill collection, or retrieve skills with embeddings and rerankers. Both treat the choice as independent picks. SkillComposer treats composition as one joint decision over which skills, how many, and in what order. A constrained autoregressive decoder over skill identifiers produces the full plan in a single pass, so dependencies between successive skills fall out naturally. On SkillsBench with GPT-5.2-Codex and Gemini-3-Pro-Preview, it lifts pass rate by +23.1 and +18.2pp over no-skill, beats top-3 retrieval, and matches the gold-skill upper bound at lower prompt-token cost. Paper: https://arxiv.org/abs/2606.32025 Learn to build effective AI agents in our academy: https://academy.dair.ai/

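Summary: The paper proposes SkillComposer, which treats skill selection and composition for coding agents as one joint decision and uses a constrained autoregressive decoder to generate the full skill plan (which skills, how many, and in what order) in a single pass, so dependencies between skills are handled naturally. On SkillsBench with GPT-5.2-Codex and Gemini-3-Pro-Preview, it raises pass rate by +23.1 and +18.2 percentage points respectively, beats top-3 retrieval, and matches the gold-skill upper bound at lower prompt-token cost.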

宝玉 @dotey · 1 day ago · 25

Claude Code reset the limits, but I lost out badly, because mine were about to reset anyway

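Summary: With Fable 5 ready to build again, Claude Code reset the 5-hour and weekly rate limits for all users. The user complains of losing out badly, because their limits were about to reset anyway.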

OpenCode @opencode · 1 day ago · 33

now available in OpenCode Zen


Lee Robinson @leerob · 1 day ago · 63

You can now try Kimi K2.7 in Cursor! Results from our evals ↓ Interesting to see the comparison with GLM 5.2.


elvis @omarsar0 · 1 day ago · 50

My prediction: the excitement for Fable 5 will wear off really fast. Reposting this to help those who will be extremely disappointed after they play with Fable 5 and run out of tokens or can't do much with it. Just a bit of advice on how to leverage a combination of AI models to get the same or better results. The best part is that there are many ways to do this now, including mixing with frontier open-weight models.

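Summary: The author predicts that the excitement over Fable 5 will fade quickly and warns users about token limits and functional constraints. He suggests combining multiple AI models (for example, Opus 4.8 for planning and GPT-5.5 for execution) to get the same or better results, optionally mixing in frontier open-weight models. He adds that breaking tasks into smaller substeps to raise quality is often underrated, which is why dynamic workflows matter.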

eric zakariasson @ericzakariasson · 1 day ago · 61

fable is so back, use it again in cursor!

Summary: Claude Fable 5 is available again in Cursor. It leads all models on CursorBench but has the highest cost per task.

Rohan Paul @rohanpaul_ai · 1 day ago · 70

Meta employees used over 60 trillion tokens in 30 days, with one user alone consuming 280 billion. That gives an average close to $50,000 per employee per year in tokens. - SemiAnalysis Report Most companies now set monthly caps, but the numbers vary from $250 to $4,000. Some employees barely touch those limits, while power users burn through them in days. The report estimates coding now explains over 70% of OpenAI and Anthropic ARR.

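Summary: Meta employees consumed over 60 trillion model tokens in 30 days, with the top single user reaching 280 billion, for an average token cost of about $50,000 per employee per year. Most companies set monthly caps of $250 to $4,000, which heavy users exhaust within days. Coding tools account for over 70% of OpenAI and Anthropic ARR. The Perplexity CEO notes that AI usage is shifting toward heavy users: a single engineer can spend up to $10 million a year on coding tools, Perplexity Computer users spend over $10,000 a month, and internal staff have built multi-agent loop architectures. Agentic AI is moving from chasing masses of ordinary users to serving a small number of highly productive operators.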

OpenRouter @OpenRouter · 1 day ago · 47

Claude Fable 5 from @Anthropic is back on OpenRouter! Anthropic is redeploying it globally with new safeguards for cybersecurity misuse. Some coding and debugging requests may temporarily fall back to Opus 4.8 while classifiers are refined.


Chubby♨️ @kimmonismus · 1 day ago · 67

Zai going strong: they officially launched ZCode 3.0. The new AI-native coding IDE is deeply optimized for GLM-5.2 and supports agentic software development from planning and coding to code review and deployment. • Deep GLM-5.2 integration with multi-agent collaboration • Long-running autonomous coding tasks with planning and verification • Remote control via Telegram, WeChat, and Feishu • Available on macOS, Windows, and Linux • New paid plans starting at $18/month zAI is determined to catch up with its Western competitors and put them under pressure. Love to see it!

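Summary: Zai officially released ZCode 3.0, an AI-native coding IDE deeply optimized for GLM-5.2. It supports multi-agent collaboration, can autonomously carry out long-running tasks from planning and coding to review and deployment, and can be controlled remotely via Telegram, WeChat, and Feishu. GLM Coding Plan subscribers get a 1.5× usage quota in ZCode, and BYOK (bring your own key) is supported. It covers macOS, Windows, and Linux, with paid plans starting at $18/month.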