🚨 AI News | TestingCatalog@testingcatalog · 6月19日56

Mistral AI released Code on Vibe to Pro users. Desktop app has been confirmed as well. Soon 👀

This is really good. OpenAI just moved frontier-level health AI from premium reasoning models into the free GPT-5.5 Instant model.

GPT-5.5 Instant now performs near OpenAI's Thinking models on health evaluations, meaning the cheaper, faster default model is being trained to behave more like the slower models that spend extra computation checking their reasoning. The update targets the gap between a chatbot that sounds fluent and a health assistant that knows when to slow down, ask for missing details, admit uncertainty, and push the user toward care when symptoms look urgent.

OpenAI says more than 230 million people ask ChatGPT health and wellness questions every week, so moving this capability into the free product changes the scale from premium assistance to mass access.

From OpenAI's blog, it looks like they relied heavily on "distillation" to achieve this: a stronger teacher model and human experts create high-quality responses, and a cheaper student model learns the answer patterns without repeating the same expensive internal search every time.

The training loop was heavily physician-shaped: more than 260 doctors across 60 countries, 49 languages, and 26 specialties reviewed over 700,000 model responses and judged whether answers were accurate, cautious, clear, complete, and useful. The likely mechanism seems to be a mix of supervised fine-tuning, where Instant is shown better answers, and preference training, where it learns which answer a physician-led rubric prefers when two outputs differ.

The physician part is crucial because the target is not just "medical facts" but clinical response behavior, such as asking for age, pregnancy status, duration, medication history, severe pain, breathing trouble, fever, neurological symptoms, or other missing context before giving guidance. So the strongest improvement is not medical trivia but behavior under uncertainty, because a good health answer often means saying what cannot be known yet, what context is missing, what red flags matter, and what the next safe step should be.

OpenAI also reports 71% fewer flagged factuality issues in real health traffic over two months, which suggests the update is reducing wrong claims in everyday use rather than only improving benchmark scores.

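Summary: OpenAI has moved frontier health AI capability from its premium reasoning models into the free GPT-5.5 Instant, bringing its health-evaluation performance close to the Thinking models. More than 230 million users ask ChatGPT health questions each week. OpenAI used knowledge distillation: a stronger teacher model and 260+ physicians (across 60 countries, 49 languages, and 26 specialties) reviewed more than 700,000 model responses to train the student model on clinical response patterns. Training combined supervised fine-tuning with preference training, focusing on behavior under uncertainty (such as proactively asking for age, symptoms, and other missing information). Factuality issues in real health traffic fell by 71%. GPT-5.5 Instant is now available to all free users.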

🚨 AI News | TestingCatalog@testingcatalog · 6月19日34

ClickUp is working on context compression for Brain!
> Brain will be able to condense a complete workspace in the background, across docs, tasks, and history.
> This allows Brain to reason over years of material the way a deep research agent would.
> Responses still come back in seconds rather than minutes.
It will be possible to point Brain at a multi-year audit, and it will trace relevant policy changes, pull the supporting docs, and assemble a timeline without a manual search through the archive.

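Summary: ClickUp is developing context compression for Brain. The feature compresses an entire workspace (docs, tasks, and history) in the background, letting Brain reason over years of material like a deep research agent while still responding in seconds. For example, when pointed at a multi-year audit, Brain can automatically trace relevant policy changes, pull supporting documents, and build a timeline, with no manual archive search.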

PixVerse@PixVerse_ · 6月19日55

Want to see yourself in a Captain Tsubasa match? New official templates are live on PixVerse Web. Upload a photo, pick a signature move, and make your own anime football clip. RT + Follow= 100 Cred in DMs (24H only)

🚨 AI News | TestingCatalog@testingcatalog · 6月19日52

Claude Enterprise admins can now centrally authorize MCP connectors for their organizations via a new Enterprise-Managed Auth extension. Mass MCP 👀

Greg Brockman@gdb · 6月19日54

Launching credit usage analytics and updated spend controls for enterprises, available in our global admin console:

小互@xiaohu · 6月19日63

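Impressive. Codex has launched a Record & Replay feature, which means you can teach Codex how to do a job. You demonstrate a task you often do on your computer once; Codex watches and learns the whole process, then automatically generates a Skill... The next time the same task comes up, Codex can follow the process you taught it and do the work for you...

The official demo uses "publishing a YouTube video": he walks through the full process by hand, pulling metadata, adding the thumbnail and English subtitles, uploading and saving as private, and checking each item. Codex watches alongside and, when he is done, records the whole process as a reusable skill. He then opens a new conversation, attaches the next video, and Codex completes everything on its own without missing a step.

It is not limited to publishing videos; any repetitive computer chore works:
• Monthly expense claims: attaching invoices and filling in the same fixed form
• Batch-renaming and archiving a pile of badly named files
• Exporting data every week and filling in a fixed weekly report
• Booking tickets and hotels online, entering the same information again and again

While working, it calls computer control, the browser, and your connected plugins on its own, combining them to get the job done. You no longer need to teach the AI every step; show it once, and next time it does the work for you... Going from "writing a prompt every time" to "one demonstration is enough" is a leap...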

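Summary: Codex has launched Record & Replay. Users demonstrate a workflow once on their computer, and Codex observes it and automatically generates a reusable Skill that it can run the next time a similar task comes up. The official demo uses "publishing a YouTube video": after a manual run through pulling metadata, adding a thumbnail and subtitles, uploading as private, and checking, Codex completes the task without errors in a new conversation. The feature suits repetitive computer work such as attaching invoices for expense claims, batch-renaming and archiving files, filling in weekly data reports, and booking tickets online, a leap from "writing a prompt every time" to "one demonstration is enough".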

AYi@AYi_AInotes · 6月19日79

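Wow, Alibaba has open-sourced the vector database it has used internally for years. What Pinecone charges $70 a month for, you get free with a single pip line: billion-scale vectors at millisecond latency, with no separate service to run 🤯 If you build RAG or AI search, you no longer need to pay Pinecone $70 a month! The vector database Alibaba has run internally for years is now open source under the name Zvec; one pip install gets it running, completely free. Its three strongest features: 1️⃣ Millisecond retrieval over a billion vectors, with no separate service; it embeds directly in the application process. 2️⃣ Runs everywhere, from servers to desktops to a Raspberry Pi. 3️⃣ Official SDKs across languages; v0.5.0 adds native full-text hybrid search, so vectors, keywords, and filters are handled in a single query. I think Alibaba is handing its own production-grade wheel to the whole industry, so AI applications now have one more free, reliable option for their foundation~ pip install zvec.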

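Summary: Alibaba has open-sourced its internal vector database Zvec, free to use via pip install zvec and positioned against Pinecone's $70-per-month offering. It supports millisecond retrieval over a billion vectors, needs no separate service, and runs on all platforms; v0.5.0 adds native full-text hybrid search. UCSD professor Biwei Huang (author of causal-learn) proposes four generations of AI paradigms: correlational small models → causal small models → correlational large models (LLMs) → causal large models, and argues that we now stand at the threshold of the fourth. Aether AI, founded by Huang, has closed its first funding round and aims to automatically extract physical laws from video, exploring the next-generation causal AI paradigm.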

向阳乔木@vista8 · 6月19日61

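The illustrated Tao Te Ching is now open source. The images were generated with Seedream 5 and there is room for improvement: the pictures often do not match the text, though the overall reading experience is better. Try it online: https://daodejing.qiaomu.ai/ Source code: https://github.com/joeseesun/qiaomu-daodejing-comics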

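Summary: A comic project that breaks the Tao Te Ching down line by line, translates it into plain language, and pairs it with AI-generated images is now open source, with an online demo and a public GitHub repository. Images are generated with the Seedream 5 model; results still have room for improvement (image-text relevance is inconsistent), but the overall reading experience is better.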

PixVerse@PixVerse_ · 6月19日35

Create your own football story with PixVerse. PixVerse × Captain Tsubasa | Relive the Football Fever— open call for creators. Total Prize Pool: USD 1,500 cash + 350,000 PixVerse credits + 10 Premium Gift Cards. Submissions close July 10. RT+Follow+Reply= 100Creds in DMs(24H ONLY)

🚨 AI News | TestingCatalog@testingcatalog · 6月19日33

OPENAI 🔥: A Realtime Voice Mode on Codex will trigger a Pet or an Orb to appear!
> Users will be able to invoke them with the "Hey Chat" command.
> The Orb mentioned in the Realtime Voice settings is likely the same Orb we see on ChatGPT today.
> Additionally, Codex will get a Library section in the side nav, the same section we see on ChatGPT.
Codex = ChatGPT soon 👀
* The video shows that Pet has been summoned via the Voice Mode button.

Berryxia.AI@berryxia · 6月19日71

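Someone ran Codex for 38 hours straight, committing 301 branches, to upgrade their own "skill for creating skills" to version 2.0. Folks, it's free 🆓 and open source, ready to use! Yao Jingang refactored the meta-Skill (yao-meta-skill) and upgraded it to 2.0; the work is finished and pushed to GitHub. Throughout the process, Codex kept breaking down tasks, committing branches, fixing problems, and iterating, and finally produced a complete upgrade plan and a detailed report comparing 2.0 with 1.0. The meta-Skill is itself a tool for creating other skills, and now it has upgraded itself first. The new version is clearly improved in structure, reliability, and extensibility, and because all the documentation and comparisons are public, others can use them as a reference for complex engineering refactors with an agent. The most interesting part is that the upgrade is itself a living example: using an advanced agent (Codex) to refactor the meta-framework for "creating agent tools". It shows that agents are now capable enough to take part in iterating on "how to use agents better". Link in the comments 👇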

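Summary: Yao Jingang used the advanced agent Codex for 38 hours straight, committing 301 branches, to refactor the meta-Skill for creating other skills (yao-meta-skill) and upgrade it to version 2.0, now pushed to GitHub. Codex kept breaking down tasks and fixing problems, producing a complete upgrade plan and a 1.0 → 2.0 comparison report. The new version is clearly improved in structure, reliability, and extensibility, and all documentation is public. The upgrade itself serves as a case study: advanced agents can now take part in iterating on frameworks for "how to use agents better".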

Kling AI@Kling_ai · 6月19日11

ONLY ONE CAN BE THE GOAT ⚽️

Rohan Paul@rohanpaul_ai · 6月19日75

Viktor grew a $20M annualized revenue run rate outside Microsoft Teams. Now it works inside Teams. This revenue is from an AI employee that does the job, not one that just replies. Try free at @viktor__com . $100 in credits, no card.

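Summary: AI employee Viktor reached a $20 million annualized revenue run rate on Slack (with no sales team and no large-scale promotion) and is now officially available in Microsoft Teams. Viktor is positioned as zero-barrier AI: users need no training and no prompts, and can mention it as they would @ a colleague to get finished work, or have work completed automatically without mentioning it at all. The product targets Teams' 320 million users, helping internal operations and management staff use AI with no learning curve. A free trial is available now with $100 in credits and no credit card required.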

Chubby♨️@kimmonismus · 6月19日69

I'm curious to see if agents like Viktor will increase the enjoyment of Microsoft Teams meetings ;)

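Summary: Team collaboration AI agent Viktor has officially launched in Microsoft Teams. It was previously available on Slack, where that single app alone brought it to a $20 million annualized revenue run rate (no sales team, no promotion). Viktor's pitch is zero barrier: users need no training and no prompts, and can mention Viktor as they would @ a colleague to get tasks done; even without a mention, value is delivered automatically. It targets the 320 million Microsoft Teams users worldwide, in particular frontline operations staff and managers at large companies. New users receive $100 in starter credits with no credit card required.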

Berryxia.AI@berryxia · 6月19日66

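Browser Use is at it again, folks! This time it really does "give your agent eyes"! Open source and free 🆓 It has open-sourced a browser agent template, B, that lets any agent use a real cloud browser while you watch it operate on the page in real time. The template is called B and is built on Vercel's Eve. Once you connect your agent to a Browser Use Cloud browser, it can actually browse the web, click, and fill in forms, and you can watch the whole process live through browser-harness. You can clone it straight from GitHub, and it supports initializing skills and MCPs. The hardest part of building a browser agent used to be that you either used a simulated environment and never saw real interactions, or ran it as a black box and could not tell what went wrong. With this template, every step the agent takes on a page is visible and debuggable, and it connects directly to a real cloud browser. This pushes browser automation from "usable" to "actually good to use". Agents no longer just simulate web pages in code; they can carry out tasks in a real web environment while humans observe and intervene in real time. Most importantly, the template is open source, so anyone can quickly build their own browser agent on top of it. In the future, more and more agents may ship with a "visible" browser by default rather than interacting purely through text. The open-source ecosystem is in great shape right now 😆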

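Summary: Browser Use has open-sourced B, a browser agent template built on Vercel Eve. The template lets any agent connect to a real cloud browser (Browser Use Cloud) to browse, click, and fill in forms, with browser-harness providing real-time visualization of execution and support for debugging. The template is published on GitHub, can be cloned and used directly, and supports initializing skills and MCPs. Being free and open source, it lowers the barrier to building observable, intervenable browser agents.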

Berryxia.AI@berryxia · 6月19日57

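Folks, none of the big players are sitting still! Overnight, every major AI vendor seems to be doing automation! Cursor now lets you describe a task in natural language, then automatically configures the triggers, instructions, and tools to turn it into a runnable automation. The /automate skill is live: you just say "I want to handle GitHub issues automatically" or "run this process when a specific Slack message arrives". Cursor then sets up the whole automation for you, including the trigger conditions, the instructions to execute, and the tools required. It already supports Slack emoji triggers (react to a message to start a run) and GitHub issue/review/workflow triggers, and has added computer use for cloud agents. This turns setting up agent automation from "writing configuration by hand" into "just say it in plain language". You used to have to build the trigger, write the prompt, and wire up the tools yourself; now Cursor takes over that grunt work. You describe the goal, and it generates a complete, editable workflow. The most interesting part is that this is pushing agents from "one-off chat tools" toward "long-running automation systems". With natural-language configuration plus multiple triggers, developers can quickly hand repetitive work to agents without becoming automation experts. Building agent workflows used to seem like a high bar; Cursor has now knocked that bar down another level. In the future, more and more everyday development and team collaboration processes may shift from "people operating by hand" to "people describing it once and agents running it long term". It also feels somewhat similar to some Codex features~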

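Summary: Cursor has launched the /automate skill: developers describe a task in natural language, and it automatically configures triggers, instructions, and tools to produce a runnable automation. It supports Slack emoji triggers and GitHub issue/review/workflow triggers, and adds computer use for cloud agents. What used to require manual configuration now needs only a description of the goal, from which Cursor generates the complete workflow. The feature lowers the barrier to building agent workflows and pushes agents from one-off chat tools toward long-running automation systems.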

Berryxia.AI@berryxia · 6月19日53

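Midjourney has been up to something big these past few days!! It has gone straight to a hardware product, and even Musk called it awesome! Midjourney suddenly dropped a technical video several minutes long about its newly built "Midjourney Scanner", a full-body ultrasound computed tomography device. How it works: it uses an ultrasound array for computed tomographic imaging, with the goal of a 3D full-body scan that is faster, cheaper, and lower in radiation than conventional MRI. The whole video has the same technical depth they bring to AI image generation, except this time "generating images" has become "capturing and reconstructing the real internal structure of the human body". The company started out in AI image generation and is now building medical hardware itself, and the video is unusually professional and sincere, without overselling. In the comments, some say "from goon slop to medical devices", while others seriously discuss the physical limits of ultrasound tomography and how hard it will be to deploy in practice. The most interesting part is the ambition behind it: they seem to be re-examining a traditional medical device field with an AI mindset. This is not simple image enhancement; it is an attempt to rebuild the whole imaging paradigm through computational imaging. People used to think of Midjourney as just "an AI that draws"; now they are showing through action that once you truly master visual generation and computational imaging, the boundaries can extend well beyond digital content. Do you think an AI image company building a medical scanner is a crazy crossover or an inevitable evolution?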

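Summary: Midjourney has released a technical video for the "Midjourney Scanner", a full-body ultrasound computed tomography device. The device uses an ultrasound array for computed tomographic imaging, aiming for a 3D full-body scan that is faster, cheaper, and lower in radiation than conventional MRI. A company that had focused on AI image generation is now building medical hardware itself, trying to rebuild a traditional medical device field with a computational-imaging mindset. Musk also praised the crossover.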

Artificial Analysis@ArtificialAnlys · 6月19日55

Announcing AA-Briefcase, the benchmark for the next era of agentic knowledge work

AA-Briefcase is our new benchmark for testing models on long-horizon knowledge work tasks in complex projects built by industry experts. Models are evaluated on multi-week projects, each with many linked tasks and thousands of input source files.

We evaluated Claude Fable 5 from @AnthropicAI before it became unavailable, and it currently leads with an Elo score of 1587, followed by Claude Opus 4.8 (max, 1356), Opus 4.7, and the recently-released GLM 5.2 (max, 1266) from @Zai_org. Claude Fable 5 cost $31 on average to run each AA-Briefcase task, followed by Claude Opus 4.8 at $10.40, GPT-5.5 (xhigh) at $3.68 and GLM-5.2 (max) at $2.40.

AA-Briefcase comprises four private scenarios, each representing a multi-week knowledge work project set in a realistic organizational context. A public fifth scenario has been released via @huggingface as a representation of scenario structure, submission, and grading (AA-Briefcase Lite). This does not count toward official AA-Briefcase results, and is demonstrative only.

Key elements of AA-Briefcase:
➤ Realistic long-horizon projects: AA-Briefcase moves beyond single, disconnected prompts by evaluating models across a coherent long-horizon project. Tasks build week by week, draw on shared institutional context, and require deliverables such as financial models, board presentations, and design mock-ups
➤ Large volumes of fragmented context: AA-Briefcase requires models to reason across thousands of inputs, including company documents, meeting transcripts, large-scale data exports, 25,000+ Slack messages and 3,500+ emails. These sources are fragmented, messy, and often contain realistic contradiction, testing whether models can navigate the ambiguity of real-world knowledge work
➤ Composite rubric and pairwise grading: AA-Briefcase combines binary rubric checks for ground-truth correctness with pairwise grading on analytical quality and presentation quality. Unlike many evaluations that focus on a single metric, AA-Briefcase tests agentic capabilities more comprehensively, exposing cases where models produce outputs that look polished but are incorrect or lack analytical rigor
➤ Built by industry experts: AA-Briefcase scenarios mirror real-world knowledge work, with tasks developed over months by experts across data science, product management and corporate strategy from companies including Google, McKinsey & Company and BCG. Task challenges are drawn from professional experience, making AA-Briefcase more reflective of the ambiguity, messy context and competing priorities that define real-world knowledge work

Key results:
➤ Claude Fable 5 leads AA-Briefcase at 1587 Elo: This is followed by Claude Opus 4.8 (1356) with the next-best non-Anthropic model, GLM-5.2 (max), ~90 points back at 1266. Note that Claude Fable 5 did not use the Opus 4.8 fallback for any task in AA-Briefcase
➤ Cost per task varies by ~800x across models tested: Claude Fable 5 leads the benchmark but costs more than $31 per task on average, compared to ~$0.04 for DeepSeek V4 Flash (max). The strongest price/performance options are open weights models such as GLM-5.2 (max) and DeepSeek V4 Pro (max), with GLM-5.2 (max) scoring only ~90 Elo below Claude Opus 4.8 (max) for less than 25% of the cost
➤ Real-world complexity remains difficult for models: The top performer, Claude Fable 5, satisfies all rubric criteria on just 3% of AA-Briefcase tasks. On 31 of 91 tasks, no model scores above 50% on the rubric criteria
➤ Task difficulty scales with the number of required input files: For each rubric check, we identify the set of source files needed to pass. Across all models, pass rates fall as this file count increases, though top-tier models degrade less than weaker models

More details below in thread ⬇️
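
To make the Elo gaps and cost ratios above concrete, here is a minimal sketch that converts rating differences into expected head-to-head preference rates and re-checks the quoted cost ratios. It assumes the conventional logistic Elo formula with a 400-point scale; the post does not say which Elo variant AA-Briefcase uses, so the probabilities are illustrative only. The ratings and per-task costs are the figures quoted in the post; the function and variable names are made up for this example.

```python
# Ratings and average per-task costs (USD) as quoted in the post.
ELO = {"Claude Fable 5": 1587, "Claude Opus 4.8 (max)": 1356, "GLM-5.2 (max)": 1266}
COST = {
    "Claude Fable 5": 31.00,
    "Claude Opus 4.8 (max)": 10.40,
    "GLM-5.2 (max)": 2.40,
    "DeepSeek V4 Flash (max)": 0.04,
}


def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model.

    ASSUMPTION: the 400-point scale is the chess convention, not something
    the post confirms for AA-Briefcase.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


def cost_ratio(model_a: str, model_b: str) -> float:
    """How many times more model_a costs per task than model_b."""
    return COST[model_a] / COST[model_b]


if __name__ == "__main__":
    p = expected_score(ELO["Claude Fable 5"], ELO["Claude Opus 4.8 (max)"])
    gap = ELO["Claude Opus 4.8 (max)"] - ELO["GLM-5.2 (max)"]
    spread = cost_ratio("Claude Fable 5", "DeepSeek V4 Flash (max)")
    share = cost_ratio("GLM-5.2 (max)", "Claude Opus 4.8 (max)")
    print(f"Fable 5 preferred over Opus 4.8 in about {p:.0%} of pairings")
    print(f"Opus 4.8 minus GLM-5.2: {gap} Elo")
    print(f"Fable 5 vs DeepSeek V4 Flash cost: about {spread:.0f}x")
    print(f"GLM-5.2 cost as a share of Opus 4.8: {share:.0%}")
```

Under these assumptions, the 231-point gap between Fable 5 and Opus 4.8 corresponds to Fable 5 being preferred in roughly 79% of pairings, and $31 / $0.04 gives about 775x, in line with the post's "~800x" given that Fable 5 costs "more than $31" per task.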

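Summary: Artificial Analysis has launched AA-Briefcase, a new benchmark that evaluates models' agentic capabilities on long-horizon knowledge work projects. It comprises 4 private scenarios (with fragmented context including 25,000+ Slack messages and 3,500+ emails) plus one public demonstration scenario. Results: Claude Fable 5 leads at 1587 Elo, followed by Claude Opus 4.8 (1356), Opus 4.7, and Zhipu's GLM 5.2 (max, 1266). On cost, Claude Fable 5 averages $31 per task, Opus 4.8 $10.40, GPT-5.5 (xhigh) $3.68, GLM 5.2 (max) $2.40, and DeepSeek V4 Flash (max) only about $0.04. The top performer satisfies all rubric criteria on just 3% of tasks, and on 31 of 91 tasks no model scores above 50%, showing that real-world complexity remains a challenge. The best price/performance comes from the open-weights models GLM-5.2 (max) and DeepSeek V4 Pro (max).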

🚨 AI News | TestingCatalog@testingcatalog · 6月19日65

OPENAI 🔥: Codex now has a new Record & Replay plugin that captures your actions and converts your workflow into an executable skill. My workflow 👀 * Not available in EEA, UK, and Switzerland yet

Berryxia.AI@berryxia · 6月19日42

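Folks, Claude is not sitting still either! It's at it yet again… Claude Code has launched Artifacts, which turn your coding session into a shared interactive page that refreshes in real time. Anything you do in Claude Code, such as a PR demo, a project dashboard, or a debugging session, can now become an interactive page in one click, shared with your team through a private link. The key point is that it refreshes automatically as your session continues, so everyone always sees the latest version. Artifacts can draw on the context of your entire session: the codebase, plugins, skills, and connected tools. Sharing stays entirely within the organization, so privacy is protected. It is currently available in beta on Team and Enterprise plans. This turns AI coding from a "single-person black box" into a "real-time team workbench". You used to have to take screenshots, copy code, and write explanations by hand; now the AI's reasoning and output become a living artifact, and anyone who opens the link can see the full context and iterate along with you. The hardest parts of team AI coding used to be passing context around and keeping versions in sync, and Artifacts removes that pain point. AI no longer just helps you write code; it can turn your whole working process into a living document that can be shared and evolved. This update pushes Claude Code another big step from "personal productivity tool" toward "team collaboration platform".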

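Summary: Claude Code has added Artifacts (in beta on Team and Enterprise plans). Users can generate interactive pages from a coding session (such as a PR demo or project dashboard) and share them with their team through a private link; pages refresh automatically with the session and draw on its full context, including the codebase, plugins, and skills. The update aims to extend Claude Code from a single-user tool into a real-time team collaboration platform, addressing the pain points of context transfer and version sync.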

Berryxia.AI@berryxia · 6月19日70

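Matthew Berman has built a Loop Library that gathers all kinds of agent loops in one place: find ready-made templates, submit your own, and use them in one click. The library collects agent loop workflows that can be used as-is, from simple task automation to complex multi-step workflows. Search it if you want something ready-made; submit directly if you want to contribute your own. It is hosted in partnership with http://here.now, and the goal is to spare everyone from designing loops from scratch each time. The most time-consuming part of building agents used to be designing the loop structure: how to exit, how to verify, and how to handle failures. With a community library, this "infrastructure" is now public. You can take a loop someone else has validated and adapt it, or contribute your own experience so that others hit fewer pitfalls. This pushes agent development from "reinventing the wheel every time" toward "building with blocks". A loop is not an isolated prompt; it is a reusable, iterable unit of work. Open-sourcing these loops and building a community around them amounts to creating a public "workflow marketplace" for the agent ecosystem. Link in the comments 👇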

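Summary: Matthew Berman has launched Loop Library, a community library dedicated to collecting reusable agent loop workflows. It includes loop templates ranging from simple task automation to complex multi-step workflows; developers can search and use them directly or submit their own. The library is hosted in partnership with http://here.now and aims to eliminate the repeated work of designing loop structure (exit, verification, failure handling) in agent development, moving it from "reinventing the wheel every time" to a "building blocks" model.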

Berryxia.AI@berryxia · 6月19日55

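Wow, this feature is genuinely useful! Automating workflows but can't write Skills? Just record your screen and talk… OpenAI Codex now lets you "demonstrate once" and turn a repetitive task into an editable skill. Record & Replay is live: you record a workflow once (an expense claim or a time-off request, say), and Codex automatically turns the demonstration into an inspectable, editable skill. When the same task comes up again, you just invoke the skill instead of teaching it all over again. You control when recording starts and stops; Codex structures the whole process into an inspectable skill that you can keep editing and refining. It currently supports only macOS and is not yet available in European countries, with support to follow. This effectively turns "teaching by demonstration" directly into a product. The most annoying part of building agents used to be writing a complex process out as a prompt or multi-step instructions; now you record it once and the AI extracts the action sequence and logic itself. Because the skill is editable, you can keep iterating rather than locking everything into a one-off prompt. People used to think the bar for building agents was high because it took a lot of prompts and logic. "Record once" now lowers that bar another level. In the future, more and more repetitive work may shift from "writing the process by hand" to "showing it to the AI". This update moves Codex another step from "chat-style coding assistant" toward "agent platform that accumulates skills".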

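Summary: OpenAI Codex has launched Record & Replay. Users record a workflow once (such as an expense claim or time-off request), and Codex automatically converts it into an inspectable, editable skill. The skill can be invoked directly for later tasks of the same kind, with no need to teach it again. Users control when recording starts and stops, and the skill can be further edited and refined. It currently supports only macOS and is not yet available in European countries. The feature turns "teaching by demonstration" directly into accumulable agent skills, lowering the barrier of moving from hand-written prompts to "demonstrate and deliver".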

Chubby♨️@kimmonismus · 6月19日35

2026 and we're out here writing security postmortems that start with "the AI was, unfortunately, very helpful"

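Summary: In 2025, attackers used email to mount a social engineering attack on Microsoft 365 Copilot: the AI read and executed malicious instructions without the victim clicking anything. In 2026, the same technique is being used against AI agents. In response, OrcaRouter offers a free Firewall and Guardrails on its platform to protect agents; users enable them with a toggle in the console, with no code changes.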

宝玉@dotey · 6月19日65

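Claude Code launches Artifacts: AI coding moves from the terminal to visual collaboration

Claude Code can now turn its work into an Artifact, which is simply a web page that updates in real time. PR walkthroughs, system architecture notes, debugging timelines, and release checklists, which used to exist only inside a terminal session, now become a link that teammates can open directly.

When generating an Artifact, Claude Code uses the full context of the current session, including the codebase, connected external tools (such as monitoring systems), and the conversation. An incident investigation page can show the failing test code, the error curve from the monitoring tool, and Claude's root-cause reasoning side by side, without you wiring up data sources or building infrastructure by hand.

An Artifact updates automatically as the session progresses. After each update, anyone with the page open sees the latest version immediately; the same link stays valid, and earlier versions can be revisited at any time.

In internal testing, Anthropic found that the most frequent scenario was debugging. An engineer started an incident investigation before standup, and while going through the logs Claude Code published an Artifact with a timeline, suspect commits, and an error-rate chart. She dropped the link into the group chat, and by the time standup began the page had already updated twice. The team no longer had to sit through "let me walk you through what the agent found"; everyone simply discussed the same page.

This solves a practical problem with AI coding tools: the agent does a lot of work in the terminal, but only the operator can see the results, and collaboration still depends on a person "translating" them by hand. Artifacts remove that intermediate step.

On security, Artifacts are private by default: only authenticated members of the same organization can view them, and they cannot be made public. Admins can control an organization-level switch and set role permissions and data retention policies.

Beyond debugging and PR walkthroughs, the use cases Anthropic lists include legal teams auditing the open-source licenses of all third-party dependencies, security teams producing code review reports that link each finding to specific lines of code, platform finance teams extracting a breakdown of cloud resource costs from Terraform code, and designers generating multiple UI options from a real component library to choose from.

Artifacts are currently available in beta to Claude Team and Enterprise organizations. They can be generated from the Claude Code CLI and the desktop app, and the pages can be viewed in any browser. Individual users cannot use the feature yet.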

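Summary: Claude Code has added Artifacts, which turn work from a terminal session, such as PR walkthroughs and debugging timelines, into interactive pages that update in real time and are shared with teammates through a private link. An Artifact uses the full context of the current session (codebase, external tools, conversation), updates automatically with the session, and supports revisiting earlier versions. Artifacts are private by default and visible only to authenticated members of the same organization. The feature is in beta for Claude Team and Enterprise organizations, generated via the CLI and desktop app, and not yet available to individual users. Anthropic's internal testing found debugging to be the most frequent scenario.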

🚨 AI News | TestingCatalog@testingcatalog · 6月19日54

Claude Code users on Team and Enterprise plans gained access to Artifacts, new interactive pages that can be built based on their Claude Code sessions. Every session is an Artifact now 👀

Boris Cherny@bcherny · 6月19日56

I've been using Artifacts in Claude Code for everything: visual explanations of tricky code, system diagrams, quick previews of a few animation options, data analyses and dashboards I share with the team. They are a game changer for how I work with Claude. Can't wait to hear what you think!

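Summary: Boris Cherny shares his experience with Claude Code Artifacts: visual explanations of complex code, system diagrams, animation previews, data analyses, and dashboards shared with the team, which he says have changed how he works with Claude. @claudeai announced that Artifacts generate interactive pages from a session (such as a PR walkthrough or project dashboard), shared with the team through a private link, and are currently available in beta on Team and Enterprise plans.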

宝玉@dotey · 6月19日64

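OpenAI Codex has launched Record & Replay: demonstrate a repetitive operation once on a Mac, and Codex observes what you do and automatically generates a reusable Skill. The next time the same task comes up, with a different set of input parameters, Codex can run it again for you. It is currently macOS only, not yet available in the EU, and requires Computer Use to be enabled first.

The problem it solves is specific. Many everyday workflows have fixed steps that are hard to describe clearly in words: an expense form needs the right category and approver, publishing a video means filling in the title, tags, and thumbnail in a fixed order, and creating an issue means ticking specific labels and assignees. Getting AI to do these things used to mean writing every step as a precise instruction. The idea behind Record & Replay is that, rather than writing a manual, you do it once and let it watch.

The process is simple. In the Codex desktop app, open Plugins, click the plus menu, choose Record a skill, and then complete the operation on your Mac as usual. Stop recording when you are done; Codex analyzes what you did and generates a Skill file containing the trigger conditions, required inputs, execution steps, and verification method. The Skill can be inspected and edited; it is not a black box.

To replay, open a new conversation, tell Codex to use the Skill, and give it the parameters for this run. Codex combines Computer Use (desktop control), browser actions, and connected plugins to complete the task.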
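
As a rough illustration of the four parts the post says a recorded Skill contains (trigger conditions, required inputs, execution steps, verification), here is a small sketch of how such a skill could be modeled and replayed with new parameters. Everything in it is hypothetical: the `RecordedSkill` class, its field names, and the example steps are invented for this sketch and are not Codex's actual Skill file format.

```python
from dataclasses import dataclass


@dataclass
class RecordedSkill:
    """Hypothetical in-memory shape of a recorded skill (NOT Codex's real format)."""

    name: str
    trigger: str          # when the skill should be used
    inputs: list[str]     # parameter names supplied at replay time
    steps: list[str]      # step templates with {placeholders}
    verification: str     # how to confirm the run succeeded

    def render(self, **params: str) -> list[str]:
        """Fill the step templates with this run's parameters."""
        missing = [name for name in self.inputs if name not in params]
        if missing:
            raise ValueError(f"missing inputs: {missing}")
        return [step.format(**params) for step in self.steps]


# Made-up example: the same recorded steps, replayed with different inputs.
expense = RecordedSkill(
    name="file-expense",
    trigger="User asks to file an expense report",
    inputs=["amount", "category", "approver"],
    steps=[
        "Open the expense form",
        "Enter amount {amount} under category {category}",
        "Select approver {approver} and submit",
    ],
    verification="Confirmation page shows the submitted amount",
)

for line in expense.render(amount="42.00", category="Travel", approver="Dana"):
    print(line)
```

The point of the sketch is the separation the post describes: the steps are recorded once, while the inputs change on every replay, and a missing input fails loudly instead of producing a half-filled form.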

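Summary: OpenAI Codex has added Record & Replay. Users demonstrate a repetitive operation once on a Mac (such as filling in an expense form), and Codex automatically generates an inspectable, editable Skill file containing trigger conditions, input parameters, execution steps, and a verification method. To replay, users specify the Skill in a new conversation and supply different parameters, and Codex completes the task by combining Computer Use, the browser, and connected plugins. It currently supports only macOS, is not yet available in the EU, and requires Computer Use to be enabled first. No precise instructions are needed: "doing it once" replaces "writing a manual" as the way to reuse a workflow.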

Greg Brockman@gdb · 6月19日63

you can now teach Codex by demonstration:

Rohan Paul@rohanpaul_ai · 6月19日55

Agents can now have their own email! @atomic_mail just launched something to fix a missing piece in agentic workflows: agents need inboxes of their own, not borrowed human ones.

Atomic Mail connects to popular agents like Claude Desktop, Cursor, OpenAI-based agents, and custom API agents through MCP, Agent Skill, or a direct JMAP/REST API. With one prompt, an agent can get its own inbox and start handling workflows like newsletter monitoring, job applications, invoice processing, customer support, competitive tracking, and human escalation over email.

Their smart design choice is PoW (Proof-of-Work) plus reputation: individual well-behaved agents operate normally, while mass spam attempts become expensive and low-reputation senders get throttled. PoW here means each agent has to do a tiny computational task before sending email, which is cheap for one real agent but expensive for someone trying to spin up 1M spam agents. Reputation means agents need to keep sending normal, non-flagged emails to earn more trust over time, while suspicious agents get slowed down or blocked.
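
The proof-of-work idea described above can be sketched with a hashcash-style puzzle: the sender searches for a nonce whose hash has a required number of leading zero bits, and the receiver checks it with a single hash. This is only an illustration of the general technique; the post does not describe Atomic Mail's actual scheme, and the SHA-256 choice, the 8-byte nonce encoding, the difficulty value, and all function names here are assumptions made for the example.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros before the first set bit
        break
    return bits


def stamp_digest(message: bytes, nonce: int) -> bytes:
    """Hash the message together with an 8-byte big-endian nonce."""
    return hashlib.sha256(message + nonce.to_bytes(8, "big")).digest()


def solve(message: bytes, difficulty: int) -> int:
    """Sender side: brute-force a nonce (about 2**difficulty hashes on average)."""
    for nonce in count():
        if leading_zero_bits(stamp_digest(message, nonce)) >= difficulty:
            return nonce


def verify(message: bytes, nonce: int, difficulty: int) -> bool:
    """Receiver side: a single hash confirms the sender did the work."""
    return leading_zero_bits(stamp_digest(message, nonce)) >= difficulty


if __name__ == "__main__":
    email = b"to:billing@example.test|subject:Invoice 17"  # made-up message
    nonce = solve(email, difficulty=16)  # roughly 65k hashes on average
    print("nonce:", nonce, "valid:", verify(email, nonce, 16))
```

The asymmetry is what makes this useful against spam: each extra bit of difficulty doubles the sender's expected work, while verification stays at one hash, so one message is cheap but a million of them is not.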

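Summary: Atomic Mail has released an API-first mailbox for AI agents to use independently instead of borrowing human inboxes. Agents get an inbox in one step through MCP, Agent Skill, or the JMAP/REST API, with support for mainstream agents such as Claude Desktop, Cursor, and OpenAI-based agents. Typical scenarios include newsletter monitoring, job applications, invoice processing, and customer support. Abuse prevention combines PoW (Proof-of-Work) with a reputation system: an agent must perform a small computation before sending, which is cheap for a legitimate agent but costly for bulk spam; reputation adjusts with behavior, and suspicious senders are throttled or blocked. It is currently in free public beta.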

🚨 AI News | TestingCatalog@testingcatalog · 6月19日62

Atomic Mail has launched its API-first email, built for AI agents, in which inboxes belong to the agents themselves.
> Atomic Mail operates through MCPs and Agent Skills.
> The agent registers its own account and sends, receives, and replies without a person in the loop.
Agents are taking over 👀

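Summary: Atomic Mail has released an API-first email service designed for AI agents. Agents connect in one step through MCP or Agent Skill, get their own inbox, and send, receive, and reply to email with no human involvement, enabling fully automated workflows. The service is currently in a free open alpha.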

Luma@LumaLabsAI · 6月19日40

Luma Skills turn your creative assets into a system that generates hundreds of product-accurate concepts. In Luma Agents you upload your creative DNA once, build a Skill, and turn it into a repeatable workflow. Midnight idea or a fast client turnaround, you go straight to making it. Try Luma Skills → http://lumalabs.ai/app

Artificial Analysis@ArtificialAnlys · 6月19日63

Wisedocs, an AI-powered medical record review platform, has launched Medical Long Context Reasoning (MLCR), a new long-context document evaluation based on their experience using frontier models to process medical data. This benchmark tests how well models reason over realistic medical and insurance case files, even as the amount of noise from other documents increases to larger context sizes. It includes a range of difficulty levels, with a private hold-out set of questions including complex medical reasoning, hallucination checking, and parallel questions in a single query inspired by real-world usage. We're excited to partner with @Wisedocsai to bring this benchmark to Artificial Analysis soon!

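Summary: Wisedocs has released the Medical Long Context Reasoning (MLCR) benchmark, which tests LLMs' long-document reasoning over realistic medical records. The evaluation contains 250 questions across 6 difficulty levels, plus a private hold-out set covering complex medical reasoning, hallucination checking, and parallel questions in a single query. Wisedocs is also open-sourcing 10 synthetic cases, the questions for the three lowest difficulty levels, and the evaluation tooling. Artificial Analysis will partner to bring the benchmark online.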

jason@jxnlco · 6月19日70

codex thursday~ boy is it a bad day to be a manual workflow that crosses application boundaries on your computer

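Summary: Show Codex a workflow once and save it as a reusable skill. Record & Replay lets Codex learn a recurring task (such as an expense claim or time-off request) and turn it into an inspectable, editable skill. Users control when recording starts and stops. Jason Liu remarks that it is a bad day to be a manual workflow that crosses application boundaries.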

Thariq@trq212 · 6月19日51

Claude Code can now upload and edit HTML artifacts that you can share with your team or other Claudes! Starting with teams so you can share internally with your team, coming to Pro and MAX plans soon!

宝玉@dotey · 6月19日52

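This reminds me of Codex's Sites feature, though Sites is still Enterprise only, whereas anyone can use this one in Claude Code. Nice; good features should be adopted, so everyone learns and improves together. https://x.com/TheRohanVarma/status/2061872164442403139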

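Summary: Claude Code has added Artifacts, which generate interactive pages from a session (such as a PR walkthrough or project dashboard) and share them with the team through a private link, now in beta on Team and Enterprise plans. 宝玉 comments: "Nice; good features should be adopted, so everyone learns and improves together."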

Greg Brockman@gdb · 6月19日79

We've been collaborating with hundreds of physicians across 60 countries, 49 languages, and 26 specialties to make ChatGPT great at health-related questions for everyone:

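Summary: OpenAI worked with hundreds of physicians across 60 countries, 49 languages, and 26 specialties, using physician-led evaluation to substantially improve GPT-5.5 Instant on health-related questions, to the point that it is now on par with the company's frontier Thinking (reasoning) models. More than 230 million people turn to ChatGPT with health questions each week, and the model is better at recognizing urgent medical needs, asking for relevant context, explaining uncertainty, and simplifying complex information. Because it is available to all free ChatGPT users, these improvements can reach more people.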

OpenAI Developers@OpenAIDevs · 6月19日57

Show Codex a workflow once. Reuse it as a skill. Record & Replay lets you show Codex a recurring task, like filing an expense report or submitting a time-off request. Codex turns that demo into an inspectable, editable skill. You control when recording starts and stops.

OpenAI@OpenAI · 6月19日60

GPT-5.5 Instant is now on par with our frontier Thinking models for health-related questions. Every week, more than 230 million people turn to ChatGPT with health and wellness questions, and GPT-5.5 Instant is better at recognizing when urgent care may be needed, asking for relevant context, explaining uncertainty, and making complex information easier to understand. Because GPT-5.5 Instant is available to all free users in ChatGPT, these improvements can help more people. Physician-led evaluation was critical to making these major intelligence gains.

Claude@claudeai · 6月19日54

New in Claude Code: Artifacts. Interactive pages built from your session, like a PR walkthrough or a living project dashboard, shared with your team at a private link. Available in beta on Team and Enterprise plans.
