TestingCatalog News 🗞 @testingcatalog · May 8 · 59

AVM 2 is currently in development 🚧 Historically, AVM updates have been reserved for the day before Google I/O. Soon? @sama 👀👀👀

TestingCatalog News 🗞 @testingcatalog · May 8 · 64

GOOGLE 🚨: Gemini 3.1 Flash Lite is now Generally Available! Users can also test this model on AI Studio. > Designed for ultra-low latency, high-volume tasks, and unmatched cost-efficiency, Flash-Lite is already transforming how applications are built at scale.

Sam Altman @sama · May 8 · 79

people are really starting to use voice to interact with AI, especially when they have a lot of context to dump. GPT-Realtime-2 comes to the API today; it is a pretty big step forward. (we are working on improvements to voice in chat.)

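Greg Brockman @gdb · May 8 · 87

You can now just build amazing voice agents, with the GPT-Realtime-2 reasoning model in our API:

Summary: OpenAI has launched GPT-Realtime-2 in the API, a voice model with GPT-5-class reasoning, which the announcement frames as a major step for voice agents. The model lets voice agents act as real-time collaborators that listen, reason, and solve complex tasks as a conversation unfolds. The release also includes the streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper, which together form a new set of audio capabilities for the next generation of voice interfaces.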

Chubby♨️ @kimmonismus · May 8 · 75

OpenAI just dropped three new realtime voice models: - GPT-Realtime-2 (with GPT-5-class reasoning for voice agents that can actually think mid-conversation), - GPT-Realtime-Translate (live translation across 70+ input languages), and - GPT-Realtime-Whisper (streaming speech-to-text as people talk). However, their teaser probably refers to their upcoming new Voice Mode in ChatGPT (advanced voice mode 2?)

Summary: OpenAI has released three new realtime voice models: GPT-Realtime-2, with near-GPT-5 reasoning that lets voice agents think mid-conversation; GPT-Realtime-Translate, with live translation across more than 70 languages; and GPT-Realtime-Whisper, with streaming speech-to-text. OpenAI's quote tweet also hints that the long-awaited ChatGPT voice update is being prepared, which suggests a new advanced voice mode may arrive soon.

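TestingCatalog News 🗞 @testingcatalog · May 8 · 81

OPENAI 🚨: 3 new models are now available on OpenAI Playground and APIs. - gpt-realtime 2 - gpt-realtime-whisper - gpt-realtime-translate ChatGPT Voice Mode upgrade soon? 👀

Summary: OpenAI has made three new models available in the Playground and the API: GPT-Realtime-2, GPT-Realtime-Whisper, and GPT-Realtime-Translate. GPT-Realtime-2 is described as OpenAI's most intelligent voice model to date, bringing GPT-5-class reasoning to voice agents so they can act as real-time collaborators that listen, reason, and solve complex problems during a conversation. Together the models form a new set of audio capabilities for the next generation of voice interfaces, and they may signal an upcoming update to ChatGPT's voice mode.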

OpenAI @OpenAI · May 8 · 86

Introducing GPT-Realtime-2 in the API: our most intelligent voice model yet, bringing GPT-5-class reasoning to voice agents. Voice agents are now real-time collaborators that can listen, reason, and solve complex problems as conversations unfold. Now available in the API alongside streaming models GPT-Realtime-Translate and GPT-Realtime-Whisper: a new set of audio capabilities for the next generation of voice interfaces.

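OpenAI Developers @OpenAIDevs · May 8 · 78

Voice agents are getting more capable. Here’s what’s new: • GPT-Realtime-2 for voice agents that reason and take action • GPT-Realtime-Translate enabling translation from 70 input languages into 13 output languages • GPT-Realtime-Whisper, making transcription even faster

Summary: OpenAI has released a new generation of realtime voice models through the API, extending what voice agents can do. The core model, GPT-Realtime-2, offers GPT-5-class reasoning so that voice agents can listen, think, and solve complex problems as real-time collaborators. GPT-Realtime-Translate supports realtime translation from 70 input languages into 13 output languages, and GPT-Realtime-Whisper provides faster speech transcription. Together they form a new audio foundation for the next generation of voice interfaces.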

Ant Ling @AntLingAGI · May 7 · 76

Announcing Ling-2.6-1T by inclusionAI, now available on OpenRouter. 🚀 This trillion-parameter flagship instruct model is built for real-world agents. It utilizes a “fast thinking” approach to cut costs by ~75% while maintaining SOTA performance on AIME26 and SWE-bench Verified. Ideal for: - Advanced coding - Complex reasoning - Large-scale agent workflows

Rohan Paul @rohanpaul_ai · May 7 · 65

Newly launched BACH 1.0 from @video_rebirth solved one of the hardest problems in AI video models: keeping the same character face consistent across different angles and cuts. Not just 1 nice close-up. Actual multi-shot consistency from the same identity, which is where most Image-to-Video models still break. BACH excels at facial emotion expression of characters. Overall, very cinematic direction and production-ready output. And currently #6 in the world on Artificial Analysis. 🧵 1.

Chubby♨️ @kimmonismus · May 7 · 66

Zyphra: under 1B active parameters, AMD-trained, big evals that look strong? Zyphra says its new ZAYA1-8B model delivers unusually high reasoning power for its size, using under 1 billion (!) active parameters while competing with much larger open-weight and proprietary systems on math, coding, and reasoning benchmarks. The interesting part is not just the model’s size, but its full-stack bet: AMD-only training infrastructure (!), new architectural choices, large-scale RL, and a test-time compute method called Markovian RSA that appears to boost hard math performance through parallel reasoning and recursive aggregation.

Summary: Zyphra has released ZAYA1-8B, a model with fewer than 1 billion active parameters that is reported to compete with much larger open-weight and proprietary systems on math, coding, and reasoning benchmarks. Beyond its small size, the notable point is the full-stack approach: training entirely on AMD infrastructure, new architectural choices, and large-scale reinforcement learning. The model also uses a test-time compute method called Markovian RSA, which is reported to improve performance on hard math problems through parallel reasoning and recursive aggregation.

Chubby♨️ @kimmonismus · May 7 · 48

Let’s go!!! Leo is a great leaker, so I assume we will see Sonnet 4.8 today!!

SenseTime @SenseTime_AI · May 6 · 71

🚀 SenseNova-U1 update: ⚡ Open-sourced an 8-step distilled LoRA: 100 NFE → 8 NFE, cutting H100 inference from 23s to 2s 🧩 ComfyUI is now supported, with ready-to-run workflows for t2i, image editing, and interleaved generation. Try it out 👇 https://github.com/OpenSenseNova/SenseNova-U1/
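The two figures in the SenseNova-U1 post can be compared with a little arithmetic. The sketch below uses only the numbers quoted in the post (100 and 8 function evaluations, 23 s and 2 s on an H100); the per-evaluation times are derived values and only indicative, since the quoted timings are probably rounded.

```python
# Figures quoted in the post above.
NFE_BEFORE, NFE_AFTER = 100, 8
SECONDS_BEFORE, SECONDS_AFTER = 23.0, 2.0

# Reduction in the number of function evaluations (sampling steps).
nfe_reduction = NFE_BEFORE / NFE_AFTER

# Wall-clock speedup implied by the quoted timings.
wall_clock_speedup = SECONDS_BEFORE / SECONDS_AFTER

# Average time per evaluation before and after distillation (derived).
per_eval_before = SECONDS_BEFORE / NFE_BEFORE
per_eval_after = SECONDS_AFTER / NFE_AFTER

print(f"NFE reduction:      {nfe_reduction:.1f}x")
print(f"Wall-clock speedup: {wall_clock_speedup:.1f}x")
print(f"Seconds per NFE:    {per_eval_before:.2f} -> {per_eval_after:.2f}")
```

The wall-clock speedup comes out slightly below the NFE reduction, which is what one would expect if some fixed per-image overhead does not shrink with the step count.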

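meng shao @shao__meng · May 6 · 57

Luma Uni-1 adds an explicit reasoning layer between the prompt and the image in an image generation model, and that layer is programmable through the API rather than being a black box.

Summary: Luma's Uni-1 image generation model introduces an explicit reasoning step between the prompt and pixel generation, used to interpret creative direction and resolve ambiguity. This reasoning layer is now exposed programmably through the API, so it is no longer a black box, and developers can integrate Uni-1 into production pipelines as intelligent infrastructure. The main usage patterns are embedding it in a product as a creative engine, building custom multi-stage workflows, and developing standalone tools. Core API capabilities include generation constrained by reference images for style or character, visual consistency enforced at the model level, and precise editing through natural-language instructions.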

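向阳乔木 @vista8 · May 6 · 63

A while ago I took part in the closed beta of Doubao-Seed-2.0-lite 0428. This version adds audio understanding and accepts four input types at once (image, video, audio, and text), making it the first fully multimodal understanding model in the Doubao model family. Beyond multimodal understanding, its Agent, Coding, and GUI capabilities are also said to be noticeably improved. I ran some tests through the API and will share a few scenarios: front-end animation replication, video hook suggestions, subtitle recognition, and other cases in the follow-up thread.

Summary: The Doubao-Seed-2.0-lite 0428 closed-beta release adds audio understanding and supports image, video, audio, and text input together, making it the first fully multimodal understanding model in the Doubao family. Agent, Coding, and GUI capabilities are also noticeably improved. API tests explored its potential for front-end animation replication, video hook suggestions, and subtitle recognition; the specific cases are in the follow-up thread.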

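Xiaomi MiMo @XiaomiMiMo · May 6 · 59

MiMo V2.5 🥰🥰

[Quoting @Designarena]: BREAKING: Xiaomi MiMo-V2.5 ranks #6 overall among open-weight models on Design Arena! Its Elo score of 1297 puts it in the same performance range as @Kimi_Moonshot's Kimi K2.5 (Thinking). Congratulations to the @XiaomiMiMo team on the launch!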

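歸藏(guizang.ai) @op7418 · May 6 · 79

OpenAI has updated the GPT-5.5 Instant model, which is now the default model in ChatGPT. The model improves real-time accuracy and performance on everyday tasks. Main improvements: Performance: hallucination rates in areas such as law, finance, and medicine are noticeably lower, and image understanding and document parsing are better. Style: answers are more compact and focused, with less unnecessary preamble and transitional formatting. Put simply, there is less filler; the previous 5.5 version (GPT-5.5) really was a bit wordy. Personalization has also improved. Release status: fully rolled out today. ChatGPT has made it the default model; Codex has not been updated. New feature: memory sources have been introduced in GPT. A control lets you see where a memory came from, and if something is wrong you can edit it directly.

Summary: OpenAI has fully rolled out GPT-5.5 Instant and made it the default model in ChatGPT. The upgrade improves real-time accuracy and everyday task handling, with lower hallucination rates in law, finance, and medicine in particular, and better image understanding and document parsing. Answers are more concise and focused, with improved clarity, personalization, and a warmer, more natural tone. GPT also gains a memory sources feature that lets users view and edit where a memory came from. The update does not include the Codex model.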

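karminski-牙医 @karminski3 · May 6 · 73

Google just released dedicated draft models for the Gemma 4 family! Pairing 31B Dense with its draft model gives up to a 3x speedup, and the only cost is about 1 GB of extra VRAM! Gemma4-26B gets a 1.5x speedup, and Gemma4-E4B gets 3.1x. I previously published a tutorial on speculative decoding with Gemma 4. At the time there was no official dedicated draft model, so I demonstrated gemma-4-31B-it-UD-Q4_K_XL as the main model with gemma-4-E2B-it-UD-Q4_K_XL as the draft model, which gave a 1.23x speedup with a draft acceptance rate of around 62%. The reason for the jump to 3x is simple: even quantized, gemma-4-E2B-it-UD-Q4_K_XL is about 3 GB, whereas gemma-4-31B-it-assistant is only 939 MB at original precision, and it is optimized specifically for speculative decoding, so its acceptance rate should also be higher. The speedup is therefore much more pronounced, and the only cost is loading this extra model into VRAM (roughly 1 GB). The pressure is now on Qwen: I suggest Qwen ship a Qwen3.6-27B-assistant soon, because my GPU is about to overheat, and I will keep calling you out every day! #gemma4 #qwen #gemma4assistant #推测性解码 #投机解码

Summary: Google has released dedicated draft models for the Gemma 4 family, optimized for speculative decoding. The 31B Dense model runs up to 3x faster with its draft model at a cost of about 1 GB of extra VRAM; Gemma4-26B and Gemma4-E4B gain 1.5x and 3.1x respectively. The new draft models, such as gemma-4-31B-it-assistant, are only 939 MB and purpose-built, so their acceptance rate is high and the speedup is larger than with a general-purpose draft model such as gemma-4-E2B-it-UD-Q4_K_XL. The author calls on Qwen to release a similar model (for example Qwen3.6-27B-assistant).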
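To see why the draft acceptance rate matters so much, the sketch below uses the standard speculative-decoding estimate of tokens produced per verification pass, (1 - a^(k+1)) / (1 - a), for acceptance rate a and draft length k. It rests on a simplifying assumption that every drafted token is accepted independently with the same probability. The acceptance rate 0.62 is the figure quoted in the post; the draft length of 4 and the higher rate of 0.85 are hypothetical values chosen for illustration.

```python
def expected_tokens_per_pass(accept_rate: float, draft_len: int) -> float:
    """Expected tokens emitted per pass of the main (target) model.

    Assumes each drafted token is accepted independently with probability
    accept_rate. The main model always contributes one token itself
    (a correction or a bonus token), hence the exponent draft_len + 1.
    """
    if not 0.0 <= accept_rate <= 1.0:
        raise ValueError("accept_rate must be between 0 and 1")
    if draft_len < 0:
        raise ValueError("draft_len must be non-negative")
    if accept_rate == 1.0:
        return float(draft_len + 1)
    return (1.0 - accept_rate ** (draft_len + 1)) / (1.0 - accept_rate)


DRAFT_LEN = 4  # hypothetical draft length, not stated in the post

for rate in (0.62, 0.85):  # 0.62 is quoted above; 0.85 is illustrative
    tokens = expected_tokens_per_pass(rate, DRAFT_LEN)
    print(f"acceptance {rate:.2f}: ~{tokens:.2f} tokens per main-model pass")
```

This counts tokens per main-model pass only. The wall-clock speedup is lower because the draft model's own run time is ignored here, which is also why a much smaller drafter helps on top of a higher acceptance rate.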

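meng shao @shao__meng · May 6 · 77

SubQ, from @subquadratic, is the world's first frontier LLM built on the Subquadratic Sparse Attention (SSA) architecture. It offers a practical context window of 12M tokens while being far more efficient than a traditional Transformer. Core technical breakthrough: the SSA mechanism. Standard attention in a traditional Transformer is all-pairs, with O(n²) compute complexity, so long-context cost explodes. Most token-to-token interactions are meaningless in practice, yet all of them still have to be computed. SSA's innovation is content-dependent selection: · Each query dynamically selects only the truly relevant key positions for attention computation. · This gives linear scaling: compute and memory costs grow linearly with sequence length rather than quadratically. · It keeps content-driven routing and exact retrieval from arbitrary positions, avoiding the trade-offs of fixed-pattern sparse attention (position-agnostic), RNN/SSM (state compression loses detail), and DeepSeek DSA (the selector is still quadratic). Measured results (B200 GPU, compared with FlashAttention-2): · 128K tokens: 7.2× prefill speedup · 1M tokens: 52.2× speedup · Cost < 5% of Opus, with support for a 12M-token context. Training and positioning: SubQ uses three-stage training (pretraining → SFT → RL), with particular emphasis on reliable retrieval and multi-hop reasoning over long contexts, and is optimized for real enterprise scenarios (complete codebases, long contracts, cross-document research) rather than only for benchmark scores. Positioning: closing the gap between the "nominal context window" (how many tokens fit) and the "functional context window" (how many tokens can be used effectively). It suits Coding Agents, long-running Agent sessions, enterprise knowledge bases, and other cases that need to see everything at once instead of using RAG or chunking. SubQ Code is also open for trial applications; I have just applied and look forward to sharing details once approved. Apply here: https://subq.ai/request-early-access

Summary: The frontier model SubQ is built on the Subquadratic Sparse Attention architecture and offers a practical context window of 12 million tokens. Its core technique, SSA, uses content-dependent selection so that each query computes attention only against dynamically chosen relevant keys, making compute and memory costs grow linearly with sequence length rather than quadratically as in a traditional Transformer. In the reported measurements it is 52.2x faster than FlashAttention-2 at 1 million tokens, at less than 5% of the cost of Opus. The model is optimized for real enterprise long-context scenarios, such as processing complete codebases and long documents in one pass, and aims to close the gap between the "nominal" and "functional" context window.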
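The quadratic-versus-linear claim can be made concrete by counting query-key score computations. The sketch below is only a counting exercise: `KEYS_PER_QUERY` is a hypothetical selection budget (the post does not say how many keys SSA selects per query), and the cost of the selection step itself is not modeled.

```python
KEYS_PER_QUERY = 4096  # hypothetical budget; not disclosed in the post


def dense_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by all-pairs attention: n * n."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored when each query attends to a fixed number of keys.

    A query cannot select more keys than exist, hence the min().
    """
    return n_tokens * min(keys_per_query, n_tokens)


for n in (128_000, 1_000_000, 12_000_000):
    dense = dense_pairs(n)
    sparse = sparse_pairs(n, KEYS_PER_QUERY)
    print(f"{n:>10,} tokens: dense {dense:.2e} pairs, "
          f"sparse {sparse:.2e} pairs, ratio {dense / sparse:,.0f}x")
```

The ratio grows in proportion to the sequence length, which is the sense in which the sparse variant scales linearly. These counts are not predictions of the measured 7.2× and 52.2× speedups quoted above, which depend on kernels, memory traffic, and selection overhead.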

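Berryxia.AI @berryxia · May 6 · 66

Gemma 4 can now run up to 3x faster with quality completely unchanged. Google did not add parameters or switch to a new architecture; it released a set of MTP drafters (multi-token prediction drafters) that let the model predict several tokens at once, bypassing the serial bottleneck of traditional autoregressive, one-token-at-a-time decoding. The GPU no longer sits idle; it starts to "anticipate". This means: much better real-time performance for local deployment; a direct boost for agents, code generation, and real-time translation; and a wider cost-performance lead for open models. Google is not fighting a parameter war here; it is squeezing hardware utilization to the limit. While closed models are still competing on "who is smarter", open models are already competing on "who is faster, cheaper, and can run locally". Blog post 👉 https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4/

Summary: By introducing MTP drafters (multi-token prediction drafters) for Gemma 4, Google achieves up to a 3x inference speedup without adding parameters or changing the architecture or model quality. The technique lets the model predict several tokens at once, breaking the serial bottleneck of traditional autoregressive decoding and raising GPU utilization. This improves real-time performance for local deployment, benefits agents and code generation, and further widens the cost-performance and local-execution advantage of open models.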

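Berryxia.AI @berryxia · May 6 · 67

Today's TTS release is something! While every TTS is competing on "how human the voice sounds", Inworld AI has changed the rules: Realtime TTS-2 is the first realtime voice model that truly "listens". It doesn't just speak; it listens to the whole conversation in real time, picks up emotion, tone, and rhythm, and then decides "how to say it". Even better: - Supports natural-language voice direction (steer the voice the way you prompt an LLM) - One voice identity spans 100+ languages, so switching languages does not switch the speaker - A brand-new voice can be generated from a text description, saved, and reused. This is no longer "speech output" but a realtime conversational partner that "listens, empathizes, and adapts". In the past, however real voice AI sounded, it always seemed to be reciting lines. Now it finally starts to speak "like someone who is actually paying attention to you". Try it 👉 https://inworld.ai/tts

Summary: Inworld AI has released Realtime TTS-2, a new-generation realtime conversational voice model that moves beyond the usual TTS competition over human-likeness. The model listens to the full conversation in real time, picks up emotion, tone, and rhythm, and decides dynamically how to respond, acting as a conversational partner that "listens, empathizes, and adapts". Key features: natural-language voice direction that steers the voice the way one prompts a large language model; a single voice identity that stays consistent across more than 100 languages; and new voices generated from a text description that can be saved and reused. The post frames this as a shift from mechanical "speech output" toward realtime conversation that feels closer to talking with a person.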

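Berryxia.AI @berryxia · May 6 · 75

OpenAI's GPT-5.5 Instant is starting to roll out in ChatGPT. It is a major upgrade that gives you smarter, clearer, and more personalized answers in a warmer, more natural tone.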

ginobefun @hongming731 · May 6 · 63

#BestBlogs Daily Morning Brief 2026-05-06 Core topics: GPT-5.5 Instant / AI code review bottleneck / Wilkinson autonomous CEO / Agent Harness / Stripe Proto Dash

Rohan Paul @rohanpaul_ai · May 6 · 76

OpenAI just made GPT-5.5 Instant the default ChatGPT model, with fewer false claims, shorter answers, stronger image and STEM handling, and deeper personalization from memory, files, past chats, and connected Gmail. 52.5% fewer hallucinated claims than GPT-5.3 Instant on high-stakes medicine, law, and finance prompts, plus 37.3% fewer inaccurate claims on difficult conversations users had already flagged for factual errors. The model also uses 30.2% fewer words and 29.2% fewer lines in one comparison, which means OpenAI is tuning for answers that explain enough without burying the user in structure. Also, ChatGPT can now pull useful context from saved memories, past chats, files, and Gmail when that context improves the answer.

Summary: OpenAI has made GPT-5.5 Instant the default model in ChatGPT. On high-stakes medicine, law, and finance prompts it makes 52.5% fewer false claims than its predecessor, and on difficult conversations that users had flagged for factual errors it makes 37.3% fewer inaccurate claims. Answers are more concise, with roughly 30% fewer words and lines. The model also handles images and STEM better and can draw context from memories, past chats, files, and connected Gmail for deeper personalization. OpenAI says the upgrade aims to give smarter, clearer, more personalized answers in a natural, warm tone.

Eric @ericmitchellai · May 6 · 43

big if true (actually small)

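Chubby♨️ @kimmonismus · May 6 · 66

Really really cool: Sub-200ms TTFA is the number that matters. Anything above ~300ms in a voice agent and you can feel the lag. Everything else is downstream of that.

Summary: The main tweet stresses that a time to first audio (TTFA) under 200 ms is critical for voice agents and that latency above about 300 ms is perceptible. The quoted tweet introduces Realtime TTS-2, a new-generation voice model designed for realtime conversation: it understands conversational context, accepts natural-language voice direction, keeps one voice identity across more than 100 languages, and models the way an attentive person speaks, aiming for voice AI that both sounds and feels right.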
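One way to check a voice pipeline against the thresholds in the tweet is to time how long the first non-empty audio chunk takes to arrive. This is a minimal sketch: `fake_tts_stream` is a hypothetical stand-in for a real streaming TTS client, and the "borderline" band between 200 ms and 300 ms is an interpretation of the tweet, not something it states.

```python
import time
from typing import Iterable, Iterator, Optional

TARGET_MS = 200.0  # "sub-200ms" target quoted in the tweet
LAG_MS = 300.0     # approximate point where lag is felt, per the tweet


def measure_ttfa_ms(chunks: Iterable[bytes]) -> Optional[float]:
    """Milliseconds until the first non-empty audio chunk, or None if none arrives."""
    start = time.perf_counter()
    for chunk in chunks:
        if chunk:
            return (time.perf_counter() - start) * 1000.0
    return None


def classify(ttfa_ms: float) -> str:
    """Bucket a TTFA measurement using the two thresholds above."""
    if ttfa_ms < TARGET_MS:
        return "ok"
    if ttfa_ms <= LAG_MS:
        return "borderline"
    return "laggy"


def fake_tts_stream(delay_s: float = 0.05) -> Iterator[bytes]:
    """Hypothetical stream: waits delay_s, then yields two audio chunks."""
    time.sleep(delay_s)
    yield b"\x00\x01"
    yield b"\x02\x03"


ttfa = measure_ttfa_ms(fake_tts_stream())
if ttfa is not None:
    print(f"TTFA: {ttfa:.0f} ms ({classify(ttfa)})")
```

In a real agent the timer would start when the user stops speaking rather than when the TTS request is sent, so that recognition and model latency are counted too.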

ChatGPT @ChatGPTapp · May 6 · 73

High fives to all our users on the new Instant model.

Sam Altman @sama · May 6 · 49

in particular, the combination of improvements to speed, intelligence, personality, and great memory/personalization feels like a more-than-sum-of-the-parts thing when it all hits together

[Quoting @sama]: the new instant model in chatgpt is really great; if you have been using only thinking models for a while, give it a try!

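Greg Brockman @gdb · May 6 · 91

Major ChatGPT upgrade rolling out now, in the form of GPT-5.5 Instant:

Summary: It is a big upgrade, giving smarter, clearer, and more personalized answers in a warmer, more natural tone. It is also more concise, which is what users asked for. We think you'll love chatting with it.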

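Elon Musk @elonmusk · May 6 · 83

Grok 4.3

[Quoting @xai]: Grok 4.3 is now live on the xAI API. It’s our fastest, most intelligent model to date. It tops the @ArtificialAnlys leaderboards in agentic tool calling and instruction following, and ranks #1 in @ValsAI enterprise domains like case law and corporate finance. Grok 4.3 supports a 1 million token context window and is priced at $1.25 per million input tokens and $2.50 per million output tokens. Create an API key and start building: http://console.x.ai/team/default/api-keys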

TestingCatalog News 🗞 @testingcatalog · May 6 · 71

Google released Multi-Token Prediction (MTP) drafters for the Gemma 4 family. They come with a 3x speed boost without losing performance. Looking forward to testing a quantized Gemma 4 with MTP drafters on a Mac Mini!

TestingCatalog News 🗞 @testingcatalog · May 6 · 77

OPENAI 🚨: GPT-5.5 Instant is rolling out to all users on ChatGPT! "gpt-5.5-chat-latest" is coming to APIs as well. > Much more concise. Better memory. More personalized. Instant testing time 👀

Sam Altman @sama · May 6 · 69

5.5 instant comes to ChatGPT today! imo it is a pretty big upgrade, i really like using it.

[Quoting @ericmitchellai]: Excited that we're updating the default model in ChatGPT today! 5.5 instant is a substantial improvement in intelligence, image perception, and factuality. It also updates the writing style to be a bit plainer and more straightforward. What was on your wishlist?

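宝玉 @dotey · May 6 · 75

Google has released MTP drafters (multi-token prediction draft models) for its open model Gemma 4, raising inference speed by up to 3x with output quality unchanged. https://x.com/googledevs/status/2051700599184629994/video/1 Gemma 4 is the open model family Google released a few weeks ago, ranging from the phone-class E2B and E4B up to the workstation-class 26B MoE and 31B Dense; Google says downloads passed 60 million within weeks of launch. The MTP drafter uses speculative decoding: a lightweight small model first "guesses" the next several tokens, then the large model verifies them in a single parallel pass, and everything that passes verification is emitted at once. This is especially useful when running models locally. LLM inference is often slow not because of compute but because of memory bandwidth: the processor spends most of its time moving billions of parameters from VRAM to the compute units just to produce the next token. Speculative decoding puts the idle compute to work: the small model predicts several tokens at a time and the large model only verifies, which keeps the pipeline full. In practice, running the 26B MoE model on Apple Silicon with batch sizes of 4 to 8 gives about a 2.2x local speedup. Because final verification is still done by the large model, the output matches the original word for word, with no quality trade-off. The drafters use the same Apache 2.0 license as Gemma 4; the weights are on Hugging Face and Kaggle, and transformers, MLX, vLLM, SGLang, and Ollama already support them.

Summary: Google has released MTP drafters (multi-token prediction draft models) for its open model Gemma 4. Using speculative decoding, they raise inference speed by up to 3x with output quality unchanged. A lightweight small model proposes several tokens in advance and the large model verifies them in parallel, which improves throughput and is especially helpful against the memory-bandwidth bottleneck in local deployment. For example, running the 26B MoE model on Apple Silicon with batching gives about a 2.2x speedup. The drafters use the Apache 2.0 license, the weights are openly available, and mainstream inference frameworks support them.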
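The draft-then-verify loop described above can be sketched with toy models. This is a minimal greedy variant under stated assumptions, not Google's implementation: `target_model` and `draft_model` are hypothetical stand-ins that map a token list to the next token, and verification is simulated with a loop, whereas a real system scores all drafted positions in a single parallel forward pass.

```python
from typing import Callable, List, Tuple

NextToken = Callable[[List[int]], int]


def target_model(ctx: List[int]) -> int:
    """Stand-in for the large model: a deterministic toy next-token rule."""
    return (ctx[-1] * 3 + len(ctx)) % 17


def draft_model(ctx: List[int]) -> int:
    """Stand-in for the drafter: same rule, but it deliberately deviates
    (returns 0) whenever the last token is a multiple of 5."""
    if ctx[-1] % 5 == 0:
        return 0
    return (ctx[-1] * 3 + len(ctx)) % 17


def greedy_decode(model: NextToken, prompt: List[int], n_new: int) -> List[int]:
    """Plain autoregressive decoding: one model call per new token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n_new: int,
                       k: int = 4) -> Tuple[List[int], int]:
    """Greedy speculative decoding; returns (tokens, verification passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    passes = 0
    while len(out) < goal:
        # 1. The drafter proposes k tokens autoregressively.
        proposal: List[int] = []
        ctx = list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks the proposal (one pass in a real system).
        passes += 1
        for tok in proposal:
            expected = target(out)
            if tok != expected:
                out.append(expected)  # first mismatch: keep the target's token
                break
            out.append(tok)           # match: accept the drafted token
        else:
            out.append(target(out))   # all accepted: target adds a bonus token
    return out[:goal], passes


prompt = [1, 2, 3]
baseline = greedy_decode(target_model, prompt, 20)
fast, passes = speculative_decode(target_model, draft_model, prompt, 20)
print("identical output:", fast == baseline)
print("target passes:", passes, "instead of 20")
```

Every token that reaches the output is the one the target model would have chosen, which is why the result matches plain greedy decoding exactly; the saving comes from needing fewer target passes. With sampling instead of greedy decoding, production systems use a probabilistic accept/reject rule, which this sketch does not cover.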

ChatGPT @ChatGPTapp · May 6 · 83

GPT-5.5 Instant is starting to roll out to everyone in ChatGPT. Much more concise. Better memory. More personalized. And it's way easier to talk to. Really.

OpenAI @OpenAI · May 6 · 86

GPT-5.5 Instant is starting to roll out in ChatGPT. It’s a big upgrade, giving you smarter, clearer, and more personalized answers in a warmer, more natural tone. And it's also more concise, which we heard you wanted. We think you'll love chatting with it.

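Eric @ericmitchellai · May 6 · 82

Excited that we're updating the default model in ChatGPT today! 5.5 instant is a substantial improvement in intelligence, image perception, and factuality. It also updates the writing style to be a bit plainer and more straightforward. What was on your wishlist?

Summary: OpenAI has announced that the default model in ChatGPT is being updated to GPT-5.5 Instant. The new model is substantially better in intelligence, image understanding, and factual accuracy. Its responses are more concise, direct, and natural, and its answers are clearer and more personalized. The upgrade is based on user feedback and aims to improve the conversation experience.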

Chubby♨️ @kimmonismus · May 6 · 83

Nice, big update: OpenAI is rolling out GPT-5.5 Instant in ChatGPT as the new default model (very good jumps in benchmarks). The upgrade makes ChatGPT smarter, more factual, more dependable, and better at everyday tasks like image analysis, STEM questions, writing, and high-accuracy domains such as medicine, law, and finance. The bigger shift is personalization: ChatGPT can now use saved memories, past chats, files, and connected Gmail context more effectively, while showing users which memory sources influenced a response. GPT-5.5 Instant will roll out to all ChatGPT users over the next two days, while personalization improvements are coming first to Plus and Pro users on web, with mobile following soon. In the API, it will be available as gpt-5.5-chat-latest.

Summary: OpenAI has made GPT-5.5 Instant the new default model in ChatGPT. It shows large benchmark gains and is smarter, more accurate, and more reliable, with stronger image analysis, STEM, and writing, and better results in high-accuracy domains such as medicine and law. The core upgrade is personalization: the model makes effective use of saved memories, past chats, files, and Gmail context, and shows which memory sources influenced a response. It will roll out to all users over the next two days; the personalization improvements come first to Plus and Pro users on web, with mobile to follow, and the API version is gpt-5.5-chat-latest. OpenAI says the upgraded model gives smarter, clearer, more personalized answers in a warm, natural, and more concise style.

xAI @xai · May 6 · 80

Grok 4.3 is now live on the xAI API. It’s our fastest, most intelligent model to date. It tops the @ArtificialAnlys leaderboards in agentic tool calling and instruction following, and ranks #1 in @ValsAI enterprise domains like case law and corporate finance. Grok 4.3 supports a 1 million token context window and is priced at $1.25 per million input tokens and $2.50 per million output tokens. Create an API key and start building: http://console.x.ai/team/default/api-keys
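The quoted prices translate into per-request costs with simple arithmetic. The sketch below assumes flat per-token pricing at the two rates in the post; it does not model caching discounts, tiered pricing, or tool-call fees, none of which the post mentions. The token counts in the example are hypothetical.

```python
INPUT_USD_PER_MILLION = 1.25   # quoted input price
OUTPUT_USD_PER_MILLION = 2.50  # quoted output price


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request under flat per-token pricing (an assumption)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
            + output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION)


# Hypothetical request: a 200,000-token prompt with a 4,000-token answer.
cost = request_cost_usd(200_000, 4_000)
print(f"${cost:.2f} per request")

# Filling the full 1M-token context window once, input side only.
print(f"${request_cost_usd(1_000_000, 0):.2f} to send 1M input tokens")
```

For long-context workloads the input side dominates: in the hypothetical request above, the prompt accounts for $0.25 of the $0.26 total.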

Rohan Paul @rohanpaul_ai · May 6 · 65

The first frontier model with a 12 million token context window just launched. - 52x faster than FlashAttention at 1M tokens - Less than 5% the cost of Opus @subquadratic just announced a major breakthrough in changing the cost curve of attention in LLMs. They brought a frontier-scale LLM built entirely around sub-quadratic sparse attention, where the model selectively computes only the important token relationships so very long context can scale far cheaper and faster than standard transformer attention. In normal transformers, long context is painfully expensive because as context grows, the attention work grows roughly with the square of the sequence length. A 1M-token document is not just “a long document” for a normal model; it is a massive grid of possible token relationships. SubQ’s key observation is that most of that grid is useless. A legal contract does not need every comma to compare itself with every sentence from 400 pages ago. A codebase does not need every variable name to attend equally to every unrelated file. SubQ is saying: let the model find the few relationships that probably matter, then spend compute there.

Summary: SubQ has launched as the first frontier LLM built entirely on a sub-quadratic sparse attention (SSA) architecture, with a 12-million-token context window. At 1 million tokens it is reported to be 52x faster than FlashAttention, at less than 5% of the cost of Opus. Rather than computing every token relationship as standard Transformer attention does, the model uses sparse attention to focus on the relationships that matter, which the summary says cuts long-context compute by nearly 1000x and changes the cost curve and scaling behavior of LLMs.