probably the best reward function for reasoning efficiency i've seen

Qwen: Foundation Models for the Agent Era, with Steven Hoi, Head of Multimodal Interaction, Tongyi Large Model BU. Qwen3.7 delivers major breakthroughs in reasoning, fully upgrading native agentic capabilities across tool use, coding, and long-horizon tasks.

Josh Woodward@joshwoodward · Jun 3 · 53

✅ Papercut fixed: Thinking Levels are now available on Gemini across Web, iOS, and Android.

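Berryxia.AI@berryxia · Jun 3 · 76

Folks, the Google DeepMind team is at it again! Their latest release settles the old question of what AI can actually do for scientists. They have turned Gemini into a multi-agent system called Co-Scientist. It is not a simple Q&A tool: it reproduces the full loop a scientist runs from idea to validation, generating thousands of hypotheses, holding an "idea tournament", having multiple agents debate, critique, and refine one another's proposals, and finally grounding each claim with literature, data, and search tools. The biggest bottleneck in research used to be that one person's brainpower is limited: generating good hypotheses, debating them repeatedly, and pulling in knowledge from other fields all fell on the individual. Co-Scientist turns that process into a pipeline that scales. Over the past year the team tested it with leading scientists worldwide, and on very hard problems such as new targets for liver fibrosis, new therapies for amyotrophic lateral sclerosis (ALS), and genetic clues to reversing aging, it produced new directions with real potential. The most counterintuitive point: it is not here to replace scientists but to serve as a "dedicated research partner". Scientists can finally free their attention from repeatedly drafting hypotheses and searching the literature, and focus on the most creative judgment calls and experiment design. AI has turned the "high-intensity idea iteration" that only top teams could afford into infrastructure anyone can use. The Hypothesis Generation feature is now open to individual researchers directly through Gemini for Science, so an ordinary researcher can have an AI collaborator that never sleeps, can debate, can verify, and keeps evolving. This punctures the most common misconception: many people assume AI will put scientists out of work, but the actual path is that AI raises the speed and breadth of scientific discovery by an order of magnitude and lets more people take part in breakthrough research.

Summary: Google DeepMind released Co-Scientist, a Gemini-based multi-agent system aimed at automating the research workflow. It generates, debates, and validates hypotheses, relieving scientists of intensive cognitive work. Over the past year it has worked with scientists to explore new directions on complex problems such as new liver fibrosis targets and new ALS therapies. It is positioned as a "dedicated research partner" rather than a replacement for scientists. Its hypothesis generation feature is now open to individual researchers through Gemini for Science.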

SenseTime@SenseTime_AI · Jun 3 · 34

At SenseTime, we believe the future of #AI is shaped by continuously pushing the boundaries of #FoundationalInnovation. At the 2026 AI Innovation Forum, our Co-founder and Chief Scientist Dr. @lindahua highlighted an important industry trend: #ModelArchitecture optimization can significantly reduce the compute required per unit of intelligence. He also noted that China’s AI ecosystem should leverage application and model innovation to drive chip development forward. SenseNova U1, SenseTime's latest multimodal model built on our proprietary Neo-Unify architecture, demonstrates this in practice, achieving significantly lower #ComputeCosts in infographic generation while being simultaneously adapted to multiple #ChineseChips. At the same time, we continue developing AI solutions that genuinely solve user problems and create sustainable #CommercialValue, strengthening our long-term competitiveness in the evolving AI arena. Forum organisers: China International Capital Corporation Limited, @hkust

Summary: At the 2026 AI Innovation Forum, SenseTime's co-founder and chief scientist said that model architecture optimization can significantly reduce the compute required per unit of intelligence. The newly released multimodal model SenseNova U1, built on the in-house Neo-Unify architecture, puts this into practice: it achieves significantly lower compute costs for infographic generation and has been adapted to multiple Chinese chips. SenseTime stresses using application and model innovation to drive chip development, creating commercial value and long-term competitiveness.

Rohan Paul@rohanpaul_ai · Jun 3 · 57

Stanford researchers found that law professors preferred AI answers over peer professor answers 75% of the time when judging contract-law help for students. The study tested whether LLMs can handle a field where the answer is often not a fact, but a defensible argument built from rules, exceptions, and judgment. The professors wrote 40 real student-style questions, gave their own answers, and then blindly judged nearly 3,000 comparisons between human and AI responses. The striking result was not just that AI won often, but that professors marked AI answers as harmful only 3.5% of the time, compared with 12% for human answers. i.e. the model was not merely sounding fluent, but often matching the teaching standard law professors use when explaining ambiguity to students.

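Summary: Stanford researchers found that, when judging answers to contract-law questions, law professors preferred the AI answer over a peer professor's answer 75% of the time. The professors wrote answers to 40 real student-style questions and then blindly judged nearly 3,000 comparisons between human and AI responses. Beyond the AI's high win rate, professors marked only 3.5% of AI answers as harmful, compared with 12% of human answers. This suggests the LLM was not merely fluent: it often met the teaching standard professors use when explaining legal ambiguity to students.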

MiniMax (official)@MiniMax_AI · Jun 3 · 74

We wrapped a live session on M3 yesterday with the @togethercompute team & our researchers @zpysky1125 and @HaohaiSun. A few highlights 🧵

1. MSA (MiniMax Sparse Attention) is the star ⭐️. Unlike CSA/HCA, which compress the KV cache, MSA keeps the real, uncompressed KV and does block-level selection with a small top-K. That's how the 1M context window stays tractable.
2. The efficiency win is huge. In our previous generation, ~30% of per-decode wall-clock time went to the attention kernel. With MSA that now drops to ~5%. Big gains for long-context generation.
3. M3 isn't just a coding model. Natively multimodal (image + video in), ability to handle long-horizon agentic tasks, and even operate a desktop computer. People are already throwing game-dev + Minecraft-style builds at it (Unity included) and it's holding its own.
4. M3 can self-evaluate on vision-coding tasks: it builds a website or SVG, browses and inspects its own rendered output, judges it, and iterates - grading work visually.
5. We're also seeing junior-analyst-level performance on finance tasks; something we haven't even showcased publicly yet.
6. What's next: harder long-horizon / multi-file tasks in future releases, scaling data + post-training (RL) compute toward pre-training scale, and going deeper into finance, legal & bio.

Thanks to everyone who joined 🙏 Try M3, link in the comments👇

Summary: MiniMax shared the key points about its M3 model in a live session. Its MSA technique uses block-level top-K selection while keeping the real, uncompressed KV cache, which keeps the 1M-token context window efficient. It cuts the attention kernel's share of per-decode time from about 30% to about 5%, a significant efficiency gain for long-context generation. M3 is natively multimodal, accepts image and video input, handles long-horizon agentic tasks and desktop operation, and can visually evaluate and iterate on its own output. The model shows junior-analyst-level performance on finance tasks. Future releases will focus on harder long-horizon tasks and expand into finance, legal, and bio. Together AI provides inference for it.
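
As a rough illustration of the block-level selection described in highlight 1, the sketch below scores fixed-size blocks of an uncompressed KV cache, keeps the top-K blocks, and runs ordinary softmax attention over only those tokens. This is not MiniMax's implementation: the function name `block_topk_attention`, the mean-key scoring rule, and the default block size are all assumptions made for the example.

```python
import math


def block_topk_attention(q, keys, values, block_size=4, top_k=2):
    """Single-query attention restricted to the top_k best-scoring KV blocks.

    Hypothetical sketch: blocks are scored by the dot product between the
    query and the block's mean key. Keys and values are never compressed;
    only the set of tokens attended to is reduced.
    """
    dim = len(q)
    scale = 1.0 / math.sqrt(dim)

    # Partition token positions into contiguous blocks.
    blocks = [list(range(start, min(start + block_size, len(keys))))
              for start in range(0, len(keys), block_size)]

    def block_score(token_ids):
        mean_key = [sum(keys[i][d] for i in token_ids) / len(token_ids)
                    for d in range(dim)]
        return sum(a * b for a, b in zip(q, mean_key))

    # Keep the top_k highest-scoring blocks, restored to positional order.
    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    chosen = sorted(ranked[:top_k])
    token_ids = [i for b in chosen for i in blocks[b]]

    # Numerically stable softmax over the selected tokens only.
    logits = [scale * sum(a * b for a, b in zip(q, keys[i])) for i in token_ids]
    peak = max(logits)
    weights = [math.exp(x - peak) for x in logits]
    total = sum(weights)

    out = [sum(w * values[i][d] for w, i in zip(weights, token_ids)) / total
           for d in range(len(values[0]))]
    return out, chosen


if __name__ == "__main__":
    keys = [[0.0, 0.0], [0.0, 0.0], [5.0, 0.0], [5.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
    values = [[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]]
    out, chosen = block_topk_attention([1.0, 0.0], keys, values, block_size=2, top_k=1)
    print(chosen, out)
```

With `top_k` equal to the number of blocks this reduces to ordinary dense attention, so the per-step attention cost in the sketch scales with `top_k * block_size` rather than with the full context length.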

Rohan Paul@rohanpaul_ai · Jun 3 · 81

Microsoft unveiled MAI-Thinking-1. So Microsoft now has a full in-house pipeline for building stronger reasoning models again and again. Microsoft calls this system a “hill-climbing machine,” meaning it keeps improving the data, training setup, rewards, safety tests, and evaluations as one connected process. Strong for its size, including 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro. MAI-Thinking-1 is the first model from that process, using 35B active parameters inside a 1T total parameter mixture-of-experts model, where only part of the model runs for each token. The base model was trained from scratch on 30T mostly human-generated tokens, with Microsoft saying it avoided third-party model distillation during pre-training. After that, the team used reinforcement learning, which means the model practiced tasks and improved from feedback, to teach math reasoning, coding, tool use, helpfulness, and safety.

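Summary: Microsoft released MAI-Thinking-1, an MoE model with 35B active parameters and 1T total parameters. It was pre-trained from scratch on 30T tokens without third-party model distillation. Microsoft calls its iterative improvement process a "hill-climbing machine". On benchmarks, the model scores 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro.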

Chubby♨️@kimmonismus · Jun 3 · 50

Just figured out that „Mai“-1 thinking stands for: Microsoft AI-thinking. 🤯

Chubby♨️@kimmonismus · Jun 3 · 63

Mai-1 thinking: mid-size model, 45b active parameters, MoE, side by side with Sonnet 4.6, 0 distillation, „Microsoft’s first reasoning model“

🚨 AI News | TestingCatalog@testingcatalog · Jun 3 · 70

MICROSOFT 🔥: New MAI Code 1 Flash and MAI Thinking 1 models have been revealed on the official MAI website! Also, MAI Image 2.5, MAI Voice 2, and MAI Transcribe 1.5 are there too. > MAI-Code-1-Flash plans and reasons through complex coding tasks from start to finish, so you spend less time debugging and more time building. > MAI-Thinking-1 (35B active, ~1T total parameters, MoE) has a smaller inference footprint than much larger models, yet is competitive with Claude Opus 4.6 on SWE-Bench Pro. h/t @MeetPatelTech

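Summary: Microsoft updated the MAI model lineup on its official website, headlined by MAI Code 1 Flash and MAI Thinking 1. MAI Thinking 1 has 35B active and about 1T total parameters in an MoE architecture, with a smaller inference footprint than much larger models, yet it is competitive with Claude Opus 4.6 on SWE-Bench Pro. MAI Code 1 Flash focuses on planning and reasoning through complex coding tasks end to end. MAI Image 2.5, MAI Voice 2, and MAI Transcribe 1.5 are listed as well.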

Google DeepMind@GoogleDeepMind · Jun 3 · 61

We believe AI can be a dedicated research partner to help discover the next breakthrough. Enter Co-Scientist: our latest Gemini-based multi-agent system that can generate, debate and evolve novel hypotheses for complex scientific problems 🧵

Chubby♨️@kimmonismus · Jun 3 · 33

RTX Spark running a 120B-parameter model locally. Ngl, pretty cool

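Berryxia.AI@berryxia · Jun 3 · 50

Why is this video being dug up and reshared by so many people again today? 🤔 A video posted half a month ago is starting to get traction…

Summary: A 40-minute video by Moonshot AI founder Yang Zhilin has been widely reshared recently. In it he walks through the training process of Kimi K2 in detail; the core breakthrough is that training was completed at the very low cost of only $4.6 million. In a recent live coding competition among 8 models, Kimi K2 took first place. Yang Zhilin uses the talk to stress the importance of extreme optimization and architecture design.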

OpenRouter@OpenRouter · Jun 3 · 68

⚡ New provider drop: AI-Native Cloud from @digitalocean is now live on OpenRouter. High performance inference across popular open-weight models. #1 on output speed and latency for DeepSeek V3.2 by @ArtificialAnlys. See their stats and try the models: https://openrouter.ai/provider/digitalocean

AK@_akhaliq · Jun 3 · 62

GPU Forecasters: Language Models as Selective Surrogates for Kernel Runtime Optimization

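MiniMax (official)@MiniMax_AI · Jun 2 · 72

Watch M3 reach the frontier 🚀

Summary: MiniMax released the M3 model, claiming it is the first open-weight model to combine three frontier capabilities: coding and agentic ability, a 1M context length, and native multimodality. Its coding and agentic results stand out across several evaluations: 59.0% on SWE-Bench Pro, 66.0% on Terminal Bench 2.1, 34.8% on SWE-fficiency, 28.8% on KernelBench Hard, and 74.2% on MCP Atlas. The model supports 1M context through MiniMax Sparse Attention. API access and a new MiniMax Code service are available, and the model weights and technical report are expected in about 10 days.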

StepFun@StepFun_ai · Jun 2 · 74

We probably don’t talk enough about “usable.”

Summary: When a Flash model brings speed, cost, and intelligence into the "usable" range all at once, the way intelligence is supplied changes structurally.

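StepFun@StepFun_ai · Jun 2 · 69

This is exactly the philosophy: don't bolt on efficiency, design for it from day one. MFA + AFD aren't tricks. They're what lets Step 3.7 Flash serve at a fraction of the KV-cache cost. Huge thanks to @FireworksAI_HQ for making Step 3.7 Flash one-click to run. Go build something agentic with it.

Summary: StepFun released Step 3.7 Flash, its inference-optimized model. It is a 196B MoE model designed for inference efficiency from the start. It uses Multi-matrix Factorization Attention (MFA), which brings KV-cache cost to about 22% of that of DeepSeek models, and Attention-FFN Disaggregation (AFD) for hardware-optimized, efficient serving. The model is available through Fireworks AI under the Apache 2.0 license and can be used to build agentic applications.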

MiniMax (official)@MiniMax_AI · Jun 2 · 55

we're live now 🔴 Inside M3 with @togethercompute: the model, the MSA architecture, and the inference powering it. come hang 👇 https://x.com/i/spaces/1nxeLLDDBEaJX/peek

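AYi@AYi_AInotes · Jun 2 · 65

Apple, Intel, AMD, and Qualcomm will probably sleep badly tonight. The Wintel dynasty that ruled the PC for a full 30 years just had the whole table flipped by a company that sells graphics cards. NVIDIA's RTX Spark is a 3nm SoC that puts an ARM CPU, a Blackwell GPU, and 128GB of unified memory on a single chip, fits into a 14mm ultrathin laptop, runs a 120B model locally, and plays AAA games at full frame rate at 1440p without dropping a frame on battery. But what really keeps those four awake is more than the specs. For the past thirty years the PC industry has been like carmakers competing on engine displacement: everyone watched CPU benchmarks, Intel Inside was the mark of quality, and all the competition played out under one set of rules. Today NVIDIA drove in an electric car and declared that the rules have changed: from now on the contest is about AI compute and whose software ecosystem runs deeper, and its ecosystem, CUDA, has been twenty years in the making. Every company named now has to respond head-on. Intel and AMD can still chase performance and process nodes; what they cannot catch up on is twenty years of accumulated developers. Apple proved back in 2020 with the M series how strong ARM plus unified memory can be, but it shut CUDA out, so NVIDIA simply went around it and recreated the formula on the Windows side, adding what Apple will never offer: a full GPU ecosystem, AAA games, and the full CUDA stack. Qualcomm's Snapdragon X had a one-year head start on Windows on ARM, but without a GPU ecosystem behind it the story was only half told, and now that position has been taken. Of course, there is always a gap between what is said at a launch event and what real-world use looks like. How much performance old software loses under the ARM Windows compatibility layer, whether the chip throttles under sustained full load, and what this will finally cost: Jensen Huang addressed none of these. But the direction seems settled. In the past, Intel Inside was the quality sticker on the machine you bought; going forward, someone else gets to apply that sticker. What NVIDIA sold today is not just a chip but an entry ticket to the next thirty years of the PC industry.

Summary: NVIDIA announced RTX Spark, a 3nm SoC integrating an ARM CPU, a Blackwell GPU, and 128GB of unified memory. Used in ultrathin laptops, it can run a 120B model locally and play AAA games at full frame rate at 1440p with no performance drop on battery. The move is seen as a shift in the PC industry's rules of competition, from CPU performance to AI compute and the CUDA software ecosystem, and as NVIDIA's challenge to the Wintel dynasty. The approach sidesteps Apple's exclusion of CUDA and is first to bring ARM plus a full GPU ecosystem to Windows, aiming at leadership of the PC industry for the next thirty years.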

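MiniMax (official)@MiniMax_AI · Jun 2 · 61

the price tags tell the story 👀 M3 on @aimlapi! go test it yourself 😎

Summary: MiniMax M3 is now available on the AI/ML API platform. The platform tested several models on a one-shot Doodle Jump game; outputs were similar across models but prices differed sharply: $0.05 for MiniMax M3, $0.08 for Qwen 3.7 Max, $0.10 for DeepSeek V4 Pro, and $0.42 for GPT-5.5. MiniMax uses this to highlight its price-performance and says the model is currently offered at a limited-time 50% discount.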

Rohan Paul@rohanpaul_ai · Jun 2 · 57

Intel is aiming to debut a new AI data center chip before the year closes, with lower-cost memory and cooling technology than Nvidia and AMD alternatives. The AI boom is quietly shifting from building models to running them, millions of times a day, for ordinary users. Intel’s Crescent Island strategy appears to lean into inference: air cooling instead of liquid cooling, LPDDR5 instead of high-bandwidth memory, and a focus on workloads where lower cost may matter as much as raw performance. After Gaudi failed to break through, the company seems to be choosing a narrower battlefield where its weaknesses hurt less and its manufacturing ambitions might eventually help more.

Summary: Intel plans to launch a new AI data center chip before year-end, competing with Nvidia and AMD on low cost. The strategy, code-named "Crescent Island", focuses on inference and uses air cooling and LPDDR5 memory to reduce overall cost rather than chase peak performance. After Gaudi failed to break through, Intel is choosing a narrower segment that better suits its manufacturing strengths.

xAI@xai · Jun 2 · 67

Composer 2.5 is now available inside Grok Build. Composer 2.5 is a fast, highly intelligent model that excels on long-running tasks and following complex instructions.

Qwen@Alibaba_Qwen · Jun 2 · 83

👏👏 Introducing Qwen3.7-Plus — a multimodal agent model that unifies vision and language into one versatile agent foundation. ✅ Multimodal interactive hybrid agent: unified GUI & CLI operation across visual and text tasks ✅ Versatile coding agent & productivity assistant with full-modality input ✅ Visual Agent: perception, reasoning, grounding, and search-augmented QA ✅ Cross-harness generalization across diverse agent frameworks One model. Sees, thinks, codes, acts.🙌🙌 Now available via API on Alibaba Cloud Model Studio. Try it — let us know what you build.😎 🔗🔗⬇️⬇️ Blog：https://qwen.ai/blog?id=qwen3.7-plus Qwen Studio：https://chat.qwen.ai/?models=qwen3.7-plus API：https://modelstudio.console.alibabacloud.com/ap-southeast-1?tab=doc#/doc/?type=model&url=2840914_2&modelId=qwen3.7-plus&serviceSite=international

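Summary: Qwen introduced Qwen3.7-Plus, a multimodal agent model that unifies vision and language. It supports hybrid GUI and CLI operation, works as a versatile coding agent and productivity assistant, and offers visual perception, reasoning, grounding, and search-augmented QA. It is designed to generalize across agent frameworks and is now available via API on Alibaba Cloud Model Studio.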

Rohan Paul@rohanpaul_ai · Jun 2 · 74

Nemotron 3 Ultra will be available from Nvidia in a few days. Hybrid SSM (state-space model) + mixture-of-experts architecture. The SSM part is built for long sequences, so the model can keep reasoning or using tools for longer without getting crushed by the usual attention cost. Jensen Huang at NVIDIA GTC Taipei 2026 ---- From 'NVIDIA' YT channel (link in comment)

elvis@omarsar0 · Jun 1 · 71

Very good advice on self-improving agents. (bookmark it) This is something I am seeing in my own experiments with coding agents and harnesses for long-horizon tasks. What I have found is that stronger models do not always evolve better agents. The current belief in self-evolving agents is that a bigger model writes better prompt and skill edits, so devs put their best model in the evolver seat. New research shows that intuition is mostly wrong. The work separates two abilities that usually get conflated. Producing harness updates stays flat across model capability, so Qwen3.5-9B writes edits roughly as good as Claude Opus 4.6. Benefiting from those updates follows an inverted-U that peaks at mid-tier models, while weak models fail to even activate the edits and strong models have little headroom left. This is important to understand as it tells you where to spend. Put a cheap model on the evolver and your expensive model on the solver, because the gains land solver-side, not evolver-side. Paper: https://arxiv.org/abs/2605.30621 Learn to build effective AI agents in our academy: https://academy.dair.ai/

Summary: The research argues that, for self-improving AI agents, the intuition that a stronger model always writes better evolver edits is wrong. The work separates two abilities: the ability to produce updates is roughly flat across models, while the ability to benefit from updates follows an inverted-U curve that peaks at mid-tier models. Weak models fail to activate the updates, and strong models gain little because they are already near the top. The most cost-effective setup is therefore a cheap model as the "evolver" and the expensive, strong model as the "solver".

🚨 AI News | TestingCatalog@testingcatalog · Jun 1 · 55

NVIDIA announced an upcoming release of Nemotron 3 Ultra later this week, a 550B-parameter open-weight model. According to Artificial Analysis, it is positioned as the most intelligent open-weight model from a US lab. Soon 👀

MiniMax (official)@MiniMax_AI · Jun 1 · 64

It truly is 😎 #M3

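MiniMax (official)@MiniMax_AI · Jun 1 · 77

M3 live on @novita_labs 🔥 it's time to build (50% off the first week 👀)

Summary: MiniMax M3 is now live on Novita AI with 50% off for the first week. Billed as the first open-weight model to integrate frontier coding and agentic capabilities, it scores 59.0% on SWE-Bench Pro, 66.0% on Terminal Bench 2.1, and 74.2% on MCP Atlas. Its context window reaches up to 1M tokens, supported by MiniMax Sparse Attention, and it is natively multimodal from the start, handling text and visual understanding tasks. Novita AI is a Day-0 API launch partner providing access for developers.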

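Berryxia.AI@berryxia · Jun 1 · 71

While refreshing models on Hugging Face just now, I saw that KwaiKeye has released Keye VL 2.0-30B-A3B. This multimodal model has 30B total parameters with only 3B active, and is fully open source under Apache 2.0. It uses DeepSeek Sparse Attention to reach a 256K context. The most interesting part is its video understanding: the more frames you feed it, the more its accuracy steadily rises. That is the opposite of the earlier intuition that long videos tend to make models lose track. On several long-video benchmarks it is already on par with Qwen3 VL and Gemini 3 Flash. People used to assume a multimodal model could have either a long enough context or deep enough understanding, and rarely both. KwaiKeye has now put sparse attention into practice and pushed both to a new level at once. As for how well it works in practice, we will have to wait for real-world cases.

Summary: KwaiKeye open-sourced the multimodal model Keye VL 2.0-30B-A3B under the Apache 2.0 license. It has 30B total parameters but activates only 3B. Its headline feature is a 256K context length achieved with DeepSeek Sparse Attention. Its video understanding shows a counterintuitive property: accuracy keeps rising as more frames are fed in. On benchmarks it performs on par with models such as Qwen3 VL and Gemini 3 Flash.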

Artificial Analysis@ArtificialAnlys · Jun 1 · 81

NVIDIA just announced the release of Nemotron 3 Ultra in Jensen Huang's Computex keynote: at 550B parameters (55B active), this is the largest Nemotron 3 model to date, and it is the most intelligent US open weights model.

We partnered with @nvidia to evaluate this model for intelligence and speed - these figures use the model’s BF16 weights, but as with Nemotron 3 Super the model will be made available in NVFP4 quantization as well for higher inference performance.

➤ New leader for US open weights intelligence: Nemotron 3 Ultra scores 48 on the Artificial Analysis Intelligence Index. This is well ahead of the next strongest US open weights models, Gemma 4 31B (39), Nemotron 3 Super (36) and gpt-oss-120b (33), but behind the Chinese-led open weights frontier (Kimi K2.6 at 54).
➤ Leading speed for its intelligence: on a pre-release @DeepInfra endpoint, Nemotron 3 Ultra served over 300 tokens per second. Peer models in its size class from China-based labs such as DeepSeek and Moonshot (Kimi) are generally served at speeds of 50-100 tokens per second in the market today. gpt-oss-120b is served at speeds similar to this level, but with significantly lower intelligence.
➤ Largest Nemotron 3 model so far: at approximately 550 billion total parameters and 90% sparsity, Nemotron 3 Ultra is significantly larger than its siblings and is the largest recent US open weights model release.

We’ll be sharing additional analysis and full benchmarks at release.

Summary: NVIDIA announced Nemotron 3 Ultra at Computex, with 550B total parameters (55B active), the largest Nemotron 3 model so far. It is the most intelligent US open-weight model, scoring 48 on the Artificial Analysis Intelligence Index, ahead of Gemma 4 31B (39) but still behind Moonshot's Kimi K2.6 (54). On a pre-release endpoint it served over 300 tokens/s, well above the 50-100 tokens/s typical of comparable models from China-based labs. The model will be offered with BF16 weights and in NVFP4 quantization for higher inference performance.
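
The "90% sparsity" figure is consistent with the parameter counts quoted in the post. A minimal check, assuming sparsity here means the share of parameters that are not active for a given token (the post does not define the term, and `moe_sparsity` is a helper name invented for this example):

```python
def moe_sparsity(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters NOT used per token (assumed definition of sparsity)."""
    return 1.0 - active_params_b / total_params_b


# Figures from the post: 550B total parameters, 55B active.
nemotron_ultra = moe_sparsity(550, 55)
print(f"Nemotron 3 Ultra sparsity: {nemotron_ultra:.0%}")  # 90%
```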

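MiniMax (official)@MiniMax_AI · Jun 1 · 69

@CreaoAI moving fast 🔥 M3's live on day one, go try it

Summary: [Quoting @CreaoAI] MiniMax M3 is now live on CREAO. With sparse-attention inference, decoding is up to 15.6x faster at long context, built for agents that need to process massive codebases, documents, and transcripts without slowing down. Select M3 from the model dropdown to run it. ⚡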

OpenCode@opencode · Jun 1 · 63

MiniMax M3 will be launching soon. You can try it right now in OpenCode, for free.

SemiAnalysis@SemiAnalysis_ · Jun 1 · 30

10x speed at a 20x to 50x price premium per token. We're about to find out exactly how much the enterprise market is willing to pay for ultra-low latency AI.

Rohan Paul@rohanpaul_ai · May 31 · 59

Some cool visuals. Dell delivers the world's first Nvidia Vera Rubin NVL72 rack to CoreWeave. It packs 72 Rubin GPUs, 36 Vera CPUs, 3.6 exaFLOPS of FP4 inference, 75 TB of fast memory, and 260 TB/s NVLink bandwidth.
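
Dividing the rack-level figures by the 72 GPUs gives a rough per-GPU view. This is back-of-the-envelope arithmetic on the numbers in the post, assuming an even split across GPUs; the 75 TB of fast memory is left out because the post does not say how it is divided between GPUs and CPUs. The `RACK` dictionary and `per_gpu` helper are names made up for the example.

```python
# Rack-level figures as quoted in the post.
RACK = {"gpus": 72, "fp4_exaflops": 3.6, "nvlink_tb_per_s": 260}


def per_gpu(rack: dict) -> tuple[float, float]:
    """Return (FP4 petaFLOPS per GPU, NVLink TB/s per GPU), assuming an even split."""
    pflops = rack["fp4_exaflops"] * 1000 / rack["gpus"]  # 1 exaFLOPS = 1000 petaFLOPS
    nvlink = rack["nvlink_tb_per_s"] / rack["gpus"]
    return pflops, nvlink


pflops, nvlink = per_gpu(RACK)
print(f"{pflops:.1f} PFLOPS FP4 per GPU")  # 50.0
print(f"{nvlink:.2f} TB/s NVLink per GPU")  # 3.61
```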

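AYi@AYi_AInotes · May 31 · 66

For most people, Codex is currently the best and most usable productivity tool, so go all in and use it! So which of the 4 models in Codex is the most economical choice? 1️⃣ Start with the most expensive one: gpt-5.5 is the quality-first flagship. It suits complex coding, complex reasoning, knowledge work, and research workflows, especially work that looks like writing but actually involves several steps of judgment. Officially it is positioned as flagship-tier, and its price sits in the top bracket: $5.00 input and $30.00 output per 1 million tokens.

Summary: Codex (from OpenAI) offers four selectable models. Among them, gpt-5.5 is the quality-first flagship, suited to complex coding, reasoning, and knowledge work, and is priced higher at $5.00 input and $30.00 output per million tokens. The main post aims to help users choose by task type and cost.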
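
To make the quoted gpt-5.5 prices concrete, here is a small cost estimate. Only the two per-million-token prices come from the post; the function name `request_cost_usd` and the token counts in the usage line are invented for illustration, and the sketch ignores any caching or batch discounts a provider might apply.

```python
def request_cost_usd(input_tokens: int, output_tokens: int,
                     input_price_per_m: float = 5.00,
                     output_price_per_m: float = 30.00) -> float:
    """Cost of one request given per-million-token prices (defaults: quoted gpt-5.5 prices)."""
    return (input_tokens / 1_000_000 * input_price_per_m
            + output_tokens / 1_000_000 * output_price_per_m)


# Hypothetical session: 200k tokens read, 20k tokens generated.
print(f"${request_cost_usd(200_000, 20_000):.2f}")  # $1.60
```

At these prices an output token costs six times as much as an input token, so generation-heavy sessions dominate the bill.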

郭明錤｜Ming-Chi Kuo@mingchikuo · May 31 · 61

Nvidia's Much-Anticipated, Reportedly Upcoming N1X / Windows PC Processor: Supply Chain Checks and Key Takeaways

▌Supply chain checks point to around 10M shipments of N1X-based devices over the next two years.
➡ Still a niche market, aimed at power users who need on-device AI compute.
➡ Whether shipments get revised up will come down to price, but mainly to whether Windows can deliver apps and workflows that truly orchestrate on-device AI compute.

▌Today, the main ways people use AI on a PC (both Windows and Mac) are accessing cloud LLM services through a browser and calling LLMs via API to consume a cloud provider's compute / tokens:
➡ In both cases, the core AI compute happens in the cloud, not on the device.

▌So far in 2026, the two hottest stories in the PC market have had almost nothing to do with on-device AI compute:
➡ Strong MacBook Neo sales. My industry checks suggest 2026 shipments of Neo models were revised up by roughly 100% (5M → 10M). Buyers are paying for price, design, and ecosystem, not for on-device AI compute.
➡ Cheap mini PCs, still niche, are drawing a lot of attention because they can run AI agents (like OpenClaw) around the clock (e.g., Mac mini). These agents also run inference in the cloud.
➡ Bottom line: neither the sales nor the buzz has much to do with on-device AI compute.

▌The key to on-device AI driving an upgrade cycle is the operating system (OS):
➡ What really sets on-device AI apart from the cloud is its ability to deeply integrate a user's data and workflows across apps while keeping things private. But that needs OS support.
➡ AI in today's PC OS is still mostly about adding AI features to first-party apps and loosely connecting workflows across apps.
➡ Some apps already make good use of on-device AI compute, like speech-to-text, but not enough to drive meaningful upgrade demand.

▌The N1X devices could give AI power users another solid option:
➡ Thanks to the N1X, device makers can strike a better new balance across AI compute, memory, design, and portability.
➡ For power users running LLMs on-device, an N1X device is a solid alternative to the Mac when it comes to capable on-device AI compute and large memory.
➡ But if the goal is a real upgrade cycle, then beyond price, OS support (Windows) is still what matters.

Summary: Supply chain checks indicate about 10 million shipments of devices based on Nvidia's upcoming N1X processor over the next two years, still a niche market for power users who need on-device AI compute. The hot PC stories of 2026, upward revisions to MacBook Neo shipments and mini PCs that can run AI agents, both have little to do with on-device AI compute. The real advantage of on-device AI lies in privacy and deep integration at the operating system level, where current Windows support is still insufficient. N1X devices give users who need to run LLMs locally a more balanced option, but whether they drive an upgrade cycle depends on whether Windows can deliver the matching apps and workflows.

SemiAnalysis@SemiAnalysis_ · May 31 · 61

BREAKING NEWS: COREWEAVE & DELL IS THE FIRST CLOUD TO ANNOUNCE THAT THEY HAVE RUBIN VR200 NVL72 WITH FULLY PASSING L11 DIAGS. Next Step is to get multiple racks burnin in a couple & do software level bringup like sglang, vllm, dynamo, etc.

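Berryxia.AI@berryxia · May 31 · 51

Everyone has seen the recent price cut on Xiaomi's MiMo models! I checked today: I used about 1.2 million tokens and spent a little over 3 yuan! I also happened to see a technical blog post shared by Luo Fuli of the Xiaomi MiMo team. The V2.5 series just lowered its API prices, and behind that is a complete rebuild of the inference system. The Hybrid Sliding Window Attention architecture they use can compress KVCache storage to about 1/7 of full attention. But Luo Fuli's team is well aware that an architectural advantage does not automatically pay off under real production traffic. So the team redesigned KVCache management, hierarchical caching, and the prefix-cache tree, addressed the caching problems specific to SWA, and deeply optimized the scheduling policy and the Prefill/Decode pipeline. After validation on real production traffic, effective KVCache capacity rose nearly 5x, and server-side cache hit rates held steady at 93% to 95% under mainstream frameworks. Only with MoE configuration tuning and multimodal inference optimization on top of that did long-context inference cost really come down, which is what supports this price cut. It shows that a good architecture only sets the ceiling; turning it into scalable, low-cost production capacity is what decides a model's price-performance.

Summary: Xiaomi's MiMo-V2.5 series recently cut its API prices. The cut rests on a thorough engineering rebuild of the inference system. The model is based on the Hybrid Sliding Window Attention architecture, which in theory compresses KVCache storage to about 1/7 of a full-attention model. To realize that advantage, the team redesigned KVCache management, hierarchical caching, and the prefix-cache tree, and deeply optimized scheduling and the Prefill/Decode pipeline. Validated on real production traffic, effective KVCache capacity rose nearly 5x and server-side cache hit rates held at 93%-95%. Together with MoE configuration tuning, these optimizations significantly reduced long-context inference cost and supported the price cut.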
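
To see how a hybrid sliding-window design can land near the "about 1/7" storage figure, the sketch below compares KV-cache size against a full-attention model with the same number of layers. The layer split (6 full-attention layers, 42 sliding-window layers), the 4,096-token window, and the 262,144-token context are hypothetical values chosen for illustration; they are not MiMo's published configuration, and `kv_cache_ratio` is a made-up helper.

```python
def kv_cache_ratio(context_len: int, window: int,
                   full_layers: int, swa_layers: int) -> float:
    """KV-cache size of a hybrid SWA model relative to an all-full-attention model.

    Full-attention layers cache every token; sliding-window layers cache at
    most `window` tokens. Per-token, per-layer KV size is assumed identical
    across layers, so it cancels out of the ratio.
    """
    hybrid = full_layers * context_len + swa_layers * min(context_len, window)
    dense = (full_layers + swa_layers) * context_len
    return hybrid / dense


# Hypothetical configuration, see the note above.
ratio = kv_cache_ratio(context_len=262_144, window=4_096, full_layers=6, swa_layers=42)
print(f"{ratio:.3f}")  # 0.139, i.e. roughly 1/7
```

The saving only appears at long context: once the context is shorter than the window, every layer caches all tokens and the ratio returns to 1.0.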