Qwen3.7-Max hits #1 on the @OpenRouter Trending LLM chart with 77.3B tokens in usage. And we are just getting started. 👇 https://int.alibabacloud.com/m/1000413314/

Krea @krea_ai · May 28 · 62

Krea 2 is now built in to Hermes.

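Krea @krea_ai · May 28 · 73

Krea 2 available in Comfy!

Summary: Krea 2 is now available in Comfy. It is KREA's first foundation image model, trained from scratch, with adjustable creativity, style references, and moodboard conditioning.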

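Krea @krea_ai · May 28 · 58

Krea 2 is now live on Runware!

Summary: Krea 2 is now live on Runware.
- Two versions: Large (photorealism, creative control) and Medium (illustration, anime, design)
- Up to 10 weighted reference images per generation
- Built-in creative controls
- Moodboard and style transfer support
- Text-to-image and image-to-image modes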

Emad @EMostaque · May 27 · 69

Great to see @poolsideai (US lab) committing to open sourcing their foundation models going forward. Laguna is an interesting release, check it out.

Alibaba Cloud @alibaba_cloud · May 27 · 78

1M context. Smarter reasoning. More possibilities. Excited to see Qwen3.7 Max now available in Go with @opencode 🚀

歸藏 (guizang.ai) @op7418 · May 27 · 62

The MiniMax M3 model is about to launch. It has been a long time since they shipped a new model.

Qwen @Alibaba_Qwen · May 27 · 68

🚀🚀 Qwen3.7-Max just hit #4 on Code Arena, on par with Claude Opus 4.6, top-ranked Chinese lab on the board! @arena More to ship. Stay tuned. 🕶️

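Berryxia.AI @berryxia · May 27 · 55

Folks, MiniMax M3 is coming. Skyler Miao, head of AI engineering at MiniMax, posted just one line today: "Something BIG is coming". The attached image reveals the core architecture of the M3 model: GQA-based dynamic block sparse attention. A lightweight index branch first scans the full context quickly and selects the most relevant token blocks; actual sparse attention is then computed only over those blocks. The result: at a 1M-token context, prefill is 9.7x faster than M2 and decoding is 15.6x faster. Long context used to come with astronomical compute costs. MiniMax has now punched a hole in that ceiling, making million-token agent tasks practical. Long context is no longer just "it runs"; it is starting to become fast and cheap. Once MiniMax M3 ships, there will be one more contender besides DeepSeek V4 that can really handle a 1M context.

Summary: MiniMax is about to release the M3 model. Its core architecture is a GQA-based dynamic block sparse attention mechanism in which a lightweight index branch selects the relevant token blocks for sparse attention. At a 1M-token context window, prefill is 9.7x faster than M2 and decoding is 15.6x faster. The design aims to cut the compute cost of very long contexts so that million-token agent applications can be deployed more efficiently.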
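The two-stage scheme described in this post (an index branch picks blocks, then attention runs only on the picked blocks) can be illustrated with a toy NumPy sketch for a single query and a single head. MiniMax has not published implementation details in these posts, so everything here is an assumption made for illustration: the index branch is approximated by scoring each block's mean key, and the names `block_sparse_attention`, `block_size`, and `top_k` are hypothetical. GQA and any learned indexer are not modeled.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def block_sparse_attention(q, K, V, block_size, top_k):
    """Toy two-stage attention for one query vector.

    q: (d,) query; K, V: (n, d) keys/values, n divisible by block_size.
    Returns the attention output and the indices of the chosen blocks.
    """
    n, d = K.shape
    n_blocks = n // block_size
    Kb = K.reshape(n_blocks, block_size, d)
    Vb = V.reshape(n_blocks, block_size, d)

    # Stage 1 (assumed index branch): one cheap score per block,
    # here the dot product of the query with the block's mean key.
    block_scores = Kb.mean(axis=1) @ q
    chosen = np.sort(np.argsort(block_scores)[-top_k:])

    # Stage 2: ordinary scaled dot-product attention, restricted
    # to the keys and values of the chosen blocks.
    Ks = Kb[chosen].reshape(-1, d)
    Vs = Vb[chosen].reshape(-1, d)
    weights = softmax(Ks @ q / np.sqrt(d))
    return weights @ Vs, chosen


rng = np.random.default_rng(0)
K = rng.normal(size=(64, 8))
V = rng.normal(size=(64, 8))
q = rng.normal(size=8)
out, chosen = block_sparse_attention(q, K, V, block_size=16, top_k=2)
print(chosen, out.shape)
```

When `top_k` equals the number of blocks, the result matches dense attention; a smaller `top_k` reads fewer keys and values per query, which is presumably where savings at long context would come from.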

Artificial Analysis @ArtificialAnlys · May 27 · 67

OpenBMB has released MiniCPM5-1B (Non-reasoning), the leading 1B open weights model, scoring 17.9 on the Artificial Analysis Intelligence Index.

@OpenBMB is a China-based lab jointly founded in 2022 by Tsinghua University’s NLP Lab and ModelBest Inc. This release extends the open weights Pareto frontier for Intelligence vs. Parameters at the sub-2B scale. It sits almost 2 points ahead of the best-performing 2B open weights model, @Alibaba's Qwen3.5 2B (Reasoning, 16.3), and 7 points ahead of Qwen3.5 0.8B (Reasoning, 10.5). Unlike the recently released MiniCPM-V 4.6 1.3B Instruct, MiniCPM5-1B (Non-reasoning) does not support native multimodal input, and is text input and output only.

Key results:
➤ MiniCPM5-1B scores 17.9 on the Artificial Analysis Intelligence Index, the highest of any open weights model at 1B parameters or below by 7.4 points. The next-most-intelligent open weights model at this scale is Qwen3.5 0.8B (Reasoning, 10.5). No other open weights model under 2B parameters has exceeded 15 on the Intelligence Index; its predecessor MiniCPM-V 4.6 1.3B sits at 12.7.
➤ MiniCPM5-1B extends the open weights Pareto frontier on both Intelligence vs. Total Parameters and Intelligence vs. Active Parameters at the sub-2B scale. It surpasses its predecessor MiniCPM-V 4.6 1.3B (12.7) by 5.3 points at ~23% fewer parameters, and beats Qwen3.5 2B (Reasoning, 16.3) by 1.6 points at less than half the parameter count.
➤ MiniCPM5-1B is more token-efficient than the larger reasoning peers it surpasses, but uses more output tokens than its (also non-reasoning) predecessor MiniCPM-V 4.6 1.3B. It used 12.6M output tokens to run the Intelligence Index, ~31x fewer than Qwen3.5 2B (Reasoning, 389M) and ~8x fewer than Qwen3.5 2B (Non-reasoning, 100M), but ~2.3x more than MiniCPM-V 4.6 1.3B's 5.4M.
➤ AA-Omniscience score of -1 is the highest in its size class, earned by abstaining rather than hallucinating. MiniCPM5-1B declines to answer the vast majority of AA-Omniscience questions, avoiding the hallucination penalty that pulls sub-2B peers down to the -70 to -89 range (Qwen3.5 0.8B Non-reasoning at -89, MiniCPM-V 4.6 1.3B at -85, Exaone 4.0 1.2B Non-reasoning at -83). Choosing to abstain rather than guess is the more honest posture, and AA-Omniscience credits it positively.

Additional model details:
➤ Size: 1B total parameters (dense)
➤ Context window: 128K
➤ Modality: Text input and output only
➤ Precision: BF16
➤ License: Apache 2.0
➤ Providers: No confirmed providers upon release

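Summary: OpenBMB has released MiniCPM5-1B (Non-reasoning), a dense 1B-parameter large language model. It scores 17.9 on the Artificial Analysis Intelligence Index, the highest of any open weights model at 1B parameters or below. It leads the similarly sized Qwen3.5 0.8B (10.5) and Qwen3.5 2B (16.3) and surpasses its predecessor MiniCPM-V 4.6 1.3B (12.7). MiniCPM5-1B is a text-only model with a 128K context window, released under the Apache 2.0 license. On AA-Omniscience it earned the highest score in its size class by abstaining rather than guessing, which avoids the hallucination penalty.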
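The margins and ratios quoted in this post can be rechecked from the rounded figures it gives. The short script below is only a sanity check on that arithmetic; the dictionary layout and helper names (`lead_over`, `token_ratio`, `fewer_params_pct`) are illustrative choices, not anything from Artificial Analysis.

```python
# Rounded Intelligence Index scores as quoted in the post.
index = {
    "MiniCPM5-1B (Non-reasoning)": 17.9,
    "Qwen3.5 2B (Reasoning)": 16.3,
    "Qwen3.5 0.8B (Reasoning)": 10.5,
    "MiniCPM-V 4.6 1.3B": 12.7,
}
# Output tokens (millions) used to run the index, as quoted in the post.
output_tokens_m = {
    "MiniCPM5-1B (Non-reasoning)": 12.6,
    "Qwen3.5 2B (Reasoning)": 389.0,
    "Qwen3.5 2B (Non-reasoning)": 100.0,
    "MiniCPM-V 4.6 1.3B": 5.4,
}
NEW = "MiniCPM5-1B (Non-reasoning)"


def lead_over(other):
    """Score margin of the new model over another model, in index points."""
    return round(index[NEW] - index[other], 1)


def token_ratio(a, b):
    """How many times more output tokens model a used than model b."""
    return round(output_tokens_m[a] / output_tokens_m[b], 1)


def fewer_params_pct(new_b, old_b):
    """Percentage reduction in parameter count from old_b to new_b."""
    return round(100 * (1 - new_b / old_b))


for other in index:
    if other != NEW:
        print(f"lead over {other}: {lead_over(other)}")
print("tokens vs Qwen3.5 2B (Reasoning):", token_ratio("Qwen3.5 2B (Reasoning)", NEW))
print("tokens vs Qwen3.5 2B (Non-reasoning):", token_ratio("Qwen3.5 2B (Non-reasoning)", NEW))
print("tokens vs MiniCPM-V 4.6 1.3B:", token_ratio(NEW, "MiniCPM-V 4.6 1.3B"))
print("fewer parameters than 1.3B (%):", fewer_params_pct(1.0, 1.3))
```

From the rounded scores, the gap to MiniCPM-V 4.6 1.3B comes out as 5.2 rather than the 5.3 quoted in the post; the post's figure may have been computed from unrounded scores.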

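Berryxia.AI @berryxia · May 27 · 71

Folks, there is something new in on-device image generation! The Drawthing client already supported text-to-image on iPad and phones. Today PrismML has shipped something fun as well: 1-bit and Ternary versions of the Bonsai Image 4B diffusion model. The 1-bit version is only 0.93GB, 8.3x smaller than the full-precision model. The Ternary version is 1.21GB and uses ternary weights of -1, 0, and +1, which keeps it extremely small while pushing image quality and prompt adherence higher. Both generate up to 5.6x faster on a Mac M4 Pro. More importantly, quality holds its own against much larger models: object composition, portrait preference, aesthetic score, and complex prompt following all keep up. They also launched Bonsai Studio, an iOS app that generates images locally on an iPhone with no subscription and no API calls, fully offline. This extreme compression puts high-quality image generation, which used to be cloud-only, onto personal devices. I will test it and report back on how it performs.

Summary: PrismML has released two heavily compressed versions of the Bonsai Image 4B diffusion model, 1-bit and Ternary. The 1-bit version is only 0.93GB, 8.3x smaller than the full-precision model; the Ternary version is 1.21GB and uses ternary weights of -1, 0, and +1. Both generate up to 5.6x faster on a Mac M4 Pro, with quality comparable to much larger models. PrismML also launched a companion iOS app, Bonsai Studio, which generates images locally and fully offline on an iPhone.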
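The size figures in this post can be related to bits per weight with a few lines of arithmetic. This is a back-of-the-envelope sketch under two assumptions that are not stated in the post: "4B" is taken as exactly 4e9 weights, and GB is taken as 1e9 bytes. The helper names are illustrative.

```python
GB = 1e9        # assumption: decimal gigabytes
PARAMS = 4e9    # assumption: "4B" read as exactly 4e9 weights


def implied_full_precision_gb(compressed_gb, ratio):
    """Full-precision size implied by a compressed size and its compression ratio."""
    return compressed_gb * ratio


def effective_bits_per_weight(size_gb, params=PARAMS):
    """Average bits stored per weight for a checkpoint of the given size."""
    return size_gb * GB * 8 / params


full_gb = implied_full_precision_gb(0.93, 8.3)   # about 7.7 GB
one_bit_bpw = effective_bits_per_weight(0.93)    # about 1.86 bits per weight
ternary_bpw = effective_bits_per_weight(1.21)    # about 2.42 bits per weight

print(f"implied full-precision size: {full_gb:.2f} GB")
print(f"1-bit build:   {one_bit_bpw:.2f} bits/weight")
print(f"ternary build: {ternary_bpw:.2f} bits/weight")
```

The implied full-precision size of roughly 7.7GB is close to what 4e9 weights at 16 bits would occupy (8GB). Both compressed builds average more bits per weight than their nominal width; the post does not say where the extra bytes go.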

🚨 AI News | TestingCatalog @testingcatalog · May 27 · 49

MiniMax M3 has been teased 🔥
> MiniMax M3 will be based on a new Sparse Attention architecture
> MiniMax M3 is expected to be open source
Soon? 👀

OpenCode @opencode · May 27 · 66

Qwen3.7 Max now available in Go
- text only
- 1M context
- smartest model in the Qwen family to date

Chubby♨️ @kimmonismus · May 27 · 70

MiniMax just teased their Sparse Attention architecture for M3. The benchmarks show 9.7x prefilling speedup and 15.6x decoding speedup at 1M tokens vs M2. MiniMax deliberately went back to full attention for M2 because efficient attention wasn't production-ready. Their pretrain lead wrote a whole blog post about it in March. Now they're showing a new two-stage approach, lightweight index branch for block selection, then sparse attention only on relevant KV blocks. Really interesting. And tbh I'm always happy when open source receives new wins.

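Summary: MiniMax previewed the new Sparse Attention technique used in its M3 architecture. Benchmarks show a 9.7x prefilling speedup and a 15.6x decoding speedup over M2 at a 1M-token context. M2 used full attention because efficient attention was not yet production-ready; M3 takes a new two-stage approach in which a lightweight index branch first selects blocks and sparse attention then runs only on the relevant KV blocks. This is new progress for open source.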

MiniMax (official) @MiniMax_AI · May 26 · 41

#MSA #OpenSource #M3 🫣😎

Alibaba Cloud @alibaba_cloud · May 26 · 68

Qwen3.7-Max is officially the #2 AI coding model globally. Scoring 1541 on Code Arena, it trails only Claude. Built for production: runs 35-hour tasks, 1000+ tool calls, and ships 2-week projects in hours.

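向阳乔木 @vista8 · May 26 · 70

I only knew of a band called Ziyue (子曰); I did not expect NetEase Youdao's large model to share the name. The newly released Ziyue 4 (子曰4) is an omni-modal model with 27B parameters, state of the art at its scale in visual math and science reasoning, with 81.4% accuracy on hard text-only math and science problems. At the 27B "sweet spot" size, Ziyue 4 is excellent at both multimodal and text-only math and science reasoning in Chinese learning scenarios. This time the Ziyue 4 omni-modal model and its TTS engine are open-sourced together, with released weights that support local deployment and further training. The TTS model in particular looks strong on paper: it clones a voice from just 3 seconds of audio, supports 14 languages, and reports over 97% cloning accuracy and over 95% timbre fidelity. I recorded 13 seconds of audio online, cloned my voice, and had it read a poem written by a friend. The result is below:

Summary: NetEase Youdao has released Ziyue 4 (子曰4), a 27B-parameter omni-modal large language model that is state of the art at its scale in visual math and science reasoning, with 81.4% accuracy on hard text-only math and science problems. At the 27B "sweet spot" size it is strong in both multimodal and text-only math and science reasoning. The Ziyue 4 omni-modal model and its TTS engine were open-sourced together with released weights, supporting local deployment and further training. The TTS model clones a voice from just 3 seconds of audio, supports 14 languages, and reports over 97% cloning accuracy and over 95% timbre fidelity.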

Tencent Hy @TencentHunyuan · May 26 · 69

🙏 Thank you all for the incredible love and support! Our latest Tencent Hunyuan translation models are on fire on Hugging Face: 🥰Hy-MT2-1.8B ranks #1 🥰Hy-MT2-30B-A3B ranks #4 on the open-source model trending leaderboard, with over 7K downloads already! To make it even easier for everyone, we’ve launched the Tencent Hy Translation WeChat mini-program, built on Hy-MT2. It supports voice input and offline translation, plus powerful customization of translation styles and instructions — delivering results that better match your expectations and feel far more practical. Try it out and share your feedback with us — we’d love to hear from you! Models on HF: https://huggingface.co/tencent/Hy-MT2-1.8B https://huggingface.co/tencent/Hy-MT2-30B-A3B GitHub: https://github.com/Tencent-Hunyuan/Hy-MT2 #HyMT2 #TencentHunyuan #OpenSource

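Summary: Tencent Hunyuan has released the Hy-MT2 translation models, which are doing well on the Hugging Face open-source model trending leaderboard: the 1.8B version ranks #1 and the 30B-A3B (MoE) version ranks #4, with over 7K downloads. Tencent also launched the Tencent Hy Translation (腾讯混译) WeChat mini-program built on the model, which supports voice input and offline translation plus customizable translation styles and instructions. The model code and weights are open source.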

Alibaba Cloud @alibaba_cloud · May 26 · 16

https://x.com/i/broadcasts/1vJpPrMXaZbJE

Summary: Anthropic has released an update to Claude Code that lets it run tasks in the background.

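Emad @EMostaque · May 26 · 58

It’ll be interesting to see if the post training for this uses a multiple of the compute of pretraining, as Cursor did when they tuned Kimi as the base model.

Summary: xAI's Grok foundation model V9-Medium (1.5T parameters) has finished training, and evals look good. A lot of Cursor data was added in supplementary training. Fine-tuning is underway, reinforcement learning starts in a few days, and public release is expected in 2 to 3 weeks. This will be a major improvement over the 0.5T-parameter v8-small model that currently serves all Grok production traffic, especially on complex coding tasks. Some speculate that its post-training may use a multiple of the pretraining compute, as Cursor did when tuning Kimi.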

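🚨 AI News | TestingCatalog @testingcatalog · May 25 · 48

SPACEXAI 🔥: The next Grok model is expected to be ready for public release in 2-3 weeks.
> 1.5T V9-Medium base model in comparison to 0.5T v8-Small, used for Grok 4.3
> Cursor data being used for supplementary training
Grok 5? 👀👀👀

Summary: The Grok foundation model V9-Medium (1.5T parameters) has finished training with good eval results and is expected to be released publicly within 2-3 weeks. It is a major improvement over the 0.5T v8-Small that currently serves all Grok production traffic, especially on difficult coding tasks. A lot of Cursor data was added in training, with more supplementary training to come. Fine-tuning is underway and reinforcement learning will begin in a few days.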

Elon Musk @elonmusk · May 25 · 71

Grok foundation model V9-Medium (1.5T) has finished training. Evals look good. A lot of Cursor data was added in supplementary training and there is more to come. Fine-tuning is underway and reinforcement learning begins in a few days. 2 to 3 weeks to public release. This will be a major improvement over the 0.5T v8-small that currently serves all Grok production traffic, especially for difficult coding tasks.

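小互 @xiaohu · May 25 · 61

Folks, Hyper3D has dropped another big one, and this time it is seriously impressive... Rodin Gen-2.5 released: the strongest 3D generation model. It generates a million-face model in 4 seconds and is the world's first 3D generation at the ten-million-face level. In terms of model detail, it can reproduce even pores and skin microstructure... Native texture maps are strictly aligned to geometry; fine textures such as fabric feel and stitching come out correct, striking the right balance between detail and alignment.

Thinking modes run from low to high, with the fastest producing output in 4 seconds:
- Ultra-low mode: 4 seconds. Quick simple assets, batch test experiments
- Low mode: 9 seconds. Minimalist models, small hard-surface props
- Medium mode: 20 seconds. Balanced structure and detail
- High mode: 40 seconds. High-quality assets, rich structural layering, smooth surfaces
- Ultra-high mode: 80 seconds. Professional assets with microscopic detail

A single reference image is enough to produce a textured model. The native 3D texturing algorithm generates textures directly in 3D space with full 360° coverage, so the back and the underside do not blur. It supports PBR materials and one-click lighting preprocessing. Anyone who has used earlier tools whose stitched-together textures turned to mush will know how big this gap is. Faithful mode strictly follows the reference material; Creative mode automatically optimizes structure, for example fixing a tire into a perfect circle. At the highest precision tier you can also switch between Micro and Clean: Micro gives pore-level detail, Clean gives clean, smooth geometry that works well for stylization or later animation. It also supports running 10 models in parallel, which makes batch exploration of creative directions much faster.

The team behind it is 影眸科技, a Chinese team that has been working on 3D generation since 2016. While the whole industry took the "2D lifted to 3D" shortcut, they stuck with native 3D models. That is harder, but fatal problems such as broken faces and chaotic topology can only be cured this way. This year their paper won the SIGGRAPH 2025 Best Paper Award; the only commercial companies awarded alongside them were Google and Meta.

Summary: 影眸科技 has launched Rodin Gen-2.5, billed as the world's first 3D generation model at the ten-million-face level. It offers five thinking modes, from ultra-low (4 seconds) to ultra-high (80 seconds), to balance generation speed against detail. Its native 3D texturing algorithm generates textures directly in 3D space, supports PBR materials and full 360° coverage, and offers two modes, Faithful (follows the reference) and Creative (automatic optimization). The team's paper won the SIGGRAPH 2025 Best Paper Award.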

Chubby♨️ @kimmonismus · May 25 · 71

Google DeepMind's AlphaProof Nexus autonomously solved 9 open Erdős problems, some unsolved for 56 years, at a cost of a few hundred dollars per problem. It also proved 44 open OEIS conjectures, resolved a 15-year-old question in algebraic geometry, and discovered a novel algorithmic parameter in optimization theory that humans hadn't found.

The core mechanism combines LLM reasoning (Gemini 3.1 Pro hype?!) with Lean formal verification. The AI generates proof attempts, Lean's compiler checks every logical step automatically. No human review needed to confirm correctness.

The most surprising finding: a basic agent that simply alternates LLM generation with compiler feedback replicated all 9 Erdős successes. The full-featured system with evolutionary search and reinforcement learning only provided meaningful advantages on the hardest problems. This shows a more recent broader trend: as foundation models improve, simple agentic loops are catching up to complex specialized architectures.

What sets this apart from OpenAI's informal proof approach: formal verification acts as an automatic filter. The failure analysis showed the AI frequently hallucinated lemmas it claimed were established results, and often disguised the core difficulty by rephrasing it as a helper lemma. Informal proofs would let these errors pass. Lean catches them immediately.

The agent also detected misformalizations in existing mathematical literature, correcting ambiguities in problem statements before solving the corrected versions. It served as both a solver and a diagnostic tool.

Current limitations are real. Successes cluster in combinatorics, number theory, and optimization where Lean's math library is mature. Problems requiring substantial new theory remain out of reach. Most Erdős problems still weren't solved tho.

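Summary: Google DeepMind's AlphaProof Nexus system autonomously solved 9 open Erdős problems (some open for 56 years) at a cost of a few hundred dollars per problem. It also proved 44 OEIS conjectures, resolved a 15-year-old question in algebraic geometry, and discovered a new algorithmic parameter in optimization theory. Its core mechanism combines LLM reasoning with Lean formal verification: Lean checks every logical step automatically, with no human review needed. A basic agent that simply alternates LLM generation with compiler feedback reproduced all 9 Erdős successes. The system can also detect and correct misstatements in the existing mathematical literature. Its limitation is that successes cluster in areas where Lean's math library is mature (such as combinatorics and number theory); problems requiring substantial new theory remain out of reach.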
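The "basic agent" mentioned in this post, one that alternates generation with checker feedback, has a simple control flow that can be sketched generically. The code below is not AlphaProof Nexus and calls neither an LLM nor Lean: `propose` and `check` are hypothetical callables, and the toy stand-ins (a guesser and a number checker) exist only so the loop runs end to end.

```python
def prove_with_feedback(propose, check, max_rounds=8):
    """Alternate a generator and a verifier until the verifier accepts.

    propose(feedback) -> candidate   (stand-in for an LLM proof attempt)
    check(candidate)  -> (ok, feedback)   (stand-in for a proof checker)
    Returns (accepted candidate or None, rounds used).
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        candidate = propose(feedback)
        ok, feedback = check(candidate)
        if ok:
            return candidate, round_no
    return None, max_rounds


def make_toy_prover(lo, hi):
    """Toy generator: narrows an integer range using the checker's feedback."""
    state = {"lo": lo, "hi": hi, "last": None}

    def propose(feedback):
        if feedback == "too low":
            state["lo"] = state["last"] + 1
        elif feedback == "too high":
            state["hi"] = state["last"] - 1
        state["last"] = (state["lo"] + state["hi"]) // 2
        return state["last"]

    return propose


def make_toy_checker(target):
    """Toy verifier: accepts only the target and explains every rejection."""

    def check(candidate):
        if candidate == target:
            return True, "ok"
        return False, ("too low" if candidate < target else "too high")

    return check


result, rounds = prove_with_feedback(make_toy_prover(0, 100), make_toy_checker(37))
print(result, rounds)
```

The point of the structure is that acceptance is decided only by the checker, so a wrong candidate can never be returned as a success, whatever the generator claims.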

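Chubby♨️ @kimmonismus · May 24 · 54

OpenAI: carefully rolls out GPT-5.5-Cyber through Trusted Access for verified defenders
Anthropic: “Claude Mythos is too powerful for public release”
Also Anthropic: accidentally shows Mythos in the UI and immediately runs out of capacity
2026 AI launches are absolute cinema. Anyways: Mythos incoming?

Summary: AI model launches in 2026 show sharp contrasts and plenty of drama. OpenAI is taking a cautious approach, rolling out GPT-5.5-Cyber in limited fashion to verified security experts through its Trusted Access mechanism. By contrast, Anthropic had said its Claude Mythos model was too powerful for public release, yet the model appeared briefly in the user interface by accident and capacity ran out. Available information suggests Anthropic is preparing to release Claude Mythos (codename claude-mythos-1-preview) in enterprise products such as Claude Code and Claude Security, but that is not the same as general public availability. The whole episode mixes planning with accident.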

🚨 AI News | TestingCatalog @testingcatalog · May 24 · 65

ANTHROPIC 🔥: Mythos 1, "claude-mythos-1-preview", is being prepared for a release on Claude Code and Claude Security. The model became visible for a short amount of time on Claude; besides that, new strings mentioning Mythos have been added.
> Access to the Claude Mythos model in Claude Code and Claude Security.
It still doesn't mean the general public will have access to this exact model, according to Anthropic's earlier communication. More below 👇

StepFun @StepFun_ai · May 24 · 76

StepAudio 2.5 Realtime is live! Real-time voice that picks up what you actually mean: tone, pace, pauses, sighs, even the half-laugh mid-sentence.
- Top-tier paralinguistic perception: reads tone, pace, micro-emotions
- Bring-your-own persona via API: personality, backstory, quirks, language style
- 10,000+ native personas → millions of feature combinations
- 5 preset personas to try out of the box
- ZH/EN RLHF-tuned to hold character even under roleplay stress tests
Try it → https://www.stepfun.com/studio/audio?tab=voice-chat
Model card: https://stepaudiollm.github.io/step-audio-2.5-realtime/

Summary: StepAudio 2.5 Realtime is a real-time voice interaction model. Its core strength is perceiving paralinguistic cues such as tone, pace, pauses, and even sighs, so it can grasp the intent behind what is said. It supports deep customization of persona and speaking style via API, includes over 10,000 combinable native personas, and offers 5 preset personas to try out of the box. The model is RLHF-tuned to hold character under roleplay stress tests, and supports both Chinese and English.

Chubby♨️ @kimmonismus · May 24 · 56

Looks like GPT-5.6 release is very close. Really looking forward to it. 5.5 already is an insanely good model. Hope it gets a bit better vibe tho

[Quoting @synthwavedd, translated]: I am very excited to announce that it looks like they are starting to make progress on UI de-bloating for GPT-5.6! 🥹 This is the first prompt result without any UI guidance ("default"). We are making progress...

Chubby♨️ @kimmonismus · May 24 · 66

Opus 4.8 has been spotted in Google Vertex. Can’t confirm any of this tho. However, the fact that Sonnet 4.8 is coming soon has been common knowledge since the data leak. The inclusion of Opus 4.8 in Vertex comes as a surprise to me, though considering the accelerated release schedule and the massive success of GPT-5.5, it is certainly plausible. Couldn’t be more excited!

🚨 AI News | TestingCatalog @testingcatalog · May 23 · 65

ANTHROPIC 🔥: Mythos class models are expected to become generally available after getting stronger safeguards, according to the latest Project Glasswing update.
> And in the near future, once we’ve developed the far stronger safeguards we need, we look forward to making Mythos-class models available through a general release.
Soon? 👀

Summary: In the latest Project Glasswing update, Anthropic said Mythos-class models are expected to become generally available once far stronger safeguards have been developed. Through the project, Anthropic and its partners have already found more than ten thousand critical or high-severity software vulnerabilities, which provides context for the safeguard work that follows.

Rohan Paul @rohanpaul_ai · May 22 · 75

BitCPM-CANN just became the world’s first open-sourced 1.58-bit ternary LLM trained entirely on Chinese-developed AI infrastructure. Developed by ModelBest, Tsinghua Univ, and OpenBMB community, the entire training pipeline, from quantization operators and algorithms to the full-stack framework, was natively executed on Huawei Ascend 910B NPUs. 1.58-bit ternary weights use only 3 weight states, so the model needs far less memory when deployed on phones, PCs, cars, and local industrial devices. The harder achievement is the training system behind it: QAT, STE, low-bit operators, algorithms, framework work, and reproducible training scripts all had to hold together on Ascend 910B. When hardware costs rise, the winning model is not merely the one that scores higher in a chart, but the one that can be trained, reproduced, deployed, and improved under real constraints.

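Summary: ModelBest, Tsinghua University, and the OpenBMB community jointly released BitCPM-CANN, described as the world's first open-source 1.58-bit ternary LLM trained entirely on Huawei Ascend 910B NPUs. Its core idea is very-low-bit quantization with only three weight states, which cuts model memory by about 6x relative to BF16 so the model can be deployed efficiently on edge devices such as phones, PCs, and in-car systems. More importantly, the whole training stack, from quantization operators to the framework, was built and validated natively on Ascend rather than simply ported. The model family (0.5B-8B) retains 95-97% of full-precision performance on multiple benchmarks, offering a practical option for deploying and reproducing large models under resource constraints.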
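The "1.58-bit" label follows from the three weight states: one ternary value carries log2(3) ≈ 1.585 bits of information. The sketch below shows one common way to map full-precision weights onto {-1, 0, +1}, an absmean scheme in the style of BitNet b1.58. Whether BitCPM-CANN uses this exact rule is not stated in the post, so treat it as an assumption; `ternary_quantize` and `dequantize` are illustrative names, and the QAT/STE training side is not shown.

```python
import math

import numpy as np

# Information content of one three-state weight, in bits.
BITS_PER_TERNARY_WEIGHT = math.log2(3)  # about 1.585


def ternary_quantize(w, eps=1e-8):
    """Map a weight tensor to {-1, 0, +1} plus one per-tensor scale.

    Assumed absmean rule: divide by the mean absolute weight,
    round to the nearest integer, then clip to [-1, 1].
    """
    scale = float(np.abs(w).mean()) + eps
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    """Approximate reconstruction of the original weights."""
    return q.astype(np.float32) * scale


w = np.array([[0.9, -0.05, -1.2], [0.02, 0.4, -0.6]], dtype=np.float32)
q, scale = ternary_quantize(w)
print(q)
print(round(BITS_PER_TERNARY_WEIGHT, 3))
```

Small weights collapse to 0 and large ones saturate at ±1, so only the sign pattern and a single scale survive. That is why training has to be quantization-aware for quality to hold up.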

Runway @runwayml · May 22 · 71

Yesterday we released Aleph 2.0, our upgraded video editing model that lets you change exactly what you want while keeping everything else the same. Available inside our new Edit Studio, you can work with multishot sequences up to 30 seconds long at 1080p. Learn how to get started today with Runway Academy.

Chubby♨️ @kimmonismus · May 22 · 41

June will be huge.
- Gemini 3.5 pro (confirmed)
- GPT-5.6 (rumored but pretty confident for a release)
Still waiting for announcements: Claude Sonnet 4.8 (Claude-Code-/Source-Map-Leak)

Alibaba Cloud @alibaba_cloud · May 22 · 69

Qwen3.7-Max is now live on Model Studio with 50% OFF (May 22–June 22)! Reliable Cross-Framework Support. Designed for turnkey deployment and seamless integration into your existing technical stack. 🚀 Try it: https://int.alibabacloud.com/m/1000413314/

Alibaba Cloud @alibaba_cloud · May 22 · 79

Qwen3.7-Max is now live on Model Studio with 50% OFF (May 22–June 22)! 1M Context Window. Built to process and retain large-scale enterprise data streams flawlessly during long-context agent reasoning. 🚀 Try it: https://int.alibabacloud.com/m/1000413314/

Alibaba Cloud @alibaba_cloud · May 22 · 82

Qwen3.7-Max is now live on Model Studio with 50% OFF (May 22–June 22)! Flagship Coding Agent Performance. Engineered for reliable, multi-step software execution with minimal human intervention. 🚀 Try it: https://int.alibabacloud.com/m/1000413314/

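Qwen @Alibaba_Qwen · May 22 · 77

⚡️⚡️

[Quoting @OpenRouter, translated]: The new Qwen3.7-Max from @Alibaba_Qwen is now live on OpenRouter. The flagship of the Qwen3.7 series, it is built for agent-centric work: coding, office and productivity tasks, and long-horizon autonomous execution. It shows significant gains over Qwen3.6 on coding and agent benchmarks and supports explicit prompt caching for repeated context.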

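小互 @xiaohu · May 22 · 71

NetEase Youdao open-sourced the Confucius4 pair of models today: one for visual math reasoning and one for voice cloning. Some companies compete on parameter count; this time Youdao is competing on engineering precision and deployment cost. The release ships full weights rather than API-only access, which shows real commitment.
Multimodal: http://huggingface.co/netease-youdao/Confucius4
Voice: http://github.com/netease-youdao/Confucius4-TTS

Summary: NetEase Youdao has open-sourced two Confucius4 models: a multimodal model focused on visual math reasoning and a TTS model for voice cloning. The release provides full weights rather than API-only access and emphasizes engineering precision and real deployment cost over raw parameter count. The models are published on Hugging Face and GitHub.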

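karminski-牙医 @karminski3 · May 22 · 71

400 TPS! Hands-on: Zhipu GLM-5.1 flies at 10x speed.

Zhipu just released glm-5.1-highspeed! I ran my script against it right away: output speed reaches 300+ tps and time to first token is a steady 1s. How strong is that? With the same script, the regular glm-5.1 endpoint manages only 35 tps and a first-token latency of 9s. That is roughly a 10x speedup. Anyone using glm-5.1 for coding, or for "raising lobsters" (养龙虾) / Hermes (爱马仕), can go ahead and get a plan with this new model. It starts streaming text with no wait.

GLM-5.1 activates 40B parameters per token. At bf16 precision that is 80GB of weights even without counting the KV cache, so reaching 35 tps means 80 x 35 = 2.8 TB/s of memory bandwidth. Pushing to 300 tps means 80 x 300 = 24 TB/s. At the H100 SXM's 3.35 TB/s, a single card's bandwidth was enough before; now 8-way tensor parallelism is needed (tensor parallelism also raises request concurrency, of course).

The official technical write-up is even more striking: they worked with the TileRT team to rebuild the inference path from the ground up and squeeze everything out of the GPU. In short, traditional inference works like an assembly line: the CPU acts as scheduler and issues instructions to the GPU layer by layer; each layer's result is written back to device memory and read out again for the next layer, with constant synchronization in between. A lot of time goes to this "scheduling plus data movement" rather than to pure compute. TileRT does the opposite: the whole inference flow is laid out at compile time into one large kernel resident on the GPU. Once inference starts there is essentially a single launch, and the GPU runs by itself from then on. Within a card, compute, IO, and communication are split into smaller tile-level tasks; intermediate results avoid main device memory where possible and are passed directly through registers, shared memory, and L2 cache. Across cards, work is divided, for example GPU 0 handles the Sparse Indexer while GPUs 1–7 run the MLA attention backbone. (There are many more optimization details in the official technical write-up.) None of this needs deep CPU involvement any more, which is where much of the performance gain comes from.

So, if you are using GLM-5.1, switch models now! #glm51 #glm51highspeed #智谱 #GLM

Summary: Zhipu recently launched the GLM-5.1-Highspeed model. In testing it delivers 300+ tokens/s with a first-token latency of about 1 second, roughly a 10x improvement over standard GLM-5.1 at 35 tps and 9 seconds. Technically, Zhipu and the TileRT team rebuilt the inference path: the whole inference flow is compiled into one large GPU-resident kernel, which sharply reduces CPU scheduling and data movement overhead, and compute and IO within a card and task division across cards are optimized to raise GPU utilization. The model activates 40B parameters per token, and high-speed operation relies on multi-GPU parallelism. Existing users are advised to switch for a more real-time generation experience.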
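The bandwidth estimate in this post can be reproduced in a few lines. The sketch follows the post's own simplification: every generated token reads all 40B active parameters once at 2 bytes each, and the KV cache and batching are ignored. The constant and function names are illustrative.

```python
import math

ACTIVE_PARAMS = 40e9     # active parameters per token, as stated in the post
BYTES_PER_PARAM = 2      # bf16
H100_SXM_TBPS = 3.35     # H100 SXM memory bandwidth used in the post, in TB/s


def required_bandwidth_tbps(tokens_per_second):
    """Memory bandwidth (TB/s) needed to stream the active weights once per token."""
    bytes_per_token = ACTIVE_PARAMS * BYTES_PER_PARAM  # 80 GB
    return bytes_per_token * tokens_per_second / 1e12


def min_gpus(tokens_per_second):
    """Smallest GPU count whose combined bandwidth covers the requirement."""
    return math.ceil(required_bandwidth_tbps(tokens_per_second) / H100_SXM_TBPS)


for tps in (35, 300):
    print(tps, "tps ->", required_bandwidth_tbps(tps), "TB/s, GPUs needed:", min_gpus(tps))
```

Under these assumptions 35 tps fits within one card's bandwidth, while 300 tps needs about 7.2 cards' worth, which rounds up to the 8-way tensor parallelism mentioned in the post.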

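Alibaba Cloud @alibaba_cloud · May 22 · 76

Qwen3.7-Max is now live on Novita AI! Alibaba Cloud and Novita are teaming up to bring you the latest model built for the Agent Era. Now go build something wild 🚀

Summary: Alibaba Cloud and Novita AI have partnered to debut Qwen3.7-Max on the Novita AI platform, jointly pushing forward the agent era. The model is designed for that era and emphasizes a leap from "answering" to "executing". Key strengths include strong code generation and software engineering workflows; reliable agent orchestration and multi-agent collaboration; long-horizon, autonomous execution of complex tasks; and framework- and stack-agnostic compatibility with a range of mainstream development environments.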