Multimodal at the frontier. Built around your business.

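Summary: Luma Labs' UNI-1.1-Max and UNI-1.1 multimodal models rank third overall in Image Arena across text-to-image and image editing, without using agentic search. In the text-to-image arena the two models rank sixth and seventh respectively; in the multi-image and single-image editing arenas both place in the top eleven, with UNI-1.1-Max seventh in single-image editing. The result marks solid progress for Luma Labs at the multimodal frontier.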

TestingCatalog News 🗞@testingcatalog · May 5 · 57

GOOGLE 👀: Gemini 3.2 Flash became available on the Gemini app for a short time for some users.
So far we have:
- Gemini 3.2 Flash flashing on Gemini
- Updated Gemini 3 Flash models on LM Arena in testing
- Deprecation notice for Gemini 2 Flash on Vertex AI, promising upcoming Flash GA update
- Google I/O coming on May 19, rumored to arrive with a Gemini 3.5 announcement
Flash is flashing! ⚡⚡⚡


Artificial Analysis@ArtificialAnlys · May 5 · 52

Who do you think created the Peanut 🥜 Image model? 👀 Join the discussion on Discord and share your take: https://discord.gg/8bhAmNw5Z2

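Summary: The anonymous text-to-image model Peanut debuted at #8 in the Artificial Analysis Text to Image Arena. Its weights are expected to be released soon, which would make it the leading open weights text-to-image model. Peanut is positioned as the new open weights leader and is expected to surpass existing models such as Z-Image Turbo, Qwen-Image, and FLUX.2 [dev]. Further details and the weights are coming soon.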

Artificial Analysis@ArtificialAnlys · May 5 · 69

A new anonymous model debuts at #8 in the Artificial Analysis Text to Image Arena! Peanut’s weights are expected to be released soon, which would make it the leading Text to Image Open Weights Model. Peanut is positioned to be the new leading open weights Text to Image model, surpassing Z-Image Turbo, Qwen-Image, and FLUX.2 [dev]. Further details (and weights) coming soon. See example generations from Peanut in the Artificial Analysis Image Arena below 🧵


Chubby♨️@kimmonismus · May 4 · 62

A little-known startup just landed on the @ArtificialAnlys AI Video leaderboard, now ranked among the top 6 in the world. Very cool @video_rebirth

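Summary: Bach-1.0 Preview, a text-to-video model from the startup Video Rebirth, debuted at #6 on the Artificial Analysis global AI video leaderboard. Its performance is comparable to well-known models such as Vidu Q3 Pro, Kling 3.0 Omni 1080p (Pro), and grok-imagine-video. The model is planned for broad release later in May.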

Rohan Paul@rohanpaul_ai · May 4 · 64

A startup in the Top 6 on Artificial Analysis Text-to-Video Leaderboards Alongside Alibaba, ByteDance, and xAI. Video Rebirth came out of nowhere. The AI video leaderboard has been exclusively trillion-dollar companies. Today, a startup just broke in. Video Rebirth. Super realistic results. Their model BACH just hit Top 6 on @ArtificialAnlys.

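Summary: With its Bach-1.0 Preview model, the startup Video Rebirth has entered the top six of the Artificial Analysis text-to-video leaderboard for the first time, breaking a long run in which the board was dominated by trillion-dollar giants such as Alibaba, ByteDance, and xAI. The model performs on par with top models such as Vidu Q3 Pro, Kling 3.0 Omni 1080p (Pro), and grok-imagine-video, and is planned for broad release later in May. The result shows startups making real progress in high-quality AI video generation and adds a new variable to the competitive landscape.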

小互@xiaohu · May 4 · 56

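Rumor has it that Google will release a brand-new model at this month's Google I/O, pushing Gemini from "chat assistant" toward an "all-modality productivity gateway". A suspected new model named Omni has leaked. It may take on deeper video and multimodal generation, possibly even giving Gemini native video output rather than just text, images, and calls to external video models. If this direction holds, what Gemini competes on next will not just be model scores but "one entry point for many kinds of content production": writing, images, video, long-context memory, and complex task flows, all connected inside Gemini. Meanwhile, Gemini 3.2 and 3.5 are also rumored to be in testing, with the focus likely on faster, more efficient inference. The Ultra version may keep evolving toward long context, heavy memory, and multi-step workflows, serving high-value tasks that need continuous execution and repeated use of context.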

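Summary: Google is rumored to be releasing a new model named "Omni" at I/O, aiming to upgrade Gemini from a chat assistant into an all-modality productivity platform that integrates writing, images, video, long-context memory, and complex task flows. The model may natively support video generation and output, going beyond the current Veo 3.1. Meanwhile, Gemini 3.2/3.5 may focus on inference speed and efficiency, while the Ultra version goes deeper on long context, heavy memory, and multi-step workflows. If the reports are accurate, Gemini would become the first top-tier Omni model with video output.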

Artificial Analysis@ArtificialAnlys · May 4 · 56

Bach-1.0 Preview from Video Rebirth debuts at #6 on the Artificial Analysis Text to Video Leaderboard (No Audio)! Bach-1.0 Preview is the latest Text to Video model from @video_rebirth, with similar performance to Vidu Q3 Pro, Kling 3.0 Omni 1080p (Pro), and grok-imagine-video. Bach-1.0 Preview is intended for broad release later in May. See example generations from Bach-1.0 Preview in the Artificial Analysis Video Arena below 🧵


Chubby♨️@kimmonismus · May 3 · 45

Google Omni model incoming. Probably being prepared for google i/o. However i assume they will launch a new video model with it instead of Veo 3.1 since Seedance jumped to the top a few months ago

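Summary: According to leaked information, Google may be testing a brand-new Omni model for video generation on its Gemini platform, with the interface tagline "Powered by Omni". Omni appears close to "Toucan", the internal name of the current Veo-based video tool. If Google officially releases a video generation model named Gemini Omni, it would likely outperform the existing Veo 3.1. In that case Gemini would become the first top-tier Omni model with video output, and the news may be announced at the upcoming Google I/O.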

TestingCatalog News 🗞@testingcatalog · May 3 · 58

GOOGLE I/O 🚨: A NEW OMNI MODEL IS BEING TESTED ON GEMINI FOR VIDEO GENERATION!
> "Start with an idea or try a template. Powered by Omni."
> This is a new leaked headline from the video generation tab on Gemini.
> Omni appears close to "Toucan", an internal name of the current video generation tool powered by Veo.
> If Google plans to release Gemini Omni for video generation, it would likely outperform Veo 3.1.
> If true (as it is still highly speculative), Gemini will be the first top-tier Omni model with video output!
Google I/O 2026 will be hot 🔥

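Summary: Google is testing a new model named "Omni" for video generation on its Gemini platform. Per the leak, the interface prompts users to "Start with an idea or try a template" and notes "Powered by Omni". The model may be closely related to the video generation tool internally named "Toucan", which is currently powered by Veo. If Google plans to release Gemini Omni for video generation, it would likely outperform the current Veo 3.1. If the reports are accurate, Gemini would become the first top-tier Omni model with video output, which would be a major step for Google in video generation ahead of Google I/O 2026.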

Sam Altman@sama · May 3 · 48

5.5 xhigh in fast mode is really good i think i got psyoped by twitter on medium for a bit


Chubby♨️@kimmonismus · May 2 · 51

Nice! Google is preparing for I/o. New models soon


TestingCatalog News 🗞@testingcatalog · May 2 · 66

GOOGLE 🚨: A new Gemini Flash model has been spotted on LM Arena. Besides that, Vertex AI customers who still use Gemini Flash 2 received an email that it will be deprecated soon.
> Transition to Gemini 3.1 Flash Lite - Generally Available soon!
Soon 🔜 h/t @hishtadlut

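Summary: A new Google Gemini Flash model has appeared on LM Arena. Meanwhile, Vertex AI customers received an email saying Gemini 3.1 Flash Lite will be generally available soon. The quoted post notes that although the model still shows up in the arena as "Gemini 3 Flash", its output quality has jumped two tiers and its performance is closer to the current Gemini 3.1 Pro, a major upgrade; the actual version may be 3.1, 3.2, or 3.5 Flash.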

Elon Musk@elonmusk · May 1 · 55

Grok 4.3

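Summary: Grok 4.3. The release shows improved cost efficiency for running the Artificial Analysis Intelligence Index, with Grok 4.3 sitting solidly on the Pareto frontier of intelligence versus cost. Thanks to a 37.5% cut in input token price and a 58.3% cut in output token price, running the Intelligence Index evaluation costs $395, about 20% less overall than Grok 4.20 0309 v2.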

Chubby♨️@kimmonismus · May 1 · 57

Grok 4.3 is a very good model, especially when you think it's only 500m parameters! xAI's Grok 4.3 scores 53 on the Artificial Analysis Intelligence Index with ~40% lower input and ~60% lower output pricing vs Grok 4.20, making it one of the most cost-efficient models at its intelligence tier. Biggest gain: a 321-point Elo jump on real-world agentic tasks (GDPval-AA), though it still trails GPT-5.5 by a wide margin.

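Summary: xAI's Grok 4.3 scores 53 on the Artificial Analysis Intelligence Index, with input cost about 40% lower and output cost about 60% lower than Grok 4.20, making it strong value for money. Its biggest highlight is a 321-point Elo jump to 1500 on real-world agentic tasks (GDPval-AA), surpassing models such as Gemini 3.1 Pro Preview and Muse Spark while still trailing GPT-5.5 by a wide margin. The model performs strongly on instruction following and customer support tasks; on the Omniscience benchmark its accuracy rises, but so does its hallucination rate. Overall, Grok 4.3 reaches a higher Intelligence Index score at lower cost, making it one of the more cost-efficient models at its intelligence tier.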

TestingCatalog News 🗞@testingcatalog · May 1 · 54

Grok 4.3 is now available on the API 👀


Elon Musk@elonmusk · May 1 · 61

Grok

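Summary: Grok. Grok-4.3 launches at a lower price than Grok-4.2 while making a large jump in agentic performance: a 321-point increase to 1500 Elo on the @ArtificialAnlys GDPval-AA benchmark, surpassing other top models despite the lower price.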

Berryxia.AI@berryxia · May 1 · 54

Gemini Embedding 2 is now generally available! RAG knowledge-base applications can be supported even better.

Berryxia.AI@berryxia · May 1 · 46

OpenRouter has added another anonymous new model, Owl Alpha! 1M context and powerful tool calling! Guess whose it is, haha 😂

OpenRouter@OpenRouter · May 1 · 68

The new Grok-4.3 from @xai is live on OpenRouter! Grok-4.3 releases at a lower price than Grok-4.2, while seeing a large jump in agentic performance: a 321 point increase to 1500 ELO on @ArtificialAnlys GDPval-AA, surpassing other top models despite the lower price.


Artificial Analysis@ArtificialAnlys · May 1 · 54

Suno V5.5 lands at #1 on both the Artificial Analysis Instrumental and Vocals Leaderboards, a notable improvement over Suno's previous V5 model!
Suno V5.5 is the latest music generation model from @Suno, released alongside three new features that focus on personalization and identity:
➤ Voices: create a singing voice for generated tracks based on an uploaded vocal sample
➤ Custom Models: personalize up to 3 versions of Suno V5.5 to reflect your own style
➤ My Taste: Suno learns the genres, moods and styles you gravitate towards for more personalized recommendations
Suno V5.5 is available via the Suno platform on Pro and Premier subscription tiers, starting at $8/month (~500 songs) when billed annually, with commercial rights included.
See more details and listen to samples below 🧵

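Summary: Suno's latest music generation model, V5.5, ranks #1 on both the Artificial Analysis Instrumental and Vocals leaderboards, a notable improvement over the previous V5 model. The update focuses on personalization and identity and adds three features: users can create a custom singing voice from an uploaded vocal sample, personalize up to three model versions to reflect their own style, and get personalized recommendations as the system learns the genres, moods, and styles they prefer. The model is available on the Suno platform to Pro and Premier subscribers, starting at $8/month when billed annually (about 500 songs), with commercial rights included.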

Artificial Analysis@ArtificialAnlys · May 1 · 66

xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20.
The release of Grok 4.3 places @xAI just above Muse Spark and Claude Sonnet 4.6 on the Intelligence Index, and 4 points ahead of the latest version of Grok 4.20. Grok 4.3 improves its Artificial Analysis Intelligence Index score while reducing the cost to run the benchmark suite.
Key Takeaways:
➤ Grok 4.3 improves on cost-per-intelligence relative to Grok 4.20 0309 v2: it scores higher on the Intelligence Index while costing less to run the full benchmark suite. Grok 4.3 costs $395 to run the Artificial Analysis Intelligence Index, around 20% lower than Grok 4.20 0309 v2, despite using more output tokens. This makes it one of the lower-cost models at its intelligence level.
➤ Large increase in real-world agentic task performance: the largest single benchmark improvement is on GDPval-AA, where Grok 4.3 scores an Elo of 1500, up 321 points from Grok 4.20 0309 v2's score of 1179, surpassing Gemini 3.1 Pro Preview, Muse Spark, GPT-5.4 mini (xhigh), and Kimi K2.5. Grok 4.3 narrows the gap to the leading model on GDPval-AA, but still trails GPT-5.5 (xhigh) by 276 Elo points, with an expected win rate of ~17% against GPT-5.5 (xhigh) under the standard Elo formula.
➤ Grok 4.3 performs strongly on instruction following and agentic customer support tasks. It gains 5 points on 𝜏²-Bench Telecom to reach 98%, in line with GLM-5.1. Grok 4.3 maintains an 81% IFBench score from Grok 4.20 0309 v2.
➤ Gains 8 points on AA-Omniscience Accuracy, but at the cost of an 8-point drop in AA-Omniscience Non-Hallucination Rate, so Grok 4.20 0309 v2 still leads on AA-Omniscience Non-Hallucination Rate, followed by MiMo-V2.5-Pro, in line with Grok 4.3.
Congratulations to @xAI and @elonmusk on the impressive release!

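Summary: xAI has launched Grok 4.3, which scores 53 on the Artificial Analysis Intelligence Index, ahead of models such as Muse Spark and 4 points above its predecessor. The model cuts cost substantially without giving up intelligence, with input and output prices about 40% and 60% lower respectively. It stands out on real-world agentic tasks, with its GDPval-AA score rising sharply to 1500 Elo, surpassing Gemini 3.1 Pro Preview and several other models while still trailing GPT-5.5 (xhigh). It performs strongly on instruction following and customer support tasks, but its AA-Omniscience Non-Hallucination Rate declines.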
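
The ~17% expected win rate quoted in the post follows from the standard Elo expected-score formula. Below is a minimal Python sketch of that arithmetic; the function name `elo_expected_score` is an illustrative choice, and the only inputs are figures stated in the post (the 276-point gap, and the previous score of 1179 plus the 321-point gain).

```python
def elo_expected_score(points_behind: float) -> float:
    """Expected score for the side that trails by `points_behind` Elo points.

    Standard Elo formula: E = 1 / (1 + 10 ** (gap / 400)).
    """
    return 1.0 / (1.0 + 10.0 ** (points_behind / 400.0))


# Figures taken from the post.
previous_elo = 1179          # Grok 4.20 0309 v2 on GDPval-AA
gain = 321                   # reported improvement
gap_to_leader = 276          # Grok 4.3 vs GPT-5.5 (xhigh)

current_elo = previous_elo + gain
expected = elo_expected_score(gap_to_leader)

print(f"Grok 4.3 GDPval-AA Elo: {current_elo}")
print(f"Expected win rate vs leader: {expected:.1%}")
```

The result comes out near 17%, consistent with the post. A gap of 0 gives 50%, and every additional 400 points divides the odds by ten.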

Ant Ling@AntLingAGI · May 1 · 76

Ecosystem-first approach continued! Ling-2.6-1T officially landed on @huggingface and the official inference is now live via @novita_labs. Experience the efficiency of Ling-2.6-1T for yourself, front and center on HF model card page! 🔥

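Summary: The AntLingAGI team announced that Ling-2.6-1T is now officially open, available on Hugging Face with official inference through Novita Labs. The model uses a Mixture-of-Experts architecture with 1 trillion total and 63 billion active parameters, and is optimized for "token efficiency" to meet real production needs. In practice that means low token overhead, keeping strong intelligence without long reasoning traces; reliable multi-step execution, with better control over instructions, tools, context, and workflows; and production-ready deployment, covering tasks from code generation to bug fixing with broad agent framework compatibility. The team aims to create value for developers by making it easier to test, deploy, customize, and build.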

Google AI@GoogleAI · May 1 · 69

Last week, we made Gemini Embedding 2, our first natively multimodal embedding model, available to the general public. Since then, developers have used it to build video analysis tools, visual shopping assistants, and more. But you might be wondering... what is an embedding model? 🤔 Let's break it down!
1. What is it? Think of an embedding model as a "universal translator." It takes text, images, video, and audio data and turns them into a long string of numbers, like a unique digital fingerprint.
2. How does it work? Historically, search has been text only. Now, instead of just matching data by keyword, Gemini Embedding 2 maps multiple modalities in the same space based on meaning. It "feels" the connection between a video of a soccer goal and the words "game-winning shot" without needing tags. For example, "ocean" and "waves" are placed close together, but "ocean" and "toaster" are miles apart.
3. How can you use it? Developers have been using it to incorporate smarter search functionality into their builds. This means creating tools where you can snap a photo of a product and type "find this in yellow," or search through thousands of hours of video by describing what happens in a scene.
4. Ready to try it out for yourself? You can start using it today via the Gemini API or the Gemini Enterprise Agent Platform.

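Summary: Last week Google made Gemini Embedding 2, its first natively multimodal embedding model, generally available. Like a "universal translator", the model turns text, images, video, and audio into numeric vectors. Its key advance is that it no longer relies on keyword matching; it maps data from different modalities into the same space based on meaning, capturing deeper connections between pieces of content. Developers have used it to build video analysis tools, visual shopping assistants, and similar applications that search by photo or by scene description. The model is available through the Gemini API or the Gemini Enterprise Agent Platform.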
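
To make the "ocean" / "waves" / "toaster" example concrete, here is a small Python sketch of how closeness in an embedding space is commonly measured with cosine similarity. The three-dimensional vectors are hand-made toy values, not output from Gemini Embedding 2 or any real model, and the helper names are hypothetical; real embeddings have far more dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy, hand-written "embeddings" (assumption: purely illustrative values).
toy_embeddings = {
    "ocean": [0.9, 0.8, 0.1],
    "waves": [0.8, 0.9, 0.2],
    "toaster": [0.1, 0.0, 0.9],
}


def nearest(query: str, vectors: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Rank every other item by similarity to the query item, best first."""
    q = vectors[query]
    scored = [
        (name, cosine_similarity(q, vec))
        for name, vec in vectors.items()
        if name != query
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


for name, score in nearest("ocean", toy_embeddings):
    print(f"ocean vs {name}: {score:.3f}")
```

With these toy values "waves" ranks well above "toaster" for the query "ocean". A multimodal search would work the same way, except that the vectors for images, video, or audio would come from the embedding model rather than being typed by hand.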

Google AI Developers@googleaidevs · May 1 · 58

Now that Gemini Embedding 2 is GA, let’s explore what the model unlocks — from agentic multimodal RAG to visual search — as it maps text, images, video, audio, and documents into a unified embedding space.


SenseTime@SenseTime_AI · April 30 · 59

SenseNova U1 Lite Series: Small Scale, Big Capability
A new generation of natively unified multimodal models, delivering commercial-grade performance at a compact 8B / A3B scale:
• Complex infographic generation with strong semantic integrity and pixel-level precision
• High layout consistency with accurate and reliable text rendering
• Industry-first continuous image–text generation, enabling unified reasoning and consistent visual style
Now fully open-sourced:
GitHub: https://github.com/OpenSenseNova/SenseNova-U1
Hugging Face: https://huggingface.co/collections/sensenova/sensenova-u1
SenseNova U1 Skills: https://github.com/OpenSenseNova/SenseNova-Skills
Discord: https://discord.gg/cxkwXWjp
@huggingface @github

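Summary: The SenseNova U1 Lite Series is a new generation of natively unified multimodal models delivering commercial-grade performance at a compact 8B / A3B scale. Its core capabilities are complex infographic generation with strong semantic integrity and pixel-level precision; high layout consistency with accurate, reliable text rendering; and industry-first continuous image-text generation that supports unified reasoning and a consistent visual style. The models are now fully open-sourced, with code and resources available on GitHub, Hugging Face, and other platforms.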

OpenRouter@OpenRouter · April 30 · 59

New stealth model: Owl Alpha! Owl is a high-performance foundation model designed for agentic workloads. Powerful tool use capabilities and a 1M context window, ready for use in all your favorite productivity apps. Try it now and share feedback to improve the model!


Artificial Analysis@ArtificialAnlys · April 30 · 56

Tencent has released Hy3-preview, an open weights reasoning model scoring 42 on the Artificial Analysis Intelligence Index, trailing recent open weights peers.
Hy3-preview is the latest model from @TencentHunyuan. It is a 295B total / 21B active parameter Mixture-of-Experts model, smaller than its December 2025 predecessor Tencent HY 2.0 (406B total / 32B active). Recent leading open weights reasoning models include Qwen3.6 27B (Reasoning, 46), DeepSeek V4 Flash (Reasoning, Max Effort, 47, 284B / 13B) and GLM-5.1 (Reasoning, 51, 744B / 40B). The Intelligence Index is the Artificial Analysis synthesis metric incorporating 10 evaluations covering agentic tasks, coding and scientific reasoning.
Key takeaways:
➤ Hy3-preview trails recent open weights peers on GDPval-AA. Hy3-preview scores an Elo of 1235 on GDPval-AA, our agentic real-world work tasks benchmark, behind Qwen3.6 27B (Reasoning, 1414), DeepSeek V4 Flash (Reasoning, Max Effort, 1388) and GLM-5.1 (Reasoning, 1535). GDPval-AA tests models on real-world tasks across 44 occupations and 9 major industries.
➤ Hy3-preview ties GLM-5.1 (Reasoning) on CritPt despite scoring nearly 10 Intelligence Index points lower. Hy3-preview scores 4.6% on CritPt (research-level physics), matching GLM-5.1 (Reasoning, 51 on the Intelligence Index) and ahead of Qwen3.6 27B (Reasoning, 1.1%) but behind DeepSeek V4 Flash (Reasoning, Max Effort, 7.1%). It trails the open weights leaders, including DeepSeek V4 Pro (Reasoning, Max Effort, 12.9%) and Kimi K2.6 (8.0%).
➤ Hy3-preview used ~125M output tokens to run the Intelligence Index. This is ~12% more than GLM-5.1 (Reasoning, 112M) and less than Qwen3.6 27B (Reasoning, 144M) and DeepSeek V4 Flash (Reasoning, Max Effort, 241M).
➤ AA-Omniscience is a relative weakness compared to peers. Hy3-preview scores -35 on the Artificial Analysis Omniscience Index with 28% accuracy and an 87% hallucination rate. This trails DeepSeek V4 Flash (Reasoning, Max Effort, -23), Qwen3.6 27B (Reasoning, -20) and GLM-5.1 (Reasoning, 2).
Other information:
➤ Size: 295B total parameters, 21B active parameters
➤ Context window: 256K tokens
➤ License: Tencent HY Community License Agreement, with restricted commercial use
➤ Availability: Weights are available on @huggingface and the model is also available on @SiliconFlowAI at $0/$0 per 1M input/output tokens

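Summary: Tencent has released Hy3-preview, an open weights Mixture-of-Experts model with 295B total and 21B active parameters. It scores 42 on the Artificial Analysis Intelligence Index, trailing recent open weights reasoning models such as GLM-5.1, DeepSeek V4 Flash, and Qwen3.6 27B. Its results are uneven: it trails its main competitors on the real-world task benchmark GDPval-AA but ties the higher-scoring GLM-5.1 on the research-level physics evaluation CritPt; its relative weakness is the AA-Omniscience index, where its hallucination rate is high. The model is released under the Tencent HY Community License Agreement with restricted commercial use, and is available on Hugging Face and SiliconFlowAI.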
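
Several figures in the post can be cross-checked with simple arithmetic. The Python sketch below does three such checks using only numbers quoted in the post. The function names are illustrative, and the Omniscience reconstruction rests on an assumption the post does not state: that the index is the share of correct answers minus the share of incorrect answers, and that the hallucination rate is the fraction of non-correct responses that were wrong answers rather than abstentions.

```python
def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a Mixture-of-Experts model's parameters active per token."""
    return active_b / total_b


def token_overhead(tokens_m: float, baseline_m: float) -> float:
    """Relative extra output tokens versus a baseline (0.10 means 10% more)."""
    return tokens_m / baseline_m - 1.0


def omniscience_index(accuracy: float, hallucination_rate: float) -> float:
    """ASSUMED definition: 100 * (correct share - incorrect share).

    The incorrect share is taken as hallucination_rate * (1 - accuracy).
    """
    incorrect = (1.0 - accuracy) * hallucination_rate
    return 100.0 * (accuracy - incorrect)


# (total B, active B) as quoted in the post.
moe_models = {
    "Hy3-preview": (295, 21),
    "Tencent HY 2.0": (406, 32),
    "DeepSeek V4 Flash": (284, 13),
    "GLM-5.1": (744, 40),
}

for name, (total, active) in moe_models.items():
    print(f"{name}: {active_fraction(total, active):.1%} of parameters active")

print(f"Output tokens vs GLM-5.1: {token_overhead(125, 112):+.1%}")
print(f"Reconstructed Omniscience Index: {omniscience_index(0.28, 0.87):.1f}")
```

Under those assumptions the reconstructed index rounds to the -35 reported in the post, and 125M versus 112M output tokens works out to roughly 12% more, matching the stated figure.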

Berryxia.AI@berryxia · April 30 · 55

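Today I saw a piece of news that is easy to scroll past, but the more I thought about it the more interesting it got. In the latest LMArena text leaderboard update, ERNIE (文心) 5.1 Preview scored 1476: first in China, the only Chinese model in the global top fifteen, and ranked ahead of GPT-5.5 and DeepSeek-V4-Pro. That alone is newsworthy. But what really made me take a second look is another overlooked detail. DeepSeek V4 shipped, and so did ERNIE 5.1 Preview. For the two most closely watched Chinese flagships, the main battlefield is still text. Over the past year, nearly all the noise in AI has been about agents, multimodality, video generation, and reasoning chains. Text? Text seems like a story from the last era. So why are the strongest flagships still text models? Because text capability is the foundation of large models. Code, reasoning, and multimodality all "grow out of" text. Code is text with a constrained grammar, reasoning is symbolic computation at the language level, and a large part of multimodal alignment is mapping signals back into language space. If the foundation is one notch weaker, every capability built on it is one notch weaker too. This is not the industry falling behind; it is telling you one thing: text is still the dividing line where models pull apart.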

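Summary: ERNIE 5.1 Preview ranks first in China on the LMArena text leaderboard with 1476 points and is the only Chinese model in the global top fifteen, placing ahead of GPT-5.5 and DeepSeek-V4-Pro. Although attention in AI has shifted to agents and multimodality, flagships such as DeepSeek V4 and ERNIE 5.1 still center on text. The author argues that text capability is the foundation of large models: code, reasoning, and other capabilities derive from it, and differences in the foundation carry directly into everything built on top, so text remains the key dividing line between models. The quoted post shows ERNIE 5.1 performing especially well in categories such as math, legal and government, business management, and software services.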

Alibaba Cloud@alibaba_cloud · April 30 · 68

Introducing HappyHorse, the latest breakthrough from Alibaba Cloud.
Key Features Demonstrated:
- Cinematic 1080p Quality: Crystal clear visuals that breathe life into your ideas.
- Native Audio-Visual Sync: Perfect lip-sync and sound alignment generated instantly.
- Multi-Shot Consistency: Maintain character identity across complex scenes and camera movements.
- Instant Generation: Go from prompt to production in seconds.
Try HappyHorse for FREE today: https://int.alibabacloud.com/m/1000412663/

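Summary: Alibaba Cloud has introduced HappyHorse, its latest AI video generation model. Features include cinematic 1080p output; native audio-visual sync, with lip movements aligned to sound; consistent character identity across shots in complex scenes and camera moves; and generation from text prompt to finished video in seconds. It is free to try now.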

Alibaba Cloud@alibaba_cloud · April 30 · 65

Qwen3.6-Plus is now available on @togethercompute. Ship it.


宝玉@dotey · April 30 · 54

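Sam Altman just announced that over the next few days OpenAI will roll out GPT-5.5-Cyber, a frontier model built specifically for cybersecurity, to "critical cyber defenders". He said OpenAI will work with the whole industry ecosystem and the government to establish trusted access, with the goal of helping secure companies and infrastructure as quickly as possible.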

Sam Altman@sama · April 30 · 69

we're starting rollout of GPT-5.5-Cyber, a frontier cybersecurity model, to critical cyber defenders in the next few days. we will work with the entire ecosystem and the government to figure out trusted access for cyber; we want to rapidly help secure companies/infrastructure.


Baidu Inc.@Baidu_Inc · April 30 · 65

ERNIE 5.1 Preview just went live 🚀 With a lighter, more efficient architecture, it delivers strong performance at its scale. And this is just the start — more ERNIE model updates to come at Baidu Create 2026.

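Summary: Baidu's ERNIE 5.1 Preview model is now live. It uses a lighter, more efficient architecture: total parameters are compressed to about 1/3 of the previous generation and active parameters to about 1/2, while pretraining cost is only about 6% of comparable models, giving it leading base performance at its scale. According to the @arena Text Arena leaderboard, ERNIE 5.1 Preview ranks 13th overall worldwide and first among Chinese labs. It places in the global top ten in several categories and ranks first in legal and government. Baidu says more ERNIE model updates will come at Baidu Create 2026.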

Ant Ling@AntLingAGI · April 30 · 55

It was very much a pleasant surprise to see all the cool demos made by combining Ling-2.6-1T with capable and well-received harnesses like @opencode. Thanks to @novita_labs for another great launch together~ 👏

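Summary: Ling-2.6-1T from @AntLingAGI is now officially open. The model has 1T total parameters and 63B active parameters, is designed for real production use, and is token-efficient, making it easier for developers to test, deploy, and customize. Moving from Ling-2.6-flash to the 1T scale takes it from fast inference to stronger reasoning. The main post highlights the demos built by combining it with tools such as @opencode, which show the model's compatibility with existing tooling, and thanks @novita_labs for the joint launch.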

Ant Ling@AntLingAGI · April 30 · 53

Thanks Adina~ Token efficiency is the key characteristic leading to the next stage. We need to burn tokens wisely and efficiently in order to make the whole industry sustainable. 🤗🤗


Ant Ling@AntLingAGI · April 30 · 72

What's the secret sauce behind the flagship instruct model built for fast execution & high efficiency at scale? Reliable infra with the proper optimizations, from the #SGLang friends at @lmsysorg. We thought yesterday's 100B had already maxed things out; today's 1T shows there was still more room to push~ 🥳 Onto the next optimization~ 🫡

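Summary: Ant Ling credits reliable infrastructure and targeted optimizations from the SGLang team at LMSYS Org as the key to its flagship instruct model's fast, efficient execution at scale. The SGLang team announced Day-0 support for Ling-2.6-1T, the trillion-parameter model released by AntLingAGI. The model uses a fast-thinking approach and is said to cost about 4x less than comparable models at the same quality, while reaching SOTA on AIME26 and SWE-bench. It is designed for advanced coding, complex reasoning, and large-scale agentic workflows, combining trillion-parameter capability with instant-model latency. The team is continuing to optimize for further performance gains.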

Ant Ling@AntLingAGI · April 30 · 61

Thanks to the dedicated support for Ling-2.6-1T from day0 partner @vllm_project ! As the pioneer of the 1T sized models, we know how important hardware - software - llm co-design is. The best engineering ecosystem collaboration leads to the best optimization and user experience. Let's ROLL together! 🖖

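Summary: AntLingAGI has open-sourced Ling-2.6-1T, a new flagship model aimed at real-world agentic workflows. As a pioneer of 1T-parameter models, the team stresses the importance of hardware, software, and LLM co-design. The vLLM project has supported the model since launch day (Day-0), reflecting collaboration across the engineering ecosystem aimed at the best optimization and user experience.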

Deedy@deedydas · April 30 · 50

Researchers just estimated the size of all the LLMs by asking them knowledge questions of varying degrees of obscurity!
– GPT 5.5: ~10T params
– Claude Opus 4.x: ~4-5T
– Grok 4: ~3T
The idea here is that factual capacity scales log-linearly with size. The paper shows 7 knowledge tiers and T7 is essentially ~0% for all models, suggesting there is still significant headroom for pretraining. Gemini 3.1 Pro is likely >10T given it's used as an anchor but has no direct estimate. This means we can infer what different models might cost to some degree and their post-training effectiveness (performance at certain non-factual tasks given their size). One of the coolest papers I've read of late.

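Summary: Researchers estimated the parameter counts of large language models by asking knowledge questions of varying difficulty. The results put GPT 5.5 at about 10T parameters, Claude Opus 4.x at about 4-5T, and Grok 4 at about 3T. Factual knowledge capacity scales log-linearly with model size. The paper defines 7 knowledge tiers; the top tier, T7, is near zero for all models, suggesting significant headroom remains for pretraining. Gemini 3.1 Pro may exceed 10T parameters. The method helps infer model cost and how effective post-training is on non-factual tasks.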
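
The log-linear idea can be illustrated with a small Python sketch: fit a line of knowledge score against log10(parameters) on models of known size, then invert the line to estimate the size of a model whose score is known. This is only a sketch of the general technique, not the paper's actual procedure, and the anchor points below are made-up round numbers chosen for illustration, not data from the paper.

```python
import math


def fit_log_linear(anchors: list[tuple[float, float]]) -> tuple[float, float]:
    """Least-squares fit of score = intercept + slope * log10(params)."""
    xs = [math.log10(params) for params, _ in anchors]
    ys = [score for _, score in anchors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


def estimate_params(score: float, slope: float, intercept: float) -> float:
    """Invert the fitted line to get a parameter-count estimate."""
    return 10.0 ** ((score - intercept) / slope)


# HYPOTHETICAL anchors: (parameter count, knowledge score in [0, 1]).
anchors = [(1e9, 0.20), (1e10, 0.30), (1e11, 0.40)]

slope, intercept = fit_log_linear(anchors)
unknown_model_score = 0.50  # hypothetical score for a model of unknown size
estimate = estimate_params(unknown_model_score, slope, intercept)

print(f"slope per 10x parameters: {slope:.3f}")
print(f"estimated parameters: {estimate:.3g}")
```

Because the score grows by a fixed amount per tenfold increase in size, small errors in the measured score translate into large multiplicative errors in the estimate, which is one reason such figures are best read as rough orders of magnitude.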

Ant Ling@AntLingAGI · April 29 · 71

Last week, we introduced Ling-2.6-1T. Today, Ling-2.6-1T is officially an open model~ 🤗
1T total parameters · 63B active parameters
We bring value to developers by making it easier to test, deploy, customize, and build. It is optimized for "token efficiency" to meet real production needs:
• Lower token overhead: strong intelligence without long reasoning traces
• Reliable multi-step execution: better instruction, tool, context, and workflow control
• Production-ready deployment: from code generation to bug fixing, with broad agent framework compatibility
A sneak peek into the agentic capability in @opencode

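Summary: AntLingAGI has officially open-sourced its trillion-parameter flagship model Ling-2.6-1T. The model has 1 trillion total parameters and 63 billion active parameters, and its core design goal is token efficiency: strong intelligence with low token overhead. Optimized around "fast thinking", it offers reliable multi-step execution and does well on instruction following, tool use, and context control. It is tuned for real production needs, is straightforward to deploy, works with a broad range of agent frameworks, and suits tasks from code generation to bug fixing.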