Gemini 3.5 Flash from @GoogleDeepMind is live on OpenRouter! Beats Gemini 3.1 Pro on coding, agentic work, and tool use at Flash-tier price and speed. 1M context, 65K max output, multimodal. $1.50/M input, $9/M output.
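
The quoted prices make per-request cost easy to estimate. Below is a minimal sketch, assuming the listed prices apply flatly to every token (no caching discount or tiered pricing) and treating "65K" as 65,000 tokens; the function name `request_cost_usd` is illustrative and not part of any API.

```python
# Prices quoted in the post, in USD per 1M tokens.
INPUT_PRICE_PER_M = 1.50
OUTPUT_PRICE_PER_M = 9.00


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost at flat per-token prices (an assumption)."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Rough upper bound for a single call: a full 1M-token context
# plus the maximum output, taken here as 65,000 tokens.
worst_case = request_cost_usd(1_000_000, 65_000)
print(f"${worst_case:.3f}")  # $2.085
```

Under these assumptions, even a maximal single call stays near two dollars; real bills depend on caching and on how the provider counts reasoning tokens.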

Chubby♨️@kimmonismus · May 20 · 26

Thank you Sundar - first I/O and already feeling at home. Gemini 3.5 Flash is genuinely impressive for a model at this price point. The efficiency race is just getting started!

Chubby♨️@kimmonismus · May 20 · 37

Demis Hassabis talks about how Gemini is helping science move towards a golden age of medicine, so that we will soon be able to cure all diseases. I have goosebumps; I couldn't be more excited.

AYi@AYi_AInotes · May 20 · 80

Damn! Google has really gone wild this time. Gemini Omni is about to blow the roof off video generation 🤯 Making videos used to be like building with Lego blocks, piece by piece, slowly. Now it's giving you a magic Lego factory that can actually think. You chat in natural language, and it understands real-world physics, history, biology, and culture, then directly generates or edits any video.

Five most mind-blowing abilities that you can use right now:
1. Understands real physics: glass marbles collide, turn, and bounce in ways that match reality.
2. Faces never get distorted: define a character once, then put them in any scene, any action.
3. Edit videos like you edit ChatGPT text: change backgrounds, swap people, add effects with a single sentence.
4. Upload an image and apply any style: make claymation, visualize protein folding, whatever you imagine.
5. Video isn't a dead file anymore: change angles, lighting, objects, even storylines just by chatting.

This isn't a competitor to Sora. This is the first time a world model has truly entered a consumer-facing product. It's not just generating pixels; it's simulating a coherent physical and semantic world. Open the Gemini app right now and try Omni Flash. Go try it. You'll thank me later.

Chubby♨️@kimmonismus · May 20 · 81

The real "wow" moment is Gemini Omni. A world model towards AGI. It can create anything from any input. This is insane.

Google AI Developers@googleaidevs · May 20 · 84

✨ Introducing Gemini 3.5, our latest family of models combining frontier intelligence with action. The series sets a new standard for agentic models that don't just reason, they execute.

Sundar Pichai@sundarpichai · May 20 · 90

Just off stage at #GoogleIO, some highlights from this morning 🧵

Gemini 3.5 Flash is available today for everyone in @antigravity and across our products and APIs. Compared to 3.1 Pro, 3.5 Flash is better across almost all benchmarks, with huge progress in coding. It's also comparable to the best models but very fast (4x more tokens/second than other frontier models). And when looking at intelligence versus output speed, it's in a league of its own in the top-right quadrant.

Google AI@GoogleAI · May 20 · 85

Three years ago, Gemini started by understanding the world. With Gemini 2, models learned to think and reason. Late last year, Gemini 3 brought any idea to life. Today, we're continuing that journey with our Gemini 3.5 series, starting with Gemini 3.5 Flash, delivering frontier performance for agents and coding.

🚨 AI News | TestingCatalog@testingcatalog · May 20 · 75

GOOGLE I/O 🔥: GEMINI 3.5 FLASH HAS BEEN ANNOUNCED! Gemini 3.5 performs on par with Gemini 3.1 Pro on the Artificial Analysis Intelligence benchmark but is much faster.

[Quoting @GeminiApp]: Gemini 3.5 Flash is here and it's our best model yet for getting things done quickly and efficiently. Whether you need help with everyday tasks or multi-step creative projects, Gemini 3.5 Flash navigates real-world complexity to help you take action. #GoogleIO

🚨 AI News | TestingCatalog@testingcatalog · May 20 · 79

GOOGLE I/O 🔥: Gemini 3.5 Flash is now available on AI Studio for testing! Have you tried it yet? 👀

Artificial Analysis@ArtificialAnlys · May 20 · 78

Google’s new Gemini 3.5 Flash is the clear leader on the Intelligence vs Speed Pareto frontier and makes large gains on GDPval-AA (real-world agentic tasks), but is 5x the cost of Gemini 3 Flash.

@GoogleDeepMind gave us pre-release access to Gemini 3.5 Flash, the latest model in its Flash family, which has traditionally offered faster, lower-cost alternatives to Gemini Pro models. Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, up 9 points from Gemini 3 Flash, driven primarily by agentic performance gains and hallucination reduction. It achieves speeds of over 280 output tokens/s, but higher token usage and token pricing make it over 5x more costly to run the Intelligence Index than Gemini 3 Flash, and 75% more costly than Gemini 3.1 Pro. Gemini 3.5 Flash is $1.50/1M input and $9/1M output tokens; Gemini 3 Flash was $0.50/$3 per 1M input/output tokens, a 3x increase. The rest of the increase was driven by higher token usage when running our benchmarks.

Key results for Gemini 3.5 Flash with ‘high’ thinking level:

➤ 9 point Intelligence Index improvement: Gemini 3.5 Flash scores 55 on the Artificial Analysis Intelligence Index, up 9 points from Gemini 3 Flash. This places it ahead of Grok 4.3 (high, 53) and Claude Sonnet 4.6 (max, 52). The model improves across nearly all evaluations, with the largest gains coming from agentic evaluations and AA-Omniscience (knowledge and hallucination). On AA-Omniscience, Gemini 3.5 Flash improves by 11 points, driven primarily by reduced hallucinations, with its hallucination rate falling to 61%, a 31 point decrease compared to Gemini 3 Flash.

➤ Agentic capability improvements: Gemini 3.5 Flash improves substantially over Gemini 3 Flash across our agentic evaluations, in both GDPval-AA (real-world agentic tasks) and Tau2-Bench Telecom (agentic tool use). Its GDPval-AA result is especially notable, achieving an Elo of 1656, well ahead of Gemini 3 Flash (1204) and Gemini 3.1 Pro (1314), and just behind GPT-5.4 (xhigh, 1674). This represents a meaningful step forward for Google in agentic performance, which has historically been a relative weakness for Gemini models.

➤ Speed-intelligence frontier: Gemini 3.5 Flash achieves speeds of over 280 output tokens per second, ~70% faster than Gemini 3 Flash and models such as gpt-oss-120b and GPT-5.4 mini (xhigh). With its 55 Intelligence Index score, this places Gemini 3.5 Flash on the speed-intelligence Pareto frontier alongside Gemini 3.1 Pro and Gemini 3.1 Flash-Lite, reinforcing Google’s strength in models balancing speed and intelligence.

➤ 5.5x increase in cost to run: Gemini 3.5 Flash costs $1,552 to run the Artificial Analysis Intelligence Index, 5.5x more than Gemini 3 Flash and 75% more than Gemini 3.1 Pro. This is driven by increases in both token usage and token prices. Output token usage is broadly unchanged from Gemini 3 Flash (73M vs. 72M), but input token usage increases significantly, driven primarily by an increase in the number of turns in agentic evaluations. Gemini 3.5 Flash is priced 3x higher than Gemini 3 Flash at $1.50/$9.00 per 1M input/output tokens, with a 90% discount for cached input tokens.

➤ Google continues to lead multimodal performance: Gemini 3.5 Flash is multimodal, supporting image, video, and speech input alongside text. This differs from many proprietary models, including Claude Opus 4.7, Grok 4.3, and GPT-5.5, which support image input only. In our multimodal evaluation, MMMU-Pro, Gemini 3.5 Flash scores 84%, the highest score recorded. This puts models from Google in the top two spots, with Gemini 3.1 Pro scoring 82%.

Key model details:
➤ Context window: Retains the same 1M context window as Gemini 3 Flash
➤ Multimodality: Text, image, video and speech input with text output only
➤ Pricing: $1.50/$9.00 per million input/output tokens, with a 90% discount for cached input tokens

Congratulations @GoogleDeepMind, @sundarpichai and @demishassabis on the great release!
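
The cost figures in the post above can be cross-checked with a few lines of arithmetic. This is a rough sketch using only the numbers quoted there; the variable names are illustrative, and the split between output and input spend assumes the $1,552 consists solely of token charges at the listed prices, which the post does not state.

```python
# Figures quoted in the post (prices in USD per 1M tokens).
NEW_IN, NEW_OUT = 1.50, 9.00   # Gemini 3.5 Flash
OLD_IN, OLD_OUT = 0.50, 3.00   # Gemini 3 Flash
RUN_COST = 1552.0              # USD to run the Intelligence Index
OUTPUT_TOKENS_M = 73.0         # output tokens used, in millions

# Token prices tripled on both the input and the output side.
price_ratio_in = NEW_IN / OLD_IN
price_ratio_out = NEW_OUT / OLD_OUT

# Run costs implied for the comparison models ("5.5x more", "75% more").
implied_gemini_3_flash = RUN_COST / 5.5
implied_gemini_31_pro = RUN_COST / 1.75

# Output-side spend, and the remainder attributed to input tokens
# (assumption: nothing else is billed).
output_cost = OUTPUT_TOKENS_M * NEW_OUT
input_cost = RUN_COST - output_cost

# Part of the 5.5x increase not explained by the 3x price change.
usage_factor = 5.5 / price_ratio_out

# Hallucination rate before the quoted 31-point drop to 61%.
previous_hallucination_rate = 61 + 31

print(f"price ratio (in/out): {price_ratio_in:.1f}x / {price_ratio_out:.1f}x")
print(f"implied Gemini 3 Flash run cost: ${implied_gemini_3_flash:.0f}")
print(f"implied Gemini 3.1 Pro run cost: ${implied_gemini_31_pro:.0f}")
print(f"output spend ${output_cost:.0f}, input spend ${input_cost:.0f}")
print(f"usage-driven factor: {usage_factor:.2f}x")
print(f"previous hallucination rate: {previous_hallucination_rate}%")
```

On these assumptions, input tokens account for more of the bill than output tokens, which is consistent with the post's remark that extra agentic turns drove input usage up.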

Chubby♨️@kimmonismus · May 20 · 55

Gemini 3.5 Pro next month!!!

Chubby♨️@kimmonismus · May 20 · 68

Insane evals for a Flash model! Gemini 3.5 Flash is really good for its size!

Jeff Dean@JeffDean · May 20 · 85

1/ Today at #GoogleIO, we’re releasing Gemini 3.5, our latest family of models combining frontier intelligence with action. We’re starting by releasing 3.5 Flash, which is built to help you execute complex, long-horizon agentic workflows.

Gemini 3.5 Flash is our strongest model for coding and agents yet. It outscores 3.1 Pro on agentic and coding benchmarks like Terminal-Bench and MCP Atlas, while running 4x faster than other frontier models. Used in Google Antigravity, 3.5 Flash is even further optimized to be up to 12x faster. It’s a powerful engine to deploy sub-agents that collaborate, run high-frequency iterative loops, and solve real-world problems at scale.

Some highlights we’re excited about 🔽

Google DeepMind@GoogleDeepMind · May 20 · 78

We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video. It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵

Google DeepMind@GoogleDeepMind · May 20 · 81

Introducing Gemini 3.5: our newest family of models combining frontier intelligence with real-world action. The first release is 3.5 Flash, our strongest model yet for agents and coding 🧵

Google Gemini@GeminiApp · May 20 · 79

Gemini 3.5 Flash is here and it's our best model yet for getting things done quickly and efficiently. Whether you need help with everyday tasks or multi-step creative projects, Gemini 3.5 Flash navigates real-world complexity to help you take action. #GoogleIO

🚨 AI News | TestingCatalog@testingcatalog · May 20 · 74

GOOGLE I/O 🔥: GEMINI 3.5 FLASH HAS STARTED ROLLING OUT ON GEMINI AND APIs! Testing time soon 👀

🚨 AI News | TestingCatalog@testingcatalog · May 20 · 75

GOOGLE I/O 🔥: GEMINI OMNI FLASH HAS BEEN ANNOUNCED AND IS NOW AVAILABLE ON GEMINI AND GOOGLE FLOW. GEMINI OMNI PRO IS COMING SOON 🤩

🚨 AI News | TestingCatalog@testingcatalog · May 20 · 67

GOOGLE I/O 🔥: GEMINI 3.5 FLASH HAS BEEN ANNOUNCED! Gemini 3.5 performs on par with Gemini 3.1 Pro on the Artificial Analysis Intelligence benchmark but is much faster.

Chubby♨️@kimmonismus · May 20 · 77

"Progress towards AGI": Gemini Omni - world models - Gemini Omni official!! It can create anything from any input!!!

Chubby♨️@kimmonismus · May 20 · 54

Gemini 3.5 Flash official! Insanely fast and capable model

小互@xiaohu · May 20 · 48

Google's brand-new Omni model 🫡

歸藏(guizang.ai)@op7418 · May 20 · 67

Wow! Google's new video model Gemini Omni Flash is now live on Flow

歸藏(guizang.ai)@op7418 · May 19 · 58

Google's new video model Gemini Omni has started rolling out more widely

Chubby♨️@kimmonismus · May 19 · 20

Just imagine, OpenAI waits for Google I/O only to strike back on Thursday with GPT-5.6.

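AYi@AYi_AInotes · May 19 · 64

Damn it! SAM3 is absolutely going to be legendary! Not only is it open source, it is also ridiculously strong! The most impressive part is its tracking ability: even in a scene as chaotic as a basketball game, it stays rock solid!!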

🚨 AI News | TestingCatalog@testingcatalog · May 19 · 76

GOOGLE I/O 🔥: We are getting Gemini 3.5 Flash today! > GEMINI > GEMINI > GEMINI > GEM 👀

[Quoting @AiBattle_]: Gemini 3.5 Flash just appeared in the Google Cloud console. It's here.

Rohan Paul@rohanpaul_ai · May 19 · 49

Gemini 3.5 in a few more hours. 🔥

[Quoting @_anshulr]: Gemini Gemini Gemini Gem

Alibaba Cloud@alibaba_cloud · May 19 · 60

🚀🚀 Qwen3.7 Preview lands on Arena! ⚡️⚡️ Here comes Qwen3.7-Plus-Preview. Alibaba is now #5 in Vision. 🎨 Can't wait to release the Qwen3.7 series models! Stay tuned! @arena

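Alibaba Cloud@alibaba_cloud · May 19 · 55

🚀🚀 Qwen3.7 Preview lands on Arena! ⚡️⚡️ Here comes Qwen3.7-Max-Preview. Alibaba is now the #6 lab in Text. Can't wait to release the Qwen3.7 series models! Stay tuned! @arena

Summary: Alibaba's Qwen3.7 series made its first public appearance on the Arena evaluation platform. Qwen3.7 Max Preview ranks 13th overall in the Text Arena, making Alibaba the #6 lab there, and places in the top 10 in several categories, including math, expert knowledge, software and IT, and coding. Qwen3.7 Plus Preview ranks 16th in the Vision Arena, where Alibaba is #5. The team says the full Qwen3.7 series will be released soon.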

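小互@xiaohu · May 19 · 70

Performance on par with Opus, at up to 30x lower price? Cursor has released its in-house coding model, Composer 2.5.

On scores: Composer 2.5 lands in the same range as Opus 4.7 across the board, with the largest gap under 1 point.
On price: Opus 4.7 costs roughly $15 per million input tokens and $75 per million output tokens; Composer 2.5 is 10x cheaper on input and 30x cheaper on output.

Cursor says Composer 2.5 improves clearly over Composer 2 in both intelligence and behavior, especially on long-running tasks, complex instruction following, and smooth collaboration. On long tasks it keeps making progress across rollouts spanning hundreds of thousands of tokens without drifting off course; complex instruction following is more reliable; and its communication style and effort calibration are steadier, so the amount of effort it puts into a task is better tuned.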
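
The price comparison in the post above implies concrete Composer 2.5 prices. A small sketch of that arithmetic follows, taking the post's approximate Opus 4.7 prices and its "10x" and "30x" multipliers at face value; the example job size is hypothetical and chosen only to show how the blended saving depends on the input/output mix.

```python
# Approximate Opus 4.7 prices from the post, USD per 1M tokens.
OPUS_IN, OPUS_OUT = 15.0, 75.0

# Implied Composer 2.5 prices from the stated 10x / 30x multipliers.
COMPOSER_IN = OPUS_IN / 10    # 1.5
COMPOSER_OUT = OPUS_OUT / 30  # 2.5


def job_cost(input_tokens: int, output_tokens: int,
             in_price: float, out_price: float) -> float:
    """Cost in USD of one job at flat per-1M-token prices."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


# Hypothetical agentic job: 200K input tokens, 20K output tokens.
opus = job_cost(200_000, 20_000, OPUS_IN, OPUS_OUT)
composer = job_cost(200_000, 20_000, COMPOSER_IN, COMPOSER_OUT)
print(f"Opus 4.7: ${opus:.2f}, Composer 2.5: ${composer:.2f}, "
      f"ratio: {opus / composer:.1f}x")
```

Because input-heavy jobs are dominated by the 10x input discount, the blended saving for a job like this lands between 10x and 30x rather than at the headline 30x.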

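Berryxia.AI@berryxia · May 19 · 76

My feed was flooded today by Odyssey's "actual" model! Odyssey has just brought the "world model" straight into multiplayer. Agora-1 is the world's first truly real-time multi-agent world model. Humans and AI can now enter the same simulated world at the same time, interacting with and affecting each other in real time. They built a playable research preview directly on the classic GoldenEye deathmatch. You can jump in right now and team up with AI, shoot it out, and capture flags, while the model generates the visuals and sound in real time and the whole world keeps updating. This is no longer single-player video generation; it is a living world shared by multiple players.

Odyssey says that in the long run, multi-agent world models will fundamentally change gaming, simulation, education, robotics, and AI collaboration. People are no longer spectators; they truly live together in the same simulation. Try it now: https://agora.odyssey.ml Full introduction here: https://odyssey.ml/introducing-agora-1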

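meng shao@shao__meng · May 19 · 71

Cursor has released Composer 2.5, still based on Kimi K2.5. Because of its partnership with SpaceXAI, Musk personally posted to confirm that Composer 2.5 is already being trained on Colossus 2 compute, and that the two are also collaborating to train a brand-new model from scratch with more than 10x the compute!

Composer 2.5 improves markedly on Composer 2 in both intelligence and behavior, with a focus on three capabilities: sustained progress on long tasks, reliable following of complex instructions, and more natural collaborative interaction. https://cursor.com/blog/composer-2-5

Three key training innovations

1. Reinforcement learning with targeted textual feedback. Problem addressed: in long tasks (rollouts of hundreds of thousands of tokens), the final reward can hardly tell the model which step went wrong, the classic RL credit-assignment problem.
2. Synthetic training data. The number of synthetic tasks is 25x that of Composer 2. One representative method is feature deletion:
 · Give the model a codebase with a complete test suite
 · Delete some code to strip out a feature
 · Have the agent re-implement the feature, using the original tests as a verifiable reward
3. Infrastructure-level optimization. The continued-pretraining stage uses the Muon optimizer plus distributed orthogonalization:
 · Run Newton-Schulz at the model's natural granularity (per head for attention, per expert for MoE)
 · Sharded tensors are first reassembled into full matrices via all-to-all, orthogonalized, then scattered back via all-to-all; communication and computation overlap asynchronously
 · A single optimizer step for a 1T model takes only 0.2s

"Soft" dimensions of the training objective

Cursor explicitly calls out two dimensions that existing benchmarks do not measure well, and that it optimized for specifically:
 · Communication style
 · Effort calibration (when to think more and when to stop)

These two make a big difference in how real collaboration feels, and they are a key application of the targeted textual feedback method.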
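
To make the Newton-Schulz step mentioned above more concrete, here is a minimal pure-Python sketch of orthogonalizing one matrix per attention head. It is an illustration under stated assumptions, not Cursor's implementation: it uses the textbook cubic Newton-Schulz iteration (production Muon code may use a different, tuned polynomial), runs on a single process with no sharding or all-to-all communication, and the helper names (`matmul`, `transpose`, `newton_schulz_orthogonalize`, `orthogonalize_per_head`) and the toy matrices are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b]
            for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g: Matrix, steps: int = 30) -> Matrix:
    """Approximate the orthogonal polar factor of g with the cubic iteration."""
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # X <- 1.5 X - 0.5 (X X^T) X pushes each singular value toward 1.
        x = [[1.5 * xv - 0.5 * yv for xv, yv in zip(x_row, y_row)]
             for x_row, y_row in zip(x, xxt_x)]
    return x


def orthogonalize_per_head(heads: List[Matrix]) -> List[Matrix]:
    """Apply the iteration to each head's matrix independently."""
    return [newton_schulz_orthogonalize(h) for h in heads]


# Toy "gradient" blocks for two heads (hypothetical values, full row rank).
heads = [
    [[2.0, 0.0, 1.0], [1.0, 3.0, 0.0]],
    [[1.0, 2.0], [3.0, 4.0]],
]
for q in orthogonalize_per_head(heads):
    gram = matmul(q, transpose(q))  # should be close to the identity
    deviation = max(abs(gram[i][j] - (1.0 if i == j else 0.0))
                    for i in range(len(gram)) for j in range(len(gram)))
    print(f"max deviation from identity: {deviation:.2e}")
```

Running the iteration per head or per expert keeps each matrix small, which is presumably part of why the post describes that granularity as natural; the distributed gather-and-scatter described in the post would wrap around a routine like this one.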

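Berryxia.AI@berryxia · May 19 · 62

Whoa, this model really has something! After watching it I just want to know when I can get my hands on it! Odyssey's AI lab just dropped something truly eye-catching: Starchild-1. It is the world's first real-time multimodal world model. It does not just generate visuals; it generates real-world sound at the same time. In the video you see a complete scene: the picture moves, the sound plays in sync, and sight and hearing merge completely, like a simulated world come to life. Earlier world models could mostly only "see" the world; Starchild-1 has now learned to "hear". This is not just another video generation tool. The larger significance is that it is another key step toward a general world model, the next step in truly understanding and simulating the physical world. The Odyssey team says it is using this new form of multimodal intelligence to redefine how AI perceives reality.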

🚨 AI News | TestingCatalog@testingcatalog · May 19 · 68

GOOGLE I/O 🔥: These legends are AI-generated via an upcoming Gemini Omni model. > Both videos are 8s HD samples. > The video with Sundar and Demis is likely generated as an image-to-video using Omni for style editing. > Logan's video is likely a "Likeness" Avatar and Omni video. And "GEMINI" means a new model release! 🤯

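karminski-牙医@karminski3 · May 19 · 59

The ultimate "stitched-together model" has arrived: ByteDance Lance! ByteDance just released an open-source model, Lance, with only 3B active parameters. Yet it accepts text, image, and video input and can output text, images, and video as well! So this one model can handle image understanding, video understanding, text-to-image, image-to-image, image editing, text-to-video, image-to-video, video editing, and similar tasks.

The training team reveals in the technical report that training took only 128 A100 GPUs (by big-company standards, that is purely putting spare compute to use).

So why call it a "stitched-together model"? Because the team did not build everything from scratch. The visual input module directly uses Qwen2.5-VL-ViT (to look at images and video), and the visual output module is Wan2.2_VAE (to draw). The model itself comes in two variants:
Lance_3B (for image understanding, generation, or editing tasks)
Lance_3B_Video (for video tasks such as text-to-video and image-to-video)

So this is very much a research project, and its highlight is precisely that it is "stitched well". Unlike many earlier self-described all-in-one models, it does not hard-wire a large language model (LLM) to a diffusion model (the so-called pipeline approach). Instead, it processes text, image, and video context together in one shared interleaved sequence. The biggest benefit is a unified semantic space, which improves the model's understanding and performance (judging by the evals, at 3B it approaches many 10B and even 20B models).

It also introduces multi-task synergy. Put simply, understanding tasks (image to vector) and generation tasks (vector to image) inherently conflict inside a model. Lance adds dedicated expert modules within the same framework, which eases that conflict and lets the model do both VQA (visual question answering) and image/video generation and editing.

Looking forward to real-world applications. This model matters a lot for on-device use and multimodal agents: many scenarios that used to require several cooperating models can now be handled by one. #lance #全模态模型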

Chubby♨️@kimmonismus · May 19 · 71

Huge, did NOT expect that release. Evals look very solid, a significant jump compared to Composer 2! But: it's 10x more efficient than the competition. Looks really exciting. Need to try it out

Chubby♨️@kimmonismus · May 19 · 62

Intelligence too cheap to meter. This is the real deal. Composer 2.5 is an efficiency beast

Rohan Paul@rohanpaul_ai · May 19 · 64

Can a smaller model purpose-built for one domain beat a frontier general model that's 100× its size? A recent paper showed yes, and not by a small margin. Raven 3.5 from PolyAI shows that a smaller specialist model can beat bigger general models on customer service calls. It beats GPT-5 and Claude Sonnet 4.6 on all 4 customer service benchmarks while staying under 300ms latency. This is one of the live debates in ML. Every researcher is asking this question. The paper is the empirical answer. PolyAI's research team published “Raven 3.5: The post-training recipe that beats GPT-5 for customer service”.

Voice agents are moving from call-center software into everyday product infrastructure. PolyAI’s launch targets the gap between website traffic and real customer conversations. The pitch: make every website capable of answering out loud. PolyAI helps enterprises fix slow phone support, long wait times, costly contact centers, robotic IVRs, and missed revenue from abandoned calls. Its voice agents handle customer conversations 24/7 across voice, chat, SMS, and social channels in 45+ languages. The result is faster support, lower operating cost, more consistent answers, and better customer experience at enterprise scale.

📞 PolyAI is launching 2 new voice AI products: ADK, a code-first Agent Development Kit for building production voice agents from your own IDE, and PolyPhone, which turns any website into a live voice AI agent in about 10 minutes. ADK connects directly into Agent Studio, so developers can build, manage, and deploy agents from the terminal. PolyPhone reads a website, understands things like FAQs and product details, then creates a voice agent that can be embedded on any webpage without needing telephony setup.

The bigger point: enterprise voice AI is moving from “contact center project” to “something teams can build and ship much faster.” 🧵