Iliad (Troy) trailer made by Grok Imagine 1.5, which was just released


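Berryxia.AI @berryxia · June 4 · 70

Whoa! I just noticed that Apple's MLX framework now gets Day-0 model releases. It looks like the work happens in parallel, with the MLX team coordinating directly with model vendors from the start. If you are on a Mac, I strongly recommend running the MLX builds of models; they are typically at least 10-20% faster.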

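Berryxia.AI @berryxia · June 4 · 67

While most people still treat audio AI as a fringe add-on to vision and text, an open-source model has unified speech, music, and environmental sound in a single model and beaten every closed-source solution. It is worth trying for real; it looks genuinely good. Anyone building a local audio agent wants the AI not only to understand speech but also to distinguish background music and ambient sound effects, and even to edit podcasts automatically. Until now, the options were either closed-source and absurdly expensive, or two separate systems for speech and music that were a mess to chain together. Today MOSS-Audio removes that pain point.

The model, from the OpenMOSS team, just reached #1 on Hugging Face Trending. It brings Speech, Sound, and Music under unified audio-language modeling: give it a conversation with background music and it can transcribe the speech, identify the ambient sound, and read the mood of the music at the same time, then generate a text description or carry out downstream tasks directly. This is not a matter of piling up data; the audio domains are connected at the architecture level. It is open source and commercially usable, the code is published on Hugging Face and GitHub, and ordinary developers can pull it and run it locally today.

This flips the industry's prevailing view: the next piece of the puzzle on the way to superintelligence is not more vision-plus-text, but letting AI perceive the world of sound the way people do. Audio was never an accessory; it will be a sensory entry point as important as text. Whoever gets this working first gains the head start on next-generation agents. We used to assume audio AI would have to wait for the big closed-source labs to iterate slowly; now the open-source community has delivered the three-in-one problem of speech, sound, and music in a single model, and leads on both speed and openness.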

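Summary: The OpenMOSS team released MOSS-Audio, an open-source audio-language model that combines speech, environmental sound, and music, and it has reached #1 on Hugging Face Trending. The model connects the three audio domains at the architecture level and can transcribe dialogue, identify background sound, and interpret musical mood at the same time, then generate text or perform downstream tasks. It is fully open source and commercially usable; code and weights are published on Hugging Face and GitHub, and developers can run it locally.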

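小互 @xiaohu · June 4 · 71

Google releases Gemma 4 12B, an open model that runs omni-modal AI on a 16GB laptop.

Gemma 4 12B uses an encoder-free architecture called "Unified", in which text, image, audio, and video inputs all go directly into the same Transformer backbone. The model processes raw images and audio directly.

An analogy makes this clear. A traditional multimodal model handles images and audio like a boss who reads only Chinese and employs two translators: an English translator (the vision encoder) and a Japanese translator (the audio encoder). Whenever English or Japanese material arrives, a translator must first convert it to Chinese before the boss can read it. The translators take up desks (VRAM), translation means waiting in a queue (latency), and what the boss receives is the translators' processed version, not the original. What Gemma 4 12B does is let both translators go and have the boss learn to read English and Japanese directly.

A few key numbers: it runs in 16GB of VRAM or unified memory, and 4-bit quantization brings that down to 8GB, with the goal of running locally on an ordinary laptop; it has a 256K-token context window and supports 140+ languages; and it has a built-in Thinking mode (step-by-step reasoning) and native Function Calling.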

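Summary: Google released the open model Gemma 4 12B, which uses the encoder-free Unified architecture to process text, images, audio, and video directly, without separate encoders. It runs in 16GB of VRAM, or as little as 8GB with 4-bit quantization. It supports a 256K-token context and 140+ languages, with built-in Thinking mode and Function Calling.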
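The memory figures quoted above can be sanity-checked with back-of-the-envelope arithmetic. The sketch below estimates the memory needed for the weights alone at a few bit widths. It is an approximation under stated assumptions: it ignores the KV cache, activations, and runtime overhead, and the posts do not say which precision the 16GB figure refers to.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes).

    Assumption: every parameter is stored at the same bit width, and
    nothing else (KV cache, activations, runtime overhead) is counted.
    """
    return num_params * bits_per_weight / 8 / 1e9


PARAMS = 12e9  # 12B parameters, as stated in the posts

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(PARAMS, bits):.0f} GB")
```

At 4 bits the weights come to roughly 6 GB, which is consistent with the quoted 8GB figure once some headroom is added. At 16 bits the weights alone would be about 24 GB, so the 16GB figure presumably implies some reduced precision; that reading is an inference, not something the posts state.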

MiniMax (official) @MiniMax_AI · June 4 · 77

15.6× faster decoding at 1M tokens 🔥 Thanks @FireworksAI_HQ for powering the inference behind M3. Try it now 👇


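Berryxia.AI @berryxia · June 4 · 69

Google released Gemma 4 12B, a multimodal large model, last night; it needs at least 16GB of memory to run. It should be compared against the Qwen models to see how it performs.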

DogeDesigner @cb_doge · June 4 · 70

SpaceXAI keeps raising the bar. 🔥 Grok Imagine Video 1.5 preview is now live on the API, and the results look insanely cinematic. 📽️ Go try it yourself. 💻 Godspeed SpaceXAI. 🚀


MiniMax (official) @MiniMax_AI · June 4 · 78

Mem0 is an official launch partner for MiniMax M3! M3's 1M token context window + @mem0ai 's memory layer = AI apps that truly remember. Build personalized AI agents with persistent memory, now with 50% off M3 during launch week. Get started with Minimax → https://platform.minimax.io/docs/guides/models-intro Sign up with mem0 → http://app.mem0.ai/?utm_source=minimax_x_post


Greg Brockman @gdb · June 4 · 71

Major upgrade to GPT-Rosalind, with much better intelligence for drug discovery, analysis, design, and experimental workflows:


🚨 AI News | TestingCatalog @testingcatalog · June 4 · 53

Reve 2.0 is now available, and it landed in second place in the text-to-image arena, outranking Nano Banana 2. > We invented a new way to generate and edit any image using precise layouts. For the first time, it’s possible to create images you can touch. > Images are represented as code, so every part of an image becomes addressable, editable, and manipulable. > Every image in Reve is segmented and labeled, giving you precise control over every region and element.

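Summary: The new model Reve 2.0 is live and ranks second in the Text-to-Image arena, ahead of Nano Banana 2 and GPT-Image-1.5. It uses a new approach to image generation and editing based on precise layouts that makes images interactive: images are represented as code, so every region can be addressed, edited, and manipulated; images are automatically segmented and labeled, giving users fine-grained control over each element.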

OpenAI @OpenAI · June 4 · 67

We’re bringing new capabilities to GPT-Rosalind, a model series purpose-built for life sciences research at enterprise scale. It brings GPT-5.5’s agentic coding and tool use together with stronger intelligence for drug discovery, analysis, design, and experimental workflows. https://openai.com/index/introducing-new-capabilities-to-gpt-rosalind


fofr @fofrAI · June 4 · 61

Ideogram v4 > a scan of a page from my high school A3 art pad, highly original niche pencil piece working on the aura of unusual cross sections and fluidity of otherwise solid surfaces in human portraiture with offset recursion, not anatomical, the cross sections reveal something else, very detailed and complex, no other anatomy, no embellishments, no pencil shavings, no tea stains, clean white paper

Summary: Ideogram v4 performs well and has open weights. Images are sharp and feel fresh.

MiniMax (official) @MiniMax_AI · June 4 · 65

@mem0ai is an official launch partner for MiniMax M3! M3's 1M token context window + @mem0ai 's memory layer = AI apps that truly remember. Build personalized AI agents with persistent memory, now with 50% off M3 during launch week. Get started with Minimax → https://platform.minimax.io/docs/guides/models-intro Sign up with mem0 → http://app.mem0.ai/?utm_source=minimax_x_post


Sundar Pichai @sundarpichai · June 4 · 73

Our new Gemma 4 12B model hits a sweet spot between size + performance: it can run locally on a laptop, while enabling powerful multi-step reasoning and agentic workflows. Can’t wait to see what the community does with this one!

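Summary: Cumulative downloads of the Gemma 4 series have passed 150 million, and Google has added a new member, Gemma 4 12B. With 12B parameters, the model runs locally on a laptop with 16GB of VRAM, balances size and performance, and supports multi-step reasoning and agentic workflows. It is released under the Apache 2.0 license for community use.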

fofr @fofrAI · June 4 · 69

Ideogram v4 is really good, and open weights. Images are crisp and feel fresh.


Chubby♨️ @kimmonismus · June 4 · 71

Gemma 4 12B shipped today under the label "encoder-free." A local 12b model that shows really good results. I'm a big fan of Gemma Gemma 4 12B is out: a dense, fully open model (Apache 2.0) that runs on a 16GB laptop and does agentic reasoning, vision and audio at a quality Google puts near its 26B model. The reason a 12B can pull this off: Google removed the separate vision and audio encoders and feeds both straight into the model, which keeps the memory footprint small enough for consumer GPUs. For on-device assistants and private coding agents, that lowers the bar a lot. always look forward to the updates. 12b is a good sweet spot in terms of size. a few facts: Vision: the 550M encoder (27 transformer layers) is now a 35M embedder, one matmul on 48x48 pixel patches. Roughly 15x smaller. Audio: the 300M encoder (12 conformer layers) is gone. Raw 16kHz audio cut into 40ms frames, projected straight into the LLM. So encoding didn't vanish, it collapsed into the backbone. The payoff is real: one shared set of weights, so you LoRA-tune vision, audio and text in a single pass.

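Summary: Google open-sourced Gemma 4 12B (dense, Apache 2.0 license) with a new encoder-free architecture that removes the separate vision encoder (550M parameters, 27 Transformer layers) and audio encoder (300M parameters, 12 Conformer layers). Vision becomes a 35M embedding layer (about 15x smaller), and audio is projected directly into the LLM in 40ms frames. On a laptop with 16GB of VRAM the model runs agentic reasoning, vision, and audio tasks at a quality close to the 26B model. Shared weights allow a single LoRA tuning pass to cover vision, audio, and text.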
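To make the numbers in this post concrete, here is a minimal sketch of what "raw 16kHz audio cut into 40ms frames, projected straight into the LLM" and "one matmul on 48x48 pixel patches" could look like. It is an illustration, not Google's implementation: the non-overlapping framing, the RGB channel count, the toy hidden size `D_MODEL`, and the constant weights are all assumptions made for the example.

```python
SAMPLE_RATE_HZ = 16_000  # from the post: raw 16kHz audio
FRAME_MS = 40            # from the post: 40ms frames
PATCH_SIDE = 48          # from the post: 48x48 pixel patches
CHANNELS = 3             # assumption: RGB input
D_MODEL = 4              # assumption: toy hidden size; the real one is not stated


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of raw samples in one frame."""
    return sample_rate_hz * frame_ms // 1000


def frame_audio(samples, frame_len):
    """Split samples into non-overlapping frames, dropping a trailing partial frame."""
    count = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(count)]


def project(vector, weights):
    """A single matmul: vector (d_in) times weights (d_in x d_model)."""
    d_model = len(weights[0])
    return [sum(vector[i] * weights[i][j] for i in range(len(vector)))
            for j in range(d_model)]


frame_len = samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS)  # samples per audio token
patch_values = PATCH_SIDE * PATCH_SIDE * CHANNELS        # raw values per image patch

audio = [0.0] * SAMPLE_RATE_HZ                           # one second of silence
frames = frame_audio(audio, frame_len)

# Hypothetical projection weights; a trained model would learn these.
weights = [[0.01] * D_MODEL for _ in range(frame_len)]
tokens = [project(frame, weights) for frame in frames]

print(frame_len, len(frames), patch_values, len(tokens[0]))
```

Under these assumptions each audio frame holds 640 samples, one second of audio yields 25 frames, and each image patch flattens to 6,912 values before its single projection.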

DogeDesigner @cb_doge · June 4 · 78

SpaceXAI is cooking.

Summary: The Grok Imagine 1.5 preview has been released and is available in the API starting today. SpaceXAI is picking up momentum.

Demis Hassabis @demishassabis · June 4 · 74

Celebrating the milestone of a massive 150+ million downloads of Gemma 4 with the release of the new Gemma 4 12B model! It's incredibly powerful for such a small model and it’s tiny enough to run locally on a laptop with just 16GB VRAM. Apache 2.0 license - happy building!

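Summary: Demis Hassabis announced that Gemma 4 downloads have passed 150 million and released the new Gemma 4 12B model. It is a unified, encoder-free multimodal model that combines edge efficiency with advanced reasoning. Despite having only 12B parameters, it performs strongly and is small enough to run locally on a laptop with just 16GB of VRAM. It uses the Apache 2.0 license, so developers can build freely.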

AYi @AYi_AInotes · June 4 · 70

The best open-source image model in the world, second only to GPT-image-2 and Nanobanana2

Artificial Analysis @ArtificialAnlys · June 4 · 71

Jensen Huang’s keynote at Computex used Artificial Analysis benchmarks to communicate the performance of Nemotron 3 Ultra. Jensen used our Artificial Analysis Intelligence Index vs. Output Speed chart to communicate the performance of NVIDIA’s new Nemotron 3 Ultra model. The presentation also highlighted GDPval-AA, Artificial Analysis' benchmark that uses OpenAI's GDPval dataset to evaluate models on economically valuable tasks. NVIDIA additionally highlighted Artificial Analysis Text to Image and Image to Video Arena Elos to promote the NVIDIA Cosmos 3 model family. Congratulations @NVIDIAAI on the launches!

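Summary: In his Computex keynote, Jensen Huang used Artificial Analysis's Intelligence Index vs. Output Speed chart to present the performance of NVIDIA's new Nemotron 3 Ultra model. The keynote also mentioned GDPval-AA, Artificial Analysis's benchmark that uses OpenAI's GDPval dataset to evaluate models on economically valuable tasks. NVIDIA also used Artificial Analysis's Text to Image and Image to Video Arena Elo scores to promote the Cosmos 3 model family.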

Krea @krea_ai · June 4 · 74

introducing Ideogram v4.0. 2k native resolution, excellent text rendering, and support for JSON prompts. try it now in Krea.


elvis @omarsar0 · June 4 · 76

Another banger open-source release. Miso One is an 8B text-to-speech model with real emotional range, so voiceovers carry warmth, hesitation, and excitement instead of sounding flat. It's purpose-built for voiceover work like shorts, podcasts, and educational content, and it runs at 110ms latency, which is faster than human reaction time. The best part is that the weights are fully open source, so you can clone the repo, self-host, fine-tune, and keep your data private. Worth checking out if you're building voice into your tools and products: http://github.com/MisoLabsAI/MisoTTS

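Summary: Miso Labs open-sourced Miso One, an 8B-parameter text-to-speech model focused on emotionally expressive speech such as warmth, hesitation, or excitement rather than a flat, mechanical voice. It is designed for voiceover work such as short videos, podcasts, and educational content, with an inference latency of 110 ms, faster than human reaction time. The weights are fully open source and support self-hosting, fine-tuning, and keeping data private; API access is coming soon.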

🚨 AI News | TestingCatalog @testingcatalog · June 4 · 74

Ideogram announced Ideogram 4.0, a new SOTA open image generation model! > Ideogram 4.0 lands in the 8th spot on LM Arena and the 5th spot on Design Arena in the text-to-image category, and is getting close to Nano Banana Pro's performance. > Ideogram 4.0 features dense, accurate text rendering, native 2K resolution, active background transparency, and precise layout control.

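Summary: The open image generation model Ideogram 4.0 has been released. It ranks 8th in the text-to-image category on LM Arena and 5th on Design Arena with a score of 1204, making it the highest-ranked open model in the category, with performance close to Nano Banana Pro. Key features include dense, accurate text rendering, native 2K resolution, active background transparency, and precise layout control.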

Chubby♨️ @kimmonismus · June 4 · 75

Miso One is live: an open-weights voice model built to sound like a real person reading, with actual warmth and pacing where most TTS still goes flat. 8B params, free on GitHub, with one-shot voice cloning from a short sample at 110ms latency. Self-host it and your audio data never leaves your machine. No API needed, no lock-in. Type any line into the demo and hear it before you clone the repo.

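Summary: Miso One has launched: an 8B-parameter open-weights voice (TTS) model designed to reproduce the warmth and pacing of a real person reading. It supports one-shot voice cloning from a short sample, with an inference latency of 110 ms. The weights are available on GitHub and can be self-hosted without an API, so audio data stays local. API access is coming soon. A demo is live, so you can listen before cloning the repo.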

🚨 AI News | TestingCatalog @testingcatalog · June 4 · 65

GOOGLE 🔥: A new Gemma 4 12B is now available on Huggingface under Apache 2.0 license! > Built with the same multimodal functionality as Gemma 4 E2B and E4B (text, audio, image, and video inputs), it brings native audio and vision understanding directly to local environments without the need for separate encoders. > This unified approach to multimodality makes the model encoder-free, offering a deployment size that is perfect for consumer devices and streamlined local execution.

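Summary: Google's latest Gemma 4 12B model is available on Hugging Face under the Apache 2.0 license. It shares the multimodal capabilities of Gemma 4 E2B/E4B, accepts text, audio, image, and video inputs, and provides native audio and vision understanding without separate encoders. The unified, encoder-free design gives it a smaller deployment size, well suited to consumer devices and local execution. Google says it aims to bridge the gap between edge efficiency and advanced reasoning.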

Google AI Developers @googleaidevs · June 4 · 77

We’re launching Gemma 4 12B: Our unified, encoder-free model that brings powerful multimodal intelligence straight to your laptop 🚀 The model bridges the gap between our mobile E4B model and larger 26B MoE models, packaging frontier-class reasoning and native audio into a highly optimized footprint, all under a permissive Apache 2.0 license. Here’s what makes it unique: + Encoder-Less Architecture: We removed the multimodal encoders. The vision and audio inputs flow directly into the LLM backbone. + Agentic Performance (16GB VRAM): Run complex, multi-step workflows locally, with performance nearing our 26B model.

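Summary: Google released Gemma 4 12B, a unified, encoder-free multimodal model that feeds vision and audio inputs directly into the LLM backbone without conventional multimodal encoders. The model fills the gap between the mobile E4B model and the 26B MoE model, packages frontier-class reasoning and native audio, and uses the Apache 2.0 license. With 16GB of VRAM it can run complex multi-step agentic workflows locally, with performance close to the 26B model.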

SenseTime @SenseTime_AI · June 3 · 73

A plain sneaker image went in. Marketing visuals came out. #SenseNova U1 — see, think, create — all in one model. #OpenSourced. This is the architecture shift people keep talking about. Shoutout @AiLockup for the demo 🔥 🎥Watch the video: https://youtu.be/9IFgPqMWBGg Try it today: 🎛️ SenseNova Studio: https://unify.light-ai.top/ (Try infographics; also join Discord for text-image interleaved gen) 🤗 https://huggingface.co/collections/sensenova/sensenova-u1 🛠️ https://github.com/OpenSenseNova/SenseNova-U1 👾 Discord: https://discord.com/invite/BuTXPHmQub @huggingface @github

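Summary: SenseTime open-sourced the SenseNova U1 model, which it describes as combining "see, think, create" in one model: a plain sneaker image goes in and marketing visuals come out. SenseTime presents the model as an architectural shift. It can be tried through SenseNova Studio, Hugging Face, and GitHub.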

Alibaba Cloud @alibaba_cloud · June 3 · 71

Qwen: Foundation Models for the Agent Era with Steven Hoi, Head of Multimodal Interaction, Tongyi Large Model BU Qwen3.7 delivers major breakthroughs in reasoning, fully upgrading native agentic capabilities across tool use, coding, and long-horizon tasks.


Satya Nadella @satyanadella · June 3 · 82

With the new MAI models and Frontier Tuning capabilities we announced today, we're focused on helping every company move from just consuming a frontier model to fully participating at the frontier.


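Berryxia.AI @berryxia · June 3 · 74

The old tree is blooming again: big brother Microsoft released new models today 😄 Making its presence felt, haha, otherwise nobody would remember it.

Microsoft AI dropped seven new MAI models today. The official line: this is not a simple iteration but an entire family built from scratch, with clean data lineage and zero distillation in training. MAI-Thinking-1 handles reasoning, MAI-Code-1-Flash coding, MAI-Image-2.5 images, MAI-Transcribe-1.5 transcription, and MAI-Voice-2 speech, plus their respective Flash versions.

The standout is MAI-Code-1-Flash, which scores 71.6 on SWE-Bench Verified, 5 points above Claude Haiku 4.5, and 16 points higher on the Pro leaderboard, while using 60% fewer tokens; it is already rolling out gradually in Copilot. MAI-Image-2.5 ranks second in Arena image editing and third in text-to-image, preserves faces, logos, and details precisely, and has already been built into PowerPoint and OneDrive. MAI-Transcribe-1.5 takes first place on both accuracy and speed across 43 languages and handles an hour of audio in 15 seconds. MAI-Voice-2 can control emotion, supports multilingual code-switching, and keeps speaker identity stable over long content.

The models are not meant to work in isolation; they are designed as a family that collaborates seamlessly. This time Microsoft did not go for "one big model that does everything". It split up the tasks, trained each model from scratch on clean data, and published all the technical details and lessons learned. That reverses the industry's dominant path: everyone else is competing on parameter count and on distilling other vendors' outputs, while Microsoft argues that long-term competitiveness comes from a model family that is built from scratch, has a clean lineage, specializes by task, and works together. How it performs in practice still needs to be tested. Looking forward to seeing the real results!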

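Summary: At its Build conference, Microsoft announced a family of seven new MAI models. The family was trained from scratch with a "clean data lineage" and is designed for task specialization and seamless collaboration. MAI-Code-1-Flash scores 71.6 on SWE-Bench Verified, 5 points above Claude Haiku 4.5, and uses 60% fewer tokens. MAI-Transcribe-1.5 processes an hour of audio in 15 seconds and leads on speed and accuracy across 43 languages. With this release, Microsoft presents its approach of building specialized, cooperating models from scratch.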

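Berryxia.AI @berryxia · June 3 · 64

Microsoft's new MAI-Image-2.5 model took second place in image editing. So GPT-Image-2 is still the strongest, in first place! Even Google's Nano Banana model has now been overtaken by Microsoft's MAI... Can big brother Google come up with something new? My Pro membership is about to expire...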

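Summary: Microsoft released the new MAI-Image-2.5 model, which took second place in the Image Edit Arena (single-image editing) with a score of 1401. According to the evaluation data, it scores 10 points above Nano Banana 2, Grok Imagine Image Quality, and ChatGPT-Image-Latest-High Fidelity. Despite the progress, GPT-Image-2 remains in first place. Source: X user @berryxia.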

meng shao @shao__meng · June 3 · 72

Microsoft Build released 7 models in one go! Microsoft, I'll trust you one last time (1)(1)(1)(1)(1)(1)(1) 😄


MiniMax (official) @MiniMax_AI · June 3 · 74

We wrapped a live session on M3 yesterday with the @togethercompute team & our researchers @zpysky1125 and @HaohaiSun A few highlights 🧵 1. MSA (MiniMax Sparse Attention) is the star ⭐️. Unlike CSA/HCA, which compress the KV cache, MSA keeps the real, uncompressed KV and does block-level selection with a small top-K. That's how the 1M context window stays tractable. 2. The efficiency win is huge. In our previous generation, ~30% of per-decode wall-clock time went to the attention kernel. With MSA that now drops to ~5%. Big gains for long-context generation. 3. M3 isn't just a coding model. Natively multimodal (image + video in), ability to handle long-horizon agentic tasks, and even operate a desktop computer. People are already throwing game-dev + Minecraft-style builds at it (Unity included) and it's holding its own. 4. M3 can self-evaluate on vision-coding tasks: it builds a website or SVG, browses and inspects its own rendered output, judges it, and iterates - grading work visually. 5. We're also seeing junior-analyst-level performance on finance tasks; something we haven't even showcased publicly yet. 6. What's next: harder long-horizon / multi-file tasks in future releases, scaling data + post-training (RL) compute toward pre-training scale, and going deeper into finance, legal & bio. Thanks to everyone who joined 🙏 Try M3 link in the comments👇

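Summary: MiniMax shared key details about the M3 model in a live session. Its MSA technique uses block-level top-K selection and keeps the real, uncompressed KV cache, which keeps the 1M-token context window tractable. MSA cuts the attention kernel's share of per-decode wall-clock time from about 30% to about 5%, a large efficiency gain for long-context generation. M3 is natively multimodal, accepts image and video input, handles long-horizon agentic tasks and desktop operation, and can visually evaluate and iterate on its own output. It shows junior-analyst-level performance on finance tasks. Future releases will focus on harder long-horizon tasks and go deeper into finance, legal, and bio. Together AI provides inference for it.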
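The idea of "block-level selection with a small top-K over uncompressed KV" can be illustrated with a toy sketch. This is not MiniMax's MSA implementation, whose details are not given in the post. In particular, scoring each block by the dot product of the query with the block's mean key is an assumption chosen for simplicity, and the function name `block_topk_attention` is hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def block_topk_attention(query, keys, values, block_size, top_k):
    """Toy block-sparse attention for a single query vector.

    Keys and values are kept uncompressed; only the top_k highest-scoring
    blocks take part in the softmax.
    """
    n = len(keys)
    blocks = [range(start, min(start + block_size, n))
              for start in range(0, n, block_size)]

    # Assumption: score a block by the query's dot product with its mean key.
    def block_score(block):
        mean_key = [sum(keys[i][d] for i in block) / len(block)
                    for d in range(len(query))]
        return dot(query, mean_key)

    ranked = sorted(range(len(blocks)),
                    key=lambda b: block_score(blocks[b]), reverse=True)
    selected = [i for b in sorted(ranked[:top_k]) for i in blocks[b]]

    # Standard scaled dot-product attention, restricted to the selected tokens.
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[i]) * scale for i in selected]
    peak = max(logits)
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    output = [sum(w * values[i][d] for w, i in zip(weights, selected)) / total
              for d in range(len(values[0]))]
    return output, selected


# Eight cached tokens in blocks of two; only tokens 4 and 5 align with the query.
keys = [[-1.0, 0.0]] * 4 + [[1.0, 0.0]] * 2 + [[-1.0, 0.0]] * 2
values = [[float(i)] for i in range(8)]
output, selected = block_topk_attention([1.0, 0.0], keys, values,
                                        block_size=2, top_k=1)
print(selected, output)
```

With `top_k=1` the sketch attends to a single block of two tokens instead of all eight, which is the kind of saving that makes a very long context cheaper to decode: the attention cost per step scales with the number of selected tokens rather than with the full cache length.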

MiniMax (official) @MiniMax_AI · June 3 · 80

MiniMax-M3 #6 overall on @ValsAI the new open-weight SOTA 🚀


Rohan Paul @rohanpaul_ai · June 3 · 81

Microsoft unveiled MAI-Thinking-1. So Microsoft now has a full in-house pipeline for building stronger reasoning models again and again. Microsoft calls this system a “hill-climbing machine,” meaning it keeps improving the data, training setup, rewards, safety tests, and evaluations as one connected process. Strong for its size, including 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro. MAI-Thinking-1 is the first model from that process, using 35B active parameters inside a 1T total parameter mixture-of-experts model, where only part of the model runs for each token. The base model was trained from scratch on 30T mostly human-generated tokens, with Microsoft saying it avoided third-party model distillation during pre-training. After that, the team used reinforcement learning, which means the model practiced tasks and improved from feedback, to teach math reasoning, coding, tool use, helpfulness, and safety.

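Summary: Microsoft released MAI-Thinking-1, an MoE model with 35B active parameters and 1T total parameters. It was pretrained from scratch on 30T tokens without third-party model distillation. Microsoft calls its iterative improvement process a "hill-climbing machine". On benchmarks, the model scores 97.0% on AIME 2025, 87.7% on LiveCodeBench v6, and 52.8% on SWE-Bench Pro.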

Chubby♨️ @kimmonismus · June 3 · 63

Mai-1 thinking: mid-size model, 45b active parameters, MoE, side by side with Sonnet 4.6, 0 distillation. "Microsoft's first reasoning model"


Artificial Analysis @ArtificialAnlys · June 3 · 64

Microsoft has released MAI-Transcribe-1.5: an exceptionally fast speech transcription model at a speed factor of ~276x, while still achieving 2.4% on AA-WER (#3), leading the accuracy-speed Pareto frontier. MAI-Transcribe-1.5 is Microsoft AI (MAI)’s latest speech transcription model, coming in at 3rd overall on the Artificial Analysis Word Error Rate (AA-WER) leaderboard, behind Alibaba’s Fun-Realtime-ASR-preview (1.7% WER), and ElevenLabs Scribe v2 (2.2% WER). The model stands out as the fastest STT model in the top 10 for accuracy, processing audio at ~276x real-time - this is more than double the speed of the second fastest model in the top 10 for accuracy. The new model supports keyword biasing (improved recognition of rarer vocabulary such as names and medical terminology), in addition to support for 43 languages including English, French, Arabic, Japanese, and Chinese. See more details below ⬇️

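Summary: Microsoft AI released the MAI-Transcribe-1.5 speech transcription model. It ranks third on the AA-WER leaderboard with a word error rate (WER) of 2.4%, behind Alibaba's Fun-Realtime-ASR-preview (1.7%) and ElevenLabs Scribe v2 (2.2%). Its main strength is speed: it processes audio at about 276x real time, more than twice as fast as the second-fastest model in the accuracy top 10, which puts it at the front of the accuracy-speed Pareto frontier. The model also supports keyword biasing and covers 43 languages, including English, French, Arabic, Japanese, and Chinese.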
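For readers unfamiliar with the two metrics in this post, the sketch below shows the textbook definition of word error rate (word-level edit distance divided by reference length) and what a speed factor means in wall-clock terms. How Artificial Analysis computes AA-WER, including any text normalization, is not described in the post, so this is the generic formula and not their method; the example sentences are made up.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))       # distances against an empty reference
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                            # deletion
                           cur[j - 1] + 1,                         # insertion
                           prev[j - 1] + (ref_word != hyp_word)))  # substitution
        prev = cur
    return prev[-1] / len(ref)


def processing_seconds(audio_seconds: float, speed_factor: float) -> float:
    """Wall-clock time to transcribe audio at a given multiple of real time."""
    return audio_seconds / speed_factor


wer = word_error_rate("the model supports forty three languages",
                      "the model supports forty languages")
hour_at_276x = processing_seconds(3600, 276)
print(f"WER: {wer:.3f}, one hour of audio at 276x: {hour_at_276x:.1f} s")
```

At a speed factor of about 276x, one hour of audio takes roughly 13 seconds, which is in the same range as the "one hour of audio in 15 seconds" figure quoted in another post in this feed.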

🚨 AI News | TestingCatalog @testingcatalog · June 3 · 70

MICROSOFT 🔥: New MAI Code 1 Flash and MAI Thinking 1 models have been revealed on the official MAI website! Also, MAI Image 2.5, MAI Voice 2, and MAI Transcribe 1.5 are there too. > MAI-Code-1-Flash plans and reasons through complex coding tasks from start to finish, so you spend less time debugging and more time building. > MAI-Thinking-1 (35B active, ~1T total parameters, MoE) has a smaller inference footprint than much larger models, yet is competitive with Claude Opus 4.6 on SWE-Bench Pro. h/t @MeetPatelTech

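Summary: Microsoft updated the MAI model lineup on its official website, featuring MAI Code 1 Flash and MAI Thinking 1. MAI Thinking 1 has 35B active and about 1T total parameters in an MoE architecture; its inference footprint is smaller than that of much larger models, yet it is competitive with Claude Opus 4.6 on SWE-Bench Pro. MAI Code 1 Flash focuses on planning and reasoning through complex coding tasks end to end. MAI Image 2.5, MAI Voice 2, and MAI Transcribe 1.5 are listed as well.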

Artificial Analysis @ArtificialAnlys · June 3 · 62

Krea 2 Medium debuts at #6 on the Artificial Analysis Text to Image Leaderboard, trailing only models from OpenAI, Google, and NVIDIA! Krea 2 is @krea_ai's first image model family trained entirely from scratch (Krea 1 was developed in collaboration with Black Forest Labs). Krea 2 is available in two variants: Krea 2 Medium, and Krea 2 Large, which is more comparable to FLUX.2 [pro] in our arena. Notably, Krea 2 Medium outranks the larger, more expensive Krea 2 Large in our arena. Krea describes Medium as smaller and faster, with extensive post-training that makes its outputs especially stable and consistent across generations. While Large is positioned as the more capable model, our leaderboard results align with Krea's view that Medium "handles the broadest range of use cases reliably." Both models generate at 1K resolution and share a distinct set of generation controls via the API: ➤ Style transfer: Krea can extract the style of up to 10 reference images, with each image being able to be weighted in terms of importance ➤ Creativity Setting: A configurable API parameter (raw, low, medium, high) that sets how closely the model follows the prompt versus reinterpreting it ➤ Moodboards: A collection of images that can be collected in the application to apply a style transfer onto the image (separate from individual style reference images) At $30 per 1k images via Krea's API, Krea 2 Medium is priced below comparable models such as Nano Banana Pro at $134/1k images or grok-imagine-image-quality at $50/1k images. Krea 2 Large is priced at $60 per 1k images, and both models' prices increase with the use of the Style Transfer and Moodboard features. Both models are available in the Krea app, via Krea's API, and on official third-party launch partners. Congratulations to @krea_ai on the launch! See below for comparisons between Krea 2 and other leading models in our Artificial Analysis Image Arena 🧵

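Summary: Krea 2 Medium, a text-to-image model trained in-house by Krea AI, ranks #6 on the Artificial Analysis leaderboard, behind only models from OpenAI, Google, and NVIDIA. Notably, the smaller, faster Medium variant outranks the Large variant, which is positioned as more capable. Both models generate 1K-resolution images and support style transfer and creativity controls through the API. Krea 2 Medium is priced at $30 per 1,000 images and Krea 2 Large at $60 per 1,000 images.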

StepFun @StepFun_ai · June 2 · 73

Open weights are moving from model cards into real coding workflows. Step 3.7 Flash is designed for fast agentic coding, reliable tool calling, and multimodal understanding. Big thanks for the blog from the @kilocode team: https://blog.kilo.ai/p/new-models-from-stepfun-and-minimax

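Summary: StepFun released the Step 3.7 Flash model, designed for fast agentic coding with reliable tool calling and multimodal understanding. The model has open weights. MiniMax open-sourced its M3 model in the same period, and both are now available in Kilo. The release reflects the trend of open-weight models moving from model cards into real coding workflows.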