Artificial Analysis@ArtificialAnlys · Jun 5 · 30

Open weights are reshaping coding and agentic workloads, and we’re excited to continue this conversation in person. Join us alongside @nvidia , @AWSstartups , @MiniMax_AI , @coderabbitai , and @trydaily to discuss open weight models and where the ecosystem is headed: https://luma.com/jf188vvq?tk=9QRzzY

Berryxia.AI@berryxia · Jun 5 · 61

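Today I came across Firecrawl's milestone numbers and was stunned. In just two years they have crawled 8B+ web pages. Meanwhile 1.25M+ developers use it, 150K+ companies treat it as infrastructure, its 125K+ GitHub stars put it among the top 100 repositories worldwide, and weekly npm and PyPI downloads exceed 2.5 million. Staring at these numbers, I suddenly realized something counterintuitive: two years ago, people still considered web scraping tired old engineering work. It was either expensive, slow, or the data was too dirty to feed to AI. People actually building agents were stuck every day on the same problem: how to reliably get the latest web content. Firecrawl filled that gap through sheer execution. They didn't just build a crawler; they turned the entire "web context layer" into infrastructure for the AI era. If an agent wants to search, scrape, or interact in real time, it can now call a single API that is clean, structured, and scalable. Most impressive of all, they are still accelerating: the next 8 billion is already on the way. This punctures the biggest collective illusion in AI right now. Many people assume that bigger models and stronger reasoning win, forgetting that what really sets an agent's ceiling is whether it can obtain fresh real-world data reliably, continuously, and cheaply. Firecrawl's 8 billion pages prove it: AI's next battleground has shifted from "whose model is smarter" to "who can turn the entire internet into context that AI can consume directly."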

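Summary: Firecrawl has crawled 8B+ web pages in two years and has 1.25M+ developers, 150K+ company customers, 125K+ GitHub stars (a global top-100 repository), and 2.5M+ weekly npm and PyPI downloads. The main post argues these numbers show AI competition shifting from model parameters to "turning the internet into context AI can consume directly": by serving clean, structured, scalable real-time web data through an API, Firecrawl removes the bottleneck agents face in getting fresh content and becomes an infrastructure layer for the AI era.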
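Firecrawl's own API is not shown in these posts. As a rough, standard-library-only illustration of what "cleaning" a page for an agent can mean, the sketch below drops script and style content from HTML and keeps the visible text plus link targets. The `TextExtractor` class and `html_to_context` function are hypothetical names, and a production crawler would need much more (for example, rendering JavaScript-heavy pages).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text and link targets, skipping script/style content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.parts = []        # visible text fragments, in document order
        self.links = []        # href values from <a> tags
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_context(html):
    """Return a small, agent-friendly dict of page text and links (hypothetical)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"text": " ".join(parser.parts), "links": parser.links}


page = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Docs</h1><p>Install with <a href='/install'>this guide</a>.</p>"
    "<script>track()</script></body></html>"
)
print(html_to_context(page))
```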

🚨 AI News | TestingCatalog@testingcatalog · Jun 5 · 72

NVIDIA 🔥: Nemotron 3 Ultra has been released on Huggingface with 5x faster inference and 30% lower costs in comparison to other open models. > Nemotron-3-Ultra-550B-A55B-NVFP4 is a frontier-scale large language model (LLM) trained by NVIDIA, designed to deliver strong agentic, reasoning, and conversational capabilities.

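Summary: NVIDIA has released Nemotron 3 Ultra (Nemotron-3-Ultra-550B-A55B-NVFP4) on Hugging Face, a 550B-parameter MoE open frontier-intelligence LLM designed for long-running AI agents. Compared with other open frontier models, it offers 5x faster inference and 30% lower cost on complex agentic tasks. The model has strong agentic, reasoning, and conversational capabilities.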

Chubby♨️@kimmonismus · Jun 5 · 66

That’s so cool! I love the creativity of those guys. An open model for live music generation with only 2.4B parameters. If you're bored on a long flight, you can now start creating bangers.

Google AI Developers@googleaidevs · Jun 5 · 70

Play our new open-weights music model, @GoogleMagenta RealTime 2, using a MIDI keyboard, live text prompts, and even hand gestures ✌️ https://x.com/GoogleMagenta/status/2062589313372594538

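Summary: Google AI for Developers announced Magenta RealTime 2 (MRT2), an open-weights real-time music model that can be played with a MIDI keyboard, live text prompts, and even hand gestures. MRT2 runs natively on a MacBook with under 200 ms of latency and ships with open weights, an open-source inference engine, and a companion suite of apps and plugins.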

宝玉@dotey · Jun 5 · 29

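Every team I know that builds AI agents works incredibly hard, not because the boss forces them but for the ideals they believe in, so they willingly work overtime and do closed-door development 👍 One thing I'm curious about: when the Kimi team built Kimi Code, did they use more tokens from their own model, or more from Claude or GPT? 🤔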

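Summary: 宝玉 posted that every AI agent team willingly works overtime in closed-door sprints for its ideals, and asked whether the Kimi team used more of its own tokens or more Claude/GPT tokens while building Kimi Code. @real_kai42 replied that a month ago he decided to rebuild Kimi Code, spent a few thousand dollars' worth of tokens on architecture analysis and validation, and, once the design was settled, assembled a team for closed-door development. The process involved constant arguments and restarts, and after the open-source release he fell ill from excess cortisol. He remarked that closed-door development is a miracle of engineering efficiency and that collectivism far outperforms individual heroism.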

Nathan Lambert@natolambert · Jun 5 · 59

It's been a great effort by the early and growing American open-model labs since last June to put the US much more back on the map. We were getting totally owned last June. Nvidia, Ai2, Arcee, Gemma, GPT-OSS and a few others will be seen as saving American open AI.

OpenCode@opencode · Jun 5 · 66

Nemotron 3 Ultra is now free on OpenCode. NVIDIA's latest open-source model: text · 1M context · fully open source

Chubby♨️@kimmonismus · Jun 4 · 81

1/ NVIDIA shipped Nemotron 3 Ultra today, a fully open 550B model with 55B active params, with the weights, training data, and complete recipe all released openly. That alone is rare at this scale. The headline, however, is speed. Ultra is a hybrid Mamba-Attention MoE, an architecture built for fast decoding and a light memory footprint over long contexts, and NVIDIA clocks it at roughly 6x (!) the throughput of comparable open models on long-output agent workloads while holding the same accuracy. That's a serious engineering result, and it's aimed exactly where the industry is heading: autonomous agents that run long, multi-turn tasks where throughput per GPU is what actually costs money. It was pre-trained in 4-bit (NVFP4) across 20T tokens, the largest stable run of its kind shown to date. And the post-training introduces MOPD, where ten-plus specialist teacher models distill their skills into the student on its own rollouts, sometimes pushing it past the teachers themselves. The interesting aspect: this is a frontier-class model you can fully reproduce.

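Summary: NVIDIA officially released Nemotron 3 Ultra, a fully open MoE model with 550B total parameters (55B active), with the weights, training data, and complete recipe all public. It uses a hybrid Mamba-Attention architecture designed for fast decoding and a light memory footprint over long contexts. On long-output agentic workloads its throughput is roughly 6x that of comparable open models (5x faster inference), and cost on complex agentic tasks falls by up to 30%. The model was pre-trained on 20T tokens in 4-bit (NVFP4) precision, and post-training uses MOPD, in which ten-plus specialist teacher models distill their skills into the student. It is a frontier-class open model that can be fully reproduced.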
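With about 550B total and 55B active parameters, only around a tenth of Nemotron 3 Ultra's weights are used for any given token. The sketch below shows typical top-k mixture-of-experts routing, where a router scores every expert but only the k best actually run. It is not NVIDIA's implementation: the eight scalar "experts", the router logits, and k=2 are illustrative assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, router_logits, experts, k=2):
    """Route one token through only its top-k experts (toy MoE layer).

    x: scalar stand-in for a token's hidden vector
    router_logits: one score per expert (normally produced by a learned router)
    experts: list of callables; only k of them are evaluated
    Returns (gated output, indices of the experts that ran).
    """
    # Select the k highest-scoring experts for this token.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalize gate weights over the selected experts only.
    gates = softmax([router_logits[i] for i in top])
    out = sum(g * experts[i](x) for g, i in zip(gates, top))
    return out, top


# Eight toy experts; expert i simply multiplies its input by (i + 1).
experts = [lambda x, s=s: s * x for s in range(1, 9)]
logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]
y, used = moe_forward(1.0, logits, experts, k=2)
print(used)  # experts 4 and 1 run; the other six are skipped
print(f"active fraction: {55 / 550:.0%}")  # 55B of 550B parameters
```

Because unselected experts never execute, compute per token scales with the active parameters rather than the total, which is the property the posts lean on for speed.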

Nathan Lambert@natolambert · Jun 4 · 60

Safety by narrow control has been shown to fail many times. Need more transparency on the absolute frontier, and openness close behind.

SenseTime@SenseTime_AI · Jun 4 · 69

"𝗦𝗲𝗿𝗶𝗼𝘂𝘀𝗹𝘆 𝗶𝗺𝗽𝗿𝗲𝘀𝘀𝗶𝘃𝗲 𝘀𝘁𝘂𝗳𝗳". Thanks for the kind words, @gurru_tech — that's 𝗦𝗲𝗻𝘀𝗲𝗡𝗼𝘃𝗮 𝗨𝟭 turning prompts into professional infographics. Unified model that natively understands and generates text and images. Open-sourced. Run it yourself. 🎥Watch the video: https://youtu.be/HKz2e3STUwg 🎛️ SenseNova Studio: https://unify.light-ai.top/ (Try infographics; also join Discord for text-image interleaved gen) 🤗 https://huggingface.co/collections/sensenova/sensenova-u1 🛠️ https://github.com/OpenSenseNova/SenseNova-U1 👾 Discord: https://discord.com/invite/BuTXPHmQub

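Summary: SenseTime launched SenseNova U1, an open-source multimodal model that natively understands and generates text and images and can turn a prompt into a professional infographic in one step. Developer @gurru_tech called it "seriously impressive." The project is open source, with SenseNova Studio available for online trials and the Hugging Face model collection, GitHub repository, and Discord community all public.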

elvis@omarsar0 · Jun 4 · 74

NEW: NVIDIA ships 550B MoE open model for long-running agents. Very exciting times to see more open models to support local long-running coding agents.

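Summary: NVIDIA today released Nemotron 3 Ultra, a 550B MoE open frontier-intelligence model designed for long-running agents. Compared with other open frontier models, it offers 5x faster inference and 30% lower cost on complex agentic tasks.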

Artificial Analysis@ArtificialAnlys · Jun 4 · 74

NVIDIA has just released Nemotron 3 Ultra, the new most intelligent US open weights model, with leading speed for its intelligence Nemotron 3 Ultra scores 47.7 on the Artificial Analysis Intelligence Index, well ahead of the next strongest US open weights models, Gemma 4 31B (39.2), Nemotron 3 Super (36.0) and gpt-oss-120b (33.3), but behind the Chinese-led open weights frontier (Kimi K2.6 at 53.9). We partnered with @NVIDIA to evaluate this model for intelligence and speed ahead of its public release. These figures use the final NVFP4 weights that NVIDIA recommends for inference, but our tests show minimal intelligence impact compared to BF16 testing, with higher precision resulting in an Artificial Analysis Intelligence Index score of 48.2 vs. the NVFP4 score of 47.7. Key Takeaways: ➤ Nemotron 3 Ultra leads in speed for its intelligence: through BlackBox AI ahead of release, Nemotron 3 Ultra is served at over 400 output tokens per second - this is slightly faster than the typical serving speed of gpt-oss-120b despite being >4X larger, and comes with significantly greater intelligence ➤ Largest Nemotron 3 model so far: with approximately 550 billion total parameters and 55 billion active, Nemotron 3 Ultra is significantly larger than its siblings and is the largest and most intelligent US open weights model release ever ➤ Nemotron 3 Ultra is the leading US open weights model on the Artificial Analysis Intelligence and Agentic Indexes by far, but Gemma 4 31B scores ~1 point higher on the Coding Index (comprised of Terminal-Bench Hard and SciCode)

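Summary: NVIDIA released Nemotron 3 Ultra, currently the most intelligent US open-weights model. It scores 47.7 on the Artificial Analysis Intelligence Index, ahead of Gemma 4 31B (39.2), Nemotron 3 Super (36.0), and gpt-oss-120b (33.3), but behind the Chinese open model Kimi K2.6 (53.9). The model has about 550B total parameters with 55B active and is served at over 400 tokens/s, slightly faster than gpt-oss-120b with significantly higher intelligence. Precision matters little: it scores 47.7 with NVFP4 weights versus 48.2 with BF16.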
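The post compares NVFP4 and BF16 weights. As commonly described, NVFP4 stores each value as a 4-bit E2M1 float and shares one scale factor per 16-element block. The plain-Python sketch below illustrates only that block-scaling idea; it is a simplified assumption-laden illustration, not NVIDIA's kernel, and it deliberately skips storing the block scale in FP8 (E4M3) and the extra per-tensor scale that the real format uses.

```python
# Magnitudes representable by a 4-bit E2M1 float (the sign is a separate bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # number of elements that share one scale factor


def quantize_block(xs):
    """Quantize one block to E2M1 values plus a shared scale (simplified).

    The scale maps the block's largest magnitude onto 6.0, the largest E2M1
    value. Here the scale stays a Python float; real NVFP4 rounds it to FP8.
    """
    amax = max(abs(x) for x in xs)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in xs:
        # Round |x| / scale to the nearest representable magnitude.
        mag = min(E2M1_VALUES, key=lambda v: abs(abs(x) / scale - v))
        codes.append(mag if x >= 0 else -mag)
    return codes, scale


def dequantize_block(codes, scale):
    """Map E2M1 values back to approximate real weights."""
    return [c * scale for c in codes]


def quantize(xs):
    """Split a flat weight list into 16-element blocks and quantize each."""
    return [quantize_block(xs[i:i + BLOCK]) for i in range(0, len(xs), BLOCK)]


weights = [0.02 * (i - 8) for i in range(16)]  # toy weights from -0.16 to 0.14
codes, scale = quantize(weights)[0]
restored = dequantize_block(codes, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"scale={scale:.4f} max_abs_error={max_err:.4f}")
```

Because every block gets its own scale, a single outlier only coarsens the 15 weights that share its block, which is one reason a 4-bit format can stay close to BF16 quality.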

🚨 AI News | TestingCatalog@testingcatalog · Jun 4 · 63

HeyGen announced a new FRAME.md format 👀 This format converts DESIGN.md files (which describe your brand guidelines) into a new format that also explains how to generate branded videos. It comes as an open-source repository that any brand can use, so the new FRAME.md file can steer your video generation agents.

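Summary: HeyGen announced FRAME.md, a new format that converts DESIGN.md brand-guideline files into a specification for video and motion design. DESIGN.md was written for static screens; when it is applied to video, AI agents misread it as web pages and slides. FRAME.md teaches agents how to produce genuinely on-brand video. The project is released as an open-source repository that any brand can use to steer its video-generation agents with a FRAME.md file.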

Artificial Analysis@ArtificialAnlys · Jun 4 · 67

StepFun's Step 3.7 Flash sits on the Intelligence vs Output Speed Pareto frontier, scoring 43 on the Artificial Analysis Intelligence Index, and is served at over 400 output tokens/s. Step 3.7 Flash (open weights, Apache 2.0) is a significant upgrade on Step 3.5 Flash and stands out for its speed and gains in agentic performance (particularly GDPval-AA). 400 output tokens/s is more than double other models of a similar size class. Contributing to this speed: the model has only 11B active parameters, and it ships with three trained Multi-Token Prediction heads that predict several tokens in a single forward pass, letting it decode multiple tokens at once using speculative decoding. Key results for Step 3.7 Flash with the high reasoning level: ➤ 4 point Intelligence Index improvement: Step 3.7 Flash scores 42.6 on the Artificial Analysis Intelligence Index, up 4 points from Step 3.5 Flash 2603 (38.5). It is roughly on par with Qwen3.5 122B A10B (41.6) and trails MiniMax-M2.7 (49.6) and DeepSeek V4 Flash (Max Effort, 46.5) ➤ Speed-intelligence frontier: Step 3.7 Flash achieves ~400 output tokens/s on StepFun's first-party API, placing the model on the Intelligence vs Output Speed Pareto frontier. StepFun has released the weights for this model and we expect several third-party providers to serve this model ➤ Agentic capability improvements: Step 3.7 Flash improves over Step 3.5 Flash 2603 across our agentic evaluations, in both GDPval-AA (real-world agentic tasks) and TerminalBench Hard (agentic coding and terminal use). It achieves a GDPval-AA Elo of 1298, up from 1070 for Step 3.5 Flash 2603, and its TerminalBench Hard score increases to 35.6% from 32.6%. AA-LCR (Long Context Reasoning) improves to 63.7% from 54.3%. Scores for other evals remain relatively flat ➤ Weaker on knowledge and hallucination than peers: While Step 3.7 Flash trails competitors overall on AA-Omniscience (-38), it improves from Step 3.5 Flash 2603 (-44). It has an AA-Omniscience accuracy of 25.4% and a hallucination rate of 84.4% ➤ Native multimodal support, new in this generation: Step 3.7 Flash introduces a 1.8B-parameter vision encoder for native image understanding, where Step 3.5 Flash was text-only. On MMMU-Pro (multimodal reasoning) it scores 75.3%, roughly matching Qwen3.5 122B A10B (75.0%). Among its same-size open weights peers, MiniMax-M2.7, DeepSeek V4 Flash, and gpt-oss-120b are text-only. Key model details: ➤ Context window: 256K tokens ➤ Parameters: 198B total, 11B active (MoE). At BF16 native precision, Step 3.7 Flash requires ~400GB to store the weights. StepFun has also released FP8 (~200GB) and NVFP4 (~100GB) versions for lower-memory deployment ➤ License: Apache 2.0 ➤ Availability: Currently Step 3.7 Flash is available on @StepFun_ai's first-party API

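Summary: StepFun has open-sourced Step 3.7 Flash (Apache 2.0), a MoE model with 198B total and 11B active parameters and a 256K context window. It scores 42.6 on the Artificial Analysis Intelligence Index, 4 points above Step 3.5 Flash, and is served at over 400 tokens/s, accelerated by three Multi-Token Prediction heads. A new 1.8B vision encoder adds native multimodality (75.3% on MMMU-Pro). Agentic gains: GDPval-AA Elo rises from 1070 to 1298, TerminalBench Hard reaches 35.6%, and AA-LCR reaches 63.7%. Knowledge and hallucination remain weak: AA-Omniscience accuracy is 25.4% with an 84.4% hallucination rate. BF16, FP8, and NVFP4 weights are provided to lower deployment cost.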
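The post credits part of Step 3.7 Flash's speed to Multi-Token Prediction heads used for speculative decoding. The sketch below shows the generic greedy accept/reject rule of speculative decoding, with a toy function standing in for the main model; it is not StepFun's implementation. The names `speculative_step` and `toy_target` are hypothetical, and a real system checks all drafted positions in one batched forward pass rather than the sequential calls used here for readability.

```python
def speculative_step(prefix, draft, target_next):
    """One greedy speculative-decoding step (toy sketch).

    prefix: tokens generated so far
    draft: tokens proposed cheaply (e.g. by multi-token prediction heads)
    target_next: returns the main model's greedy next token for a context
    Returns the list of tokens emitted by this step.
    """
    accepted = []
    context = list(prefix)
    for tok in draft:
        expected = target_next(context)
        if tok != expected:
            # First disagreement: emit the main model's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        context.append(tok)
    # All drafts matched, so verification also yields one extra "bonus" token.
    accepted.append(target_next(context))
    return accepted


SENTENCE = "the cat sat on the mat .".split()


def toy_target(context):
    """Stand-in for the main model: always continues SENTENCE."""
    return SENTENCE[len(context)]


print(speculative_step(["the"], ["cat", "sat", "in"], toy_target))  # 2 drafts kept, 1 corrected
print(speculative_step(["the"], ["cat", "sat", "on"], toy_target))  # 3 drafts kept + bonus token
```

Every step emits at least one token the main model agrees with, so output is unchanged; the speedup comes from emitting several tokens per verification pass whenever the drafts are right.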

Jeff Dean@JeffDean · Jun 4 · 75

Check out our Gemma 4 12B model: it's a super capable open weights model that can run directly on your laptop.

MiniMax (official)@MiniMax_AI · Jun 4 · 71

M3 is back in the free tier on @opencode 🚀 Jump in and try it while it lasts!

小互@xiaohu · Jun 4 · 73

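Ideogram releases its first open-source AI image model, Ideogram 4.0, claiming to push text rendering and layout control to the ceiling of what open models can do. With traditional text-to-image you write a prompt and pray the model puts things in the right place. Ideogram 4.0 introduces bounding-box control: you can use coordinates to specify exactly which region of the image each element occupies. Structured JSON prompts: Ideogram 4.0 accepts not only plain-text prompts but also a structured JSON prompt format. Multilingual text rendering: English OCR accuracy reaches 0.97 (X-Omni benchmark), with dense cross-lingual text rendering that supports non-Latin scripts such as Chinese, Japanese, and Korean.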

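Summary: Ideogram released Ideogram 4.0, its first open-source AI image model, focused on text rendering and layout control. It introduces bounding-box control, so element positions can be specified precisely with coordinates; supports a structured JSON prompt format in addition to plain text; reaches 0.97 English OCR accuracy (X-Omni benchmark); and supports dense cross-lingual text rendering, including non-Latin scripts such as Chinese, Japanese, and Korean.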
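The posts say Ideogram 4.0 accepts structured JSON prompts with coordinate-based bounding boxes but do not show the schema. The sketch below only illustrates the general idea of turning normalized boxes into pixel coordinates inside a JSON prompt. Every key (`caption`, `canvas`, `elements`, `text`, `bbox`) is hypothetical, not Ideogram's documented format, and the 2048×2048 default canvas is an assumption loosely based on the "native 2K resolution" mentioned in another post.

```python
import json


def make_layout_prompt(caption, elements, width=2048, height=2048):
    """Build a structured prompt with pixel bounding boxes.

    HYPOTHETICAL schema: the keys used here are illustrative only.
    elements: list of (text, (x0, y0, x1, y1)) with corners normalized to [0, 1].
    """
    items = []
    for text, (x0, y0, x1, y1) in elements:
        # Reject boxes that are empty, inverted, or outside the canvas.
        if not (0 <= x0 < x1 <= 1 and 0 <= y0 < y1 <= 1):
            raise ValueError(f"invalid normalized box for {text!r}")
        # Convert normalized corners to integer pixel coordinates.
        bbox = [round(x0 * width), round(y0 * height),
                round(x1 * width), round(y1 * height)]
        items.append({"text": text, "bbox": bbox})
    prompt = {"caption": caption, "canvas": [width, height], "elements": items}
    return json.dumps(prompt, indent=2)


print(make_layout_prompt(
    "Poster for a jazz night",
    [("JAZZ NIGHT", (0.1, 0.05, 0.9, 0.25)),
     ("Friday 8 PM", (0.3, 0.8, 0.7, 0.9))],
))
```

Normalized coordinates keep a layout independent of output resolution; converting to pixels only at the end is a common design choice for this kind of spec.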

小互@xiaohu · Jun 4 · 71

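Google releases the open Gemma 4 12B model: full-modality AI on a 16GB laptop. Gemma 4 12B uses an encoder-free architecture called "Unified" that sends text, image, audio, and video inputs directly into the same Transformer backbone, so the model processes raw images and sound itself. An analogy makes this clear. Traditional multimodal models handle images and audio like a boss who only reads Chinese and employs two translators: an English translator (the vision encoder) and a Japanese translator (the audio encoder). Whenever English or Japanese material arrives, a translator must first convert it into Chinese before the boss can understand it. The translators take up desks (memory), translation means waiting in line (latency), and the boss receives a processed version rather than the original. Gemma 4 12B lays off both translators and has the boss learn to read English and Japanese directly. Key numbers: runs in 16GB of VRAM or unified memory, down to 8GB with 4-bit quantization, with the goal of running locally on an ordinary laptop; a 256K-token context window; 140+ languages; a built-in Thinking mode (step-by-step reasoning) and native function calling.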

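Summary: Google released the open Gemma 4 12B model, which uses an encoder-free Unified architecture to process text, images, audio, and video directly, without separate encoders. It runs in 16GB of VRAM, or as little as 8GB with 4-bit quantization, and supports a 256K-token context, 140+ languages, a built-in Thinking mode, and function calling.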
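The Gemma posts quote 16GB of VRAM, or about 8GB with 4-bit quantization, and the Step 3.7 Flash post quotes ~400GB (BF16), ~200GB (FP8), and ~100GB (NVFP4). The back-of-the-envelope sketch below counts weight bytes only (parameters × bits ÷ 8). It ignores the KV cache, activations, and quantization scale metadata, so real footprints run somewhat higher. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params_billions, bits_per_param):
    """Approximate storage for model weights only, in decimal gigabytes.

    Ignores KV cache, activations, runtime buffers, and quantization
    metadata such as per-block scale factors.
    """
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9


PRECISIONS = [("BF16", 16), ("8-bit", 8), ("4-bit", 4)]

for name, params in [("Gemma 4 12B", 12), ("Step 3.7 Flash", 198)]:
    row = ", ".join(
        f"{label}: {weight_memory_gb(params, bits):.0f} GB"
        for label, bits in PRECISIONS
    )
    print(f"{name} -> {row}")
```

The Step figures come out at about 396, 198, and 99 GB, matching the post's rounded numbers. For Gemma 4 12B, 4-bit weights are about 6 GB, leaving headroom under the quoted 8GB, while BF16 weights alone would be about 24 GB; the 16GB figure therefore presumably assumes some reduced-precision build, which the posts do not specify.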

fofr@fofrAI · Jun 4 · 61

Ideogram v4 > a scan of a page from my high school A3 art pad, highly original niche pencil piece working on the aura of unusual cross sections and fluidity of otherwise solid surfaces in human portraiture with offset recursion, not anatomical, the cross sections reveal something else, very detailed and complex, no other anatomy, no embellishments, no pencil shavings, no tea stains, clean white paper

Summary: Ideogram v4 performs well and has open weights. Images are crisp and feel fresh.

MiniMax (official)@MiniMax_AI · Jun 4 · 65

We are part of @nvidia and @Microsoft ’s Local LLM lineup at #GTC Taipei.🔥 The PC is being reinvented around local, agentic, open-weight models MiniMax-M3 is built exactly for this future: Open-weight. 1M context. Strong coding. Native multimodality. Excited for what comes next!

Sundar Pichai@sundarpichai · Jun 4 · 73

Our new Gemma 4 12B model hits a sweet spot between size + performance: it can run locally on a laptop, while enabling powerful multi-step reasoning and agentic workflows. Can’t wait to see what the community does with this one!

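Summary: With cumulative Gemma 4 downloads passing 150 million, Google added a new family member, Gemma 4 12B. At only 12B parameters it runs locally on a laptop with 16GB of VRAM, balances size and performance, and supports multi-step reasoning and agentic workflows. It is released for the community under the Apache 2.0 license.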

Chubby♨️@kimmonismus · Jun 4 · 71

Gemma 4 12B shipped today under the label "encoder-free": a local 12B model that shows really good results. I'm a big fan of Gemma. Gemma 4 12B is out: a dense, fully open model (Apache 2.0) that runs on a 16GB laptop and does agentic reasoning, vision and audio at a quality Google puts near its 26B model. The reason a 12B model can pull this off: Google removed the separate vision and audio encoders and feeds both straight into the model, which keeps the memory footprint small enough for consumer GPUs. For on-device assistants and private coding agents, that lowers the bar a lot. I always look forward to the updates, and 12B is a good sweet spot in terms of size. A few facts: Vision: the 550M encoder (27 transformer layers) is now a 35M embedder, one matmul on 48x48 pixel patches, roughly 15x smaller. Audio: the 300M encoder (12 conformer layers) is gone; raw 16kHz audio is cut into 40ms frames and projected straight into the LLM. So encoding didn't vanish, it collapsed into the backbone. The payoff is real: one shared set of weights, so you can LoRA-tune vision, audio and text in a single pass.

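Summary: Google open-sourced Gemma 4 12B (dense, Apache 2.0 license) with a new encoder-free architecture that removes the separate vision encoder (550M parameters, 27 Transformer layers) and audio encoder (300M parameters, 12 Conformer layers). Vision now uses a 35M embedding layer (about 15x smaller), and audio is projected straight into the LLM in 40 ms frames. The model runs agentic reasoning, vision, and audio tasks on a laptop with 16GB of VRAM, with performance close to the 26B model. Shared weights let a single LoRA tuning pass cover vision, audio, and text.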
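The posts say Gemma 4 12B replaces its vision encoder with a single matmul over 48x48 pixel patches and feeds raw 16 kHz audio into the backbone in 40 ms frames. The pure-Python sketch below illustrates that idea: each patch or frame is flattened and linearly projected to the model width, and the results share one token sequence. It is a toy, not Google's code; grayscale input, random weights, and a hidden size of 8 are assumptions, and partial edge patches or trailing audio are simply dropped.

```python
import random

PATCH = 48                                    # vision patch side in pixels (from the post)
SAMPLE_RATE = 16_000                          # audio sample rate in Hz (from the post)
FRAME_MS = 40                                 # audio frame length in ms (from the post)
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000    # 640 samples per frame
D_MODEL = 8                                   # toy hidden size; the real value is not given


def linear(vec, weight):
    """Project vec (length n) with weight (n rows x D_MODEL columns)."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weight)]


def image_tokens(img, weight):
    """Cut a grayscale image (list of rows) into 48x48 patches and project each."""
    h, w = len(img), len(img[0])
    tokens = []
    for top in range(0, h - h % PATCH, PATCH):
        for left in range(0, w - w % PATCH, PATCH):
            patch = [img[r][c]
                     for r in range(top, top + PATCH)
                     for c in range(left, left + PATCH)]
            tokens.append(linear(patch, weight))
    return tokens


def audio_tokens(samples, weight):
    """Cut raw 16 kHz audio into 40 ms frames and project each frame."""
    return [linear(samples[i:i + FRAME_LEN], weight)
            for i in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN)]


rng = random.Random(0)
vis_w = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(PATCH * PATCH)]
aud_w = [[rng.gauss(0, 0.02) for _ in range(D_MODEL)] for _ in range(FRAME_LEN)]
image = [[rng.random() for _ in range(96)] for _ in range(96)]   # 96x96 -> 4 patches
audio = [rng.uniform(-1, 1) for _ in range(SAMPLE_RATE)]          # 1 s -> 25 frames
seq = image_tokens(image, vis_w) + audio_tokens(audio, aud_w)
print(len(seq), len(seq[0]))  # 29 tokens, each of width D_MODEL
```

With the post's numbers, one 40 ms frame at 16 kHz is 640 samples, so one second of audio becomes 25 tokens that sit in the same sequence as image-patch tokens.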

Demis Hassabis@demishassabis · Jun 4 · 74

Celebrating the milestone of a massive 150+ million downloads of Gemma 4 with the release of the new Gemma 4 12B model! It's incredibly powerful for such a small model and it’s tiny enough to run locally on a laptop with just 16GB VRAM. Apache 2.0 license - happy building!

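Summary: Demis Hassabis announced that the Gemma 4 family has passed 150 million downloads and released the new Gemma 4 12B model, a unified, encoder-free multimodal model that combines edge efficiency with advanced reasoning. Despite having only 12B parameters it performs strongly and is small enough to run locally on a laptop with just 16GB of VRAM. It uses the Apache 2.0 license, so developers can build freely.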

AYi@AYi_AInotes · Jun 4 · 70

The world's best open-source image model, second only to GPT-image-2 and Nanobanana2

elvis@omarsar0 · Jun 4 · 76

Another banger open-source release. Miso One is an 8B text-to-speech model with real emotional range, so voiceovers carry warmth, hesitation, and excitement instead of sounding flat. It's purpose-built for voiceover work like shorts, podcasts, and educational content, and it runs at 110ms latency, which is faster than human reaction time. The best part is that the weights are fully open source, so you can clone the repo, self-host, fine-tune, and keep your data private. Worth checking out if you're building voice into your tools and products: http://github.com/MisoLabsAI/MisoTTS

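Summary: Miso Labs open-sourced Miso One, an 8B-parameter text-to-speech model focused on emotionally expressive output such as warmth, hesitation, or excitement, instead of a robotic voice. It is designed for voiceover work such as short videos, podcasts, and educational content, with 110 ms inference latency, faster than human reaction time. The weights are fully open, supporting self-hosting, fine-tuning, and keeping data private; API access is coming soon.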

🚨 AI News | TestingCatalog@testingcatalog · Jun 4 · 74

Ideogram announced Ideogram 4.0, a new SOTA open image generation model! > Ideogram 4.0 lands in the 8th spot on LM Arena and the 5th spot on Design Arena in the text-to-image category, and is getting close to Nano Banana Pro's performance. > Ideogram 4.0 features dense, accurate text rendering, native 2K resolution, active background transparency, and precise layout control.

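Summary: Ideogram released the open image-generation model Ideogram 4.0. It ranks 8th on LM Arena and 5th on Design Arena in the text-to-image category with a score of 1204, making it the highest-ranked open model in the category, with performance close to Nano Banana Pro. Key features include dense, accurate text rendering, native 2K resolution, active background transparency, and precise layout control.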

Chubby♨️@kimmonismus · Jun 4 · 75

Miso One is live: an open-weights voice model built to sound like a real person reading, with actual warmth and pacing where most TTS still goes flat. 8B params, free on GitHub, with one-shot voice cloning from a short sample at 110ms latency. Self-host it and your audio data never leaves your machine. No API needed, no lock-in. Type any line into the demo and hear it before you clone the repo.

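Summary: Miso One has launched: an 8B-parameter open-weights voice (TTS) model designed to reproduce the warmth and pacing of a real person reading. It supports one-shot voice cloning from a short sample with 110 ms inference latency. The weights are on GitHub and can be self-hosted without an API, so audio data never leaves your machine. API access is coming soon. A demo is live, so you can listen before cloning the repo.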

Google AI Developers@googleaidevs · Jun 4 · 77

We’re launching Gemma 4 12B: Our unified, encoder-free model that brings powerful multimodal intelligence straight to your laptop 🚀 The model bridges the gap between our mobile E4B model and larger 26B MoE models, packaging frontier-class reasoning and native audio into a highly optimized footprint, all under a permissive Apache 2.0 license. Here’s what makes it unique: + Encoder-Less Architecture: We removed the multimodal encoders. The vision and audio inputs flow directly into the LLM backbone. + Agentic Performance (16GB VRAM): Run complex, multi-step workflows locally, with performance nearing our 26B model.

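Summary: Google released Gemma 4 12B, a unified, encoder-free multimodal model that feeds vision and audio inputs directly into the LLM backbone without conventional multimodal encoders. It fills the gap between the mobile E4B model and the 26B MoE model, packaging frontier reasoning and native audio under the Apache 2.0 license. It runs complex multi-step agentic workflows locally in 16GB of VRAM, with performance close to the 26B model.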

Nathan Lambert@natolambert · Jun 3 · 40

A key lesson of the last year of building open models, once it became so obvious the US is behind, is that talk is cheap. Many people say they're helping / want to help but actually don't do anything. Finding the few people who genuinely push open forward is crucial.

Alibaba Cloud@alibaba_cloud · Jun 3 · 53

Dr. Feifei Li, CTO of Alibaba Cloud, and Tommy Eastman, Head of Strategy at Nous Research: as we orchestrate intelligence at scale and reshape knowledge work, giving agents autonomy requires reproducible actions, the core secret behind Hermes agent's success.

X.PIN@thexpin · Jun 3 · 66

DeepSeek is launching a massive initial funding round! And the most surprising figure doesn't come from Tencent. According to Reuters, the company aims to raise ~$7.4B, bringing its post-money valuation to between $52B and $59B. This would mark the largest AI funding round in China to date. Among the investors: 🔹 Founder Liang Wenfeng contributes ~$3B personally 🔹 Tencent invests ~$1.5B 🔹 Battery manufacturer CATL invests ~$0.7B as it expands into supplying power for AI data centers 🔹 NetEase, JD.com, and China's national AI fund are in talks to join 🔹 Hong Kong's IDG Capital and Cornerstone Capital are also among the intended investors. The deal is expected to close in about two weeks. After operating as a self-funded research lab for years, DeepSeek is finally accepting outside capital, though Liang remains the largest single investor.

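Summary: DeepSeek has launched its first large funding round, aiming to raise about $7.4B at a post-money valuation of $52B to $59B, a record for China's AI industry. Founder Liang Wenfeng is contributing about $3B personally, Tencent about $1.5B, and CATL about $0.7B (as it also expands into powering AI data centers). NetEase, JD.com, and China's national AI fund are in talks to join. The deal is expected to close within two weeks. Long self-funded, DeepSeek is accepting outside capital for the first time, with Liang remaining the largest single investor. (Source: Reuters report)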
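The reported figures allow a quick sanity check of what the round implies. The sketch below applies the standard relation pre-money = post-money − new capital and computes the share of the company the new money buys. It assumes the whole ~$7.4B is newly issued capital, which the report implies but does not state outright, and it uses only the three named commitments.

```python
RAISE_B = 7.4                     # target raise in $B (reported)
POST_MONEY_B = (52.0, 59.0)       # reported post-money valuation range in $B
COMMITMENTS_B = {"Liang Wenfeng": 3.0, "Tencent": 1.5, "CATL": 0.7}

for post in POST_MONEY_B:
    pre = post - RAISE_B          # pre-money = post-money - new capital
    new_share = RAISE_B / post    # fraction of the company bought by the new capital
    print(f"post ${post:.0f}B: pre-money ${pre:.1f}B, new capital owns {new_share:.1%}")

named = sum(COMMITMENTS_B.values())
print(f"named commitments: ${named:.1f}B of ${RAISE_B}B ({named / RAISE_B:.0%})")
```

Under these assumptions the new money buys roughly 12.5% to 14.2% of the company, and the three named investors account for about 70% of the round.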

Alibaba Cloud@alibaba_cloud · Jun 3 · 28

Join the Qwen & @ModelScope2022 communities in Singapore on June 10! An evening for AI developers covering agent monetization, Qwen updates, the Global AI Hackathon, and sharing sessions from fellow builders. Apply: https://luma.com/4x2srooq #Qwen #ModelScope

SiliconFlow@SiliconFlowAI · Jun 3 · 67

@karpathy 's llm-wiki hit 5,000+ stars in weeks. The idea: stop re-discovering knowledge every session. Let an LLM build and maintain a wiki that gets smarter every time you use it. Here's how to build your own with @opencode + @justsisyphus OMO + SiliconFlow 🧵

SiliconFlow@SiliconFlowAI · Jun 3 · 71

The official Hermes Agent Desktop app is HERE!

小互@xiaohu · Jun 3 · 60

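Microsoft announced it is bringing OpenClaw into the Microsoft and Windows ecosystem. The "little lobster" can now run natively on Windows using Microsoft's new MXC secure-container technology, with both the node and the gateway running inside containers. Windows also provides a companion app for setting up and connecting Claws directly. At Build 2026, Microsoft also released Microsoft Scout, an "always-on" personal AI agent built on OpenClaw that connects to Teams, Outlook, OneDrive, and SharePoint and automatically handles coordination work in the background. Microsoft calls this class of agent "Autopilots." Rather than starting from scratch with a closed agent framework, Microsoft built Scout directly on the OpenClaw repository and committed to contributing enterprise-grade policy controls back to the upstream open-source project. Until now, security was the biggest obstacle to enterprise adoption of OpenClaw: companies were afraid to let an open-source agent freely access internal systems. Microsoft has now connected its whole enterprise security stack (Defender, Entra, Intune), effectively fixing OpenClaw's biggest weakness.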

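Summary: Microsoft announced it is bringing OpenClaw into the Windows ecosystem, where it runs natively using MXC secure-container technology, with a companion app for setup. At Build 2026, Microsoft also released Microsoft Scout, an "always-on" personal AI agent built on OpenClaw that connects to apps such as Teams and Outlook to perform tasks automatically. Instead of building a closed framework, Microsoft committed to contributing enterprise-grade policy controls back to the open-source OpenClaw project and, by integrating security stacks such as Defender and Entra, addressed the security barriers to its enterprise adoption.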

MiniMax (official)@MiniMax_AI · Jun 3 · 80

MiniMax-M3 is #6 overall on @ValsAI, the new open-weight SOTA 🚀

Google AI Developers@googleaidevs · Jun 3 · 74

Building autonomous agents for scientific discovery? 🧬🤖 @GoogleDeepMind Science Skills is now available on GitHub. We've open-sourced this specialized toolkit to accelerate your agentic workflows with scientific grounding and higher token efficiency. Download now ↓ https://github.com/google-deepmind/science-skills

Microsoft Research@MSFTResearch · Jun 3 · 44

Microsoft Research is at BUILD 2026 this week, giving developers a hands-on look at some of the many AI-based models and tools they can use to accelerate innovation, enhance their capabilities, and quickly transform ideas into prototypes. https://msft.it/6010vjBUe

AYi@AYi_AInotes · Jun 3 · 57

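Damn, these glasses run full Linux! Not a concept render, not a slide deck: it's Buildroot Linux on an Arm Cortex-A7, and you can SSH in and run your Claude Code, Codex, or OpenClaw. The whole system will also be open-sourced on GitHub before August. To me, the most striking thing about these glasses isn't cramming a computer into eyewear; it's that they drag vibe coding off the desktop and onto your face. Before, you had to sit at a computer to write code; now your coding agent sits on your shoulder. Whatever you see, it receives as real-time visual context, and it gives you feedback directly through the bone-conduction microphone. This isn't AR-glasses gimmickry; it's a genuine Agent Terminal. Put simply, it pulls your Claude out of the chat box and turns it into a partner that goes wherever you go. You're walking down the street and suddenly think of a bug: no need to pull out your phone or find a computer, because the agent in your glasses is already listening. This "compute follows the person" paradigm may be the true form of a fourth category of productivity computer. With a laptop, you go to the computer; with Monako, the computer follows you. When agents become our main work partners, computing will shift from "people chasing devices" to "devices chasing people."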

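Summary: These smart glasses have a built-in Arm Cortex-A7 processor running full Buildroot Linux, and you can SSH in to run coding tools such as Claude Code and Codex directly. The whole system will be open-sourced on GitHub before August. Their core value is bringing coding agents from the desktop to the user's eyes, using the glasses' visual context and bone-conduction microphone for real-time "compute follows the person" collaboration, and they are seen as a new kind of "Agent Terminal."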