Holy Sh*t: Seedance 2.5 coming early July. And still no text-to-video model has even come close to Seedance.

Artificial Analysis@ArtificialAnlys · 6月17日65

Soniox has released Soniox v5 Real-Time: a low latency streaming Speech to Text model on the Pareto frontier for accuracy and latency, at the lowest price of any proprietary model tested
Soniox v5 Real-Time is @soniox_ai's latest streaming Speech to Text (STT) model, joining Soniox v5 Async, their non-streaming model released last week. On AA-WER Streaming it occupies the middle of the Pareto frontier: faster than the most accurate models (Cartesia Ink-2, ElevenLabs Scribe v2 Realtime) and more accurate than the fastest (Deepgram Flux, Nova-3), while at a lower price than all of them.
AA-WER Streaming Overview
AA-WER Streaming reports WER and latency as a pair, measured from Silero VAD-detected end of speech on the same ~8 hours of audio as our non-streaming STT benchmark, AA-WER v2.0. We report both at two points: First Final (first final-denoted transcript, best for accuracy) and First Partial (first transcript-bearing event, best for when speed matters most).
Key takeaways
➤ First Final Transcription: Soniox v5 Real-Time achieves a 4.5% WER at 0.05s after end of speech, more accurate than the faster Deepgram Flux (7.4%, 0.02s) and Deepgram Nova-3 Realtime (6.7%, 0.06s), and faster than the more accurate Cartesia Ink-2 external endpoints (3.7%, 0.09s) and ElevenLabs Scribe v2 Realtime (3.6%, 0.14s)
➤ First Partial Transcription: The model achieves a 4.7% WER at 0.05s after end of speech, behind only Cartesia Ink-2 external endpoints (4.3%, 0.07s) and ElevenLabs Scribe v2 Realtime (3.6%, 0.13s) on accuracy, while faster than both
➤ Price: The model costs $2 per 1,000 minutes representing the lowest of any proprietary streaming model tested, below Cartesia Ink-2 ($4), Deepgram Nova-3 Realtime ($4.80) and ElevenLabs Scribe v2 Realtime ($6.50)
➤ Language support: The model supports over 60 languages, providing language identification and real-time translation across multilingual conversation.
See more details below ⬇️

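Summary: Soniox has released the v5 Real-Time streaming STT model, which sits on the Pareto frontier for accuracy and latency on the AA-WER Streaming benchmark. First Final transcription: 4.5% WER at 0.05s latency, more accurate than Deepgram Flux (7.4%, 0.02s) and Nova-3 Realtime (6.7%, 0.06s), and faster than Cartesia Ink-2 (3.7%, 0.09s) and ElevenLabs Scribe v2 Realtime (3.6%, 0.14s). First Partial transcription: 4.7% WER at 0.05s latency, behind only those two models on accuracy while faster than both. Price: $2 per 1,000 minutes, the lowest of any proprietary streaming model tested. Supports 60+ languages and real-time translation.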
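
To make the Pareto-frontier claim concrete, here is a minimal Python sketch that uses only the five First Final (WER, latency) pairs quoted in the post above. The `Result` type and the `dominates` and `pareto_frontier` helpers are illustrative names, not part of any Artificial Analysis tooling. The sketch assumes the usual definition: a model is dominated when another model is at least as good on both axes and strictly better on one.

```python
from typing import NamedTuple


class Result(NamedTuple):
    model: str
    wer_pct: float    # word error rate in percent, lower is better
    latency_s: float  # seconds after end of speech, lower is better


# First Final figures exactly as quoted in the post.
FIRST_FINAL = [
    Result("Soniox v5 Real-Time", 4.5, 0.05),
    Result("Deepgram Flux", 7.4, 0.02),
    Result("Deepgram Nova-3 Realtime", 6.7, 0.06),
    Result("Cartesia Ink-2", 3.7, 0.09),
    Result("ElevenLabs Scribe v2 Realtime", 3.6, 0.14),
]


def dominates(a: Result, b: Result) -> bool:
    """True if a is no worse than b on both axes and strictly better on one."""
    no_worse = a.wer_pct <= b.wer_pct and a.latency_s <= b.latency_s
    strictly_better = a.wer_pct < b.wer_pct or a.latency_s < b.latency_s
    return no_worse and strictly_better


def pareto_frontier(results: list[Result]) -> list[Result]:
    """Return the non-dominated results, ordered from fastest to slowest."""
    frontier = [r for r in results if not any(dominates(o, r) for o in results)]
    return sorted(frontier, key=lambda r: r.latency_s)


if __name__ == "__main__":
    for r in pareto_frontier(FIRST_FINAL):
        print(f"{r.model}: {r.wer_pct}% WER at {r.latency_s}s")
```

On these five pairs alone, Deepgram Nova-3 Realtime is dominated by Soniox v5 Real-Time (lower WER and lower latency), while the other four models sit on the frontier.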

SiliconFlow@SiliconFlowAI · 6月17日72

Just dropped the entire War and Peace (~750K tokens) into GLM-5.2. Then asked it to analyze the book and build an interactive 3D character universe.
The result:
· 27 characters, 9 factions
· ~50 relationships mapped across 66,000 lines
No drift, no confusion, still had room to think
GLM-5.2 is now live on SiliconFlow🔥 Time to give it a try and show us what you build👇

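Summary: Zhipu's GLM-5.2 is live on SiliconFlow and fully open-source. After being given the full text of War and Peace (~750K tokens), the model analyzed it and built an interactive 3D character universe with 27 characters, 9 factions, and ~50 mapped relationships (66,000 lines of code), with no drift or confusion. GLM-5.2 ranks #1 among available models on CodeArena, supports a 1M context window, and offers production-grade coding on par with Opus 4.8, with dual thinking modes (max for depth, high for quality-cost balance). Pricing: $0.26/1.40/4.40 per 1M tokens for cached input/input/output.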
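
For a rough sense of what the quoted prices mean for a prompt of this size, the sketch below applies the per-1M-token rates from the post to a ~750K-token input. The `Pricing` class and `request_cost` function are hypothetical helpers, and the billing rule (cached tokens charged at the cache rate, the rest at the input rate) is an assumption rather than a documented SiliconFlow policy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """USD per 1M tokens."""
    cached_input: float
    uncached_input: float
    output: float


# Rates as quoted in the post: Input Cache / Input / Output.
GLM_5_2 = Pricing(cached_input=0.26, uncached_input=1.40, output=4.40)


def request_cost(pricing: Pricing, input_tokens: int,
                 output_tokens: int = 0, cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request under the assumed billing rule."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * pricing.uncached_input
             + cached_tokens * pricing.cached_input
             + output_tokens * pricing.output)
    return total / 1_000_000


if __name__ == "__main__":
    book = 750_000  # approximate token count given in the post
    print(f"first pass:  ${request_cost(GLM_5_2, book):.3f}")
    print(f"fully cached: ${request_cost(GLM_5_2, book, cached_tokens=book):.3f}")
```

Under these assumptions the book costs about $1.05 to send uncached and about $0.20 when fully cached, before any output tokens are counted.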

歸藏(guizang.ai)@op7418 · 6月17日39

Jimeng now has Seedance 2.0 Mini. It is a lot cheaper, so it is worth playing with.

🚨 AI News | TestingCatalog@testingcatalog · 6月17日59

XAI 🔥: Grok Imagine 1.5 Fast has been rolled out! It features a better quality and faster generation time. > 720p videos now render in about 25 seconds, down from 40+ in our previous model.

karminski-牙医@karminski3 · 6月17日73

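GLM-5.2 has just been officially released, and here are my hands-on test results! The conclusion first: the biggest improvement in this test is Agent capability, and it is a qualitative change. In the test, GLM-5.2 never searched for nearby locations and went straight to wherever it wanted to go, because it had memorized the map at the very start! None of the 20+ models I have tested before could do this. For example, when earlier models wanted to go to a battery-swap station, they all had to search for nearby stations first (which wastes a tool_call), while GLM-5.2 simply knew where the stations were and never used the search function. This ability to internalize the needed data into context up front and reason over it across the entire 1M context is truly astonishing. Beyond that, Agentic Coding on backend code also improved in this test, reaching second place on the overall leaderboard. The biggest weakness this test exposed is spatial understanding. Its strength is also its downfall: although it memorized the locations of the battery-swap stations, the station it went to was not the nearest one. It remembered the data but did not reason from its current position before using it. That is the biggest weakness, and I strongly suggest the team optimize it. #GLM52 #智谱 #智谱AI #AgenticCoding #长上下文能力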

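Summary: GLM-5.2 is officially released, and hands-on testing shows a qualitative change in its Agent capability. The model internalizes map data into its 1M context and knows the battery-swap station locations directly, never calling the search function; it is the only one of the 20+ models tested that can do this. Backend Agentic Coding improved to second place on the overall leaderboard. The weakness is spatial understanding: it remembers station locations but cannot reason out the nearest station from its current position.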

🚨 AI News | TestingCatalog@testingcatalog · 6月17日80

ZAI 🔥: GLM-5.2 by @Zai_org scored 51 points on the Artificial Analysis Intelligence Index and placed 4th! This makes GLM-5.2 the new SOTA open-weight model. Besides that, GLM-5.2 ranked second on Frontend Code Arena, after the currently unavailable Claude Fable 5. Should be ZOTA! 👀

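Summary: Z.ai has released GLM-5.2, which scores 51 on the Artificial Analysis Intelligence Index, ranking fourth and becoming the open-weight SOTA. Model size is the same as GLM-5.1 (744B total / 40B active parameters), with an 11-point gain on Intelligence Index v4.1. Scientific reasoning is notably stronger: CritPt +16% to 21%, HLE +12% to 40%, GPQA Diamond +3% to 89%. The context window grows to 1M tokens. API pricing is $1.4/$4.4/$0.26 per 1M input/output/cache-hit tokens, at roughly $0.46 per task, placing it on the intelligence vs. cost Pareto frontier. MIT license; already live on third-party platforms such as DeepInfra.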

数字生命卡兹克@Khazix0918 · 6月17日56

Zhipu is the GOAT! The official scores are finally out, and it really can go toe-to-toe with Opus 4.8.

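Summary: Zhipu has released GLM-5.2, an open-source model (MIT license) with significant improvements in coding and agentic tasks and a 1M context window. It offers two reasoning effort levels: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) balances performance and token efficiency. API pricing is unchanged from GLM-5.1. Official evaluations show its performance is competitive with Opus 4.8.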

DogeDesigner@cb_doge · 6月17日49

Grok Imagine Video 1.5 Fast nearly doubles video generation speed. It can create a 6-second, 720p video in around 25 seconds, down from over 40 seconds with the previous model. That’s a massive speed upgrade. Here's the comparison:

Orange AI@oran_ge · 6月17日71

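Zhipu's GLM 5.2 was officially open-sourced today. Its significance is that GLM 5.2 is the first open-source model whose coding ability reaches Opus level. We integrated it into Cola right away as a beta model for everyone to test. Model pricing matches the official pricing. Feel free to try it and send feedback.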

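Summary: Zhipu officially open-sourced GLM 5.2 today, the first open-source model whose coding ability reaches Opus level. It is now available in Cola as a beta model for testing, priced the same as the official API.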

DogeDesigner@cb_doge · 6月17日45

All these videos were created using Grok Imagine 1.5. Big upgrade. Huge jump in quality. 🚀

歸藏(guizang.ai)@op7418 · 6月17日72

You can add Zhipu GLM-5.2 yourself in Codepilot's model management.

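Summary: Zhipu GLM-5.2 is officially released and open-sourced, positioned for long-horizon tasks. The model has a stable 1M context window and introduces thinking-effort control. Architecturally it uses the IndexShare mechanism, in which every four sparse attention layers share the same indexer, cutting per-token compute by about 2.9x at million-token context. Users can now add GLM-5.2 in Codepilot's model management.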

SiliconFlow@SiliconFlowAI · 6月17日42

Code like a real G😎
Congrats to @Zai_org: GLM 5.2 ranks #1 among available models on CodeArena 💪 SiliconFlow is proud to be a T+0 launch partner🔥
💰 Input Cache/Input/Output: $0.26/1.40/4.40 per 1M tokens
📚 Usable 1M context for entire codebases and project-scale workflows
⚙️ Reliable long-horizon execution that stays on track through complex tasks
💪 Production-grade coding on par with Opus 4.8
🧠 Dual thinking modes: max for depth, high for quality-cost balance
And it's still fully open-source. Big shoutout to @Zai_org for keeping frontier models accessible to builders and the community 🙌 Get started today 👇

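Summary: Zhipu's GLM 5.2 ranks first among available models on the CodeArena coding evaluation. SiliconFlow is a launch-day partner, with pricing of $0.26/1.40/4.40 per 1M tokens for Input Cache/Input/Output. It supports a 1M context, offers reliable long-horizon task execution, and matches Opus 4.8 on coding performance. Dual thinking modes: max for depth, high for quality-cost balance. The model is fully open-source.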

Andrew Milich@milichab · 6月17日44

Imagine Video 1.5 delivers true motion, realistic environments, and consistent text across frames

Elon Musk@elonmusk · 6月17日56

Grok Imagine 1.5 is now in wide release

xAI@xai · 6月17日52

Grok Imagine Video 1.5 is here
Our new image-to-video model with sharper realism, better physics and faster generations 🧵 http://grok.com/imagine

karminski-牙医@karminski3 · 6月17日67

GLM-5.2 has been officially released! An evaluation video is coming shortly~

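Summary: Zhipu (Z.ai) has released the GLM-5.2 model, with significant improvements in coding and agentic tasks and a 1M context window. Two reasoning modes: GLM-5.2 (max) pursues peak performance, and GLM-5.2 (high) balances performance and token efficiency. Model weights are open-sourced under the MIT license, and API pricing matches GLM-5.1.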

歸藏(guizang.ai)@op7418 · 6月17日79

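Zhipu GLM-5.2 has been officially released and open-sourced, and its benchmark scores are pretty scary. Its core positioning is long-horizon tasks, with a stable 1M context, and the model also introduces thinking-effort control. At the architecture level, GLM-5.2 proposes the IndexShare mechanism: every four sparse attention layers share the same indexer, which cuts per-token compute by about 2.9x at million-token context.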

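Summary: Zhipu has released and open-sourced GLM-5.2, positioned for long-horizon tasks with a stable 1M-token context. It introduces thinking-effort control: GLM-5.2 max pursues peak performance, and GLM-5.2 high balances efficiency. The architecture uses the IndexShare mechanism, in which every four sparse attention layers share an indexer, cutting per-token compute by about 2.9x at a million tokens. Coding and agentic task performance improve significantly. Model weights are open-sourced under the MIT license, and API pricing matches GLM-5.1.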
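
The post describes IndexShare only as "every four sparse attention layers share the same indexer". The toy Python sketch below illustrates that sharing pattern and nothing more. It assumes, hypothetically, that an indexer is a step that picks the top-k context positions for sparse attention; `select_top_k`, `run_layers`, and the made-up score lists are illustrative, not GLM-5.2's actual implementation, and the sketch makes no attempt to reproduce the quoted 2.9x figure.

```python
import math

GROUP_SIZE = 4  # from the post: four sparse-attention layers share one indexer


def indexer_for_layer(layer: int, group_size: int = GROUP_SIZE) -> int:
    """Map a layer index to the indexer its group shares."""
    return layer // group_size


def indexer_runs(num_layers: int, group_size: int = GROUP_SIZE) -> int:
    """Indexer passes needed with sharing (without sharing it is num_layers)."""
    return math.ceil(num_layers / group_size)


def select_top_k(scores: list[float], k: int) -> list[int]:
    """Assumed indexer behavior: keep the k highest-scoring positions."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def run_layers(num_layers: int, scores_per_indexer: list[list[float]], k: int):
    """Run each indexer once and reuse its selection across its layer group."""
    cache: dict[int, list[int]] = {}
    selected_per_layer = []
    for layer in range(num_layers):
        idx = indexer_for_layer(layer)
        if idx not in cache:  # computed only for the first layer of a group
            cache[idx] = select_top_k(scores_per_indexer[idx], k)
        selected_per_layer.append(cache[idx])
    return selected_per_layer, len(cache)


if __name__ == "__main__":
    toy_scores = [[0.1, 0.9, 0.5, 0.7], [0.8, 0.2, 0.6, 0.4]]  # made-up values
    selections, runs = run_layers(8, toy_scores, k=2)
    print(f"{runs} indexer runs for 8 layers")
    print(selections)
```

In this toy setup the indexing step runs once per group of four layers instead of once per layer; how that translates into overall per-token compute depends on details the post does not give.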

Orange AI@oran_ge · 6月17日76

The significance of GLM 5.2 is that an open-source model's coding ability has reached Opus level for the first time.

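Summary: The open-source GLM-5.2 model has been released, with coding ability reaching Opus level for the first time. It improves significantly on coding and agentic tasks, supports a 1M context window, and offers two reasoning effort levels: GLM-5.2 (max) pursues peak performance, and GLM-5.2 (high) balances performance and token efficiency. It is open-sourced under the MIT license, and API pricing matches GLM-5.1.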

Berryxia.AI@berryxia · 6月17日73

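Folks, Claude Fable5 was pulled overnight, and GLM-5.2 has just been announced as free and open-source! The GLM-5.2 weights are open too, under the MIT license, with 1M context and big gains on Coding and Agent tasks. This time they focused on strengthening Agent capability for long-horizon tasks, covering large-scale code implementation, automated research, performance optimization, and complex debugging. In practice, Coding, Tool use, and Reasoning all improve noticeably over GLM-5.1, and it is especially steadier in scenarios that need long planning and multi-step execution. It also offers two reasoning modes, Max and High, so you can switch freely between peak performance and token efficiency. API pricing is the same as the previous generation, and there are clear gains in Slide generation, long-document processing, long-form writing, and long-context role-play. Most importantly, the weights are fully open, and the community has already verified its strength on benchmarks such as DeepSWE. This means developers with the resources can now run, locally or in self-hosted environments, the kind of long-context Agent that only closed-source large models used to handle reliably. We used to feel open-source models were still a step behind in truly hardcore Coding and Agent scenarios; now that gap has narrowed by another big step. PS: I just wish there were more compute… 😂 That's all I ask ……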

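Summary: GLM-5.2 open-sources its weights under the MIT license and supports a 1M context window. Compared with GLM-5.1, it improves noticeably on Coding, Tool use, and Reasoning, and is especially more stable on long-horizon Agent tasks (large-scale code implementation, automated research, performance optimization, complex debugging). It offers Max and High reasoning modes, focused on peak performance and token-efficiency balance respectively. API pricing is the same as the previous generation. The community has verified its capability on benchmarks such as DeepSWE. Slide generation, long-document processing, and role-play tasks also improve.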

Rohan Paul@rohanpaul_ai · 6月17日65

Catnip just dropped MaineCoon, a 22B real-time audio-visual foundation model that turns text prompts into a live character stream with synced speech, motion, and expression. The first streaming-native model of its kind. sub-second first frame, 47.5FPS on one H100, 30FPS on one RTX Pro 6000, and about 7x faster throughput than comparable audio-visual systems in its internal tests. The big deal is that a normal video generator can wait, revise, and render a finished clip, but a social interface has to move causally, remember its own imperfect past, and stay ahead of playback without breaking identity, voice, or rhythm.

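Summary: Catnip has launched MaineCoon, a 22B-parameter real-time audio-visual foundation model that turns text prompts into a live character stream with synced speech, motion, and expression, supporting interactions of unlimited length. As the first streaming-native model, MaineCoon achieves a sub-second first frame, 47.5FPS on a single H100 and 30FPS on a single RTX Pro 6000, with about 7x faster throughput than comparable audio-visual systems in internal tests. Unlike passive video generation, it responds causally in real time, remembers its own imperfect past, and keeps character identity, voice, and rhythm coherent, turning AI from turn-based replies into a real-time presence that is "there with you".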

elvis@omarsar0 · 6月17日56

Impressive if true! Better than Claude Fable 5? Wow! Design is really lacking in these frontier models, so I'm very curious to test GLM-5.2 myself. Testing this already on a few internal use cases and will report back on findings.

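Summary: Zhipu has released GLM-5.2, which jumped to #1 on the Design Arena evaluation with an Elo score of 1360, surpassing the now-removed Claude Fable 5, up 4 places and 27 Elo points. The model is open-weight. DAIR.AI founder Elvis Saravia says it is impressive if true and that he is already testing it on internal use cases and will report back.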

elvis@omarsar0 · 6月17日70

No time wasting on the frontier of open-weight models. GLM-5.2 looks impressive based on the results I've seen. Very curious to see how it holds on long-horizon tasks.

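Summary: Z.AI has released GLM-5.2 with open weights under the MIT license. The model improves significantly on coding and agentic tasks, supports a 1M context window, and has long-horizon capabilities. It offers two reasoning effort levels, GLM-5.2 (max) and GLM-5.2 (high), the latter balancing performance and token efficiency. API pricing is the same as GLM-5.1. DAIR.AI's Elvis Saravia calls it impressive among frontier open-weight models and is watching its performance on long-horizon tasks.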

Chubby♨️@kimmonismus · 6月17日83

Let's go, GLM-5.2 released as an open-weights model.
tl;dr
- 1M context window
- MIT-licensed open weights
- Stronger long-horizon coding agents
- Two reasoning modes: max and high
- Same API pricing as GLM-5.1
Zai says GLM-5.2 was trained specifically for large-scale implementation, automated research, performance optimization, and complex debugging. Open Source got a serious upgrade today!

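Summary: GLM-5.2 has been released as an open-weight model under the MIT license, with a 1M context window. It offers two reasoning modes: max (pushing the limits) and high (balancing performance and token efficiency). It improves significantly on coding and agentic tasks and was trained specifically for large-scale implementation, automated research, performance optimization, and complex debugging. API pricing matches GLM-5.1.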

🚨 AI News | TestingCatalog@testingcatalog · 6月17日77

ZAI 🔥: GLM-5.2 is now available on huggingface! > It comes with a 1M context window and 2 levels of reasoning effort, max and high. MIT license and same pricing as GLM-5.1. > GLM-5.2 scores 46.2% on DeepSWE, the SOTA score among open-weight models.

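Summary: ZAI has released GLM-5.2 on Hugging Face under the MIT open-source license, with the same API pricing as GLM-5.1. The model supports a 1M context window and offers two reasoning effort levels: max (peak performance) and high (balancing performance and token efficiency). It improves significantly on coding and AI agent tasks and has long-horizon capabilities. It scores 46.2% on the DeepSWE benchmark, the SOTA among open-weight models.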

Z.ai@Zai_org · 6月17日73

Introducing GLM-5.2: Frontier Intelligence, Open Weights
- Significant improvements in coding and agentic tasks
- Strong long-horizon capabilities with a 1M context window
- Two levels of reasoning effort: GLM-5.2 (max) pushes the limits, while GLM-5.2 (high) strikes a strong balance between performance and token efficiency
- MIT-licensed open weights
- Same API pricing as GLM-5.1
Tech Blog: http://z.ai/blog/glm-5.2
Weights: http://huggingface.co/zai-org/GLM-5.2
API: http://docs.z.ai/guides/llm/glm-5.2
Coding Plan: http://z.ai/subscribe
Chat: http://chat.z.ai

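Summary: Zhipu (Z.ai) has officially released GLM-5.2, opening the model weights under the MIT license. Compared with the previous generation, it improves significantly on coding and agentic tasks and supports a 1M context window. Two reasoning effort levels: GLM-5.2 (max) pursues peak performance, and GLM-5.2 (high) balances quality and token efficiency. API pricing matches GLM-5.1. The tech blog, weights, and API docs are all live.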

OpenRouter@OpenRouter · 6月17日53

GLM-5.2 from @Zai_org is live on OpenRouter! Z.ai's flagship for long-horizon tasks, now with a 1M-token context window capable of being reliable across long, messy coding-agent work.

🚨 AI News | TestingCatalog@testingcatalog · 6月17日34

OPENAI 🔥: ChatGPT is about to get a voice mode upgrade as a new “gpt-bidi-1” model has been spotted along with announcement updates. Soon 👀 h/t @M1Astra via DevMode

🚨 AI News | TestingCatalog@testingcatalog · 6月17日41

MISTRAL 🔥: A new “fat” model family has been teased to arrive this summer! The model will be open-weight and initially released in early access for key partners. > This will be the start of a new family of models, fat indeed, but sparse. We're opening up an early access program in July for key partners in research, government and the industry. > This model and upcoming ones will be open-weight. We believe this is critical for our customer confidence and for the research and developer communities. Le Chaton Fat soon? 👀

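Summary: Mistral has teased a new "fat" model family arriving this summer. The models are open-weight, with early access opening in July for key partners in research, government, and industry. Mistral describes the family as "fat indeed, but sparse" and stresses that open weights are critical for customer confidence and for the research and developer communities. Upcoming models will also be open-weight. The post also floats the name "Le Chaton Fat".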

StepFun@StepFun_ai · 6月17日51

Excited to see Step 3.7 Flash live via @novita_labs on @OpenRouter. Built for high-efficiency agent workloads, Step 3.7 Flash combines native multimodal understanding, strong agentic coding capabilities, reliable tool use, and web & visual search workflows for production AI agents. Thanks to the Novita team for helping expand the StepFun ecosystem.

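Summary: StepFun's Step 3.7 Flash is live on OpenRouter via Novita. The model is built for high-efficiency agent workloads, with native multimodal understanding, strong agentic coding, reliable tool use, and web and visual search workflows. The quoted post highlights its efficient multimodal reasoning and multi-step tool use, aimed mainly at coding and agentic applications.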

SiliconFlow@SiliconFlowAI · 6月16日65

Better Coding with Less Overthinking K2.7 Code takes K2.6's strong base and goes deep Meet @MoonshotAI Kimi K2.7 Code on SiliconFlow — coding-focused, agentic, purpose-built on K2.6. 💰 Cache Input/Input/Output: 0.19/0.94/4.00 per 1M tokens 💪Improved coding & agentic performance, approaches GPT5.5 & Opus 4.8 🧠Less overthinking: 30% lower reasoning-token usage vs K2.6 ⚙️Long-horizon coding: better instruction following, higher end-to-end task completion rates 32B Activated/ 1T Params | VLM | Interleaved Thinking | Multi-Step Tool Call Try it on SiliconFlow ⬇️

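Summary: SiliconFlow has launched Moonshot AI's Kimi K2.7 Code model. Built on K2.6, it focuses on coding and agentic tasks. 32B activated / 1T total parameters, VLM multimodal, with interleaved thinking and multi-step tool calls. Compared with K2.6, it uses 30% fewer reasoning tokens, reducing overthinking, and improves instruction following and completion rates on long-horizon coding tasks. Performance approaches GPT5.5 and Opus 4.8. Pricing: 0.19 cached input / 0.94 input / 4.00 output per 1M tokens.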

Ant Ling@AntLingAGI · 6月16日77

Ling & Ring 2.6 technical report is out, with two open-weight base models. We co-design model + system across architecture, training, and agentic capability: • 7:1 hybrid linear attention • KPop for stable agentic RL: SWE-bench Verified 76.28% • ~4× token efficiency

Alibaba Cloud@alibaba_cloud · 6月16日70

📣 Introducing the Qwen-Robot Suite: Qwen-RobotNav, Qwen-RobotManip, Qwen-RobotWorld, three foundation models, a full stack for embodied intelligence.
🧭 Qwen-RobotNav: the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal, object-goal, target tracking, autonomous driving
• Controllable observation protocol
• Tool interface for agentic systems
🤖 Qwen-RobotManip: the foundation of interaction.
• Unified state-action space across heterogeneous robots
• Camera-frame delta poses for coherent cross-embodiment training
• Pretrained on a 38,100+ hour open-source corpus
🌍 Qwen-RobotWorld: infinite worlds for physical agents.
• Single world model, 20+ embodiments
• Natural-language action interface
• Predicts physically grounded futures across manipulation, driving, and navigation
Each model is independently useful, and could be composed as physical-world tools. Together, they form the low-level toolkit for general-purpose agentic systems that don't just see the world, but act in it.
📷 Blog: https://qwen.ai/blog?id=qwen-robotsuite
📖 Report:
Qwen-RobotNav: https://qianwen-res.oss-accelerate.aliyuncs.com/qwenrobot/papers/Qwen_RobotNav.pdf
Qwen-RobotManip: https://qianwen-res.oss-accelerate.aliyuncs.com/qwenrobot/papers/Qwen_RobotManip.pdf
Qwen-RobotWorld: https://qianwen-res.oss-accelerate.aliyuncs.com/qwenrobot/papers/Qwen_RobotWorld.pdf

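Summary: Alibaba Cloud has launched the Qwen-Robot Suite with three foundation models. Qwen-RobotNav unifies five navigation tasks (instruction following, point-goal, object-goal, target tracking, autonomous driving) and provides a controllable observation protocol and a tool interface for agentic systems. Qwen-RobotManip unifies the state-action space across heterogeneous robots and is pretrained on a 38,100+ hour open-source corpus. Qwen-RobotWorld is a single world model supporting 20+ embodiments that predicts physically grounded futures across manipulation, driving, and navigation through a natural-language action interface. The three can be used independently or composed as low-level physical-world tools for general-purpose agentic systems.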

Qwen@Alibaba_Qwen · 6月16日72

📣 Introducing the Qwen-Robot Suite: Qwen-RobotNav, Qwen-RobotManip, Qwen-RobotWorld, three foundation models, a full stack for embodied intelligence.
🧭 Qwen-RobotNav: the gateway to mobility.
• Unifies 5 navigation tasks in one model: instruction following, point-goal, object-goal, target tracking, autonomous driving
• Controllable observation protocol
• Tool interface for agentic systems
🤖 Qwen-RobotManip: the foundation of interaction.
• Unified state-action space across heterogeneous robots
• Camera-frame delta poses for coherent cross-embodiment training
• Pretrained on a 38,100+ hour open-source corpus
🌍 Qwen-RobotWorld: infinite worlds for physical agents.
• Single world model, 20+ embodiments
• Natural-language action interface
• Predicts physically grounded futures across manipulation, driving, and navigation
Each model is independently useful, and could be composed as physical-world tools. Together, they form the low-level toolkit for general-purpose agentic systems that don't just see the world, but act in it.
📷 Blog: https://qwen.ai/blog?id=qwen-robotsuite
📖 Report:
Qwen-RobotNav: https://qianwen-res.oss-accelerate.aliyuncs.com/qwenrobot/papers/Qwen_RobotNav.pdf
Qwen-RobotManip: https://qianwen-res.oss-accelerate.aliyuncs.com/qwenrobot/papers/Qwen_RobotManip.pdf
Qwen-RobotWorld: https://qianwen-res.oss-accelerate.aliyuncs.com/qwenrobot/papers/Qwen_RobotWorld.pdf

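Summary: Qwen has launched the Qwen-Robot Suite with three foundation models. Qwen-RobotNav unifies 5 navigation tasks (instruction following, point-goal, object-goal, target tracking, autonomous driving) with a controllable observation protocol and an agent tool interface. Qwen-RobotManip provides a unified state-action space across heterogeneous robots, pretrained on a 38,100+ hour open-source corpus. Qwen-RobotWorld is a single world model supporting 20+ embodiments that predicts physical-world futures (manipulation, driving, navigation) through a natural-language action interface. The three models can be used independently or combined, forming the low-level toolkit for general-purpose agents.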

🚨 AI News | TestingCatalog@testingcatalog · 6月16日75

Cartesia shipped Sonic 3.5 and Ink 2, two models built to run as a single real-time voice stack, with transcription on one side and speech on the other. > Ink 2 ranks first for accuracy on Artificial Analysis's streaming speech-to-text board. > Sonic 3.5 places at the top of the real-time text-to-speech view at around 82ms to first audio.

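Summary: Cartesia has launched two models, Sonic 3.5 and Ink 2, as a single real-time voice stack, handling text-to-speech and speech-to-text respectively. Ink 2 ranks first on Artificial Analysis's streaming speech-to-text leaderboard. Sonic 3.5 tops the real-time text-to-speech view at around 82ms to first audio. Cartesia is now the only provider with the #1 model for both listening and speaking.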

小互@xiaohu · 6月16日69

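ByteDance has launched a slimmed-down version of Seedance 2.0: Seedance 2.0 Mini.
Compared with the original Seedance 2.0, Mini is: about 30% cheaper; 2x the speed of Seedance 2.0 Fast; about the same image quality as Fast.
API pricing is about $0.073/second, so a 30-second ad video costs about $2.19 to generate with Mini, about 30% cheaper than the original Seedance 2.0...
Where you can use it now: it supports both text-to-video and image-to-video, and is available through the CapCut App, Dreamina web, and desktop.
Limited-time offers: stacked with the promotions, Mini can be up to 55% cheaper than the original Seedance 2.0, in two tiers:
Pro users (existing users or new subscribers before June 21): from June 15 to July 22, generating 720P video with Mini consumes 33% fewer credits.
Buying a Pro plan through the CapCut App: up to 60% off.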

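Summary: ByteDance has launched Seedance 2.0 Mini, a slimmed-down version of Seedance 2.0 that is about 30% cheaper than the original and 2x the speed of the Fast version, with similar image quality. API pricing is about $0.073/second, so a 30-second ad video costs about $2.19. It supports text-to-video and image-to-video and is available through the CapCut App and Dreamina. Limited-time offers: Pro users spend 33% fewer credits on 720P video, and Pro plans bought through the CapCut App are up to 60% off; stacked, Mini is up to 55% cheaper than the original.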
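
The clip-cost arithmetic in the post can be checked with a few lines of Python. The constants come from the post; `clip_cost` and `implied_original_rate` are hypothetical helper names, and the original Seedance 2.0 rate is only an estimate back-derived from the "about 30% cheaper" claim, not a quoted price.

```python
MINI_USD_PER_SECOND = 0.073       # approximate API price quoted in the post
MINI_DISCOUNT_VS_ORIGINAL = 0.30  # "about 30% cheaper" than Seedance 2.0


def clip_cost(seconds: float, usd_per_second: float = MINI_USD_PER_SECOND) -> float:
    """Cost in USD of generating a clip of the given length, rounded to cents."""
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    return round(seconds * usd_per_second, 2)


def implied_original_rate(mini_rate: float = MINI_USD_PER_SECOND,
                          discount: float = MINI_DISCOUNT_VS_ORIGINAL) -> float:
    """Per-second rate the original would need for Mini to be `discount` cheaper."""
    return mini_rate / (1 - discount)


if __name__ == "__main__":
    print(f"30 s with Mini: ${clip_cost(30):.2f}")
    original = implied_original_rate()
    print(f"implied original rate: ${original:.4f}/s")
    print(f"30 s at that rate: ${clip_cost(30, original):.2f}")
```

A 30-second clip at $0.073/second comes to $2.19, matching the post; the implied original rate of roughly $0.104/second is an inference and should be checked against official pricing.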

Rohan Paul@rohanpaul_ai · 6月16日58

Pythagoras-Prover just made Lean theorem proving look far less dependent on giant models, with a 4B prover beating DeepSeek-Prover-V2-671B at MiniF2F Pass@32. Shows in formal reasoning, better data geometry can buy back an astonishing amount of scale. A theorem prover is not just a language model writing clever math; it is a machine trying to produce text that survives a compiler with no patience for style, confidence, or almost-right reasoning. The main trick is data efficiency: the team built about 800K Lean-verified examples, trained from easy to hard, then used LoRA so the model learned without updating every parameter.

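Summary: The Pythagoras-Prover team has released its smallest theorem prover, a 4B version, along with a first diffusion-model proof-of-concept version, both at only 4B parameters. On the MiniF2F test, the 4B model beats DeepSeek-Prover-V2-671B with 86.1% Pass@32; the 32B version reaches 89.8% Pass@32 and 92.6% Pass@2024, the current best results. The key is data efficiency: about 800K Lean-verified examples were built, training went from easy to hard, and LoRA fine-tuning avoided full-parameter updates. The model's context window is 8192 tokens. The models, data, and training pipeline will be open-sourced progressively.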
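
For readers unfamiliar with the Pass@32 metric: the post does not say how it was computed, but a common approach is the unbiased pass@k estimator from code-generation evaluation, 1 - C(n-c, k) / C(n, k), where n proof attempts are sampled per problem and c of them verify. The sketch below implements that estimator; `pass_at_k` and `benchmark_pass_at_k` are illustrative names, and treating this as the Pythagoras-Prover team's exact procedure would be an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimated probability that at least one of k samples is correct,
    given c correct samples out of n drawn for a single problem."""
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, each with n samples."""
    if not correct_counts:
        raise ValueError("need at least one problem")
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


if __name__ == "__main__":
    # Made-up counts for three problems with 4 attempts each.
    print(benchmark_pass_at_k([0, 1, 4], n=4, k=2))
```

When n equals k (for example, exactly 32 attempts per theorem), the estimator reduces to counting a problem as solved if any attempt verifies.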

Chubby♨️@kimmonismus · 6月16日31

Who needs Fable 5 when you got Le Chaton Fat by Mistral

小互@xiaohu · 6月15日60

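Folks, this is seriously impressive. Agentic Detection: a visual detection model where you describe the target in one sentence and the AI precisely boxes it in the image. You just give it a photo and describe what you are looking for; it draws bounding boxes around the matching objects and tells you what is in each box. And you do not need to train it in advance... It can also handle detection that requires physical reasoning. For example: ask where "the source of the smoke" is, and it reasons over the whole scene and locates the ignition point of a forest fire; say "utility poles that need repair", and it picks out the deformed power infrastructure; ask "which parking spaces are empty", and it finds and marks them.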

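Summary: Perceptron has launched the Agentic Detection visual detection model. Users provide an image and describe the target in natural language, and it boxes and classifies the matches automatically with no prior training. The model can also handle detection tasks requiring physical reasoning, such as locating the ignition point of a forest fire ("the source of the smoke"), picking out deformed utility poles ("utility poles that need repair"), and marking empty parking spaces. The quoted post notes that the model can localize any object described in natural language or by example.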

Berryxia.AI@berryxia · 6月15日60

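A 12B local model has had Fable 5's reasoning chains distilled straight into it, so you can now run top-tier coding ability offline on a consumer GPU. This Gemma 4 12B Coder GGUF is fine-tuned from Google's gemma-4-12B-it, specifically for code generation and complex reasoning. The training data uses real passing cases from Composer 2.5, with Fable 5 helping fill in the hard cases, so every reasoning step leads toward code that actually runs. Best of all, it ships in GGUF format, so it runs smoothly on a 12GB GPU and even works on CPU. Debugging, code completion, generating complex algorithms, chain-of-thought prompting: all done locally, with no API fees and no worries about export controls. People used to think frontier models either had to be used in the cloud or could not be run at all; now the open-source community has packaged Fable 5's way of thinking into a version that fits in your laptop. The model is still iterating quickly, downloads have passed six thousand, and community feedback says it holds up especially well in local coding scenarios. This closes the gap between "powerful but restricted" and "locally usable". Real AI productivity never comes from waiting for big companies to grant access; it comes from the community unlocking the capability itself.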

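Summary: Berry Xia introduces the Gemma 4 12B Coder GGUF model, fine-tuned from Google's gemma-4-12B-it. It distills Fable 5's reasoning chains into a 12B-parameter model, with training data drawn from real passing cases from Composer 2.5 and filled in with help from Fable 5. The GGUF format lets the model run locally on a 12GB consumer GPU, and even on CPU. It is optimized for code generation, debugging, complex algorithms, and chain-of-thought prompting, with no API fees and no export restrictions. The model is based on Google's latest gemma-4 architecture, has passed six thousand downloads, and community feedback reports strong performance in local coding scenarios, closing the gap between cloud models and local availability.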