Another first for our AI fleet... a supercomputing cluster of NVIDIA GB300s with 4600+ GPUs and featuring next gen InfiniBand. First of many as we scale to hundreds of thousands of GB300s across our DCs, and rethink every layer of the stack across silicon, systems, and software to support next gen AI workloads.

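Summary: The first NVIDIA GB300 supercomputing cluster is now in service, with more than 4,600 GPUs and next-generation InfiniBand networking. This is only the start: the plan is to scale to hundreds of thousands of GB300s and to rework the full stack, from silicon and systems to software, to support next-generation AI workloads.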

Sundar Pichai @sundarpichai · Oct 7

We’re investing $4B in cloud and AI infrastructure in Arkansas through 2027, which includes a new data center in West Memphis, our first in the state. All part of our overall investments to help the US continue to lead the world in AI innovation. https://blog.google/inside-google/company-announcements/google-american-innovation-arkansas

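Summary: Google plans to invest $4 billion in cloud and AI infrastructure in Arkansas through 2027, including its first data center in the state, in West Memphis, to help the US stay ahead in global AI innovation.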

Satya Nadella @satyanadella · Oct 3

Our approach to AI infra is simple: build the most fungible and flexible fleet to meet the real world's needs across inference and training as @scottgu shared with @Kantrowitz. And we are already doing it at scale today, as we power the biggest AI workloads like Copilot and ChatGPT, APIs that power 3P products & enterprise workloads and high scale training.

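Summary: The core strategy is to build a highly fungible, flexible AI infrastructure fleet that serves both inference and training. It already supports Copilot, ChatGPT, and third-party and enterprise workloads at scale.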

Lilian Weng @lilianweng · Oct 2

GPUs are expensive and setting up the infrastructure to make GPUs work for you properly is complex, making experimentation on cutting-edge models challenging for researchers and ML practitioners. Providing high quality research tooling is one of the most effective ways to improve research productivity of the wider community and Tinker API is one step towards our mission there. Tinker API is built on top of our experimental results on fine-tuning with LoRA: https://thinkingmachines.ai/blog/lora/ Beta starts and you can join the waitlist today: https://thinkingmachines.ai/tinker/

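Summary: GPUs are expensive, and setting up the infrastructure to use them properly is complex, which makes it hard for researchers and ML practitioners to experiment with cutting-edge models.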
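Tinker API is said to build on LoRA fine-tuning results. As a reminder of what LoRA does, here is a minimal pure-Python sketch, not Tinker's API: the pretrained weight `W` (d x k) stays frozen, and only two small factors `B` (d x r) and `A` (r x k) are trained, giving an effective weight `W + (alpha / r) * B @ A`. The function names and the row-vector convention (`x` is n x d) are assumptions made for illustration.

```python
def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_forward(x, W, A, B, alpha):
    """Compute x @ (W + (alpha / r) * B @ A) without materializing B @ A."""
    r = len(A)                            # LoRA rank = number of rows in A
    base = matmul(x, W)                   # frozen pretrained path, n x k
    update = matmul(matmul(x, B), A)      # low-rank path: n x r, then n x k
    scale = alpha / r
    return [[b + scale * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]

def lora_param_counts(d, k, r):
    """Trainable parameters: (full fine-tuning, LoRA adapters)."""
    return d * k, r * (d + k)

# A 4096 x 4096 projection with rank-16 adapters trains 128x fewer parameters.
print(lora_param_counts(4096, 4096, 16))  # (16777216, 131072)
```

LoRA conventionally initializes `B` to zeros, so the adapted model starts out identical to the base model; training then moves only `A` and `B`.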

Satya Nadella @satyanadella · Oct 2

AI Economics depends on efficient token factories and highly performant agent frameworks that deliver enterprise outcomes! That is why we are excited about Microsoft Agent Framework. You can now build, orchestrate, and scale multi-agent systems in Azure AI Foundry using this framework. It brings together our best-in-class runtime from AutoGen with the enterprise foundations of Semantic Kernel, with compliance, observability, and deep integration out of the box.

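Summary: Microsoft has released Microsoft Agent Framework for building, orchestrating, and scaling multi-agent systems in Azure AI Foundry. It combines the AutoGen runtime with the enterprise foundations of Semantic Kernel and provides compliance, observability, and deep integration out of the box.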

Satya Nadella @satyanadella · Sep 30

Welcome Grok 4 to Azure AI Foundry!

Summary: xAI's Grok 4 model is now available in Azure AI Foundry, offering advanced reasoning, real-time insights, and enhanced memory, with Azure providing the underlying compute.

Sam Altman @sama · Sep 24

Progress at our datacenter in Abilene. Fun to visit yesterday!

Summary: Construction at the Abilene datacenter is making progress. The site visit yesterday was enjoyable, and the project appears to be moving along well.

Hao AI Lab @haoailab · Sep 24

[1/N]🚀New decoding paradigm drop!🚀 Introducing Lookahead Reasoning(LR): step-level speculation that stacks with Speculative Decoding(SD). It has been accepted to #NeurIPS2025 🎉 📖 Blog: https://hao-ai-lab.github.io/blogs/lookaheadreasoning/ 💻 Code: https://github.com/hao-ai-lab/LookaheadReasoning 📄 Paper: https://arxiv.org/abs/2506.19830

Summary: [1/N] 🚀 A new decoding paradigm is out! 🚀
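Per the post, Lookahead Reasoning speculates on whole reasoning steps rather than single tokens, which is why it can stack with token-level speculative decoding. Below is a minimal sketch of one speculation round under stated assumptions: `draft_next` and `target_next` are hypothetical callables that each return one reasoning step, and `equivalent` stands in for a semantic verifier (the test uses exact string match). The linked paper's scheme may differ in its details.

```python
from typing import Callable, List

Step = str

def lookahead_round(
    prefix: List[Step],
    draft_next: Callable[[List[Step]], Step],
    target_next: Callable[[List[Step]], Step],
    equivalent: Callable[[Step, Step], bool],
    gamma: int = 3,
) -> List[Step]:
    """One round of step-level speculation; returns the steps to append."""
    # 1. The cheap draft model proposes gamma future steps, one after another.
    drafts: List[Step] = []
    for _ in range(gamma):
        drafts.append(draft_next(prefix + drafts))
    # 2. The target model writes its own step after each drafted prefix.
    #    A real system would run these gamma calls as one parallel batch,
    #    and each step's tokens could themselves use speculative decoding.
    targets = [target_next(prefix + drafts[:i]) for i in range(gamma)]
    # 3. Keep drafted steps while the verifier accepts them; at the first
    #    mismatch, keep the target's step instead and end the round.
    accepted: List[Step] = []
    for d, t in zip(drafts, targets):
        if equivalent(d, t):
            accepted.append(d)
        else:
            accepted.append(t)
            break
    return accepted
```

Each round therefore always makes progress (at least the target's own step) and, when the draft agrees, advances several steps for roughly the latency of one batched target call.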

Satya Nadella @satyanadella · Sep 23

An important breakthrough from our teams: a new approach to liquid cooling that uses microfluidics, opening the door to more efficient, sustainable, and power-dense datacenters than conventional methods. https://news.microsoft.com/source/features/innovation/microfluidics-liquid-cooling-ai-chips/

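Summary: Microsoft's teams have developed a new liquid-cooling approach based on microfluidics that is more efficient and sustainable than conventional methods, supports more power-dense datacenters, and opens a new path for cooling AI chips.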

Sam Altman @sama · Sep 23

Grateful to Jensen for the almost-decade of partnership!

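Summary: OpenAI and NVIDIA announced a partnership to build gigascale AI factories, deploying millions of NVIDIA GPUs and providing 10 gigawatts of compute for OpenAI's datacenter expansion. The two companies have worked together for nearly a decade.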

Hao AI Lab @haoailab · Sep 22

🚀 Thrilled to share that our lab has THREE papers accepted at #NeurIPS2025 on AI efficiency from reasoning to video generation. Come hang out with us, it's going to be a lot of fun this year here local to UCSD! 😎 📊 Efficiently Scaling LLM Reasoning with Certaindex Introduces Certaindex, an algorithm-agnostic metric measuring evolving stability that signals when further computation won't change results, plus Dynasor serving system achieving up to 50% compute savings and 3.3x higher efficiency 📎 https://arxiv.org/abs/2412.20993 @FuYichao123 @Junda_Chen_ ⚡ Scaling Speculative Decoding with Lookahead Reasoning Exploits step-level parallelism to overcome token-level speculative decoding limitations, boosting speedup from 1.4x to 2.1x on GSM8K 📎 https://arxiv.org/abs/2506.19830 @FuYichao123 🎥 VSA: Faster Video Diffusion with Trainable Sparse Attention is a hardware-efficient sparse attention for video DiTs that cuts training FLOPS by 2.53× with zero loss in diffusion quality 📎 https://arxiv.org/abs/2505.13389 @PY_Z001 @BrianChen112900 Congrats to all collaborators! 🎉

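Summary: 🚀 Thrilled to share that our lab has three papers accepted at #NeurIPS2025, on AI efficiency from reasoning to video generation. Come hang out with us; this year the conference is local to UCSD, so it's going to be a lot of fun! 😎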
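The post describes Certaindex as a stability signal that indicates when further computation will not change the result. A minimal sketch of that idea, using one simple proxy rather than the paper's exact definition: probe several reasoning paths for their current answer, measure how many agree with the most common answer, and stop allocating compute once agreement clears a threshold. The 0.8 threshold and all function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional, Sequence

def agreement(answers: Sequence[str]) -> float:
    """Fraction of probed answers that match the most common answer."""
    if not answers:
        return 0.0
    _, top = Counter(answers).most_common(1)[0]
    return top / len(answers)

def should_stop(answers: Sequence[str], threshold: float = 0.8) -> bool:
    """Stop spending compute once the probed answers have stabilized."""
    return agreement(answers) >= threshold

def first_stable_round(rounds: List[List[str]], threshold: float = 0.8) -> Optional[int]:
    """Index of the first probing round whose answers clear the threshold."""
    for i, answers in enumerate(rounds):
        if should_stop(answers, threshold):
            return i
    return None
```

A serving system built on such a signal could reclaim compute from requests that have already converged and give it to those that have not; how Dynasor actually schedules this is described in the paper, not here.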

Satya Nadella @satyanadella · Sep 3

Our breakthrough work on an analog optical computer points to new ways to solve complex real-world problems with much greater efficiency. Super to see this published today in @Nature. https://news.microsoft.com/source/features/innovation/microsoft-analog-optical-computer-cracks-two-practical-problems-shows-ai-promise/

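Summary: Microsoft's analog optical computer work was published today in Nature. It can solve complex real-world problems with much greater efficiency, has cracked two practical problems, and shows promise for AI.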

Hao AI Lab @haoailab · Aug 5

Try FastWan at https://fastwan.fastvideo.org/!

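Summary: The FastVideo team introduced FastWan, a series of fast video generation models trained with a new method called "sparse distillation" that speeds up video denoising by 70x. On a single H200 GPU, a 5-second video can be generated in 5 seconds. The team offers a live demo and has fully open-sourced the models, code, and data under the Apache-2.0 license.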

Hao AI Lab @haoailab · Aug 5

(1/n) 🚀 With FastVideo, you can now generate a 5-second video in 5 seconds on a single H200 GPU! Introducing FastWan series, a family of fast video generation models trained via a new recipe we term as “sparse distillation”, to speed up video denoising time by 70X! 🖥️ Live demo: https://fastwan.fastvideo.org/ (Thanks to @gmicloud for the support!) 🔗 Blog: https://hao-ai-lab.github.io/blogs/fastvideo_post_training/ 🔓 We fully open-source our models, code, and data with Apache-2.0 licenses

Summary: (1/n) 🚀 With FastVideo, a 5-second video can now be generated in 5 seconds on a single H200 GPU!

Yann LeCun @ylecun · Jul 19

Hardware independent LLM inference engine from ZML.

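Summary: ZML released a technology preview of LLMD, a hardware-independent LLM inference solution. A single container supports both NVIDIA and AMD GPUs, the image is only 2.4 GB, and it supports high-performance, mount-and-run deployment.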

DeepSeek @deepseek_ai · Feb 28

🚀 Day 5 of #OpenSourceWeek: 3FS, Thruster for All DeepSeek Data Access Fire-Flyer File System (3FS) - a parallel file system that utilizes the full bandwidth of modern SSDs and RDMA networks. ⚡ 6.6 TiB/s aggregate read throughput in a 180-node cluster ⚡ 3.66 TiB/min throughput on GraySort benchmark in a 25-node cluster ⚡ 40+ GiB/s peak throughput per client node for KVCache lookup 🧬 Disaggregated architecture with strong consistency semantics ✅ Training data preprocessing, dataset loading, checkpoint saving/reloading, embedding vector search & KVCache lookups for inference in V3/R1 📥 3FS → https://github.com/deepseek-ai/3FS ⛲ Smallpond - data processing framework on 3FS → https://github.com/deepseek-ai/smallpond

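Summary: DeepSeek released 3FS (Fire-Flyer File System), an open-source parallel file system optimized for modern SSDs and RDMA networks. A 180-node cluster reaches 6.6 TiB/s aggregate read throughput, a 25-node cluster reaches 3.66 TiB/min on the GraySort benchmark, and KVCache lookups peak above 40 GiB/s per client node. It uses a disaggregated architecture with strong consistency semantics and supports training data preprocessing, checkpoint saving and reloading, and KVCache lookups for V3/R1 inference. The Smallpond data processing framework was open-sourced alongside it.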
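To put the 3FS cluster figures on a per-node basis, the arithmetic below divides each aggregate by the node count. It assumes throughput is spread evenly across the quoted nodes (the post gives only the cluster size, not how it splits between storage and client nodes), and the GraySort figure measures end-to-end sort throughput, not raw reads, so the two averages are not directly comparable.

```python
TIB_IN_GIB = 1024  # 1 TiB = 1024 GiB

def per_node_gib_per_s(aggregate_tib_per_s: float, nodes: int) -> float:
    """Average share of an aggregate throughput, in GiB/s per node."""
    return aggregate_tib_per_s * TIB_IN_GIB / nodes

read_per_node = per_node_gib_per_s(6.6, 180)         # 6.6 TiB/s over 180 nodes
sort_per_node = per_node_gib_per_s(3.66 / 60, 25)    # 3.66 TiB/min -> TiB/s, 25 nodes
print(f"read: {read_per_node:.1f} GiB/s/node, GraySort: {sort_per_node:.2f} GiB/s/node")
# prints: read: 37.5 GiB/s/node, GraySort: 2.50 GiB/s/node
```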

DeepSeek @deepseek_ai · Feb 27

🚀 Day 4 of #OpenSourceWeek: Optimized Parallelism Strategies ✅ DualPipe - a bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training. 🔗 https://github.com/deepseek-ai/DualPipe ✅ EPLB - an expert-parallel load balancer for V3/R1. 🔗 https://github.com/deepseek-ai/eplb 📊 Analyze computation-communication overlap in V3/R1. 🔗 https://github.com/deepseek-ai/profile-data

Summary: 🚀 Day 4 of #OpenSourceWeek covers optimized parallelism strategies: DualPipe, a bidirectional pipeline-parallelism algorithm that overlaps computation and communication in V3/R1 training; EPLB, an expert-parallel load balancer for V3/R1; and profiling data for analyzing computation-communication overlap in V3/R1.
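EPLB is described only as an expert-parallel load balancer. To illustrate the underlying problem, here is a minimal greedy placement sketch (longest-processing-time style): experts are taken heaviest first and each goes to the currently least-loaded GPU that still has a free slot, so every GPU hosts the same number of experts. This is not EPLB's actual algorithm; the per-expert loads and function names are hypothetical.

```python
import heapq
from typing import Dict, List

def balanced_packing(loads: List[float], num_gpus: int) -> Dict[int, List[int]]:
    """Assign experts to GPUs: equal expert count per GPU, heaviest experts
    first, each to the least-loaded GPU that still has a free slot."""
    if len(loads) % num_gpus:
        raise ValueError("expert count must be divisible by num_gpus")
    slots = len(loads) // num_gpus
    placement: Dict[int, List[int]] = {g: [] for g in range(num_gpus)}
    heap = [(0.0, g) for g in range(num_gpus)]  # (accumulated load, gpu id)
    for expert in sorted(range(len(loads)), key=lambda e: -loads[e]):
        load, gpu = heapq.heappop(heap)
        placement[gpu].append(expert)
        if len(placement[gpu]) < slots:         # full GPUs leave the heap
            heapq.heappush(heap, (load + loads[expert], gpu))
    return placement

def gpu_loads(loads: List[float], placement: Dict[int, List[int]]) -> List[float]:
    """Total estimated load hosted on each GPU."""
    return [sum(loads[e] for e in placement[g]) for g in sorted(placement)]

loads = [10, 1, 1, 8, 2, 6]                     # hypothetical per-expert token counts
plan = balanced_packing(loads, num_gpus=2)
print(plan, gpu_loads(loads, plan))             # {0: [0, 4, 1], 1: [3, 5, 2]} [13, 15]
```

Greedy packing cannot split a single hot expert across GPUs; when one expert dominates, replicating it is the usual remedy, which is beyond this sketch.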

DeepSeek @deepseek_ai · Feb 26

🚨 Off-Peak Discounts Alert! Starting today, enjoy off-peak discounts on the DeepSeek API Platform from 16:30–00:30 UTC daily: 🔹 DeepSeek-V3 at 50% off 🔹 DeepSeek-R1 at a massive 75% off Maximize your resources smarter — save more during these high-value hours!

Summary: 🚨 Off-peak discounts: starting today, the DeepSeek API Platform offers discounts daily from 16:30 to 00:30 UTC, with DeepSeek-V3 at 50% off and DeepSeek-R1 at 75% off.
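The off-peak window 16:30–00:30 UTC wraps past midnight, so a naive `start <= t < end` check would never match. A minimal sketch that treats the window as the union of two ranges and returns the fraction of the normal price paid. It assumes the start is inclusive and the end exclusive, and the `"V3"`/`"R1"` keys are hypothetical labels, not API model names; base prices are not given in the post.

```python
from datetime import datetime, time, timezone

OFF_PEAK_START = time(16, 30)  # UTC
OFF_PEAK_END = time(0, 30)     # UTC, on the following day
# Fraction of the normal price paid off-peak: 50% off and 75% off.
OFF_PEAK_MULTIPLIER = {"V3": 0.5, "R1": 0.25}

def is_off_peak(ts: datetime) -> bool:
    """True if a timezone-aware timestamp falls within 16:30-00:30 UTC."""
    t = ts.astimezone(timezone.utc).time()
    # The window crosses midnight, so it is [16:30, 24:00) plus [00:00, 00:30).
    return t >= OFF_PEAK_START or t < OFF_PEAK_END

def price_multiplier(model: str, ts: datetime) -> float:
    """Fraction of the normal price charged for `model` at time `ts`."""
    return OFF_PEAK_MULTIPLIER[model] if is_off_peak(ts) else 1.0
```

Pass timezone-aware datetimes: `astimezone` interprets a naive datetime as local time, which would silently shift the window.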

DeepSeek @deepseek_ai · Feb 26

🚀 Day 3 of #OpenSourceWeek: DeepGEMM Introducing DeepGEMM - an FP8 GEMM library that supports both dense and MoE GEMMs, powering V3/R1 training and inference. ⚡ Up to 1350+ FP8 TFLOPS on Hopper GPUs ✅ No heavy dependency, as clean as a tutorial ✅ Fully Just-In-Time compiled ✅ Core logic at ~300 lines - yet outperforms expert-tuned kernels across most matrix sizes ✅ Supports dense layout and two MoE layouts 🔗 GitHub: https://github.com/deepseek-ai/DeepGEMM

Summary: 🚀 Day 3 of #OpenSourceWeek introduces DeepGEMM, an FP8 GEMM library for both dense and MoE GEMMs that powers V3/R1 training and inference. It reaches 1350+ FP8 TFLOPS on Hopper GPUs, has no heavy dependencies, is fully Just-In-Time compiled, and keeps its core logic to about 300 lines while outperforming expert-tuned kernels across most matrix sizes. It supports the dense layout and two MoE layouts.
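FP8 GEMMs keep accuracy by pairing low-precision values with higher-precision scale factors. A minimal sketch of one common scheme, assumed here rather than taken from the post: each block of values gets its own FP32 scale so that the block's largest magnitude maps to 448, the maximum finite FP8 E4M3 value, and the scaled values are rounded to the E4M3 grid (3 mantissa bits). The block size of 128 and all function names are assumptions, and real kernels apply the scales inside the GEMM accumulation rather than dequantizing up front.

```python
import math
from typing import List, Tuple

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value

def round_to_e4m3(x: float) -> float:
    """Round to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    a = min(abs(x), E4M3_MAX)
    exp = max(math.floor(math.log2(a)), -6)  # below 2**-6 the grid is subnormal
    step = 2.0 ** (exp - 3)                  # spacing with 3 mantissa bits
    return sign * min(round(a / step) * step, E4M3_MAX)

def quantize_blocks(row: List[float], block: int = 128) -> Tuple[List[float], List[float]]:
    """Per-block scaling: each block of `block` values gets its own scale."""
    q: List[float] = []
    scales: List[float] = []
    for i in range(0, len(row), block):
        chunk = row[i:i + block]
        amax = max(abs(v) for v in chunk) or 1.0  # avoid dividing by zero
        s = amax / E4M3_MAX
        scales.append(s)
        q.extend(round_to_e4m3(v / s) for v in chunk)
    return q, scales

def dequantize_blocks(q: List[float], scales: List[float], block: int = 128) -> List[float]:
    """Undo the scaling: multiply each value by its block's scale."""
    return [v * scales[i // block] for i, v in enumerate(q)]
```

Because each block has its own scale, one outlier only costs precision within its own block instead of across the whole tensor.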

DeepSeek @deepseek_ai · Feb 25

🚀 Day 2 of #OpenSourceWeek: DeepEP Excited to introduce DeepEP - the first open-source EP communication library for MoE model training and inference. ✅ Efficient and optimized all-to-all communication ✅ Both intranode and internode support with NVLink and RDMA ✅ High-throughput kernels for training and inference prefilling ✅ Low-latency kernels for inference decoding ✅ Native FP8 dispatch support ✅ Flexible GPU resource control for computation-communication overlapping 🔗 GitHub: https://github.com/deepseek-ai/DeepEP

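Summary: On day 2 of its open-source week, DeepSeek released DeepEP, the first open-source EP (expert parallelism) communication library for MoE model training and inference. It provides optimized all-to-all communication over NVLink and RDMA, high-throughput kernels for training and inference prefilling, and low-latency kernels for decoding. It also supports native FP8 dispatch and flexible GPU resource control for overlapping computation and communication, improving MoE efficiency.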
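In expert parallelism, the "dispatch" step of an all-to-all sends each token to the ranks hosting the experts it was routed to. A minimal sketch of planning that exchange on one source rank, not DeepEP's API: it assumes experts are laid out contiguously with `experts_per_rank` per rank, and it sends a token to a rank only once even if several of its top-k experts live there (a design choice of this sketch). All names are hypothetical.

```python
from typing import Dict, List

def dispatch_plan(topk_experts: List[List[int]], experts_per_rank: int,
                  num_ranks: int) -> Dict[int, List[int]]:
    """Map each destination rank to the token ids it must receive."""
    plan: Dict[int, List[int]] = {r: [] for r in range(num_ranks)}
    for token, experts in enumerate(topk_experts):
        # Deduplicate: one copy per destination rank, however many experts match.
        for rank in sorted({e // experts_per_rank for e in experts}):
            plan[rank].append(token)
    return plan

def send_counts(plan: Dict[int, List[int]]) -> List[int]:
    """Tokens sent to each rank: the buffer sizes exchanged in the all-to-all."""
    return [len(plan[r]) for r in sorted(plan)]

# Three tokens, top-2 routing, 8 experts spread over 4 ranks (2 per rank).
plan = dispatch_plan([[0, 5], [1, 2], [6, 7]], experts_per_rank=2, num_ranks=4)
print(plan, send_counts(plan))  # {0: [0, 1], 1: [1], 2: [0], 3: [2]} [2, 1, 1, 1]
```

The per-rank counts are typically exchanged first so every receiver can size its buffers before the token payloads move.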

DeepSeek @deepseek_ai · Feb 24

🚀 Day 1 of #OpenSourceWeek: FlashMLA Honored to share FlashMLA - our efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and now in production. ✅ BF16 support ✅ Paged KV cache (block size 64) ⚡ 3000 GB/s memory-bound & 580 TFLOPS compute-bound on H800 🔗 Explore on GitHub: https://github.com/deepseek-ai/FlashMLA

Summary: 🚀 Day 1 of #OpenSourceWeek: FlashMLA, an efficient MLA decoding kernel for Hopper GPUs, optimized for variable-length sequences and already in production. It supports BF16 and a paged KV cache with block size 64, and reaches 3000 GB/s when memory-bound and 580 TFLOPS when compute-bound on H800.
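A paged KV cache stores each sequence in fixed-size pages (64 tokens here, per the post) and keeps a per-sequence block table mapping logical positions to physical pages, so variable-length sequences share one memory pool without fragmentation. A toy sketch of that bookkeeping only; it stores no tensors, and the class and method names are hypothetical rather than FlashMLA's interface.

```python
from typing import Dict, List, Tuple

BLOCK_SIZE = 64  # tokens per KV-cache page

class PagedKVCache:
    """Toy block-table bookkeeping for a paged KV cache."""

    def __init__(self, num_blocks: int):
        self.free: List[int] = list(range(num_blocks))  # free physical pages
        self.tables: Dict[int, List[int]] = {}          # seq id -> page ids
        self.lengths: Dict[int, int] = {}               # seq id -> token count

    def append_token(self, seq: int) -> Tuple[int, int]:
        """Reserve a slot for one new token; return (physical page, offset)."""
        n = self.lengths.get(seq, 0)
        table = self.tables.setdefault(seq, [])
        if n % BLOCK_SIZE == 0:                  # last page full, or no page yet
            if not self.free:
                raise MemoryError("KV cache out of blocks")
            table.append(self.free.pop(0))
        self.lengths[seq] = n + 1
        return table[n // BLOCK_SIZE], n % BLOCK_SIZE

    def release(self, seq: int) -> None:
        """Return a finished sequence's pages to the free list."""
        self.free.extend(self.tables.pop(seq, []))
        self.lengths.pop(seq, None)
```

A decoding kernel then reads token `p` of a sequence from page `table[p // 64]` at offset `p % 64`, which is why the block table is passed to the kernel alongside the cache.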

DeepSeek @deepseek_ai · Feb 21

🚀 Day 0: Warming up for #OpenSourceWeek! We're a tiny team @deepseek_ai exploring AGI. Starting next week, we'll be open-sourcing 5 repos, sharing our small but sincere progress with full transparency. These humble building blocks in our online service have been documented, deployed and battle-tested in production. As part of the open-source community, we believe that every line shared becomes collective momentum that accelerates the journey. Daily unlocks are coming soon. No ivory towers - just pure garage-energy and community-driven innovation.

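Summary: DeepSeek announced its open-source week: starting next week, it will open-source five code repositories. As a small team exploring AGI, it plans to share, with full transparency, modules already battle-tested in production. The team believes the collective power of the open-source community accelerates progress and frames the release as garage-energy, community-driven innovation rather than ivory-tower development.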

DeepSeek @deepseek_ai · Feb 18

🚀 Introducing NSA: A Hardware-Aligned and Natively Trainable Sparse Attention mechanism for ultra-fast long-context training & inference! Core components of NSA: • Dynamic hierarchical sparse strategy • Coarse-grained token compression • Fine-grained token selection 💡 With optimized design for modern hardware, NSA speeds up inference while reducing pre-training costs—without compromising performance. It matches or outperforms Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning. 📖 For more details, check out our paper here: https://arxiv.org/abs/2502.11089

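Summary: NSA is a hardware-aligned, natively trainable sparse attention mechanism designed for ultra-fast long-context training and inference. It uses a dynamic hierarchical sparse strategy that combines coarse-grained token compression with fine-grained token selection. Optimized for modern hardware, NSA speeds up inference and lowers pre-training cost without losing performance, matching or outperforming Full Attention models on general benchmarks, long-context tasks, and instruction-based reasoning.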
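The coarse-to-fine idea behind NSA's compression and selection can be sketched for a single query: compress each block of keys into one vector, score the blocks against the query, keep the top-scoring blocks, and run ordinary attention only over the tokens inside them. This is a simplified illustration, not NSA itself: mean pooling stands in for a learned compression, and NSA's other components (such as combining branches with learned gates) are omitted. Block size, `top_n`, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vec = List[float]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def softmax_attend(q: Vec, keys: List[Vec], values: List[Vec]) -> Vec:
    """Standard scaled dot-product attention for a single query."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)                              # subtract max for stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return [sum(w * v[i] for w, v in zip(weights, values)) / total
            for i in range(len(values[0]))]

def sparse_attend(q: Vec, keys: List[Vec], values: List[Vec],
                  block: int = 4, top_n: int = 2) -> Vec:
    """Coarse block scoring, then fine attention inside the chosen blocks."""
    blocks = [range(i, min(i + block, len(keys))) for i in range(0, len(keys), block)]
    # Coarse: compress each block of keys into one vector (mean pooling here).
    compressed = [[sum(keys[j][d] for j in b) / len(b) for d in range(len(q))]
                  for b in blocks]
    # Select: keep the top_n blocks most similar to the query, in original order.
    ranked = sorted(range(len(blocks)), key=lambda i: dot(q, compressed[i]), reverse=True)
    chosen = sorted(ranked[:top_n])
    # Fine: full attention restricted to tokens in the selected blocks.
    idx = [j for i in chosen for j in blocks[i]]
    return softmax_attend(q, [keys[j] for j in idx], [values[j] for j in idx])
```

With `top_n` at least the number of blocks, `sparse_attend` reduces to full attention; smaller values trade exactness for attending to only `top_n * block` tokens per query.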