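Jeff Dean shared a paper by his Google colleagues that reviews the evolution of five generations of training supercomputers, from TPU v2 to Ironwood. It will appear in the July/August 2026 edition of IEEE Micro. Key changes: TPU v2 used air cooling, while v3 onwards use water cooling; the interconnect moved from a 2D to a 3D torus; chips per pod grew from 256 to 9216; and energy efficiency per flop improved by roughly 30X. Google's internal workloads have also shifted heavily toward Transformer-based models.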
My @Google colleagues @NormJouppi, Sridhar Lakshmanamurthy, Cliff Young, and David Patterson recently wrote a paper that will appear in the July/August 2026 edition of @ieeemicro titled "Google's Training Supercomputers from TPU v2 to Ironwood: Architectural Stability, Scale, Resilience, Power Efficiency, and Sustainability Across Five Generations". It's chock full of interesting data about the evolution of TPU chip generations, as well as how workloads at Google have transformed over time (hint: lots more transformer-based models!), and how the generations have gotten ~30X more energy efficient per flop.
Lots of changes over these generations:
- Air cooling in TPUv2 to water cooling in TPUv3 onwards
- 2D to 3D torus-based interconnects
- 30X improvement TFLOPS/Watt
- 256 chips (TPUv2) to 9216 chips (Ironwood) per pod
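To make the torus and pod-size items above concrete, here is a minimal Python sketch. It is not taken from the paper: the function names `torus_neighbors` and `torus_diameter` are hypothetical, and the shapes used (16x16, 64x64, 16x16x16) are illustrative assumptions, not the actual pod layouts. It shows how wraparound links work in a torus, how the worst-case hop count compares between a 2D and a 3D torus with the same number of nodes, and the pod-size ratio implied by the 256 and 9216 figures.

```python
def torus_neighbors(coord, shape):
    """Return the directly linked neighbors of `coord` in a torus.

    Each axis contributes two neighbors (one step back, one step forward),
    with coordinates wrapping around at the edges. Assumes every axis has
    size > 2, otherwise the two neighbors on an axis coincide.
    """
    neighbors = []
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wraparound link
            neighbors.append(tuple(n))
    return neighbors


def torus_diameter(shape):
    """Worst-case hop count between any two nodes in a torus.

    With wraparound, the farthest point along one axis is size // 2 hops away,
    and the axes add up independently.
    """
    return sum(size // 2 for size in shape)


# Illustrative shapes only (assumed, not from the paper).
print(torus_neighbors((0, 0), (16, 16)))   # 4 neighbors in 2D
print(len(torus_neighbors((0, 0, 0), (16, 16, 16))))  # 6 neighbors in 3D

# Same node count (4096), different dimensionality.
print(torus_diameter((64, 64)))        # 2D torus
print(torus_diameter((16, 16, 16)))    # 3D torus

# Pod-size growth from the figures in the post.
pod_scale = 9216 // 256
print(pod_scale)
```

For these assumed shapes, the 3D arrangement gives each chip six links instead of four and a noticeably smaller worst-case hop count at the same node count, which is one general reason a higher-dimensional torus can suit larger pods. The chip counts in the post imply a 36X increase in chips per pod.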
Read the full paper: https://arxiv.org/abs/2606.15870