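At Cloud Next 2026, Google split TPU v8 for the first time into a training chip, the TPU 8t, and an inference chip, the TPU 8i, claiming 2.8x faster training and 80% better price-performance for inference, and achieving full-stack vertical control with its in-house Arm-based Axion CPU. Meanwhile, DeepSeek V4-Pro became the first frontier large model to complete training and inference validation on Huawei's Ascend NPU platform; its pricing is tied to the mass-production plan for the Ascend 950 chip, and its output cost is far below that of mainstream Western models. This suggests that the hardware decoupling US sanctions were meant to prevent may already be irreversible, and that global AI chip competition has entered a new phase.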
Google's TPU v8 and Huawei's Ascend NPU platform: the global chip war has just begun
At Cloud Next 2026, Google unveiled its eighth-generation TPU as two separate chips for the first time: the TPU 8t for training and the TPU 8i for inference, claiming up to 2.8x faster training and 80% higher performance per dollar for inference compared to last year's Ironwood.
Broadcom designed the 8t and MediaTek the 8i, a split that applies mobile-edge efficiency logic to inference while maximizing raw throughput for training. The 8t connects up to 9,600 accelerators via optical circuit switches, dwarfing NVIDIA's 576-GPU NVLink domain, and a new Virgo network fabric scales beyond one million chips for a single training job.
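A quick back-of-envelope check helps put the quoted figures in proportion. The sketch below is only illustrative arithmetic on the vendor-claimed numbers; the helper names are hypothetical, and it assumes "80% higher performance per dollar" means 1.8x the work for the same spend.

```python
# Illustrative arithmetic on vendor-claimed figures; not measured data.
# Function names are hypothetical helpers introduced for this sketch.

def cost_per_unit_ratio(perf_per_dollar_gain: float) -> float:
    """Relative cost of the same amount of work, given a fractional
    gain in performance per dollar (0.8 means 80% higher)."""
    return 1.0 / (1.0 + perf_per_dollar_gain)


def domain_size_ratio(larger: int, smaller: int) -> float:
    """How many times larger one accelerator domain is than another."""
    return larger / smaller


if __name__ == "__main__":
    # Assumption: 80% higher perf/dollar == 1.8x work per dollar.
    inference_cost = cost_per_unit_ratio(0.80)
    # 9,600-accelerator TPU 8t domain vs. the 576-GPU NVLink domain.
    scale = domain_size_ratio(9_600, 576)
    print(f"Inference cost for the same work: {inference_cost:.2f}x")
    print(f"Domain size ratio: {scale:.1f}x")
```

Under that assumption, the same inference workload would cost roughly 0.56x as much as before, which is about a 44% reduction and not an 80% one, and the 8t domain is about 16.7 times the size of the 576-GPU NVLink domain.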