# Nvidia's Tiered Memory Scheme Addresses KV Cache Storage Pressure for Large Models

- Source: SemiAnalysis (@SemiAnalysis_)
- Published: 2026-05-21 21:01
- AIHOT score: 18
- AIHOT link: https://aihot.virxact.com/items/cmpfj151p06v5sljw1wwjuk8y
- Original link: https://x.com/SemiAnalysis_/status/2057446718740799857

## AI Summary

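To address the KV cache storage bottleneck that modern AI agents and long context windows create for large models, Nvidia has proposed a tiered memory expansion scheme. The scheme treats fast but capacity-limited HBM (G1) as the base and extends in turn to host DRAM accessed over PCIe (G2), node-shared SSD/NVMe (G3), and network storage offering near-unlimited capacity (G4). At GTC 2026, Nvidia also announced a partnership with SpaceX and AnthropicAI and proposed a conceptual G5 tier, a Starlink-connected HDD array in low Earth orbit, intended to push the storage boundary further toward a distributed network architecture.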

## Body

With modern agentic workloads and long context windows, a common bottleneck in serving LLMs at scale is where to store all the KV cache. Luckily, KV cache can be extended beyond HBM into other tiers of memory.
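To give a feel for why KV cache outgrows HBM, here is a rough back-of-the-envelope sketch in Python. It uses the standard size estimate for a transformer with grouped or multi-head attention: two tensors (K and V) per layer, each holding `num_kv_heads * head_dim` elements per token. The model shape below (80 layers, 8 KV heads, head dimension 128, 2-byte elements) and the function name `kv_cache_bytes` are illustrative assumptions, not figures from the post.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, bytes_per_elem, num_tokens):
    """Estimate KV cache size for one sequence.

    The leading factor of 2 accounts for storing both keys and values.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


# Hypothetical model shape, chosen only for illustration.
per_token = kv_cache_bytes(
    num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_elem=2, num_tokens=1
)
total = kv_cache_bytes(
    num_layers=80, num_kv_heads=8, head_dim=128, bytes_per_elem=2, num_tokens=128 * 1024
)
total_gib = total / 1024**3

print(f"{per_token / 1024:.0f} KiB per token")      # 320 KiB per token
print(f"{total_gib:.0f} GiB for a 128K-token context")  # 40 GiB
```

Under these assumptions a single 128K-token sequence already needs about 40 GiB, so a handful of concurrent long-context sessions can exceed the HBM on one accelerator.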

Nvidia uses the following naming convention to describe the tiers:
🟠 G1 (HBM): fastest bandwidth but (relatively) small
🟠 G2 (host DRAM): still quite fast (traverses PCIe) and an order of magnitude larger than G1
🟠 G3 (SSD/NVMe): slower, shared across entire node
🟠 G4 (shared network storage): slowest, effectively unlimited in size
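
The tier ordering above can be sketched as a simple placement policy: try the fastest tier first and spill to the next one when it is full. This is a minimal illustration only, not Nvidia's implementation. The `Tier` class, the `place_block` function, and all capacities are hypothetical, and a real system would also handle eviction, promotion, and transfer cost.

```python
from dataclasses import dataclass
from typing import List, Optional

GiB = 1024**3


@dataclass
class Tier:
    name: str
    capacity_bytes: Optional[int]  # None models "effectively unlimited"
    used_bytes: int = 0

    def has_room(self, size: int) -> bool:
        if self.capacity_bytes is None:
            return True
        return self.used_bytes + size <= self.capacity_bytes


def place_block(tiers: List[Tier], size: int) -> str:
    """Put a KV cache block in the fastest tier that still has room."""
    for tier in tiers:  # tiers are ordered fastest to slowest
        if tier.has_room(size):
            tier.used_bytes += size
            return tier.name
    raise RuntimeError("no tier can hold the block")


# Hypothetical capacities, chosen only to illustrate the spill-over behavior.
tiers = [
    Tier("G1", 80 * GiB),         # HBM
    Tier("G2", 1024 * GiB),       # host DRAM
    Tier("G3", 8 * 1024 * GiB),   # SSD/NVMe
    Tier("G4", None),             # shared network storage
]

# Place four hypothetical 40 GiB blocks; G1 fills after two, the rest spill to G2.
placements = [place_block(tiers, 40 * GiB) for _ in range(4)]
print(placements)  # ['G1', 'G1', 'G2', 'G2']
```

A first-fit policy like this keeps the hottest data in the fastest memory by default; deciding which blocks to demote once G1 is full is the harder design question and is left out of the sketch.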

At GTC 2026, in a historic partnership with SpaceX and AnthropicAI, Jensen announced the newest tier, G5: a Starlink-attached HDD array in low Earth orbit.

Excited to see what G6 will be.