# RNG: Flat Datacenter Networks at Scale

- Source: Rohan Paul (@rohanpaul_ai)
- Published: 2026-05-30 18:35
- AIHOT score: 69
- AIHOT link: https://aihot.virxact.com/items/cmps87jqo07otsllj3ehw0b3w
- Original link: https://x.com/rohanpaul_ai/status/2060671309940396252

## AI Summary

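Amazon has introduced a new data center network architecture called "Resilient Network Graphs" (RNG). The design replaces traditional tree-shaped networks with a flat quasi-random graph and spreads traffic across many independent paths using the Spraypoint routing system and the ShuffleBox cabling device. In tests, RNG matched traditional fat-tree networks on performance while reducing hardware needs by 69% and raising throughput by 33%, with estimated cost savings of 9% to 45%. The architecture is now the default network for most AWS workloads, and its ability to spread load helps improve training efficiency for AI clusters.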

## Body

Amazon unveiled "Resilient Network Graphs" (RNG), a data center network that reduces hardware needs by 69% and raises throughput by 33%.

Amazon revealed that it has been quietly deploying the design across its data centers since last year, and it is now the default data center network for most AWS workloads.

RNG replaces tree-shaped datacenter networks with flatter, quasi-random ones that waste less capacity.

For decades, fat-tree networks worked because they were predictable and easy to run, but their layered hierarchy can concentrate traffic at a few choke points while other links sit underused.

RNG attacks the problem by flattening the fabric: routers are connected in a flat quasi-random graph, so many small independent paths exist between servers instead of a few privileged routes through upper layers.
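
To make the "many independent paths" idea concrete, here is a minimal Python sketch. It is not Amazon's construction: the graph builder (a ring plus random perfect matchings) and the names `build_flat_graph` and `count_edge_disjoint_paths` are assumptions chosen for illustration. The sketch counts edge-disjoint paths between two routers with a unit-capacity max-flow, a standard way to measure path diversity.

```python
import random
from collections import deque


def build_flat_graph(n, extra_matchings, seed=0):
    """Hypothetical flat topology: a ring plus random perfect matchings.

    Every node ends up with degree 2 + extra_matchings.
    """
    assert n % 2 == 0 and n >= 4
    rng = random.Random(seed)
    adj = {u: set() for u in range(n)}
    for u in range(n):  # the ring guarantees the graph is connected
        v = (u + 1) % n
        adj[u].add(v)
        adj[v].add(u)
    for _ in range(extra_matchings):
        for _attempt in range(1000):
            nodes = list(range(n))
            rng.shuffle(nodes)
            pairs = list(zip(nodes[::2], nodes[1::2]))
            # accept the matching only if it adds no duplicate link
            if all(b not in adj[a] for a, b in pairs):
                break
        else:
            raise RuntimeError("could not place a matching; try a larger n")
        for a, b in pairs:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def count_edge_disjoint_paths(adj, src, dst):
    """Unit-capacity max-flow (BFS augmenting paths) between src and dst."""
    assert src != dst
    cap = {(u, v): 1 for u in adj for v in adj[u]}
    flow = 0
    while True:
        parent = {src: None}
        queue = deque([src])
        while queue and dst not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if dst not in parent:
            return flow  # no augmenting path left
        v = dst
        while parent[v] is not None:  # push one unit along the path found
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


if __name__ == "__main__":
    fabric = build_flat_graph(16, extra_matchings=2, seed=1)
    print(count_edge_disjoint_paths(fabric, 0, 8))
```

In a graph where every router has degree d, no pair of routers can have more than d edge-disjoint paths, so the count printed here is bounded by 4. The ring alone already guarantees at least 2.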

Its routing system, Spraypoint, spreads traffic across many separate paths, while its ShuffleBox cabling device makes the random-looking wiring practical to build and expand.

Instead of asking every packet to chase the shortest path, Spraypoint fans traffic outward and then guides it back through distributed waypoints, creating many edge-disjoint paths without requiring exotic switch memory.
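
The fan-out-then-return idea resembles classic two-phase (Valiant-style) routing, which the Python sketch below illustrates. It is a stand-in under stated assumptions, not the Spraypoint algorithm: the circulant toy topology, the modulo-based waypoint choice, and names such as `route_via_waypoint` are all hypothetical.

```python
from collections import deque


def circulant_graph(n, offsets=(1, 3)):
    """Toy flat topology (hypothetical): node i links to i +/- each offset, mod n."""
    adj = {u: set() for u in range(n)}
    for u in range(n):
        for k in offsets:
            adj[u].add((u + k) % n)
            adj[u].add((u - k) % n)
    return adj


def shortest_path(adj, src, dst):
    """Breadth-first search; returns the list of nodes from src to dst."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(adj[u]):
            if v not in parent:
                parent[v] = u
                queue.append(v)
    if dst not in parent:
        raise ValueError("dst is unreachable from src")
    path = []
    node = dst
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def route_via_waypoint(adj, src, dst, flow_id):
    """Phase 1: go to a waypoint picked from flow_id. Phase 2: go on to dst."""
    candidates = sorted(n for n in adj if n not in (src, dst))
    waypoint = candidates[flow_id % len(candidates)]  # assumed selection rule
    first_leg = shortest_path(adj, src, waypoint)
    second_leg = shortest_path(adj, waypoint, dst)
    return waypoint, first_leg + second_leg[1:]


if __name__ == "__main__":
    fabric = circulant_graph(12)
    for flow_id in range(4):
        print(flow_id, route_via_waypoint(fabric, 0, 6, flow_id))
```

Different flow IDs land on different waypoints, so traffic between the same pair of endpoints is spread over several routes. Unlike the system described above, this sketch makes no attempt to guarantee that the resulting paths are edge-disjoint.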

The authors tested RNG in two real Amazon production fabrics and compared it with fat-tree networks using transport and storage workloads.

The main result is that RNG matched fat-tree application performance, found far more separate paths than common routing methods, and was estimated to cost 9% to 45% less.

The hard part is not the idea, but the engineering, because routing in a random mesh needs smarter path selection and the physical system must manage millions of fiber connections without becoming impossible to operate.

This is important for AI clusters because training traffic is huge, synchronized, and sensitive to congestion, so a network that spreads load better can make expensive GPUs spend less time waiting.

----

Link - arxiv.org/abs/2604.15261

Title: "RNG: Flat Datacenter Networks at Scale"
