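Amazon has introduced a new data center network architecture called "Resilient Network Graphs" (RNG). The design replaces traditional tree-shaped networks with flat, quasi-random graphs, and uses the Spraypoint routing system and ShuffleBox cabling devices to spread traffic across multiple independent paths. In tests, RNG matched the performance of traditional fat-tree networks while requiring 69% less hardware, raised throughput by 33%, and is estimated to cut costs by 9% to 45%. The architecture is now the default network for most AWS workloads, and its ability to spread load helps improve training efficiency on AI clusters.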
Amazon unveiled "Resilient Network Graphs" (RNG), a data center network that reduces hardware needs by 69% and raises throughput by 33%.
The company revealed that it has been quietly deploying the design across its data centers since last year, and RNG is now the default data center network for most AWS workloads.
RNG replaces tree-shaped data center networks with flatter, quasi-random ones that waste less capacity.
For decades, fat-tree networks worked because they were predictable and easy to run, but their layered hierarchy can concentrate traffic on a few choke-point links while other links sit underused.
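To make the choke-point idea concrete, here is a toy sketch in Python. It is not Amazon's design and does not model Spraypoint or ShuffleBox: the single-rooted tree is a deliberately oversimplified stand-in for a real fat-tree, the flat graph is just a random ring with a few random chords, and every name in it (`toy_tree`, `toy_random_graph`, `link_loads`) is hypothetical. It counts how many switch-to-switch flows cross each link when every flow takes one shortest path.

```python
import random
from collections import Counter, deque
from itertools import permutations


def connect(adj, a, b):
    """Add an undirected link between nodes a and b."""
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def toy_tree(leaves_per_agg=4):
    """Single-rooted tree: one core, two aggregation switches, leaves below."""
    adj, leaves = {}, []
    for agg in ("agg0", "agg1"):
        connect(adj, "core", agg)
        for i in range(leaves_per_agg):
            leaf = f"{agg}-leaf{i}"
            connect(adj, agg, leaf)
            leaves.append(leaf)
    return adj, leaves


def toy_random_graph(nodes, seed=0):
    """Flat graph: a random ring (keeps it connected) plus random chords."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    adj = {}
    for a, b in zip(order, order[1:] + order[:1]):
        connect(adj, a, b)
    rng.shuffle(order)
    for a, b in zip(order[::2], order[1::2]):
        connect(adj, a, b)
    return adj


def shortest_path(adj, src, dst):
    """Return one shortest path from src to dst using breadth-first search."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for nxt in sorted(adj[node]):  # sorted for a repeatable tie-break
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def link_loads(adj, endpoints):
    """Count how many endpoint-to-endpoint flows cross each undirected link."""
    loads = Counter()
    for src, dst in permutations(endpoints, 2):
        path = shortest_path(adj, src, dst)
        for a, b in zip(path, path[1:]):
            loads[frozenset((a, b))] += 1
    return loads


tree_adj, leaves = toy_tree()
tree_loads = link_loads(tree_adj, leaves)
flat_adj = toy_random_graph(leaves, seed=0)
flat_loads = link_loads(flat_adj, leaves)

print("tree: busiest link carries", max(tree_loads.values()), "flows")
print("flat: busiest link carries", max(flat_loads.values()), "flows")
```

In the toy tree, each of the two core links carries 32 of the 56 flows, because every flow between the two halves has exactly one route. The flat graph's numbers depend on the seed, and this sketch routes each flow on a single path rather than spreading it over several, so it only hints at the benefit described in the article.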