Artificial intelligence is redefining the scale, architecture, and performance expectations of modern data centers. Training large ML models demands infrastructure capable of moving massive data sets through highly parallel, compute-intensive environments, where traditional data center designs simply can’t keep up.
AI Data Center Network Design and Technologies is the first comprehensive, vendor-agnostic guide to the design principles, architectures, and technologies that power AI training and inference clusters. Written by leading experts in AI data center design, this book helps engineers, architects, and technology leaders understand how to design and scale networks purpose-built for the AI era.
INSIDE, YOU’LL LEARN HOW TO
- Architect scalable, high-radix network fabrics to support xPU (GPU, TPU)-based AI clusters
- Integrate lossless Ethernet/IP fabrics for high-throughput, low-latency data movement
- Align network design with AI/ML workload characteristics and server architectures
- Address challenges in cooling, power, and interconnect design for AI-scale computing
- Evaluate emerging technologies from the Ultra Ethernet Consortium (UEC) and their effect on future AI data centers
- Apply best practices for deployment, validation, and performance measurement in AI/ML environments
With broad coverage of both foundational concepts and emerging innovations, this book bridges the gap between network engineering and AI infrastructure design. It empowers readers to understand not only how AI data centers work but also why they must evolve.
Foreword . . . xv
Preface . . . xvii
Acknowledgments . . . xix
About the Authors . . . xxi
1 Wonders in the Workload . . . 1
What’s New in AI Data Center Workloads . . . 1
The Life Cycle of an AI Model . . . 2
Training an AI Model . . . 3
Parallelism . . . 4
Job Completion Time (JCT) . . . 6
Tail Latency . . . 7
Summary . . . 16
Test Your Knowledge . . . 17
2 The Common-Man View of AI Data Center Fabrics . . . 19
Training vs. Inference AI Data Centers . . . 19
InfiniBand vs. Ethernet for AI Training Data Centers . . . 21
Ethernet Hardware Switches and Advanced Software Features . . . 22
Handling Elephant Flows . . . 24
Load-Balancing Techniques . . . 25
Congestion Management and Mitigation Techniques . . . 26
Summary . . . 28
Test Your Knowledge . . . 29
3 Network Design Considerations . . . 31
Background Introduction . . . 31
Training Data Center Architecture . . . 33
Rail-Optimized Design (ROD) . . . 34
Rail-Unified Design (RUD) . . . 42
Rack Design . . . 45
Scheduled Fabric . . . 49
Topologies . . . 50
Inference Data Center Architecture . . . 56
Multi-Planar Scale-Out Architectures . . . 56
Summary . . . 63
Test Your Knowledge . . . 64
References . . . 66
4 Optics and Cable Management . . . 67
Scaling Optics for AI Clusters . . . 67
Challenges in Optical Innovation . . . 70
Packet Flow . . . 70
Transmission Modes . . . 73
Transceiver Types . . . 76
Cable and Connector Types . . . 78
Standards . . . 79
Further Innovations in Optics . . . 82
Summary . . . 83
Test Your Knowledge . . . 85
References . . . 86
5 Thermal and Power Efficiency Considerations . . . 87
Thermal Footprints in AI Data Centers . . . 87
Airflow Options . . . 88
Liquid Cooling . . . 89
Summary . . . 93
Test Your Knowledge . . . 94
References . . . 95
6 Efficient Load Balancing . . . 97
Per-Flow Load Balancing . . . 99
Per-Packet Load Balancing . . . 115
Load-Balancing Mechanism Comparison . . . 117
Summary . . . 118
Test Your Knowledge . . . 119
7 RoCEv2 Transport and Congestion Management . . . 123
Congestion Points . . . 123
Explicit Congestion Notification (ECN) . . . 127
Data Center Quantized Congestion Notification (DCQCN) . . . 134
Source Flow Control (SFC) . . . 136
Congestion Signaling . . . 137
Summary . . . 139
Test Your Knowledge . . . 140
8 IP Routing for AI/ML Fabrics . . . 143
Dynamic IP Routing Options . . . 144
eBGP Underlay for Three-Stage/Five-Stage Fabric for an AI Data Center . . . 145
Multi-tenancy for an AI/ML Cluster Data Center Network . . . 171
Microsegmentation and Multi-tenancy for an AI/ML Data Center . . . 177
Extending IP Routing to the Server . . . 177
Traffic Engineering in the AI Data Center Fabric . . . 178
Segment Routing and SRv6 for AI/ML Fabrics . . . 179
Summary . . . 184
Test Your Knowledge . . . 185
References . . . 187
9 Storage Network Design and Technologies . . . 189
The AI Data Center Life Cycle and Storage Networks . . . 191
Storage Network Design Types . . . 193
Block, Object, and File Storage Systems . . . 198
NVMe-oF for Block-Level Access . . . 199
NVMe-o-RDMA/RoCEv2 State Machine . . . 206
High-Performance File Systems . . . 208
GPUDirect Storage . . . 211
Summary . . . 217
Test Your Knowledge . . . 218
References . . . 219
10 AI Network Performance KPIs . . . 221
Significance of Performance Benchmarking . . . 221
MLCommons for AI Data Centers . . . 223
MLCommons Initiatives . . . 224
MLCommons Benchmarking Suites . . . 224
Benchmarking a Data Center for Machine Learning . . . 225
Summary . . . 226
Test Your Knowledge . . . 227
References . . . 228
11 Monitoring and Telemetry . . . 229
Exploring Monitoring Options . . . 229
Network Monitoring in an AI/ML Data Center Network . . . 231
In-Band Flow Analyzer (IFA) . . . 234
Corrective Actions . . . 237
Summary . . . 238
Reference . . . 238
12 Ultra Ethernet Consortium (UEC) . . . 239
UEC Developments and Working Groups . . . 241
UEC Key Terminology . . . 244
The UEC and Network Architectures . . . 246
A New Protocol Stack . . . 247
Data Plane: Packet Forwarding Options . . . 252
Packet Delivery Modes . . . 257
Congestion Management (CM) in the UEC Specification . . . 261
Packet Trimming and Fast Retransmissions . . . 264
Link Layer Reliability (LLR) Mechanism . . . 265
In-Network Collectives (INC) and xCCL . . . 266
Management and Orchestration . . . 268
Interoperability and Backward Compatibility . . . 269
Compliance and Certification . . . 269
UEC Challenges and Future Directions . . . 269
Comparing UEC to InfiniBand and RoCEv2 . . . 270
Summary . . . 271
Test Your Knowledge . . . 272
References . . . 273
13 Scale-Up Systems . . . 275
Key Building Blocks of Scale-Up Systems . . . 278
Scale-Up Ethernet Transport (SUE-T) . . . 281
Ultra Accelerator Link (UALink) . . . 286
Memory Coherence in Scale-Up Systems . . . 291
Scale-Up Systems: Key Differences and Similarities . . . 292
Summary . . . 294
Test Your Knowledge . . . 295
References . . . 297
14 Conclusion . . . 299
DC Network Role for AI . . . 299
Caveats and Challenges . . . 300
Future Developments . . . 302
Final Remarks . . . 304
References . . . 305
Appendix A Questions and Answers . . . 307
Appendix B Acronyms . . . 329
9780135436288, TOC, 1/8/2026
Mahesh Subramaniam is a proven leader in AI data centers and next-generation networking technologies. He played a key role in defining the advanced software roadmap for AI fabrics, which are now deployed in production networks across various AI data centers worldwide. As the Senior Director of Product Management for AI Data Centers at HPE Juniper Networks, he leads cutting-edge innovations in AI infrastructure and cloud-scale solutions, optimized for both scale-up and scale-out architectures. Mahesh is also an inventor with several technology patents and a recognized speaker at global forums, including the UEC Summit, OCP, and Tokyo MPLS forum. His work has earned him accolades, including the CEO Excellence Award, the Record High Business Award, and the Star Award for the Cloud DC Reference Architecture. With a remarkable history in the networking industry, Mahesh has a strong track record of leading products and managing technical and business strategies across cross-functional teams.
Michal Styszynski is a Product Management Director in the Data Center Networks Business Unit (DC BU) at HPE Juniper Networking. Michal has been with Juniper Networks for more than 13 years. Before his current role, he was a Technical Marketing Engineer (TME) in the DC BU and a Technical Solution Consultant at Juniper. In these roles, he handled data center projects for large-scale enterprises and federal networks and worked closely with Tier 2 cloud and telco-cloud service providers. Before joining Juniper, he spent around 10 years working at Orange, FT R&D, and TPSA Polpak engineering. Michal graduated from the Electronics & Telecommunications department at Wroclaw University of Science & Technology with a master’s degree in engineering. He also holds an MBA from Paris Sorbonne Business School, is a JNCIE-DC (#523), and is PEC, PLC, and PMC certified by the Product School in San Francisco.
Himanshu Tambakuwala is a highly accomplished networking expert and certified technical architect whose experience spans the entire product lifecycle, from hands-on engineering to product strategy. He is a JNCIE holder in Data Center and Service Provider technologies and an inventor with four granted technology patents and two additional patent applications filed. As a Product Manager at Juniper Networks, Himanshu was instrumental in defining the feature roadmap for network fabrics that power cutting-edge AI/ML data centers.