A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network / Blach, N.; Besta, M.; De Sensi, D.; Domke, J.; Harake, H.; Li, S.; Iff, P.; Konieczny, M.; Lakhotia, K.; Kubicek, A.; Ferrari, M.; Petrini, F.; Hoefler, T. - (2024), pp. 1025-1044. (Symposium on Networked Systems, Design and Implementation, Santa Clara, USA).

A High-Performance Design, Implementation, Deployment, and Evaluation of The Slim Fly Network

De Sensi D.;
2024

Abstract

Novel low-diameter network topologies such as Slim Fly (SF) offer significant cost and power advantages over the established Fat Tree, Clos, or Dragonfly. To spearhead the adoption of low-diameter networks, we design, implement, deploy, and evaluate the first real-world SF installation. We focus on deployment, management, and operational aspects of our test cluster with 200 servers and carefully analyze performance. We demonstrate techniques for simple cabling and cabling validation as well as a novel high-performance routing architecture for InfiniBand-based low-diameter topologies. Our real-world benchmarks show SF’s strong performance for many modern workloads such as deep neural network training, graph analytics, or linear algebra kernels. SF outperforms non-blocking Fat Trees in scalability while offering comparable or better performance and lower cost for large network sizes. Our work can facilitate deploying SF while the associated (open-source) routing architecture is fully portable and applicable to accelerate any low-diameter interconnect.
Symposium on Networked Systems, Design and Implementation
network topology;
04 Publication in conference proceedings::04b Conference paper in volume


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1753560