Canary: Congestion-aware in-network allreduce using dynamic trees / De Sensi, D.; Costa Molero, E.; Di Girolamo, S.; Vanbever, L.; Hoefler, T.. - In: FUTURE GENERATION COMPUTER SYSTEMS. - ISSN 0167-739X. - 152:(2024), pp. 70-82. [10.1016/j.future.2023.10.010]

Canary: Congestion-aware in-network allreduce using dynamic trees

De Sensi D.
2024

Abstract

The allreduce operation is an essential building block for many distributed applications, ranging from the training of deep learning models to scientific computing. In an allreduce operation, data from multiple hosts is aggregated and the result is then broadcast to every host participating in the operation. Allreduce performance can be improved by a factor of two by aggregating the data directly in the network: switches aggregate data coming from multiple ports before forwarding the partially aggregated result to the next hop. In all existing solutions, each switch needs to know the ports from which it will receive the data to aggregate. This forces packets to traverse a predefined set of switches, which makes these solutions prone to congestion. For this reason, we design CANARY, the first congestion-aware in-network allreduce algorithm. CANARY uses load balancing algorithms to forward packets on the least congested paths. Because switches do not know from which ports they will receive the data to aggregate, they use timeouts to aggregate the data in a best-effort way. We develop a P4 CANARY prototype and evaluate it on a Tofino switch. We then validate CANARY through simulations on large networks, showing performance improvements of up to 40% compared to the state of the art.
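The idea of best-effort, timeout-based aggregation can be illustrated with a small toy model. The sketch below is not the CANARY data plane and is not taken from the paper: `ToySwitch`, `allreduce_sum`, `demo`, the block identifier, and the timeout values are all hypothetical names and numbers chosen for illustration, and the reduction is assumed to be an element-wise sum. It only shows why flushing partial aggregates on a timer can still yield the correct result: the sum is associative and commutative, so partial results can be merged again at a later hop.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def allreduce_sum(host_vectors: List[List[int]]) -> List[List[int]]:
    """Reference semantics: every host ends up with the element-wise sum."""
    total = [sum(column) for column in zip(*host_vectors)]
    return [list(total) for _ in host_vectors]


@dataclass
class ToySwitch:
    """Hypothetical switch model: aggregates whatever arrives before a timer expires."""
    timeout: float
    # block id -> (partial sum, number of host contributions, arrival time of first packet)
    pending: Dict[int, Tuple[List[int], int, float]] = field(default_factory=dict)

    def receive(self, now: float, block_id: int, values: List[int], count: int = 1) -> None:
        """Merge one packet (raw or already partially aggregated) into its block."""
        if block_id in self.pending:
            acc, n, start = self.pending[block_id]
            acc = [a + v for a, v in zip(acc, values)]
            self.pending[block_id] = (acc, n + count, start)
        else:
            self.pending[block_id] = (list(values), count, now)

    def flush_expired(self, now: float) -> List[Tuple[int, List[int], int]]:
        """Return (block id, partial sum, count) for each block whose timer expired."""
        expired = []
        for block_id in sorted(self.pending):
            acc, n, start = self.pending[block_id]
            if now - start >= self.timeout:
                expired.append((block_id, acc, n))
        for block_id, _, _ in expired:
            del self.pending[block_id]
        return expired


def demo() -> List[Tuple[int, List[int], int]]:
    hosts = [[1, 2], [3, 4], [5, 6], [7, 8]]
    leaf = ToySwitch(timeout=1.0)
    root = ToySwitch(timeout=5.0)

    # Three packets reach the leaf before its timer expires.
    leaf.receive(0.0, 0, hosts[0])
    leaf.receive(0.2, 0, hosts[1])
    leaf.receive(0.4, 0, hosts[2])
    for block_id, acc, n in leaf.flush_expired(1.0):
        root.receive(1.0, block_id, acc, n)

    # The fourth packet is late, so it starts a new partial aggregate.
    leaf.receive(1.5, 0, hosts[3])
    for block_id, acc, n in leaf.flush_expired(2.5):
        root.receive(2.5, block_id, acc, n)

    # The next hop merges both partial results into the full sum.
    return root.flush_expired(6.0)
```

In this toy run the leaf forwards two partial aggregates for the same block (one covering three hosts, one covering the late host), and the next hop combines them into the same vector that `allreduce_sum` computes. The contribution count carried with each partial result is one possible way, assumed here, for a receiver to tell when all hosts have been accounted for.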
Allreduce; In-network compute; Load balancing
01 Journal publication::01a Journal article
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1695378