Dispatching Odyssey. Exploring performance in computing clusters under real-world workloads / Yildiz, Mert; Rolich, Alexey; Baiocchi, Andrea. - (2025). (Paper presented at the 2025 36th International Teletraffic Congress (ITC 36), held in Trondheim, Norway) [10.23919/ITC-3665175.2025.11078624].

Dispatching Odyssey. Exploring performance in computing clusters under real-world workloads

Mert Yildiz (first author); Alexey Rolich; Andrea Baiocchi
2025

Abstract

Recent workload measurements in Google data centers provide an opportunity to challenge existing models and, more broadly, to enhance the understanding of dispatching policies in computing clusters. Through extensive data-driven simulations, we aim to highlight the key features of workload traffic traces that influence response time performance under simple yet representative dispatching policies. For a given computational power budget, we vary the cluster size, i.e., the number of available servers. A job-level analysis reveals that, at a fixed utilization coefficient, Join Idle Queue (JIQ) and Least Work Left (LWL) exhibit an optimal working point as the number of servers is varied, whereas the performance of Round Robin (RR) worsens monotonically. Additionally, we explore the accuracy of simple G/G queue approximations. When jobs are decomposed into tasks, the simpler, non-size-based JIQ policy appears to outperform the more “powerful” size-based LWL policy. Complementing these findings, we present preliminary results on a two-stage scheduling approach that partitions tasks based on service thresholds, illustrating that modest architectural modifications can further enhance performance under realistic workload conditions. We provide insights into these results and suggest promising directions for fully explaining the observed phenomena.
2025 36th International Teletraffic Congress (ITC 36)
dispatching; scheduling; multiple parallel servers; real-world workload
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item

Yildiz_Dispatching_2025.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 921.94 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1741141