Yildiz, Mert; Rolich, Alexey; Baiocchi, Andrea. Dispatching policies in data center clusters: Insights from Google and Alibaba workloads. Performance Evaluation, vol. 172 (2026), pp. 1-18. ISSN 0166-5316. DOI: 10.1016/j.peva.2026.102551
Dispatching policies in data center clusters: Insights from Google and Alibaba workloads
Yildiz, Mert; Rolich, Alexey; Baiocchi, Andrea
2026
Abstract
Dispatching policies shape delay and throughput in multi-server data centers, yet the fidelity of classical queueing models under production workloads remains unclear. We combine analytical modeling with trace-driven simulation to reassess Round Robin (RR), Join-Idle-Queue (JIQ), and Least-Work-Left (LWL), using job-level and task-level views of Google ClusterData v3 and Alibaba Cluster Trace v2018. Under controlled Poisson arrivals with Weibull service times, the analytical models match the simulation closely. We then examine model-trace discrepancies through controlled manipulations: shuffling inter-arrival times, replacing arrivals with a Poisson process, shuffling task Central Processing Unit (CPU) times, and trimming the top 0.1% of service demands. Hidden dependence and rare, very large jobs explain most of the gaps. When both the arrival and the service-time sequences are randomized and outliers are removed, job-level predictions align with simulation. At the task level, where jobs decompose into independently dispatched tasks, the policy ordering may change: in one production trace, JIQ often matches or surpasses LWL, while RR remains the weakest. We also introduce a simple analytical approximation for JIQ that is easy to evaluate and accurate in the controlled setting. Overall, the study clarifies when analytical models hold, identifies the workload features that break them, and informs the choice of dispatcher under production conditions.
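The following minimal Python/NumPy sketch makes the three policies and the controlled manipulations concrete. It is not the authors' simulator. It rests on several assumptions:

- A single dispatcher sits in front of N identical first-come-first-served servers with unbounded queues and has instant knowledge of server state.
- JIQ sends a job to a randomly chosen idle server when one exists and otherwise to a uniformly random server. This simplifies the explicit idle-token queue of the original JIQ proposal.
- "Trimming the top 0.1%" means dropping the jobs above the 99.9th percentile of service demand. Capping them would be an equally plausible reading.

All function names (`simulate`, `poisson_weibull_workload`, `shuffle_interarrivals`, `poisson_arrivals`, `shuffle_services`, `trim_top`) are hypothetical.

```python
import math
import numpy as np

# Hypothetical sketch: N identical FCFS servers behind one dispatcher.
# free_at[k] is the time server k finishes all work already assigned to it,
# so its unfinished work at time t is max(0, free_at[k] - t).


def simulate(arrivals, services, n_servers, policy, seed=0):
    """Mean response time of jobs dispatched by "RR", "JIQ" or "LWL"."""
    rng = np.random.default_rng(seed)
    free_at = np.zeros(n_servers)
    response = np.empty(len(arrivals))
    for j, (t, s) in enumerate(zip(arrivals, services)):
        if policy == "RR":  # cycle through servers, ignoring their state
            k = j % n_servers
        elif policy == "LWL":  # least unfinished work = earliest finish time
            k = int(np.argmin(free_at))
        elif policy == "JIQ":  # assumption: random idle server, else random server
            idle = np.flatnonzero(free_at <= t)
            k = int(rng.choice(idle)) if idle.size else int(rng.integers(n_servers))
        else:
            raise ValueError(f"unknown policy: {policy!r}")
        start = max(t, free_at[k])  # FCFS: wait for the work already queued
        free_at[k] = start + s
        response[j] = free_at[k] - t
    return float(response.mean())


def poisson_weibull_workload(n_jobs, n_servers, load, shape, seed=0):
    """Poisson arrivals and Weibull service times normalized to mean 1."""
    rng = np.random.default_rng(seed)
    rate = load * n_servers  # load = rate * E[S] / n_servers, with E[S] = 1
    arrivals = np.cumsum(rng.exponential(1.0 / rate, n_jobs))
    services = rng.weibull(shape, n_jobs) / math.gamma(1.0 + 1.0 / shape)
    return arrivals, services


# Trace manipulations: our reading of the abstract, not the authors' code.
def shuffle_interarrivals(arrivals, rng):
    """Keep the inter-arrival gaps' distribution, remove their ordering."""
    gaps = np.diff(arrivals, prepend=0.0)
    return np.cumsum(rng.permutation(gaps))


def poisson_arrivals(arrivals, rng):
    """Replace arrivals by a Poisson process with the same mean rate."""
    rate = len(arrivals) / arrivals[-1]
    return np.cumsum(rng.exponential(1.0 / rate, len(arrivals)))


def shuffle_services(services, rng):
    """Keep the service-time distribution, remove its ordering."""
    return rng.permutation(services)


def trim_top(arrivals, services, fraction=0.001):
    """Assumption: drop jobs whose demand exceeds the (1 - fraction) quantile."""
    keep = services <= np.quantile(services, 1.0 - fraction)
    return arrivals[keep], services[keep]


if __name__ == "__main__":
    arr, svc = poisson_weibull_workload(20_000, n_servers=10, load=0.8, shape=0.5)
    for p in ("RR", "JIQ", "LWL"):
        print(p, round(simulate(arr, svc, 10, p), 3))
```

One way to isolate the effect of each workload feature is to apply a single manipulation to trace-derived arrival and service arrays, as in `simulate(shuffle_interarrivals(arr, rng), svc, 10, "LWL")`, and compare against the unmodified run. The sketch covers only the job-level view. It does not model the task-level setting, where jobs split into independently dispatched tasks.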
Full text: Yildiz_dispatching policies_2026.pdf (publisher's version, with the publisher's layout; Adobe PDF, 1.62 MB). Access: repository managers only. License: all rights reserved.
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


