“Two-Stagification”: Job Dispatching in Large-Scale Clusters via a Two-Stage Architecture

Yildiz, Mert; Rolich, Alexey; Baiocchi, Andrea

doi:10.1109/MedComNet65822.2025.11103543

A continuing effort is devoted to devising effective dispatching policies for clusters of First Come First Served servers. Although the optimal solution for dispatchers aware of both job size and server state remains elusive, lower bounds and strong heuristics are known. In this paper, we introduce a two-stage cluster architecture that applies the classical Round Robin, Join Idle Queue, and Least Work Left dispatching schemes, coupled with an optimized service-time threshold that separates large jobs from shorter ones. Using both synthetic (Weibull) workloads and real Google data center traces, we demonstrate that our two-stage approach greatly improves upon the corresponding single-stage policies and closely approaches the performance of advanced size- and state-aware methods. Our results highlight that careful architectural design, rather than increased complexity at the dispatcher, can yield significantly better mean response times in large-scale computing environments.
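As a rough illustration of the idea in the abstract, the following Python sketch simulates FCFS servers under the three named policies and chains two such stages with a threshold. The abstract does not say how the threshold is enforced, so the mechanism here is an assumption: each job receives at most `theta` units of service in the first stage, and any remaining work is forwarded to the second stage. The names `dispatch`, `run_stage`, `two_stage`, `n1`, `n2`, and `theta` are hypothetical and do not come from the paper.

```python
import random


def dispatch(policy, free_at, now, state, rng):
    """Pick a server index; free_at[i] is when server i finishes its backlog."""
    n = len(free_at)
    if policy == "RR":  # Round Robin: cycle through the servers
        state["next"] = (state.get("next", -1) + 1) % n
        return state["next"]
    if policy == "JIQ":  # Join Idle Queue: an idle server if any, else random
        idle = [i for i in range(n) if free_at[i] <= now]
        return rng.choice(idle) if idle else rng.randrange(n)
    if policy == "LWL":  # Least Work Left: smallest unfinished backlog
        return min(range(n), key=lambda i: max(free_at[i] - now, 0.0))
    raise ValueError(f"unknown policy: {policy}")


def run_stage(jobs, n_servers, policy, seed=0):
    """jobs: iterable of (arrival, work, job_id). Returns {job_id: departure}."""
    rng = random.Random(seed)
    free_at = [0.0] * n_servers
    state, departure = {}, {}
    for arrival, work, job_id in sorted(jobs):
        s = dispatch(policy, free_at, arrival, state, rng)
        start = max(arrival, free_at[s])  # FCFS: wait behind queued work
        free_at[s] = start + work
        departure[job_id] = free_at[s]
    return departure


def two_stage(jobs, n1, n2, theta, policy, seed=0):
    """jobs: list of (arrival, size). Returns response times in job order.

    Assumption (not from the source): stage 1 serves min(size, theta);
    jobs larger than theta continue in stage 2 with size - theta left.
    """
    stage1 = [(a, min(x, theta), j) for j, (a, x) in enumerate(jobs)]
    d1 = run_stage(stage1, n1, policy, seed)
    stage2 = [(d1[j], x - theta, j) for j, (a, x) in enumerate(jobs) if x > theta]
    d2 = run_stage(stage2, n2, policy, seed + 1)
    return [d2.get(j, d1[j]) - a for j, (a, x) in enumerate(jobs)]
```

Averaging the list returned by `two_stage` gives the mean response time for one choice of `theta`; under the same assumptions, sweeping `theta` over a grid is one simple way to look for a good threshold.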

“Two-Stagification”: Job Dispatching in Large-Scale Clusters via a Two-Stage Architecture / Yildiz, Mert; Rolich, Alexey; Baiocchi, Andrea. - (2025). (23rd Mediterranean Communication and Computer Networking Conference (MedComNet 2025), Cagliari, Italy) [10.1109/MedComNet65822.2025.11103543].

Publication year: 2025
Conference name: 23rd Mediterranean Communication and Computer Networking Conference (MedComNet 2025)
Keywords: Data centers; Dispatching; Scheduling; Multiple parallel servers; Real-world workload; Large-Scale multiserver system; Workload traffic measurements
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

File	Access	Type	License	Size	Format
Yildiz_Two-Stagification_2025.pdf	Archive managers only	Publisher's version (published version with the publisher's layout)	All rights reserved	382.01 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1741139
