Data-driven workload generation based on Google data center measurements / Yildiz, Mert; Baiocchi, Andrea. - (2024), pp. 143-148. (Paper presented at the 2024 IEEE 25th International Conference on High Performance Switching and Routing (HPSR), held in Pisa, Italy) [10.1109/hpsr62440.2024.10635925].
Data-driven workload generation based on Google data center measurements
Yildiz, Mert (first author); Baiocchi, Andrea (second author)
2024
Abstract
Google has released a large dataset of workload measurements. The wealth of disclosed data allows a deep dive into real workload patterns. With the aim of providing tools to generate realistic workloads in a simple way, we have extracted from Google's dataset the job arrival times, the number of tasks per job, and the required computation time and memory of the tasks. We fit statistical models to the relevant probability distributions, providing a simple tool to build artificial workload traces that mimic real traffic as represented by the Google measurements. The workload generation algorithm is assessed by comparing the mean response time that its synthetic traces yield on a test dispatching/scheduling system with the mean response time obtained with the real traffic traces. Although it is only a first-order generation model, the proposed artificial generation is shown to faithfully reproduce the performance of the real workload in the case of large server clusters.
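The abstract describes the generator only at a high level. As a minimal sketch, assuming that "first-order" means each quantity (job interarrival time, number of tasks per job, task computation time, task memory) is drawn independently from its own fitted marginal distribution, the generation step could look as follows. The names `Task`, `Job` and `generate_workload`, and the placeholder distributions in the usage example, are hypothetical; they are not the distributions fitted in the paper.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Task:
    cpu_time: float  # required computation time (units are an assumption)
    memory: float    # requested memory (units are an assumption)


@dataclass
class Job:
    arrival_time: float
    tasks: List[Task]


def generate_workload(
    n_jobs: int,
    sample_interarrival: Callable[[random.Random], float],
    sample_num_tasks: Callable[[random.Random], int],
    sample_cpu_time: Callable[[random.Random], float],
    sample_memory: Callable[[random.Random], float],
    seed: int = 0,
) -> List[Job]:
    """Hypothetical first-order generator: every quantity is sampled
    independently from its own marginal distribution, so any correlation
    between them (e.g. CPU time vs. memory) is deliberately ignored."""
    rng = random.Random(seed)  # seeded for reproducible traces
    jobs: List[Job] = []
    t = 0.0
    for _ in range(n_jobs):
        t += sample_interarrival(rng)  # arrival times = cumulative interarrivals
        # Assumption: every job carries at least one task.
        k = max(1, sample_num_tasks(rng))
        tasks = [Task(sample_cpu_time(rng), sample_memory(rng)) for _ in range(k)]
        jobs.append(Job(arrival_time=t, tasks=tasks))
    return jobs


if __name__ == "__main__":
    # Placeholder distributions and parameters, NOT the fits from the paper.
    trace = generate_workload(
        n_jobs=5,
        sample_interarrival=lambda r: r.expovariate(2.0),
        sample_num_tasks=lambda r: min(int(r.paretovariate(1.5)), 1000),  # heavy-tailed, capped
        sample_cpu_time=lambda r: r.lognormvariate(0.0, 1.0),
        sample_memory=lambda r: r.uniform(0.01, 0.5),
    )
    for job in trace:
        print(f"t={job.arrival_time:.3f} tasks={len(job.tasks)}")
```

Passing the samplers as callables keeps the generator independent of any particular fitted family, so the distributions estimated from the Google measurements could be plugged in without changing the generation loop.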
| File | Notes | Type | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| Yildiz_Data-driven_2024.pdf | | Post-print (version after peer review, accepted for publication) | All rights reserved | 890.81 kB | Adobe PDF | Restricted to archive managers; contact the author |
| Yildiz_Frontespizio_Data-driven_2024.pdf | Title page | Other attached material | All rights reserved | 79.55 kB | Adobe PDF | Restricted to archive managers; contact the author |
| Yildiz_Indice_Data-driven_2024.pdf | Table of contents | Other attached material | All rights reserved | 120.54 kB | Adobe PDF | Restricted to archive managers; contact the author |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.