
Data-driven workload generation based on Google data center measurements / Yildiz, Mert; Baiocchi, Andrea. - (2024), pp. 143-148. (Paper presented at the 2024 IEEE 25th International Conference on High Performance Switching and Routing (HPSR), held in Pisa, Italy) [10.1109/hpsr62440.2024.10635925].

Data-driven workload generation based on Google data center measurements

Yildiz, Mert (first author); Baiocchi, Andrea (second author)
2024

Abstract

A large dataset of workload measurements has been released by Google. The wealth of disclosed data allows a deep dive into real workload patterns. With the aim of providing tools to generate realistic workloads in a simple way, we have extracted from Google's dataset the job arrival times, the number of tasks per job, and the required computation time and memory of each task. We derive statistical fits of the relevant probability distributions, providing a simple tool to build artificial workload traces that mimic real traffic as represented by Google's measurements. The workload generation algorithm is assessed by comparing the mean response time it yields on a test dispatching/scheduling system against the one obtained with the real traffic traces. Although the proposed generator is only a first-order model, we show that it can faithfully reproduce the performance of the real workload in the case of large server clusters.
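The pipeline summarized in the abstract (fit a distribution to each extracted quantity, then sample a synthetic trace and compare mean response times) can be illustrated with a minimal sketch. This is not the authors' method: the distribution families (exponential inter-arrival times, geometric tasks per job, lognormal computation time and memory), all parameter values, and the names `fit_model`, `generate_trace`, `sample_geometric` and `mean_response_time` are hypothetical choices made for illustration, and the "measurements" are synthetic stand-ins for the Google trace. "First-order" is read here as drawing every quantity independently, with no correlation across jobs or between task features. The dispatcher is likewise an assumed toy test system: identical servers, tasks served first-come first-served, each task sent to the server that becomes free first.

```python
import heapq
import math
import random
import statistics
from dataclasses import dataclass

@dataclass
class Task:
    cpu_time: float  # required computation time (arbitrary time units)
    memory: float    # requested memory (arbitrary normalized units)

@dataclass
class Job:
    arrival_time: float
    tasks: list[Task]

@dataclass
class FittedModel:
    # ASSUMED distribution families (hypothetical, not taken from the paper):
    arrival_rate: float  # exponential inter-arrivals, MLE rate = 1 / mean gap
    p_tasks: float       # geometric tasks per job on {1, 2, ...}, MLE p = 1 / mean
    cpu_mu: float        # lognormal CPU time: mean of log samples
    cpu_sigma: float     # lognormal CPU time: population std of log samples (MLE)
    mem_mu: float        # lognormal memory: mean of log samples
    mem_sigma: float     # lognormal memory: population std of log samples (MLE)

def fit_lognormal(samples: list[float]) -> tuple[float, float]:
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.pstdev(logs)

def fit_model(interarrivals, tasks_per_job, cpu_times, memories) -> FittedModel:
    cpu_mu, cpu_sigma = fit_lognormal(cpu_times)
    mem_mu, mem_sigma = fit_lognormal(memories)
    return FittedModel(
        arrival_rate=1.0 / statistics.fmean(interarrivals),
        p_tasks=1.0 / statistics.fmean(tasks_per_job),
        cpu_mu=cpu_mu, cpu_sigma=cpu_sigma,
        mem_mu=mem_mu, mem_sigma=mem_sigma,
    )

def sample_geometric(p: float, rng: random.Random) -> int:
    # Number of Bernoulli(p) trials up to and including the first success.
    k = 1
    while rng.random() >= p:
        k += 1
    return k

def generate_trace(model: FittedModel, n_jobs: int, seed: int = 0) -> list[Job]:
    # First-order generation: every quantity is drawn independently.
    rng = random.Random(seed)
    t, jobs = 0.0, []
    for _ in range(n_jobs):
        t += rng.expovariate(model.arrival_rate)
        tasks = [
            Task(rng.lognormvariate(model.cpu_mu, model.cpu_sigma),
                 rng.lognormvariate(model.mem_mu, model.mem_sigma))
            for _ in range(sample_geometric(model.p_tasks, rng))
        ]
        jobs.append(Job(t, tasks))
    return jobs

def mean_response_time(jobs: list[Job], n_servers: int) -> float:
    # Hypothetical test system: identical servers, tasks served FCFS,
    # each task dispatched to the server that becomes free first.
    free_at = [0.0] * n_servers  # min-heap of server release times
    total = 0.0
    for job in jobs:  # jobs are generated in arrival order
        finish = job.arrival_time
        for task in job.tasks:
            start = max(job.arrival_time, heapq.heappop(free_at))
            heapq.heappush(free_at, start + task.cpu_time)
            finish = max(finish, start + task.cpu_time)
        total += finish - job.arrival_time  # job ends when its last task ends
    return total / len(jobs)

# Illustrative only: synthetic "measurements" stand in for the Google trace.
rng = random.Random(42)
measured = fit_model(
    interarrivals=[rng.expovariate(2.0) for _ in range(5000)],
    tasks_per_job=[sample_geometric(0.25, rng) for _ in range(5000)],
    cpu_times=[rng.lognormvariate(0.0, 1.0) for _ in range(5000)],
    memories=[rng.lognormvariate(-2.0, 0.5) for _ in range(5000)],
)
trace = generate_trace(measured, n_jobs=10_000, seed=1)
print(round(mean_response_time(trace, n_servers=64), 3))
```

Running the same `mean_response_time` on a measured trace and on a generated one mirrors the kind of comparison the abstract describes. Task memory is generated but ignored by this toy dispatcher, and with real data the distribution families would have to be checked against the measurements rather than assumed.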
Year: 2024
Conference: 2024 IEEE 25th International Conference on High Performance Switching and Routing (HPSR)
Keywords: workload modeling; data centers; traffic measurements; data fitting; large server clusters
Type: 04 Conference proceedings publication::04b Conference paper in volume
Attached files (access restricted to archive managers; contact the author for a copy):
  • Yildiz_Data-driven_2024.pdf: post-print (version after peer review, accepted for publication); license: all rights reserved; 890.81 kB, Adobe PDF
  • Yildiz_Frontespizio_Data-driven_2024.pdf: title page (other attached material); license: all rights reserved; 79.55 kB, Adobe PDF
  • Yildiz_Indice_Data-driven_2024.pdf: table of contents (other attached material); license: all rights reserved; 120.54 kB, Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1722406
Citations
  • PubMed Central: not available
  • Scopus: 1
  • Web of Science (ISI): 0