Catalogo dei prodotti della ricerca

Key grouping is a technique used by stream processing frameworks to simplify the development of parallel stateful operators. Through key grouping a stream of tuples is partitioned in several disjoint sub-streams depending on the values contained in the tuples themselves. Each operator instance target of one sub-stream is guaranteed to receive all the tuples containing a specific key value. A common solution to implement key grouping is through hash functions that, however, are known to cause load imbalances on the target operator instances when the input data stream is characterized by a skewed value distribution. In this paper we present DKG, a novel approach to key grouping that provides near-optimal load distribution for input streams with skewed value distribution. DKG starts from the simple observation that with such inputs the load balance is strongly driven by the most frequent values; it identifies such values and explicitly maps them to sub-streams together with groups of less frequent items to achieve a near-optimal load balance. We provide theoretical approximation bounds for the quality of the mapping derived by DKG and show, through both simulations and a running prototype, its impact on stream processing applications.

Efficient key grouping for near-optimal load balancing in stream processing systems / Rivetti Di Val Cervo, Nicoló; Querzoni, Leonardo; Anceaume, Emmanuelle; Busnel, Yann; Sericola, Bruno. - STAMPA. - (2015), pp. 80-91. ( 9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015 Oslo; Norway ) [10.1145/2675743.2771827].

Efficient key grouping for near-optimal load balancing in stream processing systems

Rivetti Di Val Cervo, Nicoló;QUERZONI, Leonardo;Anceaume, Emmanuelle;Busnel, Yann;Sericola, Bruno

2015

Abstract

Key grouping is a technique used by stream processing frameworks to simplify the development of parallel stateful operators. Through key grouping a stream of tuples is partitioned in several disjoint sub-streams depending on the values contained in the tuples themselves. Each operator instance target of one sub-stream is guaranteed to receive all the tuples containing a specific key value. A common solution to implement key grouping is through hash functions that, however, are known to cause load imbalances on the target operator instances when the input data stream is characterized by a skewed value distribution. In this paper we present DKG, a novel approach to key grouping that provides near-optimal load distribution for input streams with skewed value distribution. DKG starts from the simple observation that with such inputs the load balance is strongly driven by the most frequent values; it identifies such values and explicitly maps them to sub-streams together with groups of less frequent items to achieve a near-optimal load balance. We provide theoretical approximation bounds for the quality of the mapping derived by DKG and show, through both simulations and a running prototype, its impact on stream processing applications.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2015
			
	Nome convegno
	
				9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015
			
	Parole chiave
	
				Stream Processing; Data Streaming; Key Grouping; Load Balancing;
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				Efficient key grouping for near-optimal load balancing in stream processing systems / Rivetti Di Val Cervo, Nicoló; Querzoni, Leonardo; Anceaume, Emmanuelle; Busnel, Yann; Sericola, Bruno. - STAMPA. - (2015), pp. 80-91. ( 9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015 Oslo; Norway ) [10.1145/2675743.2771827].
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

File	Dimensione	Formato
Rivetti_Efficient-key-grouping_2015.pdf solo gestori archivio Note: Articolo principale Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 1.13 MB Formato Adobe PDF Contatta l'autore	1.13 MB	Adobe PDF	Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/787479

Citazioni

ND

49

ND

social impact