Key grouping is a technique used by stream processing frameworks to simplify the development of parallel stateful operators. Through key grouping a stream of tuples is partitioned in several disjoint sub-streams depending on the values contained in the tuples themselves. Each operator instance target of one sub-stream is guaranteed to receive all the tuples containing a specific key value. A common solution to implement key grouping is through hash functions that, however, are known to cause load imbalances on the target operator instances when the input data stream is characterized by a skewed value distribution. In this paper we present DKG, a novel approach to key grouping that provides near-optimal load distribution for input streams with skewed value distribution. DKG starts from the simple observation that with such inputs the load balance is strongly driven by the most frequent values; it identifies such values and explicitly maps them to sub-streams together with groups of less frequent items to achieve a near-optimal load balance. We provide theoretical approximation bounds for the quality of the mapping derived by DKG and show, through both simulations and a running prototype, its impact on stream processing applications.

Efficient key grouping for near-optimal load balancing in stream processing systems / Rivetti Di Val Cervo, Nicoló; Querzoni, Leonardo; Anceaume, Emmanuelle; Busnel, Yann; Sericola, Bruno. - STAMPA. - (2015), pp. 80-91. (Intervento presentato al convegno 9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015 tenutosi a Oslo; Norway) [10.1145/2675743.2771827].

Efficient key grouping for near-optimal load balancing in stream processing systems

QUERZONI, Leonardo
;
2015

Abstract

Key grouping is a technique used by stream processing frameworks to simplify the development of parallel stateful operators. Through key grouping a stream of tuples is partitioned in several disjoint sub-streams depending on the values contained in the tuples themselves. Each operator instance target of one sub-stream is guaranteed to receive all the tuples containing a specific key value. A common solution to implement key grouping is through hash functions that, however, are known to cause load imbalances on the target operator instances when the input data stream is characterized by a skewed value distribution. In this paper we present DKG, a novel approach to key grouping that provides near-optimal load distribution for input streams with skewed value distribution. DKG starts from the simple observation that with such inputs the load balance is strongly driven by the most frequent values; it identifies such values and explicitly maps them to sub-streams together with groups of less frequent items to achieve a near-optimal load balance. We provide theoretical approximation bounds for the quality of the mapping derived by DKG and show, through both simulations and a running prototype, its impact on stream processing applications.
2015
9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015
Stream Processing; Data Streaming; Key Grouping; Load Balancing;
04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
Efficient key grouping for near-optimal load balancing in stream processing systems / Rivetti Di Val Cervo, Nicoló; Querzoni, Leonardo; Anceaume, Emmanuelle; Busnel, Yann; Sericola, Bruno. - STAMPA. - (2015), pp. 80-91. (Intervento presentato al convegno 9th ACM International Conference on Distributed Event-Based Systems, DEBS 2015 tenutosi a Oslo; Norway) [10.1145/2675743.2771827].
File allegati a questo prodotto
File Dimensione Formato  
Rivetti_Efficient-key-grouping_2015.pdf

solo gestori archivio

Note: Articolo principale
Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.13 MB
Formato Adobe PDF
1.13 MB Adobe PDF   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/787479
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 43
  • ???jsp.display-item.citation.isi??? ND
social impact