Shuffle grouping is a technique used by stream processing frameworks to share input load among parallel instances of stateless operators. With shuffle grouping, each tuple of a stream can be assigned to any available operator instance, independently of any previous assignment. A common way to implement shuffle grouping is a Round-Robin policy, a simple solution that fares well as long as the execution time is almost the same for all tuples. However, this assumption rarely holds in real cases, where execution time strongly depends on tuple content. As a consequence, parallel stateless operators within stream processing applications may experience unpredictable load imbalance that, in the end, causes an undesirable increase in tuple completion times. In this paper, we propose Online Shuffle Grouping (OSG), a novel approach to shuffle grouping aimed at reducing the overall tuple completion time. OSG estimates the execution time of each tuple, enabling proactive, online scheduling of input load to the target operator instances. Sketches are used to efficiently store the otherwise large amount of information required to schedule incoming load. We provide a probabilistic analysis of OSG and illustrate, through both simulations and a running prototype, its impact on stream processing applications.
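
The contrast the abstract draws between Round-Robin and estimate-driven scheduling can be illustrated with a small Python sketch. This is not the authors' implementation: the names `ExecTimeSketch`, `round_robin`, and `greedy_by_estimate`, the sketch dimensions, the greedy least-loaded policy, and the assumption that the measured execution time is fed back to the scheduler right after each assignment are all illustrative choices made here.

```python
import random


class ExecTimeSketch:
    """Count-Min-style sketch (illustrative): each cell keeps a tuple count
    and a cumulated execution time, so memory does not grow with the
    number of distinct tuple contents."""

    def __init__(self, rows=4, cols=64, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        self.salts = [rng.getrandbits(32) for _ in range(rows)]
        self.count = [[0] * cols for _ in range(rows)]
        self.total = [[0.0] * cols for _ in range(rows)]

    def _cells(self, key):
        # One cell per row, selected by a salted hash of the tuple content.
        return [(r, hash((salt, key)) % self.cols)
                for r, salt in enumerate(self.salts)]

    def update(self, key, exec_time):
        for r, c in self._cells(key):
            self.count[r][c] += 1
            self.total[r][c] += exec_time

    def estimate(self, key, default=1.0):
        # Use the cell with the fewest hits: it is the least polluted by collisions.
        r, c = min(self._cells(key), key=lambda rc: self.count[rc[0]][rc[1]])
        n = self.count[r][c]
        return self.total[r][c] / n if n else default


def round_robin(stream, k):
    """Assign tuple i to instance i mod k; return the true work per instance."""
    load = [0.0] * k
    for i, (_key, exec_time) in enumerate(stream):
        load[i % k] += exec_time
    return load


def greedy_by_estimate(stream, k, sketch):
    """Assign each tuple to the instance with the least estimated pending work."""
    load = [0.0] * k       # true cumulated work per instance
    est_load = [0.0] * k   # the scheduler's estimated view of that work
    for key, exec_time in stream:
        target = min(range(k), key=est_load.__getitem__)
        est_load[target] += sketch.estimate(key)
        load[target] += exec_time
        sketch.update(key, exec_time)  # assumed immediate feedback
    return load


if __name__ == "__main__":
    # Tuple content 0 is expensive, content 1 is cheap (hypothetical workload).
    stream = [(0, 10.0), (1, 1.0)] * 50
    print("round robin:", round_robin(stream, 2))
    print("greedy     :", greedy_by_estimate(stream, 2, ExecTimeSketch()))
```

On this alternating workload, Round-Robin sends every expensive tuple to the same instance, while the estimate-driven policy spreads the work more evenly once the sketch has seen each tuple content at least once.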

Online Scheduling for Shuffle Grouping in Distributed Stream Processing Systems / Rivetti, Nicoló; Anceaume, Emmanuelle; Busnel, Yann; Querzoni, Leonardo; Sericola, Bruno. - Electronic. - (2016). (Paper presented at the 17th International Middleware Conference, Middleware 2016, held in Trento, Italy) [10.1145/2988336.2988347].

Online Scheduling for Shuffle Grouping in Distributed Stream Processing Systems

QUERZONI, Leonardo
2016

Year: 2016
Conference: 17th International Middleware Conference, Middleware 2016
Keywords: Data streaming; Shuffle grouping; Stream processing
Type: 04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
File: Rivetti_Online-Scheduling_2016.pdf
Access: archive administrators only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 824.25 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/928311
Citations
  • PubMed Central: N/A
  • Scopus: 22
  • Web of Science (ISI): 10