Modern stream processing engines (SPEs) process large volumes of events propagated at high velocity through multiple queries. To improve performance, existing SPEs generally aim to minimize query output latency by minimizing, in turn, the propagation delay of events in query pipelines. However, for queries containing commonly used blocking operators such as windows, this scheduling approach can be inefficient. Watermarks are events popularly utilized by SPEs to correctly process window operators. Watermarks are injected into the stream to signify that no events preceding their timestamp should be further expected. Through the design and development of Klink, we leverage these watermarks to robustly infer stream progress based on window deadlines and network delay, and to schedule query pipeline execution that reflects stream progress. Klink aims to unblock window operators and to rapidly propagate events to output operators while performing judicious memory management. We integrate Klink into the popular open source SPE Apache Flink and demonstrate that Klink delivers significant performance gains over existing scheduling policies on benchmark workloads for both scale-up and scale-out deployments.

Klink: Progress-Aware Scheduling for Streaming Data Systems / Farhat, Omar; Daudjee, Khuzaima; Querzoni, Leonardo. - (2021), pp. 485-498. (Paper presented at SIGMOD/PODS '21: International Conference on Management of Data, held in Xi'an, Shaanxi, China) [10.1145/3448016.3452794].

Klink: Progress-Aware Scheduling for Streaming Data Systems

Querzoni, Leonardo
2021

SIGMOD/PODS '21: International Conference on Management of Data
stream processing; scheduling; distributed systems
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this item
Farhat_Klink_2021.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.79 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1555552