Modern stream processing engines (SPEs) process large volumes of events propagated at high velocity through multiple queries. To improve performance, existing SPEs generally aim to minimize query output latency by minimizing, in turn, the propagation delay of events in query pipelines. However, for queries containing commonly used blocking operators such as windows, this scheduling approach can be inefficient. Watermarks are events popularly utilized by SPEs to correctly process window operators. Watermarks are injected into the stream to signify that no events preceding their timestamp should be further expected. Through the design and development of Klink, we leverage these watermarks to robustly infer stream progress based on window deadlines and network delay, and to schedule query pipeline execution that reflects stream progress. Klink aims to unblock window operators and to rapidly propagate events to output operators while performing judicious memory management. We integrate Klink into the popular open source SPE Apache Flink and demonstrate that Klink delivers significant performance gains over existing scheduling policies on benchmark workloads for both scale-up and scale-out deployments.
Klink: progress-aware scheduling for streaming data systems / Farhat, Omar; Daudjee, Khuzaima; Querzoni, Leonardo. - (2021), pp. 485-498. (Intervento presentato al convegno ACM Special Interest Group on Management of Data Conference tenutosi a Xi'an, Shaanxi; China) [10.1145/3448016.3452794].
Klink: progress-aware scheduling for streaming data systems
Querzoni, Leonardo
2021
Abstract
Modern stream processing engines (SPEs) process large volumes of events propagated at high velocity through multiple queries. To improve performance, existing SPEs generally aim to minimize query output latency by minimizing, in turn, the propagation delay of events in query pipelines. However, for queries containing commonly used blocking operators such as windows, this scheduling approach can be inefficient. Watermarks are events popularly utilized by SPEs to correctly process window operators. Watermarks are injected into the stream to signify that no events preceding their timestamp should be further expected. Through the design and development of Klink, we leverage these watermarks to robustly infer stream progress based on window deadlines and network delay, and to schedule query pipeline execution that reflects stream progress. Klink aims to unblock window operators and to rapidly propagate events to output operators while performing judicious memory management. We integrate Klink into the popular open source SPE Apache Flink and demonstrate that Klink delivers significant performance gains over existing scheduling policies on benchmark workloads for both scale-up and scale-out deployments.File | Dimensione | Formato | |
---|---|---|---|
Farhat_Klink_2021.pdf
solo gestori archivio
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
1.79 MB
Formato
Adobe PDF
|
1.79 MB | Adobe PDF | Contatta l'autore |
Farhat_postprint_Klink_2021.pdf.pdf
accesso aperto
Note: https://doi.org/10.1145/3448016.3452794
Tipologia:
Documento in Post-print (versione successiva alla peer review e accettata per la pubblicazione)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
986.64 kB
Formato
Adobe PDF
|
986.64 kB | Adobe PDF |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.