Today’s business workflows are very likely to include batch computations that periodically analyze subsets of data within specific time ranges to provide strategic information for stakeholders and other interested parties. The frequency of these batch computations provides an effective measure of data analytics freshness available to decision makers. Nevertheless, the typical amounts of data to elaborate in a batch are so large that a computation can take very long. Considering that usually a new batch starts when the previous one has completed, the frequency of such batches can thus be very low. In this paper we propose a model for batch processing based on overlapping sliding time windows that allows to increase the frequency of batches. The model is well suited to scenarios (e.g., financial, security etc.) characterized by large data volumes, observation windows in the order of hours (or days) and frequent updates (order of seconds). The model introduces multiple metrics whose aim is reducing the latency between the end of a computation time window and the availability of results, increasing thus the frequency of the batches. These metrics specifically take into account the organization of input data to minimize its impact on such latency. The model is then instantiated on the well-known Hadoop platform, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated.

High frequency batch-oriented computations over large sliding time windows / Aniello, Leonardo; Querzoni, Leonardo; Baldoni, Roberto. - In: FUTURE GENERATION COMPUTER SYSTEMS. - ISSN 0167-739X. - STAMPA. - 43-44:(2015), pp. 1-11. [10.1016/j.future.2014.09.008]

High frequency batch-oriented computations over large sliding time windows

ANIELLO, LEONARDO
;
QUERZONI, Leonardo
;
BALDONI, Roberto
2015

Abstract

Today’s business workflows are very likely to include batch computations that periodically analyze subsets of data within specific time ranges to provide strategic information for stakeholders and other interested parties. The frequency of these batch computations provides an effective measure of data analytics freshness available to decision makers. Nevertheless, the typical amounts of data to elaborate in a batch are so large that a computation can take very long. Considering that usually a new batch starts when the previous one has completed, the frequency of such batches can thus be very low. In this paper we propose a model for batch processing based on overlapping sliding time windows that allows to increase the frequency of batches. The model is well suited to scenarios (e.g., financial, security etc.) characterized by large data volumes, observation windows in the order of hours (or days) and frequent updates (order of seconds). The model introduces multiple metrics whose aim is reducing the latency between the end of a computation time window and the availability of results, increasing thus the frequency of the batches. These metrics specifically take into account the organization of input data to minimize its impact on such latency. The model is then instantiated on the well-known Hadoop platform, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated.
2015
Event processing; Batch processing; Time window based computations; Data analytics; Big data
01 Pubblicazione su rivista::01a Articolo in rivista
High frequency batch-oriented computations over large sliding time windows / Aniello, Leonardo; Querzoni, Leonardo; Baldoni, Roberto. - In: FUTURE GENERATION COMPUTER SYSTEMS. - ISSN 0167-739X. - STAMPA. - 43-44:(2015), pp. 1-11. [10.1016/j.future.2014.09.008]
File allegati a questo prodotto
File Dimensione Formato  
Aiello_High-frequency-batch-oriente_2015.pdf

solo gestori archivio

Note: Articolo
Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.13 MB
Formato Adobe PDF
1.13 MB Adobe PDF   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/655415
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 2
social impact