Aniello, Leonardo; Querzoni, Leonardo; Baldoni, Roberto. High frequency batch-oriented computations over large sliding time windows. In: Future Generation Computer Systems, vol. 43-44 (2015), pp. 1-11. ISSN 0167-739X. Print. DOI: 10.1016/j.future.2014.09.008
High frequency batch-oriented computations over large sliding time windows
Aniello, Leonardo; Querzoni, Leonardo; Baldoni, Roberto
2015
Abstract
Today’s business workflows very likely include batch computations that periodically analyze subsets of data within specific time ranges to provide strategic information to stakeholders and other interested parties. The frequency of these batch computations provides an effective measure of the freshness of the data analytics available to decision makers. However, the amount of data to process in a batch is typically so large that a single computation can take a very long time. Since a new batch usually starts only when the previous one has completed, the frequency of such batches can thus be very low. In this paper we propose a model for batch processing based on overlapping sliding time windows that makes it possible to increase the frequency of batches. The model is well suited to scenarios (e.g., finance, security) characterized by large data volumes, observation windows on the order of hours (or days), and frequent updates (on the order of seconds). The model introduces several metrics aimed at reducing the latency between the end of a computation time window and the availability of results, thus increasing the frequency of the batches. These metrics specifically take into account the organization of the input data so as to minimize its impact on such latency. The model is then instantiated on the well-known Hadoop platform, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging the input data is described and evaluated.
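To make the core idea concrete, below is a minimal illustrative sketch (not taken from the paper) of how overlapping sliding time windows raise batch frequency: with a window length W and a slide period S < W, a new batch covering the latest W time units can be started every S time units, instead of every W as in non-overlapping batching. The class name, method, and example values are assumptions chosen for illustration only.

```java
// Illustrative sketch, assuming a window length W and slide period S < W.
// Each batch analyzes the data falling in one window; because windows
// overlap, batches can be scheduled every S time units rather than every W.
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class SlidingWindowSchedule {

    /** Returns the [start, end) bounds of the first n overlapping windows. */
    static List<Instant[]> windows(Instant origin, Duration windowLength,
                                   Duration slidePeriod, int n) {
        List<Instant[]> result = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Instant start = origin.plus(slidePeriod.multipliedBy(i));
            Instant end = start.plus(windowLength);
            result.add(new Instant[]{start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        // Example values (assumed): 6-hour windows sliding every 30 minutes,
        // i.e., 12x more frequent batches than non-overlapping 6-hour windows.
        Instant origin = Instant.parse("2015-01-01T00:00:00Z");
        for (Instant[] w : windows(origin, Duration.ofHours(6),
                                   Duration.ofMinutes(30), 4)) {
            System.out.println("batch over [" + w[0] + ", " + w[1] + ")");
        }
    }
}
```

In this setting the dominant cost is re-reading the input data shared by consecutive windows, which is why the paper's metrics focus on how the input is organized.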
| File | Note | Type | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| Aiello_High-frequency-batch-oriente_2015.pdf | Article | Published version (publisher's layout) | All rights reserved | 1.13 MB | Adobe PDF | Archive managers only; contact the author |