Input data organization for batch processing in time window based computations / Aniello, Leonardo; Querzoni, Leonardo; Baldoni, Roberto. - (2013), pp. 363-370. (Paper presented at the 28th Annual ACM Symposium on Applied Computing, SAC 2013, held in Coimbra, Portugal, from 18 March 2013 through 22 March 2013) [10.1145/2480362.2480437].
Input data organization for batch processing in time window based computations
Aniello, Leonardo; Querzoni, Leonardo; Baldoni, Roberto
2013
Abstract
Applications based on event processing are often designed to continuously evaluate sets of events defined by sliding time windows. Solutions employing long-running continuous queries executed in memory show their limits in applications characterized by a staggering growth of available sources that continuously produce new events at high rates (e.g., intrusion detection systems and algorithmic trading). Problems arise from the complexity of maintaining large amounts of events in memory for continuous processing, and from the difficulty of managing the network of processing nodes at run time. A batch approach to this kind of computation provides a viable solution for scenarios characterized by infrequent computations over very large time windows. In this paper we propose a model for batch processing in time window event computations that allows the definition of multiple metrics for performance optimization. These metrics specifically take into account the organization of input data to minimize its impact on computation latency. The model is then instantiated on Hadoop, a batch processing engine based on the MapReduce paradigm, and a set of strategies for efficiently arranging input data is described and evaluated. Copyright 2013 ACM.

File | Access | Type | License | Size | Format
---|---|---|---|---|---
VE_2013_11573-515812.pdf | Archive managers only | Publisher's version (published version with the publisher's layout) | All rights reserved | 758.58 kB | Adobe PDF