Apache Hadoop makes it possible to code full-fledged distributed applications with very little programming effort. However, the resulting implementations may suffer from performance bottlenecks that nullify the potential of a distributed system. As shown in this paper, an engineering methodology based on smart optimizations driven by careful profiling may lead to much better experimental performance. In particular, we take as a case study the algorithm by Lukáš et al. used to solve the Source Camera Identification problem (i.e., recognizing the camera used to acquire a given digital image). A first implementation was obtained, with little effort, using the default facilities available with Hadoop. In-depth profiling allowed us to pinpoint some serious performance issues affecting the initial steps of the algorithm and related to poor use of the cluster resources. Optimizations were then developed and their effects were measured through careful experimentation. The improved implementation makes better use of the underlying cluster resources as well as of the Hadoop framework, resulting in much better performance than the original naive implementation.

An efficient implementation of the algorithm by Lukáš et al. on Hadoop / Cattaneo, Giuseppe; Ferraro Petrillo, Umberto; Nappi, Michele; Narducci, Fabio; Roscigno, Gianluca. - Print. - 10232:(2017), pp. 475-489. (Paper presented at the 12th International Conference on Green, Pervasive and Cloud Computing, GPC 2017, held in Italy in 2017) [10.1007/978-3-319-57186-7_35].

An efficient implementation of the algorithm by Lukáš et al. on Hadoop

Cattaneo, Giuseppe; Ferraro Petrillo, Umberto; Nappi, Michele
2017

12th International Conference on Green, Pervasive and Cloud Computing, GPC 2017
distributed computing; hadoop; source camera identification; theoretical computer science; computer science (all)
04 Publication in conference proceedings::04b Conference paper in volume
Files attached to this record

Cattaneo_Efficient-Implementation_2017.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.6 MB
Format: Adobe PDF

Cattaneo_copertina_indice_2017.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 118.25 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1020096
Citations
  • PubMed Central: ND
  • Scopus: 9
  • Web of Science: 6