An efficient implementation of the algorithm by Lukas et al. on Hadoop / Cattaneo, Giuseppe; Ferraro Petrillo, Umberto; Nappi, Michele; Narducci, Fabio; Roscigno, Gianluca. - Print. - Vol. 10232 (2017), pp. 475-489. (Paper presented at the 12th International Conference on Green, Pervasive and Cloud Computing, GPC 2017, held in Italy in 2017) [DOI: 10.1007/978-3-319-57186-7_35].
An efficient implementation of the algorithm by Lukas et al. on Hadoop
Cattaneo, Giuseppe; Ferraro Petrillo, Umberto; Nappi, Michele; Narducci, Fabio; Roscigno, Gianluca
2017
Abstract
Apache Hadoop offers the possibility of coding full-fledged distributed applications with very low programming effort. However, the resulting implementations may suffer from performance bottlenecks that nullify the potential of a distributed system. An engineering methodology based on the implementation of smart optimizations driven by a careful profiling activity may lead to much better experimental performance, as shown in this paper. In particular, we take as a case study the algorithm by Lukáš et al. used to solve the Source Camera Identification problem (i.e., recognizing the camera used for acquiring a given digital image). A first implementation has been obtained, with little effort, using the default facilities available with Hadoop. A deep profiling activity allowed us to pinpoint some serious performance issues affecting the initial steps of the algorithm and related to a poor usage of the cluster resources. Optimizations were then developed and their effects were measured by accurate experimentation. The improved implementation is able to optimize the usage of the underlying cluster resources as well as of the Hadoop framework, thus achieving much better performance than the original naive implementation.
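To make the kind of baseline implementation described in the abstract more concrete, the following is a minimal, hypothetical sketch (not the authors' code) of a Hadoop MapReduce job built with the framework's default facilities. It assumes images are pre-packed into SequenceFiles as (file name, raw bytes) pairs, that a camera label is encoded in each file name, and the `extractNoiseResidual` stub merely stands in for the wavelet-based noise extraction step of the algorithm by Lukáš et al.; all of these names and conventions are illustrative assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical sketch of a "naive" Hadoop job for the noise-extraction phase.
public class NoiseExtractionJob {

    public static class NoiseMapper
            extends Mapper<Text, BytesWritable, Text, BytesWritable> {

        @Override
        protected void map(Text fileName, BytesWritable imageBytes, Context context)
                throws IOException, InterruptedException {
            // Placeholder: the real algorithm would denoise the image and
            // compute its PRNU noise residual at this point.
            byte[] residual = extractNoiseResidual(imageBytes.copyBytes());

            // Assumption: the camera label is the prefix of the file name.
            String cameraLabel = fileName.toString().split("_")[0];
            context.write(new Text(cameraLabel), new BytesWritable(residual));
        }

        private byte[] extractNoiseResidual(byte[] rawImage) {
            // Stub standing in for the wavelet-based denoising step.
            return rawImage;
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "prnu-noise-extraction");
        job.setJarByClass(NoiseExtractionJob.class);
        job.setMapperClass(NoiseMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(BytesWritable.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        SequenceFileInputFormat.addInputPath(job, new Path(args[0]));
        SequenceFileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```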
| File | Type | License | Size | Format | Access |
|---|---|---|---|---|---|
| Cattaneo_Efficient-Implementation_2017.pdf | Publisher's version (published with the publisher's layout) | All rights reserved | 1.6 MB | Adobe PDF | Archive managers only; contact the author |
| Cattaneo_copertina_indice_2017.pdf | Publisher's version (published with the publisher's layout) | All rights reserved | 118.25 kB | Adobe PDF | Archive managers only; contact the author |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.