The growing number of cybersecurity threats is driving the development of new detection and countermeasure techniques based on the analysis of Big data. In this setting, the MapReduce paradigm has quickly become the de facto standard for carrying out such processing, which has led to a surge in job offerings requiring this skill. At the same time, a growing number of computer science courses cover this paradigm together with its most popular implementations, Apache Spark and Apache Hadoop. In this paper, we present a solution for supporting the teaching of MapReduce through software visualization. The proposed solution has two main goals. The first is to help students understand how the MapReduce paradigm solves a complex problem by decomposing it into simpler subproblems, each of which is solved by means of a map and/or a reduce operation. The second is to show how an input dataset is partitioned into blocks and processed in parallel by the different computing units of a distributed computing system. In both cases, software visualization techniques with suitable graphical metaphors help students understand what is going on by giving them a graphical representation that, on the one hand, describes how the considered algorithm works on a real dataset and, on the other hand, illustrates the speed-up achieved thanks to the distributed approach. Our solution is based on the Spark implementation of the MapReduce paradigm. It allows a user (either a teacher or a student) to assemble a distributed MapReduce computation by interacting with a selection of supported distributed operations. Once an operation is selected, it is executed and visualized by means of an animation designed to reflect the distributed nature of the operation. The sequence of animations obtained by executing a flow of operations illustrates their behavior while showing how the input dataset is transformed over time.
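
As an illustration of the two ideas named in the abstract (decomposition into map and reduce operations, and partitioning of the input into blocks), the following is a minimal single-machine sketch in plain Python. It is not the tool described in the paper and does not use the Spark API. The function names, the word-count task, and the sample log lines are all assumptions chosen for illustration.

```python
from collections import defaultdict
from functools import reduce


def split_into_blocks(records, num_blocks):
    """Partition the input into roughly equal blocks, one per simulated worker."""
    size = max(1, -(-len(records) // num_blocks))  # ceiling division
    return [records[i:i + size] for i in range(0, len(records), size)]


def map_block(block):
    """Map step: emit a (word, 1) pair for every word in the block."""
    return [(word.lower(), 1) for line in block for word in line.split()]


def shuffle(mapped_blocks):
    """Group the emitted values by key across all blocks."""
    groups = defaultdict(list)
    for pairs in mapped_blocks:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_groups(groups):
    """Reduce step: combine the values of each key with an associative sum."""
    return {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}


def word_count(lines, num_blocks=2):
    blocks = split_into_blocks(lines, num_blocks)
    # Each block is mapped independently; on a real cluster these calls
    # would run in parallel on different computing units.
    mapped = [map_block(block) for block in blocks]
    return reduce_groups(shuffle(mapped))


# Hypothetical sample input, made up for this sketch.
lines = ["alert login failed", "login ok", "alert alert"]
counts = word_count(lines, num_blocks=2)
```

Because every block is mapped without reference to the others, the map calls could in principle be distributed across workers, which is where the speed-up mentioned in the abstract would come from. Here they run sequentially for simplicity.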

Using Software Visualization for Supporting the Teaching of Map Reduce / Ferraro-Petrillo, Umberto. - (2018). (Paper presented at the 12th International Conference on Network and System Security, held in Hong Kong).

Using Software Visualization for Supporting the Teaching of Map Reduce

Ferraro-Petrillo Umberto
2018

12th International Conference on Network and System Security
big data security; MapReduce; Spark; software visualization
04 Publication in conference proceedings::04b Conference paper in a volume
Type: Post-print document (version after peer review, accepted for publication)
License: All rights reserved
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1169496