In this work we propose two parallel versions of the software package COMPSYN, which produces synthetic seismograms. The first version parallelizes the code to run on a cluster of multicore processors, combining the MPI paradigm with the OpenMP API to maximize performance on multicore processors. The second version exploits the GPUs attached to the multicore processor cluster and uses CUDA to take advantage of their computational power. We analyze the performance of the two implementations on a real case study. In particular, the GPU version achieves a speedup of 10x over the parallelized version running on the cluster of multicore processors. Furthermore, we estimate that the GPU version running on a single node of the cluster achieves a speedup of at least 100x over the original sequential version. © 2012 IEEE.
Accelerating the production of synthetic seismograms by a multicore processor cluster with multiple GPUs / Ferdinando, Alessi; Massini, Annalisa; Roberto, Basili. - Print. - (2012), pp. 434-441. (Paper presented at the 20th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, PDP 2012, held in Garching, 15-17 February 2012) [10.1109/pdp.2012.85].
Accelerating the production of synthetic seismograms by a multicore processor cluster with multiple GPUs
MASSINI, Annalisa;
2012