An efficient parallel tree-code for the simulation of self-gravitating systems / Miocchi, Paolo; Capuzzo Dolcetta, Roberto Angelo. - In: Astronomy & Astrophysics. - ISSN 0004-6361. - Print. - 382 (2002), pp. 758-767. [10.1051/0004-6361:20011609]
An efficient parallel tree-code for the simulation of self-gravitating systems
Miocchi, Paolo; Capuzzo Dolcetta, Roberto Angelo
2002
Abstract
We describe a parallel version of our tree-code for the simulation of self-gravitating systems in astrophysics. It is based on a dynamic and adaptive domain decomposition method that exploits the hierarchical data arrangement used by the tree-code. The parallelization overhead is low (less than 4% of the total CPU time in the tests performed) because the domain decomposition is carried out "on the fly" during tree construction, and the portion of the tree local to each processor is enriched with remote data only when those data are actually needed. The performance of an implementation of the parallel code on a Cray T3E is presented and discussed. The results show a very good speedup (= 15 with 16 processors and 10^5 particles) and a rather low load imbalance (<10% using up to 16 processors), and the code achieves a high computation speed in the force evaluation (>10^4 particles/sec with 8 processors). Supported by CINECA (http://www.cineca.it) and CNAA (http://cnaa.cineca.it) under Grant cnarm12a.
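To put the quoted speedup in context, it can be converted into a parallel efficiency, taken here as speedup divided by the number of processors. This is a minimal sketch of that arithmetic only; the function name `parallel_efficiency` is hypothetical and is not taken from the paper or its code.

```python
def parallel_efficiency(speedup: float, n_processors: int) -> float:
    """Return speedup divided by processor count (1.0 would be ideal scaling)."""
    if n_processors <= 0:
        raise ValueError("n_processors must be positive")
    return speedup / n_processors


# Figures quoted in the abstract: a speedup of 15 on 16 processors.
efficiency = parallel_efficiency(15.0, 16)
print(f"parallel efficiency: {efficiency:.1%}")
```

With the abstract's figures this gives an efficiency of roughly 94%, which is consistent with the low parallelization overhead and load imbalance reported there.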