Research Products Catalog



Transparent caching for RMA systems

Di Girolamo, Salvatore; Vella, Flavio; Hoefler, Torsten

2017

Abstract

The constantly increasing gap between communication and computation performance emphasizes the importance of communication-avoidance techniques. Caching is a well-known concept used to reduce accesses to slow local memories. In this work, we extend the caching idea to MPI-3 Remote Memory Access (RMA) operations. Here, caching can avoid inter-node communications and bring irregular applications benefits similar to those that communication-avoiding algorithms bring to structured applications. We propose CLaMPI, a caching library layered on top of MPI-3 RMA, to automatically optimize code with minimal user intervention. We demonstrate how cached RMA improves the performance of a Barnes-Hut simulation and a Local Clustering Coefficient computation by up to a factor of 1.8 and 5, respectively. Given the low overheads in the cache-miss case and the potential benefits, we expect that our ideas around transparent RMA caching will soon be an integral part of many MPI libraries.
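To make the idea of caching remote reads concrete, here is a minimal Python sketch of a read cache placed in front of a one-sided get. It is not CLaMPI's API or its caching policy: it assumes an exact-match key of (target rank, displacement, count), least-recently-used eviction, and a conservative "drop everything at the end of an epoch" invalidation rule. The names `CachedGet`, `fake_fetch`, and `windows` are hypothetical; `fetch` stands in for a real remote read such as `MPI_Get` completed by a flush, and the remote windows are simulated with local byte strings so the example runs without MPI.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# Hypothetical cache key: (target rank, displacement, element count).
Key = Tuple[int, int, int]


class CachedGet:
    """Sketch of a read cache in front of an RMA-style get (assumed design)."""

    def __init__(self, fetch: Callable[[int, int, int], bytes], capacity: int = 1024):
        self._fetch = fetch          # stand-in for a one-sided remote read
        self._capacity = capacity    # maximum number of cached reads
        self._entries: OrderedDict = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, rank: int, disp: int, count: int) -> bytes:
        key: Key = (rank, disp, count)
        if key in self._entries:
            # Hit: serve the data locally, no inter-node transfer needed.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Miss: pay for the remote read once, then keep the result.
        self.misses += 1
        data = self._fetch(rank, disp, count)
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used read
        return data

    def invalidate(self) -> None:
        # Assumed conservative consistency rule: remote data may change once
        # the access epoch ends, so drop every cached entry.
        self._entries.clear()


# Usage: two simulated remote windows and a fetch that records each "transfer".
windows = {0: bytes(range(64)), 1: bytes(range(64, 128))}
remote_reads = []


def fake_fetch(rank: int, disp: int, count: int) -> bytes:
    remote_reads.append((rank, disp, count))
    return windows[rank][disp:disp + count]


cache = CachedGet(fake_fetch, capacity=2)
for _ in range(3):
    cache.get(1, 8, 4)  # the same remote block is read repeatedly
print(cache.hits, cache.misses, len(remote_reads))  # 2 1 1
```

Only the first read crosses the (simulated) network; the repeats are served locally. A miss costs just a dictionary lookup on top of the remote read, which mirrors the intuition that caching is cheap when it does not help. Real RMA caches must also handle partially overlapping reads and precise consistency rules, which this exact-match sketch ignores.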

Full record

Year of publication: 2017

Conference name: 31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017

Keywords: Barnes-Hut; caching; irregular applications; local clustering coefficient; MPI; n-body; remote transfer; RMA; Information Systems; Computer Networks and Communications; Hardware and Architecture

Type: 04 Publication in conference proceedings::04b Conference paper in volume

Citation: Transparent caching for RMA systems / Di Girolamo, S., Vella, F., Hoefler, T. - (2017), pp. 1018-1027. (31st IEEE International Parallel and Distributed Processing Symposium, IPDPS 2017, USA, 2017) [10.1109/IPDPS.2017.92].

Belongs to type: 04b Conference paper in volume

Files attached to this product

File	Size	Format
Vella_RMA-systems.pdf	409.74 kB	Adobe PDF

Access: archive managers only. Type: publisher's version (the published version with the publisher's layout). License: all rights reserved.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1068895
