Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose

Arrigoni, Viviana; Maggioli, Filippo; Massini, Annalisa; Rodolà, Emanuele

doi:10.1145/3472456.3473519

The multiplication of a matrix by its transpose, ATA, appears as an intermediate operation in the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm (AtA) for computing this product, based upon the classical Strassen algorithm as a sub-routine. In particular, we decrease the computational cost to the time required by Strassen’s algorithm, amounting to floating point operations. AtA works for generic rectangular matrices, and exploits the peculiar symmetry of the resulting product matrix for saving memory. In addition, we provide an extensive implementation study of AtA in a shared memory system, and extend its applicability to a distributed environment. To support our findings, we compare our algorithm with state-of-the-art solutions specialized in the computation of ATA. Our experiments highlight good scalability with respect to both the matrix size and the number of involved processes, as well as favorable performance for both the parallel paradigms and the sequential implementation, when compared with other methods in the literature.

Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose / Arrigoni, V., Maggioli, F., Massini, A., Rodolà, E.. - (2021), pp. 1-12. (International Conference on Parallel Processing (ICPP) online ) [10.1145/3472456.3473519].

Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose

Arrigoni, Viviana;Maggioli, Filippo;Massini, Annalisa;Rodolà, Emanuele

2021

Abstract

The multiplication of a matrix by its transpose, ATA, appears as an intermediate operation in the solution of a wide set of problems. In this paper, we propose a new cache-oblivious algorithm (AtA) for computing this product, based upon the classical Strassen algorithm as a sub-routine. In particular, we decrease the computational cost to the time required by Strassen’s algorithm, amounting to floating point operations. AtA works for generic rectangular matrices, and exploits the peculiar symmetry of the resulting product matrix for saving memory. In addition, we provide an extensive implementation study of AtA in a shared memory system, and extend its applicability to a distributed environment. To support our findings, we compare our algorithm with state-of-the-art solutions specialized in the computation of ATA. Our experiments highlight good scalability with respect to both the matrix size and the number of involved processes, as well as favorable performance for both the parallel paradigms and the sequential implementation, when compared with other methods in the literature.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2021
			
	Nome convegno
	
				International Conference on Parallel Processing (ICPP)
			
	Parole chiave
	
				Fast Linear Algebra, Cache-oblivious algorithm, C/C++., OpenMP, Matrix Multiplication, MPI
			
	Tipologia
	
				04 Pubblicazione in atti di convegno::04b Atto di convegno in volume
			
	Citazione
	
				Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose / Arrigoni, V., Maggioli, F., Massini, A., Rodolà, E.. - (2021), pp. 1-12. (International Conference on Parallel Processing (ICPP) online ) [10.1145/3472456.3473519].
			
	Appartiene alla tipologia:
	
				04b Atto di convegno in volume

File allegati a questo prodotto

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1578314

Attenzione

Attenzione! I dati visualizzati non sono stati sottoposti a validazione da parte dell'ateneo

Citazioni

ND

5

2

Catalogo dei prodotti della ricerca