Catalogo dei prodotti della ricerca

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of nonconvex optimization, going under the general name of successive convex approximation techniques. The basic idea is to iteratively replace the original (nonconvex, highly dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Different from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the NN function, in a stochastic fashion, while exploiting the overall structure of the learning problem for a faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN’s weights. We experiment on several medium-sized benchmark problems and on a large-scale data set involving simulated physical data. The results show how the algorithm outperforms the state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.

Stochastic training of neural networks via successive convex approximations / Scardapane, Simone; DI LORENZO, Paolo. - In: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. - ISSN 2162-237X. - 29:(2018), pp. 4947-4956. [doi:10.1109/TNNLS.2017.2785361]

Stochastic training of neural networks via successive convex approximations

Scardapane Simone;Di Lorenzo Paolo

2018

Abstract

This paper proposes a new family of algorithms for training neural networks (NNs). These are based on recent developments in the field of nonconvex optimization, going under the general name of successive convex approximation techniques. The basic idea is to iteratively replace the original (nonconvex, highly dimensional) learning problem with a sequence of (strongly convex) approximations, which are both accurate and simple to optimize. Different from similar ideas (e.g., quasi-Newton algorithms), the approximations can be constructed using only first-order information of the NN function, in a stochastic fashion, while exploiting the overall structure of the learning problem for a faster convergence. We discuss several use cases, based on different choices for the loss function (e.g., squared loss and cross-entropy loss), and for the regularization of the NN’s weights. We experiment on several medium-sized benchmark problems and on a large-scale data set involving simulated physical data. The results show how the algorithm outperforms the state-of-the-art techniques, providing faster convergence to a better minimum. Additionally, we show how the algorithm can be easily parallelized over multiple computational units without hindering its performance. In particular, each computational unit can optimize a tailored surrogate function defined on a randomly assigned subset of the input variables, whose dimension can be selected depending entirely on the available computational power.

Scheda breve

Scheda completa

	Anno di pubblicazione
	
				2018
			
	Parole chiave
	
				Neural networks (NNs); nonconvex optimization; parallel optimization; sparsity
			
	Tipologia
	
				01 Pubblicazione su rivista::01a Articolo in rivista
			
	Citazione
	
				Stochastic training of neural networks via successive convex approximations / Scardapane, Simone; DI LORENZO, Paolo. - In: IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS. - ISSN 2162-237X. - 29:(2018), pp. 4947-4956. [doi:10.1109/TNNLS.2017.2785361]
			
	Appartiene alla tipologia:
	
				01a Articolo in rivista

File allegati a questo prodotto

File	Dimensione	Formato
Scardapane_preprint_Stochastic_2018.pdf solo gestori archivio Tipologia: Documento in Pre-print (manoscritto inviato all'editore, precedente alla peer review) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 681.36 kB Formato Adobe PDF Contatta l'autore	681.36 kB	Adobe PDF	Contatta l'autore
Scardapane_Stochastic-training_2018.pdf solo gestori archivio Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore) Licenza: Tutti i diritti riservati (All rights reserved) Dimensione 1.25 MB Formato Adobe PDF Contatta l'autore	1.25 MB	Adobe PDF	Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1119415

Citazioni

ND

8

7

social impact