ONLINE BLOCK LAYER DECOMPOSITION SCHEMES FOR TRAINING DEEP NEURAL NETWORKS / Palagi, Laura; Seccia, Ruggiero. - (2019).

ONLINE BLOCK LAYER DECOMPOSITION SCHEMES FOR TRAINING DEEP NEURAL NETWORKS

Laura Palagi; Ruggiero Seccia
2019

Abstract

The estimation of the weights of Deep Feedforward Neural Networks (DFNNs) requires solving a very large nonconvex optimization problem that may have many local (non-global) minimizers, saddle points, and large plateaus. Furthermore, the time needed to find good solutions to the training problem depends heavily on both the number of samples and the number of weights (variables). In this work, we show how Block Coordinate Descent (BCD) methods can be applied to improve the performance of state-of-the-art algorithms by avoiding bad stationary points and flat regions. We first describe a batch BCD method able to effectively tackle the difficulties caused by the network's depth; we then extend the algorithm by proposing an online BCD scheme able to scale with respect to both the number of variables and the number of samples. We report extensive numerical results on standard datasets using several deep networks, showing that applying (online) BCD methods to the training phase of DFNNs allows them to outperform standard batch/online algorithms, improving both the training phase and the generalization performance of the networks.
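Below is a minimal, illustrative sketch of the block layer decomposition idea described in the abstract, assuming PyTorch; the network architecture, block structure, function names, and hyperparameters are invented for the example and are not taken from the thesis. Each layer block is optimized in turn while all other blocks stay frozen. Passing a stream of mini-batches gives an online-style sweep; passing the full dataset as one batch gives a batch-style sweep.

    # Illustrative sketch only; not the authors' implementation.
    import torch
    import torch.nn as nn

    # A small deep feedforward network; each Sequential is one layer block.
    blocks = nn.ModuleList([
        nn.Sequential(nn.Linear(20, 64), nn.ReLU()),
        nn.Sequential(nn.Linear(64, 64), nn.ReLU()),
        nn.Sequential(nn.Linear(64, 1)),
    ])
    model = nn.Sequential(*blocks)
    loss_fn = nn.MSELoss()

    def bcd_sweep(batches, lr=1e-2):
        """One sweep of block coordinate descent over the layer blocks:
        each block is updated in turn while all other blocks stay frozen."""
        for block in blocks:
            # Freeze every parameter, then unfreeze only the current block.
            for p in model.parameters():
                p.requires_grad_(False)
            for p in block.parameters():
                p.requires_grad_(True)
            opt = torch.optim.SGD(block.parameters(), lr=lr)
            for x, y in batches:
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()

    # Synthetic data just to make the sketch runnable.
    x, y = torch.randn(256, 20), torch.randn(256, 1)
    mini_batches = [(x[i:i + 32], y[i:i + 32]) for i in range(0, 256, 32)]
    bcd_sweep(mini_batches)   # online-style sweep over mini-batches
    bcd_sweep([(x, y)])       # batch-style sweep on the full dataset

Freezing parameters via requires_grad_ keeps the forward pass through the whole network intact while restricting gradient computation and updates to the active block only, which is the core mechanism of a layer-wise BCD step.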

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1291201