
How to escape nonconvex regions efficiently in large scale optimization problems / Tronci, EDOARDO MARIA. - (2022 May 20).

How to escape nonconvex regions efficiently in large scale optimization problems

TRONCI, EDOARDO MARIA
20/05/2022

Abstract

Solving large-scale optimization problems, such as neural network training, presents many challenges. Among them, the proliferation of useless stationary points, i.e., points where the objective function value is far from that of a global minimum, is a serious drawback for the optimization algorithm, which may be attracted to them and thus become inefficient. In this dissertation, we propose two algorithmic schemes along the following lines. First, extending the result proposed in [1] for shallow networks (networks with a single hidden layer), we give a mathematical characterization of a class of such stationary points arising in the training of deep multilayer neural networks (networks with more than one hidden layer). Building on this characterization, we define an incremental training approach that avoids getting stuck in the region of attraction of these undesirable stationary points. Second, exploiting the main properties of the nonmonotone truncated Newton method proposed in [2], we investigate the benefits of using second-order information, providing preliminary numerical evidence that following directions of negative curvature during neural network training helps the optimization algorithm escape regions where the objective function is nonconvex.

References:
[1] Fukumizu, K., & Amari, S. I. (2000). Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks, 13(3), 317-327.
[2] Fasano, G., & Lucidi, S. (2009). A nonmonotone truncated Newton–Krylov method exploiting negative curvature directions, for large scale unconstrained optimization. Optimization Letters, 3(4), 521-535.
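To make the role of negative curvature concrete, below is a minimal, self-contained sketch; it is not the dissertation's algorithm, which follows the nonmonotone truncated Newton–Krylov scheme of [2]. At a saddle point the gradient vanishes, yet an eigenvector of the Hessian associated with a negative eigenvalue still gives a direction along which the objective decreases. The toy function and the explicit eigendecomposition are illustrative assumptions only; large-scale methods detect negative curvature from Hessian-vector products inside a Krylov iteration rather than forming the Hessian.

```python
import numpy as np

def f(x):
    # Toy nonconvex objective with a saddle at the origin (illustrative only):
    # f(x, y) = x^2 - y^2.
    return x[0] ** 2 - x[1] ** 2

def grad(x):
    return np.array([2.0 * x[0], -2.0 * x[1]])

def hess(x):
    # Constant Hessian of the toy objective; large-scale codes would use
    # Hessian-vector products instead of forming this matrix explicitly.
    return np.array([[2.0, 0.0], [0.0, -2.0]])

def search_direction(x, tol=1e-8):
    """Return the steepest-descent direction in locally convex regions, and a
    negative curvature direction (eigenvector of the most negative Hessian
    eigenvalue) where the objective is locally nonconvex."""
    eigvals, eigvecs = np.linalg.eigh(hess(x))
    if eigvals[0] < -tol:
        d = eigvecs[:, 0]
        # Orient d so that it is also a non-ascent direction for the gradient.
        if d @ grad(x) > 0:
            d = -d
        return d
    return -grad(x)

# At the saddle (0, 0) the gradient vanishes, so pure gradient methods stall,
# but the negative curvature direction still decreases the objective.
x = np.array([0.0, 0.0])
d = search_direction(x)
print("direction at the saddle:", d)
print("objective decreases:", f(x + 0.1 * d) < f(x))  # True
```

In the large-scale setting the dissertation addresses, the same idea is realized within a truncated Newton–Krylov iteration, where negative curvature is revealed by the sign of the curvature terms computed from Hessian-vector products, without ever assembling the Hessian.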
Files attached to this record

File: Tesi_dottorato_Tronci.pdf
Access: Open access
Type: Doctoral thesis
License: All rights reserved
Size: 33.09 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1635338