
How to escape nonconvex regions efficiently in large scale optimization problems / Tronci, EDOARDO MARIA. - (2022 May 20).

How to escape nonconvex regions efficiently in large scale optimization problems

TRONCI, EDOARDO MARIA
20/05/2022

Abstract

Solving large-scale optimization problems, such as neural network training, presents many challenges. Among them, the proliferation of useless stationary points, i.e., points where the objective function value is far from that of a global minimum, is a serious drawback for the optimization algorithm, which may be attracted to them and thus become inefficient. In this dissertation, we propose two algorithmic schemes along the following lines. First, extending the result proposed in [1] for shallow networks (networks with a single hidden layer), we give a mathematical characterization of a class of such stationary points arising in the training of deep multilayer neural networks (networks with more than one hidden layer). Building on this characterization, we define an incremental training approach that avoids getting stuck in the region of attraction of these undesirable stationary points. Second, exploiting the main properties of the nonmonotone truncated Newton method proposed in [2], we investigate the benefits of using second-order information, providing preliminary numerical evidence that following directions of negative curvature during neural network training helps the optimization algorithm escape regions where the objective function is nonconvex.

References:
[1] Fukumizu, K., & Amari, S. I. (2000). Local minima and plateaus in hierarchical structures of multilayer perceptrons. Neural Networks, 13(3), 317-327.
[2] Fasano, G., & Lucidi, S. (2009). A nonmonotone truncated Newton–Krylov method exploiting negative curvature directions, for large scale unconstrained optimization. Optimization Letters, 3(4), 521-535.
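To make the role of negative curvature concrete, below is a minimal, self-contained sketch; it is not the dissertation's algorithm, which follows the nonmonotone truncated Newton–Krylov scheme of [2]. At a saddle point the gradient vanishes, yet an eigenvector of the Hessian associated with a negative eigenvalue still gives a direction along which the objective decreases. The toy function and the explicit eigendecomposition are illustrative assumptions only; large-scale methods detect negative curvature from Hessian-vector products inside a Krylov iteration rather than forming the Hessian.

```python
import numpy as np

def f(x):
    # Toy nonconvex objective with a saddle at the origin (illustrative only):
    # f(x, y) = x^2 - y^2.
    return x[0] ** 2 - x[1] ** 2

def grad(x):
    return np.array([2.0 * x[0], -2.0 * x[1]])

def hess(x):
    # Constant Hessian of the toy objective; large-scale codes would use
    # Hessian-vector products instead of forming this matrix explicitly.
    return np.array([[2.0, 0.0], [0.0, -2.0]])

def search_direction(x, tol=1e-8):
    """Return the steepest-descent direction in locally convex regions, and a
    negative curvature direction (eigenvector of the most negative Hessian
    eigenvalue) where the objective is locally nonconvex."""
    eigvals, eigvecs = np.linalg.eigh(hess(x))
    if eigvals[0] < -tol:
        d = eigvecs[:, 0]
        # Orient d so that it is also a non-ascent direction for the gradient.
        if d @ grad(x) > 0:
            d = -d
        return d
    return -grad(x)

# At the saddle (0, 0) the gradient vanishes, so pure gradient methods stall,
# but the negative curvature direction still decreases the objective.
x = np.array([0.0, 0.0])
d = search_direction(x)
print("direction at the saddle:", d)
print("objective decreases:", f(x + 0.1 * d) < f(x))  # True
```

In the large-scale setting the dissertation addresses, the same idea is realized within a truncated Newton–Krylov iteration, where negative curvature is revealed by the sign of the curvature terms computed from Hessian-vector products, without ever assembling the Hessian.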
Files attached to this record

File: Tesi_dottorato_Tronci.pdf
Access: Open access
Type: Doctoral thesis
License: All rights reserved
Size: 33.09 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1635338