
Convergence Properties of Stochastic Hypergradients / Grazzi, R; Pontil, M; Salzo, S. - 130:(2021), pp. 3826-3834. (Paper presented at the International Conference on Artificial Intelligence and Statistics, held in San Diego, USA).

Convergence Properties of Stochastic Hypergradients

Salzo S
2021

Abstract

Bilevel optimization problems are receiving increasing attention in machine learning as they provide a natural framework for hyperparameter optimization and meta-learning. A key step to tackle these problems is the efficient computation of the gradient of the upper-level objective (hypergradient). In this work, we study stochastic approximation schemes for the hypergradient, which are important when the lower-level problem is empirical risk minimization on a large dataset. The method that we propose is a stochastic variant of the approximate implicit differentiation approach in (Pedregosa, 2016). We provide bounds for the mean square error of the hypergradient approximation, under the assumption that the lower-level problem is accessible only through a stochastic mapping which is a contraction in expectation. In particular, our main bound is agnostic to the choice of the two stochastic solvers employed by the procedure. We provide numerical experiments to support our theoretical analysis and to show the advantage of using stochastic hypergradients in practice.
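
For readers unfamiliar with implicit differentiation, the following is a minimal sketch (not taken from the paper) of the hypergradient under the fixed-point formulation suggested by the abstract, where the lower-level solution w(λ) is the fixed point of a mapping Φ(·, λ) that is a contraction in expectation; the symbols f, E, Φ, w and λ are illustrative placeholders:

\[
\min_{\lambda}\; f(\lambda) := E\big(w(\lambda), \lambda\big)
\quad\text{subject to}\quad
w(\lambda) = \Phi\big(w(\lambda), \lambda\big),
\]
\[
\nabla f(\lambda) = \nabla_{\lambda} E\big(w(\lambda), \lambda\big)
+ \partial_{\lambda}\Phi\big(w(\lambda), \lambda\big)^{\top}
\big(I - \partial_{w}\Phi\big(w(\lambda), \lambda\big)^{\top}\big)^{-1}
\nabla_{w} E\big(w(\lambda), \lambda\big).
\]

Under this reading, the two stochastic solvers mentioned in the abstract would plausibly correspond to (i) approximating the fixed point w(λ) and (ii) approximately solving the linear system involving I - ∂_w Φ^⊤, in the spirit of the approximate implicit differentiation template of Pedregosa (2016); this correspondence is offered here only as an interpretation.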
2021
International Conference on Artificial Intelligence and Statistics
hyperparameter optimization; stochastic algorithms; rate of convergence
04 Conference proceedings publication::04b Conference paper in volume
Files attached to this record

File: Grazzi_Convergence_2021.pdf
Access: Open access
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 3.93 MB
Format: Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1654514
Citations
  • PMC: ND
  • Scopus: 12
  • Web of Science (ISI): 6