Bregman Neural Networks / Frecon, Jordan; Gasso, Gilles; Pontil, Massimiliano; Salzo, Saverio. - PMLR 162:(2022). (Paper presented at the International Conference on Machine Learning, held in Baltimore, Maryland, USA).

Bregman Neural Networks

Saverio Salzo
2022

Abstract

We present a framework based on bilevel optimization for learning multilayer, deep data representations. On the one hand, the lower-level problem finds a representation by successively minimizing layer-wise objectives made of the sum of a prescribed regularizer, a fidelity term and a linear function depending on the representation found at the previous layer. On the other hand, the upper-level problem optimizes over the linear functions to yield a linearly separable final representation. We show that, by choosing the fidelity term as the quadratic distance between two successive layer-wise representations, the bilevel problem reduces to the training of a feedforward neural network. Instead, by elaborating on Bregman distances, we devise a novel neural network architecture additionally involving the inverse of the activation function, reminiscent of the skip connection used in ResNets. Numerical experiments suggest that the proposed Bregman variant benefits from better learning properties and more robust prediction performance.
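
The contrast between the two layer types described in the abstract can be illustrated with a minimal NumPy sketch. The abstract states only that the Bregman architecture additionally involves the inverse of the activation function. The specific update used below, x_next = sigma(sigma^{-1}(x) + W x + b), the choice of a sigmoid activation, and all function and variable names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, mapping real numbers into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(x, eps=1e-7):
    """Inverse of the sigmoid; inputs are clipped to stay inside (0, 1)."""
    x = np.clip(x, eps, 1.0 - eps)
    return np.log(x) - np.log1p(-x)

def feedforward_layer(x, W, b):
    """Standard layer: activation applied to an affine map of the input."""
    return sigmoid(W @ x + b)

def bregman_layer(x, W, b):
    """Assumed Bregman-style layer: the inverse activation of the input is
    added to the affine map before the activation, acting like a skip term."""
    return sigmoid(logit(x) + W @ x + b)

rng = np.random.default_rng(0)
d = 4                                   # hypothetical layer width
x0 = rng.uniform(0.1, 0.9, size=d)      # representation with entries in (0, 1)
W = 0.1 * rng.standard_normal((d, d))   # square weights keep the width fixed
b = np.zeros(d)

x_ff = feedforward_layer(x0, W, b)
x_br = bregman_layer(x0, W, b)

# With zero weights and bias, the assumed Bregman layer returns its input,
# whereas the standard sigmoid layer returns the constant 0.5.
x_id = bregman_layer(x0, np.zeros((d, d)), b)
x_ff_zero = feedforward_layer(x0, np.zeros((d, d)), b)
```

Under these assumptions, a layer with zero weights acts as the identity, which is the sense in which the inverse-activation term resembles a ResNet skip connection. The sketch requires inputs in the range of the activation and a fixed layer width so that the inverse activation of the input can be added to the affine term.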
International Conference on Machine Learning
deep neural network; Bregman proximity operator
04 Publication in conference proceedings::04b Conference paper in volume
Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1675277