Bregman Neural Networks / Frecon, Jordan; Gasso, Gilles; Pontil, Massimiliano; Salzo, Saverio. - PMLR 162 (2022). (Paper presented at the International Conference on Machine Learning, held in Baltimore, Maryland, USA).
Bregman Neural Networks
Saverio Salzo
2022
Abstract
We present a framework based on bilevel optimization for learning multilayer, deep data representations. On the one hand, the lower-level problem finds a representation by successively minimizing layer-wise objectives made of the sum of a prescribed regularizer, a fidelity term and a linear function depending on the representation found at the previous layer. On the other hand, the upper-level problem optimizes over the linear functions to yield a linearly separable final representation. We show that, by choosing the fidelity term as the quadratic distance between two successive layer-wise representations, the bilevel problem reduces to the training of a feedforward neural network. In contrast, by elaborating on Bregman distances, we devise a novel neural network architecture additionally involving the inverse of the activation function, reminiscent of the skip connection used in ResNets. Numerical experiments suggest that the proposed Bregman variant benefits from better learning properties and more robust prediction performance.
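The abstract contrasts a standard feedforward layer, sigma(Wx + b), with a Bregman variant in which the previous representation also enters through the inverse activation inside the nonlinearity. The sketch below is one reading of that description, not the paper's exact equations: the layer form sigma(sigma^{-1}(x) + Wx + b), the class name `BregmanLayer`, the choice of a sigmoid activation, and the clamping for numerical stability are all illustrative assumptions.

```python
import torch
import torch.nn as nn


class BregmanLayer(nn.Module):
    """Hypothetical sketch of a Bregman-style layer: the previous
    representation re-enters through the inverse activation, reminiscent
    of a ResNet skip connection (per the abstract's description)."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    @staticmethod
    def sigma(z: torch.Tensor) -> torch.Tensor:
        # Activation, interpreted as the gradient of a Bregman potential.
        return torch.sigmoid(z)

    @staticmethod
    def sigma_inv(x: torch.Tensor) -> torch.Tensor:
        # Inverse of the sigmoid (logit); clamp keeps it well defined.
        x = x.clamp(1e-6, 1 - 1e-6)
        return torch.log(x) - torch.log1p(-x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Standard layer would return sigma(Wx + b); the Bregman variant
        # additionally feeds sigma^{-1}(x) into the activation.
        return self.sigma(self.sigma_inv(x) + self.linear(x))


# Minimal usage: the input representation is assumed to lie in (0, 1),
# i.e. in the range of the sigmoid activation.
layer = BregmanLayer(16)
x = torch.sigmoid(torch.randn(8, 16))
y = layer(x)
```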
| File | Size | Format |
|---|---|---|
| Frecon_Bregman_2022.pdf | 739.51 kB | Adobe PDF |

Open access. Type: publisher's version (published version with the publisher's layout). License: All rights reserved.
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.