In this work we study high-probability bounds for stochastic subgradient methods under heavy tailed noise in Hilbert spaces. In this setting the noise is only assumed to have finite variance as opposed to a sub-Gaussian distribution for which it is known that standard subgradient methods enjoy high-probability bounds. We analyzed a clipped version of the projected stochastic subgradient method, where subgradient estimates are truncated whenever they have large norms. We show that this clipping strategy leads both to optimal anytime and finite horizon bounds for general averaging schemes of the iterates. We also show an application of our proposal to the case of kernel methods which gives an efficient and fully implementable algorithm for statistical supervised learning problems. Preliminary experiments are shown to support the validity of the method.
High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise / Angela Parletta, Daniela; Paudice, Andrea; Pontil, Massimiliano; Salzo, Saverio. - In: SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE. - ISSN 2577-0187. - 6:4(2024), pp. 953-977. [10.1137/22M1536558]
High Probability Bounds for Stochastic Subgradient Schemes with Heavy Tailed Noise
Saverio Salzo
2024
Abstract
In this work we study high-probability bounds for stochastic subgradient methods under heavy tailed noise in Hilbert spaces. In this setting the noise is only assumed to have finite variance as opposed to a sub-Gaussian distribution for which it is known that standard subgradient methods enjoy high-probability bounds. We analyzed a clipped version of the projected stochastic subgradient method, where subgradient estimates are truncated whenever they have large norms. We show that this clipping strategy leads both to optimal anytime and finite horizon bounds for general averaging schemes of the iterates. We also show an application of our proposal to the case of kernel methods which gives an efficient and fully implementable algorithm for statistical supervised learning problems. Preliminary experiments are shown to support the validity of the method.File | Dimensione | Formato | |
---|---|---|---|
Parletta_High-Probability_2024.pdf
solo gestori archivio
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
656.61 kB
Formato
Adobe PDF
|
656.61 kB | Adobe PDF | Contatta l'autore |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.