A mixed finite differences scheme for gradient approximation / Boresta, Marco; Colombo, Tommaso; De Santis, Alberto; Lucidi, Stefano. - In: JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS. - ISSN 0022-3239. - (2022). [10.1007/s10957-021-01994-w]
A mixed finite differences scheme for gradient approximation
Marco Boresta; Tommaso Colombo; Alberto De Santis; Stefano Lucidi
2022
Abstract
In this paper, we focus on linear functionals defining an approximate version of the gradient of a function. These functionals are often used in optimization problems where computing the gradient of the objective function is costly or the objective function values are affected by noise. They have recently been considered to estimate the gradient of the objective function as the expected value of the function variations over the space of directions. The expected value is then approximated by a sample average over a suitable (random) choice of sample directions in the domain of integration. In this way, the approximation error is characterized by statistical properties of the sample average estimate, typically its variance. Therefore, while useful and attractive bounds for the error variance can be expressed in terms of the number of function evaluations, nothing can be said about the error of a single experiment, which could be quite large. This work instead aims at deriving an approximation scheme for linear functionals approximating the gradient whose approximation error can be characterized from a deterministic point of view in the case of noise-free data. The previously mentioned linear functionals are no longer considered as expected values over the space of directions, but rather as the derivative of the objective function filtered by a Gaussian kernel. Using this new approach, a gradient estimate based on a suitable linear combination of central finite differences at different step sizes is proposed, and deterministic bounds that do not depend on the particular sample of points considered are computed. In the noisy setting, on the other hand, the variance of the estimation error of the proposed method is shown to be strictly lower than that of the Central Finite Difference scheme. Numerical experiments on a set of test functions are encouraging, showing good performance compared with some methods commonly used in the literature, also in the noisy setting.
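As a rough illustration of the two approaches contrasted in the abstract, the sketch below implements (i) the sample-average estimate of the Gaussian-smoothed gradient over random directions, whose error is characterized only statistically, and (ii) coordinate-wise central finite differences, together with a hypothetical weighted combination of central differences at several step sizes. The function names, step sizes, and uniform weights are illustrative assumptions; the coefficients actually used in the paper follow from its Gaussian-kernel filtering derivation and are not reproduced here.

```python
import numpy as np

def gaussian_smoothing_gradient(f, x, sigma=1e-3, n_samples=100, rng=None):
    """Sample-average (Monte Carlo) estimate of the gradient of the
    Gaussian-smoothed function f_sigma(x) = E_{u ~ N(0,I)}[f(x + sigma*u)]:
        grad f_sigma(x) ~ mean_u [(f(x + sigma*u) - f(x - sigma*u)) / (2*sigma) * u].
    The error is random: only its variance shrinks as n_samples grows."""
    rng = np.random.default_rng(rng)
    g = np.zeros_like(x, dtype=float)
    for _ in range(n_samples):
        u = rng.standard_normal(x.size)
        g += (f(x + sigma * u) - f(x - sigma * u)) / (2.0 * sigma) * u
    return g / n_samples

def central_difference_gradient(f, x, h=1e-5):
    """Plain central finite differences along the coordinate axes."""
    g = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        e = np.zeros(x.size)
        e[i] = 1.0
        g[i] = (f(x + h * e) - f(x - h * e)) / (2.0 * h)
    return g

def mixed_finite_difference_gradient(f, x, steps=(1e-5, 1e-4, 1e-3), weights=None):
    """Illustrative 'mixed' estimate: a weighted combination of central
    differences at several step sizes. Uniform weights are a placeholder;
    the paper derives its coefficients from the Gaussian-kernel filtering
    interpretation."""
    steps = np.asarray(steps, dtype=float)
    if weights is None:
        weights = np.full(steps.size, 1.0 / steps.size)
    g = np.zeros_like(x, dtype=float)
    for w, h in zip(weights, steps):
        g += w * central_difference_gradient(f, x, h)
    return g

if __name__ == "__main__":
    f = lambda z: np.sum(z ** 2)          # toy objective with known gradient 2*z
    x = np.array([1.0, -2.0, 0.5])
    print(gaussian_smoothing_gradient(f, x, rng=0))   # random error, varies with the sample
    print(central_difference_gradient(f, x))          # deterministic truncation error
    print(mixed_finite_difference_gradient(f, x))     # combination of several step sizes
```

On a noise-free quadratic all three estimates are close to the exact gradient; the point of the comparison is that the first estimate's error depends on the drawn directions, while the finite difference estimates admit deterministic bounds.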
| File | Size | Format |
|---|---|---|
| Boresta_A-Mixed_2022.pdf (open access) | 372.28 kB | Adobe PDF |

Note: DOI: 10.1007/s10957-021-01994-w
Type: Publisher's version (published with the publisher's layout)
License: Creative Commons
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.