A Mathematical Programming Approach to Sparse Canonical Correlation Analysis / Amorosi, L.; Padellini, T.; Puerto, J.; Valverde, C. - In: EXPERT SYSTEMS WITH APPLICATIONS. - ISSN 0957-4174. - 237:(2023), p. 121293. [10.1016/j.eswa.2023.121293]
A Mathematical Programming Approach to Sparse Canonical Correlation Analysis
Amorosi, L.; Padellini, T.; Puerto, J.; Valverde, C.
2023
Abstract
Recent developments at the interface of Operational Research and Statistics have made it possible to exploit advances in Mixed-Integer Optimisation (MIO) solvers to improve the quality of statistical analysis. In this work, we tackle Canonical Correlation Analysis (CCA), a dimensionality reduction method that jointly summarises multiple data sources while retaining their dependency structure. We propose a new technique for encoding sparsity in CCA by means of a mathematical programming formulation, which can either be solved exactly with readily available solvers (such as Gurobi) or serve as the basis for designing algorithmic solution procedures. Finally, we evaluate the performance of the alternative solution strategies on multiple datasets from the literature. The results of this extensive comparison show that the proposed approach either finds the optimal correlation or finds good-quality solutions that are better than those provided by other conventional methods.
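To make the notion of an exact solution to sparse CCA concrete, the following is a minimal Python sketch on a tiny synthetic problem. It is not the mathematical programming formulation proposed in the paper. It assumes that sparsity is imposed as a cap on the number of variables used from each data source (the hypothetical parameters `kx` and `ky`), and it finds the best first canonical correlation by enumerating every admissible pair of supports. The function names are illustrative, and enumeration is only viable for very small dimensions, which is where an MIO formulation and a solver would take over.

```python
import itertools
import numpy as np


def first_canonical_correlation(Sxx, Syy, Sxy):
    """Largest canonical correlation from covariance blocks.

    Computed as the top singular value of Lx^{-1} Sxy Ly^{-T},
    where Sxx = Lx Lx^T and Syy = Ly Ly^T are Cholesky factorisations.
    """
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    M = np.linalg.solve(Lx, Sxy)        # Lx^{-1} Sxy
    M = np.linalg.solve(Ly, M.T).T      # Lx^{-1} Sxy Ly^{-T}
    return np.linalg.svd(M, compute_uv=False)[0]


def sparse_cca_bruteforce(X, Y, kx, ky):
    """Exhaustive search over supports of size kx (in X) and ky (in Y).

    Assumption: sparsity means at most kx and ky non-zero weights.
    Searching supports of exactly that size is enough, because adding
    a variable can never decrease the first canonical correlation.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1)
    Syy = Yc.T @ Yc / (n - 1)
    Sxy = Xc.T @ Yc / (n - 1)

    best_rho, best_sx, best_sy = -np.inf, None, None
    for sx in itertools.combinations(range(X.shape[1]), kx):
        for sy in itertools.combinations(range(Y.shape[1]), ky):
            rho = first_canonical_correlation(
                Sxx[np.ix_(sx, sx)], Syy[np.ix_(sy, sy)], Sxy[np.ix_(sx, sy)]
            )
            if rho > best_rho:
                best_rho, best_sx, best_sy = rho, sx, sy
    return best_rho, best_sx, best_sy


# Synthetic data (made up for illustration): Y[:, 0] is a noisy copy of X[:, 1].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
Y = rng.normal(size=(200, 3))
Y[:, 0] = X[:, 1] + 0.1 * rng.normal(size=200)

rho, support_x, support_y = sparse_cca_bruteforce(X, Y, kx=1, ky=1)
print(rho, support_x, support_y)
```

The number of support pairs grows combinatorially with the dimensions and the cardinality caps, so this sketch serves only as a reference point on toy instances.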