
Case studies of interpretable machine learning in astrophysics / Trevisan, Piero. - (2023 Dec 21).

Case studies of interpretable machine learning in astrophysics

TREVISAN, PIERO
21/12/2023

Abstract

In this work, the light curves of variable stars (especially RR Lyrae) and the pursuit of intermediate-mass black holes (IMBHs) within globular clusters (GCs) are explored through a data-driven approach using interpretable machine learning (ML) techniques. We begin with the development of an inherently interpretable classifier that uses L1 penalization to induce simplicity through sparsity in the model. This approach ensures a straightforward interpretation of astronomical data, providing a transparent model that facilitates the extraction of valuable insights. The penalized classifier, which reaches $90\%$ sparsity in the light-curve features with only a limited trade-off in accuracy, performs well both on the Catalina Sky Survey validation set and, remarkably, on the independent ASAS/ASAS-SN light-curve test set. Next, we apply the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm to light curves of RR Lyrae and $\delta$ Scuti stars, recovering the underlying dynamical systems from Catalina Sky Survey observations. This method yields a sparse and interpretable representation. The success rate depends systematically on variable type, with possible implications for variable-star classification; however, it does not obviously depend on amplitude or period. Successful models can be reduced to the generalized Liénard equation $\ddot{x} + (a + b x + c \dot{x})\dot{x} + x = 0$. For $a = b = 0$ the equation can be solved exactly, and it admits both periodic and non-periodic solutions. We find a condition on the coefficients of the general equation for the presence of a limit cycle, which is also observed numerically in several instances. In addition, we employ Dynamic Mode Decomposition (DMD) to investigate the modes of variable stars within the Omega Centauri cluster.
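The $a = b = 0$ case of the generalized Liénard equation above, $\ddot{x} + c\dot{x}^2 + x = 0$, can indeed be solved exactly: substituting $u = \dot{x}^2$ and treating it as a function of $x$ turns the equation into the linear ODE $du/dx = -2cu - 2x$, so $C = e^{2cx}\left(\dot{x}^2 + x/c - 1/(2c^2)\right)$ is conserved along trajectories. A minimal numerical check of this first integral (the coefficient and initial condition here are illustrative, not taken from the thesis):

```python
# Check that C = exp(2cx) * (v^2 + x/c - 1/(2c^2)) is conserved for
# x'' + c x'^2 + x = 0, the a = b = 0 case of the Liénard equation.
import numpy as np
from scipy.integrate import solve_ivp

c = 0.3  # illustrative coefficient, not from the thesis

def rhs(t, y):
    x, v = y
    return [v, -c * v**2 - x]

def invariant(x, v):
    return np.exp(2 * c * x) * (v**2 + x / c - 1 / (2 * c**2))

sol = solve_ivp(rhs, (0, 20), [0.5, 0.0], rtol=1e-10, atol=1e-12)
C = invariant(sol.y[0], sol.y[1])
print(np.ptp(C))  # spread of the "constant" along the trajectory
```

The spread of `C` is at the level of the integration tolerance, confirming the conservation law; the same substitution gives the closed-form velocity profile $\dot{x}^2 = Ce^{-2cx} - x/c + 1/(2c^2)$.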
This data-driven technique provides insights into the diverse modes of stellar variability, contributing to our understanding of the complex dynamics of these stars. In the context of IMBH detection in GCs, we apply two ML models: CORELS, an inherently interpretable model, and XGBoost, a black-box model explained post hoc using local, model-agnostic explanation rules known as anchors. By training these models on simulated GC data and then applying them to actual observational data, we emphasize the importance of interpretability in scientific investigations. Our results demonstrate that simpler, interpretable models can attain accuracy comparable to their more complex counterparts on the relevant metrics; this matters in astronomy, where understanding a model's decision-making process is crucial for establishing trust and enabling further scientific exploration by domain experts. Our findings challenge the prevalent assumption that complexity is a necessary condition for accuracy, highlighting the existence of interpretable models within the set of accurate predictive models. In the domains of variable stars and of IMBH identification in globular clusters, we demonstrate that machine learning tools can be both reliable and insightful when guided by models that are interpretable and straightforward. This aligns with the pressing call for transparency and human-understandability in ML applications, extending beyond astronomy to the broader scientific community.
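The simple-versus-complex comparison can be illustrated with a toy sketch (using scikit-learn stand-ins for self-containment; the thesis itself compares CORELS and XGBoost-with-anchors on globular-cluster simulations, not the models or data below). A depth-limited decision tree plays the role of the interpretable model, since its full rule structure can be printed and read, while a boosted ensemble plays the black box:

```python
# Illustrative sketch: a small, human-readable tree vs a boosted
# ensemble on synthetic data. Dataset and hyperparameters are arbitrary.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=2000, n_features=6,
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

simple = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)
boosted = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

print(f"tree     accuracy: {simple.score(X_te, y_te):.3f}")
print(f"boosting accuracy: {boosted.score(X_te, y_te):.3f}")
print(export_text(simple))  # the entire interpretable model, as rules
```

On data like this the gap between the two accuracies is often small, which is the point the abstract makes: an accurate model need not be opaque.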
21 Dec 2023
Bono, Giuseppe; Pasquato, Mario
Files attached to this item

Tesi_dottorato_Trevisan.pdf (open access)
Note: complete thesis
Type: doctoral thesis
License: Creative Commons
Size: 27.93 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this item: https://hdl.handle.net/11573/1700595