Explaining Machine Learning models with subgroup analysis / Tagliafierro, F. - (2026 May 27).

Explaining Machine Learning models with subgroup analysis

TAGLIAFIERRO, FLAVIA

27/05/2026

Abstract

Explanations of classification models play an important role in making machine learning methods more transparent, particularly for models whose internal logic is not directly interpretable, such as black-box models. The algorithm described in this thesis belongs to the class of Automatic Subgroup Detection (ASD) techniques and aims to give insight into how a model's results are obtained. ASD methods try to discover population subsets where model behavior differs significantly from the baseline. While traditional ASD methods often focus on identifying problematic subgroups, such as those with high divergence in false positive rates, this research introduces a novel non-perturbative method designed to identify the largest possible data subsets whose accuracy exceeds a predefined threshold. The methodology extends the class of ASD techniques by identifying subspaces rather than subgroups, defining these regions through feature ranges for both numerical and categorical variables. The algorithm is particularly effective at identifying subsets that respond best to specific actions while maintaining a predefined accuracy threshold for prediction tasks. Specifically, the methodology finds subspaces with significantly better or worse classification accuracy, without requiring the transformation of numerical attributes. Despite significant methodological differences, the proposed algorithm can be evaluated against others in the same class. The benchmarking process compares execution times and subset cardinality across various datasets, at consistent levels of accuracy and support. To examine the method more closely, we create a grid search space to decompose the algorithmic complexity and present different versions of the algorithm, each designed to operate within a specific data partition.
Across all experiments, the algorithm is evaluated on its ability to identify valid subspaces as the accuracy threshold becomes increasingly stringent. This research was inspired by practical needs arising from the analysis of criminal recidivism data and, more broadly, from the examination of judicial records. This topic could serve as a primary application context for the proposed methodology and is examined in depth in a dedicated chapter of this dissertation.
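
To make the terms in the abstract concrete, here is a minimal Python sketch of a subspace defined by feature ranges, together with its support and accuracy. It is not the algorithm of the thesis. The function names (`subspace_stats`, `largest_interval`), the toy `rows`, and the brute-force search over a single numerical feature are all assumptions introduced only for illustration.

```python
def subspace_stats(rows, ranges):
    """Return (support, accuracy) of the rows that fall inside `ranges`.

    rows   : list of dicts holding feature values and a boolean "correct"
             flag (True when the classifier predicted that row correctly).
    ranges : dict mapping a feature name to (low, high) for a numerical
             feature, or to a set of allowed values for a categorical one.
    """
    def inside(row):
        for feature, bound in ranges.items():
            value = row[feature]
            if isinstance(bound, (set, frozenset)):
                if value not in bound:
                    return False
            else:
                low, high = bound
                if not (low <= value <= high):
                    return False
        return True

    selected = [r for r in rows if inside(r)]
    if not selected:
        return 0, 0.0
    accuracy = sum(r["correct"] for r in selected) / len(selected)
    return len(selected), accuracy


def largest_interval(rows, feature, threshold):
    """Hypothetical brute-force search: among all closed intervals on one
    numerical feature, return the one with the largest support whose
    accuracy strictly exceeds `threshold`, or None if there is none."""
    cuts = sorted({r[feature] for r in rows})
    best = None
    for i, low in enumerate(cuts):
        for high in cuts[i:]:
            support, acc = subspace_stats(rows, {feature: (low, high)})
            if acc > threshold and (best is None or support > best[1]):
                best = ((low, high), support, acc)
    return best


# Toy data (invented): overall accuracy is 4/6.
rows = [
    {"age": 20, "sex": "F", "correct": False},
    {"age": 25, "sex": "M", "correct": True},
    {"age": 30, "sex": "F", "correct": True},
    {"age": 35, "sex": "M", "correct": True},
    {"age": 40, "sex": "F", "correct": False},
    {"age": 45, "sex": "M", "correct": True},
]

print(largest_interval(rows, "age", 0.7))  # interval, support, accuracy
print(subspace_stats(rows, {"age": (25, 45), "sex": {"M"}}))
```

Raising the threshold in this toy setting shrinks the largest qualifying interval, which mirrors the trade-off between accuracy and subset cardinality that the abstract describes. The exhaustive search shown here is quadratic in the number of distinct values and serves only to fix ideas.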

Defense date

27 May 2026

Document type:

07a Doctoral thesis

Files attached to this record

File	Size	Format
Tesi_dottorato_Tagliafierro.pdf (open access; note: complete thesis; type: doctoral thesis; license: Creative Commons)	1.82 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1768845
