Breaking Down Accuracy with Subspace Optimization / Firmani, Donatella; Grani, Giorgio; Tagliafierro, Flavia. - (2024). (Paper presented at the Extending Database Technology conference, held in Paestum, Italy) [10.48786/edbt.2024.71].

Breaking Down Accuracy with Subspace Optimization

Donatella Firmani; Flavia Tagliafierro
2024

Abstract

This paper introduces a Python toolkit for breaking down the accuracy of classification models. It identifies high-accuracy portions of the test dataset, giving a clearer picture of where a model performs best and where it falls short. Given a dataset, a classification model, and an accuracy threshold, our system returns a range query that selects the largest subspace of the test dataset on which the classifier's accuracy exceeds that threshold. Users can explore this subspace interactively, adjusting the returned ranges through graphical elements and observing how the model’s accuracy and the data distributions change. Ranges can also be initialized manually to highlight the model's strengths and weaknesses in different scenarios. The core of our method is a mixed-integer optimization algorithm. Demonstrations on real-world datasets with a selection of models show that the toolkit is an effective way to understand performance across different data segments.
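To make the notion of a range query concrete, the following is a minimal sketch of how a candidate query could be scored on a test set: it computes the fraction of test rows the query selects (coverage) and the classifier's accuracy on those rows. It is not the toolkit's API and does not implement the mixed-integer optimization; the function name `evaluate_range_query`, the `ranges` dictionary format, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def evaluate_range_query(X, y_true, y_pred, ranges):
    """Score a range query on a test set (hypothetical helper, not the toolkit's API).

    `ranges` maps a feature (column) index to an inclusive (low, high) interval;
    features that are not listed are left unconstrained.
    Returns (coverage, accuracy) for the selected rows.
    """
    X = np.asarray(X, dtype=float)
    correct = np.asarray(y_true) == np.asarray(y_pred)

    # A row is selected only if it satisfies every interval in the query.
    mask = np.ones(len(X), dtype=bool)
    for col, (low, high) in ranges.items():
        mask &= (X[:, col] >= low) & (X[:, col] <= high)

    n_selected = int(mask.sum())
    coverage = n_selected / len(X)
    # Accuracy is undefined when the query selects no rows.
    accuracy = float(correct[mask].mean()) if n_selected else float("nan")
    return coverage, accuracy


# Toy test set (illustrative values only): two features, four rows.
X = np.array([[0.1, 5.0], [0.4, 3.0], [0.6, 1.0], [0.9, 4.0]])
y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 0, 1, 0])  # the last row is misclassified

# Query: restrict feature 0 to [0.0, 0.7], leave feature 1 unconstrained.
coverage, accuracy = evaluate_range_query(X, y_true, y_pred, {0: (0.0, 0.7)})
```

Under this reading, the problem stated in the abstract amounts to searching for the ranges that maximize coverage while keeping accuracy above the chosen threshold; the sketch only evaluates one candidate query and leaves the search itself aside.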
Extending Database Technology
data management; explainable AI; classification; interpretability
04 Publication in conference proceedings::04b Conference paper in volume

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1707682