An Information Granulation Approach Through m-Grams for Text Classification / DE SANTIS, Enrico; Capillo, Antonino; Ferrandino, Emanuele; FRATTALE MASCIOLI, Fabio Massimo; Rizzi, Antonello. - (2023), pp. 73-89. [10.1007/978-3-031-46221-4_4].

An Information Granulation Approach Through m-Grams for Text Classification

Enrico De Santis;Antonino Capillo;Emanuele Ferrandino;Fabio Massimo Frattale Mascioli;Antonello Rizzi
2023

Abstract

Researchers and practitioners increasingly focus not only on systems with high classification performance, but also on generating gray-box, interpretable models in which the relations underlying the input-output mapping are explainable, in line with the "explainable AI" paradigm. This paper proposes a text mining system that classifies text excerpts belonging to a suitable corpus through the human-centric Granular Computing approach. The system is grounded on two granulation levels of text: the set of extracted m-grams, and a suitable hard partition of that set obtained through a well-suited dissimilarity measure between m-grams together with a clustering algorithm. Exploiting the Symbolic Histogram technique, the procedure embeds a text document in a real-valued vector space where standard Machine Learning algorithms, such as SVM, can safely operate. An evolutionary metaheuristic based on a Genetic Algorithm then optimizes the system metaparameters and performs a wrapper-like feature selection. This procedure enables knowledge discovery on the obtained models by selecting relevant and interpretable information granules (i.e., m-grams) related to the classification task. The study presents a new framework for evaluating the generalization capabilities of the entire system, and the obtained results show the reliability of the overall procedure. Furthermore, a new conceptual framework is presented for the "representation" problem in automated Pattern Recognition systems.
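
The embedding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the dissimilarity measure or the clustering algorithm, so the Jaccard dissimilarity, the greedy sequential clustering, the threshold `theta`, and all function names below are assumptions chosen only for illustration. The SVM classifier and the Genetic Algorithm are left out.

```python
from collections import Counter


def extract_mgrams(text, m=2):
    """First granulation level: word-level m-grams (as tuples) of a text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + m]) for i in range(len(tokens) - m + 1)]


def jaccard_dissimilarity(a, b):
    """Assumed dissimilarity: 1 - |A & B| / |A | B| over the words of two m-grams."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


def sequential_partition(mgrams, theta=0.7):
    """Second granulation level: a hard partition of the m-gram set.

    Assumed greedy scheme: each distinct m-gram joins the first cluster
    whose representative lies within `theta`, otherwise it opens a new
    cluster and becomes its representative.
    """
    representatives = []
    assignment = {}
    for gram in dict.fromkeys(mgrams):  # distinct m-grams, first-seen order
        for idx, rep in enumerate(representatives):
            if jaccard_dissimilarity(gram, rep) <= theta:
                assignment[gram] = idx
                break
        else:
            representatives.append(gram)
            assignment[gram] = len(representatives) - 1
    return representatives, assignment


def symbolic_histogram(text, assignment, n_clusters, m=2):
    """Embed a document as the count of its m-grams falling in each cluster."""
    counts = Counter(
        assignment[g] for g in extract_mgrams(text, m) if g in assignment
    )
    return [counts.get(k, 0) for k in range(n_clusters)]


# Toy corpus (hypothetical data, for illustration only).
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
all_mgrams = [g for doc in corpus for g in extract_mgrams(doc)]
representatives, assignment = sequential_partition(all_mgrams)
vectors = [symbolic_histogram(doc, assignment, len(representatives)) for doc in corpus]
```

Each entry of `vectors` is a real-valued vector with one component per cluster, so it can be passed to any standard classifier. In the described system, a feature selection step would then keep only the components (information granules) relevant to the classification task.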
Computational intelligence
ISBN 978-3-031-46220-7
ISBN 978-3-031-46221-4
Keywords: conceptual spaces; explainable AI; granular computing; text categorization; text mining
02 Publication in volume::02a Chapter or Article
Files attached to this record

File: De Santis_An information granulation_2023.pdf
Access: archive managers only
Note: Main article
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 311.63 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1696576