An Information Granulation Approach Through m-Grams for Text Classification / De Santis, Enrico; Capillo, Antonino; Ferrandino, Emanuele; Frattale Mascioli, Fabio Massimo; Rizzi, Antonello. - (2023), pp. 73-89. [10.1007/978-3-031-46221-4_4].
An Information Granulation Approach Through m-Grams for Text Classification
Enrico De Santis; Antonino Capillo; Emanuele Ferrandino; Fabio Massimo Frattale Mascioli; Antonello Rizzi
2023
Abstract
Nowadays, researchers and practitioners are focusing not only on high-performance systems in terms of classification capabilities, but also on generating gray-box interpretable models in which the relations belonging to the input-output mapping are also explainable, in line with the "explainable AI" paradigm. This paper proposes a text mining system capable of classifying text excerpts belonging to a suitable corpus through the human-centric Granular Computing approach. The system is grounded on two granulation levels of text, i.e. the set of extracted m-grams and a suitable hard partition of the latter, obtained through a well-suited dissimilarity measure between m-grams together with a clustering algorithm. The procedure embeds a text document, by means of the Symbolic Histogram technique, in a real-valued vector space where standard Machine Learning algorithms, such as SVM, can safely operate. An evolutionary metaheuristic based on a Genetic Algorithm is then in charge of optimizing the system metaparameters, as well as performing a wrapper-like feature selection. This procedure allows carrying out knowledge discovery on the obtained models by selecting relevant and interpretable information granules (i.e. m-grams) related to the classification task. The current study presents a new evaluation framework for the performance of the entire system in terms of generalization capabilities, and the obtained results show the reliability of the overall procedure. Furthermore, a new conceptual framework is also presented in terms of the "representation" problem in automated Pattern Recognition systems.

File | Size | Format | |
---|---|---|---|
De Santis_An information granulation_2023.pdf (archive managers only). Note: main article. Type: publisher's version (published with the publisher's layout). License: All rights reserved | 311.63 kB | Adobe PDF | Contact the author |
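
As an illustration of the embedding step described in the abstract, the following is a minimal Python sketch of a symbolic-histogram-style embedding. It is not the authors' implementation: the abstract does not specify the dissimilarity measure or the clustering algorithm, so this sketch assumes that each cluster of m-grams is summarized by one representative m-gram and uses a Jaccard dissimilarity over words as a stand-in. The names `extract_mgrams`, `jaccard_dissimilarity`, `symbolic_histogram`, and the parameters `m_max` and `max_dissimilarity` are hypothetical.

```python
from collections import Counter


def extract_mgrams(text, m_max=2):
    """Return all word m-grams of length 1..m_max as tuples of lowercase tokens."""
    tokens = text.lower().split()
    return [
        tuple(tokens[i:i + m])
        for m in range(1, m_max + 1)
        for i in range(len(tokens) - m + 1)
    ]


def jaccard_dissimilarity(a, b):
    """Stand-in dissimilarity (an assumption, not the paper's measure):
    1 - |A & B| / |A | B| over the word sets of two m-grams."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


def symbolic_histogram(text, representatives, m_max=2, max_dissimilarity=0.5):
    """Embed a text as a count vector with one component per m-gram cluster.

    Each m-gram of the text is assigned to its nearest cluster representative
    (hard assignment). m-grams farther than max_dissimilarity from every
    representative are ignored (a hypothetical design choice of this sketch).
    """
    counts = Counter()
    for gram in extract_mgrams(text, m_max):
        # Index of the representative with the smallest dissimilarity.
        nearest = min(
            range(len(representatives)),
            key=lambda k: jaccard_dissimilarity(gram, representatives[k]),
        )
        if jaccard_dissimilarity(gram, representatives[nearest]) <= max_dissimilarity:
            counts[nearest] += 1
    return [counts[k] for k in range(len(representatives))]


# Two hypothetical cluster representatives and one short document.
reps = [("neural", "network"), ("stock", "market")]
vector = symbolic_histogram("the neural network predicts the stock market", reps)
print(vector)
```

The resulting fixed-length vectors could then be passed to a standard classifier such as an SVM, and a feature-selection step could drop the components (clusters) that do not help the classification task.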
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.