An Information Granulation Approach Through m-Grams for Text Classification

DE SANTIS, Enrico; Capillo, Antonino; Ferrandino, Emanuele; FRATTALE MASCIOLI, Fabio Massimo; Rizzi, Antonello

doi:10.1007/978-3-031-46221-4_4

Nowadays, researchers and practitioners focus not only on systems with high classification performance, but also on gray-box interpretable models in which the relations underlying the input-output mapping are explainable, in line with the “explainable AI” paradigm. This paper proposes a text mining system that classifies text excerpts belonging to a suitable corpus through the human-centric Granular Computing approach. The system is grounded on two granulation levels of text: the set of extracted m-grams, and a hard partition of the latter obtained through a well-suited dissimilarity measure between m-grams together with a clustering algorithm. Exploiting the Symbolic Histogram technique, the procedure embeds a text document in a real-valued vector space where standard Machine Learning algorithms, such as SVM, can safely operate. An evolutionary metaheuristic based on a Genetic Algorithm then optimizes the system metaparameters and performs a wrapper-like feature selection. This procedure enables knowledge discovery on the obtained models by selecting relevant and interpretable information granules (i.e., m-grams) related to the classification task. The current study presents a new framework for evaluating the generalization capability of the entire system, and the obtained results show the reliability of the overall procedure. Furthermore, a new conceptual framework is presented concerning the “representation” problem in automated Pattern Recognition systems.
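To make the two granulation levels and the Symbolic Histogram embedding more concrete, the following Python snippet is a minimal sketch under assumptions the abstract does not state: m-grams are taken to be sequences of m consecutive words, the m-gram dissimilarity is a hypothetical normalised token-level edit distance, the clustering is a simple BSAS-style "leader" algorithm whose leaders serve as the symbols, and each m-gram is counted for its closest symbol within a threshold `theta`. All function names are hypothetical; this is not the authors' implementation.

```python
# Minimal sketch (assumptions, not the authors' implementation): word m-grams,
# a normalised token-level edit distance as the m-gram dissimilarity, and a
# BSAS-style "leader" clustering whose leaders act as the symbol alphabet.


def extract_mgrams(text, m):
    """First granulation level: all sequences of m consecutive words."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + m]) for i in range(len(tokens) - m + 1)]


def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # match or substitute
        prev = curr
    return prev[-1]


def dissimilarity(a, b):
    """Hypothetical m-gram dissimilarity, normalised to [0, 1]."""
    return edit_distance(a, b) / max(len(a), len(b), 1)


def leader_clustering(mgrams, theta):
    """Second granulation level: a hard partition of the m-grams.

    Each m-gram joins the cluster whose leader (first member) is closest,
    provided the dissimilarity is <= theta; otherwise it founds a new cluster.
    """
    leaders, clusters = [], []
    for g in mgrams:
        if leaders:
            d, k = min((dissimilarity(g, rep), k) for k, rep in enumerate(leaders))
            if d <= theta:
                clusters[k].append(g)
                continue
        leaders.append(g)
        clusters.append([g])
    return leaders, clusters


def symbolic_histogram(text, symbols, m, theta):
    """Embed a text as counts of its m-grams matched to each symbol.

    Each m-gram is counted for its closest symbol, if within theta.
    """
    if not symbols:
        return []
    hist = [0] * len(symbols)
    for g in extract_mgrams(text, m):
        d, k = min((dissimilarity(g, s), k) for k, s in enumerate(symbols))
        if d <= theta:
            hist[k] += 1
    return hist


# Toy usage: the first two documents share structure, the third does not.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "stock prices rose sharply today",
]
m, theta = 2, 0.5
all_mgrams = [g for doc in corpus for g in extract_mgrams(doc, m)]
symbols, _ = leader_clustering(all_mgrams, theta)
vectors = [symbolic_histogram(doc, symbols, m, theta) for doc in corpus]
print(vectors)
```

The resulting fixed-length count vectors are what a standard classifier such as an SVM could consume. In the described system, a Genetic Algorithm additionally optimizes the metaparameters (in this sketch, plausibly `m` and `theta`) and performs a wrapper-like selection of the symbols; both steps are omitted here.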

An Information Granulation Approach Through m-Grams for Text Classification / DE SANTIS, Enrico; Capillo, Antonino; Ferrandino, Emanuele; FRATTALE MASCIOLI, Fabio Massimo; Rizzi, Antonello. - (2023), pp. 73-89. [10.1007/978-3-031-46221-4_4].


Year of publication: 2023
Volume title: Computational intelligence
ISBN: 978-3-031-46220-7; 978-3-031-46221-4
Keywords: conceptual spaces; explainable AI; granular computing; text categorization; text mining
Type: 02 Publication in volume :: 02a Chapter or Article

Attached file: De Santis_An information granulation_2023.pdf (main article; publisher's version, i.e. the published version with the publisher's layout; access restricted to archive managers; license: all rights reserved; 311.63 kB, Adobe PDF).


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1696576
