Research Products Catalogue

An ecology-based index for text embedding and classification / Martino, Alessio; De Santis, Enrico; Rizzi, Antonello. - (2020), pp. 1-8. (2020 International Joint Conference on Neural Networks, IJCNN 2020, Glasgow, UK) [10.1109/IJCNN48605.2020.9207299].

An ecology-based index for text embedding and classification

Alessio Martino; Enrico De Santis; Antonello Rizzi

2020

Abstract

Natural language processing and text mining applications have gained growing attention and diffusion in the computer science and machine learning communities. In this work, a new embedding scheme is proposed for solving text classification problems. The embedding scheme relies on a statistical assessment of relevant words within a corpus using a compound index originally proposed in ecology: this makes it possible to identify the relevant parts of the overall text (e.g., words) on top of which the embedding is performed, following a Granular Computing approach. Employing statistically meaningful words not only reduces the computational burden and the dimensionality of the embedding space, but also yields a more interpretable model. Our approach is tested on both synthetic and benchmark datasets against well-known embedding techniques, with remarkable results in terms of both performance and computational complexity.
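The pipeline outlined in the abstract (score words statistically, keep the relevant ones, embed documents over them) can be illustrated with a minimal sketch. The abstract does not name the ecological index, so this example assumes the indicator value (IndVal), which multiplies specificity (how concentrated a word's mean abundance is in one class) by fidelity (the fraction of that class's documents containing the word). The function names, the threshold, the toy corpus, and the count-based embedding are all hypothetical choices made for illustration, not the authors' implementation.

```python
from collections import Counter

def indval_scores(docs, labels):
    """Return {word: {label: IndVal}} for tokenized documents.

    Assumed index: IndVal = specificity * fidelity, each in [0, 1].
    """
    classes = sorted(set(labels))
    n_docs = Counter(labels)                      # documents per class
    total = {c: Counter() for c in classes}       # summed word counts per class
    present = {c: Counter() for c in classes}     # documents containing the word
    for tokens, c in zip(docs, labels):
        counts = Counter(tokens)
        total[c].update(counts)
        present[c].update(counts.keys())
    vocab = set().union(*(total[c].keys() for c in classes))
    scores = {}
    for w in vocab:
        mean = {c: total[c][w] / n_docs[c] for c in classes}
        denom = sum(mean.values())                # > 0 because w occurs somewhere
        scores[w] = {
            c: (mean[c] / denom) * (present[c][w] / n_docs[c]) for c in classes
        }
    return scores

def select_words(scores, threshold):
    """Keep words whose best per-class IndVal reaches the (hypothetical) threshold."""
    return sorted(w for w, s in scores.items() if max(s.values()) >= threshold)

def embed(tokens, alphabet):
    """Embed a document as occurrence counts of the selected words."""
    counts = Counter(tokens)
    return [counts[w] for w in alphabet]

# Toy corpus, invented for illustration only.
docs = [
    ["goal", "match", "team"],
    ["team", "goal", "the"],
    ["vote", "law", "the"],
    ["law", "vote", "party"],
]
labels = ["sport", "sport", "politics", "politics"]

scores = indval_scores(docs, labels)
alphabet = select_words(scores, threshold=0.75)
vector = embed(["the", "team", "scored", "a", "goal", "goal"], alphabet)
print(alphabet, vector)
```

In this toy run the common word "the" scores low for both classes and is discarded, so the embedding has one dimension per retained word. The resulting vectors could then be passed to any standard classifier, such as the support vector machine listed among the keywords.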

Publication year: 2020
Conference name: 2020 International Joint Conference on Neural Networks, IJCNN 2020
Keywords: embedding spaces; explainable artificial intelligence; granular computing; natural language processing; supervised learning; support vector machine; text classification
Type: 04 Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

File	Access	Type	License	Size	Format
Martino_Copertina-indice_Ecology-based_2020.pdf	Open access	Other attached material	All rights reserved	607.48 kB	Adobe PDF
Martino_Ecology-based_2020.pdf	Archive managers only	Publisher's version (published version with the publisher's layout)	All rights reserved	235.12 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1453629
