An ecology-based index for text embedding and classification

Alessio Martino; Enrico De Santis; Antonello Rizzi
2020

Abstract

Natural language processing and text mining applications have attracted growing attention and seen increasing adoption in the computer science and machine learning communities. In this work, a new embedding scheme is proposed for solving text classification problems. The embedding scheme relies on a statistical assessment of relevant words within a corpus using a compound index originally proposed in ecology: this makes it possible to identify the relevant parts of the overall text (e.g., words) on top of which the embedding is performed, following a Granular Computing approach. Using statistically meaningful words not only reduces the computational burden and the dimensionality of the embedding space, but also yields a more interpretable model. Our approach is tested on both synthetic and benchmark datasets against well-known embedding techniques, with remarkable results in terms of both performance and computational complexity.
2020 International Joint Conference on Neural Networks, IJCNN 2020
embedding spaces; explainable artificial intelligence; granular computing; natural language processing; supervised learning; support vector machine; text classification
04 Publication in conference proceedings::04b Conference paper in volume
An ecology-based index for text embedding and classification / Martino, Alessio; DE SANTIS, Enrico; Rizzi, Antonello. - (2020), pp. 1-8. (Paper presented at the 2020 International Joint Conference on Neural Networks, IJCNN 2020, held in Glasgow (UK)) [10.1109/IJCNN48605.2020.9207299].
Files attached to this item

File: Martino_Copertina-indice_Ecology-based_2020.pdf
Access: open access
Type: Other attached material
License: All rights reserved
Size: 607.48 kB
Format: Adobe PDF

File: Martino_Ecology-based_2020.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 235.12 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1453629
Citations
  • PMC: n/a
  • Scopus: 7
  • ISI: 2