Automatic text categorization by a Granular Computing approach: Facing unbalanced data sets / POSSEMATO, F., RIZZI, A.. - (2013), pp. 1-8. (2013 International Joint Conference on Neural Networks, IJCNN 2013 Dallas; United States 4 August 2013 through 9 August 2013) [10.1109/ijcnn.2013.6707082].

Automatic text categorization by a Granular Computing approach: Facing unbalanced data sets

POSSEMATO, Francesca; RIZZI, Antonello

2013

Abstract

Text categorization is an application of machine learning with a wide range of uses, from document management systems to web mining. In designing such a system it is mandatory to correctly define both a suitable preprocessing procedure and an effective document representation, as closely related as possible to the semantic nature of the document categories. To this aim, relying on a Granular Computing approach and considering a document as an ordered sequence of words, we propose a system able to automatically mine frequent terms, where a term is not only a single word but also a subsequence of (a few) consecutive words. The whole classification system is tailored to process sequences of atomic elements (i.e., encoded words) by means of an embedding procedure based on clustering methods. However, when dealing with unbalanced data sets, i.e., when classes are not evenly represented in the data set, the frequent substructures search procedure must be caref
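The term-mining idea in the abstract (a term is a single word or a short run of consecutive words that occurs frequently) can be illustrated with a minimal sketch. This is not the authors' system: the function names, the whitespace tokenizer, the `max_n` and `min_support` parameters, and the choice to measure support as a fraction of documents are all assumptions made for illustration. The per-class variant shows only one plausible way to keep a small class from being swamped by a large one; the record does not say how the paper handles this.

```python
from collections import Counter


def ngrams(words, max_n):
    """Yield every run of 1..max_n consecutive words as a tuple."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def frequent_terms(documents, max_n=2, min_support=0.5):
    """Return the terms found in at least `min_support` (a fraction) of the documents.

    Hypothetical helper: support is counted once per document, not per occurrence.
    """
    doc_freq = Counter()
    for text in documents:
        words = text.lower().split()  # naive tokenizer, assumed for the sketch
        doc_freq.update(set(ngrams(words, max_n)))
    threshold = min_support * len(documents)
    return {term for term, count in doc_freq.items() if count >= threshold}


def frequent_terms_per_class(labeled_docs, max_n=2, min_support=0.5):
    """Mine frequent terms separately inside each class (an assumed strategy).

    Because support is relative to the class size, a rare class is judged
    against its own document count rather than the size of the whole corpus.
    """
    by_class = {}
    for label, text in labeled_docs:
        by_class.setdefault(label, []).append(text)
    return {
        label: frequent_terms(docs, max_n, min_support)
        for label, docs in by_class.items()
    }
```

With a global threshold, a term that appears in every document of a small class can still fall below the cut-off, whereas the per-class version retains it.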

Full record

Publication year: 2013
Conference name: 2013 International Joint Conference on Neural Networks, IJCNN 2013
Keywords: frequent substructures mining; unbalanced data sets; granular computing; text categorization
Type: Publication in conference proceedings::04b Conference paper in volume

Files attached to this product

There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/526118

Warning: the data shown have not been validated by the university.
