Multi-class categorization based on cluster analysis and TFIDF / Bolasco, Sergio; Pavone, P. - Print. - 1(2008), pp. 209-218.
Multi-class categorization based on cluster analysis and TFIDF
BOLASCO, Sergio;
2008
Abstract
Within the framework of automatic text classification, this study presents a procedure based on cluster analysis and the TFIDF index to generate a multi-class categorization of documents. Starting from a set of appropriately selected terms, we perform an unsupervised classification of the terms into disjoint groups. The set of terms characterizing each cluster forms the thematic dictionary of that group. For each document, we calculate the TFIDF associated with each dictionary. The multi-class categorization is then obtained by assigning each document to the categories for which its TFIDF values are highest. The procedure is illustrated on a corpus of about one hundred restaurant reviews.
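The scoring and assignment steps described in the abstract can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: the thematic dictionaries are taken as given (the clustering step that produces them is not reproduced), each dictionary is treated as a single pseudo-term when computing TFIDF, and a document is assigned to every category whose score reaches a hypothetical fraction `ratio` of its best score. The toy reviews, the category names, and the functions `dictionary_tfidf` and `assign_categories` are invented for the example.

```python
import math

def dictionary_tfidf(docs, dictionaries):
    """Score each document against each thematic dictionary.

    Assumption: a dictionary acts as one pseudo-term, so tf is the number of
    document tokens found in the dictionary and df is the number of documents
    containing at least one of its terms.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]

    # tf[i][cat]: tokens of document i that belong to the dictionary of cat
    tf = [
        {cat: sum(1 for tok in tokens if tok in terms)
         for cat, terms in dictionaries.items()}
        for tokens in tokenized
    ]

    # df[cat]: documents that contain at least one term of the dictionary
    df = {cat: sum(1 for row in tf if row[cat] > 0) for cat in dictionaries}

    scores = []
    for row in tf:
        scored = {}
        for cat in dictionaries:
            idf = math.log(n_docs / df[cat]) if df[cat] else 0.0
            scored[cat] = row[cat] * idf
        scores.append(scored)
    return scores

def assign_categories(scores, ratio=0.5):
    """Multi-class assignment: keep every category scoring at least
    ratio * (best score of the document). The threshold rule is an assumption."""
    labels = []
    for row in scores:
        top = max(row.values(), default=0.0)
        if top <= 0:
            labels.append([])  # no dictionary term occurs in this document
        else:
            labels.append(sorted(c for c, s in row.items() if s >= ratio * top))
    return labels

# Hypothetical toy corpus and dictionaries, for illustration only.
docs = [
    "the pasta and the pizza were excellent",
    "friendly waiter but slow service",
    "great pizza and a friendly waiter",
    "cheap bill and fair price",
]
dictionaries = {
    "food": {"pasta", "pizza"},
    "service": {"waiter", "service"},
    "price": {"bill", "price", "cheap"},
}

scores = dictionary_tfidf(docs, dictionaries)
labels = assign_categories(scores)
print(labels)
```

In this toy corpus the third review scores equally on the "food" and "service" dictionaries, so it receives both labels, which is the multi-class behaviour the abstract refers to. Note that with this plain logarithmic IDF, a dictionary that matches every document scores zero everywhere; a smoothed IDF would be one way to avoid that.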