In recent years, the Social Sciences have undergone a significant change, driven by the availability of enormous quantities of highly informative data in every field. Most of these data can be found on the Web and consist primarily of unstructured data, such as texts, videos, and photos. Even after a preprocessing phase that transforms them into structured data, textual data are still characterized by high dimensionality and noise. The aim of this paper is to apply a new procedure for classifying words that takes into consideration not only the content of a corpus but also the specificities of its different documents, in order to capture most of the information. Using several software packages, we propose a procedure for treating textual data with a co-clustering approach; more than six hundred online hotel reviews were collected to test and validate the procedure.
Classifying textual data: a two-way approach / Celardo, Livia. - ELECTRONIC. - (2017), pp. 1-11.
Classifying textual data: a two-way approach
CELARDO, LIVIA
2017
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.