The Internet explosion and the massive diffusion of mobile devices lead to the creation of a worldwide collaborative system, daily used by millions of users through search engines and application interfaces. New paradigms permit to calculate the similarity of terms using only the statistical information returned by a query, or from additional features; also old algorithms and measures have been applied to new domains and scopes, to efficiently find words clusters from the Web. The problem of evaluating such techniques and algorithms in new domains emerges, and highlights a still open field of experimentation. In this paper, preliminary tests have been held on different semantic proximity measures (average confidence, NGD, PMI, χ2, PMING Distance), and different clustering algorithms among the most used in literature have been compared (e.g. k-means, Expectation-Maximization, spectral clustering) for evaluating such measures. The suitability of the considered measures and methods to calculate the semantic proximity was verified at the state-of-art, and problems were identified, comparing the results of measurements to a ground truth provided by models of contextualized knowledge, clustering and human perception of semantic relations, which data are already studied in literature.
A semantic comparison of clustering algorithms for the evaluation of web-based similarity measures / Franzoni, Valentina; Milani, Alfredo. - STAMPA. - 9790:(2016), pp. 438-452. (Intervento presentato al convegno 16th International Conference on Computational Science and Its Applications, ICCSA 2016 tenutosi a Beijing; China nel 4 July 2016 through 7 July 2016) [10.1007/978-3-319-42092-9_34].
A semantic comparison of clustering algorithms for the evaluation of web-based similarity measures
FRANZONI, VALENTINA
;
2016
Abstract
The Internet explosion and the massive diffusion of mobile devices lead to the creation of a worldwide collaborative system, daily used by millions of users through search engines and application interfaces. New paradigms permit to calculate the similarity of terms using only the statistical information returned by a query, or from additional features; also old algorithms and measures have been applied to new domains and scopes, to efficiently find words clusters from the Web. The problem of evaluating such techniques and algorithms in new domains emerges, and highlights a still open field of experimentation. In this paper, preliminary tests have been held on different semantic proximity measures (average confidence, NGD, PMI, χ2, PMING Distance), and different clustering algorithms among the most used in literature have been compared (e.g. k-means, Expectation-Maximization, spectral clustering) for evaluating such measures. The suitability of the considered measures and methods to calculate the semantic proximity was verified at the state-of-art, and problems were identified, comparing the results of measurements to a ground truth provided by models of contextualized knowledge, clustering and human perception of semantic relations, which data are already studied in literature.File | Dimensione | Formato | |
---|---|---|---|
Franzoni_A-Semantic-Comparison_2016.pdf
solo gestori archivio
Tipologia:
Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
1.46 MB
Formato
Adobe PDF
|
1.46 MB | Adobe PDF | Contatta l'autore |
Franzoni_Frontespizio-indice_A-Semantic-Comparison_2016.pdf
solo gestori archivio
Tipologia:
Altro materiale allegato
Licenza:
Tutti i diritti riservati (All rights reserved)
Dimensione
614.36 kB
Formato
Adobe PDF
|
614.36 kB | Adobe PDF | Contatta l'autore |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.