A semantic comparison of clustering algorithms for the evaluation of web-based similarity measures

Franzoni, Valentina; Milani, Alfredo
2016

Abstract

The explosive growth of the Internet and the massive diffusion of mobile devices have led to the creation of a worldwide collaborative system, used daily by millions of users through search engines and application interfaces. New paradigms make it possible to calculate the similarity of terms using only the statistical information returned by a query, or using additional features; older algorithms and measures have also been applied to new domains and scopes to find word clusters from the Web efficiently. The problem of evaluating such techniques and algorithms in new domains therefore emerges, and it highlights a still-open field of experimentation. In this paper, preliminary tests are carried out on different semantic proximity measures (average confidence, NGD, PMI, χ², PMING Distance), and several of the clustering algorithms most used in the literature (e.g. k-means, Expectation-Maximization, spectral clustering) are compared in order to evaluate these measures. The suitability of the considered measures and methods for calculating semantic proximity was verified against the state of the art, and problems were identified by comparing the results of the measurements to a ground truth provided by models of contextualized knowledge, clustering, and human perception of semantic relations, whose data have already been studied in the literature.
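
As an illustration of how term similarity can be derived from query statistics alone, the following minimal sketch computes two of the measures named in the abstract, PMI (pointwise mutual information) and NGD (Normalized Google Distance), from raw hit counts. It uses the standard definitions of these measures, which are not necessarily the exact variants evaluated in the paper. The function names, the hit counts, and the index size `n` are hypothetical values chosen for illustration.

```python
import math


def _check_counts(fx, fy, fxy, n):
    # Both measures take logarithms of the counts, so zero counts are rejected
    # here rather than smoothed (an assumption of this sketch).
    if min(fx, fy, fxy, n) <= 0:
        raise ValueError("all counts must be positive")


def pmi(fx, fy, fxy, n):
    """Pointwise mutual information (base 2) of two terms.

    fx, fy: hit counts for each term alone
    fxy:    hit count for the joint query
    n:      assumed total number of indexed documents
    """
    _check_counts(fx, fy, fxy, n)
    # log2( P(x, y) / (P(x) * P(y)) ) with probabilities estimated as counts / n
    return math.log2((fxy * n) / (fx * fy))


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance of two terms (lower means more related)."""
    _check_counts(fx, fy, fxy, n)
    log_x, log_y, log_xy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_x, log_y) - log_xy) / (math.log(n) - min(log_x, log_y))


if __name__ == "__main__":
    # Hypothetical counts, as a search engine might report them.
    n_docs = 1_000_000
    f_x, f_y, f_xy = 1000, 2000, 500
    print("PMI:", pmi(f_x, f_y, f_xy, n_docs))
    print("NGD:", ngd(f_x, f_y, f_xy, n_docs))
```

PMI is a similarity (higher values indicate stronger association, and 0 indicates statistical independence), whereas NGD is a distance, so one of the two has to be transformed before both can feed the same clustering algorithm.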
Conference: 16th International Conference on Computational Science and Its Applications, ICCSA 2016
Keywords: Data mining; Clustering; Semantic evaluation; Semantic similarity; Information retrieval
Type: 04 Publication in conference proceedings::04b Conference paper in volume
A semantic comparison of clustering algorithms for the evaluation of web-based similarity measures / Franzoni, Valentina; Milani, Alfredo. - PRINT. - 9790:(2016), pp. 438-452. (Paper presented at the 16th International Conference on Computational Science and Its Applications, ICCSA 2016, held in Beijing, China, from 4 July 2016 through 7 July 2016) [10.1007/978-3-319-42092-9_34].
Files attached to this item

Franzoni_A-Semantic-Comparison_2016.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.46 MB
Format: Adobe PDF

Franzoni_Frontespizio-indice_A-Semantic-Comparison_2016.pdf
Access: archive managers only
Type: Other attached material
License: All rights reserved
Size: 614.36 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/948029
Citations
  • PMC: n/a
  • Scopus: 18
  • ISI: 14