Clustering objects represented by structured data with possibly non-trivial geometry is an interesting task in pattern recognition. Moreover, in the Big Data era, clustering huge amounts of (structured) data challenges computer science and pattern recognition researchers alike. The aim of this paper is to bridge the gap in large-scale structured data clustering. Specifically, following a previous work, a parallel and distributed k-medoids clustering implementation is proposed and tested on real-world biological structured data, namely pathway maps (graphs) and primary structures of proteins (sequences). Furthermore, two methods for medoid evaluation, based on exact and approximate procedures respectively, are proposed and compared in terms of scalability. Computational results show that the proposed implementation is flexible with respect to the dissimilarity measure and the input space adopted, with satisfactory results in terms of scalability.
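To make the idea concrete, the following is a minimal, serial sketch of k-medoids with a pluggable dissimilarity measure. It is not the authors' parallel and distributed implementation: the function names (`edit_distance`, `exact_medoid`, `sampled_medoid`, `k_medoids`) are hypothetical, the edit distance is only one possible dissimilarity for sequences, and the subsampling-based update is an assumed stand-in for an approximate procedure, not necessarily the one proposed in the paper.

```python
import random


def edit_distance(a, b):
    """Levenshtein distance, used here as an example dissimilarity for sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def exact_medoid(cluster, d):
    """Exact update: the member minimising the sum of dissimilarities to all
    other members. Needs O(n^2) dissimilarity evaluations per cluster."""
    return min(cluster, key=lambda c: sum(d(c, x) for x in cluster))


def sampled_medoid(cluster, d, sample_size=50, rng=random):
    """Approximate update (an assumption for illustration): run the exact
    search on a random subsample of the cluster only."""
    if len(cluster) <= sample_size:
        return exact_medoid(cluster, d)
    return exact_medoid(rng.sample(cluster, sample_size), d)


def assign(data, medoids, d):
    """Assign each pattern to the cluster of its nearest medoid."""
    clusters = [[] for _ in medoids]
    for x in data:
        nearest = min(range(len(medoids)), key=lambda i: d(x, medoids[i]))
        clusters[nearest].append(x)
    return clusters


def k_medoids(data, k, d, update=exact_medoid, max_iter=20, seed=0):
    """Alternate assignment and medoid update until the medoids stop changing.
    Only the dissimilarity d is required; no vector-space structure is assumed."""
    rng = random.Random(seed)
    medoids = rng.sample(list(data), k)
    for _ in range(max_iter):
        clusters = assign(data, medoids, d)
        # An empty cluster keeps its previous medoid.
        new_medoids = [update(c, d) if c else m for c, m in zip(clusters, medoids)]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(data, medoids, d)


sequences = ["AAAA", "AAAT", "AATA", "GGGG", "GGGC", "GCGG"]
medoids, clusters = k_medoids(sequences, k=2, d=edit_distance)
```

Because the algorithm touches the data only through `d`, swapping the edit distance for a graph dissimilarity would, under the same assumptions, let the same loop cluster graphs. Passing `update=sampled_medoid` trades accuracy of the medoid for fewer dissimilarity evaluations.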

Efficient approaches for solving the large-scale k-medoids problem: Towards structured data / Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo. - (2019), pp. 199-219. - STUDIES IN COMPUTATIONAL INTELLIGENCE. [10.1007/978-3-030-16469-0_11].

Efficient approaches for solving the large-scale k-medoids problem: Towards structured data

Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo
2019

Computational Intelligence
978-3-030-16468-3
978-3-030-16469-0
cluster analysis; parallel and distributed computing; large-scale pattern recognition; unsupervised learning; big data mining; non-metric spaces analysis
02 Publication in a volume::02a Chapter or Article
Files attached to this item

Martino_Efficient-approaches_2019.pdf
Access: archive managers only
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 414.34 kB
Format: Adobe PDF

Martino_Efficient-approaches_ProductFlyer_2019.pdf
Access: archive managers only
Type: Other attached material
License: All rights reserved
Size: 636.77 kB
Format: Adobe PDF

Martino_Efficient_Copertina_2019.pdf
Access: archive managers only
Type: Other attached material
License: All rights reserved
Size: 99.07 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1277072
Citations
  • PMC: not available
  • Scopus: 26
  • ISI: 19