Efficient approaches for solving the large-scale k-medoids problem: Towards structured data / Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo. - (2019), pp. 199-219. - STUDIES IN COMPUTATIONAL INTELLIGENCE. [10.1007/978-3-030-16469-0_11].
Efficient approaches for solving the large-scale k-medoids problem: Towards structured data
Martino, Alessio; Rizzi, Antonello; Frattale Mascioli, Fabio Massimo
2019
Abstract
Clustering objects represented by structured data, possibly with non-trivial geometry, is an interesting task in pattern recognition. Moreover, in the Big Data era, clustering huge amounts of (structured) data challenges computer science and pattern recognition researchers alike. This paper aims to bridge the gap in large-scale structured data clustering. Specifically, following a previous work, a parallel and distributed k-medoids clustering implementation is proposed and tested on real-world biological structured data, namely pathway maps (graphs) and primary structures of proteins (sequences). Furthermore, two methods for medoid evaluation, based on exact and approximate procedures respectively, are proposed and compared in terms of scalability. Computational results show that the proposed implementation is flexible with respect to the dissimilarity measure and the input space adopted, with satisfactory results in terms of scalability.

File | Size | Format | |
---|---|---|---|
Martino_Efficient-approaches_2019.pdf (archive managers only). Type: Publisher's version (published version with the publisher's layout). License: All rights reserved | 414.34 kB | Adobe PDF | Contact the author |
Martino_Efficient-approaches_ProductFlyer_2019.pdf (archive managers only). Type: Other attached material. License: All rights reserved | 636.77 kB | Adobe PDF | Contact the author |
Martino_Efficient_Copertina_2019.pdf (archive managers only). Type: Other attached material. License: All rights reserved | 99.07 kB | Adobe PDF | Contact the author |
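
The abstract contrasts exact and approximate medoid evaluation under an arbitrary dissimilarity measure. The sketch below is only a rough illustration of that distinction and is not taken from the chapter: the function names, the toy sequences, and in particular the choice of random subsampling as the approximation are assumptions made here. The chapter's actual procedures and its parallel and distributed implementation may differ.

```python
import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def exact_medoid(cluster: Sequence[T], d: Callable[[T, T], float]) -> T:
    """Return the element with the smallest sum of dissimilarities.

    Evaluates every pair, so the cost is quadratic in the cluster size.
    """
    return min(cluster, key=lambda c: sum(d(c, x) for x in cluster))


def approximate_medoid(cluster: Sequence[T], d: Callable[[T, T], float],
                       sample_size: int, seed: int = 0) -> T:
    """Hypothetical approximation: exact medoid of a random subsample.

    Cost is quadratic in sample_size rather than in the cluster size.
    """
    if len(cluster) <= sample_size:
        return exact_medoid(cluster, d)
    sample = random.Random(seed).sample(list(cluster), sample_size)
    return exact_medoid(sample, d)


# Made-up toy sequences standing in for protein primary structures.
cluster = ["MKTA", "MKTV", "MKSA", "LLLL"]
medoid = exact_medoid(cluster, edit_distance)              # "MKTA"
approx = approximate_medoid(cluster, edit_distance, sample_size=2)
```

Because both functions only call `d` on pairs of objects, the same code would work for graphs or any other input space once a suitable dissimilarity is supplied, which is the kind of flexibility the abstract describes.
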
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.