In this paper, we consider the problem of distributed unsupervised learning, where the data to be clustered are partitioned over a set of agents with limited connectivity. To solve this problem, we consider a novel, extended ensemble clustering procedure that is suited to a fully distributed scenario. The proposed algorithm can handle the case where each agent has its own, different local dataset. Additionally, to reduce the total amount of exchanged information, only the local cluster prototypes are forwarded among neighbors. Cluster similarity indexes are adopted to resolve conflicts among agents and to reach a common structure at the end of the communication process. The experimental results demonstrate the feasibility of this approach, which is able to reach optimal performance when compared with a fully centralized implementation, that is, one where the data are collected beforehand on a single clustering agent.

A decentralized algorithm for distributed ensemble clustering / Rosato, A.; Altilio, R.; Panella, M. - In: INFORMATION SCIENCES. - ISSN 0020-0255. - 578:(2021), pp. 417-434. [10.1016/j.ins.2021.07.028]

A decentralized algorithm for distributed ensemble clustering

Rosato A.; Altilio R.; Panella M.
2021

cluster validity; data privacy; distributed learning; ensemble clustering; multiple data sources
01 Journal publication::01a Journal article
Files attached to this item
Rosato_Decentralized-algorithm_2021.pdf (archive managers only)

Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 1.71 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/1565715