In this paper, we consider the problem of distributed unsupervised clustering, where training data is partitioned over a set of agents, whose interaction happens over a sparse, but connected, communication network. To solve this problem, we recast the well known Expectation Maximization method in a distributed setting, exploiting a recently proposed algorithmic framework for in-network non-convex optimization. The resulting algorithm, termed as Expectation Maximization Consensus, exploits successive local convexifications to split the computation among agents, while hinging on dynamic consensus to diffuse information over the network in real-time. Convergence to local solutions of the distributed clustering problem is then established. Experimental results on well-known datasets illustrate that the proposed method performs better than other distributed Expectation-Maximization clustering approaches, while the method is faster than a centralized Expectation-Maximization procedure and achieves a comparable performance in terms of cluster validity indexes. The latter ones achieve good values in absolute range scales and prove the quality of the obtained clustering results, which compare favorably with other methods in the literature.

Distributed data clustering over networks / Altilio, R.; Di Lorenzo, P.; Panella, M.. - In: PATTERN RECOGNITION. - ISSN 0031-3203. - 93:(2019), pp. 603-620. [10.1016/j.patcog.2019.04.021]

Distributed data clustering over networks

Altilio R.;Di Lorenzo P.;Panella M.
2019

Abstract

In this paper, we consider the problem of distributed unsupervised clustering, where training data is partitioned over a set of agents, whose interaction happens over a sparse, but connected, communication network. To solve this problem, we recast the well known Expectation Maximization method in a distributed setting, exploiting a recently proposed algorithmic framework for in-network non-convex optimization. The resulting algorithm, termed as Expectation Maximization Consensus, exploits successive local convexifications to split the computation among agents, while hinging on dynamic consensus to diffuse information over the network in real-time. Convergence to local solutions of the distributed clustering problem is then established. Experimental results on well-known datasets illustrate that the proposed method performs better than other distributed Expectation-Maximization clustering approaches, while the method is faster than a centralized Expectation-Maximization procedure and achieves a comparable performance in terms of cluster validity indexes. The latter ones achieve good values in absolute range scales and prove the quality of the obtained clustering results, which compare favorably with other methods in the literature.
2019
Clustering; distributed learning; expectation maximization; Gaussian mixtures; non-convex optimization
01 Pubblicazione su rivista::01a Articolo in rivista
Distributed data clustering over networks / Altilio, R.; Di Lorenzo, P.; Panella, M.. - In: PATTERN RECOGNITION. - ISSN 0031-3203. - 93:(2019), pp. 603-620. [10.1016/j.patcog.2019.04.021]
File allegati a questo prodotto
File Dimensione Formato  
Altilio_Distributed-data_2019.pdf

solo gestori archivio

Tipologia: Versione editoriale (versione pubblicata con il layout dell'editore)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.83 MB
Formato Adobe PDF
1.83 MB Adobe PDF   Contatta l'autore

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11573/1283382
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 22
  • ???jsp.display-item.citation.isi??? 19
social impact