Privacy-preserving data mining for distributed medical scenarios / Scardapane, Simone; Altilio, Rosa; Ciccarelli, V.; Uncini, Aurelio; Panella, Massimo. - Print. - 69(2018), pp. 119-128. - Smart Innovation, Systems and Technologies. [10.1007/978-3-319-56904-8_12].
Privacy-preserving data mining for distributed medical scenarios
Scardapane, Simone; Altilio, Rosa; Uncini, Aurelio; Panella, Massimo
2018
Abstract
In this paper, we consider the application of data mining methods in medical contexts, wherein the data to be analysed (e.g. records from different patients) are distributed among multiple clinical parties. Although inference procedures could extract meaningful medical information (such as an optimal clustering of the subjects), each party is forbidden to disclose its local dataset to a centralized location, due to privacy concerns over sensitive portions of the data. To this end, we propose a general framework that enables the parties involved to perform, in a decentralized fashion, any data mining procedure relying solely on the Euclidean distances among patterns, including kernel methods, spectral clustering, and similar techniques. Specifically, the problem is recast as a decentralized matrix completion problem, whose proposed solution does not require a centralized coordinator; full privacy of the original data can be ensured through different strategies, including random multiplicative updates for the secure computation of distances. Experimental results support our proposal as an efficient tool for clustering and classification in distributed medical contexts. For example, on the well-known Pima Indians Diabetes dataset we obtain a clustering Rand-Index of 0.52, against 0.54 for the (infeasible) centralized solution, while on the Parkinson speech database the Rand-Index increases from 0.45 to 0.50.
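As a rough illustration of the matrix-completion view described in the abstract, the following Python/NumPy sketch simulates two parties that each know the squared distances among their own patients, plus only a random subset of the cross-party distances, and fills in the remaining entries with a simple iterative low-rank SVD imputation ("hard impute"). This is a centralized stand-in written under stated assumptions, not the authors' decentralized algorithm: the data are synthetic, the 40% cross-party sampling rate and the helper names `squared_distance_matrix` and `complete_edm` are hypothetical, and no secure-computation protocol is modeled. The only structural fact it relies on is that a matrix of squared Euclidean distances between points in R^d has rank at most d + 2.

```python
import numpy as np


def squared_distance_matrix(X):
    """Squared Euclidean distances between all pairs of rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    D = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.maximum(D, 0.0)  # clip tiny negatives caused by round-off


def complete_edm(D_obs, mask, rank, n_iter=500):
    """Hypothetical helper: fill entries where mask is False with a low-rank estimate.

    Hard-impute iteration: keep the top `rank` singular triplets,
    then overwrite the observed entries with their known values.
    """
    X = np.where(mask, D_obs, D_obs[mask].mean())  # start from the observed mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, D_obs, low_rank)
    return X


rng = np.random.default_rng(0)
n_per_party, dim = 30, 2
n = 2 * n_per_party
points = rng.normal(size=(n, dim))  # synthetic "patients", not real data
D_true = squared_distance_matrix(points)

# Party 0 owns rows 0..29, party 1 owns rows 30..59; each knows its own block.
party = np.repeat([0, 1], n_per_party)
mask = party[:, None] == party[None, :]

# Assumption: 40% of cross-party distances were obtained via some secure protocol.
cross = np.triu(rng.random((n, n)) < 0.4, k=1)
mask = mask | cross | cross.T

D_obs = np.where(mask, D_true, 0.0)  # unobserved entries are never read
D_hat = complete_edm(D_obs, mask, rank=dim + 2)

missing = ~mask
err_init = np.linalg.norm(D_true[missing] - D_true[mask].mean())
err_final = np.linalg.norm(D_true[missing] - D_hat[missing])
print(f"missing entries: {missing.sum()}")
print(f"error with mean fill: {err_init:.3f}, after completion: {err_final:.3e}")
```

The completed matrix `D_hat` could then be handed to any method that needs only pairwise distances, such as a kernel or spectral clustering routine. Truncated SVD is used here only because it is the simplest low-rank completion scheme to write down; the framework in the abstract distributes this step across the parties instead.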
File | Size | Format | Access |
---|---|---|---|
Dichiarazione_conformità 18-11-2016.pdf (type: other attached material; license: all rights reserved; authorized users only) | 1.98 MB | Adobe PDF | Contact the author |
Scardapane_Privacy-preserving_2018.pdf (note: chapter 12; type: publisher's version, published with the publisher's layout; license: all rights reserved; archive managers only) | 190.6 kB | Adobe PDF | Contact the author |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.