On the Problem of Modeling Structured Data with the MinSOD Representative / DEL VESCOVO, Guido; Livi, Lorenzo; FRATTALE MASCIOLI, Fabio Massimo; Rizzi, Antonello. - In: INTERNATIONAL JOURNAL OF COMPUTER THEORY AND ENGINEERING. - ISSN 1793-8201. - 6:1(2014), pp. 9-14. [10.7763/ijcte.2014.v6.827]

On the Problem of Modeling Structured Data with the MinSOD Representative

DEL VESCOVO, Guido; LIVI, Lorenzo; FRATTALE MASCIOLI, Fabio Massimo; RIZZI, Antonello
2014

Abstract

The ability to tackle pattern recognition problems directly on structured domains (e.g., multimedia data, strings, graphs) is fundamental to the effective solution of many interesting applications. In this paper, we deal with a clustering problem defined in the string domain, focusing on the problem of cluster representation in data domains where only a dissimilarity measure can be defined. To this end, we adopt the MinSOD (Minimum Sum of Distances) cluster representation technique, which defines the representative as the element of the cluster that minimizes the sum of dissimilarities from all the other elements in the considered set. Since the exact computation of the MinSOD has a high computational cost, we propose a suboptimal procedure that computes the representative of the cluster considering only a reduced pool of samples, instead of the whole set of objects in the cluster. We have carried out tests to ascertain the sensitivity of the clustering procedure with respect to the number of samples in the pool used to compute the MinSOD. The results show good robustness of the proposed procedure. The implementations are available as part of the open-source SPARE library.
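As an illustration of the idea described in the abstract, the following minimal Python sketch computes an exact MinSOD representative for a set of strings and a pooled approximation that looks only at a reduced sample. It is not the SPARE implementation: the function names `levenshtein`, `minsod`, and `minsod_pooled`, the choice of edit distance as the dissimilarity, and the uniform random sampling of the pool are all assumptions made here for illustration.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit costs (assumed dissimilarity for this sketch)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def minsod(items: Sequence[T], dissimilarity: Callable[[T, T], float]) -> T:
    """Exact MinSOD: the element minimizing the sum of dissimilarities
    to all elements of the set. Needs O(n^2) dissimilarity evaluations."""
    if not items:
        raise ValueError("cannot compute a representative of an empty set")
    return min(items, key=lambda c: sum(dissimilarity(c, x) for x in items))


def minsod_pooled(
    items: Sequence[T],
    dissimilarity: Callable[[T, T], float],
    pool_size: int,
    rng: Optional[random.Random] = None,
) -> T:
    """Approximate MinSOD computed on a reduced pool of samples.

    Assumption: the pool is drawn uniformly at random without replacement;
    the pool construction used in the paper may differ.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    if len(items) <= pool_size:
        return minsod(items, dissimilarity)
    rng = rng or random.Random()
    pool = rng.sample(list(items), pool_size)
    return minsod(pool, dissimilarity)


cluster = ["kitten", "sitten", "sittin", "mitten", "bitten"]
exact = minsod(cluster, levenshtein)
approx = minsod_pooled(cluster, levenshtein, pool_size=3, rng=random.Random(0))
```

With a pool of p samples the cost drops from O(n^2) to O(p^2) dissimilarity evaluations, at the price of a representative that may differ from the exact one; this trade-off is what the sensitivity tests mentioned in the abstract examine.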
software library; MinSOD representative; clustering strings
01 Journal publication::01a Journal article
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/535292