The possibility to face pattern recognition problems directly on structured domains (e.g., multimedia data, strings, graphs) is fundamental to the effective solution of many interesting applications. In this paper, we deal with a clustering problem defined in the string domain, focusing on the problem of cluster representation in data domains where only a dissimilarity measure can be fixed. To this aim, we adopt the MinSOD (Minimum Sum of Distances) cluster representation technique, which defines the representative as the element of the cluster minimizing the sum of dissimilarities from all the other elements in the considered set. Since the precise computation of the MinSOD have a high computational cost, we propose a suboptimal procedure consisting in computing the representative of the cluster considering only a reduced pool of samples, instead of the whole set of objects in the cluster. We have carried out some tests in order to ascertain the sensitivity of the clustering procedure with respect to the number of samples in the pool used to compute the MinSOD. Results show a good robustness of the proposed procedure. The implementations are available as part of the SPARE library, which is available as an open source project.
On the Problem of Modeling Structured Data with the MinSOD Representative / DEL VESCOVO, Guido; Livi, Lorenzo; FRATTALE MASCIOLI, Fabio Massimo; Rizzi, Antonello. - In: INTERNATIONAL JOURNAL OF COMPUTER THEORY AND ENGINEERING. - ISSN 1793-8201. - 6:1(2014), pp. 9-14. [10.7763/ijcte.2014.v6.827]
On the Problem of Modeling Structured Data with the MinSOD Representative
DEL VESCOVO, Guido;LIVI, LORENZO;FRATTALE MASCIOLI, Fabio Massimo;RIZZI, Antonello
2014
Abstract
The possibility to face pattern recognition problems directly on structured domains (e.g., multimedia data, strings, graphs) is fundamental to the effective solution of many interesting applications. In this paper, we deal with a clustering problem defined in the string domain, focusing on the problem of cluster representation in data domains where only a dissimilarity measure can be fixed. To this aim, we adopt the MinSOD (Minimum Sum of Distances) cluster representation technique, which defines the representative as the element of the cluster minimizing the sum of dissimilarities from all the other elements in the considered set. Since the precise computation of the MinSOD have a high computational cost, we propose a suboptimal procedure consisting in computing the representative of the cluster considering only a reduced pool of samples, instead of the whole set of objects in the cluster. We have carried out some tests in order to ascertain the sensitivity of the clustering procedure with respect to the number of samples in the pool used to compute the MinSOD. Results show a good robustness of the proposed procedure. The implementations are available as part of the SPARE library, which is available as an open source project.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.