A dissimilarity-based classifier for generalized sequences by a granular computing approach / Rizzi, Antonello; Possemato, Francesca; Livi, Lorenzo; Azzurra, Sebastiani; Alessandro, Giuliani; FRATTALE MASCIOLI, Fabio Massimo. - (2013), pp. 1-8. (Paper presented at the 2013 International Joint Conference on Neural Networks, IJCNN 2013, held in Dallas, United States, 4-9 August 2013) [10.1109/ijcnn.2013.6707041].
A dissimilarity-based classifier for generalized sequences by a granular computing approach
RIZZI, Antonello;POSSEMATO, FRANCESCA;LIVI, LORENZO;FRATTALE MASCIOLI, Fabio Massimo
2013
Abstract
In this paper we propose a classifier for generalized sequences, conceived within the granular computing framework. The classification system processes input sequences of objects through a suitable interplay of dissimilarity-based and clustering-based techniques. The core data mining engine retrieves information granules, which are used to represent the input sequences as feature vectors. This representation makes it possible to address the original sequence classification problem with standard pattern recognition tools. We have evaluated the generalization capability of the system in a case study concerning the protein folding problem. In the considered dataset, the entire E. coli proteome was screened to predict relative protein solubility from the amino acid sequence alone. We report the analysis of the dataset under different settings, showing interesting classification accuracy results on the test set. The developed system also makes it possible to extract knowl…
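The pipeline outlined in the abstract (dissimilarity measure, clustering of subsequences into information granules, embedding of each sequence as a feature vector, standard classifier) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the edit distance as dissimilarity, a greedy threshold clustering over fixed-length subsequences, a nearest-neighbour classifier, the parameter values, the function names, and the toy sequences and labels.

```python
def edit_distance(a, b):
    """Levenshtein distance, used here as the (assumed) sequence dissimilarity."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution or match
        prev = cur
    return prev[-1]


def subsequences(seq, n):
    """All contiguous subsequences of length n."""
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


def find_granules(seqs, n=3, radius=1):
    """Greedy threshold clustering (an assumption): a subsequence that is not
    within `radius` of any existing representative founds a new cluster.
    The representatives play the role of information granules."""
    reps = []
    for s in seqs:
        for g in subsequences(s, n):
            if not any(edit_distance(g, r) <= radius for r in reps):
                reps.append(g)
    return reps


def embed(seq, reps, n=3, radius=1):
    """Feature vector: for each granule, the number of subsequences of `seq`
    lying within `radius` of its representative."""
    grams = subsequences(seq, n)
    return [sum(edit_distance(g, r) <= radius for g in grams) for r in reps]


def nearest_neighbor(x, train_X, train_y):
    """1-NN on the embedded vectors, standing in for any standard classifier."""
    dists = [sum((a - b) ** 2 for a, b in zip(x, t)) for t in train_X]
    return train_y[dists.index(min(dists))]


# Hypothetical toy data; these are not real protein sequences or labels.
train = [("AAABAAAB", "soluble"), ("AABAAABA", "soluble"),
         ("CCDCCCDC", "insoluble"), ("CDCCCDCC", "insoluble")]
train_y = [label for _, label in train]
reps = find_granules([s for s, _ in train])
train_X = [embed(s, reps) for s, _ in train]

for query in ("AAAABAAA", "CCCDCCDC"):
    print(query, nearest_neighbor(embed(query, reps), train_X, train_y))
```

Once the sequences are mapped to fixed-length vectors, the nearest-neighbour step can be swapped for any other vector-based classifier without touching the granule extraction.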