Supervised approaches for protein function prediction by topological data analysis / Martino, Alessio; Rizzi, Antonello; Mascioli, Fabio Massimo Frattale. - 2018:(2018), pp. 1-8. (Paper presented at the International Joint Conference on Neural Networks (IJCNN) 2018, held in Rio de Janeiro, Brazil) [10.1109/IJCNN.2018.8489307].
Supervised approaches for protein function prediction by topological data analysis
Martino, Alessio;Rizzi, Antonello;Mascioli, Fabio Massimo Frattale
2018
Abstract
Topological Data Analysis is a novel approach, useful whenever data can be described by topological structures such as graphs. The aim of this paper is to investigate whether such a tool can be used to define a set of descriptors useful for pattern recognition and machine learning tasks. Specifically, we consider a supervised learning problem whose final goal is to predict the physiological function of proteins from their respective residue contact networks. Indeed, folded proteins can effectively be described by graphs, making them a useful case study for assessing the effectiveness of Topological Data Analysis in pattern recognition tasks. Experiments conducted on a subset of the Escherichia coli proteome using two different classification systems show that descriptors derived from Topological Data Analysis (namely, the sequence of Betti numbers) lead to classification performance comparable to that of descriptors derived from widely known centrality measures on the protein function prediction problem. Further benchmarking tests suggest that the Betti numbers retain some information despite the heavy compression intrinsic to mapping a protein to its Betti numbers.

File | Size | Format
---|---|---
Martino_Supervised_2018.pdf (archive managers only). Type: publisher's version (published with the publisher's layout). License: all rights reserved | 1.46 MB | Adobe PDF