Toward a multilevel representation of protein molecules. Comparative approaches to the aggregation/folding propensity problem / Livi, Lorenzo; Giuliani, Alessandro; Rizzi, Antonello. - In: INFORMATION SCIENCES. - ISSN 0020-0255. - PRINT. - 326:(2016), pp. 134-145. [10.1016/j.ins.2015.07.043]
Toward a multilevel representation of protein molecules. Comparative approaches to the aggregation/folding propensity problem
LIVI, LORENZO;RIZZI, Antonello
2016
Abstract
Computational Intelligence methods are typically designed under the assumption that the input space is a vector space. Departing from vector-based pattern representations raises many theoretical and practical problems, mostly due to the absence of an intuitive geometric interpretation of the data. However, since such representations could offer additional insights when used in real-world applications of data-driven inference systems, their exploitation is also a practical and convenient choice. Here we apply several state-of-the-art classification methods for non-geometric data, with the aim of comparing different representations of the proteins gathered from Niwa et al. (2009) [35]. Such representations include sequences of objects and labeled (contact) graphs enriched with chemico-physical attributes. The experiment performed by Niwa et al. provides the unique possibility to analyze the relative aggregation/folding propensity of the elements of the entire Escherichia coli (E. coli) proteome in a cell-free, standardized microenvironment. Through this comparison, we are also able to identify some interesting general properties of proteins. Notably, (i) we suggest a threshold of around 250 residues discriminating “easily foldable” from “hardly foldable” molecules, consistent with other independent experiments, and (ii) we highlight the relevance of contact graph spectra for folding behavior discrimination and characterization of the E. coli solubility data. The soundness of the experimental results presented in this paper is supported by the statistically relevant relationships discovered between the chemico-physical description of proteins and the substitution cost matrix that we developed and used in the various discrimination systems.

File | Size | Format | |
---|---|---|---|
Livi_Toward-multileve_2016.pdf (authorized users only); Type: Publisher's version (published version with the publisher's layout); License: All rights reserved | 362.57 kB | Adobe PDF | |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.