Toward a multilevel representation of protein molecules. Comparative approaches to the aggregation/folding propensity problem / Livi, Lorenzo; Giuliani, Alessandro; Rizzi, Antonello. - In: INFORMATION SCIENCES. - ISSN 0020-0255. - PRINT. - 326:(2016), pp. 134-145. [10.1016/j.ins.2015.07.043]

Toward a multilevel representation of protein molecules. Comparative approaches to the aggregation/folding propensity problem

LIVI, LORENZO;RIZZI, Antonello
2016

Abstract

Computational Intelligence methods are typically designed under the assumption that the input space is essentially a vector space. Departing from vector-based pattern representations raises many theoretical and practical problems, mostly due to the absence of an intuitive geometric interpretation of the data. However, since such representations can offer additional insights in real-world applications of data-driven inference systems, exploiting them is also a practical and convenient choice. Here we apply several state-of-the-art classification methods for non-geometric data, with the aim of comparing different representations of the proteins gathered from Niwa et al. (2009) [35]. These representations include sequences of objects and labeled (contact) graphs enriched with chemico-physical attributes. The experiment performed by Niwa et al. provides a unique opportunity to analyze the relative aggregation/folding propensity of the elements of the entire Escherichia coli (E. coli) proteome in a cell-free, standardized microenvironment. Through this comparison, we are also able to identify some interesting general properties of proteins. Notably, (i) we suggest a threshold of around 250 residues discriminating “easily foldable” from “hardly foldable” molecules, consistent with other independent experiments, and (ii) we highlight the relevance of contact graph spectra for folding behavior discrimination and characterization of the E. coli solubility data. The soundness of the experimental results presented in this paper is supported by the statistically relevant relationships discovered between the chemico-physical description of proteins and the substitution cost matrix that we developed and used in the various discrimination systems.
Classification of structured data; protein aggregation; protein folding; sequence-structure relation; artificial intelligence; software; control and systems engineering; theoretical computer science; computer science applications; computer vision and pattern recognition
01 Journal publication::01a Journal article
Files attached to this item
File: Livi_Toward-multileve_2016.pdf (authorized users only)
Note: Toward a multilevel representation of protein molecules: Comparative approaches to the aggregation/folding propensity problem
Type: Publisher's version (published version with the publisher's layout)
License: All rights reserved
Size: 362.57 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11573/847734
Citations
  • PubMed Central: not available
  • Scopus: 12
  • Web of Science (ISI): 11