A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein−protein and protein−DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.
Structure-related statistical singularities along protein sequences: a correlation study / Colafranceschi, M; Colosimo, Alfredo; Zbilut, Jp; UVERSKY VN AND GIULIANI, A.. - In: JOURNAL OF CHEMICAL INFORMATION AND MODELING. - ISSN 1549-9596. - STAMPA. - 45:(2005), pp. 183-189. [10.1021/ci049838m]
Structure-related statistical singularities along protein sequences: a correlation study
COLOSIMO, Alfredo;
2005
Abstract
A data set composed of 1141 proteins representative of all eukaryotic protein sequences in the Swiss-Prot Protein Knowledge base was coded by seven physicochemical properties of amino acid residues. The resulting numerical profiles were submitted to correlation analysis after the application of a linear (simple mean) and a nonlinear (Recurrence Quantification Analysis, RQA) filter. The main RQA variables, Recurrence and Determinism, were subsequently analyzed by Principal Component Analysis. The RQA descriptors showed that (i) within protein sequences is embedded specific information neither present in the codes nor in the amino acid composition and (ii) the most sensitive code for detecting ordered recurrent (deterministic) patterns of residues in protein sequences is the Miyazawa-Jernigan hydrophobicity scale. The most deterministic proteins in terms of autocorrelation properties of primary structures were found (i) to be involved in protein−protein and protein−DNA interactions and (ii) to display a significantly higher proportion of structural disorder with respect to the average data set. A study of the scaling behavior of the average determinism with the setting parameters of RQA (embedding dimension and radius) allows for the identification of patterns of minimal length (six residues) as possible markers of zones specifically prone to inter- and intramolecular interactions.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.