In this paper we present a general strategy for studying the occurrence frequency distributions of oligonucleotides in DNA coding segments and for detecting possible patterns of compositional inhomogeneity and non-uniformity in genomes. Identifying specific tendencies or peculiar deviations in the actual occurrence frequencies of oligonucleotides, relative to what can be expected a priori, is of great importance in biology. Differences between expected and actual distributions may suggest or confirm the existence of specific biological mechanisms related to them. Similarly, a marked deviation in the occurrence frequency of an oligonucleotide may suggest that it belongs to the class of so-called "DNA signal (target) sequences". The approach we have developed is innovative in several respects. First, the analysis of the genomic data takes into account the observation that the distribution of the four nucleotides along the coding regions of the genome is biased by the existence of a well-defined "reading frame". Second, the "experimental" numbers obtained by counting the occurrences of the various oligonucleotide sequences are corrected for the many kinds of errors and redundancies present in the available genetic databases. A further methodologically significant improvement over existing search strategies concerns how we decide whether the (corrected) "experimental" occurrence frequency of a given oligonucleotide is within statistical expectations. Each sequence is assigned a measure of the strength of the selective pressure that has acted on it in the course of evolution; this measure takes into account both the "experimental" occurrence frequency of the sequence and the probability that this number might be the result of statistical fluctuations.
If the strength of the selective pressure evaluated in this way turns out to be sufficiently large, the corresponding sequence is considered to have an occurrence frequency beyond expectations and, hence, to be statistically and biologically interesting.
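The abstract does not give the formula used for the selective-pressure measure, so the following is only a minimal sketch of the general idea, under stated assumptions: expected counts are computed from nucleotide frequencies taken separately at each of the three codon positions (a simple way to respect the reading frame), and the deviation of the observed count is scored with a binomial z-score, which is a stand-in and not the authors' measure. The function names `frame_base_freqs` and `oligo_z_scores` are hypothetical, input sequences are assumed to start in frame, and the correction for database errors and redundancies is not modelled.

```python
from collections import Counter
from math import sqrt


def frame_base_freqs(seqs):
    """Nucleotide frequencies at each codon position: freqs[p][base], p in 0..2.

    Assumption: every sequence starts at codon position 0 (i.e. is in frame).
    """
    counts = [Counter() for _ in range(3)]
    for s in seqs:
        for i, base in enumerate(s):
            counts[i % 3][base] += 1
    freqs = []
    for c in counts:
        total = sum(c.values())
        freqs.append({base: n / total for base, n in c.items()})
    return freqs


def oligo_z_scores(seqs, k, frame=0):
    """Score each k-mer that starts at codon position `frame`.

    Expected count: product of position-specific base frequencies times the
    number of windows. Score: binomial z-score (an illustrative stand-in for
    the selective-pressure measure, which the abstract does not specify).
    """
    freqs = frame_base_freqs(seqs)
    observed = Counter()
    n_windows = 0
    for s in seqs:
        # Step by 3 so every window starts at the same codon position.
        for i in range(frame, len(s) - k + 1, 3):
            observed[s[i:i + k]] += 1
            n_windows += 1

    scores = {}
    for oligo, obs in observed.items():
        p = 1.0
        for j, base in enumerate(oligo):
            p *= freqs[(frame + j) % 3][base]
        expected = n_windows * p
        sd = sqrt(n_windows * p * (1.0 - p))
        scores[oligo] = (obs - expected) / sd if sd > 0 else 0.0
    return scores
```

Calling `oligo_z_scores(coding_sequences, k=3)` on a list of in-frame coding sequences returns a dictionary mapping each observed trinucleotide to its score; a large positive or negative value flags a sequence whose count is unlikely to be a statistical fluctuation under this simple model.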
A study of nucleotide occurrence distributions in DNA coding segments / Castrignano', T.; Colosimo, Alfredo; Morante, S.; Parisi, V.; Rossi, G. C. - In: JOURNAL OF THEORETICAL BIOLOGY. - ISSN 0022-5193. - Print. - 184:(1997), pp. 461-469.
A study of nucleotide occurrence distributions in DNA coding segments
COLOSIMO, Alfredo;PARISI V.;
1997