An improved method for detection of words with unusual occurrence frequency in nucleotidic sequences / Colosimo, Alfredo; Morante, S.; Rossi, G. C. - In: JOURNAL OF THEORETICAL BIOLOGY. - ISSN 0022-5193. - PRINT. - 165:(1993), pp. 659-672. [10.1006/jtbi.1993.1212]
An improved method for detection of words with unusual occurrence frequency in nucleotidic sequences
COLOSIMO, Alfredo; V. Parisi
1993
Abstract
A statistical analysis designed to deal with the problem of identifying rare or abundant "words" of arbitrary length in genomic fragments is presented. Our approach has the novelty of taking into account the statistical role of the presence of shorter words nested into longer ones and of introducing a Bayesian correction to minimize the effects of statistical fluctuations and of possible mistakes in genomic data. The method is successfully used in a thorough analysis of the abundance of short nucleotide sequences in the Escherichia coli genome.
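The abstract does not give the formulas, so the following is only an illustrative sketch of the general idea, not the authors' procedure. It assumes a common Markov-type estimate in which the expected count of a word is derived from the counts of the shorter words nested inside it, and it uses a plain additive pseudocount as a crude stand-in for the Bayesian correction. The function names `count_words` and `word_ratios` and the `pseudocount` parameter are hypothetical.

```python
from collections import Counter


def count_words(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_ratios(seq, k, pseudocount=1.0):
    """Return the observed/expected ratio for each word of length k found in seq.

    Assumption (not taken from the paper): the expected count of a word w is
    estimated from the shorter words nested in it,
        E(w) = N(w[:-1]) * N(w[1:]) / N(w[1:-1]),
    and the same pseudocount is added to every count as a simple smoothing
    term that damps fluctuations for words with very few occurrences.
    """
    if k < 3:
        raise ValueError("k must be at least 3")
    n_k = count_words(seq, k)
    n_k1 = count_words(seq, k - 1)
    n_k2 = count_words(seq, k - 2)
    ratios = {}
    for word, observed in n_k.items():
        prefix = n_k1[word[:-1]] + pseudocount   # first k-1 letters
        suffix = n_k1[word[1:]] + pseudocount    # last k-1 letters
        core = n_k2[word[1:-1]] + pseudocount    # shared middle k-2 letters
        expected = prefix * suffix / core
        ratios[word] = (observed + pseudocount) / expected
    return ratios
```

Under these assumptions, a ratio well above 1 flags a candidate abundant word and a ratio well below 1 a candidate rare one. The sketch only scores words that occur at least once, so words that are entirely absent from the fragment would need to be enumerated separately.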