Bioinformatics toolkit
www.cardiff.ac.uk/biosi/research/biosoft/

Expected percentage differences


An expected percentage difference is the percentage of base mismatches between two aligned sequences, within a sampling window (w) of specified size, that one would expect if both sequences are free of errors (that is, not anomalous). It is, in effect, the expected evolutionary distance between the two sequences within w.

If w is a sliding window, moving a fixed number of bases at a time along alignment Sqs (formed from sequences Sq and Ss), and n is the total number of sampling positions, the set of expected percentage differences Eqs = {ei : e1, e2, ..., en} can be viewed as a summary of the local fluctuations in base mismatches between the two sequences that we would expect along the alignment. In contrast, the set of observed percentage differences Oqs = {oi : o1, o2, ..., on} summarises the local fluctuations that are actually observed. Comparing expected with observed percentage differences, through generation of the Deviation from Expectation statistic, enables a decision to be made on whether both Sq and Ss are error-free or one (or both) is anomalous.
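
As an illustration, the observed percentage differences Oqs might be computed along the following lines. This is a minimal Python sketch under stated assumptions, not the toolkit's own code: the function name observed_percentage_differences is hypothetical, every alignment column (gaps included) is treated as a simple character comparison, and only complete windows are sampled.

```python
def observed_percentage_differences(sq, ss, w, b):
    """Return the percentage of mismatched columns in each window.

    sq, ss : the two aligned sequences (equal-length strings)
    w      : window size, in alignment columns
    b      : step size, in alignment columns
    """
    if len(sq) != len(ss):
        raise ValueError("aligned sequences must have equal length")
    if w <= 0 or b <= 0:
        raise ValueError("window size and step size must be positive")

    differences = []
    # Only complete windows are sampled (an assumption of this sketch).
    for start in range(0, len(sq) - w + 1, b):
        window_q = sq[start:start + w]
        window_s = ss[start:start + w]
        # Assumption: a column counts as a mismatch whenever the two
        # characters differ; gaps receive no special treatment.
        mismatches = sum(1 for x, y in zip(window_q, window_s) if x != y)
        differences.append(100.0 * mismatches / w)
    return differences


if __name__ == "__main__":
    # Toy alignment: identical first half, fully mismatched second half.
    print(observed_percentage_differences("AAAAAAAA", "AAAATTTT", w=4, b=2))
    # → [0.0, 50.0, 100.0]
```

With w = 4 and b = 2 the toy alignment yields n = 3 sampling positions, one value of oi per window.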

Generating expected percentage differences

To generate expected percentage differences for Sq and Ss one needs to know (i) the window size w and step size b used to generate the observed percentage differences Oqs, (ii) the overall evolutionary distance between Sq and Ss as represented by the mean of the observed percentage differences, and (iii) the location of the hypervariable regions within the 16S rRNA gene, as mapped by probability distribution Q. This information is used as follows:

  1. Slide a window of size w with step b along the probability distribution Q and determine the average probability ai for each window wi. The resulting data set Qav = {ai : a1, a2, ..., an} is a set of average probabilities that can now be related directly to Oqs.
  2. Calculate fitting coefficient α as the mean of Oqs divided by the mean of Qav.
  3. Convert Qav to Eqs by multiplying each element of Qav by α (that is, ei = ai * α). This scaling gives the resulting data set Eqs the same mean as Oqs.
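
The three steps above can be sketched in Python as follows. This is an illustrative sketch only, not the toolkit's implementation: the function names window_means and expected_percentage_differences are hypothetical, Q is assumed to be supplied as a list of per-position probabilities indexed by the same alignment columns used to generate Oqs, and only complete windows are sampled.

```python
def window_means(values, w, b):
    """Step 1: average of `values` within each window of size w, step b."""
    if w <= 0 or b <= 0:
        raise ValueError("window size and step size must be positive")
    return [
        sum(values[start:start + w]) / w
        for start in range(0, len(values) - w + 1, b)
    ]


def expected_percentage_differences(q, observed, w, b):
    """Scale the windowed averages of Q so their mean matches that of Oqs.

    q        : per-position probabilities (assumed layout of distribution Q)
    observed : observed percentage differences Oqs, one per window
    w, b     : the window and step sizes that were used to generate Oqs
    """
    q_av = window_means(q, w, b)                      # step 1: Qav
    if not q_av or len(q_av) != len(observed):
        raise ValueError("Q and Oqs must give the same number of windows")

    mean_q = sum(q_av) / len(q_av)
    if mean_q == 0:
        raise ValueError("mean of Qav is zero; alpha is undefined")
    alpha = (sum(observed) / len(observed)) / mean_q  # step 2: alpha

    return [a * alpha for a in q_av]                  # step 3: ei = ai * alpha


if __name__ == "__main__":
    # Made-up values for illustration; these are not real 16S rRNA data.
    q = [0.1, 0.1, 0.3, 0.3, 0.5, 0.5]
    o_qs = [2.0, 4.0, 6.0]
    e_qs = expected_percentage_differences(q, o_qs, w=2, b=2)
    print(e_qs)
    print(sum(e_qs) / len(e_qs), sum(o_qs) / len(o_qs))  # the two means agree
```

Because every element of Qav is multiplied by the same coefficient α, the shape of Eqs follows Q (higher values over the hypervariable regions) while its overall level is set by the mean of Oqs.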


Dr K.E. Ashelford. © 2006, Cardiff University