Bioinformatics toolkit
www.cardiff.ac.uk/biosi/research/biosoft/

Deviation from Expectation (DE)


The Deviation from Expectation (DE) statistic is the standard deviation of the expected percentage differences from the observed percentage differences, where both series are calculated from a pairwise comparison of two sequences using the Pintail algorithm.

Specifically, if Oqs = {o1, o2, ..., on} is a series of n observed percentage differences between aligned sequences Sq and Ss, and Eqs = {e1, e2, ..., en} is an equivalent series of n expected percentage differences for the same two sequences, then:
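
Reading DE as a sample standard deviation of the observed values about the expected ones, the definition can be written as follows. The symbols are those defined above; the n - 1 divisor is an assumption (dividing by n is the other common convention), so treat this as a sketch rather than the definitive Pintail formula.

$$
DE = \sqrt{\frac{\sum_{i=1}^{n} (o_i - e_i)^2}{n - 1}}
$$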



Less formally, the DE value can be viewed as a measure of how far the phylogenetic variation observed along the length of a pair of sequences departs from what might be expected if both sequences were free of anomalies.

The greater the DE value, the more likely it is that there is some abnormal phylogenetic difference somewhere between the two sequences, caused by one (or both) of the sequences being in some way anomalous.
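
As a minimal sketch of the calculation, the following Python function computes a DE-style value from two equal-length series of percentage differences. The function name and the n - 1 divisor are assumptions made for illustration, not taken from the Pintail source, and the input values in the usage lines are invented.

```python
import math


def deviation_from_expectation(observed, expected):
    """Return the standard deviation of observed about expected values.

    Both arguments are equal-length sequences of percentage differences.
    Assumption: a sample-style divisor of n - 1 is used.
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    n = len(observed)
    if n < 2:
        raise ValueError("at least two windows are needed")
    # Sum of squared deviations of each observed value from its expectation.
    squared = sum((o - e) ** 2 for o, e in zip(observed, expected))
    return math.sqrt(squared / (n - 1))


# Invented illustrative values: expectation flat at 10%, observations scattered.
observed = [10.0, 12.0, 8.0, 14.0]
expected = [10.0, 10.0, 10.0, 10.0]
de = deviation_from_expectation(observed, expected)
```

When the observed series follows the expected series exactly, the function returns 0; the further the observations stray from expectation, the larger the value becomes.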

For example: a pairwise comparison of chimeric sequence AY297986 with the error-free X96726 using the Pintail algorithm generates the following data:
Figure 1. Plot of observed percentage differences, alongside their corresponding expected percentage differences as generated from a comparison between sequences AY297986 and X96726. Differences between the two sequences were determined from a sliding window of 300 bases, moving 25 bases at a time along the sequences' length.
Plot generated by the Pintail program, also available from www.cardiff.ac.uk/biosi/research/biosoft/.
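
The sliding window described in the Figure 1 caption (300 bases, moving 25 bases at a time) can be sketched as below. This is an illustration only: the function name is hypothetical, the two inputs are assumed to be already aligned strings of equal length, and every mismatched column, including gap characters, is counted as a difference. How Pintail itself treats gaps and ambiguous bases is not stated here.

```python
def windowed_percent_differences(seq_a, seq_b, window=300, step=25):
    """Percentage of mismatched positions in each sliding window.

    seq_a and seq_b are aligned sequences of equal length.
    Assumption: any differing column (gaps included) counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    results = []
    # Slide the window along the alignment, keeping only full windows.
    for start in range(0, len(seq_a) - window + 1, step):
        a = seq_a[start:start + window]
        b = seq_b[start:start + window]
        mismatches = sum(1 for x, y in zip(a, b) if x != y)
        results.append(100.0 * mismatches / window)
    return results


# Toy example with a small window so the result is easy to check by hand.
toy = windowed_percent_differences("A" * 10, "A" * 5 + "C" * 5, window=4, step=2)
```

In the toy example the two sequences agree over the first half and disagree over the second, so the windowed percentages rise from 0 to 100 as the window crosses the midpoint.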
Based on an overall evolutionary distance of 10.3% between the two sequences, we might expect the differences between the aligned sequences to remain fairly constant around this value along the alignment, as represented by the expected percentage differences (solid gray line in Figure 1).

In reality, however, the differences between the two sequences vary greatly with base position (observed percentage differences; red line). A DE value of 8.5 summarises this deviation. To put this figure into perspective, 99.9% of type-strain comparisons at this level of overall evolutionary distance have a DE value of 5.1 or lower, so a DE value of 8.5 is highly unlikely to occur naturally.

This is unsurprising, as AY297986 is a known two-fragment chimera with the 5' end derived from a Firmicutes bacterium and the 3' end of Nitrospira origin; the breakpoint occurs somewhere around base position 700 (relative to the Escherichia coli 16S rRNA gene).



Dr K.E. Ashelford. © 2006, Cardiff University