Bioinformatics toolkit
www.cardiff.ac.uk/biosi/research/biosoft/

Evolutionary Distance


Evolutionary, or phylogenetic, distance is a measure of evolutionary divergence between two homologous sequences.  More precisely, it is the number of residue substitutions that have occurred between two sequences since they diverged from their common ancestor, expressed as either a proportion or a percentage.  Evolutionary distance can be estimated in a number of ways, the simplest being uncorrected distance.

Uncorrected distance

The simplest estimate of distance is obtained by counting the number of base mismatches m between the two aligned sequences, then expressing this value as a proportion, or percentage, of the total alignment length n.

D = m/n  (1)     
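
Equation (1) can be sketched in a few lines of Python.  This is only an illustration, not the toolkit's own code: the function name uncorrected_distance is hypothetical, and the two sequences are assumed to be already aligned, gap-free and of equal length.

```python
def uncorrected_distance(seq_a: str, seq_b: str) -> float:
    """Return D = m / n for two aligned sequences (equation 1).

    Assumes the sequences are already aligned, so position i in seq_a
    corresponds to position i in seq_b.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        raise ValueError("alignment is empty")

    n = len(seq_a)  # total alignment length
    # m: number of positions at which the two sequences differ
    m = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return m / n


# Two mismatches (positions 5 and 10) in an alignment of length 10.
d = uncorrected_distance("ACGTACGTAC", "ACGTTCGTAA")
print(d)        # proportion: 0.2
print(d * 100)  # percentage: 20.0
```

Multiplying the result by 100 gives the distance as a percentage rather than a proportion.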

However, things get more complicated when gaps are introduced into the alignment; as it stands, equation (1) treats gaps in the same way as residues.  Gaps represent insertions and/or deletions within one or both of the sequences.  Should such events be given the same weight as base substitutions?  Should insertions and deletions be weighted differently from substitutions?  Or should they be ignored completely, so that only base substitutions are considered?  There is no single right answer to these questions, and different approaches have been followed at different times.  The following equation allows gaps to be treated in different ways, depending on the value of the gap penalty variable.

D = m/([n - g] + [g * penalty])  (2)

Here g is the total number of gaps within the alignment's consensus sequence.  A penalty of 1 reduces the denominator to n, as in equation (1), whereas a penalty of 0 removes gapped positions from the alignment length altogether.  Further sophistication may be introduced into equation (2), for example by scoring partial matches as a fraction of 1.  Nevertheless, equation (2) is still likely to underestimate true evolutionary distance, since it does not take into account the possibility of multiple substitutions occurring at the same residue position.
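
The effect of the gap penalty in equation (2) can be illustrated with a short Python sketch.  Several details here are assumptions rather than part of the definition above: gaps are written as '-', g is counted as the number of alignment columns in which either sequence has a gap, and m counts only mismatches between two residues.  The function name gap_weighted_distance is hypothetical.

```python
GAP = "-"  # assumed gap character


def gap_weighted_distance(seq_a: str, seq_b: str, penalty: float = 0.0) -> float:
    """Return D = m / ((n - g) + g * penalty) for two aligned sequences.

    Assumptions: g counts columns where either sequence has a gap, and
    m counts residue-versus-residue mismatches only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")

    columns = list(zip(seq_a, seq_b))
    n = len(columns)  # total alignment length
    g = sum(1 for a, b in columns if GAP in (a, b))  # gapped columns
    m = sum(1 for a, b in columns if GAP not in (a, b) and a != b)

    denominator = (n - g) + g * penalty
    if denominator == 0:
        raise ValueError("no positions left to compare")
    return m / denominator


a = "ACGT-CGTAC"
b = "ACGTTCG-AA"
# n = 10, g = 2, m = 1
print(gap_weighted_distance(a, b, penalty=0.0))  # gaps ignored: 1/8
print(gap_weighted_distance(a, b, penalty=1.0))  # gaps count fully: 1/10
print(gap_weighted_distance(a, b, penalty=0.5))  # gaps count half: 1/9
```

With these assumptions, raising the penalty from 0 to 1 lengthens the effective alignment from n - g to n, so the same number of mismatches yields a smaller distance.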



Dr K.E. Ashelford. © 2006, Cardiff University