Evolutionary Distance
Evolutionary, or phylogenetic, distance is a measure of evolutionary
divergence between two homologous sequences. More precisely, it
is the number of residue substitutions that have occurred between two
sequences since they diverged from their common ancestor, expressed as
either a proportion or a percentage. Evolutionary distance can be
estimated in a number of ways, the simplest being uncorrected distance.
Uncorrected distance
The simplest estimate of distance is obtained by counting the number of
base mismatches m between the two aligned sequences and expressing this
value as a proportion, or percentage, of the total alignment length
n.
D = m/n (1)
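As an illustration, equation (1) can be sketched in a few lines of Python. The function name and the example sequences are hypothetical, and the sketch assumes the two sequences have already been aligned to the same length. Any gap character is compared like an ordinary residue, which matches the behaviour of equation (1) as it stands.

```python
def uncorrected_distance(seq_a: str, seq_b: str) -> float:
    """Return D = m / n for two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    n = len(seq_a)  # total alignment length
    if n == 0:
        raise ValueError("alignment is empty")
    # m: number of columns where the two symbols differ. A gap character is
    # compared like any other symbol, so residue-versus-gap is a mismatch.
    m = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return m / n


# Two mismatching columns out of eight.
print(uncorrected_distance("ACGTACGT", "ACGTTCGA"))
```

Multiplying the result by 100 gives the same distance as a percentage.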
However, things get more complicated when gaps are introduced into the
alignment; as it stands, equation (1) treats gaps in the same way as
residues. Gaps represent insertions and/or deletions within one
or both of the sequences. Should such events be given equal
weight as base substitutions? Should insertions and deletions
have a different weighting to substitutions? Or should they be
ignored completely, and only base substitutions be
considered? There is no single right answer to these questions, and
different approaches have been followed at different times. The
following equation allows gaps to be treated in different ways,
depending on the value of the gap penalty variable.
D = m/([n - g] + [g * penalty]) (2)
where g is the total number of gaps within the alignment's consensus
sequence and penalty is the gap penalty. With a penalty of 1 the
denominator reduces to n, as in equation (1); with a penalty of 0,
gapped positions are excluded from the alignment length altogether.
Further sophistication may be introduced into equation (2), for
example by scoring partial matches as a fraction of 1. Nevertheless,
equation (2) is still likely to underestimate true evolutionary
distance, since it does not take into account the possibility of
multiple substitutions occurring at the same residue position.
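Equation (2) can be sketched in Python as follows. This is only one possible reading of the formula, resting on assumptions that are not stated in the text: the gap symbol is taken to be "-", g is counted as the number of alignment columns containing a gap in either sequence, and m counts only residue-versus-residue mismatches. The function name and example sequences are hypothetical.

```python
GAP = "-"  # assumed gap symbol; not specified in the text


def gap_weighted_distance(seq_a: str, seq_b: str, penalty: float = 0.0) -> float:
    """Return D = m / ((n - g) + g * penalty) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    n = len(seq_a)  # total alignment length
    m = 0           # residue-versus-residue mismatches (assumption)
    g = 0           # columns containing a gap in either sequence (assumption)
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            g += 1
        elif a != b:
            m += 1
    denominator = (n - g) + g * penalty
    if denominator <= 0:
        raise ValueError("no positions left to compare")
    return m / denominator


aligned_a = "ACGT-CGT"
aligned_b = "ACGTTCGA"
# penalty = 0 ignores the gapped column; penalty = 1 counts it in full.
for penalty in (0.0, 0.5, 1.0):
    print(penalty, gap_weighted_distance(aligned_a, aligned_b, penalty))
```

In this example there is one mismatch and one gapped column in an alignment of length eight, so the denominator ranges from 7 (penalty 0) to 8 (penalty 1).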