
More formally:
- Input $S_q$, the query sequence to be checked for anomalies.
- Input $S_s$, the subject sequence: a reliable, error-free sequence*.
- Globally align $S_q$ with $S_s$ to generate the alignment $S_{qs}$.
- Move a sampling window of size $w$ along $S_{qs}$, $b$ bases at a time. At each position $i$, determine the percentage of mismatched bases $o_i$ within window $w_i$, for $1 \le i \le n$, where $n$ is the total number of windows.
- $O_{qs} = \{o_1, o_2, \ldots, o_n\}$ is the set of observed percentage differences detected between $S_q$ and $S_s$. The corresponding expected percentage differences $E_{qs} = \{e_1, e_2, \ldots, e_n\}$ are calculated from the mean of $O_{qs}$.
- Subtracting $e_i$ from $o_i$ at each position $i$ yields a series of deviations. The standard deviation of that series quantifies the overall deviation of $O_{qs}$ from $E_{qs}$. This is the Deviation from Expectation (DE) statistic.
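The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original implementation. The function names are hypothetical. The alignment step uses a plain Needleman-Wunsch algorithm with arbitrary example scores (match +1, mismatch -1, gap -2). A gap opposite a base is counted as a mismatch, and a trailing window shorter than $w$ is discarded. Every $e_i$ is simply set to the mean of $O_{qs}$, because the text does not say how that mean is turned into per-window expectations. DE is computed as a population standard deviation.

```python
import statistics


def global_align(query, subject, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns the two aligned strings, padded with '-' so they have equal length.
    The scoring values are illustrative assumptions, not taken from the text.
    """
    n, m = len(query), len(subject)

    def pair(a, b):
        return match if a == b else mismatch

    # score[i][j] = best score for aligning query[:i] with subject[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(query[i - 1], subject[j - 1]),
                score[i - 1][j] + gap,  # base in query, gap in subject
                score[i][j - 1] + gap,  # gap in query, base in subject
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    aligned_q, aligned_s = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(query[i - 1], subject[j - 1]):
            aligned_q.append(query[i - 1])
            aligned_s.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            aligned_q.append(query[i - 1])
            aligned_s.append("-")
            i -= 1
        else:
            aligned_q.append("-")
            aligned_s.append(subject[j - 1])
            j -= 1
    return "".join(reversed(aligned_q)), "".join(reversed(aligned_s))


def window_differences(aligned_q, aligned_s, w, b):
    """Observed percentages o_i: mismatched columns per window of w columns, stepping b."""
    if w <= 0 or b <= 0:
        raise ValueError("window size and step must be positive")
    observed = []
    # Assumption: a final window that would run past the alignment end is dropped.
    for start in range(0, len(aligned_q) - w + 1, b):
        columns = zip(aligned_q[start:start + w], aligned_s[start:start + w])
        mismatches = sum(1 for x, y in columns if x != y)  # gap vs base counts as a mismatch
        observed.append(100.0 * mismatches / w)
    return observed


def deviation_from_expectation(observed):
    """DE: standard deviation of o_i - e_i, with e_i taken as the mean of O (assumption)."""
    if not observed:
        raise ValueError("need at least one window")
    mean_o = statistics.fmean(observed)
    expected = [mean_o] * len(observed)
    deviations = [o - e for o, e in zip(observed, expected)]
    return statistics.pstdev(deviations)  # population SD (assumption)


if __name__ == "__main__":
    aq, as_ = global_align("ACGTACGTAC", "ACGTTCGTAC")
    o = window_differences(aq, as_, w=5, b=5)  # [20.0, 0.0]
    print(deviation_from_expectation(o))       # prints 10.0
```

Because subtracting the same constant from every $o_i$ does not change their spread, this simplified sketch makes DE equal to the standard deviation of $O_{qs}$ itself. The statistic carries extra information only when the expected values $e_i$ vary from window to window.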
*Note that the algorithm's accuracy depends on the choice of subject sequence. A good subject is both error-free and as evolutionarily close to the query as possible: the greater the overall evolutionary distance between query and subject, the harder anomalies become to detect.