Bioinformatics toolkit
www.cardiff.ac.uk/biosi/research/biosoft/

Anomaly confirmation protocol


It's good practice to check that anomalous sequences, identified as such by programs like Mallard, are unambiguously anomalous.  The following protocol is a suggested method for unambiguous identification of anomalies.
  1. Blast search the putatively anomalous sequence (hereafter referred to as the query) against all publicly available 16S rDNA sequences to find known nearest phylogenetic neighbours.  The online NCBI BlastN facility is recommended for this purpose.
  2. Choose a suitable near neighbour of the query for comparison (hereafter referred to as first subject).  An ideal* first subject will be a high (bit) scoring sequence, from a different gene library to that of the query (most easily achieved by selecting a record from a different research group to that of the query).
  3. Compare the query with the first subject using the Pintail program and assess the output for evidence of an anomaly.
  4. If an anomaly is revealed, confirm that the first subject is error-free by comparing it to a further near neighbour sequence (second subject), using Pintail.  Again ideally*, the source of the second subject should be different from that of the first subject and the query in order to arrive at an unambiguous result.  If Pintail reveals an anomaly between the first and second subject, then further investigation will be necessary to clarify the situation before returning to the original query.
  5. As a final confirmatory step, the query should be compared with second subject.
*It can be seen that, ideally, only three comparisons are necessary per query sequence to unambiguously identify an anomaly.  In practice this is not always possible, either because a lack of suitable database entries means that the only nearest neighbours available are those generated by the same research group and thus probably from the same gene library, or because the best available nearest neighbour is only distantly related to the query.  Under such circumstances, a number of comparisons may be necessary before a settled opinion can be arrived at.
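
The decision logic of steps 3 to 5 can be sketched as a small Python function. This is only an illustration, not part of Pintail: the function name, its arguments and the wording of the verdicts are hypothetical, and each argument simply records whether the analyst judged that a given Pintail comparison showed an anomaly. The final "disagree" case is not covered by the protocol and is an assumption made here.

```python
from typing import Optional


def classify_query(query_vs_first: bool,
                   first_vs_second: Optional[bool] = None,
                   query_vs_second: Optional[bool] = None) -> str:
    """Summarise the protocol outcome from up to three Pintail comparisons.

    True means an anomaly was seen in that comparison, False means none
    was seen, and None means the comparison has not been run yet.
    """
    # Step 3: query against first subject.
    if not query_vs_first:
        return "no anomaly detected against first subject"
    # Step 4: check that the first subject is itself error-free.
    if first_vs_second is None:
        return "incomplete: compare first subject with second subject"
    if first_vs_second:
        return "unresolved: first subject may be anomalous, investigate further"
    # Step 5: final confirmation of the query against the second subject.
    if query_vs_second is None:
        return "incomplete: compare query with second subject"
    if query_vs_second:
        return "query confirmed anomalous"
    # Assumption: the protocol does not say what to do if step 5 disagrees
    # with step 3, so this sketch reports the case as unresolved.
    return "unresolved: comparisons disagree, try further subjects"


print(classify_query(True, False, True))
```

With the ideal three comparisons (anomaly, no anomaly, anomaly) the function reports the query as confirmed anomalous; any other combination signals that more comparisons are needed.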

An example

AY354817 has been identified as a possible chimera.  To show unambiguously that the sequence is anomalous, the following steps are carried out:

An NCBI BlastN search with AY354817 identifies the following public records as nearest neighbours (Fig. 1).
Figure 1. First 10 nearest neighbours to AY354817 within the public repositories, as identified by an NCBI BlastN search.  Note that the first record listed is AY354817 itself.

Sequences producing significant alignments:                  Score (bits)  E value

gi|34100240|gb|AY354817.1| Uncultured alpha proteobacterium c... 2008 0.0
gi|34100237|gb|AY354814.1| Uncultured alpha proteobacterium c... 1170 0.0
gi|34100236|gb|AY354813.1| Uncultured alpha proteobacterium c... 1170 0.0
gi|34100246|gb|AY354823.1| Uncultured alpha proteobacterium c... 1152 0.0
gi|51492496|gb|AY697909.1| Uncultured Rhodobacteraceae bacter... 1148 0.0
gi|2707380|gb|U70680.1| Unidentified alpha proteobacterium OM... 1148 0.0
gi|34100242|gb|AY354819.1| Uncultured alpha proteobacterium c... 1128 0.0
gi|56069465|gb|AY794179.1| Uncultured alpha proteobacterium c... 1118 0.0
gi|62959000|gb|DQ009313.1| Uncultured marine bacterium clone ... 1118 0.0
gi|62958986|gb|DQ009299.1| Uncultured marine bacterium clone ... 1118 0.0
The highest scoring record (after AY354817 itself) is AY354814.  However, this record has been generated by the same research group and hence is likely to be from the same gene library.  AY697909, in contrast, is from a different research group and so this is selected as first subject.
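
The selection just described can be sketched in Python using the hit lines of Figure 1. This is a rough illustration only. BlastN output does not say which research group submitted a record, so the sketch relies on a hypothetical heuristic: records whose accession shares its first six characters with the query (here "AY3548") are assumed to come from the same submission batch, and hence the same gene library. In practice the submitter details in each GenBank record should be checked by hand. All function names are made up for this example.

```python
import re

# Hit lines copied from Figure 1.
HIT_LINES = """\
gi|34100240|gb|AY354817.1| Uncultured alpha proteobacterium c... 2008 0.0
gi|34100237|gb|AY354814.1| Uncultured alpha proteobacterium c... 1170 0.0
gi|34100236|gb|AY354813.1| Uncultured alpha proteobacterium c... 1170 0.0
gi|34100246|gb|AY354823.1| Uncultured alpha proteobacterium c... 1152 0.0
gi|51492496|gb|AY697909.1| Uncultured Rhodobacteraceae bacter... 1148 0.0
gi|2707380|gb|U70680.1| Unidentified alpha proteobacterium OM... 1148 0.0
gi|34100242|gb|AY354819.1| Uncultured alpha proteobacterium c... 1128 0.0
gi|56069465|gb|AY794179.1| Uncultured alpha proteobacterium c... 1118 0.0
gi|62959000|gb|DQ009313.1| Uncultured marine bacterium clone ... 1118 0.0
gi|62958986|gb|DQ009299.1| Uncultured marine bacterium clone ... 1118 0.0
"""

# Accession, then free-text description, then bit score and E value.
HIT_PATTERN = re.compile(
    r"^gi\|\d+\|gb\|(?P<acc>[A-Z]+\d+)\.\d+\|.*?\s(?P<bits>\d+)\s+(?P<evalue>\S+)\s*$"
)


def parse_hits(text):
    """Return (accession, bit score) pairs in the order they are listed."""
    hits = []
    for line in text.splitlines():
        match = HIT_PATTERN.match(line.strip())
        if match:
            hits.append((match.group("acc"), int(match.group("bits"))))
    return hits


def same_accession_block(query, accession, prefix_length=6):
    """Hypothetical proxy for 'same gene library': shared accession prefix."""
    return accession[:prefix_length] == query[:prefix_length]


def pick_first_subject(hits, query, same_library=same_accession_block):
    """Pick the highest-scoring hit not judged to share the query's library."""
    candidates = [
        (acc, bits) for acc, bits in hits
        if acc != query and not same_library(query, acc)
    ]
    if not candidates:
        return None
    # max() keeps the first of any tied scores, i.e. the earlier-listed hit.
    return max(candidates, key=lambda hit: hit[1])[0]


hits = parse_hits(HIT_LINES)
print(pick_first_subject(hits, "AY354817"))  # prints AY697909
```

Note that U70680 has the same bit score as AY697909; the sketch resolves the tie by list order, which happens to match the choice made in the text.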

Comparing AY354817 with AY697909, using Pintail, confirms the presence of a chimera-like anomaly (Fig. 2).
Figure 2. Variation in % difference between AY354817 (1013 nt) and AY697909 (1425 nt), determined with a 300 base window, moving 25 bases at a time along the sequences' length.
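
The windowing named in the figure captions (a 300 base window moved 25 bases at a time) can be illustrated with a short Python sketch. It is not Pintail's implementation, which is not described here. It assumes the two sequences have already been aligned to equal length, and it counts any differing character, gaps included, as a mismatch. The function name and the toy sequences are invented for the example.

```python
def windowed_percent_difference(seq_a, seq_b, window=300, step=25):
    """Return (window start, % difference) pairs along two aligned sequences.

    Assumes seq_a and seq_b are pre-aligned strings of equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    # Slide the window along the alignment, keeping only full windows.
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        mismatches = sum(1 for a, b in pairs if a != b)
        profile.append((start, 100.0 * mismatches / window))
    return profile


# Toy data: identical for the first 300 positions, different thereafter.
toy_a = "A" * 600
toy_b = "A" * 300 + "C" * 300
for start, percent in windowed_percent_difference(toy_a, toy_b):
    print(start, percent)
```

For the toy sequences the % difference rises steadily from 0 in the first window to 100 in the last, a crude version of the kind of shift along the sequence that the plots are inspected for.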

To establish that AY697909 is not the chimera, this record is compared with another near neighbour, AY794179.  No anomaly is detected (Fig. 3).
Figure 3. Variation in % difference between AY697909 (1425 nt) and AY794179 (2971 nt), determined with a 300 base window, moving 25 bases at a time along the sequences' length.

As a final confirmatory step AY354817 is also compared with AY794179. Once more a chimeric profile is seen (Fig. 4).
Figure 4. Variation in % difference between AY354817 (1013 nt) and AY794179 (2971 nt), determined with a 300 base window, moving 25 bases at a time along the sequences' length.

In conclusion, AY354817 is unambiguously shown to be anomalous, and the Pintail plots in Figures 2 and 4 strongly suggest a chimera.  However, to confirm that an observed anomaly is unambiguously chimeric, further steps need to be taken.



Dr K.E. Ashelford. © 2006, Cardiff University