Bioinformatics toolkit
www.cardiff.ac.uk/biosi/research/biosoft/

Chimera Confirmation Protocol


A chimera is one of several anomalies that a 16S rRNA gene sequence may exhibit.  Chimeras have a characteristic profile when viewed with the Pintail program.  This is not, however, unambiguous evidence that the sequence is actually chimeric.  To confirm a chimera unambiguously, it is necessary to identify, as far as possible, the phylogenetic identity of the original DNA templates from which the chimera was formed.  To achieve this, the following steps are recommended.

For clarity, the following protocol will assume a two fragment chimera; that is, a chimeric sequence composed of two phylogenetically distinct regions only.  Three (or more) fragment chimeras can be treated in an equivalent way.
  1. Compare the putative chimera (query) with an error-free nearest neighbour sequence (subject) using the Pintail program.  A characteristic chimeric profile, with the plot reflecting two distinctly different evolutionary distances between query and subject, should be observed. 
  2. Within the Pintail program, identify and copy that region of the query that is most distantly related to the subject.
  3. Blast search the copied fragment against all publicly available 16S rDNA sequences to find its nearest phylogenetic neighbours.  The online NCBI BlastN facility is recommended for this purpose.
  4. If the original query is a chimera, the phylogeny of the nearest neighbours will be somewhat different from that of the subject.  Select a suitable sequence and, using Pintail, compare that with the original unedited query.  A chimera is confirmed if the resulting Pintail plot is roughly the mirror-image of that obtained originally.
  5. As a final step, confirm that the nearest neighbour chosen is itself an error-free sequence by comparing this with its own nearest neighbour.
How definitive the conclusion that a sequence is chimeric can be depends on (i) how phylogenetically distinct the two regions of the sequence are, and (ii) the phylogenetic range of the 16S rDNA sequences held in the public repositories.
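
The windowed comparison behind steps 1 and 4 can be illustrated with a minimal sketch.  This is not Pintail's implementation (Pintail also plots an expected percentage difference line, which is not modelled here); it only computes the observed percentage difference in a sliding window, assuming the two sequences are already aligned and of equal length.  The function name windowed_percent_difference and the toy sequences are hypothetical.

```python
def windowed_percent_difference(query, subject, window=300, step=25):
    """Return a list of (window_start, percent_difference) tuples.

    Assumes query and subject are pre-aligned strings of equal length.
    """
    if len(query) != len(subject):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window]
        s = subject[start:start + window]
        mismatches = sum(1 for a, b in zip(q, s) if a != b)
        profile.append((start, 100.0 * mismatches / window))
    return profile


# Toy data only: a 1000 base "chimera" whose 5' half matches one parent
# and whose 3' half matches another.
subject = "ACGT" * 250                           # stands in for the first subject
other_parent = "ACGC" * 125 + "ACGA" * 125       # stands in for the second subject
query = subject[:500] + other_parent[500:]       # two fragment chimera

profile = windowed_percent_difference(query, subject, window=100, step=50)
mirror = windowed_percent_difference(query, other_parent, window=100, step=50)
```

In this toy case the first profile is flat at 0% over the 5' half and rises to 25% over the 3' half, while the second profile does the reverse, which is the mirror-image behaviour described in step 4.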

An example

AY354817 has been identified as having an unambiguous anomaly which appears to be chimeric in nature.   To confirm that AY354817 is a chimera, the following steps are carried out.

Using the Pintail program, AY354817 is compared with an error-free near neighbour, AY697909 (representing an Alphaproteobacteria bacterium), and this results in a characteristic chimera-like profile (Fig. 1).
Figure 1. Variation in % difference between AY354817 (1013 nt) and AY697909 (1425 nt), determined with a 300 base window, moving 25 bases at a time along the sequences' length. Profile is characteristic of a two fragment chimera, with the break-point occurring at roughly base position 600.
Within the Pintail program, the 5' end of AY354817 is now progressively edited until only the more phylogenetically distant region, relative to AY697909, remains (Fig. 2).
Figure 2a. Initial comparison of AY354817 and AY697909, this time using a 100 base sampling window (for greater accuracy).


Figure 2b. Same comparison, but with 325 bases removed from the 5' end of AY354817.  Note how the expected percentage difference line (solid gray) has moved upwards, reflecting the increasing dominance of the 3' end fragment in determining the overall evolutionary distance between query and subject sequences.


Figure 2c.  Same comparison, but with a further 215 bases removed from the 5' end of AY354817.  Note how the expected percentage difference line now coincides with the observed percentage difference line.
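
The edits in Fig. 2b and 2c together remove 325 + 215 = 540 bases from the 5' end of the 1013 nt query, leaving 1013 - 540 = 473 bases.  A minimal sketch of the same trimming, using a placeholder string in place of the real AY354817 sequence (which is not reproduced here); the variable names are hypothetical.

```python
QUERY_LENGTH = 1013            # length of AY354817, as given in Fig. 1
REMOVED_5_PRIME = 325 + 215    # bases removed in Fig. 2b and Fig. 2c

# Placeholder of the right length; this is not the real AY354817 data.
placeholder_query = "N" * QUERY_LENGTH

# Removing bases from the 5' end corresponds to slicing off the start.
fragment = placeholder_query[REMOVED_5_PRIME:]
fragment_length = len(fragment)
```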

After editing, a 473 base fragment of AY354817 remains.  An NCBI BlastN search with this fragment identifies the following public records as nearest neighbours (Fig. 3).
Figure 3. First 10 nearest neighbours, to the 473 base fragment from AY354817, within the public repositories, as identified by an NCBI BlastN search.  Note that the first record listed is AY354817 itself.

                                                                 Score   E
Sequences producing significant alignments:                      (Bits)  Value

gi|34100240|gb|AY354817.1| Uncultured alpha proteobacterium c...   938   0.0
gi|33391956|gb|AY344418.1| Unidentified bacterium clone K2-30...   729   0.0
gi|62958803|gb|DQ009116.1| Uncultured marine bacterium clone ...   670   0.0
gi|38426800|gb|AY457135.1| Uncultured Bacteroidetes bacterium...   607   1e-170
gi|45439578|gb|AY547770.1| Uncultured bacterium clone B126-4....   583   2e-163
gi|34100296|gb|AY354873.1| Uncultured Bacteroidetes bacterium...   525   4e-146
gi|38426799|gb|AY457134.1| Uncultured Bacteroidetes bacterium...   505   4e-140
gi|83026088|gb|DQ289539.1| Uncultured Flavobacteria bacterium...   492   5e-136
gi|73672783|gb|DQ153141.1| Uncultured Bacteroidetes bacterium...   478   8e-132
gi|73672777|gb|DQ153135.1| Uncultured Bacteroidetes bacterium...   478   8e-132
Note that record annotations suggest that these records represent Bacteroidetes bacteria.  Comparing the full-length AY354817 sequence with AY344418 using Pintail results in another chimera-like profile (Fig. 4), mirroring that seen previously (Fig. 1).
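
When many fragments need checking, the hit list can be read programmatically.  The following is a minimal sketch that assumes the legacy one-line summary format shown in Fig. 3 (gi number, gb accession, truncated description, bit score, E value); the function name parse_hits and the regular expression are illustrative assumptions, not part of any NCBI tool.

```python
import re

# Matches summary lines of the form shown in Fig. 3, e.g.
#   gi|33391956|gb|AY344418.1| Unidentified bacterium clone K2-30... 729 0.0
HIT_LINE = re.compile(
    r"^gi\|\d+\|gb\|(?P<accession>[A-Z]+\d+)\.\d+\|\s+"
    r"(?P<description>.*?)\s+(?P<bits>\d+)\s+(?P<evalue>\S+)$"
)


def parse_hits(lines):
    """Return (accession, description, bit_score, e_value) for each hit line."""
    hits = []
    for line in lines:
        match = HIT_LINE.match(line.strip())
        if match:
            hits.append((match["accession"], match["description"],
                         int(match["bits"]), float(match["evalue"])))
    return hits


# Three of the lines from Fig. 3, copied verbatim.
blast_lines = [
    "gi|34100240|gb|AY354817.1| Uncultured alpha proteobacterium c... 938 0.0",
    "gi|33391956|gb|AY344418.1| Unidentified bacterium clone K2-30... 729 0.0",
    "gi|38426800|gb|AY457135.1| Uncultured Bacteroidetes bacterium... 607 1e-170",
]

hits = parse_hits(blast_lines)
# The query's own record always scores highest, so skip it when looking
# for a candidate second subject.
candidates = [hit for hit in hits if hit[0] != "AY354817"]
```

The highest-scoring remaining candidate here is AY344418, the record used for the comparison in Fig. 4.  Choosing a "suitable sequence" still requires judgement, since the top hit must itself be checked for anomalies.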
Figure 4. Variation in % difference between AY354817 (1013 nt) and AY344418 (1493 nt), determined with a 300 base window, moving 25 bases at a time along the sequences' length.

Checking AY344418 against its nearest neighbour confirms it to be error-free.  The Ribosome Database Project (RDP-II) Sequence Match facility confirms that this sequence belongs to a Bacteroidetes bacterium.

In conclusion, AY354817 is confirmed as a two-fragment chimera with the 5' end most similar to the Alphaproteobacteria sequence AY697909 and the 3' end most similar to the Bacteroidetes sequence AY344418.



Dr K.E. Ashelford. © 2006, Cardiff University