Bioinformatics toolkit
www.cardiff.ac.uk/biosi/research/biosoft/

Mallard


Mallard (Fig. 1) is a program for identifying anomalous 16S rRNA gene sequences within multiple sequence alignments.  It can be used to screen clone libraries, or any equivalent collection of 16S rRNA gene sequences, containing up to 1,000 sequences.  Mallard is released under the terms of the GNU General Public Licence and is freely available from www.cardiff.ac.uk/biosi/research/biosoft/.
Figure 1. Mallard screenshot (running on MS Windows XP).
The nature of DNA sequencing is such that artifacts are inevitably introduced; for 16S rDNA, the common errors are chimera formation, poor sequencing, and poor sequence assembly.  The public repositories are contaminated with such unreliable sequences, and any newly generated sequence is also potentially anomalous.

Researchers need to be as confident as possible that the sequences they are handling are reliable.  Undetected, anomalies can lead to misleading phylogenetic reconstruction, give a false impression of species diversity, and confuse attempts at species classification and identification.

Mallard has been developed as a tool for screening a number of 16S rDNA sequences for anomalies simultaneously.  It is based on the Pintail algorithm, which determines whether a query 16S rDNA sequence is likely to contain anomalies through a pairwise comparison with a suitable error-free subject sequence.  A query sequence is judged to be anomalous if the resulting Deviation from Expectation (DE) statistic exceeds a pre-determined value.  Mallard works by generating a DE value for each pairwise comparison within a supplied multiple sequence alignment and plotting these values against a simplified measure of evolutionary distance.  DE values from comparisons between error-free sequences tend to cluster together, whilst DE values associated with anomalous sequences appear as plot outliers (Fig. 2).  By identifying these outliers, the program is able to determine which sequences are anomalous.
Figure 2. A typical Mallard plot.  In this example a multiple alignment of Verrucomicrobia 16S rDNA sequences is being analysed.  DE values resulting from comparisons of error-free Verrucomicrobia sequences cluster together relatively close to the x-axis and fall below the red dotted cut-off line.  In contrast, comparisons involving anomalous sequences produce outlier data points above the cut-off line.  From these outliers the anomalous sequences responsible can be determined.
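The step from plot outliers to anomalous sequences can be illustrated with a minimal Python sketch.  This is not Mallard's own code.  It assumes that the pairwise DE values have already been computed, that the cut-off is a single constant rather than a line that may vary with evolutionary distance, and that a sequence is flagged when more than half of its comparisons are outliers.  The function name flag_anomalous, the min_fraction parameter, and the toy data are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def flag_anomalous(sequence_ids, de_values, cutoff, min_fraction=0.5):
    """Return the IDs of sequences involved in a high share of outlier comparisons.

    de_values maps a pair (id_a, id_b) to its DE value.  Computing DE itself
    (the Pintail algorithm) is not shown; these values are assumed inputs.
    The constant cutoff and the min_fraction rule are illustrative assumptions.
    """
    outlier_counts = Counter()
    for id_a, id_b in combinations(sequence_ids, 2):
        # Accept the pair stored in either order.
        de = de_values.get((id_a, id_b), de_values.get((id_b, id_a)))
        if de is not None and de > cutoff:
            # Both partners of an outlier comparison are suspects.
            outlier_counts[id_a] += 1
            outlier_counts[id_b] += 1

    partners = len(sequence_ids) - 1  # comparisons each sequence takes part in
    return sorted(
        seq_id
        for seq_id, count in outlier_counts.items()
        if partners > 0 and count / partners > min_fraction
    )


# Toy data: four sequences, of which "clone_D" yields high DE values throughout.
ids = ["clone_A", "clone_B", "clone_C", "clone_D"]
de = {
    ("clone_A", "clone_B"): 0.4,
    ("clone_A", "clone_C"): 0.6,
    ("clone_B", "clone_C"): 0.5,
    ("clone_A", "clone_D"): 3.1,
    ("clone_B", "clone_D"): 2.8,
    ("clone_C", "clone_D"): 3.4,
}
flagged = flag_anomalous(ids, de, cutoff=2.0)
print(flagged)
```

In this toy example every comparison involving clone_D lies above the cut-off, whereas each of the other sequences appears in only one outlier comparison out of three, so only clone_D is flagged.  Counting outlier comparisons per sequence is just one plausible way of attributing outliers to sequences.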



Dr K.E. Ashelford. © 2006, Cardiff University