Bioinformatics toolkit
www.cardiff.ac.uk/biosi/research/biosoft/

Mallard: Plotting DE Values


Generated DE values are plotted against the corresponding mean percentage difference values to highlight potentially anomalous sequences within the supplied sequence data.

For example, analysing sequences derived from members of the Verrucomicrobia results in the plot shown in Fig. 1.  Each data point represents a separate pairwise comparison.  In this example, 97 sequences were analysed, resulting in 4,656 separate comparisons (97 x 96 / 2).
Figure 1. Analysis of Verrucomicrobia 16S rRNA gene sequences, as downloaded from the Ribosomal Database Project-II (RDP) website.  Each sequence has been compared with every other sequence; the resulting DE values are then plotted against mean percentage difference values.  The data points clustering close to the x-axis represent comparisons between error-free sequences, whilst the outliers represent comparisons involving likely anomalies.
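The comparison count quoted above follows from the number of unordered pairs among n sequences, n(n-1)/2. The short Python sketch below checks this for the 97 sequences in the example. The function name and the placeholder sequence identifiers are illustrative assumptions and are not part of Mallard.

```python
from itertools import combinations


def count_pairwise_comparisons(n_sequences: int) -> int:
    """Return the number of unordered pairs among n sequences: n(n-1)/2."""
    return n_sequences * (n_sequences - 1) // 2


# Placeholder identifiers standing in for the 97 sequences (hypothetical names).
sequence_ids = [f"seq{i}" for i in range(1, 98)]

# Enumerating every unordered pair explicitly gives the same count.
pairs = list(combinations(sequence_ids, 2))

print(count_pairwise_comparisons(97))  # 4656
print(len(pairs))                      # 4656
```

Because the count grows quadratically, doubling the number of sequences roughly quadruples the number of data points on the plot.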
Most of the data points in Fig. 1 cluster relatively close to the x-axis and reflect comparisons between error-free sequences.  However, there are also some unusually high values, indicating that some comparisons involve anomalous sequences.  To clarify which data points should be treated as outliers, a suitable cut-off line is superimposed over the data (Fig. 2).  Data points above the cut-off line are identified as outliers, and the sequences responsible for them can then be identified.
Figure 2. A cut-off line is superimposed over the data points to identify which values should be treated as outliers.  Data points below the cut-off line are judged to result from comparisons between error-free sequences.  Data points above the line signify comparisons involving anomalous sequences.
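The outlier step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Mallard's implementation: the straight-line cut-off, its intercept and slope, the record layout, and all data values are invented for the example. Tallying how often each sequence appears in an outlying comparison is one plausible way to trace outliers back to the sequences responsible.

```python
from collections import Counter
from typing import NamedTuple


class Comparison(NamedTuple):
    seq_a: str
    seq_b: str
    mean_pct_diff: float  # x-axis value
    de: float             # y-axis value


def cutoff(mean_pct_diff: float, intercept: float = 1.0, slope: float = 0.1) -> float:
    """Hypothetical straight cut-off line; the real cut-off may have another form."""
    return intercept + slope * mean_pct_diff


def find_outliers(comparisons):
    """Keep only comparisons whose DE value lies above the cut-off line."""
    return [c for c in comparisons if c.de > cutoff(c.mean_pct_diff)]


def count_outliers_per_sequence(outliers):
    """Count how many outlying comparisons each sequence takes part in."""
    counts = Counter()
    for c in outliers:
        counts[c.seq_a] += 1
        counts[c.seq_b] += 1
    return counts


# Made-up values for illustration only: sequence "X" plays the anomaly.
comparisons = [
    Comparison("A", "B", 5.0, 0.8),
    Comparison("A", "C", 10.0, 1.5),
    Comparison("A", "X", 8.0, 6.2),
    Comparison("B", "C", 12.0, 1.9),
    Comparison("B", "X", 9.0, 5.7),
    Comparison("C", "X", 11.0, 7.1),
]

outliers = find_outliers(comparisons)
counts = count_outliers_per_sequence(outliers)
print(counts.most_common(1))
```

In this toy data set every comparison involving "X" lies above the line, so "X" accumulates the highest tally, while each error-free sequence appears in only one outlying comparison.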

Note:

Right-clicking the graph image opens a pop-up menu that allows the user to view the raw data, print the plot, or save the plot image.



Dr K.E. Ashelford. © 2006, Cardiff University