Bioinformatics toolkit
www.cardiff.ac.uk/biosi/research/biosoft/

Mallard: Why a Reference Sequence?


Whenever base-position-specific data, such as observed and expected percentage differences, are compared, it is important that only homologous positions are considered.  A reference sequence must therefore be included in every multiple sequence alignment, so that calculated values can be related to their correct 'absolute' positions within the 16S rRNA gene.  For the same reason, the reference sequence must represent the entire gene and must not be edited in any way.
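
As an illustration of how a reference sequence ties alignment columns to 'absolute' gene positions, the following is a minimal Python sketch. It is not taken from Pintail or Mallard; the function name and the choice of gap characters are assumptions made for the example.

```python
def column_to_reference_position(aligned_reference, gap_chars="-."):
    """Map each alignment column to a 1-based position in the reference.

    Columns where the aligned reference has a gap do not correspond to
    any reference base, so they are mapped to None.
    """
    positions = []
    base_count = 0
    for character in aligned_reference:
        if character in gap_chars:
            positions.append(None)
        else:
            base_count += 1
            positions.append(base_count)
    return positions


# Hypothetical aligned reference row: four bases and two gap columns.
mapping = column_to_reference_position("AC--GT")
print(mapping)
```

Because the mapping depends on counting every reference base, removing or trimming bases from the reference would shift all downstream positions, which is why the reference should be complete and unedited.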

Current versions of the Pintail and Mallard programs use Escherichia coli K12 U00096 as their reference sequence.  What does this mean in practice for the user?

Pintail program

The Pintail program automatically includes Escherichia coli U00096 when aligning query and subject sequences; no action needs to be taken.

Mallard program

The user must include Escherichia coli K12 U00096 in the multiple sequence alignment supplied to the program.
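
Before running Mallard, it may be useful to confirm that the reference is present in the alignment. The sketch below assumes the alignment is held as FASTA-formatted text and that the accession U00096 appears in the reference's header line; both are assumptions for illustration, and the helper names are hypothetical rather than part of Mallard.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of header -> sequence."""
    parts = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            parts[name] = []
        elif name is not None:
            parts[name].append(line)
    return {header: "".join(chunks) for header, chunks in parts.items()}


def find_reference(records, accession="U00096"):
    """Return the first header containing the accession, or None."""
    for header in records:
        if accession in header:
            return header
    return None


# Hypothetical two-sequence alignment used only to exercise the check.
example = """>query_1
AC-GT
>E.coli_K12_U00096
ACGGT
"""
records = read_fasta(example)
reference_header = find_reference(records)
if reference_header is None:
    print("Reference sequence U00096 not found in alignment")
else:
    print("Reference found:", reference_header)
```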



Dr K.E. Ashelford. © 2006, Cardiff University