Wilke:Molecular Evolution: Difference between revisions

From OpenWetWare
 

Revision as of 11:58, 10 July 2013

The Basics

Evolutionary rate (dN/dS) analyses follow a basic three-step pipeline: sequence alignment, phylogenetic inference, and finally evolutionary rate inference. When working with protein-coding sequences, always align the amino acid sequences so that codons are kept intact. Then back-translate the alignment into nucleotide data, which the final step in the pipeline requires. Phylogenies may be built from either amino acid or nucleotide data, although an amino acid tree may be slightly more accurate.
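
The back-translation step can be sketched in shell with awk. This is only a minimal illustration, not part of the recommended toolset: the function name backtranslate is hypothetical, and the sketch assumes that both files are FASTA, that sequence IDs match between them, and that each nucleotide sequence starts in frame with its protein. Every gap in the protein alignment becomes a three-character codon gap, and any trailing stop codon in the nucleotide sequence is simply left out.

```shell
# Hypothetical helper: thread unaligned coding sequences onto a protein alignment.
# Usage: backtranslate aligned_protein.fasta unaligned_nucleotide.fasta > codon_alignment.fasta
backtranslate() {
    # The nucleotide file is read first (pass 1), the protein alignment second (pass 2).
    awk '
        # Save the record collected so far into the table for the current pass.
        function store() {
            if (id == "") return
            if (pass == 1) nuc[id] = seq
            else { order[++n] = id; prot[id] = seq }
        }
        FNR == 1 { store(); id = ""; seq = ""; pass++ }
        /^>/     { store(); id = substr($1, 2); seq = ""; next }
                 { gsub(/[ \t\r]/, ""); seq = seq $0 }
        END {
            store()
            for (i = 1; i <= n; i++) {
                id = order[i]
                if (!(id in nuc)) {
                    print "no nucleotide sequence for " id > "/dev/stderr"
                    exit 1
                }
                aa = prot[id]; nt = nuc[id]; pos = 1; out = ""
                for (j = 1; j <= length(aa); j++) {
                    if (substr(aa, j, 1) == "-") {
                        out = out "---"              # alignment gap becomes a codon gap
                    } else {
                        out = out substr(nt, pos, 3) # next codon for this residue
                        pos += 3
                    }
                }
                print ">" id
                print out
            }
        }
    ' "$2" "$1"
}
```

Sequences are written in the order of the protein alignment, one line per sequence.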

Note also that at least 10 sequences are recommended to achieve well-supported results in an evolutionary rate analysis.
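
A quick way to check the sequence count before starting is to count the header lines in the input. The sketch below assumes the input is a FASTA file; the function names are hypothetical, and the default threshold of 10 simply mirrors the recommendation above.

```shell
# Hypothetical helper: count FASTA records by counting header lines.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Hypothetical helper: succeed if the file holds at least N sequences (default 10).
# Usage: has_enough_sequences infile [N]
has_enough_sequences() {
    [ "$(count_fasta_records "$1")" -ge "${2:-10}" ]
}

# Example use:
#   has_enough_sequences infile || echo "fewer than 10 sequences" >&2
```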


Sequence Alignment

Commonly used aligners include mafft, muscle, and prank. While prank is probably the most accurate, it is also very time-consuming. We recommend using mafft for sequence alignments.

To align sequences in mafft, we additionally recommend using the "--auto" option, which lets mafft choose an appropriate alignment strategy based on the size of your data set. Mafft accepts a variety of file formats, including fasta and phylip (sequential and/or interleaved).

The infile should contain unaligned amino acid sequences. Mafft writes the aligned sequences to standard output, which the command below redirects to the outfile name provided.

mafft --auto infile > outfile
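
After the alignment finishes, it can be worth confirming that the outfile really is an alignment, meaning that every sequence has the same length once gaps are included. The following is a minimal sketch of such a check, assuming FASTA output; the function name alignment_is_rectangular is hypothetical and is not part of mafft.

```shell
# Hypothetical helper: succeed only if every record in a FASTA file has the same length.
# Usage: alignment_is_rectangular outfile
alignment_is_rectangular() {
    awk '
        BEGIN { width = -1 }
        # Compare the record just finished against the first record seen.
        function check() {
            if (!have) return
            if (width == -1) width = len
            else if (len != width) bad = 1
        }
        /^>/ { check(); have = 1; len = 0; next }
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { check(); exit (bad || !have) ? 1 : 0 }
    ' "$1"
}

# Example use:
#   alignment_is_rectangular outfile || echo "outfile is not a valid alignment" >&2
```

An empty file or a file without any header lines is also reported as a failure.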