User:Jarle Pahr/Phylogenetics
Much of the content on this page is based on Chapters 6 and 7 in Essential Bioinformatics by Jin Xiong.
Concepts/glossary
- Bootstrapping: A statistical resampling technique that assesses the sampling error of a phylogenetic tree by repeatedly resampling alignment columns with replacement and rebuilding the tree.
- Homoplasy: The obscuring of evolutionary distance which occurs because of several consecutive mutations at the same nucleotide positions.
- Among-site variation/among-site heterogeneity: Differences in evolutionary rates among nucleotide/amino acid positions. Generally, a portion of sites are variant and the rest are invariant. The distribution of rates among the variant sites is usually modelled with a gamma distribution.
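The resampling step behind the bootstrapping entry above can be sketched as follows. This is only an illustration: the function name bootstrap_alignment and the toy alignment are made up for this page, and the tree-building step that would be run on each replicate is left out.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: list of equal-length sequence strings (hypothetical input format).
    rng: a random.Random instance, so that results are reproducible.
    """
    length = len(alignment[0])
    # Draw as many column indices as the alignment has columns, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    # Rebuild every sequence from the same resampled columns.
    return ["".join(seq[i] for i in columns) for seq in alignment]


alignment = ["ACGTAC", "ACGTTC", "AGGTAC"]
replicate = bootstrap_alignment(alignment, random.Random(0))
print(replicate)
```

In a full analysis this would be repeated many times, a tree would be estimated from each replicate, and the fraction of replicates supporting each branch would be reported.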
Newick format: A tree representation format using linear nested parentheses. Taxa are separated by commas. For scaled trees, branch lengths are indicated after a colon immediately following the taxon name. Example:
(((B,C),A),(D,E))
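To show how the nesting maps onto a tree structure, here is a minimal sketch of a parser for unscaled Newick strings like the one above. The function names parse_newick and leaves are made up for this page; the sketch does not handle branch lengths, internal node labels or malformed input.

```python
def parse_newick(text):
    """Parse an unscaled Newick string into nested tuples of taxon names."""
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":
                pos += 1  # skip ","
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # skip ")"
            return tuple(children)
        # A leaf: read characters up to the next delimiter.
        start = pos
        while pos < len(text) and text[pos] not in "(),;":
            pos += 1
        return text[start:pos]

    return parse_node()


def leaves(node):
    """Return the taxon names of a parsed tree in left-to-right order."""
    if isinstance(node, str):
        return [node]
    result = []
    for child in node:
        result.extend(leaves(child))
    return result


tree = parse_newick("(((B,C),A),(D,E))")
print(tree)
print(leaves(tree))
```

Each pair of parentheses becomes one tuple, so B and C end up as siblings, joined first with A and then with the (D,E) pair.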
Phylogenetic markers
- 16S rRNA
- RpoB
- GyrB
- EF-Tu
- pgk
- dnaK
- 16S–23S ITS
Software
MEGA 5
MrBayes
DIVERGE: http://www.ncbi.nlm.nih.gov/pubmed/11934757
Substitution models
Statistical models used to correct for homoplasy are called substitution models or evolutionary models.
Jukes-Cantor model:
Assumes that all nucleotides are substituted with equal probability (unrealistic).
- Can only handle reasonably closely related sequences.
Formula:
d_AB = -(3/4) ln [1-(4/3)p_AB]
d_AB: Evolutionary distance between sequences A and B. p_AB: Observed sequence distance, measured by proportion of substitutions over the entire length of the alignment.
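The Jukes-Cantor formula above can be tried out with a short sketch. The function name and the toy sequences are made up for this page; the code assumes two pre-aligned sequences of equal length and makes no attempt to handle gaps or ambiguous bases.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance d_AB between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    # p_AB: proportion of differing positions over the whole alignment.
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    p_ab = mismatches / len(seq_a)
    # The logarithm is undefined once p_AB reaches 3/4 (sequences too divergent).
    if p_ab >= 0.75:
        raise ValueError("p_AB >= 0.75: the Jukes-Cantor distance is undefined")
    return -0.75 * math.log(1 - (4 / 3) * p_ab)


d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTTT")
print(round(d, 4))
```

With 2 differences in 10 positions (p_AB = 0.2), the corrected distance comes out somewhat larger than 0.2, reflecting the estimated hidden substitutions.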
Formula corrected for among-site variation:
d_AB = (3/4) alpha [(1 - (4/3) p_AB)^(-1/alpha) - 1]
(The formula is printed incompletely in Xiong's book; the form given here is the standard gamma-corrected Jukes-Cantor distance.)
alpha: The gamma correction factor.
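A minimal sketch of the gamma-corrected distance, assuming the standard form d_AB = (3/4) alpha [(1 - (4/3) p_AB)^(-1/alpha) - 1]. The function name is made up for this page, and alpha has to be supplied by the user (it is normally estimated from the data).

```python
import math


def jukes_cantor_gamma_distance(p_ab, alpha):
    """Gamma-corrected Jukes-Cantor distance (assumed standard form)."""
    if not 0 <= p_ab < 0.75:
        raise ValueError("p_AB must lie in [0, 0.75)")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return 0.75 * alpha * ((1 - (4 / 3) * p_ab) ** (-1 / alpha) - 1)


p_ab = 0.2
uncorrected = -0.75 * math.log(1 - (4 / 3) * p_ab)
print(round(uncorrected, 4))
# A small alpha means strong rate variation among sites and a larger distance.
print(round(jukes_cantor_gamma_distance(p_ab, 1.0), 4))
# A very large alpha approaches the plain Jukes-Cantor distance.
print(round(jukes_cantor_gamma_distance(p_ab, 1e6), 4))
```

The large-alpha check is a quick way to see that the gamma form is consistent with the uncorrected formula.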
Kimura model:
Mutation rates for transitions and transversions are assumed to be different (more realistic than Jukes-Cantor).
Formula:
d_AB = -(1/2) ln(1- 2 p_ti - p_tv) - (1/4) ln (1-2 p_tv)
p_ti: Observed frequency for transition. p_tv: Observed frequency for transversion.
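The Kimura formula above needs the observed differences split into transitions (purine to purine or pyrimidine to pyrimidine) and transversions. A minimal sketch, with made-up function and variable names, assuming aligned uppercase sequences containing only A, C, G and T:

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned nucleotide sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, non-empty and of equal length")
    transitions = transversions = 0
    for a, b in zip(seq_a, seq_b):
        if a == b:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p_ti = transitions / len(seq_a)
    p_tv = transversions / len(seq_a)
    term_1 = 1 - 2 * p_ti - p_tv
    term_2 = 1 - 2 * p_tv
    if term_1 <= 0 or term_2 <= 0:
        raise ValueError("sequences too divergent: the distance is undefined")
    return -0.5 * math.log(term_1) - 0.25 * math.log(term_2)


# One transition (A/G at the first position) and one transversion (C/A at the last).
d = kimura_distance("ACGTACGTAC", "GCGTACGTAA")
print(round(d, 4))
```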
Formula adjusted for among-site variation:
d_AB = (alpha/2)[(1 - 2 p_ti - p_tv)^(-1/alpha) + (1/2)(1 - 2 p_tv)^(-1/alpha) - 3/2]
alpha: The gamma correction factor.
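A minimal sketch of the gamma-adjusted Kimura distance, assuming the form d_AB = (alpha/2)[(1 - 2 p_ti - p_tv)^(-1/alpha) + (1/2)(1 - 2 p_tv)^(-1/alpha) - 3/2], which is the form that reduces to the unadjusted Kimura distance as alpha grows large. The function name is made up for this page.

```python
import math


def kimura_gamma_distance(p_ti, p_tv, alpha):
    """Gamma-adjusted Kimura two-parameter distance (assumed form, see text)."""
    term_1 = 1 - 2 * p_ti - p_tv
    term_2 = 1 - 2 * p_tv
    if term_1 <= 0 or term_2 <= 0:
        raise ValueError("frequencies too high: the distance is undefined")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    return (alpha / 2) * (
        term_1 ** (-1 / alpha) + 0.5 * term_2 ** (-1 / alpha) - 1.5
    )


p_ti, p_tv = 0.1, 0.1
unadjusted = -0.5 * math.log(1 - 2 * p_ti - p_tv) - 0.25 * math.log(1 - 2 * p_tv)
print(round(unadjusted, 4))
print(round(kimura_gamma_distance(p_ti, p_tv, 1.0), 4))
# With a very large alpha the adjusted value approaches the unadjusted one.
print(round(kimura_gamma_distance(p_ti, p_tv, 1e6), 4))
```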
Kimura model for protein distance:
d = -ln(1- p -0.2p^2)
p: Observed pairwise distance between two sequences.
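The protein formula can be evaluated directly from the observed distance p. A minimal sketch with a made-up function name; note that the logarithm's argument becomes non-positive when p is roughly 0.85 or higher, so the distance is undefined for very divergent sequences.

```python
import math


def kimura_protein_distance(p):
    """Kimura protein distance from the observed proportion of differences p."""
    argument = 1 - p - 0.2 * p * p
    if argument <= 0:
        raise ValueError("p is too large: the distance is undefined")
    return -math.log(argument)


for p in (0.1, 0.3, 0.5):
    print(p, round(kimura_protein_distance(p), 4))
```

The corrected distance is always at least as large as p, and the gap widens as the sequences diverge.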
More advanced models: TN93, HKY, GTR. These take more parameters into consideration, but are not normally used in practice (complex calculation, high variance).
Tree estimation methods
Clustering-based methods:
Unweighted Pair Group Method Using Arithmetic Average (UPGMA):
- The simplest clustering method.
- Basic assumption: All taxa evolve at a constant rate and are equally distant from the root ("molecular clock" hypothesis). Unlikely to hold for real data.
- Fast calculation speed.
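The UPGMA procedure described above can be sketched in a few lines: repeatedly join the two closest clusters and set the distance from the new cluster to every other cluster to the size-weighted average of the old distances. The function name, the input format (a dict keyed by frozenset pairs) and the toy distances are made up for this page, and the sketch returns only the topology as a Newick string, without branch lengths.

```python
def upgma(names, distances):
    """Cluster taxa with UPGMA and return the topology as a Newick string.

    names: list of taxon labels.
    distances: dict mapping frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa in it
    dist = dict(distances)
    while len(sizes) > 1:
        # Pick the closest pair of clusters.
        pair = min(dist, key=lambda key: dist[key])
        a, b = sorted(pair)
        merged = f"({a},{b})"
        # Distance to every other cluster: average weighted by cluster size.
        for other in sizes:
            if other in (a, b):
                continue
            dist[frozenset((merged, other))] = (
                sizes[a] * dist[frozenset((a, other))]
                + sizes[b] * dist[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        # Drop all distances that involve the two clusters just merged.
        dist = {key: value for key, value in dist.items()
                if a not in key and b not in key}
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes)) + ";"


names = ["A", "B", "C", "D"]
distances = {
    frozenset(("A", "B")): 2.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("A", "D")): 6.0,
    frozenset(("B", "C")): 4.0,
    frozenset(("B", "D")): 6.0,
    frozenset(("C", "D")): 6.0,
}
print(upgma(names, distances))
```

The toy distances are clock-like (all taxa equally distant from the root), which is the situation UPGMA assumes; with real data that violates the molecular clock, the recovered topology may be wrong.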
Neighbour Joining (NJ)
- The most widely used tree estimation method.
http://www.ncbi.nlm.nih.gov/pubmed/3447015
http://en.wikipedia.org/wiki/Neighbor_joining
Optimality-based methods:
Fitch-Margoliash (FM)
Minimum Evolution (ME)
Character-based methods:
Maximum Parsimony (MP)
- One of the first methods applied to phylogenetic tree construction.
Maximum Likelihood (ML)
Bayesian Inference (BI)
Tree representation
- Phylogram (scaled tree): The branch lengths represent the amount of evolutionary divergence.
- Cladogram (unscaled tree): Branch lengths have no phylogenetic meaning.
Reclassifications
Examples from the literature of (proposals for) reclassifications of taxonomies:
Links
http://peter.unmack.net/molecular/index.html
http://www.kuleuven.be/aidslab/phylogenybook/home.html
http://asserttrue.blogspot.no/2013/07/do-it-yourself-phylogenetic-trees.html
Phylogeny.fr: http://www.phylogeny.fr/
Bibliography
Articles:
A daily-updated tree of (sequenced) life as a reference for genome research: http://www.nature.com/srep/2013/130618/srep02015/full/srep02015.html
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0062510
Evaluation of general 16S ribosomal RNA gene PCR primers for classical and next-generation sequencing-based diversity studies: http://nar.oxfordjournals.org/content/41/1/e1.full?sid=e66b42ac-a309-47cf-8cd1-94e1229a098e
Molecular phylogenetics: State of the art methods for looking into the past. Trends Genet. 17:262-72.
Books:
Phylogenetic Trees Made Easy - a how-to manual. Fourth edition. Barry G Hall.
Fundamentals of Molecular Evolution. Sunderland, MA: Sinauer Associates.
The Phylogenetic Handbook: http://www.amazon.com/dp/0521730716/ref=rdr_ext_tmb
Jin Xiong: Essential Bioinformatics