20.181/Lecture3

From OpenWetWare

Revision as of 08:38, 13 September 2006 by SoniaT



Phylogenetic trees

Input: a multiple sequence alignment, for example:

AATGC
TATGC
GGTGG
ACTCG

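Output: a tree, an abstract representation of the same data, written in nested-parenthesis notation, e.g. ((1,4),(2,3)), where the numbers refer to the sequences in the order listed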
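As a sketch, the input alignment can be held as a mapping from sequence number to sequence (the name `ALIGNMENT` and the helper `columns` are hypothetical). Tree-scoring methods typically compare the sequences one column at a time, so the helper checks that all rows have the same length, as an alignment requires, and returns the columns.

```python
# Hypothetical name: the alignment above, keyed by sequence number.
ALIGNMENT = {1: "AATGC", 2: "TATGC", 3: "GGTGG", 4: "ACTCG"}


def columns(alignment):
    """Return the alignment column by column; all rows must be the same length."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("not an alignment: sequences differ in length")
    # zip(*rows) walks the rows in parallel, yielding one tuple per column
    return ["".join(col) for col in zip(*alignment.values())]


print(columns(ALIGNMENT))  # ['ATGA', 'AAGC', 'TTTT', 'GGGC', 'CCGG']
```

A constant column such as 'TTTT' is identical in every sequence, so under a parsimony-style score it cannot favor one tree over another.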

Overview of Approach

(in pseudocode)

for each possible tree:
    calculate the score of (tree, data)
return the tree with the BEST score
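One way to make the pseudocode concrete, as a sketch under stated assumptions: the score here is Fitch parsimony (the fewest substitutions needed on the tree, so lower is better), which is our choice because the notes leave the scoring function open, and with only 4 leaves "each possible tree" is just the 3 unrooted quartets. The names `ALIGNMENT`, `fitch_column`, `parsimony_score`, `all_quartets` and `best_tree` are hypothetical. Each quartet is rooted on its central branch for the computation; the parsimony score of an unrooted tree does not depend on where it is rooted.

```python
# Sketch only: exhaustive search over the 3 unrooted trees on 4 leaves,
# scored with Fitch parsimony (an assumed scoring function; lower is better).
ALIGNMENT = {1: "AATGC", 2: "TATGC", 3: "GGTGG", 4: "ACTCG"}


def fitch_column(tree, column):
    """Return (candidate state set, substitution count) for one column."""
    if not isinstance(tree, tuple):          # leaf: the base it actually has
        return {ALIGNMENT[tree][column]}, 0
    left, right = tree                       # binary internal node
    left_set, left_cost = fitch_column(left, column)
    right_set, right_cost = fitch_column(right, column)
    shared = left_set & right_set
    if shared:                               # children can agree: no change needed
        return shared, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1  # one substitution


def parsimony_score(tree):
    """Sum the Fitch substitution counts over every alignment column."""
    length = len(next(iter(ALIGNMENT.values())))
    return sum(fitch_column(tree, i)[1] for i in range(length))


def all_quartets(leaves):
    """The 3 unrooted binary trees on 4 leaves, each rooted on its central branch."""
    first, *rest = leaves
    return [((first, partner), tuple(x for x in rest if x != partner))
            for partner in rest]


def best_tree(leaves):
    # for each possible tree: calculate the score; return the BEST (lowest here)
    return min(all_quartets(leaves), key=parsimony_score)


for tree in all_quartets([1, 2, 3, 4]):
    print(tree, parsimony_score(tree))
# ((1, 2), (3, 4)) 6
# ((1, 3), (2, 4)) 7
# ((1, 4), (2, 3)) 7
print(best_tree([1, 2, 3, 4]))  # ((1, 2), (3, 4))
```

Under this assumed score the best quartet for the example alignment is ((1,2),(3,4)), since sequences 1 and 2 differ at only one site; the tree ((1,4),(2,3)) given as the output is best read as an illustration of the notation.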

Possible trees

how many trees are there?
- how does the number of possible trees increase with the number of leaves... linearly? ...exponentially?
  - start with the simplest unrooted tree, it has three leaves
  - how many ways are there to add a fourth leaf? There are 3: attach the new leaf to the middle of any of the 3 existing branches (we don't attach it to the central internal node, because that would give that node four neighbors and we want to stick to binary trees)
  - now there are 5 places to add a leaf to a 4-leaf tree
  - every time you add a leaf, you gain 2 more places to attach the next one: the branch you attached to is split into two branches, and the new leaf brings a branch of its own
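- f_trees(n) = f_trees(n-1) * (2n-5), starting from f_trees(3) = 1, because a binary unrooted tree with n-1 leaves has 2(n-1)-3 = 2n-5 branches; unrolling the recurrence gives f_trees(n) = 1 * 3 * 5 * ... * (2n-5) = (2n-5)!!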
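- f_trees(10) = 2,027,025 (about 2*10^6) and f_trees(50) is about 2.8*10^74. The often-quoted figures 34*10^6 and 2.7*10^76 count rooted trees, (2n-3)!! = f_trees(n+1), which is why they are larger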
- Enumerating every tree is infeasible for all but small n, so we will evaluate only a small number of the possible trees. That requires a search strategy, and the best strategy depends on the topology of the search space (how the score changes as we move from one tree to a neighboring one).
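A quick way to check the recurrence, as a sketch: `count_unrooted` evaluates f_trees(n) directly, and `enumerate_unrooted` builds every tree by the leaf-adding procedure described above (attach the next leaf to the middle of each existing branch), so the two counts should agree for small n. The edge-set encoding of a tree and all function names are hypothetical choices made for this example.

```python
def count_unrooted(n: int) -> int:
    """f_trees(n) via f(n) = f(n-1) * (2n-5), with f(3) = 1."""
    count = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5
    return count


def enumerate_unrooted(n: int):
    """Every unrooted binary tree on leaves 0..n-1, built by stepwise addition.

    Hypothetical encoding: a tree is a frozenset of edges, each edge a
    frozenset of two node ids; leaves are 0..n-1, internal nodes are n and up.
    """
    center = n
    # start from the only 3-leaf tree: leaves 0, 1, 2 joined at one internal node
    trees = [frozenset(frozenset((leaf, center)) for leaf in range(3))]
    next_internal = n + 1
    for leaf in range(3, n):
        new_trees = []
        for tree in trees:
            for edge in tree:                    # one option per existing branch
                u, v = tuple(edge)
                mid = next_internal              # new node splitting branch (u, v)
                new_edges = set(tree - {edge})
                new_edges |= {frozenset((u, mid)), frozenset((mid, v)),
                              frozenset((mid, leaf))}
                new_trees.append(frozenset(new_edges))
        trees = new_trees
        next_internal += 1
    return trees


print(len(enumerate_unrooted(6)), count_unrooted(6))  # 105 105
print(count_unrooted(10))                             # 2027025
print(count_unrooted(11))                             # 34459425
print(f"{count_unrooted(50):.1e}")                    # 2.8e+74
```

Running it gives f_trees(10) = 2,027,025. The number 34,459,425 (about 34*10^6) is f_trees(11), which equals the number of rooted binary trees on 10 leaves, since rooting a tree amounts to attaching one extra leaf (the root) to one of its branches.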

Tree Data Structure

For each node, we need to store:
1. names (and sequences?)
2. pointers to its left and right children
we're going to use a built-in dictionary as our data structure
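A minimal sketch of the dictionary-based layout, written in Python (assumed here; the notes only say "a built-in dictionary"): each node name maps to a small dict holding its sequence and the names of its left and right children, so the child names play the role of pointers. The field names `seq`, `left`, `right`, the internal node names `root`, `A`, `B`, and the helper functions are hypothetical. The unrooted tree ((1,4),(2,3)) is stored with an arbitrary root between its two pairs, and internal nodes carry no sequence (`None`).

```python
# Hypothetical layout: node name -> {"seq": ..., "left": ..., "right": ...}.
# Children are referenced by name (acting as pointers); leaves have None.
tree = {
    "root": {"seq": None, "left": "A", "right": "B"},
    "A":    {"seq": None, "left": "1", "right": "4"},
    "B":    {"seq": None, "left": "2", "right": "3"},
    "1":    {"seq": "AATGC", "left": None, "right": None},
    "4":    {"seq": "ACTCG", "left": None, "right": None},
    "2":    {"seq": "TATGC", "left": None, "right": None},
    "3":    {"seq": "GGTGG", "left": None, "right": None},
}


def is_leaf(tree, name):
    """A node with no children is a leaf."""
    return tree[name]["left"] is None and tree[name]["right"] is None


def leaves(tree, name="root"):
    """Names of all leaves below `name`, from left to right."""
    if is_leaf(tree, name):
        return [name]
    node = tree[name]
    return leaves(tree, node["left"]) + leaves(tree, node["right"])


def to_parens(tree, name="root"):
    """Nested-parenthesis string for the subtree rooted at `name`."""
    if is_leaf(tree, name):
        return name
    node = tree[name]
    return "(" + to_parens(tree, node["left"]) + "," + to_parens(tree, node["right"]) + ")"


print(leaves(tree))     # ['1', '4', '2', '3']
print(to_parens(tree))  # ((1,4),(2,3))
```

Storing children by name rather than by object reference keeps the whole tree in one flat dictionary, so any node can be looked up directly by its name.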

Retrieved from "https://openwetware.org/mediawiki/index.php?title=20.181/Lecture3&oldid=70253"
