20.181/Lecture3: Difference between revisions

Revision as of 08:57, 13 September 2006

Phylogenetic trees

Input: (a multiple sequence alignment)

AATGC
TATGC
GGTGG
ACTCG

Output: a tree, an abstract representation of the same data, e.g. ((1,4),(2,3))

Overview of Approach

(in pseudocode)

for each possible tree:
    calculate the score of (tree, data)
return tree with BEST score
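The pseudocode can be made concrete with a small Python sketch. It assumes a scoring function the lecture has not fixed yet: here Fitch parsimony, which counts the fewest substitutions a tree needs to explain each alignment column, so the BEST score is the lowest one. Trees are written as nested tuples instead of the dictionaries introduced under Trees in Python, only to keep the sketch short, and candidates are generated with the leaf-by-leaf construction described under Possible trees. All function names (`insert_leaf`, `all_rooted_trees`, `fitch`, `score`, `best_tree`) are hypothetical.

```python
# Exhaustive tree search: score every rooted binary tree, keep the best one.
# Assumption: the score is Fitch parsimony (fewest substitutions); lower is better.

ALIGNMENT = {
    '1': 'AATGC',
    '2': 'TATGC',
    '3': 'GGTGG',
    '4': 'ACTCG',
}


def insert_leaf(tree, leaf):
    """Yield every tree made by attaching `leaf` to one branch of `tree`.

    A tree is either a leaf name (str) or a pair (left, right).
    """
    yield (tree, leaf)  # attach on the branch directly above this subtree
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insert_leaf(left, leaf):
            yield (new_left, right)
        for new_right in insert_leaf(right, leaf):
            yield (left, new_right)


def all_rooted_trees(names):
    """Return every rooted binary tree whose leaves are `names`."""
    trees = [names[0]]
    for name in names[1:]:
        trees = [new for old in trees for new in insert_leaf(old, name)]
    return trees


def fitch(tree, column):
    """Fitch parsimony for one column: (possible ancestral states, changes)."""
    if not isinstance(tree, tuple):
        return {column[tree]}, 0
    left_states, left_cost = fitch(tree[0], column)
    right_states, right_cost = fitch(tree[1], column)
    shared = left_states & right_states
    if shared:
        return shared, left_cost + right_cost
    return left_states | right_states, left_cost + right_cost + 1


def score(tree, alignment):
    """Total substitutions the tree needs to explain the whole alignment."""
    length = len(next(iter(alignment.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch(tree, column)[1]
    return total


def best_tree(alignment):
    """For each possible tree, score it; return the tree with the best score."""
    candidates = all_rooted_trees(sorted(alignment))
    return min(candidates, key=lambda tree: score(tree, alignment))


if __name__ == '__main__':
    winner = best_tree(ALIGNMENT)
    print(winner, score(winner, ALIGNMENT))
```

Because the number of candidate trees grows so quickly, this brute-force loop is only practical for a handful of sequences.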

Possible trees

how many trees are there?
- how does the number of possible trees increase with the number of leaves... linearly? ...exponentially?
  - start with the simplest unrooted tree, it has three leaves
  - how many ways are there to add another leaf? There are 3: attach the new leaf to any one of the 3 existing branches (attaching it to the center node instead would give a non-binary tree, so we ignore that option)
  - now there are 5 places to add a leaf to a 4-leaf tree
  - every time you put a new branch down, you gain 2 more places to put a new branch: one from splitting an existing branch into two parts, and one from the new branch itself
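- f_trees(n) = f_trees(n-1) * (2n-5), starting from f_trees(3) = 1; for n leaves this gives f_trees(n) = (2n-5)!! unrooted binary trees (the double factorial "!!" skips every other number, e.g. 7!! = 7*5*3*1)
- a rooted tree with n leaves corresponds to an unrooted tree with n+1 leaves (the extra leaf marks the root), so there are (2n-3)!! rooted trees: f_trees(n=10) = 17!! ≈ 34*10^6 and f_trees(n=50) = 97!! ≈ 2.7*10^76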
- Enumerating every tree is infeasible beyond a handful of leaves, so we will look at only a small fraction of the possible trees. That requires a search strategy, and the best strategy depends on the topology of the space of trees being searched.
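The recurrence and the double factorials can be checked numerically with a short Python sketch. It assumes the count starts from the single 3-leaf unrooted tree; `f_trees_unrooted`, `double_factorial`, and `f_trees_rooted` are hypothetical helper names. The rooted count relies on the fact that a rooted tree with n leaves is equivalent to an unrooted tree with n+1 leaves.

```python
def f_trees_unrooted(n):
    """Number of unrooted binary trees with n >= 3 labeled leaves.

    Applies f(n) = f(n-1) * (2n - 5), starting from f(3) = 1.
    """
    count = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5  # a (k-1)-leaf unrooted tree has 2k - 5 branches
    return count


def double_factorial(m):
    """m!! = m * (m-2) * (m-4) * ... ; by convention 0!! = (-1)!! = 1."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


def f_trees_rooted(n):
    """Number of rooted binary trees with n >= 2 labeled leaves: (2n-3)!!."""
    return double_factorial(2 * n - 3)


for n in (4, 5, 10, 50):
    print(n, f_trees_unrooted(n), f_trees_rooted(n))
```

Python integers have arbitrary precision, so even the 50-leaf counts are exact rather than floating-point approximations.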

Trees in Python

For each node, we need to store:
1. names (and sequences?)
2. pointers to its left and right subtrees (its "children")

Data Structure

We're going to use a built-in dictionary as our data structure
- example tree = ((a,b),c)

tree1 = {'name':'a','left':None,'right':None}  # for the "subtree" that consists of leaf a
tree2 = {'name':'internal','left':tree1 ...

We could do that, referencing the dictionary defined above. Or we could avoid naming every variable and nest the definition of tree1 inside the bigger tree (tree2 above):

tree2 = {'name':'internal','left':{'name':'a','left':None,'right':None},'right':{'name':'b','left':None,'right':None}}
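Completing the example tree ((a,b),c) takes one more level of nesting. The sketch below assumes a small hypothetical helper, `node`, that fills in the three keys so each node is written only once; dictionaries built this way compare equal to the hand-written nested literal.

```python
def node(name, left=None, right=None):
    """Hypothetical helper: return one node; leaves keep left = right = None."""
    return {'name': name, 'left': left, 'right': right}


# ((a,b),c): an internal root joining the subtree (a,b) with the leaf c.
ab = node('internal', node('a'), node('b'))
tree = node('internal', ab, node('c'))

# The same tree as a single nested literal, in the style of tree2.
tree_literal = {
    'name': 'internal',
    'left': {
        'name': 'internal',
        'left': {'name': 'a', 'left': None, 'right': None},
        'right': {'name': 'b', 'left': None, 'right': None},
    },
    'right': {'name': 'c', 'left': None, 'right': None},
}

print(tree == tree_literal)  # True: dictionaries compare by contents
```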

Parsing Function

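Functions for dealing with this sort of data structure will be recursive: a leaf is handled directly, and an internal node is handled by calling the same function on its left and right subtrees.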

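def leaves(tree):
    # a leaf is any node whose name is not 'internal'
    if tree['name'] != 'internal':
        return [tree['name']]  # very important that this returns a list
    return leaves(tree['left']) + leaves(tree['right'])  # "+" concatenates lists

def tree2string(tree):  # a function to write out your tree in Newick format
    if tree['name'] != 'internal':
        return tree['name']
    left = tree2string(tree['left'])
    right = tree2string(tree['right'])
    return '(' + left + ',' + right + ')'  # print(tree2string(tree2)) shows (a,b)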

You'll be writing functions like these on the next homework. This code will probably have to be modified slightly to work in the correct context.
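Going the other way, from Newick-style text back to dictionary nodes, is a natural companion to tree2string. The following is a hedged sketch, not part of the lecture: `string2tree` and `node` are hypothetical names, and the parser assumes binary trees, leaf names without commas or parentheses, no branch lengths, and at most one trailing ";". A copy of tree2string is included so the round trip can be checked.

```python
def node(name, left=None, right=None):
    """Hypothetical helper: one tree node as a dictionary."""
    return {'name': name, 'left': left, 'right': right}


def tree2string(tree):
    """Write a dictionary tree in Newick-style notation, e.g. ((a,b),c)."""
    if tree['name'] != 'internal':
        return tree['name']
    return '(' + tree2string(tree['left']) + ',' + tree2string(tree['right']) + ')'


def string2tree(text):
    """Parse Newick-style text such as '((a,b),c)' into dictionary nodes."""
    text = text.strip()
    if text.endswith(';'):
        text = text[:-1]
    pos = 0

    def expect(char):
        # Consume one expected character, or report where parsing failed.
        nonlocal pos
        if pos >= len(text) or text[pos] != char:
            raise ValueError(f'expected {char!r} at position {pos}')
        pos += 1

    def parse():
        # Recursive descent: either '(' left ',' right ')' or a bare leaf name.
        nonlocal pos
        if pos < len(text) and text[pos] == '(':
            expect('(')
            left = parse()
            expect(',')
            right = parse()
            expect(')')
            return node('internal', left, right)
        start = pos
        while pos < len(text) and text[pos] not in ',()':
            pos += 1
        return node(text[start:pos])

    tree = parse()
    if pos != len(text):
        raise ValueError(f'unexpected text at position {pos}')
    return tree


example = string2tree('((a,b),c)')
print(tree2string(example))  # ((a,b),c)
```

The parser mirrors the recursive shape of leaves and tree2string: each opening parenthesis starts an internal node whose two children are parsed by the same function.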

