20.181/Lecture3

From OpenWetWare
Revision as of 08:57, 13 September 2006

Phylogenetic trees

Input: (a multiple sequence alignment)

  1. AATGC
  2. TATGC
  3. GGTGG
  4. ACTCG

Output: a tree, an abstract representation of the same data, e.g. ((1,4),(2,3))

Overview of Approach

(in pseudocode)

for each possible tree:
    calculate the score of (tree, data)
return the tree with the BEST score
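A minimal Python sketch of this loop. It assumes a scoring function score(tree, data) where a higher score is better; the lecture has not defined the score yet, so both the function and the higher-is-better convention are placeholders, and best_tree is a hypothetical name.

```python
def best_tree(trees, data, score):
    """Exhaustive search: score every candidate tree and return the best one.

    `score(tree, data)` is a placeholder for whatever scoring function is used;
    this sketch assumes higher scores are better.
    """
    best, best_score = None, None
    for tree in trees:
        s = score(tree, data)
        if best_score is None or s > best_score:
            best, best_score = tree, s  # keep the first tree seen with the top score
    return best


# Toy usage: trees as Newick strings, and a made-up score that rewards grouping 1 with 2.
candidates = ['((1,2),(3,4))', '((1,3),(2,4))', '((1,4),(2,3))']
print(best_tree(candidates, None, lambda tree, data: tree.count('1,2')))  # prints ((1,2),(3,4))
```

If the score is a cost (lower is better), pass a function that returns its negative. The catch is that the list of candidate trees cannot realistically contain every possible tree.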

Possible trees

  • how many trees are there?
    • how does the number of possible trees increase with the number of leaves... linearly? ...exponentially?
      • start with the simplest unrooted tree, which has three leaves (and three branches)
      • how many ways are there to add a fourth leaf? There are 3: attach the new leaf to any one of the 3 existing branches (do not attach it to the central internal node, because that node would then have four branches and we want to stick to binary trees)
      • a 4-leaf tree has 5 branches, so there are 5 places to add a fifth leaf
      • every time you add a leaf, you gain 2 more places to put the next one: one from splitting an existing branch into two parts, and one from the new leaf's own branch
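    • f_trees(n) = f_trees(n-1) * (2n-5), starting from f_trees(3) = 1, because an unrooted binary tree with n-1 leaves has 2n-5 branches
    • for n leaves, f_trees(n) = (2n-5)!! = 3 * 5 * ... * (2n-5) <- that double factorial sign means to multiply only every other number, e.g. 7!! = 7*5*3*1
    • f_trees(n=10) = 2,027,025 ≈ 2.0*10^6, f_trees(n=50) ≈ 2.8*10^74
    • for rooted trees there is one extra place to attach each new leaf (the branch above the root), which gives (2n-3)!! trees: about 34*10^6 for n=10 and 2.7*10^76 for n=50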
    • Enumerating all trees is not feasible beyond a handful of leaves, so we are going to look at only a small fraction of the possible trees. We need a search strategy, and the optimal search strategy will depend on the topology of the space we are searching.
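The counting argument above can be checked numerically. This is a sketch with illustrative function names (not from the lecture). It assumes leaves are numbered 0..n-1 and stores an unrooted tree as a list of edges, a different representation from the dictionaries used below, chosen because unrooted trees have no left/right. enumerate_unrooted follows the stepwise-addition procedure described above, which produces each unrooted binary tree exactly once.

```python
def double_factorial(k):
    """k!! = k * (k-2) * (k-4) * ... down to 1 or 2; defined as 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted(n):
    """Number of unrooted binary trees with n >= 3 labeled leaves, via the recurrence."""
    count = 1  # exactly one tree with 3 leaves
    for k in range(4, n + 1):
        count *= 2 * k - 5  # a (k-1)-leaf tree has 2k-5 branches to attach leaf k to
    return count


def count_rooted(n):
    """Rooted binary trees with n >= 2 leaves: one extra place (the root branch) at each step."""
    return double_factorial(2 * n - 3)


def enumerate_unrooted(n):
    """Build every unrooted binary tree on leaves 0..n-1 by stepwise addition.

    A tree is a list of edges (u, v); leaves are ints, internal nodes are strings.
    """
    trees = [[(0, 'x0'), (1, 'x0'), (2, 'x0')]]  # the single 3-leaf star
    for leaf in range(3, n):
        new_trees = []
        for edges in trees:
            for i, (u, v) in enumerate(edges):
                w = 'x%d' % (leaf - 2)  # new internal node that splits edge (u, v)
                rest = edges[:i] + edges[i + 1:]
                new_trees.append(rest + [(u, w), (w, v), (leaf, w)])
        trees = new_trees
    return trees


print(count_unrooted(10), count_rooted(10))  # prints 2027025 34459425
print(len(enumerate_unrooted(6)))            # prints 105
```

Listing trees explicitly like this is only practical for very small n: the list for n = 10 would already hold about two million trees.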

Trees in Python

  • For each node, we need to store:
    1. names (and sequences?)
    2. pointers to its left and right subtrees (its "children")

Data Structure

  • We're going to use a built-in dictionary as our data structure
    • example tree = ((a,b),c)

tree1 = {'name':'a','left':None,'right':None} #for the "subtree" that consists of leaf a
tree2 = {'name':'internal','left':tree1 ...

  • We could build the tree that way, referencing the dictionary defined above. Alternatively, we can avoid naming all the intermediate variables and nest the definition of tree1 directly inside the bigger tree (tree2 above):

tree2 = {'name':'internal','left':{'name':'a','left':None,'right':None},'right':{'name':'b','left':None,'right':None}}
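Writing out nested dictionaries by hand gets error-prone quickly. Here is a sketch using two small helper functions, leaf and internal (hypothetical names, not part of the lecture code), that build the full example tree ((a,b),c) in the same dictionary format:

```python
def leaf(name):
    """Hypothetical helper: a leaf node is just a name with no children."""
    return {'name': name, 'left': None, 'right': None}


def internal(left, right):
    """Hypothetical helper: an internal node joining two subtrees."""
    return {'name': 'internal', 'left': left, 'right': right}


# The full example tree ((a,b),c): the (a,b) subtree is the left child of the root.
tree = internal(internal(leaf('a'), leaf('b')), leaf('c'))
```

Here tree['left'] is exactly the tree2 dictionary written out above.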

Parsing Function

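  • Functions for dealing with this sort of data structure will be recursive: a leaf is handled directly, and an internal node is handled by calling the same function on its left and right subtrees.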

def leaves(tree):  # return a list of the leaf names in tree, left to right
    if tree['name'] != 'internal':
        return [tree['name']]  # very important that this returns a list
    return leaves(tree['left']) + leaves(tree['right'])  # "+" concatenates lists

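def tree2string(tree):  # return your tree as a Newick-format string; print it with print(tree2string(tree))
    if tree['name'] != 'internal':
        return tree['name']
    left = tree2string(tree['left'])
    right = tree2string(tree['right'])
    return '(' + left + ',' + right + ')'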

  • You'll be writing functions like these on the next homework. This code will probably have to be modified slightly to work in the correct context.
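Going the other way, from a Newick string back to the dictionary structure, is also naturally recursive. This is a minimal sketch (string2tree and its helpers are hypothetical names, not from the lecture) that handles only the simple format used here: strictly binary trees, leaf names without parentheses or commas, no branch lengths, and no trailing ';'.

```python
def string2tree(text):
    """Parse a binary Newick-style string such as '((a,b),c)' into the dict structure."""
    s = text.replace(' ', '')  # ignore spaces, e.g. '((a, b), c)'
    tree, pos = _parse(s, 0)
    if pos != len(s):
        raise ValueError('unexpected text at position %d' % pos)
    return tree


def _expect(s, pos, ch):
    """Raise ValueError unless s[pos] is the character ch."""
    if pos >= len(s) or s[pos] != ch:
        raise ValueError('expected %r at position %d' % (ch, pos))


def _parse(s, pos):
    """Parse one subtree starting at s[pos]; return (tree, position just after it)."""
    if pos < len(s) and s[pos] == '(':
        left, pos = _parse(s, pos + 1)
        _expect(s, pos, ',')
        right, pos = _parse(s, pos + 1)
        _expect(s, pos, ')')
        return {'name': 'internal', 'left': left, 'right': right}, pos + 1
    # Otherwise this is a leaf: read its name up to the next '(', ')' or ','.
    end = pos
    while end < len(s) and s[end] not in '(),':
        end += 1
    if end == pos:
        raise ValueError('empty leaf name at position %d' % pos)
    return {'name': s[pos:end], 'left': None, 'right': None}, end
```

For example, string2tree('((a,b),c)') rebuilds the ((a,b),c) dictionary, so string2tree('((a,b),c)')['left']['right']['name'] is 'b'. Malformed input such as '((a,b),c' raises ValueError instead of returning a partial tree.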