SoniaSandbox: Difference between revisions

Revision as of 11:14, 20 September 2006

Temporary notes for 20.181, Lecture 5

Sankoff Downpass Algorithm

The tree we were working on last time:
Now we're going to try to code up the downpass.

So now we want to know: what is the sequence at this ancestor node, that we can't observe? We'll try all four possibilities, and calculate the penatly associated with each of those possibilities are.
Remember that the good solution with the fibonacci series was to pass all the information back each time so that we don't have to repeat calculations!
We left off at the root node, where we said there were 64 possibilities.
So let's go thru this by hand and try to write pseudo code for a function that will do the same thing.
What we're going to do, which might seem odd at first, is to pass the node's identity (nodeSeq) (A,C,G,or T) into the function as if we know it. Our function will look at this one possibility and return one column of the cost vector. Our code will look at each possibility, one by o

one.

Remember this cost matrix from Lecture4?

def sankoff(tree,nodeSeq):min = inf #initialization if tree is a leaf: #stop condition if tree['data'] == nodeSeq: return 0 return inf for leftSeq in [A,C,G,T]: #now we're at a node with two children for leftSeq in [A,C,G,T]: #we have to try all 16 possibilities here sum= cost(nodeSeq,leftSeq,rightSeq) #cost is some function that looks up the cost in the table sum += sankoff(tree['left'],leftSeq) #we have to remember to pass the cost along! sum += sankoff(tree['right'],rightSeq) if (sum < min): min = sum return mindef sankoffDownpass(tree): #we don't know what the sequence is at the rootfor seq in [A,C,G,T]: if sankoff(tree,seq) < min: min = sankoff(tree,seq) minSeq = seq return min

By structuring our code this way, we're only passing down the tree the best score obtained at each subtree (is this true??)
But now we're going to take a look back at Fitch Downpass and see if it was really necessary to pass the cost back (??)

Think about what the difference is between the root and the other internal nodes:

the root doesn't have any parents -- only children.

What if we added an outgroup, such that what was the root before is now an internal node
- would this change our guess about nucleotides are likely at the old root?
  yes!

Information flows down tree, and we use it to make inference about older and older internal nodes. That's why the inference at the root the best possible guess we can make; it has all the information from everywhere on the tree. But the inferences we've written at the internal nodes are NOT YET correct- these internal nodes only have part of the information on the tree

Fitch's upPass algorithm

We'll pass the information back up from the root node to the other internal nodes in order to decide what their most likely sequences were!

computer science jargon: three ways to traverse trees
1. pre-order: do some operation, then call left subtree, then call right subtree
  like typing into a calculator: + 5 3
2. in-order: L, op, R
  like typing into a calculator: 5 + 3
3. post-order: L, R, op
  like typing into a calculator: 5 3 +

@@ Line 12: / Line 12: @@
 #We left off at the root node, where we said there were 64 possibilities.
 #So let's go thru this by hand and try to write pseudo code for a function that will do the same thing.
-#What we're going to do, which might seem odd at first, is to pass the node's identity (nodeSeq) (A,C,G,or T) into the function as if we know it. Our function will look at this one possibility and return one column of the cost vector. Our code will look at each possibility, one by one.
+#What we're going to do, which might seem odd at first, is to pass the node's identity (nodeSeq) (A,C,G,or T) into the function as if we know it. Our function will look at this one possibility and return one column of the cost vector. Our code will look at each possibility, one by o
+one.
+[[Image:L4_costmatrix.png|right|Remember this cost matrix from Lecture4?]]
 <blockquote><tt>
 def sankoff(tree,nodeSeq):
@@ Line 36: / Line 38: @@
 :::minSeq = seq
 :return min
-[[Image:L4_costmatrix.png|right|Remember this cost matrix from Lecture4?]]
 </tt> </blockquote>
+[[Image:L5_tree.jpg|400px|left]]
 *By structuring our code this way, we're only passing down the tree the best score obtained at each subtree (is this true??)
 *But now we're going to take a look back at '''Fitch Downpass''' and see if it was really necessary to pass the cost back (??)
+[[Image:L5_tree_outgroup.jpg|400px|left]]
 *Think about what the difference is between the root and the other internal nodes:
 :the root doesn't have any parents -- only children.
-:what if
+*What if we added an outgroup, such that what ''was'' the root before is now an internal node
+**would this change our guess about nucleotides are likely at the old root?
+**:yes!
 *Information flows down tree, and we use it to make inference about older and older internal nodes. That's why the inference at the root the best possible guess we can make; it has '''all''' the information from '''everywhere''' on the tree. But the inferences we've written at the internal nodes are NOT YET correct- these internal nodes only have ''part'' of the information on the tree

SoniaSandbox: Difference between revisions

Revision as of 11:14, 20 September 2006

Sankoff Downpass Algorithm

Fitch's upPass algorithm

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

research

Tools