SoniaSandbox: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
No edit summary
No edit summary
Line 4: Line 4:
==Sankoff Downpass Algorithm==
==Sankoff Downpass Algorithm==


The tree we were working on last time:
The tree we were working on last time:<br>
Now we're going to try to code up the downpass.
Now we're going to try to code up the downpass.
[[Image:L4_tree.jpg]]
[[Image:L4_tree.jpg]]
Line 42: Line 42:
:for seq in [A,C,G,T]:
:for seq in [A,C,G,T]:
::if sankoff(tree,seq) < min:
::if sankoff(tree,seq) < min:
:::
:::min = sankoff(tree,seq)
:::minSeq = seq
:return min


<\blockquote><\tt>
<\blockquote><\tt>
*By structuring our code this way, we're only passing down the tree the best score obtained at each subtree (is this true??)
*But now we're going to take a look back at '''Fitch Downpass''' and see if it was really necessary to pass the cost back (??)
*Think about what the difference is between the root and the other internal nodes:
:the root doesn't have any parents -- only children.
<blockquote><tt>
<\blockquote><\tt>




{{{code block?}}}
{{{code block?}}}

Revision as of 07:44, 20 September 2006

Temporary notes for 20.181, Lecture 5


Sankoff Downpass Algorithm

The tree we were working on last time:
Now we're going to try to code up the downpass.

  1. So now we want to know: what is the sequence at this ancestor node, that we can't observe? We'll try all four possibilities, and calculate the penatly associated with each of those possibilities are.
  2. Remember that the good solution with the fibonacci series was to pass all the information back each time so that we don't have to repeat calculations!
  3. We left off at the root node, where we said there were 64 possibilities.
  4. So let's go thru this by hand and try to write pseudo code for a function that will do the same thing.
  5. (???)The odd thing I'm going to do starting out is to say we know the sequence and have the function pass that back, ie it will return one column of this vector of possible answers at the root. The reason for this will become clear as we go along.


def sankoff(tree,nodeSeq) min = inf #initialization

if tree is a leaf: #stop condition
if tree['data'] == nodeSeq:
return 0
return inf
  1. now we're at a node with two children
  2. they might be leaves, they might not
  3. we have to try all 16 possibilities here
for leftSeq in [A,C,G,T]:
for leftSeq in [A,C,G,T]:
sum= cost(nodeSeq,leftSeq,rightSeq)
  1. cost is some function that looks up the cost in the table
  2. what are we missing?
  3. passing the cost along!
sum = cost(nodeSeq,leftSeq,rightSeq)
sum += sankoff(tree['left'],leftSeq)
sum += sankoff(tree['right'],rightSeq)
if (sum < min):
min = sum
return min

def sankoffDownpass(tree):

  1. we don't know what the sequence is at the root
for seq in [A,C,G,T]:
if sankoff(tree,seq) < min:
min = sankoff(tree,seq)
minSeq = seq
return min

<\blockquote><\tt>

  • By structuring our code this way, we're only passing down the tree the best score obtained at each subtree (is this true??)
  • But now we're going to take a look back at Fitch Downpass and see if it was really necessary to pass the cost back (??)
  • Think about what the difference is between the root and the other internal nodes:
the root doesn't have any parents -- only children.

<\blockquote><\tt>


{{{code block?}}}