SoniaSandbox: Difference between revisions
Line 7: | Line 7: | ||
:(from [http://www.scs.fsu.edu/~beerli/BSC-5936/08-31-05/Lecture_Notes_2.pdf Peter Beerli's website]) | :(from [http://www.scs.fsu.edu/~beerli/BSC-5936/08-31-05/Lecture_Notes_2.pdf Peter Beerli's website]) | ||
[[Image:UpPass_subtree.jpg| | [[Image:UpPass_subtree.jpg|150px|right]] | ||
ancestor = a | ancestor = a | ||
Line 17: | Line 17: | ||
S<sub>x</sub>: the downpass set we got to <br> | S<sub>x</sub>: the downpass set we got to <br> | ||
[[Image:UpPass.jpg| | [[Image:UpPass.jpg|250px]] | ||
==Revisit overall strategy== | ==Revisit overall strategy== | ||
Line 39: | Line 33: | ||
==ML intro== | ==ML intro== | ||
*examples of a ML estimator: | |||
for normally | *#for normally distributed random var X, X(bar), the mean of the data you observe, is a ML estimator of the mean of the distribution they were drawn from | ||
best fit line is a ML estimator | *#A best fit line thru data is a ML estimator. | ||
[[Image:ML_plot.jpg]] | |||
==Probability Refresher== | ==Probability Refresher== | ||
[[Image:Bayes_diagram.jpg|250px|right]] | |||
total area of a box = 1 | |||
<blockquote><tt> | |||
p(A)= | |||
p(B)= | |||
p(A,B)= | |||
p(A|B)= | |||
deriving bayes' rule | |||
</tt></blockquote> | |||
Line 84: | Line 76: | ||
==Jukes-Cantor== | ==Jukes-Cantor== | ||
cost matrix | cost matrix | ||
<blockquote><tt> | |||
if x == y : | if x == y : | ||
[eqn] | [JC eqn] | ||
if x != y : | if x != y : | ||
[eqn] | [JC eqn] | ||
</tt></blockquote> | |||
equations | equations | ||
draw the little graph thing | draw the little graph thing |
Revision as of 14:50, 2 October 2006
Temp notes for Lecture 7
quick comment on upPass
- not a popular algorithm
- (you won't be tested on it)
- but here's the correct way to do it:
- (from Peter Beerli's website)
ancestor = a
parent = p, node we're looking at
children = q,r
Fx: the upPass set we want to get to
Sx: the downpass set we got to
Revisit overall strategy
for all possible trees:
- compute score (tree)
return best tree
Scoring functions
- max parsimony (fewest mutations)
- generalized parsimony (Sankoff: weighted mutation costs)
- Maximum Likelihood
ML intro
- examples of a ML estimator:
- for normally distributed random var X, X(bar), the mean of the data you observe, is a ML estimator of the mean of the distribution they were drawn from
- A best fit line thru data is a ML estimator.
Probability Refresher
total area of a box = 1
p(A)= p(B)= p(A,B)= p(A|B)=
deriving bayes' rule
ML in trees
What is the best tree T given the data we have D ? p(T|D) is what we want to maximize not intuitively obvious how we want to do that, so use bayes law
p(D) is a constant ... don't have to worry about it
- what is the a priori probability of the tree ?
- Without looking at the data, do we have a way of saying any tree is more likely than another one if they don't have any data associated with them ?
- No... not really
so what we're left maximizing is just p(D|T)
Tree now consists of topology AND distances
p(D|T) = p(x->A|d_1) * p(x->y|d_2) * p(y->G|d_3) * p(y->G|d_4)
p(A U B) = p(A) + p(B) - p(A,B) p(A I B) = p(A)*p(B)
these are independent events (that's why you multiply)
Jukes-Cantor
cost matrix
if x == y : [JC eqn] if x != y : [JC eqn]
equations draw the little graph thing
Evolutionary Model
gives up likelihood of D|T (need branch lengths)
downPass for ML
compute L(p|q,r,d)
q , r = likelihood of the two subtrees and the distance to them
Temp notes for Lecture 6
Exams and HWs
- In-class midterm on 10/11
- studying for the exam:
- don't memorize lecture notes,
- more important to be able to work through the problems
- understand all the homeworks and you'll be prepared
Homeworks coming up
- HW5: Due next wednesday
- downPass, maximum likelihood
- HW6:
- search tree space
Up Pass
- If we know what the best answer is at the root- all of out other internal nodes aren't necessarily the best guess. We need an upPass algorithm that passes information from the root, back up to the leaves.
- For this example, we are dealing with one column of the sequence alignment
def upPass(tree,parent):
- if tree is a leaf: #(stop)
- return
- i123 = parent <intersect> left <intersect> right
- i12 = left <intersect> right
- i23 = parent <intersect> left
- i13 = parent <intersect> right
- if i123 != None:
- tree['data'] = i123
- else: #possible to simplify if empty set doesn't add with +=
- if i12 != None:
- data += i12
- if i13 != None:
- data += i13
- if i23 != None:
- data += i23
- if data == None:
- data = parent + left + right # union here
- tree['data']= data
- We are doing this for the most general case, where we considered:
- what should we do when there is no intersect ?
- When you start upPass that can't be the case because every parent should have an interset with its children if you've done downPass correctly....BUT this situation does arise as you traverse up the tree because you eliminate possibilities at a parent before you test for intersect with the children.