User:Matthew Whiteside/Notebook/Ortholuge Development/2009/01/30

From OpenWetWare

Ortholuge Development Project Main project page

Task 1.2

Bias above the cutoff. Is it consistent?

Jeong Eun produced some more figures and emailed them to me. The prop.orthologs.pdf figure shows that the bias is roughly consistent above the cutoff.


Email: Hi Matthew,

I'd like to show that dataset 8 performs better than dataset 7 using the attached plot "fitted null vs. true null.pdf" for datasets 7 and 8. As you can see, with 25% (p1) paralogs in dataset 7, the null fitted by the normal density fitting method in OL.locfdr is shifted from the true null, and this poorly fitted null distribution leads to incorrect fdr estimation. In dataset 8, however, the fitted null is close to the true null. Also, the fitted tail is below the true tail, so the procedure thinks it has fewer orthologs than it really does, leading to a negative bias (bias = the difference between the estimated proportion of orthologs and the true proportion of orthologs in the bin that contains the cutoff). Therefore, dataset 8 outperforms dataset 7.

Also, please see the file "prop.orthologs.pdf" for plots of the expected and true proportion of orthologs as a function of the bin location for dataset 8. It also shows negative bias for the other bins to the right of the cutoff (around 0.65). (Note that I started the x-axes at the major mode of the mixture distribution rather than at zero.)

I hope this answers your question. Jeong Eun


Image:Prop.orthologs.pdf Image:Fitted null vs. true null.pdf
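
The bias defined in the email (estimated minus true proportion of orthologs in a bin) can be checked bin by bin on simulated data where the true labels are known. The sketch below is only an illustration of that definition, not the code used to make the figures: the function name `bin_bias`, its arguments, and the toy numbers are all hypothetical, and the estimated ortholog probability per gene is assumed to come from some fdr-style fit such as OL.locfdr.

```python
import numpy as np

def bin_bias(scores, is_ortholog, est_prob_ortholog, edges):
    """Per-bin bias = mean estimated ortholog probability - true ortholog fraction.

    All names here are hypothetical; `est_prob_ortholog` is assumed to be a
    per-gene estimate from an fdr-style fit, `is_ortholog` the simulated truth.
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(is_ortholog, dtype=float)
    est = np.asarray(est_prob_ortholog, dtype=float)
    # np.digitize gives 1-based bin positions for values inside the edges
    idx = np.digitize(scores, edges) - 1
    n_bins = len(edges) - 1
    bias = np.full(n_bins, np.nan)  # NaN marks empty bins
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            bias[b] = est[in_bin].mean() - truth[in_bin].mean()
    return bias

# Toy example (made-up numbers, two bins, cutoff near 0.65 as in the email)
edges = np.array([0.0, 0.5, 1.0])
scores = [0.1, 0.2, 0.7, 0.8]
is_ortholog = [1, 1, 1, 0]
est_prob = [0.9, 0.9, 0.3, 0.1]

bias = bin_bias(scores, is_ortholog, est_prob, edges)
cutoff_bin = int(np.digitize(0.65, edges)) - 1  # bin that contains the cutoff
print("bias per bin:", bias)
print("bias in cutoff bin:", bias[cutoff_bin])
```

A negative value in a bin means the procedure estimates fewer orthologs there than are really present, which is the direction of bias described in the email.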


Task 3.1

What is the ideal outgroup?

This also clarifies what the ideal outgroup is. Notice that in dataset 8 the ig2 paralogs are "shifted" even in ratio 2 (despite the ig2 gene being in both the numerator and the denominator). This is likely because the outgroup is so distant that switching to a paralog in the denominator does not change the distance (a property of phylogenetic distances?). This shift, though, is necessary to prevent the ratio 2 paralogs from influencing the estimation of the ssd distribution when coming up with an r2 cutoff.
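
A toy calculation may help make the outgroup argument concrete. It rests on assumptions that are not in the notes: ratio 2 is taken to be d(ig1, ig2) / d(ig2, og), which matches the ig2 gene appearing in both numerator and denominator, and swapping the ig2 ortholog for a paralog is modelled as adding the same extra distance to both terms. The function names and distance values are made up for illustration.

```python
def ratio2(d_ig1_ig2, d_ig2_og):
    # Assumed form of ratio 2: ingroup-ingroup distance over ig2-outgroup distance
    return d_ig1_ig2 / d_ig2_og

def paralog_shift(d_ig1_ig2, d_ig2_og, extra):
    """Relative change in ratio 2 when the ig2 gene is replaced by a paralog.

    Toy assumption: the paralog adds `extra` distance to both the
    numerator and the denominator.
    """
    ortholog = ratio2(d_ig1_ig2, d_ig2_og)
    paralog = ratio2(d_ig1_ig2 + extra, d_ig2_og + extra)
    return paralog / ortholog

# Made-up distances: same ingroup pair, distant vs. close outgroup
far_shift = paralog_shift(0.2, 2.0, extra=0.4)
near_shift = paralog_shift(0.2, 0.3, extra=0.4)

print("distant outgroup, paralog/ortholog ratio 2:", far_shift)
print("close outgroup, paralog/ortholog ratio 2:", near_shift)
```

Under these assumptions the extra distance barely changes a long outgroup branch, so the denominator is nearly constant and the paralogs move clearly away from the orthologs in ratio 2. With a close outgroup the numerator and denominator grow together and the shift is much smaller.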


