Electronic Lab Notebook
- Normalizing the log ratios across all the samples went well. Copying and pasting the entire dataset was easy, but dragging the selection box over each column was tedious.
- Learned that putting dollar signs in front of the column letter and row number of a cell reference (e.g. $A$1) will 'lock' it, so that the reference stays constant when the formula is dragged or copied.
- Got slightly confused when copying columns into the 'statistics' worksheet because I had used slightly different columns than the instructions stated, so the references were off. It would be better for the instructions to say what content is in each column instead of its location.
- Learned that 'paste special' lets me copy cells containing equations and paste only their numerical values, instead of typing them all out again.
- Got confused again because location was used as the identifier instead of the column name.
- Learned how to Autofilter!
- Saved the entire spreadsheet and also created a separate .txt file of the final sheet.
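For reference, the per-column normalization can be sketched outside of Excel. This is only a minimal sketch, assuming that "normalizing" here means centering each sample column on its mean and scaling by its standard deviation, which these notes do not spell out. The function name and the sample values are hypothetical. The column mean and standard deviation play the role of the 'locked' dollar-sign cells: they are computed once and reused for every row.

```python
from statistics import mean, stdev

def normalize_column(log_ratios):
    """Center and scale one sample column, leaving missing values (None) alone."""
    present = [v for v in log_ratios if v is not None]
    col_mean = mean(present)   # fixed for the whole column, like a locked cell
    col_sd = stdev(present)    # sample standard deviation, also fixed
    return [None if v is None else (v - col_mean) / col_sd for v in log_ratios]

# Made-up log ratios for a single sample column (not real data).
sample = [0.5, -1.0, None, 1.5, 0.0]
normalized = normalize_column(sample)
print(normalized)
```

After this step the non-missing values in the column have a mean of 0 and a standard deviation of 1, and missing spots stay missing.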
Sanity Check Questions
- How many genes have p value < 0.05? - roughly 134 genes
- What about p < 0.01? - roughly 26 genes
- What about p < 0.001? - none
- What about p < 0.0001? - none
- Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there? - roughly 53 genes
- Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there? - roughly 82 genes
- What about an average log fold change of > 0.25 or < -0.25? - 52 genes with a log fold change > 0.25 and 82 genes with a log fold change < -0.25
- What criteria did Merrell et al. (2002) use to determine a significant gene expression change? How does it compare to our method? - Their methods were similar except for one part: they decided that a statistically significant change in the level of expression would be at least "a twofold change". While our filter kept values x in Avg_LogFC_all such that x > 0.25 or x < -0.25, the filter advocated by Merrell et al. is a more stringent check of x > 1 or x < -1.
- Merrell et al. (2002) report that genes with IDs VC0028, VC0941, VC0869, VC0051, VC0647, VC0468, VC2350, and VCA0583 were all significantly changed in their data. Look these genes up in your spreadsheet. What are their fold changes and p values? Are they significantly changed in our analysis?
- I have two entries for VC0028 for some reason; they have P values of .18 and .26, with average log fold changes of 1.6 and 1.3 respectively.
- VC0941 also has two entries, again both with P values above .05 (.9 and .5), with average log fold changes of .1 and -.3 respectively.
- VC0869 has five entries with P values of .24, .11, .01, .08, and .06, so only one of these had a P value of less than .05. Their average log fold changes were 1.6, 2, 2.25, 1.5, and 2.16 respectively.
- VC0051 has two entries with P values of .12 and .11 with average log fold changes of 1.97 and 1.94 respectively.
- VC0647 has three entries with P values of .03, .06, and .11, so only one entry had a P value less than .05. Their average log fold changes were -1.1, -.9, and -1.1 respectively.
- VC0468 has only one entry with a P value of .73 and an average log fold change of -.17.
- VC2350 has only one entry with a P value of .17 and an average log fold change of -2.5.
- VCA0583 has only one entry with a P value of .45 and an average log fold change of 1.12.
- In conclusion, only two entries for these genes had a P value lower than .05, which means that according to our analysis the changes in the rest could simply be due to chance. So no, our results do not match the paper.
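The filter counts above could also be reproduced in code rather than with Autofilter. The following is a rough sketch, not the actual analysis: the rows are made-up values and the gene names are placeholders, and only the column names "Pvalue" and "Avg_LogFC_all" come from the spreadsheet. It also assumes the log ratios are base 2, so that a twofold change corresponds to a log fold change of log2(2) = 1.

```python
import math

def count_genes(rows, p_cutoff, min_abs_logfc=0.0):
    """Return (up, down) counts of rows with Pvalue below p_cutoff and
    Avg_LogFC_all above +min_abs_logfc (up) or below -min_abs_logfc (down)."""
    significant = [r for r in rows if r["Pvalue"] < p_cutoff]
    up = sum(1 for r in significant if r["Avg_LogFC_all"] > min_abs_logfc)
    down = sum(1 for r in significant if r["Avg_LogFC_all"] < -min_abs_logfc)
    return up, down

# Hypothetical rows for illustration only (not taken from the real dataset).
rows = [
    {"ID": "gene_a", "Pvalue": 0.03,  "Avg_LogFC_all": 1.40},
    {"ID": "gene_b", "Pvalue": 0.04,  "Avg_LogFC_all": -0.30},
    {"ID": "gene_c", "Pvalue": 0.20,  "Avg_LogFC_all": 2.10},
    {"ID": "gene_d", "Pvalue": 0.008, "Avg_LogFC_all": -1.20},
    {"ID": "gene_e", "Pvalue": 0.045, "Avg_LogFC_all": 0.10},
]

twofold = math.log2(2)  # 1.0 on a log2 scale (assumed base)

print(count_genes(rows, 0.05))           # p < 0.05, any direction of change
print(count_genes(rows, 0.05, 0.25))     # our cutoff: > 0.25 or < -0.25
print(count_genes(rows, 0.05, twofold))  # twofold cutoff: > 1 or < -1
```

Note that the cutoff is applied as two separate one-sided tests (above the positive threshold or below the negative one), since no single value can satisfy both at once.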
Data for GenMAPP
Excel Spreadsheet of Data