Evan Montz Week 8: Difference between revisions
Revision as of 19:11, 24 October 2010
Excel File Spreadsheet
Text File Spreadsheet
Electronic Notebook
- I first downloaded the file, opened it, and re-saved it under a unique file name.
- I inserted the "scaled_centered" worksheet into the file.
- Then I copied all the raw data into the new worksheet and added the rows for Average and Standard Deviation.
- Using the "=AVERAGE()" and "=STDEV()" functions, I calculated the average and standard deviation of the log ratios for each chip.
- Some cells in the raw data were empty. Excel showed a minor error noting that some cells had no data. This error was ignored, since AVERAGE and STDEV skip empty cells.
- Next I added in the "scaled_centered" columns for each run of each patient.
- Then, using the equation "=(B4-$B$2)/$B$3" along with the drag down feature, the scaled and centered value was calculated for each cell of every run.
- NOTE: The "$" symbols are vital in this equation. They make $B$2 (the average) and $B$3 (the standard deviation) absolute references. Without them, Excel would shift those references to different cells as the equation was dragged down, which was not what we wanted.
- Next, I created the new worksheet titled "statistics"
- In this worksheet, I copied the first column along with all the columns that were scaled and centered.
- Then I created the three average columns, one for each patient, and using the =AVERAGE function along with the drag down feature, I computed the values in every row for all three patients.
- Next, I created the column displaying the average of the three patient averages. It was computed the same way as in the previous step, except that the inputs were the three averages rather than the scaled and centered data.
- Next, I created the Tstat column and computed it with the formula =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)), where the number of replicates was 3 in our case. The drag down feature was again used to fill in the values for all rows.
- The last column created was the Pvalue column. Its values were calculated with the formula =TDIST(ABS(R2),degrees of freedom,2), where the degrees of freedom was 2 in our case (the number of replicates minus 1) and the final argument of 2 requests a two-tailed test. The formula was again dragged down to fill in the remaining rows.
- Finally, I created the "forGenMAPP" worksheet.
- Here, the data was formatted to show the correct number of decimal places for the different values. The statistical columns were also cut and moved to the beginning of the spreadsheet. I also added the "system code" column after the ID column and filled each cell with "N" using the drag down method.
- Lastly, the file was re-saved as an .xls file and then saved as a .txt file.
- Both were then uploaded to this page at the top.
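The spreadsheet calculations in the notebook above can be sketched in Python as a cross-check. This is only an illustration, not the workflow actually used: the function names and the sample numbers are made up, and the p-value function relies on the closed form of the Student t distribution that holds only for exactly 2 degrees of freedom (the case described above).

```python
import math
import statistics


def scale_and_center(column):
    """Mirror =(B4-$B$2)/$B$3: subtract the chip average, divide by the chip SD.

    Empty cells are represented as None and skipped, as Excel's AVERAGE
    and STDEV skip empty cells.
    """
    present = [v for v in column if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)  # sample SD (n - 1), like Excel's STDEV
    return [None if v is None else (v - mean) / sd for v in column]


def t_statistic(values):
    """Mirror =AVERAGE(range)/(STDEV(range)/SQRT(number of replicates))."""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))


def two_tailed_p_df2(t):
    """Two-tailed p-value for a t statistic with exactly 2 degrees of freedom.

    Corresponds to =TDIST(ABS(t), 2, 2). The closed form below is valid
    only for df = 2; other df values need a general t distribution.
    """
    return 1.0 - abs(t) / math.sqrt(2.0 + t * t)


# Hypothetical log ratios for one chip, with one empty cell.
chip = [0.5, -1.2, None, 0.3, 0.4]
scaled = scale_and_center(chip)

# Hypothetical values for one gene across the three replicates.
replicates = [0.8, 1.1, 0.9]
t = t_statistic(replicates)
p = two_tailed_p_df2(t)
print(scaled)
print(t, p)
```

After scaling and centering, the non-empty values of a chip should have an average of 0 and a standard deviation of 1, which is a quick way to confirm that the absolute references were set correctly.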
Sanity Check: Number of genes significantly changed
- I clicked on the A1 cell and applied the Autofilter.
- The autofilter was applied to the pvalue column according to the following criteria:
p<0.05 = 948 genes fit this criterion
p<0.01 = 235 genes fit this criterion
p<0.001 = 24 genes fit this criterion
p<0.0001 = 2 genes fit this criterion
- The pvalue criterion was returned to p<0.05
- The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than zero.
p<0.05 and "Avg_LogFC_all">0 = 352 genes fit these criteria
- The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change less than zero.
p<0.05 and "Avg_LogFC_all"<0 = 596 genes fit these criteria
- The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than 0.25 or less than -0.25 (a more realistic threshold).
p<0.05 and "Avg_LogFC_all">0.25 = 339 genes fit these criteria
p<0.05 and "Avg_LogFC_all"<-0.25 = 579 genes fit these criteria
- From what I can tell, our experiment was very similar to that of Merrell et al. (2002). The main difference I see is that they used the Significance Analysis of Microarrays (SAM) program and we used Excel. Assuming that the two approaches perform equivalent statistical tests, the analyses are comparable.
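The autofilter counts in this sanity check can also be reproduced outside Excel. The sketch below is a minimal illustration under stated assumptions: the rows are made-up placeholder data (not the actual results), the helper name count_genes is hypothetical, and the column names "Pvalue" and "Avg_LogFC_all" are taken from the notebook.

```python
def count_genes(rows, p_cutoff, fc_above=None, fc_below=None):
    """Count rows with Pvalue < p_cutoff and optional fold change bounds.

    fc_above keeps only rows with Avg_LogFC_all strictly greater than it;
    fc_below keeps only rows with Avg_LogFC_all strictly less than it.
    """
    count = 0
    for row in rows:
        if row["Pvalue"] >= p_cutoff:
            continue
        fold_change = row["Avg_LogFC_all"]
        if fc_above is not None and not fold_change > fc_above:
            continue
        if fc_below is not None and not fold_change < fc_below:
            continue
        count += 1
    return count


# Made-up example rows; the real data would come from the exported .txt file.
rows = [
    {"ID": "gene_a", "Avg_LogFC_all": 0.90, "Pvalue": 0.003},
    {"ID": "gene_b", "Avg_LogFC_all": -0.40, "Pvalue": 0.020},
    {"ID": "gene_c", "Avg_LogFC_all": 0.10, "Pvalue": 0.040},
    {"ID": "gene_d", "Avg_LogFC_all": -1.20, "Pvalue": 0.300},
]

total = count_genes(rows, 0.05)
increased = count_genes(rows, 0.05, fc_above=0)
decreased = count_genes(rows, 0.05, fc_below=0)
print(total, increased, decreased)
```

A useful consistency check is that the increased and decreased counts add up to the total whenever no gene has a log fold change of exactly zero. The counts reported above satisfy this: 352 + 596 = 948.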