Evan Montz Week 8: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
(Electronic Notebook)
(Sanity check 1)
Line 22: Line 22:
*Lastly, the file was re-saved as an .xls file and then saved as a .txt file.
*Lastly, the file was re-saved as an .xls file and then saved as a .txt file.
*Both were then uploaded to this page at the top.
*Both were then uploaded to this page at the top.
==Sanity Check: Number of genes significantly changed==
*I clicked on the A1 cell and applied the Autofilter.
*The autofilter was applied to the pvalue column according to the following criterion:
p<0.05 = 948 genes fit into this criteria
p<0.01 = 235 genes fit into this criteria
p<0.001 = 24 genes fit into this criteria
p<0.0001 = 2 genes fit into this criteria
*The pvalue criteria was returned to p<0.05
*The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than zero.
p<0.05 and "Avg_LogFC_all">0 = 352 genes fit into this criteria
*The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change less than zero.
p<0.05 and "Avg_LogFC_all"<0 = 596 genes fit into this criteria
*The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than 0.25 or less than -0.25 (This is more realistic).
p<0.05 and "Avg_LogFC_all">0.25 = 339 genes fit into this criteria
p<0.05 and "Avg_LogFC_all"<-0.25 = 579 genes fit into this criteria
*From what I can tell, our experiment was very similar to that of Merrell et al. (2002).  The main difference I see is that they used Statistical Analysis for Microarrays (SAM) program and we used Excel.  Assuming that both programs do the same thing, both experiments were very similar.

Revision as of 19:11, 24 October 2010

Excel File Spreadsheet

Evan Montz Excel Spreadsheet

Text File Spreadsheet

Evan Montz Text Spreadsheet

Electronic Notebook

  • I first downloaded the file, opened it, and re-saved it to make it unique.
  • I inserted the "scaled_centered" worksheet into the file.
  • Then I copied all the raw data into the new worksheet and added the rows for Average and Standard Deviation.
  • Using the "=Average()" and "STDEV()" functions, I was able to calculate the values for the log ratios for each chip.
    • It was noticed that there were some cells within the raw data that had no data in them. This yielded a minor error saying that some cells had no data. This error was ignored.
  • Next I added in the "scaled_centered" columns for each run of each patient.
  • Now using the "=(B4-$B$2)/$B$3" equation along with the drag down feature, this value was calculated for each cell for every run.
    • NOTE: It is vital to use the "$" symbol while writing this equation because without it, when we used the drag down feature, excel would continue to reference different cells and that was not what we desired.
  • Next, I created the new worksheet titled "statistics"
  • In this worksheet, I copied the first column along with all the columns that were scaled and centered.
  • Then I created the three average columns for each patient and using the =AVERAGE command along with the drag down feature again, I computed the values for each cell for all three patients.
  • Next, I created the column that was to display the average of all three averages. The averages of these values were taken using the same method as the previous step, except I took the average of the three averages rather than the scaled and centered data.
  • Next, I created the Tstat column and computed the results using the function: =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)) where the number of replicates was 3 for our case. The drag down feature was again used to obtain the values for all cells.
  • The last column that was created was the Pvalue column. The values were calculated using the function: =TDIST(ABS(R2),degrees of freedom,2) where the degrees of freedom was 2 for our case. The values were again dragged down to get the rest of the numbers.
  • Finally, I created the "forGenMAPP" worksheet.
  • Here, the data was manipulated to get the correct number of decimal places for the different values. The statistical columns were also cut and shifted to the beginning of the spreadsheet. I also added the "system code" column after the ID column and added an "N" to each cell using the drag down method.
  • Lastly, the file was re-saved as an .xls file and then saved as a .txt file.
  • Both were then uploaded to this page at the top.

Sanity Check: Number of genes significantly changed

  • I clicked on the A1 cell and applied the Autofilter.
  • The autofilter was applied to the pvalue column according to the following criterion:
p<0.05 = 948 genes fit into this criteria
p<0.01 = 235 genes fit into this criteria
p<0.001 = 24 genes fit into this criteria
p<0.0001 = 2 genes fit into this criteria
  • The pvalue criteria was returned to p<0.05
  • The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than zero.
p<0.05 and "Avg_LogFC_all">0 = 352 genes fit into this criteria
  • The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change less than zero.
p<0.05 and "Avg_LogFC_all"<0 = 596 genes fit into this criteria
  • The "Avg_LogFC_all" column was then filtered to display all genes with an average log fold change greater than 0.25 or less than -0.25 (This is more realistic).
p<0.05 and "Avg_LogFC_all">0.25 = 339 genes fit into this criteria
p<0.05 and "Avg_LogFC_all"<-0.25 = 579 genes fit into this criteria
  • From what I can tell, our experiment was very similar to that of Merrell et al. (2002). The main difference I see is that they used Statistical Analysis for Microarrays (SAM) program and we used Excel. Assuming that both programs do the same thing, both experiments were very similar.