Carmen E. Castaneda: Week 11

Revision as of 18:12, 4 April 2011

Microarray Data Analysis

Background

This is a list of steps required to analyze DNA microarray data.

  1. Quantitate the fluorescence signal in each spot
  2. Calculate the ratio of red/green fluorescence
  3. Log transform the ratios
  4. Normalize the ratios on each microarray slide
    • Steps 1-4 are performed by the GenePix Pro software.
    • You will perform the following steps:
  5. Normalize the ratios for a set of slides in an experiment
  6. Perform statistical analysis on the ratios
  7. Compare individual genes with known data
    • Steps 5-7 are performed in Microsoft Excel
  8. Pattern finding algorithms (clustering)
  9. Map onto biological pathways
    • We will use software called STEM for the clustering and mapping
  10. Create a mathematical model of the transcriptional network
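Steps 2 and 3 above can be sketched in a few lines. This is only an illustration: the intensities are made-up numbers, the gene names are placeholders, and the choice of base 2 for the logarithm is an assumption (this page does not state the base).

```python
import math


def log2_ratio(red: float, green: float) -> float:
    """Return log2(red / green) for one spot (steps 2 and 3 in the list above)."""
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take a log ratio")
    return math.log2(red / green)


# Hypothetical intensities (red, green); these are NOT values from this dataset.
spots = {
    "geneA": (2000.0, 1000.0),  # red twice as bright as green
    "geneB": (500.0, 1000.0),   # red half as bright as green
    "geneC": (800.0, 800.0),    # no change
}

log_ratios = {name: log2_ratio(red, green) for name, (red, green) in spots.items()}
print(log_ratios)
```

With a log transform, a doubling and a halving end up the same distance from zero (+1 and -1 in base 2), which is why the ratios are logged before averaging.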

Experimental Design

For the Schade data, the timepoints are t0, t10, t30, t120, t12h (12 hours), and t60h (60 hours) of cold shock at 10°C.

  • Begin by recording in your wiki the number of replicates for each time point in your data. For the group assigned to the Schade data, compare the number of replicates with what is stated in the Materials and Methods for the paper. Is it the same? If not, how is it different?

There are 3 replicates for t0, 7 replicates for t10, 6 replicates for t30, 4 replicates for t120, 4 replicates for t12h, and 6 replicates for t60h.

Normalize the ratios for a set of slides in an experiment

Why is this important?

Perform statistical analysis on the ratios

Record the number of replacements made in your wiki page.

  • I made zero replacements in this step.

Sanity Check: Number of genes significantly changed

  • Answer these questions for each timepoint in your dataset.
    • How many genes have p value < 0.05?
  • For t0, 141 genes have a p value < 0.05.
  • For t10, 264 genes have a p value < 0.05.
  • For t30, 284 genes have a p value < 0.05.
  • For t120, 480 genes have a p value < 0.05.
  • For t12h, 387 genes have a p value < 0.05.
  • For t60h, 374 genes have a p value < 0.05.
    • What about p < 0.01?
  • For t0, 15 genes have a p value < 0.01.
  • For t10, 42 genes have a p value < 0.01.
  • For t30, 54 genes have a p value < 0.01.
  • For t120, 96 genes have a p value < 0.01.
  • For t12h, 87 genes have a p value < 0.01.
  • For t60h, 73 genes have a p value < 0.01.
    • What about p < 0.001?
  • For t0, 1 gene has a p value < 0.001.
  • For t10, 3 genes have a p value < 0.001.
  • For t30, 3 genes have a p value < 0.001.
  • For t120, 7 genes have a p value < 0.001.
  • For t12h, 8 genes have a p value < 0.001.
  • For t60h, 8 genes have a p value < 0.001.
    • What about p < 0.0001?
  • For t0, 0 genes have a p value < 0.0001.
  • For t10, 0 genes have a p value < 0.0001.
  • For t30, 0 genes have a p value < 0.0001.
  • For t120, 3 genes have a p value < 0.0001.
  • For t12h, 3 genes have a p value < 0.0001.
  • For t60h, 2 genes have a p value < 0.0001.
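The counts above came from filtering in Excel. The same tally can be sketched in Python as a cross-check. The p values below are hypothetical placeholders, not values from this dataset.

```python
def count_below(p_values, threshold):
    """Count how many p values are strictly below the threshold."""
    return sum(1 for p in p_values if p < threshold)


# Hypothetical p values for one timepoint (NOT taken from this dataset).
p_values = [0.2, 0.03, 0.008, 0.0004, 0.00002, 0.6]
thresholds = [0.05, 0.01, 0.001, 0.0001]

counts = {t: count_below(p_values, t) for t in thresholds}
for threshold, count in counts.items():
    print(f"p < {threshold}: {count} genes")
```

Because each threshold is stricter than the one before it, the counts can only stay the same or go down from one threshold to the next. That makes a quick sanity check on the numbers recorded above.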


Perform the Bonferroni correction and determine whether any genes, and if so how many, are still significantly changed at p < 0.05 after the correction.

  • For t0, 0 genes have a p value < 0.05 after the Bonferroni correction.
  • For t10, 0 genes have a p value < 0.05 after the Bonferroni correction.
  • For t30, 0 genes have a p value < 0.05 after the Bonferroni correction.
  • For t120, 1 gene has a p value < 0.05 after the Bonferroni correction.
  • For t12h, 0 genes have a p value < 0.05 after the Bonferroni correction.
  • For t60h, 0 genes have a p value < 0.05 after the Bonferroni correction.
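A minimal sketch of the Bonferroni correction: each p value is multiplied by the number of tests performed (here, the number of genes) and capped at 1. The four p values are hypothetical, and the function names are my own, not part of the class protocol.

```python
def bonferroni_adjust(p_values):
    """Multiply each p value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


def count_significant(p_values, alpha=0.05):
    """Count genes whose Bonferroni-adjusted p value is still below alpha."""
    return sum(1 for p in bonferroni_adjust(p_values) if p < alpha)


# Hypothetical raw p values for four genes (NOT taken from this dataset).
raw = [0.00001, 0.0004, 0.03, 0.2]
print(bonferroni_adjust(raw))
print(count_significant(raw))
```

With thousands of genes on a slide the multiplier is very large. For an assumed 6000 genes, a raw p value would have to be below about 0.05 / 6000 ≈ 0.0000083 to survive, which is consistent with almost no genes remaining significant after the correction.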


  • The "AvgLogFC" tells us the magnitude of the gene expression change and in which direction. Positive values are increases relative to the control; negative values are decreases relative to the control. For the timepoint that had the greatest number of genes significantly changed at p < 0.05, answer the following:
    • Keeping the "Pval" filter at p < 0.05, filter the "AvgLogFC" column to show all genes with an average log fold change greater than zero. How many meet these two criteria?
  • For t0, there are 215 genes that meet these two criteria.
  • For t10, there are 259 genes that meet these two criteria.
  • For t30, there are 248 genes that meet these two criteria.
  • For t120, there are 252 genes that meet these two criteria.
  • For t12h, there are 244 genes that meet these two criteria.
  • For t60h, there are 237 genes that meet these two criteria.
    • Keeping the "Pval" filter at p < 0.05, filter the "AvgLogFC" column to show all genes with an average log fold change less than zero. How many meet these two criteria?
  • For t0, there are 265 genes that meet these two criteria.
  • For t10, there are 221 genes that meet these two criteria.
  • For t30, there are 232 genes that meet these two criteria.
  • For t120, there are 228 genes that meet these two criteria.
  • For t12h, there are 236 genes that meet these two criteria.
  • For t60h, there are 243 genes that meet these two criteria.
    • Keeping the "Pval" filter at p < 0.05, how many have an average log fold change of > 0.25?
  • For t0, there are 99 genes that meet these two criteria.
  • For t10, there are 207 genes that meet these two criteria.
  • For t30, there are 150 genes that meet these two criteria.
  • For t120, there are 171 genes that meet these two criteria.
  • For t12h, there are 179 genes that meet these two criteria.
  • For t60h, there are 131 genes that meet these two criteria.
    • How many have an average log fold change of < -0.25 and p < 0.05?
  • For t0, there are 126 genes that meet these two criteria.
  • For t10, there are 169 genes that meet these two criteria.
  • For t30, there are 146 genes that meet these two criteria.
  • For t120, there are 175 genes that meet these two criteria.
  • For t12h, there are 174 genes that meet these two criteria.
  • For t60h, there are 137 genes that meet these two criteria.
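The four filters above combine a p value cutoff with a cutoff on "AvgLogFC". A rough Python equivalent of the Excel filtering is sketched below. The rows are hypothetical, and `GeneStat` and `count_matching` are illustrative names rather than anything from the class protocol.

```python
from typing import NamedTuple, Optional


class GeneStat(NamedTuple):
    gene: str
    avg_log_fc: float
    p_value: float


def count_matching(rows, p_cutoff: float = 0.05,
                   min_fc: Optional[float] = None,
                   max_fc: Optional[float] = None) -> int:
    """Count rows with p < p_cutoff and avg_log_fc strictly inside the given bounds."""
    count = 0
    for row in rows:
        if row.p_value >= p_cutoff:
            continue  # fails the "Pval" filter
        if min_fc is not None and row.avg_log_fc <= min_fc:
            continue  # not above the lower bound
        if max_fc is not None and row.avg_log_fc >= max_fc:
            continue  # not below the upper bound
        count += 1
    return count


# Hypothetical rows (NOT taken from this dataset).
rows = [
    GeneStat("g1", 0.40, 0.01),
    GeneStat("g2", 0.10, 0.03),
    GeneStat("g3", -0.30, 0.02),
    GeneStat("g4", -0.05, 0.04),
    GeneStat("g5", 0.90, 0.20),  # large change, but not significant
]

print(count_matching(rows, min_fc=0.0))    # increased, p < 0.05
print(count_matching(rows, max_fc=0.0))    # decreased, p < 0.05
print(count_matching(rows, min_fc=0.25))   # increased by more than 0.25
print(count_matching(rows, max_fc=-0.25))  # decreased by more than 0.25
```

If the log is base 2 (an assumption), a log fold change of 0.25 corresponds to a fold change of about 2^0.25 ≈ 1.19, so the ±0.25 cutoff is a fairly lenient magnitude filter.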


  • What criteria did Schade et al. (2004) use to determine a significant gene expression change? How does it compare to our method?


Find NSR1 in your dataset. Is its expression significantly changed at any timepoint? Record the average fold change and p value for NSR1 for each timepoint in your dataset.

  • For t0, the average fold change is -0.68 and the p value is 0.4926.
  • For t10, the average fold change is 0.72 and the p value is 0.5446.
  • For t30, the average fold change is 3.19 and the p value is 0.6921.
  • For t120, the average fold change is 3.60 and the p value is 0.0258.
  • For t12h, the average fold change is 1.76 and the p value is 0.0089.
  • For t60h, the average fold change is 0.41 and the p value is 0.0015.
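Using the NSR1 values recorded above, a short sketch can list the timepoints at which the change passes a given p value cutoff. The numbers come from the list above; the function name is my own.

```python
# NSR1 results as recorded above: timepoint -> (average fold change, p value).
nsr1 = {
    "t0": (-0.68, 0.4926),
    "t10": (0.72, 0.5446),
    "t30": (3.19, 0.6921),
    "t120": (3.60, 0.0258),
    "t12h": (1.76, 0.0089),
    "t60h": (0.41, 0.0015),
}


def significant_timepoints(results, cutoff):
    """Return the timepoints whose p value is strictly below the cutoff."""
    return [tp for tp, (_fold_change, p) in results.items() if p < cutoff]


print(significant_timepoints(nsr1, 0.05))   # ['t120', 't12h', 't60h']
print(significant_timepoints(nsr1, 0.01))   # ['t12h', 't60h']
print(significant_timepoints(nsr1, 0.001))  # []
```

By this check, NSR1 is significantly changed at p < 0.05 only at the three later timepoints (t120, t12h, and t60h), and at none of them at p < 0.001.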


  • Which gene has the smallest p value in your dataset (at any timepoint)? You can find this by sorting your data based on p value (but be careful that you don't cause a mismatch in the rows of your data!). Look up the function of this gene at the Saccharomyces Genome Database and record it in your notebook. Why do you think the cell is changing this gene's expression upon cold shock?

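I have 5 different genes that share the smallest p value of 0.0000, at different timepoints:

  • At t120, there are YML038C and YIR024C.
  • At t12h, there are YDR211W and YBL056W.
  • At t60h, there is YML132W.

Each gene's Saccharomyces Genome Database page is at http://www.yeastgenome.org/cgi-bin/locus.fpl?locus= followed by the gene name, for example http://www.yeastgenome.org/cgi-bin/locus.fpl?locus=YML038C.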
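One way to avoid the row mismatch that the question warns about is to keep each gene's values together in a single record and sort whole records. The sketch below also returns every gene tied for the smallest p value, since several genes can share it. The rows and gene names are hypothetical.

```python
def genes_with_smallest_p(rows):
    """Return all (gene, avg_log_fc, p_value) rows tied for the smallest p value."""
    if not rows:
        return []
    # Sorting whole tuples keeps each gene attached to its own values.
    by_p = sorted(rows, key=lambda row: row[2])
    smallest_p = by_p[0][2]
    return [row for row in by_p if row[2] == smallest_p]


# Hypothetical rows: (gene, average log fold change, p value).
rows = [
    ("geneA", 0.12, 0.0400),
    ("geneB", -0.55, 0.0003),
    ("geneC", 0.80, 0.0100),
    ("geneD", 1.10, 0.0003),
]

print(genes_with_smallest_p(rows))
```

In Excel, the equivalent precaution is to select every column of the table before sorting, so that the gene names move together with their p values.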