Kasey E. O'Connor Week 9 Journal

Current revision: 17:58, 2 April 2013 (previous revision: 12:46, 26 March 2013)

Microarray Data Analysis

Process

For this assignment, I began with the raw GLN3 data. Before the values can be compared with one another, they must first be scaled and centered. To do this, I found the average and standard deviation for each replicate at each time point. Once the data were scaled and centered, I performed the statistical analysis: I calculated the average log fold change across the replicates for each time point, then used those values to find the p-value for each gene at every time point. With the p-values in hand, I filtered the data and counted the genes whose expression changed significantly at several predetermined p-value cutoffs. This let me see how gene expression changed in response to the cold shock and determine whether there was significant up- or downregulation.
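The steps above were done in Excel, but they can be sketched in Python as a rough illustration. This sketch assumes that scaling and centering means subtracting the column average and dividing by the column standard deviation, and that the p-value comes from a two-tailed, one-sample t-test of the replicate log fold changes against zero; neither detail is spelled out in this journal, and all numbers and function names are made up for the example.

```python
import math
import statistics

def scale_and_center(column):
    """Subtract the column mean, then divide by the sample standard deviation."""
    mean = statistics.mean(column)
    stdev = statistics.stdev(column)
    return [(x - mean) / stdev for x in column]

def t_statistic(replicates):
    """One-sample t statistic for 'mean log fold change differs from zero' (assumed test)."""
    n = len(replicates)
    return statistics.mean(replicates) / (statistics.stdev(replicates) / math.sqrt(n))

def two_tailed_p(t, df, steps=2000):
    """Two-tailed p-value from Student's t density, integrated with Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    density = lambda x: coef * (1 + x * x / df) ** (-(df + 1) / 2)
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3  # area under the density between 0 and |t|
    return max(0.0, 1.0 - 2.0 * area)

# Hypothetical log ratios for one replicate column (not real GLN3 data).
chip = [0.42, -1.10, 0.85, 0.03, -0.20]
scaled = scale_and_center(chip)

# Hypothetical scaled log fold changes for one gene across four replicates.
gene_replicates = [0.9, 1.1, 1.3, 0.7]
avg_log_fc = statistics.mean(gene_replicates)
p_value = two_tailed_p(t_statistic(gene_replicates), df=len(gene_replicates) - 1)
```

After scaling and centering, each column has an average of zero and a standard deviation of one, which is what makes the replicates comparable.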

Questions

1. The number of replicates for each time point in the data.
• There were four replicates for each of the time points: t15, t30, t60, t90, and t120.
2. Why is the use of the dollar sign symbols in front of the number important?
• The dollar sign in front of the row number makes the reference absolute, so every copy of the formula uses the same cells for the average and standard deviation. Without it, Excel would shift those references to the wrong cells as the master equation is copied and pasted down the whole column.
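To make the effect of the dollar sign concrete, here is a small Python sketch that imitates what a spreadsheet does to row numbers when a formula is copied down. The cell addresses are hypothetical (average in B2, standard deviation in B3, data starting in B4), and `fill_down` is an illustrative helper, not an Excel function.

```python
import re

# Matches an A1-style reference such as B4, B$2, or $B$2.
CELL_REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def fill_down(formula, rows):
    """Rewrite a formula the way a spreadsheet would after copying it down `rows` rows."""
    def shift(match):
        col_lock, col, row_lock, row = match.groups()
        if row_lock:
            # A "$" before the row number locks the row, so the reference is unchanged.
            return match.group(0)
        return f"{col_lock}{col}{int(row) + rows}"
    return CELL_REF.sub(shift, formula)

locked = fill_down("=(B4-B$2)/B$3", 1)    # "=(B5-B$2)/B$3"
unlocked = fill_down("=(B4-B2)/B3", 1)    # "=(B5-B3)/B4"
```

In the locked version only the data cell moves down a row. In the unlocked version the average and standard deviation references slide down as well, so the formula would start reading data cells as if they were the summary statistics.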
3. How many genes have p value < 0.05?
• t15: 781
• t30: 1539
• t60: 1559
• t90: 538
• t120: 564
4. What about p < 0.01?
• t15: 218
• t30: 456
• t60: 384
• t90: 129
• t120: 114
5. What about p < 0.001?
• t15: 21
• t30: 55
• t60: 51
• t90: 9
• t120: 16
6. What about p < 0.0001?
• t15: 1
• t30: 4
• t60: 10
• t90: 3
• t120: 5
7. How many of the genes are still significantly changed at p < 0.05 after the Bonferroni correction?
• t15: 1
• t30: 0
• t60: 2
• t90: 1
• t120: 0
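As a rough illustration of why so few genes survive, the Bonferroni correction can be sketched as multiplying each p-value by the number of tests and capping the result at 1. The p-values below are invented for the example and the function name is illustrative; the real analysis was done in Excel.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping the result at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

# Hypothetical unadjusted p-values for five genes (not taken from the dataset).
p_values = [0.000004, 0.0003, 0.02, 0.04, 0.6]

corrected = bonferroni(p_values)
still_significant = sum(p < 0.05 for p in corrected)  # 2 of these 5 survive
```

Because the multiplier is the number of genes tested, a microarray dataset requires a very small unadjusted p-value for a gene to stay below 0.05, which is consistent with only a handful of genes remaining at each time point.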
8. For time point t60, keeping the "Pval" filter at p < 0.05, filter the "AvgLogFC" column to show all genes with an average log fold change greater than zero. How many meet these two criteria?
• 760
9. Keeping the "Pval" filter at p < 0.05, filter the "AvgLogFC" column to show all genes with an average log fold change less than zero. How many meet these two criteria?
• 799
10. Keeping the "Pval" filter at p < 0.05, how many have an average log fold change of > 0.25 and p < 0.05?
• 727
11. How many have an average log fold change of < -0.25 and p < 0.05?
• 745
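The two-column filters used for the t60 counts in questions 8 through 11 can be sketched as a count over rows that pass both a p-value cutoff and a fold-change condition. The rows and gene names below are hypothetical stand-ins for the spreadsheet, with keys named loosely after its "AvgLogFC" and "Pval" columns.

```python
# Hypothetical rows; the real counts come from the full Excel sheet.
rows = [
    {"gene": "geneA", "avg_log_fc": 0.80, "pval": 0.010},
    {"gene": "geneB", "avg_log_fc": 0.10, "pval": 0.030},
    {"gene": "geneC", "avg_log_fc": -0.40, "pval": 0.002},
    {"gene": "geneD", "avg_log_fc": -0.05, "pval": 0.040},
    {"gene": "geneE", "avg_log_fc": 1.20, "pval": 0.200},
]

def count_matching(rows, p_cutoff, fc_condition):
    """Count genes that pass both the p-value cutoff and the fold-change condition."""
    return sum(1 for r in rows if r["pval"] < p_cutoff and fc_condition(r["avg_log_fc"]))

up = count_matching(rows, 0.05, lambda fc: fc > 0)               # 2
down = count_matching(rows, 0.05, lambda fc: fc < 0)             # 2
strong_up = count_matching(rows, 0.05, lambda fc: fc > 0.25)     # 1
strong_down = count_matching(rows, 0.05, lambda fc: fc < -0.25)  # 1
```

Tightening the fold-change cutoff from zero to 0.25 in either direction can only remove genes, which matches the drop from 760 to 727 and from 799 to 745 reported above.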
12. Find NSR1 in your dataset. Is its expression significantly changed at any timepoint? Record the average fold change and p value for NSR1 for each timepoint in your dataset.
• Average Fold Change
  • t15: 1.2
  • t30: 1.98
  • t60: 1.96
  • t90: -0.75
  • t120: -0.63
• P-Value
  • t15: 0.0046
  • t30: 0.0180
  • t60: 0.0151
  • t90: 0.0676
  • t120: 0.1061
• At t15, t30, and t60, the change in expression is significant at the p < 0.05 level. However, only t15 is significant at the p < 0.01 level.
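The NSR1 conclusion can be checked directly from the p-values recorded above. This is only a small sketch; the helper name is illustrative.

```python
# NSR1 p-values as recorded in the answer above.
nsr1_pvalues = {"t15": 0.0046, "t30": 0.0180, "t60": 0.0151, "t90": 0.0676, "t120": 0.1061}

def significant_timepoints(pvalues, cutoff):
    """Return the time points whose p-value falls below the cutoff."""
    return [timepoint for timepoint, p in pvalues.items() if p < cutoff]

at_05 = significant_timepoints(nsr1_pvalues, 0.05)  # ['t15', 't30', 't60']
at_01 = significant_timepoints(nsr1_pvalues, 0.01)  # ['t15']
```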
13. Which gene has the smallest p value in your dataset (at any time point)? Why do you think the cell is changing this gene's expression upon cold shock?
• SFH5 (YJL145W) has the smallest p-value in the dataset, at time point t90. This gene is responsible for protein transport into the plasma membrane, as well as transfer from the Golgi body. It would make sense for this gene to be downregulated during recovery: while the cell is recuperating after the cold shock, most of its effort goes into repair within the cell, so there is less need to bring molecules into the cell.