Dahlquist:Notebook/Microarray Data Analysis/2008/10/21: Difference between revisions

Revision as of 13:54, 21 October 2008


Microarray Data Analysis


Today's Workflow

The results generated on 10/14/2008 were downloaded and placed on the Desktop in "Edge Analysis" in Kevin's profile. Significant gene results were saved as tab-delimited files, and the p-value histograms and Q-plots were saved into a PowerPoint file and printed.

Only the wt-only results should be used; the other results are unusable (see below for the explanation).

The previous run (10/14/2008) on the dCIN5-only dataset gave interesting results: while the wt-only dataset produced about 1000 significant genes, the dCIN5-only dataset gave only about 150. To verify this result:

First, the covariates and genelist files were uploaded to lion share. They were then opened in Excel and checked for errors.
- IMPORTANT: The flask numbers were found to be wrong in the covariates files for dCIN5-only and wt-vs-dCIN5. They were corrected and new runs were completed.
- The new files were saved on the desktop in the Edge Analysis folder as:
  - dCIN5-only_Edge_covariates_20081021.txt and
  - wt-vs-dCIN5_Edge_covariates_20081021.txt
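One way to confirm exactly which entries changed between an old and a corrected covariates file is to compare the two cell by cell. The sketch below is only an illustration in Python and is not part of the Edge workflow. It assumes both files are tab-delimited with the same layout; the function name `diff_tab_delimited` and the sample rows are hypothetical and do not reflect the real file contents.

```python
import csv
import io


def diff_tab_delimited(old_text, new_text):
    """Return (row, column, old_value, new_value) for every cell that differs.

    Rows and columns are 1-based. Assumes both inputs are tab-delimited
    text with the same layout; a cell absent on one side is reported as None.
    """
    old_rows = list(csv.reader(io.StringIO(old_text), delimiter="\t"))
    new_rows = list(csv.reader(io.StringIO(new_text), delimiter="\t"))
    changes = []
    for r in range(max(len(old_rows), len(new_rows))):
        old_row = old_rows[r] if r < len(old_rows) else []
        new_row = new_rows[r] if r < len(new_rows) else []
        for c in range(max(len(old_row), len(new_row))):
            old_cell = old_row[c] if c < len(old_row) else None
            new_cell = new_row[c] if c < len(new_row) else None
            if old_cell != new_cell:
                changes.append((r + 1, c + 1, old_cell, new_cell))
    return changes


# Hypothetical file contents, for illustration only (not the real data).
old = "Array\tA1\tA2\tA3\nFlask\t1\t1\t2\n"
new = "Array\tA1\tA2\tA3\nFlask\t1\t2\t3\n"
for change in diff_tab_delimited(old, new):
    print(change)
```

With real files, the two text arguments would come from reading the old and the corrected covariates file (for example the 20080710 and 20081021 versions) with `open(path).read()`.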

Reran the dCIN5-vs-wt data with the updated covariate file:

Gene file in Desktop "Data analysis 2008-10-02", Covariate file on Desktop
- Used gene file "wt-dCIN5_consolidated_Edge_genes-indexonly_20080715.txt"
- Used covariate file "wt-dCIN5_consolidated_Edge_covariates_20081021.txt"
Load both into an Edge session.
Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
- Percent of genes missing data: 7.63%
- Percent of arrays missing data: 95.35%
- Overall percent of missing data: 3.15%
For KNN Parameters, set:
- Percent of missing values to tolerate in a gene: 100 (so all genes included)
- Number of nearest neighbors to use (maximum of 15): 15
- Clicked GO to impute missing data.
Selected "Identify Differentially Expressed Genes"
- Class Variable is: Strain
- Differential Expression Type is: Time Course
- Number of null iterations, set to 1000
- Choose a seed for reproducible results, set to 47
- Choose Time Course Settings
- Covariate giving time points is: Timepoint
- Covariate corresponding to individuals is: Flask
- Choose spline type, accepted default of Natural Cubic Spline, dimension 4
- Click "Apply" and then click "Go"
- 1000 permutations looks like it will take about 9 minutes.
Results: (Saved in 2008-10-14 Results)
- No significant genes under these settings.
- Choose Q-Value cutoff as 1, recalculate
  - Saved total list of genes as: "GeneList_20081014_wt-vs-dCIN5"
- To save the plots, do the following command in the R console window.

savePlot(filename = "PvalHistogram_wt-vs-dCIN5", type = c("png"), device = dev.cur())

This saves the active plot window under the file name you choose. The file is written to the folder "edge_1.1.290".
- Saved Q-Plot as "QPlot_20081014_wt-vs-dCIN5"
- Saved Histograms as "PvalHistogram_20081014_wt-vs-dCIN5"
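The three Percent Missing Data figures reported in this run can look contradictory: nearly every array is flagged as missing data while only a few percent of all values are absent. The sketch below shows how that can happen. It is an illustration in Python, not Edge's own code, and it assumes that a gene or an array counts as "missing data" when it has at least one missing value. That definition, the function name `missing_data_summary`, and the toy matrix are all assumptions.

```python
def missing_data_summary(matrix):
    """Summarize missing values in a genes-by-arrays matrix.

    `matrix` is a list of rows (one per gene), each a list of values
    (one per array); None marks a missing value.
    Returns three percentages: genes with any missing value,
    arrays with any missing value, and missing cells overall.
    """
    n_genes = len(matrix)
    n_arrays = len(matrix[0])
    genes_missing = sum(1 for row in matrix if any(v is None for v in row))
    arrays_missing = sum(
        1 for j in range(n_arrays) if any(row[j] is None for row in matrix)
    )
    cells_missing = sum(1 for row in matrix for v in row if v is None)
    return (
        100.0 * genes_missing / n_genes,
        100.0 * arrays_missing / n_arrays,
        100.0 * cells_missing / (n_genes * n_arrays),
    )


# Toy data (hypothetical): 4 genes x 5 arrays, two missing values.
toy = [
    [0.1, 0.2, None, 0.4, 0.5],
    [1.1, 1.2, 1.3, 1.4, 1.5],
    [2.1, None, 2.3, 2.4, 2.5],
    [3.1, 3.2, 3.3, 3.4, 3.5],
]
genes_pct, arrays_pct, overall_pct = missing_data_summary(toy)
print(genes_pct, arrays_pct, overall_pct)
```

In the toy matrix two missing cells are enough to flag half of the genes and 40% of the arrays, although only 10% of all values are missing. With thousands of genes per array, a single missing spot flags the whole array, which is consistent with a high array percentage alongside a low overall percentage.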

Then the dCIN5 dataset was run on its own:

Files in "Edge_data_20080710"
- Used gene file "dCIN5-only_Edge_genes-indexonly_20080715.txt"
- Used covariate file "dCIN5-only_Edge_covariates_20080710.txt"
Load both into an Edge session.
Select "Impute Missing Data" from the menu. Calculate Percent Missing Data by clicking on the button. The results are:
- Percent of genes missing data: 1.32%
- Percent of arrays missing data: 90%
- Overall percent of missing data: 0.09%
For KNN Parameters, set:
- Percent of missing values to tolerate in a gene: 100 (so all genes included)
- Number of nearest neighbors to use (maximum of 15): 15
- Clicked GO to impute missing data.
Selected "Identify Differentially Expressed Genes"
- Class Variable is: None (within class differential expression)
- Differential Expression Type is: Time Course
- Number of null iterations, set to 1000
- Choose a seed for reproducible results, set to 47
- Choose Time Course Settings
- Covariate giving time points is: Timepoint
- Covariate corresponding to individuals is: Flask
- Choose spline type, accepted default of Natural Cubic Spline, dimension 4
- Click "Apply" and then click "Go"
- 1000 permutations looks like it will take about 2 minutes.
Results: (Saved in 2008-10-14 Results)
- 157 Genes Called Significant (Cutoff Q Value 0.1)
- Saved total list of genes as "GeneList_20081014_dCIN5-only"
- Saved Q-Plot as "QPlot_20081014_dCIN5-only"
- Saved Histograms as "PvalHistogram_20081014_dCIN5-only"
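The role of the Q-value cutoff in these runs can be sketched in a few lines of Python. This is an illustration only, not Edge's implementation: the gene IDs and q-values are hypothetical, and whether Edge compares with "at or below" or "strictly below" the cutoff is an assumption here.

```python
def genes_called_significant(q_values, cutoff):
    """Return the gene IDs whose q-value is at or below `cutoff`, sorted by q-value."""
    hits = [(q, gene) for gene, q in q_values.items() if q <= cutoff]
    return [gene for q, gene in sorted(hits)]


# Hypothetical gene IDs and q-values, for illustration only.
q_values = {"geneA": 0.02, "geneB": 0.35, "geneC": 0.09, "geneD": 0.80}

print(genes_called_significant(q_values, 0.1))     # genes passing a 0.1 cutoff
print(len(genes_called_significant(q_values, 1)))  # a cutoff of 1 keeps every gene
```

Because a q-value never exceeds 1, setting the cutoff to 1 returns the complete gene list. This is why recalculating with a cutoff of 1 is a convenient way to save all genes even when none pass the default threshold.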

