Monica Hong Electronic Journal Edit: Difference between revisions
From OpenWetWare
P-value tables for dHAP4 strain
- PvaluesMH051915_table.pptx
Revision as of 11:23, 20 May 2015
Microarray Data Analysis
- Edited on 05/18/15, 05/19/15
Viewing File Extensions
- The Windows 7 operating system hides file extensions by default. To turn them back on, do the following:
- Go to the Start menu and select "Control Panel".
- In the window that appears, search for "Folder Options" in the search field in the upper right hand corner.
- Click on "Folder Options" in the main window.
- When the Folder Options window appears, click on the View tab.
- Uncheck the box for "Hide extensions for known file types".
- Click the OK button.
Set Your Browser to Prompt You for the Location to Save your Downloaded Files
- In Google Chrome, open the Settings window.
- Click on the link at the bottom of the page that says "Advanced Settings".
- Scroll down to "Downloads" and check the box that says "Ask where to save each file before downloading".
- You could also change the default Download location to your Desktop, so that will be the first choice when it prompts you where to save the file.
- Your settings are automatically saved.
Steps 1-3: Generating Log2 Ratios with GenePix Pro
- The protocol for gridding and generating the intensity (log2 ratio) data with GenePix Pro 6.1 is found on [[1]].
- This protocol will generate a *.gpr file for each chip which is then fed into the normalization protocol below.
Steps 4-5: Within- and Between-chip Normalization
- Installing R 3.1.0 and the limma package
- The following protocol was developed to normalize GCAT and Ontario DNA microarray chip data from the Dahlquist lab using the R Statistical Software and the limma package (part of the Bioconductor Project).
- The normalization procedure has been verified to work with version 3.1.0 of R released in April 2014 ([[2]]) and version 3.20.1 of the limma package (Limma.3.20.1.zip) on the Windows 7 platform.
- Note that using other versions of R or the limma package might give different results.
- Note also that the 32-bit and 64-bit versions of R 3.1.0 give normalization results that differ out in the 13th or 14th decimal place (differences on the order of 10^-13 or 10^-14). The Dahlquist Lab is standardizing on the 64-bit version of R.
- To install R for the first time, download and run the installer from the link above, accepting the default installation.
- To use the limma package, unzip the file and place the contents into a folder called "limma" in the library directory of the R program. If you accept the default location, that will be C:\Program Files\R\R-3.1.0\library (this will be different on the computers in S120 since you do not have administrator rights).
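Because the results depend on the R version, the 32-bit versus 64-bit build, and the limma version, it can help to confirm all three before running the scripts. The following is only a sketch that could be pasted into the R console; the helper name `check_environment` is hypothetical and is not part of the lab's scripts.

```r
# Hypothetical helper (not part of the normalization scripts):
# reports the R version, whether the session is 64-bit, and the limma version.
check_environment <- function() {
  limma_version <- if (requireNamespace("limma", quietly = TRUE)) {
    as.character(utils::packageVersion("limma"))
  } else {
    NA_character_  # limma was not found in this R library
  }
  list(
    r_version = paste(R.version$major, R.version$minor, sep = "."),
    is_64bit  = .Machine$sizeof.pointer == 8,  # pointers are 8 bytes in 64-bit R
    limma     = limma_version
  )
}

env <- check_environment()
print(env)
```

For this protocol you would expect to see R version 3.1.0, a 64-bit session, and limma 3.20.1.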
Running the Normalization Scripts
- Create a folder on your Desktop to store your files for the microarray analysis procedure.
- Download the zipped file wt-dCIN5-dGLN3-dHAP1-dHMO1-dSWI4-dZAP1-Spar_gpr-files.zip that contains the .gpr files and save it to this folder (or move it there if it was saved to a different folder).
- Unzip this file using 7-zip. Right-click on the file and select the menu item, "7-zip > Extract Here".
- Download the GCAT_Targets.csv and Ontario_Targets_wt-dCIN5-dGLN3-dHAP4-dHMO1-dSWI4-dZAP1-Spar_20150514.csv files and save them to this folder (or move them there if they were saved to a different folder).
- Download the Ontario_Chip_Within-Array_Normalization_modified_20150514.R script and save (or move) it to this folder.
- Download the Within-Array_Normalization_GCAT_and_Merged_Ontario-GCAT_Between-Chip_Normalization_modified_20150514.R script and save (or move) it to this folder.
Within Array Normalization for the Ontario Chips
- Launch R x64 3.1.0 (make sure you are using the 64-bit version).
- Change the directory to the folder containing the targets file and the GPR files for the Ontario chips by selecting the menu item File > Change dir... and clicking on the appropriate directory. You will need to click on the + sign to drill down to the right directory. Once you have selected it, click OK.
- In R, select the menu item File > Source R code..., and select the Ontario_Chip_Within-Array_Normalization_modified_20150514.R script.
- You will be prompted by an Open dialog for the Ontario targets file. Select the file Ontario_Targets_wt-dCIN5-dGLN3-dHAP4-dHMO1-dSWI4-dZAP1-Spar_20150514.csv and click Open.
- Wait while R processes your files.
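In the R console, `setwd()` does the same job as File > Change dir.... Before sourcing a script, it can be useful to confirm that the folder really contains the .gpr files and the targets files. The sketch below assumes a hypothetical helper `check_inputs` and uses a temporary folder with empty stand-in files; for real use, point it at your own folder and file names.

```r
# Hypothetical helper: count the .gpr files in a folder and report which of
# the required files (targets files, scripts) are missing from it.
check_inputs <- function(folder, required) {
  gpr <- list.files(folder, pattern = "\\.gpr$", ignore.case = TRUE)
  missing <- required[!file.exists(file.path(folder, required))]
  list(n_gpr = length(gpr), missing = missing)
}

# Toy demonstration in a temporary folder (empty stand-ins for the real files).
demo <- file.path(tempdir(), "microarray_demo")
dir.create(demo, showWarnings = FALSE)
invisible(file.create(file.path(demo, c("chip1.gpr", "chip2.gpr", "GCAT_Targets.csv"))))

res <- check_inputs(demo, c("GCAT_Targets.csv", "Ontario_Targets.csv"))
print(res)
```

Any file name listed under `missing` still needs to be downloaded or moved into the folder.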
Within Array Normalization for the GCAT Chips and Between Array Normalization for All Chips
- These instructions assume that you have just completed the Within Array Normalization for the Ontario Chips in the section above.
- In R, select the menu item File > Source R code..., and select the Within-Array_Normalization_GCAT_and_Merged_Ontario-GCAT_Between-Chip_Normalization_modified_20150514.R script.
- You will be prompted by an Open dialog for the GCAT targets file. Select the file GCAT_Targets.csv and click Open.
- Wait while R processes your files.
- When the processing has finished, you will find two files called GCAT_and_Ontario_Within_Array_Normalization.csv and GCAT_and_Ontario_Final_Normalized_Data.csv in the same folder.
- Save these files to LionShare and/or to a flash drive.
Visualizing the Normalized Data
Creating MA Plots and Box Plots for the GCAT Chips
Input the following code, line by line, into the main R window. Press the enter key after each block of code.
GCAT.GeneList<-RGG$genes$ID
lg<-log2((RGG$R-RGG$Rb)/(RGG$G-RGG$Gb))
- If you get a message saying "NaNs produced", this is OK; proceed to the next step.
r0<-length(lg[1,])
rx<-tapply(lg[,1],as.factor(GCAT.GeneList),mean)
r1<-length(rx)
MM<-matrix(nrow=r1,ncol=r0)
for(i in 1:r0) {MM[,i]<-tapply(lg[,i],as.factor(GCAT.GeneList),mean)}
MC<-matrix(nrow=r1,ncol=r0)
for(i in 1:r0) {MC[,i]<-dw[i]*MM[,i]}
MCD<-as.data.frame(MC)
colnames(MCD)<-chips
rownames(MCD)<-gcatID
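The lines above average the log ratios of replicate spots for each gene ID with `tapply` and then multiply each chip's column by `dw[i]`. A toy sketch of that pattern follows. The IDs and values are made up, and treating the multiplier as a per-chip dye-swap sign (1 or -1) is an assumption for illustration, since `dw` is defined in the normalization script and not in this protocol.

```r
# Toy data: 5 spots on 2 chips; two of the genes are spotted twice.
ids <- c("YAL001C", "YAL002W", "YAL001C", "YAL003W", "YAL002W")
lg_toy <- matrix(c(1, 2, 3, 4, 6,
                   0, 1, 2, 3, 5), ncol = 2)
# Assumed per-chip multiplier, e.g. -1 for a dye-swapped chip.
dw_toy <- c(1, -1)

genes <- sort(unique(ids))
M <- matrix(nrow = length(genes), ncol = ncol(lg_toy),
            dimnames = list(genes, c("chip1", "chip2")))
for (i in seq_len(ncol(lg_toy))) {
  # Mean of the replicate spots per gene, then apply the chip's multiplier.
  M[, i] <- dw_toy[i] * tapply(lg_toy[, i], as.factor(ids), mean)
}
print(M)
```

Each row of `M` is one gene and each column one chip, which is the same shape as `MC` in the protocol.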
la<-(1/2*log2((RGG$R-RGG$Rb)*(RGG$G-RGG$Gb)))
- If you get these Warning messages, it's OK:
- 1: In (RGG$R - RGG$Rb) * (RGG$G - RGG$Gb) :
- NAs produced by integer overflow
- 2: NaNs produced
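These warnings have two different causes. The intensities are stored as integers, so the product of two large background-corrected values can exceed the largest integer R can hold (about 2.1e9) and becomes NA. Separately, a background-corrected value that is negative makes the logarithm undefined and gives NaN. The toy sketch below, with made-up intensities, shows both effects and one possible way to avoid the overflow by converting to double first. This is not what the protocol does; the protocol as written simply accepts the warnings, and the conversion would change which spots end up as NA.

```r
# Toy foreground (R, G) and background (Rb, Gb) intensities stored as integers.
R  <- c(60000L, 500L, 40L)
Rb <- c(100L, 100L, 100L)
G  <- c(50000L, 300L, 200L)
Gb <- c(100L, 100L, 100L)

# Integer multiplication overflows for the first spot and yields NA.
overflow <- suppressWarnings((R - Rb) * (G - Gb))

# Converting to double before multiplying avoids the overflow.
r <- as.numeric(R - Rb)
g <- as.numeric(G - Gb)
M <- suppressWarnings(log2(r / g))        # log fold change
A <- suppressWarnings(0.5 * log2(r * g))  # average log intensity
print(data.frame(overflow = overflow, M = M, A = A))
```

The third spot has a red signal below its background, so both M and A are NaN for it regardless of the overflow fix.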
r2<-length(la[1,])
ri<-tapply(la[,1],as.factor(GCAT.GeneList),mean)
r3<-length(ri)
AG<-matrix(nrow=r3,ncol=r2)
for(i in 1:r2) {AG[,i]<-tapply(la[,i],as.factor(GCAT.GeneList),mean)}
par(mfrow=c(3,3))
for(i in 1:r2) {plot(AG[,i],MC[,i],main=chips[i],xlab='A',ylab='M',ylim=c(-5,5),xlim=c(0,15))}
browser()
- Maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window. To continue with the rest of the code, press Enter.
- To make sure that you save the clearest image, do not scroll in the window because a grey bar will appear if you do so.
- The next set of code generates the MA plots after within-array normalization, followed by the box plots, for the GCAT chips (wild-type data).
x0<-tapply(MAG$A[,1],as.factor(MAG$genes$ID),mean)
y0<-length(MAG$A[1,])
x1<-length(x0)
AAG<-matrix(nrow=x1,ncol=y0)
for(i in 1:y0) {AAG[,i]<-tapply(MAG$A[,i],as.factor(MAG$genes$ID),mean)}
par(mfrow=c(3,3))
for(i in 1:y0) {plot(AAG[,i],MG2[,i],main=chips[i],xlab='A',ylab='M',ylim=c(-5,5),xlim=c(0,15))}
browser()
- Maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window. To continue with the rest of the code, press Enter.
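As an alternative to saving from the File menu, R can write a plot straight to a JPEG file with the `jpeg()` graphics device. The sketch below uses made-up A and M values and a hypothetical file name in a temporary folder. It is not part of the lab's protocol, and the image size shown is an arbitrary choice.

```r
# Toy A and M values standing in for one chip's data.
set.seed(1)
A_toy <- runif(200, 0, 15)
M_toy <- rnorm(200, 0, 1)

out_file <- file.path(tempdir(), "MA_plot_demo.jpg")  # hypothetical file name
if (capabilities("jpeg")) {
  # Open a JPEG device, draw the plot into it, then close the device to write the file.
  jpeg(out_file, width = 1200, height = 900, quality = 100)
  plot(A_toy, M_toy, main = "demo chip", xlab = "A", ylab = "M",
       ylim = c(-5, 5), xlim = c(0, 15))
  invisible(dev.off())
}
```

Nothing appears on screen when a file device is used; the plot goes only to the file.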
par(mfrow=c(1,3))
boxplot(MCD,main="Before Normalization",ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
axis(1,at=xy.coords(chips)$x,tick=TRUE,labels=FALSE)
text(xy.coords(chips)$x-1,par('usr')[3]-0.6,labels=chips,srt=45,cex=0.9,xpd=TRUE)
boxplot(MG2,main='After Within Array Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
axis(1,at=xy.coords(chips)$x,labels=FALSE)
text(xy.coords(chips)$x-1,par('usr')[3]-0.6,labels=chips,srt=45,cex=0.9,xpd=TRUE)
boxplot(MAD[,Gtop$MasterList],main='After Between Array Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
axis(1, at=xy.coords(chips)$x,labels=FALSE)
text(xy.coords(chips)$x-1,par('usr')[3]-0.6,labels=chips,srt=45,cex=0.9,xpd=TRUE)
- Enlarge the window in which the plots have appeared as much as you can without losing the labels on the x axis (fully maximizing it may cut them off). Save the plots as a JPEG (File>Save As>JPEG>100% quality...). Once the plots have been saved, close the window.
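The box plot code above turns off the default x axis with `xaxt='n'`, draws tick marks with `axis()`, and then writes the chip names at a 45 degree angle with `text()`. A minimal sketch of that labelling technique on made-up data follows. The chip names are hypothetical, and `pdf(NULL)` is used only so that the sketch runs without opening a plot window.

```r
pdf(NULL)  # null device: draw without opening a window or writing a file
chips_toy <- c("chipA", "chipB", "chipC")  # hypothetical chip names
set.seed(2)
dat <- as.data.frame(matrix(rnorm(300), ncol = 3,
                            dimnames = list(NULL, chips_toy)))

bp <- boxplot(dat, main = "Toy data", ylab = "Log Fold Change",
              ylim = c(-5, 5), xaxt = "n")  # xaxt = "n" suppresses default labels
pos <- seq_along(chips_toy)                 # boxes are drawn at x = 1, 2, 3
axis(1, at = pos, labels = FALSE)           # tick marks only
# par("usr")[3] is the bottom edge of the plot region. xpd = TRUE allows text
# to be drawn in the margin below it, and srt = 45 rotates the labels.
text(pos, par("usr")[3] - 0.6, labels = chips_toy, srt = 45, adj = 1, xpd = TRUE)
invisible(dev.off())
```

The offset subtracted from `par("usr")[3]` controls how far below the axis the labels sit.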
Visualizing the Normalized Data; Create MA Plots and Box Plots for the Ontario Chips
Input the following code, line by line, into the main R window. Press the enter key after each block of code.
Ontario.GeneList<-RGO$genes$Name
lr<-log2((RGO$R-RGO$Rb)/(RGO$G-RGO$Gb))
- Warning message: "NaNs produced" is OK.
z0<-length(lr[1,])
v0<-tapply(lr[,1],as.factor(Ontario.GeneList),mean)
z1<-length(v0)
MT<-matrix(nrow=z1,ncol=z0)
for(i in 1:z0) {MT[,i]<-tapply(lr[,i],as.factor(Ontario.GeneList),mean)}
MI<-matrix(nrow=z1,ncol=z0)
for(i in 1:z0) {MI[,i]<-ds[i]*MT[,i]}
MID<-as.data.frame(MI)
colnames(MID)<-headers
rownames(MID)<-ontID
ln<-(1/2*log2((RGO$R-RGO$Rb)*(RGO$G-RGO$Gb)))
- Warning messages are OK:
- 1: In (RGO$R - RGO$Rb) * (RGO$G - RGO$Gb) :
- NAs produced by integer overflow
- 2: NaNs produced
z2<-length(ln[1,])
zi<-tapply(ln[,1],as.factor(Ontario.GeneList),mean)
z3<-length(zi)
AO<-matrix(nrow=z3,ncol=z2)
for(i in 1:z0) {AO[,i]<-tapply(ln[,i],as.factor(Ontario.GeneList),mean)}
strains<-c('wt','dCIN5','dGLN3','dHAP4','dHMO1','dSWI4','dZAP1','Spar')
- Each time the code below reaches the call browser(), maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window and press Enter for the next set of graphs to appear.
- The last set of graphs to appear will be for Spar.
- The graphs generated by this code show the Ontario chips before normalization.
- Be sure to save all 8 sets of graphs (one per strain) before moving on to the next step.
for (i in 1:length(strains)) {
  st<-strains[i]
  lt<-which(Otargets$Strain %in% st)
  if (st=='wt') {
    par(mfrow=c(3,5))
  } else {
    par(mfrow=c(4,5))
  }
  for (j in lt) {
    plot(AO[,j],MI[,j],main=headers[j],xlab="A",ylab="M",ylim=c(-5,5),xlim=c(0,15))
  }
  browser()
}
- To continue generating plots, press enter.
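The loop above picks out the chips that belong to each strain with `which(Otargets$Strain %in% st)` and plots only those columns. A toy sketch of that selection step follows. The targets table here is made up: only the `Strain` column name comes from the code above, and the `Header` column and all values are hypothetical.

```r
# Toy targets table (hypothetical values); one row per chip.
Otargets_toy <- data.frame(
  Strain = c("wt", "wt", "dHAP4", "dHAP4", "dHAP4", "Spar"),
  Header = paste0("chip", 1:6),
  stringsAsFactors = FALSE
)
strains_toy <- c("wt", "dHAP4", "Spar")

# For each strain, find the positions (column indices) of its chips.
cols_by_strain <- lapply(strains_toy,
                         function(st) which(Otargets_toy$Strain %in% st))
names(cols_by_strain) <- strains_toy
print(cols_by_strain)
```

In the protocol, these indices are what `lt` holds on each pass through the outer loop.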
j0<-tapply(MAO$A[,1],as.factor(MAO$genes[,5]),mean)
k0<-length(MAO$A[1,])
j1<-length(j0)
AAO<-matrix(nrow=j1,ncol=k0)
for(i in 1:k0) {AAO[,i]<-tapply(MAO$A[,i],as.factor(MAO$genes[,5]),mean)}
- As before, each time the code below reaches the call browser(), maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window and press Enter for the next set of graphs to appear.
- Again, the last set of graphs to appear will be for Spar.
- The graphs produced by this code show the Ontario chips after within-array normalization.
- Again, be sure to save all 8 sets of graphs before moving on to the next part of the code.
for (i in 1:length(strains)) {
  st<-strains[i]
  lt<-which(Otargets$Strain %in% st)
  if (st=='wt') {
    par(mfrow=c(3,5))
  } else {
    par(mfrow=c(4,5))
  }
  for (j in lt) {
    plot(AAO[,j],MD2[,j],main=headers[j],xlab="A",ylab="M",ylim=c(-5,5),xlim=c(0,15))
  }
  browser()
}
- To continue generating plots, press enter.
for (i in 1:length(strains)) {
  par(mfrow=c(1,3))
  st<-strains[i]
  lt<-which(Otargets$Strain %in% st)
  if (st=='wt') {
    xcoord<-xy.coords(lt)$x-1
    fsize<-0.9
  } else {
    xcoord<-xy.coords(lt)$x-1.7
    fsize<-0.8
  }
  boxplot(MID[,lt],main='Before Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
  axis(1,at=xy.coords(lt)$x,labels=FALSE)
  text(xcoord,par('usr')[3]-0.65,labels=headers[lt],srt=45,cex=fsize,xpd=TRUE)
  boxplot(MD2[,lt],main='After Within Array Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
  axis(1,at=xy.coords(lt)$x,labels=FALSE)
  text(xcoord,par('usr')[3]-0.65,labels=headers[lt],srt=45,cex=fsize,xpd=TRUE)
  ft<-Otargets$MasterList[which(Otargets$Strain %in% st)]
  boxplot(MAD[,ft],main='After Between Array Normalization',ylab='Log Fold Change',ylim=c(-5,5),xaxt='n')
  axis(1,at=xy.coords(lt)$x,labels=FALSE)
  text(xcoord,par('usr')[3]-0.65,labels=headers[lt],srt=45,cex=fsize,xpd=TRUE)
  browser()
}
- To continue generating the box plots, press enter.
- You will have to save 8 plots before you have completed the procedure. The last box plot is for spar.
- Warnings are OK.
- Zip the files of the plots together and upload to LionShare and/or save to a flash drive.
Step 6: Statistical Analysis
- For the statistical analysis, we will begin with the file "GCAT_and_Ontario_Final_Normalized_Data.csv" that you generated in the previous step.
- Open this file in Excel and Save As an Excel Workbook *.xlsx. It is a good idea to add your initials and the date (yyyymmdd) to the filename as well.
- Rename the worksheet with the data "Compiled_Normalized_Data".
- Type the header "ID" in cell A1.
- Insert a new column after column A and name it "Standard Name". Column B will contain the common names for the genes on the microarray.
- Copy the entire column of IDs from Column A.
- Paste the IDs into the "Value" field of the ORF List <-> Gene List tool (http://www.yeastract.com/formorftogene.php) in YEASTRACT (http://www.yeastract.com). Then, click on the "Transform" button.
- Select all of the names in the "Gene Name" column of the resulting table.
- Copy and paste these names into column B of the *.xlsx file. Save your work.
- Insert a new column on the very left and name it "MasterIndex". We will create a numerical index of genes so that we can always sort them back into the same order.
- Type a "1" in cell A2 and a "2" in cell A3.
- Select both cells. Hover your mouse over the bottom-right corner of the selection until it makes a thin black + sign. Double-click on the + sign to fill the entire column with a series of numbers from 1 to 6189 (the number of genes on the microarray).
- Insert a new worksheet and call it "Rounded_Normalized_Data". We are going to round the normalization results to four decimal places because of slight variations seen in different runs of the normalization script.
- Copy the first three columns of the "Compiled_Normalized_Data" sheet and paste them into the first three columns of the "Rounded_Normalized_Data" sheet.
- Copy the first row of the "Compiled_Normalized_Data" sheet and paste it into the first row of the "Rounded_Normalized_Data" sheet.
- In cell D2 (the first data cell, because columns A through C now hold MasterIndex, ID, and Standard Name), type the equation =ROUND(Compiled_Normalized_Data!D2,4).
- Copy and paste this equation in the rest of the cells of row 2.
- Select all of the cells of row 2 and hover your mouse over the bottom right corner of the selection. When the cursor changes to a thin black "plus" sign, double-click on it to paste the equation to all the rows in the worksheet. Save your work.
- Insert a new worksheet and call it "Master_Sheet".
- Go back to the "Rounded_Normalized_Data" sheet and Select All and Copy.
- Click on cell A1 of the "Master_Sheet" worksheet. Select Paste special > Paste values to paste the values, but not the formulas from the previous sheet. Save your work.
- There will be some #VALUE! errors in cells where there was missing data for genes that existed on the Ontario chips, but not the GCAT chips.
- Select the menu item Find/Replace, find all cells containing "#VALUE!", and replace them with a single space character. Record the number of replacements in your electronic lab notebook. Save your work.
- This will be the starting point for our statistical analysis below.
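For comparison, the index, rounding, and missing-value steps above can also be sketched in R. This is not part of the lab's protocol, which uses Excel. The data frame below is a made-up stand-in for the normalized data, the "Standard Name" column is left out because it requires the YEASTRACT lookup, and missing values stay as NA instead of being replaced by a space.

```r
# Toy stand-in for the normalized data: gene IDs as row names, one column per chip.
norm <- data.frame(chipA = c(0.123456, NA, -1.987654),
                   chipB = c(2.000049, 0.5, NA),
                   row.names = c("YAL001C", "YAL002W", "YAL003W"))

master <- data.frame(MasterIndex = seq_len(nrow(norm)),  # 1, 2, 3, ... to restore row order
                     ID = rownames(norm),
                     round(norm, 4),                      # round to four decimal places
                     stringsAsFactors = FALSE)

# Number of missing values, analogous to the count of "#VALUE!" replacements.
n_missing <- sum(is.na(master[, -(1:2)]))
print(master)
print(n_missing)
```

Sorting `master` by `MasterIndex` at any later point restores the original gene order.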