Dahlquist:Microarray Data Processing in R: Difference between revisions

From OpenWetWare
Revision as of 16:59, 11 October 2011



Normalizing the Data

First, target files must be created to read all of the GPR files into R. To do so, open Microsoft Excel. Label the first column "FileName" and list all of the GPR file names in it. Follow it with a column labeled "Header" that holds the column name to use for each GPR file, then columns labeled "Strain", "TimePoint", "Flask", and "DyeSwap" for each GPR file. Make sure every row's entries match its GPR file, and save the sheet as a CSV file. Repeat this step for the GCAT chips, placing the top nine chips in rows 2-10 and the bottom nine in rows 11-19. You will also need to load a CSV file containing the gene IDs into R for both the Ontario and GCAT chips. This file can be made by running a single GPR file through R up to the step where duplicate spots are averaged. At that step R alphabetizes the IDs along with their corresponding spots; the result can be saved to a table and copied to a new CSV file. This file can then be loaded into R and used to add the gene IDs to the data. Use these lines of code:

Into the R Console window, type in "library(limma)".

  • Set the directory (File > Change dir...) to the folder containing the Ontario GPR files. Load the target file into R and read the first GPR file so that R can Loess normalize it.
Targets<-read.csv("Targets.csv",sep=",")
f<-function(x) as.numeric(x$Flags > -99)
RG<-read.maimages(Targets[1,1],source="genepix.median",wt.fun=f)
  • Loess Normalize the Ontario data.
MA<-normalizeWithinArrays(RG, method="loess", bc.method="normexp")
  • Tell R how many rows the target matrix is going to have.
M1<-tapply(MA$M[,1],as.factor(MA$genes[,5]),mean)
n1<-length(M1)
  • Tell R how many columns the target matrix is going to have.
n0<-length(MA$M[1,])
  • Create the target matrix
MM<-matrix(nrow=n1,ncol=n0)
  • Average the duplicate spots so that only unique spots remain.
MM<-tapply(MA$M[,1],as.factor(MA$genes[,5]),mean)
  • Write a table to your directory so you can save the names for your Gene IDs target file.
write.table(MM,"ONT_Index_ID.csv",sep=",",row.names=TRUE,append=FALSE)

Open the CSV file you just created. Delete the second column with the numerical values. Relabel cell A1 as "ID" and save the file.
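
The averaging step above relies on tapply, which groups values by name, averages each group, and returns the groups sorted by name. The following is a minimal sketch of that behavior on made-up data; the spot names and values are invented for illustration and do not come from any GPR file.

```r
# Toy stand-ins for MA$M[,1] (log ratios) and MA$genes[,5] (spot names).
# All names and values here are invented for illustration.
m_values   <- c(0.50, 1.50, -1.00, 2.00, 0.00)
spot_names <- c("YAL001C", "YAL001C", "ACT1", "YBR020W", "YBR020W")

# tapply groups the values by name and applies mean() to each group.
averaged <- tapply(m_values, as.factor(spot_names), mean)

print(averaged)          # one entry per unique name, in sorted order
print(names(averaged))   # this ordering is what the ID file records
```

Because the names come back in sorted order, the ID file written from this result lines up row for row with any later matrix built by the same tapply call.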

For the GCAT chips use these lines of code:

  • Set the directory (File > Change dir...) to the folder containing the GCAT GPR files. Read the GCAT target file into R.
targets<-read.csv("GCAT_Targets.csv",sep=",")
f<-function(x) as.numeric(x$Flags > -99)
RT<-read.maimages(targets[1:9,1],source="genepix.median",wt.fun=f)

  • Loess normalize the GCAT GPR files
MAG<-normalizeWithinArrays(RT,method="loess",bc.method="normexp")
  • Tell R how many rows the target matrix will have.
R1<-tapply(MAG$M[,1],as.factor(MAG$genes[,4]),mean)
r1<-length(R1)
  • Tell R how many columns the target matrix will have.
r0<-length(MAG$M[1,])
  • Create the target matrix
RR<-matrix(nrow=r1,ncol=r0)
  • Average all duplicate spots so that only unique spots remain.
RR<-tapply(MAG$M[,1],as.factor(MAG$genes[,4]),mean)
  • Write this to a table and save it as a CSV file for later use.
write.table(RR,"GCAT_ID.csv",sep=",",row.names=TRUE,append=FALSE)

Open the CSV file you just created. Delete the second column with the numerical values. Relabel cell A1 as "ID" and save the file.

You will also need a CSV file consisting of a single column with the correct order of the Headers. This must be ordered by Strain (starting with the wild type and then the deletion strains in alphabetical order) then by TimePoint and finally by Flask.

Also make sure that the target file and the GPR files for the Ontario chips are in one folder and that the target file and the GPR files for the GCAT chips are in another folder.
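
One way to confirm the folder layout before reading any chips is to check that every name in the "FileName" column of the target file exists in the folder. The sketch below assumes the column is named "FileName" as described above; the helper missing_gprs and the demo file names are hypothetical and exist only for this illustration.

```r
# Hypothetical helper: report target-file entries that have no matching
# file in the given folder (defaults to the current working directory).
missing_gprs <- function(targets, dir = getwd()) {
  files <- as.character(targets$FileName)
  files[!file.exists(file.path(dir, files))]
}

# Self-contained demonstration in a temporary folder with invented names.
demo_dir <- tempfile("gpr_demo")
dir.create(demo_dir)
file.create(file.path(demo_dir, "chip1.gpr"))
demo_targets <- data.frame(FileName = c("chip1.gpr", "chip2.gpr"))

print(missing_gprs(demo_targets, demo_dir))   # lists chip2.gpr only
```

With a real target file, calling missing_gprs(Targets) after setting the directory should return an empty character vector if the folder is complete.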

Open up R. Change the directory (File > Change dir...) to the folder containing the target file and the GPR file for the Ontario chips.

  • Read the Ontario "Target" CSV file and the "ID" CSV file into R.
Targets<-read.csv("Targets.csv",sep=",")
Names<-read.csv("ONT_Index_ID.csv",sep=",")
f<-function(x) as.numeric(x$Flags > -99)
  • Separate the individual columns into their own locations in R so they may be called on later with ease.
ds<-Targets[,6]
row<-Names[,2]
col<-Targets[,2]
  • Read the GPR files into R so they can be normalized.
RG<-read.maimages(Targets[,1],source="genepix.median",wt.fun=f)
  • Loess normalize the GPR files. R processes every GPR file in this step, so the more GPR files there are, the longer it will take.
MA<-normalizeWithinArrays(RG, method="loess", bc.method="normexp")
  • Tell R how many rows the target matrix is going to have.
M1<-tapply(MA$M[,1],as.factor(MA$genes[,5]),mean)
n1<-length(M1)
  • Tell R how many columns the target matrix is going to have.
n0<-length(MA$M[1,])
  • Create the new matrix.
MM<-matrix(nrow=n1,ncol=n0)
  • Average all duplicate spots so that only unique spots remain.
for(i in 1:94) {MM[,i]<-tapply(MA$M[,i],as.factor(MA$genes[,5]),mean)}
  • Tell R how many rows the target matrix will have.
M2<-tapply(MA$M[,1],as.factor(MA$genes[,5]),mean)
n3<-length(M2)
  • Tell R how many columns the target matrix will have.
n2<-length(MA$M[1,])
  • Create the target matrix.
MN<-matrix(nrow=n3,ncol=n2)
  • Dye swap the GPR files.
for(i in 1:94) {MN[,i]<-ds[i]*MM[,i]}
  • Tell R that the resulting matrix should be a data frame.
MO<-as.data.frame.matrix(MN)
  • Assign Headers to the data frame's columns.
colnames(MO)<-col
  • Assign IDs to the data frame's rows.
rownames(MO)<-row
  • Tell R to dispose of the two Ontario controls.
ont1<-subset(MO,row.names(MO)!="Arabidopsis")
MP<-subset(ont1,row.names(ont1)!="3XSSC")
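
The dye-swap and control-removal steps above can be illustrated on a small matrix. This is a sketch under two assumptions: that the DyeSwap column holds 1 for normal chips and -1 for dye-swapped chips, and that the loop range is taken from the matrix rather than typed in as 94. All object names ending in _demo and all values are invented.

```r
# Toy matrix of averaged log ratios: 4 spots x 3 chips (invented values).
MM_demo <- matrix(c( 1.0, -2.0,  0.5,  0.1,
                     0.8, -1.5,  0.4,  0.2,
                    -1.1,  2.1, -0.6,  0.0),
                  nrow = 4,
                  dimnames = list(c("3XSSC", "ACT1", "Arabidopsis", "YAL001C"),
                                  c("chipA", "chipB", "chipC")))
ds_demo <- c(1, 1, -1)   # assumed coding: -1 marks a dye-swapped chip

# Multiply every column by its dye-swap factor; seq_len(ncol(...)) avoids
# hard-coding the number of chips.
MN_demo <- MM_demo
for (i in seq_len(ncol(MM_demo))) MN_demo[, i] <- ds_demo[i] * MM_demo[, i]

# The same operation without a loop gives an identical result.
stopifnot(all.equal(MN_demo, sweep(MM_demo, 2, ds_demo, "*")))

# Drop both control spots in one step.
MP_demo <- MN_demo[!rownames(MN_demo) %in% c("Arabidopsis", "3XSSC"), ]
print(MP_demo)
```

Taking the loop range from the data means the same lines keep working if chips are added or removed from the target file.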

Switch to the GCAT directory and start on the GCAT chips.

  • Read the GCAT target file into R.
targets<-read.csv("GCAT_Targets.csv",sep=",")
f<-function(x) as.numeric(x$Flags > -99)
  • Separate the Top and Bottom chips into their own locations in R.
RT<-read.maimages(targets[1:9,1],source="genepix.median",wt.fun=f)
RB<-read.maimages(targets[10:18,1],source="genepix.median",wt.fun=f)
  • Combine the Top GPRs with the Bottom GPRs so that there are only 9 chips left.
RGG<-rbind(RT,RB)
  • Loess normalize the GCAT data.
MAG<-normalizeWithinArrays(RGG,method="loess",bc.method="normexp")
  • Tell R how many rows the target matrix should have.
R1<-tapply(MAG$M[,1],as.factor(MAG$genes[,4]),mean)
r1<-length(R1)
  • Tell R how many columns the target matrix should have.
r0<-length(MAG$M[1,])
  • Create the target matrix.
RR<-matrix(nrow=r1,ncol=r0)
  • Average any duplicate spots in the GCAT data so that only unique spots remain.
for(i in 1:9) {RR[,i]<-tapply(MAG$M[,i],as.factor(MAG$genes[,4]),mean)}
  • Read the GCAT IDs file into R.
GNames<-read.csv("GCAT_ID.csv",sep=",")
  • Separate the column Headers into their own location.
Gcol<-targets[1:9,2]
  • Separate the row ID names into their own location.
Grow<-GNames[,2]
  • Tell R the target matrix should be a data frame instead.
GD<-as.data.frame.matrix(RR)
  • Assign column names to the data frame.
colnames(GD)<-Gcol
  • Assign row names to the data frame.
rownames(GD)<-Grow
  • Merge the GCAT and Ontario data into a single data frame. Any ID that appears on one chip but not the other will show up as NA in the data frame.
Q<-merge(MP,GD,by="row.names",all=T)
  • Tell R to get rid of the spots that are only in the GCAT chips, and keep all the spots that are in the Ontario chips, for the entire data set.
Z<-subset(Q,Q[,1] %in% Names[,2])
  • Specify the number of columns in the target matrix.
x0<-length(Z[1,])
  • Specify the number of rows in the target matrix.
x1<-length(Z[,1])
  • Create the target matrix.
XX<-matrix(nrow=x1,ncol=x0)
  • Tell R the row names from the merged data are the same as the row names for the new target matrix.
XX[,1]=Z[,1]
  • Tell R that the column Headers from the merged data are the same as the Headers for the new target matrix.
colnames(XX)=colnames(Z)
  • Divide each chip by its own MAD to scale the data.
for(i in 2:104) {XX[,i]<-Z[,i]/mad(Z[,i],na.rm=TRUE)}
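
The merge, filter, and MAD-scaling steps above can be traced on two tiny data frames. This is only a sketch with invented IDs and values; unlike the code above, it moves the IDs into row names before scaling so that the matrix holds numbers only, which is a design choice of this example rather than part of the protocol.

```r
# Invented example data: two small data frames keyed by row names.
ont_demo  <- data.frame(ont1  = c(1, 2, 4), row.names = c("ACT1", "CIN5", "GLN3"))
gcat_demo <- data.frame(gcat1 = c(3, 6, 9), row.names = c("ACT1", "GLN3", "ZAP1"))

# all = TRUE keeps IDs found on either chip; missing values become NA.
Q_demo <- merge(ont_demo, gcat_demo, by = "row.names", all = TRUE)

# Keep only the IDs that are present on the Ontario chip.
Z_demo <- subset(Q_demo, Q_demo[, 1] %in% rownames(ont_demo))

# Scale each data column by its own MAD; the IDs go to row names so the
# values stay numeric.
scaled_demo <- as.matrix(Z_demo[, -1])
rownames(scaled_demo) <- as.character(Z_demo[, 1])
for (i in seq_len(ncol(scaled_demo))) {
  scaled_demo[, i] <- scaled_demo[, i] / mad(scaled_demo[, i], na.rm = TRUE)
}
print(scaled_demo)
```

After scaling, every column has a MAD of 1, which is a quick way to check that the loop covered all of the chips.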

Make sure R is set to the directory where your ordered header CSV file is located.

  • Read the correctly ordered Headers into R.
XZ<-read.csv("Ordered_headers.csv",sep=",")
  • Tell R that the chip data should be a data frame instead of a matrix.
XV<-as.data.frame.matrix(XX)
  • Sort the columns from the data frame into a new data frame using the ordered headers as the sorting criteria.
XY<-XV[,match(XZ[,1],colnames(XV))]
  • Write the final data set to a table. It should contain all of the data, Loess normalized, scaled after the controls were removed, and sorted into the correct order.
write.table(XY,"Master_.csv",sep=",",col.names=NA,row.names=TRUE,append=FALSE)
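
The column sort above depends on match, which returns the position of each ordered header among the current column names. A minimal sketch with invented header names follows; the check on idx is an addition for illustration and is not part of the protocol.

```r
# Invented example: three chip columns in arbitrary order.
XV_demo <- data.frame(dGLN3_t15_1 = c(0.2, -0.1),
                      wt_t15_1    = c(1.0,  0.5),
                      dCIN5_t15_1 = c(-0.3, 0.4))

# Desired order: wild type first, then deletion strains alphabetically.
ordered_headers <- c("wt_t15_1", "dCIN5_t15_1", "dGLN3_t15_1")

# match() returns, for each ordered header, its column position in XV_demo.
idx <- match(ordered_headers, colnames(XV_demo))
stopifnot(!anyNA(idx))   # an NA here means a header is misspelled or missing

XY_demo <- XV_demo[, idx]
print(colnames(XY_demo))
```

Checking for NA before indexing catches a header in the ordered list that does not exactly match a column name in the data.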

Alternate Way to Filter GCAT Chips

There is an alternate set of code that can be used to filter genes from the GCAT chips. The code below can replace the code above for the GCAT chips, starting at the point at which the GCAT and Ontario data are merged.

  • Get rid of the GCAT genes that are not on the Ontario chips.
names.to.keep<-row
GO<-subset(RR,row.names(RR) %in% names.to.keep)
  • Take out the controls (Arabidopsis and 3XSSC) from the Ontario chip.
ont1<-subset(row,Names[,2]!="Arabidopsis") 
ont2<-subset(ont1,Names[,2]!="3XSSC")
  • Find the Ontario genes that the GCAT chips do not have data for.
ONT<-subset(ont2,!Names[,2] %in% GCATgenes)
  • Create a new matrix for the subset of Ontario genes that are not on the GCAT chips. There should be as many rows as there are Ontario genes in the subset (ONT). There should be as many columns as there are GCAT GPR files.
subO<-matrix(nrow=length(ONT[,1]),ncol=length(GNames[,2]))

  • Convert the matrix to a data frame.
subO2<-as.data.frame.matrix(subO)
  • Label the rows with the names of the Ontario genes in the subset (ONT).
rownames(subO2)<-ONT$Names[,2]
  • Label the columns with the names of the GPR files.
colnames(subO2)<-GNames[,2]

  • Bind the filtered GCAT data (GO) to the data frame with the Ontario genes not on the GCAT chips (subO2).
G<-rbind(GO,subO2)
  • Sort the final data frame so that the genes are in alphabetical order.
G.sort<-G[order(row.names(G)),]
  • Merge GCAT and Ontario within array normalized data.
merged<-cbind(G.sort,MP)
  • Read .csv file with the list of all the headers (the GPR files) in the correct order.
MasterList<-read.csv(file.choose())
  • Rearrange the columns so that they are ordered by strain (wildtype then deletion strains in alphabetical order), timepoint, and then flask.
merged.sort<-merged[,match(MasterList[,1],colnames(merged))]
  • Write the within array normalized data for all chips to a table.
write.table(merged.sort,"GCAT_and_Ontario_WAnorm.csv",sep=",",col.names=NA,row.names=TRUE)
  • Create a blank matrix with as many rows (r) and as many columns (col) as there are in the data frame holding the merged within array normalized data for the GCAT and Ontario chips.
r<-length(merged.sort[,1])
col<-length(merged.sort[1,])
MADM<-matrix(nrow=r,ncol=col)
  • Scale each GPR file by its own MAD.
for (i in 1:103) {MADM[,i]<-merged.sort[,i]/mad(merged.sort[,i],na.rm=TRUE)}
  • Convert the matrix into a data frame.
merged.MAD<-as.data.frame.matrix(MADM)
  • Label the rows with the row names from the merged and sorted (merged.sort) data frame.
rownames(merged.MAD)<-row.names(merged.sort)
  • Label the columns with the column names from the merged and sorted (merged.sort) data frame.
colnames(merged.MAD)<-colnames(merged.sort)
  • Write scale normalized data for all chips to a table.
write.table(merged.MAD,"ONT_and_GCAT_final_scaled_data.csv",sep=",",col.names=NA,row.names=TRUE)
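
The idea behind the alternate filter, padding the GCAT data with empty rows for Ontario genes it lacks so that both data sets can be bound side by side, can be sketched on toy data. Everything below is invented for illustration (object names, IDs, values), and it uses setdiff to find the missing IDs, which is an assumption of this sketch rather than the method in the code above.

```r
# Invented example: GCAT data lacks one gene that the Ontario chip has.
gcat_demo <- data.frame(g1 = c(0.5, -0.2), g2 = c(0.1, 0.3),
                        row.names = c("GLN3", "ACT1"))
ontario_ids <- c("ACT1", "CIN5", "GLN3")

# IDs on the Ontario chip with no GCAT measurement.
missing_ids <- setdiff(ontario_ids, rownames(gcat_demo))

# A block of NA rows with the same columns as the GCAT data.
pad <- as.data.frame(matrix(NA_real_, nrow = length(missing_ids),
                            ncol = ncol(gcat_demo),
                            dimnames = list(missing_ids, colnames(gcat_demo))))

# Stack and sort by ID so the rows line up with the Ontario data for cbind().
G_demo <- rbind(gcat_demo, pad)
G_demo <- G_demo[order(rownames(G_demo)), ]
print(G_demo)
```

Sorting both data sets by ID before cbind matters because cbind joins by row position, not by row name.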

Generating MA Plots and Boxplots

Use the following lines of code to create MA plots and boxplots for the GCAT chips.

First, you will create MA plots for the data before the normalization has occurred.

  • Set the dimensions of the window in which the graphs will appear to reflect the number of graphs that need to be fit into the window. Originally, there were 9 GCAT chips so in the line of code below there are 3 columns and 3 rows of graphs.
par(mfrow=c(3,3))
  • Set a variable (GeneList) for all of the GCAT gene IDs before the controls have been taken out and before replicates have been averaged.
GeneList<-RGG$genes$ID
  • Calculate the log fold changes (M values) for each spot on each chip before normalization has occurred.
lr<-(log2((RGG$R-RGG$Rb)/(RGG$G-RGG$Gb)))
  • Create a blank matrix with as many columns as there are GPR files and as many rows as there are genes after replicates have been averaged.
r0<-length(lr[1,])
RX<-tapply(lr[,1],as.factor(GeneList),mean)
r1<-length(RX)
MG<-matrix(nrow=r1,ncol=r0)
  • Calculate the log fold changes (M values) for each spot on each chip after averaging duplicate genes. In the for loop, alter the range to reflect the number of GPR files.
for(i in 1:9) {MG[,i]<-tapply(lr[,i],as.factor(GeneList),mean)}
  • Calculate the intensity values (A values) for each spot on each chip before normalization has occurred.
la<-(1/2*log2((RGG$R-RGG$Rb)*(RGG$G-RGG$Gb)))
  • Create a blank matrix with as many columns as there are GPR files and as many rows as there are genes after replicates have been averaged.
r3<-length(la[1,])
RQ<-tapply(la[,1],as.factor(GeneList),mean)
r4<-length(RQ)
AG<-matrix(nrow=r4,ncol=r3)
  • Calculate the intensity values (A values) after averaging duplicate genes. In the for loop, make sure that the range reflects the number of GPR files.
for(i in 1:9) {AG[,i]<-tapply(la[,i],as.factor(GeneList),mean)}
  • Plot the log fold changes (M) against the intensities (A). In the for loop, make sure that the range reflects the number of GPR files.
for(i in 1:9) {plot(AG[,i],MG[,i],xlab="A",ylab="M",ylim=c(-5,5),xlim=c(0,15))}

Maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window.
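
For reference, the M and A values used in these plots follow directly from the background-corrected red and green intensities. The sketch below applies the same two formulas as the code above to three invented spots; the intensity values are made up.

```r
# Invented foreground (R, G) and background (Rb, Gb) intensities for
# three spots on one chip.
R  <- c(1100, 600, 2100)
Rb <- c(100, 100, 100)
G  <- c(600, 600, 600)
Gb <- c(100, 100, 100)

M <- log2((R - Rb) / (G - Gb))          # log fold change
A <- 0.5 * log2((R - Rb) * (G - Gb))    # average log intensity

print(M)
print(round(A, 3))
```

Note that a spot whose background exceeds its foreground gives a negative value inside log2, which R reports as NaN with a warning, so such spots drop out of the plots.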

Next, you will create MA plots for the data after within array normalization has been performed.

  • Set the dimensions of the window in which the graphs will appear to reflect the number of graphs that need to be fit into the window.
par(mfrow=c(3,3))

The log fold changes after normalization are saved in R's memory under the variable RR. Therefore, only the intensity values have to be calculated after within array normalization has occurred.

  • Create a blank matrix with as many columns as there are GPR files and as many rows as there are genes after duplicates have been averaged.
X1<-tapply(MAG$A[,1],as.factor(MAG$genes[,4]),mean)
y0<-length(MAG$A[1,])
y1<-length(X1)
AAG<-matrix(nrow=y1,ncol=y0)
  • Calculate the intensity values (A) after normalization has occurred and after duplicate genes have been averaged. In the for loop, make sure that the range reflects the number of GPR files.
for(i in 1:9) {AAG[,i]<-tapply(MAG$A[,i],as.factor(MAG$genes[,4]),mean)}
  • Plot the log fold changes (M) against the intensities (A). In the for loop, make sure that the range reflects the number of GPR files.
for(i in 1:9) {plot(AAG[,i],RR[,i],ylab="M",xlab="A",ylim=c(-5,5),xlim=c(0,15))}

Maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window.

Use the following code to generate boxplots of the log fold changes for the GCAT chips before normalization has occurred, after within array normalization has been performed, and after scale normalization (dividing each chip by its MAD) has occurred.

  • Change the dimensions of the window in which the graphs will appear to reflect how many graphs need to be fit into the window. Since you will be generating three graphs, one for each stage in the normalization process, you can set the dimensions to one row with three columns.
par(mfrow=c(1,3))
  • Create a boxplot of the log fold changes before normalization has occurred. The number within the brackets next to the variable designating the matrix of nonnormalized log fold changes denotes a GPR file. Also, set the range of the y-axis (ylim) so that the range of the boxplot for each GPR file is visible.
boxplot(MG[,1],MG[,2],MG[,3],MG[,4],MG[,5],MG[,6],MG[,7],MG[,8],MG[,9],ylim=c(-5,5))
  • Create a boxplot of the log fold changes after within array normalization has occurred. The number within the brackets next to the variable designating the matrix of within array normalized log fold changes denotes a GPR file. Also, make sure that the range of the y axis (ylim) is the same as in the previous set of boxplots of the nonnormalized data.
boxplot(RR[,1],RR[,2],RR[,3],RR[,4],RR[,5],RR[,6],RR[,7],RR[,8],RR[,9],ylim=c(-5,5))
  • Create a boxplot of the log fold changes after scale normalization has occurred. The number within the brackets next to the variable designating the matrix of scale normalized log fold changes denotes a GCAT GPR file within the matrix of all of the scale normalized data for all of the chips (both Ontario and GCAT). Therefore, it is important to make sure that you have the right order of GCAT GPR files. Also, make sure that the range of the y axis (ylim) is the same as in the previous set of boxplots.
boxplot(XY[,1],XY[,5],XY[,6],XY[,10],XY[,11],XY[,14],XY[,15],XY[,19],XY[,20],ylim=c(-5,5))

Maximize the window in which the plots have appeared. Save the plots as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window.
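
As a possible shortcut, boxplot accepts a whole matrix and draws one box per column, and the jpeg device can write the plot to a file without using the menu. The sketch below uses random numbers in place of real log fold changes and a temporary file path; both are assumptions made for this example only.

```r
# Invented matrix standing in for MG: 50 spots x 9 chips of random values.
set.seed(1)
MG_demo <- matrix(rnorm(50 * 9), nrow = 50, ncol = 9,
                  dimnames = list(NULL, paste0("chip", 1:9)))

# Given a matrix, boxplot() treats each column as one group.
# plot = FALSE returns the box statistics without drawing anything.
bp_stats <- boxplot(MG_demo, plot = FALSE)
print(ncol(bp_stats$stats))   # number of boxes that would be drawn

# Write the plot to a JPEG file if this R build supports the jpeg device.
out_file <- tempfile(fileext = ".jpg")
if (capabilities("jpeg")) {
  jpeg(out_file, width = 900, height = 600, quality = 100)
  boxplot(MG_demo, ylim = c(-5, 5), ylab = "M")
  dev.off()
}
```

To plot only some chips, index the matrix by column, for example MG_demo[, c(1, 5, 6)], instead of listing each column as a separate argument.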

Use the following lines of code to create MA plots and boxplots for the Ontario chips.

First, you will create MA plots for the wildtype data before the normalization has occurred.

  • Set the dimensions of the window in which the graphs will appear to reflect the number of graphs that need to be fit into the window. There will be one graph for each GPR file. Since there were originally 14 GPR files for the wildtype, the code below creates a window to fit four rows and four columns of graphs.
par(mfrow=c(4,4))
  • Set a variable (genelist) for all of the Ontario gene IDs before the controls have been taken out and before replicates have been averaged.
genelist<-RG$genes$Name
  • Calculate the log fold changes (M values) for each spot on each chip before normalization has occurred. The log fold changes should also be multiplied by the list of dyeswaps taken from the targets file previously imported into R, one dyeswap value per chip. In the for loop, alter the range to reflect the number of GPR files in RG for all strains.
lfm<-log2((RG$R-RG$Rb)/(RG$G-RG$Gb))
for(i in 1:94) {lfm[,i]<-ds[i]*lfm[,i]}
  • Create a blank matrix with as many columns as there are GPR files for the wildtype and as many rows as there are genes after replicates have been averaged.
z0<-length(lfm[1,])
ZX<-tapply(lfm[,1],as.factor(genelist),mean)
z1<-length(ZX)
MZ<-matrix(nrow=z1,ncol=z0)
  • Calculate the log fold changes (M values) for each spot on each chip for the wildtype after averaging duplicate genes. In the for loop, alter the range to reflect the number of GPR files for the wildtype.
for(i in 1:14) {MZ[,i]<-tapply(lfm[,i],as.factor(genelist),mean)}
  • Calculate the intensity values (A values) for each spot for each chip for the wildtype before normalization has occurred.
lfa<-(1/2*log2((RG$R-RG$Rb)*(RG$G-RG$Gb)))
  • Create a blank matrix with as many columns as there are GPR files for the wildtype and as many rows as there are genes after replicates have been averaged.
z3<-length(lfa[1,])
ZQ<-tapply(lfa[,1],as.factor(genelist),mean)
z4<-length(ZQ)
AZ<-matrix(nrow=z4,ncol=z3)
  • Calculate the intensity values (A values) for each spot on each chip for the wildtype after averaging duplicate genes. In the for loop, alter the range to reflect the number of GPR files for the wildtype.
for(i in 1:14) {AZ[,i]<-tapply(lfa[,i],as.factor(genelist),mean)}
  • Plot the log fold changes (M) against the intensities (A). In the for loop, make sure that the range reflects the number of GPR files for the wildtype.
for(i in 1:14) {plot(AZ[,i],MZ[,i],xlab="A",ylab="M",ylim=c(-5,5),xlim=c(0,15))}

Maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window.

Next, you will create MA plots for the wildtype data after within array normalization has been performed.

  • Set the dimensions of the window in which the graphs will appear to reflect the number of graphs that need to be fit into the window. There will be one graph for each GPR file.
par(mfrow=c(4,4))

The within array normalized log fold changes are already in R's memory under the variable MN. Therefore, just the intensity values have to be calculated after within array normalization has occurred.

  • Create a blank matrix with as many columns as there are GPR files for the wildtype and as many rows as there are genes after replicates have been averaged.
v1<-tapply(MA$A[,1],as.factor(MA$genes[,5]),mean)
w0<-length(MA$A[1,])
w1<-length(v1)
AAO<-matrix(nrow=w1,ncol=w0)
  • Calculate the intensity values (A) after normalization has occurred and after duplicate genes have been averaged. In the for loop, make sure that the range reflects the number of GPR files for the wildtype.
for(i in 1:14) {AAO[,i]<-tapply(MA$A[,i],as.factor(MA$genes[,5]),mean)}
  • Plot the log fold changes (M) against the intensities (A). In the for loop, make sure that the range reflects the number of GPR files for the wildtype.
for(i in 1:14) {plot(AAO[,i],MN[,i],ylab="M",xlab="A",ylim=c(-5,5),xlim=c(0,15))}

Maximize the window in which the graphs have appeared. Save the graphs as a JPEG (File>Save As>JPEG>100% quality...). Once the graphs have been saved, close the window.

Use the following code to generate boxplots of the log fold changes for the wildtype chips before normalization has occurred, after within array normalization has been performed, and after scale normalization (dividing each chip by its MAD) has occurred.

  • Create a boxplot of the log fold changes before normalization has occurred. The number within the brackets next to the variable designating the matrix of nonnormalized log fold changes denotes a GPR file. Also, set the range of the y-axis (ylim) so that the range of the boxplot for each GPR file is visible.
boxplot(MZ[,1],MZ[,2],MZ[,3],MZ[,4],MZ[,5],MZ[,6],MZ[,7],MZ[,8],MZ[,9],MZ[,10],MZ[,11],MZ[,12],MZ[,13],MZ[,14],ylim=c(-5,5))
  • Create a boxplot of the log fold changes after within array normalization has occurred. The number within the brackets next to the variable designating the matrix of within array normalized log fold changes denotes a GPR file. Also, make sure that the range of the y axis (ylim) is the same as in the previous set of boxplots of the nonnormalized data.
boxplot(MN[,1],MN[,2],MN[,3],MN[,4],MN[,5],MN[,6],MN[,7],MN[,8],MN[,9],MN[,10],MN[,11],MN[,12],MN[,13],MN[,14],ylim=c(-5,5))
  • Create a boxplot of the log fold changes after scale normalization has occurred. The number within the brackets next to the variable designating the matrix of scale normalized log fold changes denotes an Ontario GPR file within the matrix of all of the scale normalized data for all of the chips (both Ontario and GCAT). Therefore, it is important to make sure that you have the right order of Ontario GPR files. Also, make sure that the range of the y axis (ylim) is the same as in the previous set of boxplots.
boxplot(XY[,2],XY[,3],XY[,4],XY[,7],XY[,8],XY[,9],XY[,12],XY[,13],XY[,16],XY[,17],XY[,18],XY[,21],XY[,22],XY[,23],ylim=c(-5,5))

After MA plots and boxplots for the wildtype have been generated, you should make the same types of plots for the deletion strains. Work with one strain at a time, creating the MA plots and the three different boxplots for that strain before moving on to another strain. The same code as shown above for the Ontario chips can be used for the deletion strains with some modifications. When designating the dimensions of the window in which the plots will appear, make sure that there are enough rows and columns to fit a graph for each GPR file for the strain. You do not have to re-enter the code that assigns the Ontario gene IDs to a variable, the code that calculates the log fold changes before normalization, or the code that calculates intensities before normalization. For the MA plots, the range of the for loop must match the positions of the GPR files for the strain you are working on. For the boxplots, the number in the bracket next to the variable must correspond to the correct GPR file for the strain you are working on. When generating the boxplot for the nonnormalized data, refer to the target file for the correct order of the GPR files for the strain. When generating the boxplot for the within array normalized data, refer to the data frame with the within array normalized data (MN) for the correct order of the GPR files for the strain. When generating the boxplot for the scale normalized data, refer to the final R output with the scale normalized data for all the chips for the correct order of the GPR files for the strain. When generating MA plots and boxplots for different strains, keep the x and y limits of the MA plots and the y limits of the boxplots the same for all the strains.
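
Rather than reading column positions off the target file by eye, the positions for one strain can be looked up from its "Strain" column. This is a sketch with an invented four-chip target table; the strain labels and header names are hypothetical, and the positions it returns apply only to matrices whose columns follow the target file order (such as MZ and MN), not to the reordered final output.

```r
# Invented targets table following the column layout described earlier.
targets_demo <- data.frame(
  Header = c("wt_t15_1", "dCIN5_t15_1", "wt_t30_1", "dCIN5_t30_1"),
  Strain = c("wt", "dCIN5", "wt", "dCIN5"),
  stringsAsFactors = FALSE
)

# Column positions of the chips belonging to one strain.
strain_cols <- which(targets_demo$Strain == "dCIN5")
print(strain_cols)

# These positions can drive both the loop range and the boxplot call, e.g.
#   for (i in strain_cols) plot(AAO[, i], MN[, i], ylim = c(-5, 5))
#   boxplot(MN[, strain_cols], ylim = c(-5, 5))
```

For the scale normalized output, whose columns have been reordered, the matching positions could instead be found by looking up the strain's Header values among the column names with match.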