Nick Rohacz: Summer 2011 Week 2


May 25, 2011

Spent the morning trying to match up the log fold changes from the .gpr files before and after global normalization. If global normalization had subtracted the same constant from every spot, the graph of one against the other should follow a straight trend line. However, some of the files produced a fuzzy region around the origin; this can be seen in the separate PowerPoint with the graphs. The individual red and green intensities were graphed next: the normalized red intensities were plotted against the non-normalized red intensities, and the same was done for the green intensities. These graphs should also be linear, but some of them contained scattered, off-trend points. The fuzzy region in the log fold change graphs is thought to result from the "Error" messages scattered throughout the .gpr files, which occur when the intensity ratio is negative, since the log of a negative number is undefined. The number of "Error" messages in each file seemed to correlate with the fuzzy region around the origin; as the number of "Error" entries went up, the fuzziness increased to the point where the relationship was almost no longer linear.
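
Below is a minimal sketch of the comparison plot, assuming the log fold changes for one .gpr file before and after global normalization have already been pulled into two numeric vectors (logFC.before and logFC.after are hypothetical names, and the "Error" spots are assumed to have been read in as NA):

  plot(logFC.before, logFC.after,
       xlab="logFC before global normalization",
       ylab="logFC after global normalization")
  fit <- lm(logFC.after ~ logFC.before)   # simple linear trend line; NA spots are dropped
  abline(fit)
  sum(is.na(logFC.before))   # number of "Error" spots; more of these went with a fuzzier origin region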

May 26, 2011

This morning was spent troubleshooting a line of code given by Dr. Fitzpatrick, M1<-tapply(MA$M[,1],as.factor(MA$genes[,5]),mean), meant to average the duplicate spots in the .gpr files after they were locally normalized using loess. We had to come back later and wrap this code in a for loop so that R would average all of the .gpr files instead of just the one column we had specified. We had been using column 5 because that was the first column with gene names when we were working on the raw .gpr file yesterday; however, in the .gpr file the first four columns are just 1's, whereas the MA data frame that was created did not incorporate those columns. Once this was all figured out, I went to lunch and checked on the S. cerevisiae cultures (see lab notebook for details). When we got back, we spent the remainder of the time making sure the loess normalization worked. After talking with Dr. Fitzpatrick for a bit, we determined this was best done by graphing the normalized against the non-normalized log fold changes after both had been averaged using the tapply function (a sketch of this check is given below). We spent a short amount of time troubleshooting this, mostly clerical errors made while typing the source code, after which we were able to graph the relationship. It came out roughly linear, showing that the loess normalization was working correctly. With this we are able to go back and do the loess normalization on the data from the rest of the strains.
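
A minimal sketch of that check, assuming the RG and MA objects from the limma session and using column 1 as the example array (MA.RG is limma's conversion of an RGList to unnormalized log ratios; the gene name column index follows the code from May 27):

  MA.raw <- MA.RG(RG)                                                   # log ratios without loess normalization
  m.before <- tapply(MA.raw$M[,1], as.factor(MA.raw$genes[,5]), mean)   # average the duplicate spots
  m.after  <- tapply(MA$M[,1],     as.factor(MA$genes[,5]), mean)
  plot(m.before, m.after, xlab="average logFC before loess", ylab="average logFC after loess")
  abline(lm(m.after ~ m.before))                                        # should come out roughly linear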

May 27, 2011

We ran into a few problems on our first attempts at running the code today, even though the same code had run successfully the previous day. The main problem we found was in the first for loop: MM must be subscripted with [,i], or else the output sheet will contain only one column rather than all of the columns that were put into the loop. After a few more typos were taken care of, this was the outcome:

library(limma)   # readTargets, read.maimages, plotMA, and the normalization functions come from limma
par(mfrow=c(2,1))   # two MA plots on one page: before and after within array normalization
  • targets<-readTargets(file.choose())   # choose the targets file listing the .gpr files
  • f<-function(x) as.numeric(x$Flags > -99)   # weight function: zero weight for spots flagged bad (-100)
  • RG<-read.maimages(targets, source="genepix.median", wt.fun=f)
  • plotMA(RG,main="before within array normalization")
  • MA<-normalizeWithinArrays(RG, method="loess", bc.method="normexp")   # normexp background correction, then loess
  • M1<-tapply(MA$M[,1],as.factor(MA$genes[,5]),mean)   # average the duplicate spots for the first array
  • n0<-length(MA$M[1,])   # number of arrays
  • n1<-length(M1)   # number of unique genes
  • MM<-matrix(nrow=n1,ncol=n0)
  • MM[,1]=M1
  • for(i in 1:20) {MM[,i]<-tapply(MA$M[,i],as.factor(MA$genes[,5]),mean)}   # change 20 to the number of .gpr files
  • write.table(MM,"wt_M_avg.csv",sep=",")
  • plotMA(MA,main="after within array normalization")
  • MAB<-normalizeBetweenArrays(MA,method="scale",targets=NULL)
  • b0<-length(MAB$M[1,])   # number of arrays
  • N2<-tapply(MAB$M[,1],as.factor(MAB$genes[,5]),mean)
  • b1<-length(N2)   # number of unique genes
  • MMB<-matrix(nrow=b1,ncol=b0)
  • MMB[,1]=N2
  • for(i in 1:9) {MMB[,i]<-tapply(MAB$M[,i],as.factor(MAB$genes[,5]),mean)}   # change 9 to the number of .gpr files
  • write.table(MMB,"between_array_norm.csv",sep=",")
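
The write-up below mentions boxplots of the data before and after the between array normalization; that step is not in the listing above. A minimal sketch of how it might be done with the MA and MAB objects, one box per array:

  par(mfrow=c(2,1))
  boxplot(as.data.frame(MA$M), main="before between array normalization", ylab="M")
  boxplot(as.data.frame(MAB$M), main="after between array normalization", ylab="M")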

The point of all this code is to take the .gpr files that we were given, read them into R, create an MA plot of the data before normalization, and then normalize it using the locally weighted scatterplot smoother (loess); note that the "e" in loess may stand for "estimated", given that the "we" in lowess stands for "weighted". After this was done, the data was placed into a matrix, because R has trouble converting an MAList into a data frame, and the matrix was then averaged over every duplicate gene using the line of code discussed on May 26th. This was all written out to a .csv table and saved along with the accompanying picture of the MA plots. The data was then put through a between array normalization and placed into a separate matrix, and boxplots of the data before and after the between array normalization were made and saved (a sketch of that step is given just after the code listing above). One note on the code is that all of the for loops must be changed to match the number of .gpr files listed in the targets text file. Work is still progressing on the GCAT chips. First of all, the flask numbering for the GCAT chip, as well as flask 4, must be double checked because of some discrepancies, and also because we would rather not have to run all the data twelve times. However, I was able to load both targets files into R, normalize them, and merge them into a single MAList; the same merge can be done on the RG lists before normalization (a sketch is given below). At the moment I am only testing this code, by running segments and checking the attributes of the outputs, to make sure it will accomplish what I need. Once the flask discrepancies are sorted out with Dr. Dahlquist, a majority of the "New Master List" can be assembled and the GCAT work can be the focus from there.
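
A minimal sketch of the GCAT merge step, assuming two targets files for the GCAT .gpr files (the file names here are hypothetical) and the same weight function f as above; limma's cbind methods combine RGList or MAList objects that share the same genes in the same order:

  targets1 <- readTargets("GCAT_targets_1.txt")   # hypothetical file names
  targets2 <- readTargets("GCAT_targets_2.txt")
  RG1 <- read.maimages(targets1, source="genepix.median", wt.fun=f)
  RG2 <- read.maimages(targets2, source="genepix.median", wt.fun=f)
  MA1 <- normalizeWithinArrays(RG1, method="loess", bc.method="normexp")
  MA2 <- normalizeWithinArrays(RG2, method="loess", bc.method="normexp")
  MA.gcat <- cbind(MA1, MA2)   # single MAList; cbind(RG1, RG2) would merge the RG lists before normalization instead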