User:Timothee Flutre/Notebook/Postdoc/2011/11/16
From OpenWetWare
(Difference between revisions)
(Autocreate 2011/11/16 Entry for User:Timothee_Flutre/Notebook/Postdoc) 
(try pkg snpStats) 

Line 7:
<!-- ##### DO NOT edit above this line unless you know what you are doing. ##### -->
==Entry title==
-  *
+  * try the R/Bioconductor package [http://www.bioconductor.org/packages/devel/bioc/html/snpStats.html snpStats]:  
+  
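+  library(snpStats)
+  # toy genotypes: 5 SNPs (rows) x 2 individuals (columns), coded 1/2/3, with 0 = NC
+  tmp <- matrix(c(1,3,2,1,3,0,1,3,0,1), ncol=2, dimnames=list(paste("snp", 1:5, sep=""), paste("ind", 1:2, sep="")))
+  tmp
+  # a SnpMatrix has individuals in rows and SNPs in columns, hence t()
+  tmp2 <- new("SnpMatrix", t(tmp))
+  tmp2
+  summary(tmp2)
+  print(as(t(tmp2), 'character'))
+  print(as(t(tmp2), 'numeric'))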
+  
+  Unfortunately, it doesn't seem possible to convert a matrix of character genotypes directly into a SnpMatrix (in the numeric encoding used above, 1=AA, 2=AB, 3=BB and 0=NC, i.e. no call):
+  
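+  # same toy data, but as character genotypes ("" = no call)
+  tmp <- matrix(c("A/A","B/B","A/B","A/A","B/B","","A/A","B/B","","A/A"), ncol=2, dimnames=list(paste("snp", 1:5, sep=""), paste("ind", 1:2, sep="")))
+  tmp
+  # this direct conversion does not seem to work
+  tmp2 <- new("SnpMatrix", t(tmp))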
+  
+  Thus, when one has a matrix of genotypes obtained from Illumina (whether written AA or A/A; the patterns below handle the A/A form), it must first be converted to the 1/2/3/0 encoding:
+  
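+  # recode each genotype string; gsub() keeps the matrix dimensions and dimnames
+  tmp <- gsub("A/A", "1", tmp)
+  tmp <- gsub("A/B", "2", tmp)
+  tmp <- gsub("B/B", "3", tmp)
+  # empty strings are missing calls
+  tmp <- gsub("^$", "0", tmp)
+  # switch from character to numeric storage
+  tmp <- matrix(as.numeric(tmp), ncol=ncol(tmp), dimnames=list(rownames(tmp), colnames(tmp)))
+  tmp
+  tmp2 <- new("SnpMatrix", t(tmp))
+  tmp2
+  summary(tmp2)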
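
As an alternative to the chain of gsub() calls, the recoding can be sketched with a named lookup vector, which handles both the AA and the A/A spellings in one pass. This is only a minimal base-R sketch: `recode_genotypes` is a hypothetical helper, not part of snpStats, and treating every unrecognized string as a no call (0) is an assumption made here.

```r
# Hypothetical helper (not part of snpStats): recode Illumina-style genotype
# calls into the 1/2/3/0 encoding using a named lookup vector.
recode_genotypes <- function(geno) {
  # both spellings map to the same code
  codes <- c("AA"=1, "A/A"=1, "AB"=2, "A/B"=2, "BB"=3, "B/B"=3)
  out <- unname(codes[as.vector(geno)])
  # assumption: anything not in the lookup (e.g. "") is a no call, coded 0
  out[is.na(out)] <- 0
  matrix(out, nrow=nrow(geno), ncol=ncol(geno), dimnames=dimnames(geno))
}

# toy input mixing the two spellings, with "" for missing calls
geno <- matrix(c("A/A","B/B","A/B","A/A","B/B","","AA","BB","","AA"), ncol=2,
               dimnames=list(paste("snp", 1:5, sep=""), paste("ind", 1:2, sep="")))
num <- recode_genotypes(geno)
num
```

The resulting numeric matrix has the same layout as `tmp` above, so it could be passed to new("SnpMatrix", t(num)) in the same way.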
+  
+  Then, one can easily look at summary statistics, e.g. histograms of the minor allele frequencies or of the z-scores for HWE, and filter the data accordingly:
+  
+  hist(col.summary(tmp2)$MAF)  
+  hist(col.summary(tmp2)$z.HWE)  
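
The filtering step itself could look roughly like the sketch below. `keep_snps` is a hypothetical helper, the thresholds (MAF of at least 0.05, absolute HWE z-score of at most 4) are illustrative assumptions rather than recommendations, and the toy data frame holds made-up values that only mimic the MAF and z.HWE columns returned by col.summary().

```r
# Hypothetical helper: given a data frame with columns MAF and z.HWE
# (as in the output of col.summary()), flag the SNPs to keep.
# The default thresholds are illustrative assumptions.
keep_snps <- function(snpsum, min.maf=0.05, max.abs.z=4) {
  # SNPs with a missing statistic are dropped
  ok.maf <- !is.na(snpsum$MAF) & snpsum$MAF >= min.maf
  ok.hwe <- !is.na(snpsum$z.HWE) & abs(snpsum$z.HWE) <= max.abs.z
  ok.maf & ok.hwe
}

# made-up summary values, only to exercise the function
toy <- data.frame(MAF=c(0.30, 0.01, 0.20, NA), z.HWE=c(0.5, 1.2, -6.0, 0.1),
                  row.names=paste("snp", 1:4, sep=""))
keep <- keep_snps(toy)
keep

# with snpStats loaded, the same idea would read:
# tmp3 <- tmp2[, keep_snps(col.summary(tmp2))]
```

Returning a logical vector rather than a subset keeps the helper independent of snpStats, so the same flags can be reused to subset any object whose columns are SNPs.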
Revision as of 15:43, 16 November 2011
