User:Lindenb/Notebook/UMR915/20101210: Difference between revisions
From OpenWetWare
Revision as of 07:35, 10 December 2010
Belgium
/usr/local/package/mosaik-aligner/bin/MosaikSort -in align2.mka -out align2.sorted.mka
MosaikText -in align2.sorted.mka -bam sample2.bam
recalibrate with GATK:
create a subset of dbsnp_129.rod for my ranges
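One possible way to build that subset is sketched below. It assumes that dbsnp_129.rod is the tab-delimited UCSC snp129 table dump (bin, chrom, chromStart, chromEnd, name, ...) with 0-based starts, which the notebook does not confirm. The function name subset_rod is hypothetical, and chrX, x1 and x2 are the same placeholders used elsewhere on this page.

```shell
# Usage (placeholders): subset_rod chrX x1 x2 < dbsnp_129.rod > dbsnp_129_chrXXXX.rod
# Assumption: tab-delimited input with column 2 = chrom,
# column 3 = chromStart (0-based) and column 4 = chromEnd.
# START and END are given as 1-based inclusive positions.
subset_rod() {
  awk -F '\t' -v chrom="$1" -v start="$2" -v end="$3" '
    # keep records on the requested chromosome that overlap [start, end]
    $2 == chrom && $4 >= start && $3 < end
  '
}
```

The filter only reads standard input and writes matching records unchanged, so the column layout of the subset stays the same as in the full file.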
java -jar GenomeAnalysisTK.jar -I sample1.bam -R chrXXXXX.fa -T CountCovariates -l INFO -recalFile recal_data1.csv -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate --DBSNP dbsnp_129_chrXXXX.rod -U
sample1:
INFO 15:31:10,995 TraversalEngine - [PROGRESS] Traversed 5,625,164 sites in 73.96 secs (13.15 secs per 1M sites)
INFO 15:31:10,996 TraversalEngine - Total runtime 73.96 secs, 1.23 min, 0.02 hours
INFO 15:31:10,998 TraversalEngine - 89 reads were filtered out during traversal out of 187197 total (0.05%)
INFO 15:31:10,998 TraversalEngine - -> 89 reads (0.05% of total) failing ZeroMappingQualityReadFilter
sample2:
What does recal_data1.csv look like?
# Counted Sites 4619191
# Counted Bases 80003987
# Skipped Sites 18142
# Fraction Skipped 1 / 255 bp
ReadGroup,QualityScore,Cycle,Dinuc,nObservations,nMismatches,Qempirical
ZDID8XTBKGO,7,106,AA,2,0,40
ZDID8XTBKGO,7,107,AT,2,0,40
ZDID8XTBKGO,7,107,NN,1,0,40
ZDID8XTBKGO,7,107,TT,5,0,40
ZDID8XTBKGO,7,108,AA,1,0,40
ZDID8XTBKGO,7,108,CT,2,0,40
ZDID8XTBKGO,7,108,GA,1,0,40
ZDID8XTBKGO,7,108,GT,1,0,40
ZDID8XTBKGO,7,108,TA,1,1,1
ZDID8XTBKGO,7,108,TT,1,0,40
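The Qempirical column looks like a Phred-scaled mismatch rate. The sketch below recomputes it as -10*log10(nMismatches/nObservations), rounded and clamped to the range 1 to 40. The clamp is only a guess that reproduces the rows shown here; it is not taken from the GATK documentation. The function name qempirical is hypothetical.

```shell
# Usage: qempirical < recal_data1.csv
# Prints "nObservations nMismatches reported recomputed" for each data row.
qempirical() {
  awk -F ',' '
    # skip comment lines, short lines and the column header
    /^#/ || NF < 7 || $1 == "ReadGroup" { next }
    {
      obs = $5; mism = $6
      if (mism == 0) {
        q = 40                                # no mismatches: assumed cap
      } else {
        q = int(-10 * log(mism / obs) / log(10) + 0.5)   # Phred scale, rounded
        if (q > 40) q = 40                    # assumed cap
        if (q < 1) q = 1                      # assumed floor
      }
      print obs, mism, $7, q
    }
  '
}
```

When the last two columns of the output agree, the assumed formula is consistent with the file; a disagreement would mean GATK applies smoothing or different bounds than guessed here.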
calling with GATK
java -jar GenomeAnalysisTK.jar -I sample1.bam -R chrXXXXXXXXXXXXX.fa -T UnifiedGenotyper -o sample1.vcf -stand_call_conf 50.0 -U -S SILENT -L "chrX:x1-x2"
or with a list of positions
java -jar GenomeAnalysisTK.jar -I sample1.bam -R chrXXXXXXXXXXXXX.fa -T UnifiedGenotyper -o sample1.vcf -stand_call_conf 50.0 -U -S SILENT -L ranges.list
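The content of ranges.list is not shown in this notebook. A minimal sketch for producing such a file follows, assuming it holds one interval per line in the same chrom:start-end form as the command-line -L argument, and assuming the ranges start out as a BED file. Both are assumptions, and the name bed_to_ranges and the file targets.bed are hypothetical.

```shell
# Usage (hypothetical file names): bed_to_ranges < targets.bed > ranges.list
# Assumption: input is BED (tab-delimited, 0-based start, end-exclusive);
# output is one 1-based inclusive "chrom:start-end" interval per line.
bed_to_ranges() {
  awk -F '\t' '
    /^(#|track|browser)/ { next }        # skip BED comment and header lines
    NF >= 3 { printf "%s:%d-%d\n", $1, $2 + 1, $3 }
  '
}
```

The only arithmetic is the shift of the start coordinate by one, which accounts for BED starts being 0-based while chrom:start-end intervals are 1-based and inclusive.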