User:Lindenb/Notebook/UMR915/20110103
From OpenWetWare
Bonne année !
VCFAnnotator
wrote a tool to annotate VCf files. see my blog: http://plindenbaum.blogspot.com/2011/01/my-tool-to-annotate-vcf-files.html Running this tool with belgium data.
Comparing mask/no mask
mysql -D hg18 -u anonymous -e 'select distinct chrom,txStart,txEnd from knownGene where chrom="chr1" ' -N > /tmp/jeter.txt java -jar dist/maskfasta.jar -r /tmp/jeter.txt -e 10000 /GENOTYPAGE/data/users/lindenb/MASKBWA/NO_MASK/chr1.fa > /GENOTYPAGE/data/users/lindenb/MASKBWA/WITH_MASK/chr1.fa
[lindenb@srv-clc-02 NO_MASK]$ sed 's/N/\nN\n/g' NO_MASK/chr1.fa | grep N | wc -l 22250000 [lindenb@srv-clc-02 NO_MASK]$ sed 's/N/\nN\n/g' WITH_MASK/chr1.fa | grep N | wc -l 108399153
for each chr1.fa:
/usr/local/package/bwa-0.5.9rc1/bwa index chr1.fa /usr/local/package/bwa-0.5.9rc1/bwa aln chr1.fa ../FASTQ/ERR004053_1.recal.fastq.gz > aln1.sai /usr/local/package/bwa-0.5.9rc1/bwa aln chr1.fa ../FASTQ/ERR004053_2.recal.fastq.gz > aln2.sai /usr/local/package/bwa-0.5.9rc1/bwa sampe chr1.fa aln1.sai aln2.sai ../FASTQ/ERR004053_1.recal.fastq.gz ../FASTQ/ERR004053_2.recal.fastq.gz > file.sam /usr/local/package/samtools-0.1.10/samtools faidx chr1.fa /usr/local/package/samtools-0.1.10/samtools view -b -t chr1.fa.fai file.sam > file.bam /usr/local/package/samtools-0.1.10/samtools sort file.bam sorted /usr/local/package/samtools-0.1.10/samtools index sorted.bam /usr/local/package/samtools-0.1.10/samtools pileup -vcf chr1.fa sorted.bam | awk '($3=="*"&&$6>=50)||($3!="*"&&$6>=20)' | cut -d ' ' -f 1-4 | sort | uniq > pileup.txt
[lindenb@srv-clc-02 WITH_MASK]$ wc -l pileup.txt 24921 pileup.txt [lindenb@srv-clc-02 NO_MASK]$ wc -l pileup.txt 26062 pileup.txt [lindenb@srv-clc-02 MASKBWA]$ comm -12 WITH_MASK/pileup.txt NO_MASK/pileup.txt | wc 13573 54292 266327 [lindenb@srv-clc-02 MASKBWA]$ comm -13 WITH_MASK/pileup.txt NO_MASK/pileup.txt | wc 12489 49956 249431 [lindenb@srv-clc-02 MASKBWA]$ comm -23 WITH_MASK/pileup.txt NO_MASK/pileup.txt | wc 11348 45392 229152
on position appears masked but not w/o mask:
chr1 100005960 c A
in Masked
/usr/local/package/samtools-0.1.10/samtools tview WITH_MASK/sorted.bam WITH_MASK/chr1.fa
100005921 100005931 100005941 100005951 100005961 100005971 tgctaattggtcagattggagatggaatca*tggggggtcgacgtgaggttttcttgctgtcttct ....G.........G............... MM.......R.A....RK................. .. ,,,,,,,,,,,,,,cac,,,,,,,,,a,,,,a,,, ,,,,,,,,,,,,, .... ..........*CA.......A.A.......N....... ,,,,,, .. ..........*CA.......A.A............... ,,,, G...G..G.. ..........*CA.......A.A............... ....G.........G..T.. ...........T................. ....G....... .....A.....T................. ,,,,g,,,,,,,,,g,,,,,,,,,,,,,,,*,,,,, ..............G........
NO MASK:
/usr/local/package/samtools-0.1.10/samtools tview NO_MASK/sorted.bam NO_MASK/chr1.fa
100005931 100005941 100005951 100005961 100005971 100005981 tcagattggagatggaatcatggggggtcgacgtgaggttttcttgctgtcttctgttcctgggtg ..........CA.......R.A.....K............................ ..........CA.......A.A.......N....... ...........T......................... .....A.....T......................... .A.................................. ..............G...................