RRedon:Protocols/Variation pipeline/BWA: Difference between revisions

From OpenWetWare
Jump to navigationJump to search
mNo edit summary
 
(4 intermediate revisions by the same user not shown)
Line 1: Line 1:
{{RRedon}}
{{RRedon}}
=Mapping with BWA=
=Download=


* [http://bio-bwa.sourceforge.net http://bio-bwa.sourceforge.net]
* [http://bio-bwa.sourceforge.net http://bio-bwa.sourceforge.net]


Merge all the reference sequences into one fasta file hg18.fasta (? {{fix-this|need merge ?}})
=Indexing the reference sequence=


Index the reference genome:
See Main article : [[RRedon:Protocols/Variation_pipeline/Reference_genome#BWA|Reference genome]]
 
  bwa index -a bwtsw hg18.fasta


(....)
=Map=
  [bwt_gen] Finished constructing BWT in 311 iterations.
* Align one fastq files
  [bwa_index] 2229.02 seconds elapse.
'''-l''' Take the first INT subsequence as seed
  [bwa_index] Update BWT... 15.79 sec
  [bwa_index] Update reverse BWT... 15.97 sec
  [bwa_index] Construct SA from BWT and Occ... 1001.58 sec
  [bwa_index] Construct SA from reverse BWT and Occ... 987.96 sec
 
    ls -la
  -rw-r--r-- 1 root root 3,0G jun  2 14:47 hg18.fa
  -rw-r--r-- 1 root root 123K jun  2 15:15 hg18.fa.amb
  -rw-r--r-- 1 root root 1,8K jun  2 15:15 hg18.fa.ann
  -rw-r--r-- 1 root root 1,1G jun  2 16:30 hg18.fa.bwt
  -rw-r--r-- 1 root root 739M jun 2 15:15 hg18.fa.pac
  -rw-r--r-- 1 root root 1,1G jun  2 16:30 hg18.fa.rbwt
  -rw-r--r-- 1 root root 739M jun  2 15:15 hg18.fa.rpac
  -rw-r--r-- 1 root root 370M jun  2 17:03 hg18.fa.rsa
  -rw-r--r-- 1 root root 370M jun  2 16:47 hg18.fa.sa
 
 
will create the following files : hg18.fasta.amb, hg18.fasta.ann, hg18.fasta.bwt, hg18.fasta.pac, hg18.fasta.rbwt, hg18.fasta.rpac, hg18.fasta.rsa, hg18.fasta.sa.


'''-q'''  Parameter for read trimming.


==Pre mapping==
{{fix-this|why do we need a pre-mapping ?}}
* extract every 500th read from fastq file{{fix-this|easy, but any standard tool for this ?}}


'''-t'''  number of threads = 10 on server 2


==Map==
   bwa aln -l 32 -q 15 -t 10 -foutput1.aln hg18.fasta file1.fastq.gz
* Align one fastq files
   bwa aln -l 32 -q 15 -t 10 -foutput2.aln hg18.fasta file2.fastq.gz
'''-l'''  Take the first INT subsequence as seed
 
'''-q'''  Parameter for read trimming.
 
   bwa aln -l 32 -q 15 -foutput1.aln hg18.fasta file1.fastq.gz
   bwa aln -l 32 -q 15 -foutput2.aln hg18.fasta file2.fastq.gz


* Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.  
* Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.  


   bwa sampe hg18.fasta output1.aln output2.aln file1.fastq.gz file2.fastq.gz >  output.sam  
   bwa sampe hg18.fasta output1.aln output2.aln file1.fastq.gz file2.fastq.gz | gzip --best >  output.sam.gz





Latest revision as of 00:49, 9 June 2010

Home        Contact        Internal        Lab Members        Protocols        Publications        Research        Talks       


Download

Indexing the reference sequence

See Main article : Reference genome

Map

  • Align one fastq files

-l Take the first INT subsequence as seed

-q Parameter for read trimming.


-t number of threads = 10 on server 2

 bwa aln -l 32 -q 15 -t 10 -foutput1.aln hg18.fasta file1.fastq.gz
 bwa aln -l 32 -q 15 -t 10 -foutput2.aln hg18.fasta file2.fastq.gz
  • Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.
 bwa sampe hg18.fasta output1.aln output2.aln file1.fastq.gz file2.fastq.gz | gzip --best >  output.sam.gz


  • export to bam ?
 samtools view output.sam >  output.bam 


← Fix this! I used this ? Use the reference genome indexed by samtools

 samtools import hg18.fa.fai output.sam output.bam
 samtools sort output.bam output.bam.sorted
 samtools index chr1.sorted.bam
  • sort bam
 samtools sort output.bam sorted_prefix 

do insert size stats e.g. 99.8 percentile for MAQ max insert size ← Fix this! what does that mean ?