RRedon:Protocols/Variation pipeline/BWA

From OpenWetWare

Revision as of 00:31, 2 June 2010



Mapping with BWA

Merge all the reference sequences into one FASTA file, hg18.fasta (← Fix this! Is the merge needed?)
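bwa index takes a single FASTA file as input, so if the reference comes as one FASTA file per chromosome (an assumption about how the files were downloaded), they have to be concatenated first. A minimal Python sketch; `merge_fasta` and the per-chromosome file names are hypothetical. It also refuses duplicate sequence names, since reference names must be unique in the SAM/BAM header produced later.

```python
def merge_fasta(in_paths, out_path):
    """Concatenate FASTA files into one file and return the sequence names.

    Hypothetical helper. Raises ValueError if a sequence name (first word
    after '>') appears twice, because reference names must be unique in
    the SAM/BAM header that later steps produce.
    """
    names, seen = [], set()
    with open(out_path, "w") as out:
        for path in in_paths:
            with open(path) as handle:
                for line in handle:
                    if line.startswith(">"):
                        words = line[1:].split()
                        name = words[0] if words else ""
                        if name in seen:
                            raise ValueError(f"duplicate sequence name: {name!r}")
                        seen.add(name)
                        names.append(name)
                    # Add a missing final newline so the next file's header
                    # never gets glued onto this file's last sequence line.
                    out.write(line if line.endswith("\n") else line + "\n")
    return names


# Usage (hypothetical per-chromosome file names):
# merge_fasta(["chr1.fa", "chr2.fa", "chrX.fa"], "hg18.fasta")
```

Plain concatenation is enough because a multi-sequence FASTA file is simply several single-sequence records one after another.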

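Index the reference genome (-a bwtsw selects the BWT construction algorithm intended for large genomes such as the human genome):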

  bwa index -a bwtsw hg18.fasta 

This creates the following files: hg18.fasta.amb, hg18.fasta.ann, hg18.fasta.bwt, hg18.fasta.pac, hg18.fasta.rbwt, hg18.fasta.rpac, hg18.fasta.rsa, hg18.fasta.sa.
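The .bwt and .sa files hold a Burrows-Wheeler transform and a suffix array of the reference, which the aligner searches as an FM-index. The toy Python sketch below shows the idea on a tiny string. It is not how bwa index builds these files (the bwtsw algorithm is designed for genome-scale input), and the function names are hypothetical.

```python
def suffix_array(text):
    """Naive suffix array: start positions of all suffixes in sorted order.

    O(n^2 log n) time and O(n^2) memory: fine for a toy string, hopeless
    for hg18. `text` must end with a unique sentinel '$' that sorts first.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_sa(text, sa):
    """BWT[i] is the character just before the i-th smallest suffix."""
    return "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0


def count_occurrences(bwt, pattern):
    """FM-index backward search: number of times `pattern` occurs."""
    # C[c] = number of characters in the text that sort before c
    C = {}
    for rank, ch in enumerate(sorted(bwt)):
        C.setdefault(ch, rank)

    def occ(ch, k):
        # Occurrences of ch in bwt[:k]; real indexes precompute this
        return bwt[:k].count(ch)

    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ(ch, lo), C[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


text = "ACAACG$"
sa = suffix_array(text)
bwt = bwt_from_sa(text, sa)
print(sa, bwt)                        # [6, 2, 0, 3, 1, 4, 5] GC$AAAC
print(count_occurrences(bwt, "ACG"))  # 1
```

Backward search only needs the BWT and the character counts, which is why the index can be stored compactly; the suffix array is what turns a matching interval back into reference positions.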


Pre-mapping

← Fix this! Why do we need a pre-mapping?

  • Extract every 500th read from each FASTQ file. ← Fix this! Easy, but is there a standard tool for this?
  • Align each FASTQ file separately:
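The extraction step in the list above is easy to script. A minimal Python sketch, assuming four-line FASTQ records without line wrapping; `subsample_fastq` and the output file names are hypothetical.

```python
import gzip


def _open(path, mode):
    """Open plain or gzip-compressed text files (decided by the .gz suffix)."""
    if path.endswith(".gz"):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def subsample_fastq(in_path, out_path, every=500):
    """Copy every `every`-th read (records every, 2*every, ...) to out_path.

    Assumes four-line FASTQ records (header, sequence, '+', qualities)
    without line wrapping. Returns the number of reads written.
    """
    kept = 0
    with _open(in_path, "r") as src, _open(out_path, "w") as dst:
        # zip over the same iterator four times groups lines into records;
        # an incomplete trailing record is silently dropped.
        for number, record in enumerate(zip(*[src] * 4), start=1):
            if number % every == 0:
                dst.writelines(record)
                kept += 1
    return kept


# For paired-end data, run it on both files with the same `every`; mates
# stay paired as long as both files list them in the same order.
# subsample_fastq("file1.fastq.gz", "sub1.fastq")
# subsample_fastq("file2.fastq.gz", "sub2.fastq")
```

Taking a fixed stride rather than a random sample keeps the two mate files in sync without any bookkeeping.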

-l INT: use the first INT bases of each read as the seed.

-q INT: quality threshold for trimming low-quality bases from the 3' end of each read.

 bwa aln -l 32 -q 15 -f output1.aln hg18.fasta file1.fastq.gz
 bwa aln -l 32 -q 15 -f output2.aln hg18.fasta file2.fastq.gz
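The BWA manual describes -q INT as trimming a read down to argmax_x{sum_{i=x+1}^{l}(INT - q_i)} when the quality q_l of the last base is below INT, l being the read length. The Python sketch below only illustrates that formula on a list of Phred scores; `bwa_q_trim_length` is a hypothetical name, and BWA's own code may differ in details.

```python
def bwa_q_trim_length(quals, threshold):
    """Number of leading bases kept under the documented BWA -q rule.

    quals: per-base Phred scores (ints), 5' to 3'. If the last base is
    below `threshold`, keep x bases where x maximises
        S(x) = sum over 1-based i = x+1..l of (threshold - q_i).
    Ties go to the longer read. Sketch only; BWA's code may differ.
    """
    l = len(quals)
    if threshold <= 0 or l == 0 or quals[-1] >= threshold:
        return l                       # no trimming
    best_x, best_score = l, 0          # keeping all l bases gives S(l) = 0
    score = 0
    for x in range(l - 1, -1, -1):     # walk in from the 3' end
        score += threshold - quals[x]  # now score == S(x)
        if score > best_score:
            best_x, best_score = x, score
    return best_x


# -q 15 on a read whose last two bases have Phred 10 and 5
print(bwa_q_trim_length([30, 30, 30, 10, 5], 15))  # 3
```

Because each base adds (threshold - q_i), a run of low-quality bases pushes the score up, and the first high-quality base going inwards pulls it back down, so the cut lands where the poor 3' tail begins.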
  • Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.
 bwa sampe hg18.fasta output1.aln output2.aln file1.fastq.gz file2.fastq.gz >  output.sam 
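To see how many placements were ambiguous, the XT:A tag written by bwa samse/sampe can be tallied; the BWA manual lists its types as Unique (U), Repeat (R), N and Mate-sw (M). A minimal Python sketch, assuming SAM text as produced by the command above; `xt_type_counts` is a hypothetical helper.

```python
from collections import Counter


def xt_type_counts(sam_lines):
    """Tally mapped SAM records by the value of BWA's XT:A tag.

    Records without an XT tag are counted under '?'. Unmapped records
    (FLAG bit 0x4) are skipped.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # blank or header line
        fields = line.rstrip("\n").split("\t")
        if int(fields[1]) & 0x4:
            continue
        # Optional fields (TAG:TYPE:VALUE) follow the 11 mandatory columns
        tags = {f[:2]: f[5:] for f in fields[11:] if f[2:3] == ":" and f[4:5] == ":"}
        counts[tags.get("XT", "?")] += 1
    return counts


# with open("output.sam") as sam:
#     print(xt_type_counts(sam))  # the 'R' count = randomly placed repeats
```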


  • Convert SAM to BAM (-b writes BAM; -S marks the input as SAM, which older samtools versions require):
 samtools view -bS output.sam > output.bam
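samtools view writes SAM text unless -b is given, so it is worth checking that output.bam really is BAM. BAM files are BGZF-compressed, BGZF is gzip-compatible, and the decompressed data starts with the magic bytes BAM\1, so Python's standard gzip module is enough for a quick check. `looks_like_bam` is a hypothetical helper.

```python
import gzip


def looks_like_bam(path):
    """Return True if `path` decompresses to data starting with b'BAM\\x01'.

    Plain-text SAM is not gzip data, so reading it raises an OSError
    (gzip.BadGzipFile on recent Python versions) and the check fails.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return handle.read(4) == b"BAM\x01"
    except (OSError, EOFError):
        return False


# print(looks_like_bam("output.bam"))
```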


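← Fix this! I used this (?): import the SAM file with the reference genome indexed by samtools faidx (hg18.fasta.fai). In samtools 0.1.x, samtools import is an alias of samtools view -bt; on current samtools releases, use samtools view -bt hg18.fasta.fai -o output.bam output.sam instead.

 samtools faidx hg18.fasta
 samtools import hg18.fasta.fai output.sam output.bam
 samtools sort output.bam output.bam.sorted
 samtools index output.bam.sorted.bam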
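  • Sort the BAM file (with the older samtools syntax used here, the second argument is an output prefix and the result is sorted_prefix.bam; newer versions use samtools sort -o sorted_prefix.bam output.bam):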
 samtools sort output.bam sorted_prefix 
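Coordinate sorting orders records by reference sequence, in header order, and then by leftmost position; samtools index needs this order. A minimal in-memory Python sketch on SAM text lines; `coordinate_sort` is hypothetical, does not reproduce samtools' tie-breaking, and puts unplaced reads (RNAME `*`) last.

```python
def coordinate_sort(sam_lines):
    """Sort SAM text lines by (reference in @SQ order, POS).

    Header lines are kept first; records with RNAME '*' go last.
    Hypothetical helper; samtools' exact tie-breaking is not reproduced.
    """
    header = [line for line in sam_lines if line.startswith("@")]
    records = [line for line in sam_lines if line.strip() and not line.startswith("@")]
    ref_order = {}
    for h in header:
        if h.startswith("@SQ"):
            for field in h.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    ref_order[field[3:]] = len(ref_order)

    def key(record):
        fields = record.split("\t")
        rname, pos = fields[2], int(fields[3])
        if rname == "*":
            return (len(ref_order), 0)  # unplaced reads after all references
        return (ref_order[rname], pos)

    return header + sorted(records, key=key)  # sorted() is stable
```

Note that references are ordered as listed in the header, not alphabetically, which is why the same header must be used for sorting and indexing.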

Compute insert size statistics, e.g. the 99.8th percentile of the insert size distribution, to use as the maximum insert size for MAQ. ← Fix this! What does that mean?
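One reading of this step, stated as an assumption: collect the insert sizes (absolute TLEN, SAM column 9) of properly paired reads from the pre-mapping, and take the 99.8th percentile as the maximum insert size, a value that could then be given to MAQ's maximum insert size option or, similarly, to bwa sampe -a. A minimal Python sketch; the helper names are hypothetical and the nearest-rank percentile definition is a choice made here.

```python
import math
from fractions import Fraction


def insert_sizes(sam_lines):
    """Yield |TLEN| (SAM column 9) once per properly paired, mapped pair."""
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, tlen = int(fields[1]), int(fields[8])
        # 0x2 properly paired, 0x40 first mate (count each pair once), 0x4 unmapped
        if flag & 0x2 and flag & 0x40 and not flag & 0x4 and tlen != 0:
            yield abs(tlen)


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile; Fraction avoids float rounding in the rank."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no insert sizes to summarise")
    rank = math.ceil(Fraction(str(pct)) * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# with open("output.sam") as sam:
#     max_insert = nearest_rank_percentile(insert_sizes(sam), 99.8)
```

A high percentile rather than the maximum keeps a handful of chimeric or mis-mapped pairs with huge apparent inserts from inflating the limit.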