RRedon:Protocols/Variation pipeline/BWA: Difference between revisions
From OpenWetWare
Revision as of 00:31, 2 June 2010
Mapping with BWA
Merge all the reference sequences into one FASTA file, hg18.fasta (← Fix this! Is the merge needed?)
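A minimal sketch of the merge step, assuming the reference comes as one FASTA file per chromosome. The helper name merge_fasta and the chr*.fa naming are hypothetical; bwa index takes a single FASTA file as input, which is presumably why the merge is done first.

```shell
#!/bin/sh
# merge_fasta OUT IN...: concatenate FASTA files into one multi-FASTA file.
# Hypothetical helper, not part of BWA. Each input is assumed to end with
# a newline, otherwise the last sequence line and the next header would join.
merge_fasta() {
    out=$1
    shift
    cat "$@" > "$out"
}

# Example usage, assuming per-chromosome files named chr*.fa:
#   merge_fasta hg18.fasta chr*.fa
#   grep -c '^>' hg18.fasta    # number of sequences in the merged file
```

If the reference is already distributed as a single file, this step can be skipped.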
Index the reference genome:
bwa index -a bwtsw hg18.fasta
This creates the following files: hg18.fasta.amb, hg18.fasta.ann, hg18.fasta.bwt, hg18.fasta.pac, hg18.fasta.rbwt, hg18.fasta.rpac, hg18.fasta.rsa, hg18.fasta.sa.
Pre mapping
← Fix this! Why do we need a pre-mapping?
- Extract every 500th read from the fastq file ← Fix this! Easy, but is there a standard tool for this?
- Align each fastq file separately
-l INT Take the first INT bases of each read as the seed (32 here).
-q INT Quality threshold for read trimming (15 here).
bwa aln -l 32 -q 15 -f output1.aln hg18.fasta file1.fastq.gz
bwa aln -l 32 -q 15 -f output2.aln hg18.fasta file2.fastq.gz
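The subsampling step listed above (every 500th read) can be sketched with awk. This is only an illustration: the function name subsample_fastq and the sub*.fastq output names are hypothetical, and the filter assumes exactly four lines per FASTQ record (no wrapped sequence or quality lines).

```shell
#!/bin/sh
# subsample_fastq N: read FASTQ on stdin, write every N-th record to stdout
# (records N, 2N, 3N, ...). Assumes 4 lines per record.
subsample_fastq() {
    awk -v n="$1" 'int((NR - 1) / 4) % n == n - 1'
}

# Example usage with the gzipped inputs named on this page:
#   gzip -dc file1.fastq.gz | subsample_fastq 500 > sub1.fastq
#   gzip -dc file2.fastq.gz | subsample_fastq 500 > sub2.fastq
```

Because the selection depends only on the record position, running it on both files of a paired-end run keeps the mates in the same order, as long as the two files list the pairs in the same order.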
- Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.
bwa sampe hg18.fasta output1.aln output2.aln file1.fastq.gz file2.fastq.gz > output.sam
- Export to BAM? (-S: the input is SAM, -b: write BAM; without -b, samtools view writes SAM text)
samtools view -bS output.sam > output.bam
← Fix this! I used this?
Use the reference genome indexed by samtools (hg18.fasta.fai, as written by samtools faidx hg18.fasta):
samtools import hg18.fasta.fai output.sam output.bam
samtools sort output.bam output.bam.sorted
samtools index output.bam.sorted.bam
- Sort the BAM file (the second argument is an output prefix, so this writes sorted_prefix.bam)
samtools sort output.bam sorted_prefix
- Compute insert size statistics, e.g. the 99.8th percentile, to use as the MAQ maximum insert size ← Fix this! What does that mean?
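One plausible reading of the insert size step: map the subsampled pairs, collect the insert size of every properly paired read, and take a high percentile (99.8 here) as the maximum insert size given to the mapper. The sketch below assumes SAM input such as the output of bwa sampe, with the FLAG in column 2 and the insert size in column 9; the function name insert_size_percentile is hypothetical and the percentile uses the nearest-rank method.

```shell
#!/bin/sh
# insert_size_percentile P: read SAM on stdin, print the P-th percentile
# (nearest-rank) of the insert sizes of properly paired reads.
insert_size_percentile() {
    # Skip header lines, keep reads with the "proper pair" flag bit (0x2),
    # and count each pair once by keeping only the positive insert size.
    awk -F '\t' '!/^@/ && int($2 / 2) % 2 == 1 && $9 > 0 { print $9 }' |
    sort -n |
    awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1          # no proper pairs found
            x = p * NR / 100
            rank = int(x)
            if (rank < x) rank++         # round up to the next rank
            if (rank < 1) rank = 1
            print v[rank]
        }'
}

# Example usage:
#   insert_size_percentile 99.8 < output.sam
```

The printed value could then be supplied as the maximum insert size (assumed here to be the -a option of maq map and of bwa sampe; check the manual for your version).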