RRedon:Protocols/Variation pipeline
From OpenWetWare
Get Reference Genome
Download the hg18 (NCBI build 36) chromosome sequences from UCSC: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes
Get SAMtools
http://samtools.sourceforge.net/
With BWA
Merge all the reference sequences into one FASTA file (hg18.fasta), then index the reference genome:
bwa index -a bwtsw hg18.fasta
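The merge step above (to be run before bwa index) could look roughly like the sketch below. The helper name merge_fasta and the chr*.fa.gz file pattern are assumptions for illustration; check which chromosome files (random, haplotype) you actually want in the reference.

```shell
# Concatenate FASTA files (plain or gzip-compressed) into one output file.
# merge_fasta is a hypothetical helper, not part of bwa or samtools.
merge_fasta() {
    out="$1"; shift
    for f in "$@"; do
        case "$f" in
            *.gz) gzip -dc "$f" ;;   # decompress on the fly
            *)    cat "$f" ;;
        esac
    done > "$out"
}

# Hypothetical usage, assuming the UCSC files were saved in the current directory:
#   merge_fasta hg18.fasta chr*.fa.gz
```

The sequences end up in the order the shell expands the file pattern, which is not numeric chromosome order.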
Pre mapping
- extract every 500th read from each fastq file (same read positions in both files, so pairs stay matched)
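One possible way to do this subsampling with standard tools is sketched below. The function name subsample_fastq and the output names sub1/sub2 are made up for illustration, and the sketch assumes strict 4-line FASTQ records.

```shell
# Keep every Nth FASTQ record (the 1st, (N+1)th, (2N+1)th, ...) read from stdin.
# Assumes each record is exactly 4 lines (no wrapped sequence or quality lines).
subsample_fastq() {
    awk -v n="$1" 'int((NR - 1) / 4) % n == 0'
}

# Hypothetical usage; run the same command on both files so mates stay paired:
#   gzip -dc file1.fastq.gz | subsample_fastq 500 | gzip -c > sub1.fastq.gz
#   gzip -dc file2.fastq.gz | subsample_fastq 500 | gzip -c > sub2.fastq.gz
```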
- Align each fastq file separately
-l INT: take the first INT bases of each read as the seed (32 here)
-q INT: quality threshold for read trimming (15 here)
bwa aln -l 32 -q 15 -f output1.aln hg18.fasta file1.fastq.gz
bwa aln -l 32 -q 15 -f output2.aln hg18.fasta file2.fastq.gz
- Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.
bwa sampe hg18.fasta output1.aln output2.aln file1.fastq.gz file2.fastq.gz > output.sam
- convert SAM to BAM (-b writes BAM output, -S marks the input as SAM)
samtools view -bS output.sam > output.bam
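- sort bam (writes sorted_prefix.bam; this is the samtools 0.1.x syntax, samtools 1.3 and later use: samtools sort -o sorted_prefix.bam output.bam)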
samtools sort output.bam sorted_prefix
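To repeat the mapping steps above for several samples, it may help to generate the command lines first and inspect them before running. The sketch below only prints the commands; print_mapping_commands and the output naming scheme are assumptions, and file names are assumed to contain no spaces.

```shell
# Print (do not run) the mapping commands for one paired-end sample.
# Arguments: reference fasta, first fastq, second fastq, output prefix.
print_mapping_commands() {
    ref="$1"; fq1="$2"; fq2="$3"; out="$4"
    printf 'bwa aln -l 32 -q 15 -f %s_1.aln %s %s\n' "$out" "$ref" "$fq1"
    printf 'bwa aln -l 32 -q 15 -f %s_2.aln %s %s\n' "$out" "$ref" "$fq2"
    printf 'bwa sampe %s %s_1.aln %s_2.aln %s %s > %s.sam\n' \
        "$ref" "$out" "$out" "$fq1" "$fq2" "$out"
    printf 'samtools view -bS %s.sam > %s.bam\n' "$out" "$out"
    # Legacy samtools 0.1.x sort syntax: the last argument is an output prefix.
    printf 'samtools sort %s.bam %s.sorted\n' "$out" "$out"
}

# Hypothetical usage: review the output, then pipe it to sh to execute.
#   print_mapping_commands hg18.fasta file1.fastq.gz file2.fastq.gz output
#   print_mapping_commands hg18.fasta file1.fastq.gz file2.fastq.gz output | sh
```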
- compute insert size statistics from the alignments, e.g. take the 99.8th percentile as the MAQ max insert size
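A rough sketch of the percentile calculation, assuming the insert size is read from column 9 of the SAM records and only positive values are counted (one per pair). The function name insert_size_percentile is made up, and the nearest-rank percentile definition is a choice of this sketch, not something the protocol specifies.

```shell
# Read SAM records from stdin and print the given percentile of the insert size.
# Header lines (starting with @) and non-positive insert sizes are skipped,
# so each pair contributes one value. Uses the nearest-rank definition.
insert_size_percentile() {
    pct="$1"
    awk -F'\t' '!/^@/ && $9 > 0 { print $9 }' |
        sort -n |
        awk -v p="$pct" '
            { v[NR] = $1 }
            END {
                if (NR == 0) exit 1          # no usable pairs
                x = p * NR / 100
                i = int(x); if (i < x) i++   # ceiling of the rank
                if (i < 1) i = 1
                print v[i]
            }'
}

# Hypothetical usage on the sorted BAM from the previous step:
#   samtools view sorted_prefix.bam | insert_size_percentile 99.8
```

This counts every pair with a positive insert size, including improperly paired or distant mates, so filtering on the SAM flag first may give a cleaner estimate.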