RRedon:Protocols/Variation pipeline: Difference between revisions
From OpenWetWare
* [http://maq.sourceforge.net/ http://maq.sourceforge.net/]
=Other tools=
* [http://hannonlab.cshl.edu/fastx_toolkit/ FASTX-Toolkit: a collection of command line tools for Short-Reads FASTA/FASTQ files preprocessing]
[[Category:NGS]]
[[Category:Protocols]]
[[Category:Bioinformatics]]
Revision as of 02:18, 31 May 2010
get Reference Genome
Download the hg18/build36 from UCSC: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes
get SamTools
http://samtools.sourceforge.net/
With BWA
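Merge all the reference sequences into one fasta file, hg18.fasta (← Fix this! is the merge needed? bwa index takes a single fasta file as input, so the per-chromosome files need to be concatenated first).
Index the reference genome: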
bwa index -a bwtsw hg18.fasta
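The merge step before indexing could be sketched as follows. This is only a sketch: it assumes the per-chromosome files were downloaded gzip-compressed and are named chr*.fa.gz in one directory, and merge_reference is a hypothetical helper name, not part of bwa or samtools.

```shell
# Sketch: concatenate per-chromosome FASTA files into a single reference.
# Assumption: inputs are gzip-compressed and named chr*.fa.gz.
# merge_reference is a hypothetical helper, not a bwa/samtools command.
merge_reference() {
    ref_dir="$1"   # directory holding the chr*.fa.gz files
    ref_out="$2"   # merged FASTA to write, e.g. hg18.fasta
    # The shell expands the glob in lexical order (chr1, chr10, chr11, ...).
    # gzip -dc decompresses every file in turn to standard output.
    gzip -dc "$ref_dir"/chr*.fa.gz > "$ref_out"
}

# Example (directory name is a placeholder):
# merge_reference ./hg18_chromosomes hg18.fasta
# bwa index -a bwtsw hg18.fasta
```

The glob picks up every file in the directory that matches chr*.fa.gz, so the directory should contain only the sequences that are meant to be part of the reference.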
Pre mapping
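← Fix this! Why do we need a pre-mapping? (The last step below suggests it is used to estimate the insert size distribution from a subsample of the reads.)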
- extract every 500th read from fastq file
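One possible way to do the subsampling is sketched here. It assumes plain 4-line FASTQ records (no wrapped sequence lines) in gzip-compressed files; subsample_fastq is a hypothetical helper name and the output file names are placeholders.

```shell
# Sketch: keep one FASTQ record out of every N (N = 500 in this protocol).
# Assumption: each record is exactly 4 lines (header, sequence, '+', quality).
# subsample_fastq is a hypothetical helper, not part of bwa or samtools.
subsample_fastq() {
    every_n="$1"   # keep one record out of every_n
    fq_in="$2"     # gzip-compressed FASTQ input
    # int((NR - 1) / 4) is the 0-based record index of the current line;
    # all 4 lines of a record are printed when that index is a multiple of N.
    gzip -dc "$fq_in" | awk -v n="$every_n" 'int((NR - 1) / 4) % n == 0'
}

# Example (output names are placeholders):
# subsample_fastq 500 file1.fastq.gz | gzip > sub1.fastq.gz
# subsample_fastq 500 file2.fastq.gz | gzip > sub2.fastq.gz
```

Because the selection depends only on the record position, running it with the same N on both files of a paired-end run keeps the mates in step, provided both files list the reads in the same order.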
- Align each fastq file separately
-l INT Take the first INT subsequence as seed
-q INT Parameter for read trimming
bwa aln -l 32 -q 15 -f output1.aln hg18.fasta file1.fastq.gz
bwa aln -l 32 -q 15 -f output2.aln hg18.fasta file2.fastq.gz
- Generate alignments in the SAM format given paired-end reads. Repetitive read pairs will be placed randomly.
bwa sampe hg18.fasta output1.aln output2.aln file1.fastq.gz file2.fastq.gz > output.sam
- export to bam (-b writes BAM output, -S marks the input as SAM)
samtools view -bS output.sam > output.bam
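- sort bam (writes sorted_prefix.bam; samtools 1.3 and later use instead: samtools sort -o sorted_prefix.bam output.bam)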
samtools sort output.bam sorted_prefix
- Compute insert size statistics from the pre-mapping, e.g. take the 99.8th percentile of the observed insert sizes as the max insert size for MAQ (← Fix this! what does that mean?)
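A rough sketch of the percentile calculation, under the assumption that the insert size is read from column 9 of the SAM output of the pre-mapping and that a nearest-rank percentile is good enough. insert_size_percentile is a hypothetical helper name; pairs with implausibly large insert sizes are not filtered out here.

```shell
# Sketch: nearest-rank percentile of insert sizes taken from a SAM file.
# Column 9 of a SAM record holds the inferred insert size; it is positive for
# one mate and negative for the other, so keeping values > 0 counts each pair
# once. insert_size_percentile is a hypothetical helper name.
insert_size_percentile() {
    pct="$1"         # percentile, e.g. 99.8
    sam="${2:--}"    # SAM file, or standard input when omitted
    awk -F '\t' '!/^@/ && $9 > 0 { print $9 }' "$sam" |
        sort -n |
        awk -v p="$pct" '
            { v[NR] = $1 }                   # values arrive in ascending order
            END {
                if (NR == 0) exit 1          # no pairs with a positive insert size
                r = NR * p / 100             # nearest-rank position
                i = int(r); if (i < r) i++   # round up to a whole rank
                if (i < 1) i = 1
                print v[i]
            }'
}

# Example:
# insert_size_percentile 99.8 output.sam
# samtools view sorted_prefix.bam | insert_size_percentile 99.8
```

The printed value is the insert size below which the chosen percentage of read pairs fall, which is the kind of number that could then be given to MAQ as its maximum insert size.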