RRedon:Protocols/Variation pipeline

From OpenWetWare

Revision as of 02:13, 2 June 2010



Get the reference genome

Download hg18 (NCBI Build 36) from UCSC: http://hgdownload.cse.ucsc.edu/goldenPath/hg18/chromosomes
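The UCSC chromosomes directory holds one file per chromosome, while the commands below expect a single hg18.fa. Here is a minimal shell sketch that merges them. It assumes the files were downloaded gzipped and named chr*.fa.gz. The helper name merge_chromosomes is hypothetical.

```shell
#!/bin/sh
# Sketch: merge UCSC per-chromosome FASTA files into a single reference.
# Assumption: the files are gzipped and named chr*.fa.gz.
# merge_chromosomes DIR OUT  (hypothetical helper)
merge_chromosomes() {
  dir=$1                         # directory with the downloaded chr*.fa.gz
  out=$2                         # merged FASTA to write, e.g. hg18.fa
  : > "$out"                     # create/truncate the output file
  for f in "$dir"/chr*.fa.gz; do
    [ -e "$f" ] || continue      # no match: the glob stays literal, skip it
    gunzip -c "$f" >> "$out"     # decompress and append this chromosome
  done
}
```

For example, run merge_chromosomes . hg18.fa in the download directory. The glob also picks up any chr*_random or haplotype files in that directory. Narrow it if only the main chromosomes are wanted.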

Get SamTools

http://samtools.sourceforge.net/

Index the genome for samtools

 samtools faidx hg18.fa
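samtools faidx writes an index next to the FASTA (hg18.fa.fai). It has one line per sequence with five fields: name, length, byte offset of the first base, bases per line and bytes per line. Every line of a sequence has the same length, so the byte position of any base can be computed directly instead of scanning the file. The sketch below shows that arithmetic. fai_byte_offset is a hypothetical helper and is not part of samtools.

```shell
#!/bin/sh
# Sketch: how a .fai index (written by `samtools faidx`) gives random access.
# Each .fai line: NAME LENGTH OFFSET LINEBASES LINEWIDTH
# fai_byte_offset FAI SEQNAME POS  -> byte offset of 1-based POS in the FASTA
fai_byte_offset() {
  awk -v name="$2" -v pos="$3" '
    $1 == name {
      i = pos - 1                               # 0-based position
      # skip whole lines (bases + newline), then bases within the last line
      print $3 + int(i / $4) * $5 + (i % $4)
      found = 1
      exit
    }
    END { if (!found) exit 1 }                  # sequence name not in index
  ' "$1"
}
```

For example, fai_byte_offset hg18.fa.fai chr1 1000000 prints a byte offset. In practice, samtools faidx hg18.fa chr1:1000000-1000100 does the lookup itself.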


Align FASTQs against the reference

BWA vs MAQ

  • Repeats: (quoting Biostars) "Maq maps a repeat read randomly."
  • Mapping quality: (quoting Biostars) "If you want to find the SNPs, you do not really need to care about this. Maq will consider the mapping quality in genotype calling. If you want to pinpoint the structural variations with paired end reads, you should only pick up abnormal pairs with high mapping qualities (30, for example). If you are analysing ChIP-Seq data, setting a threshold on mapping quality may also be necessary."
  • Li and Durbin's paper "Fast and Accurate Short Read Alignment with Burrows-Wheeler Transform" states that MAQ overestimates the mapping quality (the Phred-scaled probability that an alignment is incorrect). For BWA: a true hit can always be found.
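SAM/BAM stores mapping quality in column 5 as a Phred-scaled value: MAPQ = -10 log10 P(alignment is wrong). The threshold of 30 in the quote therefore corresponds to a 1 in 1000 chance of a wrong placement. Below is a shell/awk sketch of the conversion and of a MAPQ filter on SAM text. Both function names are hypothetical.

```shell
#!/bin/sh
# MAPQ (SAM column 5) is Phred-scaled: MAPQ = -10 * log10(P(wrong alignment))
# mapq_to_prob Q  -> prints the corresponding error probability
mapq_to_prob() {
  awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# filter_mapq MIN: SAM text on stdin -> header lines plus alignments whose
# MAPQ >= MIN (MAPQ 255, "not available" in SAM, also passes this test)
filter_mapq() {
  awk -v min="$1" 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }       # keep header lines
    $5 >= min                  # default action: print the record'
}
```

For a BAM, samtools view -h -q 30 applies the same threshold directly. The awk version only shows what is being compared.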

With BWA

See main article: BWA

With MAQ

See main article: MAQ

Recalibrate

(quote) "After recalibration, the quality scores in the QUAL field in each read in the output BAM are more accurate in that the reported quality score is closer to its actual probability of mismatching the reference genome." ← Fix this! parameters, command line? ↓TODO
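The idea in the quote can be illustrated by grouping bases by their reported quality and computing an empirical quality from the observed mismatch rate in each group. This sketches the principle only and is not GATK's implementation, which also groups bases by covariates such as machine cycle and sequence context. The input format and the +1/+2 pseudocounts are assumptions made here.

```shell
#!/bin/sh
# Sketch of the principle behind base quality recalibration.
# Input (hypothetical format), one base per line: REPORTED_Q MISMATCH
#   MISMATCH = 1 if the base differs from the reference, else 0
# Output: REPORTED_Q  N_BASES  N_MISMATCHES  EMPIRICAL_Q
empirical_quality() {
  awk '
    { n[$1]++; mm[$1] += $2 }                  # count bases and mismatches per bin
    END {
      for (q in n) {
        p = (mm[q] + 1) / (n[q] + 2)           # pseudocounts (assumption) avoid log10(0)
        printf "%s\t%d\t%d\t%.1f\n", q, n[q], mm[q], -10 * log(p) / log(10)
      }
    }' | sort -n                               # awk array order is unspecified
}
```

A bin reported at Q30 whose empirical quality comes out much lower points to over-confident base calls. Recalibration corrects that kind of mismatch.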


Remove Duplicates

(From Biostars:) Duplicates are multiple reads that map to the same position in the genome. This is different from one read (or read pair) mapping to multiple genome locations. MarkDuplicates finds sequence pairs that map to the same position, marking or removing the duplicates so you can work with unique pairs in downstream analyses. If you want them removed, use the REMOVE_DUPLICATES=true flag when running the program:


← Fix this! command line

 java -jar MarkDuplicates.jar I=chr1.sorted.bam O=chr.markdup.bam METRICS_FILE=jeter.metrics   # add REMOVE_DUPLICATES=true to drop duplicates instead of only marking them
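To make the definition above concrete, here is a deliberately simplified duplicate filter on SAM text. It is not how MarkDuplicates works internally. Picard compares the 5' ends of both reads of a pair and, by default, keeps the copy with the highest sum of base qualities. This sketch only compares reference name, leftmost position and strand, and keeps the first read seen. The name dedup_sketch is hypothetical.

```shell
#!/bin/sh
# Simplified duplicate removal on SAM text (stdin -> stdout).
# Mapped reads sharing RNAME, POS and strand are treated as duplicates;
# the first one seen is kept. Illustration only, not Picard's algorithm.
dedup_sketch() {
  awk 'BEGIN { FS = OFS = "\t" }
    /^@/ { print; next }                   # header lines pass through
    int($2 / 4) % 2 == 1 { print; next }   # FLAG 0x4 set: unmapped, keep as-is
    {
      strand = int($2 / 16) % 2            # FLAG 0x10: reverse strand
      key = $3 ":" $4 ":" strand
      if (!(key in seen)) { seen[key] = 1; print }
    }'
}
```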


SNP Calling

↓TODO

← Fix this! 10 bp removal: how?

Create the VCF

 java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -I markdup.bam -R hg18.fa  -varout markdup.bam.vcf -vf VCF  -pl SOLEXA
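Standard text tools can inspect the VCF written by UnifiedGenotyper. Header lines start with #, and column 6 (QUAL) holds the Phred-scaled quality of the call. The sketch below keeps only calls at or above a chosen QUAL. The function name and any threshold passed to it are illustrative assumptions, not GATK recommendations.

```shell
#!/bin/sh
# filter_vcf_qual MIN: VCF on stdin -> header lines plus records with
# QUAL (column 6) >= MIN; records with a missing QUAL (".") are dropped
filter_vcf_qual() {
  awk -v min="$1" 'BEGIN { FS = OFS = "\t" }
    /^#/ { print; next }                    # meta-information and column header
    $6 != "." && $6 + 0 >= min + 0          # default action: print the record'
}
```

For example: filter_vcf_qual 30 < markdup.bam.vcf > markdup.q30.vcf.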

View the content of a BAM

   samtools view output.sorted.bam chr1 | more
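samtools view prints one alignment per line in SAM format. Each line has 11 mandatory tab-separated columns: read name, FLAG, reference, position, MAPQ, CIGAR, mate reference, mate position, insert size, sequence and base qualities. Optional tags follow. The helper below shows only the first six columns, with labels. The name sam_summary is hypothetical.

```shell
#!/bin/sh
# sam_summary: SAM records on stdin (e.g. from `samtools view`) ->
# labelled columns 1-6: QNAME FLAG RNAME POS MAPQ CIGAR
sam_summary() {
  awk 'BEGIN { FS = OFS = "\t"; print "QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR" }
    /^@/ { next }                          # skip header lines if present
    { print $1, $2, $3, $4, $5, $6 }'
}
```

For example: samtools view output.sorted.bam chr1 | sam_summary | more. Querying a region such as chr1 requires a sorted BAM with an index, which samtools index output.sorted.bam creates.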

Consequences

Simple tool developed by Pierre.

Internal Sanger tool.

KG

Ensembl

SIFT

Polyphen

With SamTools

  samtools pileup -vcf hg18.fa  markdup.bam


Abbreviations

  • PTR: primary target region (exons): the regions we intended to target.
  • CTR: capture target region (baits): the regions actually covered by baits.
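If the PTR and CTR are stored as BED files, their sizes can be compared directly. That storage format is an assumption: the page does not say how they are kept. The sketch below sums interval lengths. The name bed_size is hypothetical.

```shell
#!/bin/sh
# bed_size FILE: total bases covered by BED intervals (0-based, half-open,
# so each line contributes END - START). Track/browser/comment lines skipped.
# Overlapping intervals are counted more than once.
bed_size() {
  awk 'BEGIN { FS = "\t" }
    /^(track|browser|#)/ { next }
    { total += $3 - $2 }
    END { print total + 0 }' "$1"
}
```

For example, compare bed_size ptr.bed with bed_size ctr.bed. Merge overlapping intervals first, for instance with bedtools merge.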

References

InDels

  • ParMap, an algorithm for the identification of small genomic insertions and deletions in nextgen sequencing data. Khiabanian H, Van Vlierberghe P, Palomero T, Ferrando AA, Rabadan R. BMC Res Notes. 2010 May 27;3(1):147. PMID: 20507604

Other tools