User talk:Darek Kedra/sandbox 29

From OpenWetWare




EMBO Tunis 2014

From sequencing data to knowledge

00 Programs used

sequence pre-processing

general tools

mappers

  • BWA ver 0.7.10
  • LAST ver 475
  • Stampy stampy-1.0.23r2059.tgz (optional)

spliced read mappers

viewers

quantification

SNP discovery

01 Data files used

FASTQ files

L. amazonensis RNA-Seq

L. mexicana genomic DNA

(extra set) L. enriettii genomic DNA
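A quick sanity check before mapping is to count the reads and bases in each FASTQ file. The sketch below is a minimal awk-based helper; the function name fastq_stats is hypothetical, and it assumes uncompressed 4-line FASTQ records (header, sequence, "+", qualities) without line wrapping.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise FASTQ read on stdin.
# Assumes plain 4-line records with no wrapped sequence lines.
fastq_stats() {
    awk 'NR % 4 == 2 { n++; bases += length($0) }   # 2nd line of each record = sequence
         END {
             if (n == 0) { print "no reads"; exit 1 }
             printf "reads=%.0f bases=%.0f mean_len=%.1f\n", n, bases, bases / n
         }'
}
```

For a gzipped file, something like zcat reads_1.fastq.gz | fastq_stats should work; reads_1.fastq.gz is a placeholder name, not one of the course files.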

Stuff to read / compare

File formats


VCF

BED

GFF / GTF
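GFF/GTF and BED use different coordinate conventions: GFF positions are 1-based with inclusive ends, while BED starts are 0-based and BED ends are exclusive. A minimal sketch of converting GFF3 features of one type to BED6, assuming tab-separated GFF3 with an ID= attribute in column 9. The helper name gff_to_bed is hypothetical, and the BED score column is simply set to 0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert GFF3 features of one type (default "gene")
# read on stdin into BED6 (chrom, start, end, name, score, strand).
# GFF start is 1-based inclusive, BED start is 0-based, so only the start
# is shifted; the end coordinate is the same number in both formats.
gff_to_bed() {
    local type="${1:-gene}"
    awk -F'\t' -v OFS='\t' -v type="$type" '
        /^#/ || NF < 9 { next }             # skip comments, directives and FASTA lines
        $3 == type {
            id = "."                        # BED name: the ID= attribute, if present
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^ID=/) id = substr(attrs[i], 4)
            print $1, $4 - 1, $5, id, 0, $7
        }'
}
```

For example, gff_to_bed gene < TriTrypDB-8.0_LmexicanaMHOMGT2001U1103.gff > Lmex_genes.bed, where the output file name is illustrative. GTF files carry gene_id attributes instead of ID=, so this sketch would leave the name column as "." for them.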

Genomes and annotations

  • L. mexicana

http://tritrypdb.org/common/downloads/release-8.0/LmexicanaMHOMGT2001U1103/fasta/data/TriTrypDB-8.0_LmexicanaMHOMGT2001U1103_Genome.fasta

http://tritrypdb.org/common/downloads/release-8.0/LmexicanaMHOMGT2001U1103/gff/data/TriTrypDB-8.0_LmexicanaMHOMGT2001U1103.gff

  • L. amazonensis

http://tritrypdb.org/common/downloads/release-8.0/LamazonensisMHOMBR71973M2269/fasta/data/TriTrypDB-8.0_LamazonensisMHOMBR71973M2269_Genome.fasta

  • L. enriettii

http://tritrypdb.org/common/downloads/release-8.0/LenriettiiLEM3045/fasta/data/TriTrypDB-8.0_LenriettiiLEM3045_Genome.fasta

  • L. major

http://tritrypdb.org/common/downloads/release-8.0/LmajorFriedlin/fasta/data/TriTrypDB-8.0_LmajorFriedlin_Genome.fasta
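All the links above follow the same TriTrypDB release-8.0 layout, so the URLs can be generated from the organism ID. A minimal sketch, assuming the pattern holds for other organisms in the same release; the function name tritryp_url is hypothetical, and only the genome FASTA and GFF layouts seen above are covered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build TriTrypDB release-8.0 download URLs.
# The layout is inferred from the links listed on this page only.
TRITRYP_BASE="http://tritrypdb.org/common/downloads/release-8.0"

tritryp_url() {
    local org="$1" kind="$2"
    case "$kind" in
        genome) echo "${TRITRYP_BASE}/${org}/fasta/data/TriTrypDB-8.0_${org}_Genome.fasta" ;;
        gff)    echo "${TRITRYP_BASE}/${org}/gff/data/TriTrypDB-8.0_${org}.gff" ;;
        *)      echo "unknown file type: $kind (use genome or gff)" >&2; return 1 ;;
    esac
}
```

For example, wget "$(tritryp_url LmajorFriedlin genome)". These release-8.0 paths may no longer be served by TriTrypDB, in which case the current release number and file names would need to be substituted.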

Extra material

(optional) Stampy

Stampy is a rather slow but sometimes more accurate mapper, and it can be used to improve on plain BWA mappings. Basic usage is as follows:

# Create the two Stampy index files: the genome index (.stidx) and the hash table (.sthash)

stampy.py --species=Lmex --assembly=Lmex_toyasembly -G Lmex_toygenome Lmex_genome.nfix.fa
# Result: Lmex_toygenome.stidx

stampy.py -g Lmex_toygenome -H Lmex_toyasembly
# Result: Lmex_toyasembly.sthash

# Remap reads already mapped with BWA (preferred option)
stampy.py -g Lmex_toygenome -h Lmex_toyasembly -t2 --bamkeepgoodreads -M LmxM.01_ERR307343_12.Lmex.bwa_mem.Lmex.bam  > LmxM.01_ERR307343_12.Lmex.bwa_mem.Lmex.stampy.sam

# Convert SAM to BAM, sort by coordinate and index the BAM file:
java -jar ~/soft/picard_1.119/SortSam.jar \
    I=LmxM.01_ERR307343_12.Lmex.bwa_mem.Lmex.stampy.sam \
    O=LmxM.01_ERR307343_12.Lmex.bwa_mem.Lmex.stampy.bam \
    SO=coordinate VALIDATION_STRINGENCY=SILENT CREATE_INDEX=true

# Result:
# LmxM.01_ERR307343_12.Lmex.bwa_mem.Lmex.stampy.bam
# LmxM.01_ERR307343_12.Lmex.bwa_mem.Lmex.stampy.bai

With this data set, the Stampy mapping looks worse than the BWA mapping.
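One way to put a number on this comparison is the fraction of reads each mapper places on the genome. The sketch below counts primary records in SAM text, treating flag bit 0x4 as unmapped and skipping secondary (0x100) and supplementary (0x800) alignments so each read is counted once. The function name mapped_fraction is hypothetical, and samtools is assumed to be installed to decode the BAM files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the mapped fraction of reads in SAM text on stdin.
# Flag bits are tested arithmetically so plain POSIX awk is enough.
mapped_fraction() {
    awk -F'\t' '
        /^@/ { next }                                 # skip SAM header lines
        {
            flag = $2
            if (int(flag / 256) % 2 == 1) next       # 0x100 secondary
            if (int(flag / 2048) % 2 == 1) next      # 0x800 supplementary
            total++
            if (int(flag / 4) % 2 == 0) mapped++     # 0x4 clear means mapped
        }
        END {
            if (total == 0) { print "no reads"; exit 1 }
            printf "%d/%d mapped (%.2f%%)\n", mapped, total, 100 * mapped / total
        }'
}
```

For example, samtools view -h LmxM.01_ERR307343_12.Lmex.bwa_mem.Lmex.bam | mapped_fraction, and the same for the .stampy.bam file. samtools flagstat reports comparable counts directly; mapping rate alone also ignores mapping quality (SAM column 5), which matters when judging accuracy.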


Quantification of mapped reads

  • Gene quantification (DNA and RNA levels)
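Gene-level counts can be sketched as counting alignments whose start position falls inside each gene. The version below is deliberately naive and is an illustration, not a replacement for dedicated counters such as featureCounts or htseq-count: it reads genes from a BED file (chrom, 0-based start, end, name), ignores strand, counts each mate of a pair separately, and counts a read once for every gene it starts in. The function name count_reads_per_gene and the 10 kb bin size are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: naive per-gene counts from a gene BED file ($1) and
# SAM text on stdin. Genes are indexed into 10 kb bins so each alignment is
# only compared with genes in its own bin.
count_reads_per_gene() {
    local genes_bed="$1"
    awk -F'\t' -v OFS='\t' -v BIN=10000 '
        NR == FNR {                                   # first input: gene BED
            if ($0 ~ /^(#|track|browser)/ || NF < 4) next
            n++
            gs[n] = $2 + 0; ge[n] = $3 + 0; name[n] = $4; cnt[n] = 0
            for (b = int(gs[n] / BIN); b <= int((ge[n] - 1) / BIN); b++)
                bins[$1 SUBSEP b] = bins[$1 SUBSEP b] " " n
            next
        }
        /^@/ { next }                                 # SAM header lines
        {
            flag = $2
            # skip unmapped (0x4), secondary (0x100) and supplementary (0x800)
            if (int(flag / 4) % 2 || int(flag / 256) % 2 || int(flag / 2048) % 2) next
            pos = $4 - 1                              # SAM POS is 1-based
            key = $3 SUBSEP int(pos / BIN)
            if (!(key in bins)) next
            m = split(bins[key], ids, " ")
            for (i = 1; i <= m; i++) {
                g = ids[i]
                if (pos >= gs[g] && pos < ge[g]) cnt[g]++
            }
        }
        END { for (g = 1; g <= n; g++) print name[g], cnt[g] }
    ' "$genes_bed" -
}
```

For example, samtools view LmxM.01_ERR307343_12.Lmex.bwa_mem.Lmex.bam | count_reads_per_gene Lmex_genes.bed > counts.tsv, where Lmex_genes.bed is a hypothetical gene BED file. The same approach applies to both the genomic DNA and the RNA-Seq mappings.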

Finding gene ends by mapping post-splice leader and polyA sequences
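For the polyA side, one approach is to keep only reads that end in a run of A's, trim the run, and map the remainder, so the alignment end marks a candidate polyadenylation site. A minimal sketch, assuming uncompressed 4-line FASTQ and reads in sense orientation (poly(T) at read starts is not handled), with a minimum tail of 15 A's and a minimum remaining length of 20 bases; both cut-offs and the function name extract_polya_reads are arbitrary choices for illustration. Reads carrying the spliced leader could be handled the same way at the 5' end, by trimming a leader prefix instead of an A tail.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep FASTQ reads (stdin) ending in >= MIN_A A's,
# trim the terminal A run, and print the trimmed reads as FASTQ.
# MIN_LEN (20) drops reads too short to map after trimming.
extract_polya_reads() {
    local min_a="${1:-15}"
    awk -v MIN_A="$min_a" -v MIN_LEN=20 '
        NR % 4 == 1 { hdr = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 0 {
            qual = $0
            if (match(seq, /A+$/) && RLENGTH >= MIN_A) {
                keep = RSTART - 1                     # bases before the A tail
                if (keep >= MIN_LEN)
                    print hdr "\n" substr(seq, 1, keep) "\n+\n" substr(qual, 1, keep)
            }
        }'
}
```

For example, zcat rnaseq_1.fastq.gz | extract_polya_reads 15 > polyA_trimmed.fastq, with placeholder file names; the trimmed reads can then be mapped with BWA as before.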
