Wikiomics:RNA-Seq

From OpenWetWare

This list is intended mostly for de novo splice site / transcript / gene prediction in newly sequenced genomes. The tools listed below are also often used in other pipelines, such as transcript quantification or SNP discovery.


Mappers

Spliced Mappers (tested)

Tophat

http://tophat.cbcb.umd.edu/

current version: 1.1.4 release 2010.11.16

base mapper: bowtie

input: fastq

output: BAM

Currently the most widely used program for RNA-Seq mapping. Output often processed with Cufflinks.


HMMSplicer

http://derisilab.ucsf.edu/index.php?software=105

current version: 0.9.5 2010.11.25

base mapper: bowtie

input: fastq (converts quality values to phred scale)

output: bed file of junctions

Developed in Python. Requirements:

  • OS: tested on Mac OS X (by the authors) and Linux Fedora 8
  • Python 2.6 (tested with 2.6.4)
  • numpy (tested by authors with version 1.3.0)
  • bowtie (works with 0.12.7)

It also runs successfully with Python 2.7.1rc1, numpy 1.5.1 and bowtie 0.12.7 on in-house data.

Basic command:

python runHMM.py -o output_dir -i input_RNA-seq_data.qseq -q quality_type -g genome4mapping -j min_intron_size -k max_intron_size -p number_of_processors_to_use

For more explanation, type: python runHMM.py --help

Tip: you can first map your reads in non-spliced mode with a mapper of your choice, filter out all mapped reads, and feed HMMSplicer only the unmapped reads.
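The filtering step in this tip can be sketched in Python, assuming the first-pass mapper writes SAM output. This is an illustration rather than part of HMMSplicer: the function name unmapped_sam_to_fastq is hypothetical, and the only SAM details relied on are the standard column layout and the 0x4 "segment unmapped" flag bit.

```python
def unmapped_sam_to_fastq(sam_lines):
    """Yield one 4-line FASTQ record (as a string) per unmapped SAM read.

    Assumption: input is plain-text SAM with the standard 11 mandatory
    columns; SAM quality strings are Phred+33 encoded.
    """
    for line in sam_lines:
        if line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # skip malformed or truncated lines
            continue
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & 0x4:  # bit 0x4 set: the read is unmapped
            yield "@%s\n%s\n+\n%s\n" % (name, seq, qual)
```

To use it, open the SAM file, pass the file handle to the function, and write each yielded record to a new FASTQ file. Because SAM stores qualities as Phred+33, the quality type given to HMMSplicer may need to be chosen accordingly.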

Caveat: because of the training process, all reads must be of the same length.
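One possible way to obtain reads of uniform length is to trim every read to a fixed length and drop the shorter ones. The sketch below assumes a plain 4-line-per-record FASTQ file. The function name trim_to_length is hypothetical, and whether trimming (rather than discarding reads) suits a given data set is a judgment call not covered by the HMMSplicer documentation cited here.

```python
def trim_to_length(fastq_lines, length):
    """Yield (header, sequence, quality) tuples trimmed to `length`.

    Reads shorter than `length` are dropped.
    Assumption: each record occupies exactly 4 lines.
    """
    record = []
    for line in fastq_lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            header, seq, _plus, qual = record
            record = []
            if len(seq) >= length:
                # Trim sequence and quality string to the same length
                yield header, seq[:length], qual[:length]
```

Each yielded tuple can be written back as a FASTQ record with a "+" separator line between the sequence and the quality string.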

SOAPals

http://soap.genomics.org.cn/soapals.html

current version: 1.1 , 05-05-2010

The SOAPals website provides detailed information on how to install and run it.

GEM

http://sourceforge.net/apps/mediawiki/gemlibrary/index.php?title=The_GEM_library

current version: GEM-binaries-Linux-x86_64-20100419-003425.tbz2

base mapper: GEM split-mapper

Developed in Erlang and Python. Two-step mapping: reads are first mapped in unspliced mode, then the unmapped reads are mapped with splicing.

GMAP/GSNAP

http://research-pub.gene.com/gmap/

current version: 2010-07-27

FASTA and FASTQ input, with support for paired-end reads.

ERANGE

http://woldlab.caltech.edu/rnaseq/

current version: 3.2.1 from 2010.08.10

base mapper: bowtie or blat


SOLiD data only

(untested)

SplitSeek

http://solidsoftwaretools.com/gf/project/splitseek/

current version: 1.3.2


Ameur A, Wetterbom A, Feuk L, Gyllensten U. Global and unbiased detection of splice junctions from RNA-seq data. Genome Biol. 2010 Mar 17;11(3):R34.

Developed in Perl.

RNA-mate

http://solidsoftwaretools.com/gf/project/rnamate

current version: 1.01

Spliced Mappers (in development)

PALMapper (fusion of GenomeMapper & QPALMA)

http://www.fml.tuebingen.mpg.de/raetsch/suppl/palmapper

current version: palmapper-0.4-rc3.tar.gz 05/09/10

Simple installation (run "make" in the installation directory). To check the install, go to "testcase" and run "make" again. This requires a fast Internet connection, as it downloads genome files. Output files are in a tabular format, not GFF.

Creating index:

pmindex -i genome_file.fa -v

Command used to map testcase:

palmapper -i data/c_elegans.WS209.dna.fa -q data/split_1m.000 -acc data/C_elegans_SpliceSitePred_WS209/acc_pred.bspf/contig_%i%c -don data/C_elegans_SpliceSitePred_WS209/don_pred.bspf/contig_%i%c -o data/split_1m.000.mapped -u data/split_1m.000.unmapped -H data/split_1m.000.spliced -filter-max-mismatches 3 -filter-max-gaps 0 -filter-splice-min-edit 2 -filter-splice-region 5 -f bedx -qpalma data/parameters.qpalma -qpalma-use-map-max-len 2000 -report-map-read -report-spliced-read -report-map-region -report-splice-sites 0.9 -M 6 -G 2 -E 6 -l 18 -L 35 -K 12 -C 55 -I 25000 -NI 2 -SA 5 -CT 10 -z 10 -S -seed-hit-cancel-threshold 10000

CAVEAT: It requires QPALMA parameter files. These seem to be both species and tissue specific, and there are separate parameter files for paired and unpaired reads, e.g. human_HepG2_left_l75.qpalma. Creating them requires installing QPALMA itself (a past install attempt was unsuccessful; not tested recently).

Mapsplice

http://www.netlab.uky.edu/p/bioinfo/MapSplice/

current version: MapSplice 1.14.1 2010.09.30

base mapper: bowtie

SpliceMap

http://www.stanford.edu/group/wonglab/SpliceMap/

current version: 3.3.5.2 2010.10.23

base mapper (preferred): bowtie (others possible)

"Currently, only the canonical GT-AG splice sites are identified."


Requirements:

  • memory: 8 GB minimum for the human genome, 16 GB recommended
  • input formats: RAW, FASTQ or FASTA
  • read length: >= 50 bp

Base mappers:

  • Bowtie (preferred)
  • others: SeqMap, Eland

Alexa-Seq

http://www.alexaplatform.org/alexa_seq/downloads.htm

Griffith M, Griffith OL, Mwenifumbo J, Morin RD, Goya R, Tang MJ, Hou YC, Pugh TJ, Robertson G, Chittaranjan S, Ally A, Asano JK, Chan SY, Li I, McDonald H, Teague K, Zhao Y, Zeng T, Delaney AD, Hirst M, Morin GB, Jones SJM, Tai IT, Marra MA. Alternative expression analysis by RNA sequencing. Nature Methods. 2010 Oct;7(10):843-847.

version: ALEXA_Seq_v.1.13.tar.gz 2010.12.02 (check)

Preconfigured virtual machines (for VMware) are available for version 1.12.

TAU

http://mocklerlab-tools.cgrb.oregonstate.edu/TAU.html

current version: 1.4 2010.09.06

Transcriptome Assembly Utility: requires already mapped input. Compatible mappers: Blat, Eland and HashMatch. Also accepts gff3 files.

SAW (method only, no software yet)

Ning K, Fermin D (2010) SAW: A Method to Identify Splicing Events from RNA-Seq Data Based on Splicing Fingerprints. PLoS ONE 5(8): e12047. doi:10.1371/journal.pone.0012047

Spliced Mappers (old)

GMORSE

http://www.genoscope.cns.fr/externe/gmorse/

Proper name: G-Mo.R-Se

current version: 06-Nov-2009

It was used in the Vitis vinifera genome project.

Not spliced

Mapping short reads to a draft genome sequence with multiple contigs poses problems for current spliced mappers.

blat

http://genome.ucsc.edu/FAQ/FAQblat.html

Detailed description: http://genome.ucsc.edu/goldenPath/help/blatSpec.html

Options used to produce hints for the Augustus gene prediction program (based on http://augustus.gobics.de/binaries/readme.rnaseq.html):

blat -noHead -stepSize=5 -minIdentity=93 genome.masked.fa rnaseq.fa ali.psl

Options from the Bähler lab protocol (Wilhelm BT, Marguerat S, Goodhead I, Bähler J. Defining transcribed regions using RNA-seq. Nature Protocols. http://www.bahlerlab.info/docs/nprot.2009.229.pdf):

blat -noHead -out=psl -oneOff=1 -tileSize=8 FASTA_genome.txt FASTA_sequences.txt Output.psl

Pash

Coarfa C, Yu F, Miller CA, Chen Z, Harris RA, Milosavljevic A. Pash 3.0: A versatile software package for read mapping and integrative analysis of genomic and epigenomic variation using massively parallel DNA sequencing. BMC Bioinformatics. 2010 Nov 23;11(1):572.

Download: http://www.brl.bcm.tmc.edu/pash/pashDownload.rhtml

current version: 3.0.6.2

last

http://last.cbrc.jp/

Latest: last-149.zip 29-Nov-2010

requirements: minimum 2 GB RAM per mammalian genome, 16-20 GB recommended for optimal performance

Installation:

cd src; make

Creating genomic database and short reads mapping:

# database creation
lastdb -m1111110 -s20G -v my_genome_db my_genome.fasta

# mapping
lastal -Q3 -o reads_vs_my_genome_db.out -f 0 -v my_genome_db reads.fastq

where:

  • -Q3: fastq in Illumina format
  • -f 0: output in tabular format
  • -v: verbose (prints what it is doing)

last can align reads containing indels and can leave large parts of a read unaligned, which makes it highly sensitive but less specific. For example, it can report matches only 30 nucleotides long from 54 nt queries. The output therefore needs to be filtered to remove spurious matches.
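A minimal sketch of such a filter for the tabular output (-f 0) follows. It assumes the column order score, name1, start1, alnSize1, strand1, seqSize1, name2, start2, alnSize2, strand2, seqSize2, blocks, with the read as the second sequence; check this against the header comments of your own output file. The function name filter_last_tab and both thresholds are hypothetical and should be tuned to the data.

```python
def filter_last_tab(lines, min_aln=40, min_fraction=0.9):
    """Yield tabular alignment lines that cover enough of the read.

    Assumed columns (0-based): 8 = aligned length on the read,
    10 = total read length. Lines starting with '#' are comments.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        aln_size = int(fields[8])    # aligned part of the read
        read_size = int(fields[10])  # full length of the read
        # Keep alignments that are long enough in absolute terms
        # and that cover most of the read.
        if aln_size >= min_aln and aln_size >= min_fraction * read_size:
            yield line
```

With these example thresholds, a 30 nt match from a 54 nt read is discarded, while a full-length 54 nt match is kept.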

It has no multi-processor option, so for faster mapping one has to split the fastq file(s), run last in parallel and combine the results (or use Hadoop).
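The splitting step can be sketched as follows, assuming a 4-line-per-record FASTQ file. The function names, the chunk file naming scheme and the chunk size are all hypothetical. Each resulting chunk can then be mapped with a separate lastal process.

```python
def chunk_fastq(fastq_lines, reads_per_chunk):
    """Yield lists of lines, each holding up to `reads_per_chunk` reads.

    Assumption: each FASTQ record occupies exactly 4 lines.
    """
    chunk = []
    for i, line in enumerate(fastq_lines, 1):
        chunk.append(line)
        if i % (4 * reads_per_chunk) == 0:
            yield chunk
            chunk = []
    if chunk:  # last, possibly smaller, chunk
        yield chunk


def write_chunks(fastq_path, prefix, reads_per_chunk):
    """Write the chunks to numbered files and return their paths."""
    paths = []
    with open(fastq_path) as handle:
        for n, chunk in enumerate(chunk_fastq(handle, reads_per_chunk)):
            path = "%s.%03d.fastq" % (prefix, n)
            with open(path, "w") as out:
                out.writelines(chunk)
            paths.append(path)
    return paths
```

Splitting on record boundaries keeps every read intact, so the per-chunk outputs can simply be concatenated afterwards.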

Since version 149 it is possible to get SAM output with a two-step procedure:

#get MAF output first
lastal -Q3 -o reads_vs_my_genome_db.maf -f 1 -v my_genome_db reads.fastq

#convert MAF to SAM using maf-convert.py from scripts directory 
maf-convert.py sam reads_vs_my_genome_db.maf > reads_vs_my_genome_db.sam

The SAM output from last has not yet been tested with other programs (dk 2010-11-29).