User talk:Emilio Palumbo/g2f rna

From OpenWetWare

(Difference between revisions)
Line 5: Line 5:
  ===Analyses workflow===
- An example of a standard workflow we use for RNAseq analyses can be seen [[http://genome.crg.es/~epalumbo/blueprint/crg-pipeline.pdf here]].
+ An example of a standard workflow we use for RNAseq analyses can be seen [[http://genome.crg.es/~epalumbo/gene2farm/crg-pipeline.pdf here]].
+
+ Common analyses:
+
+ * expression quantitative trait loci (eQTL)
+ * splicing quantitative trait loci (sQTL)
+ * differential gene/isoform expression
+
  ===Mapping===
Line 32: Line 39:
  ===Gemtools===
- The [[http://algorithms.cnag.cat/wiki/The_GEM_library GEM mapper]] is a mapping program for next generation sequencing developed in collaboration between CRG and CNAG institutes in Barcelona. Many high-performance standalone programs (splice mapper, conversion tool, etc.) are provided along with the mapper; in general, new algorithms and tools can be easily implemented on top of these.
+ The [[http://algorithms.cnag.cat/wiki/The_GEM_library GEM mapper]] is a mapping program for next generation sequencing developed in collaboration between CRG and CNAG in Barcelona. Many high-performance standalone programs (splice mapper, conversion tool, etc.) are provided along with the mapper; in general, new algorithms and tools can be easily implemented on top of these.
  [[http://gemtools.github.io/ Gemtools]] is a powerful set of high-level pipelines which greatly simplifies the use of the GEM mapper. Using gemtools one can index references and/or map several kinds of data from a simple command-line interface, without having to type complicated commands. In particular, gemtools contains a fast and accurate pipeline for mapping RNA-sequencing data.
- The default gemtools RNAseq pipeline is shown [[http://genome.crg.es/~epalumbo/gem-pipeline.pdf here]].
+ The default gemtools RNAseq pipeline is shown [[http://genome.crg.es/~epalumbo/gene2farm/gem-pipeline.pdf here]].
+
+
+
+ ===Element quantification===
+
+ * exon
+ * intron
+ * splice junction
+ * transcript
+ * gene
+
+ ===Transcript expression quantification===
+
+ Transcript quantification is harder than gene quantification. To quantify the expression of a gene, we just need to count the RNA-seq reads that fall within its exons. To quantify the expression of a transcript, however, we must deal with reads that map to an exon shared by several overlapping transcripts of the gene. The process of assigning such a read to a certain transcript is called read deconvolution or isoform expression quantification.
+
+ For transcript expression we use the [[http://www.sammeth.net/confluence/display/FLUX/Home Flux Capacitor]], developed at the CRG in Barcelona.
+
  ===Running the pipeline===
Line 45: Line 69:
  <pre>
- gemtools index genome.fa
+ gemtools index -i genome.fa
  </pre>
Line 51: Line 75:
  <pre>
- gemtools t-index annotation.gtf -m MAX_READ_LENGTH
+ gemtools t-index -i genome.gem -a annotation.gtf -m 80
  </pre>
Line 57: Line 81:
  <pre>
- gemtools rna-pipeline -f FASTQ_FILE -q QUALITY_OFFSET -i GENOME_INDEX -a ANNOTATION_FILE -t NUMBER_OF_CORES -o OUTPUT_FOLDER -m MAXIMUM_READ_LENGTH_FOR_DENOVO_JUNCTIONS
+ gemtools rna-pipeline -f sample.fastq.gz -q 33 -i genome.gem -a annotation.gtf -m 110
+ </pre>
+
+ To run the Flux Capacitor, a BAM file sorted by genomic position and indexed is needed. This takes two commands:
+
+ <pre>
+ samtools sort my_file.bam my_file_sorted
+ samtools index my_file_sorted.bam
+ </pre>
+
+ You can then run the transcript quantification as follows:
+
+ <pre>
+ flux-capacitor -i sample.bam -a annotation.gtf -o sample.gtf
  </pre>

Revision as of 06:24, 15 November 2013


RNAseq

image1.png

Analyses workflow

An example of a standard workflow we use for RNAseq analyses can be seen [here].

Common analyses:

  • expression quantitative trait loci (eQTL)
  • splicing quantitative trait loci (sQTL)
  • differential gene/isoform expression


Mapping

Specific variables to consider when mapping RNAseq:

  • intron size
  • overhang (number of bases from each side of the junction that should be covered by a certain read)
  • splice site consensus (canonical, extended, non-canonical)
  • donor/acceptor splice site consensus sequences
  • junction “filtering”:
    • chromosome/strand
    • block order
    • min/max distance
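
As an illustration of the overhang criterion, here is a minimal sketch that reads the overhang of a spliced alignment from its CIGAR string. The function name `min_overhang` is hypothetical and not part of gemtools. The sketch assumes that junctions appear as `N` operations, counts only `M`, `=` and `X` operations as covered bases, and reports the shortest aligned block next to any junction.

```shell
# min_overhang CIGAR
# Hypothetical helper: prints the shortest aligned block flanking a junction
# (N operation) in the CIGAR string, or nothing if the read is unspliced.
min_overhang() {
    printf '%s\n' "$1" | awk '{
        cigar = $0; block = 0; njunctions = 0; min = 0
        # Consume the CIGAR one operation at a time.
        while (match(cigar, /^[0-9]+[MIDNSHP=X]/)) {
            len = substr(cigar, 1, RLENGTH - 1) + 0
            op = substr(cigar, RLENGTH, 1)
            cigar = substr(cigar, RLENGTH + 1)
            if (op == "M" || op == "=" || op == "X") {
                block += len          # aligned bases extend the current block
            } else if (op == "N") {
                # A junction closes the current block.
                if (njunctions == 0 || block < min) min = block
                njunctions++
                block = 0
            }
        }
        if (njunctions > 0) {
            if (block < min) min = block   # close the last block
            print min
        }
    }'
}
```

For example, `min_overhang 70M1000N10M` reports the 10-base side of the junction, so a filter that requires a larger overhang would discard that read.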

Input files

To perform RNAseq analysis we need:

  • reference genome sequence
  • reference gene annotation
  • sequences

Important note: Please make sure the contig names in your reference genome and in your annotation match.
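
One way to check this is to compare the contig names in the two files. The sketch below uses a hypothetical helper, `missing_contigs`, built from standard Unix tools. It assumes a FASTA file whose headers start with the contig name and a tab-separated GTF file with the contig name in the first column.

```shell
# missing_contigs GENOME_FASTA ANNOTATION_GTF
# Hypothetical helper: prints the contig names used in the annotation
# that do not appear in the genome. No output means the names match.
missing_contigs() {
    tmpdir=$(mktemp -d) || return 1
    # FASTA headers: drop ">" and keep the text up to the first whitespace.
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | LC_ALL=C sort -u > "$tmpdir/genome"
    # GTF: first column of the non-comment lines.
    grep -v '^#' "$2" | cut -f1 | LC_ALL=C sort -u > "$tmpdir/annotation"
    # Keep only the lines unique to the second file (annotation-only contigs).
    LC_ALL=C comm -13 "$tmpdir/genome" "$tmpdir/annotation"
    rm -rf "$tmpdir"
}
```

Running `missing_contigs genome.fa annotation.gtf` before indexing would reveal, for instance, an annotation that uses `1` where the genome uses `chr1`.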

Gemtools

The [GEM mapper] is a mapping program for next generation sequencing developed in collaboration between CRG and CNAG in Barcelona. Many high-performance standalone programs (splice mapper, conversion tool, etc.) are provided along with the mapper; in general, new algorithms and tools can be easily implemented on top of these.

[Gemtools] is a powerful set of high-level pipelines which greatly simplifies the use of the GEM mapper. Using gemtools one can index references and/or map several kinds of data from a simple command-line interface, without having to type complicated commands. In particular, gemtools contains a fast and accurate pipeline for mapping RNA-sequencing data.

The default gemtools RNAseq pipeline is shown [here].


Element quantification

  • exon
  • intron
  • splice junction
  • transcript
  • gene

Transcript expression quantification

Transcript quantification is harder than gene quantification. To quantify the expression of a gene, we just need to count the RNA-seq reads that fall within its exons. To quantify the expression of a transcript, however, we must deal with reads that map to an exon shared by several overlapping transcripts of the gene. The process of assigning such a read to a certain transcript is called read deconvolution or isoform expression quantification.
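
The difference between the two problems can be seen on toy data. The sketch below is not the Flux Capacitor algorithm. It is a hypothetical helper, `count_reads`, that works on two made-up tab-separated tables for a single gene: exons (transcript, start, end) and reads (read, start, end). It considers only reads that lie entirely within one exon, then counts the reads of the gene and how many of them fit more than one transcript.

```shell
# count_reads EXONS READS
# EXONS: transcript_id <tab> exon_start <tab> exon_end (toy format, one gene)
# READS: read_id <tab> read_start <tab> read_end
# Hypothetical helper: prints the gene-level read count and the number of
# those reads that are compatible with more than one transcript.
count_reads() {
    awk -F '\t' '
        # First file: remember every exon with its transcript.
        NR == FNR { tx[NR] = $1; s[NR] = $2 + 0; e[NR] = $3 + 0; n = NR; next }
        # Second file: find the transcripts whose exons fully contain the read.
        {
            split("", seen); hits = 0
            for (i = 1; i <= n; i++)
                if ($2 + 0 >= s[i] && $3 + 0 <= e[i] && !(tx[i] in seen)) {
                    seen[tx[i]] = 1; hits++
                }
            if (hits >= 1) gene++
            if (hits > 1) ambiguous++
        }
        END { printf "gene_reads=%d ambiguous_reads=%d\n", gene, ambiguous }
    ' "$1" "$2"
}
```

The gene-level count needs no further decision, whereas each ambiguous read still has to be attributed to one of its candidate transcripts, which is the job of the deconvolution step.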

For transcript expression we use the [Flux Capacitor], developed at the CRG in Barcelona.


Running the pipeline

The following steps are needed to run the gemtools RNAseq pipeline:

  • Genome indexing:
gemtools index -i genome.fa
  • Transcriptome generation and indexing:
gemtools t-index -i genome.gem -a annotation.gtf -m 80

Once those steps have completed successfully, you can run the pipeline:

gemtools rna-pipeline -f sample.fastq.gz -q 33 -i genome.gem -a annotation.gtf -m 110

To run the Flux Capacitor, a BAM file sorted by genomic position and indexed is needed. This takes two commands:

samtools sort my_file.bam my_file_sorted
samtools index my_file_sorted.bam

You can then run the transcript quantification as follows:

flux-capacitor -i sample.bam -a annotation.gtf -o sample.gtf