User talk:Emilio Palumbo/g2f rna: Difference between revisions
Line 5: | Line 5: | ||
===Analyses workflow=== | ===Analyses workflow=== | ||
An example of a standard workflow we use for RNAseq analyses can be seen [[http://genome.crg.es/~epalumbo/ | An example of a standard workflow we use for RNAseq analyses can be seen [[http://genome.crg.es/~epalumbo/gene2farm/crg-pipeline.pdf here]]. | ||
Common analyses: | |||
* expression quantitative trait loci (eQTL) | |||
* splicing quantitative trait loci (sQTL) | |||
* differential gene/isoform expression | |||
===Mapping=== | ===Mapping=== | ||
Line 32: | Line 39: | ||
===Gemtools=== | ===Gemtools=== | ||
The [[http://algorithms.cnag.cat/wiki/The_GEM_library GEM mapper]] is a mapping program for next generation sequencing developed in collaboration between CRG and CNAG | The [[http://algorithms.cnag.cat/wiki/The_GEM_library GEM mapper]] is a mapping program for next generation sequencing developed in collaboration between CRG and CNAG in Barcelona. Many high-performance standalone programs (splice mapper, conversion tool, etc.) are provided along with the mapper; in general, new algorithms and tools can be easily implemented on top of these. | ||
[[http://gemtools.github.io/ Gemtools]] is a powerful set of high-level pipelines which greatly simplifies the use of the GEM mapper. Using gemtools one can index references and/or map several kinds of data from a simple command-line interface, without having to type complicated commands. In particular, gemtools contains a fast and accurate pipeline for mapping RNA-sequencing data. | [[http://gemtools.github.io/ Gemtools]] is a powerful set of high-level pipelines which greatly simplifies the use of the GEM mapper. Using gemtools one can index references and/or map several kinds of data from a simple command-line interface, without having to type complicated commands. In particular, gemtools contains a fast and accurate pipeline for mapping RNA-sequencing data. | ||
The default gemtools RNAseq pipeline is shown [[http://genome.crg.es/~epalumbo/gem-pipeline.pdf here]]. | The default gemtools RNAseq pipeline is shown [[http://genome.crg.es/~epalumbo/gene2farm/gem-pipeline.pdf here]]. | ||
===Element quantification=== | |||
* exon | |||
* intron | |||
* splice junction | |||
* transcript | |||
* gene | |||
===Transcript expression quantification=== | |||
Transcript quantification is a complex problem. Quantifying the expression of a gene is simple. We just need to count the RNA-seq reads that fall within the exons of this gene. However, to quantify expression of a transcript we can have reads mapping to an exon of the gene where multiple transcripts overlap. The process of assigning a read to a certain transcript is called read deconvolution or isoform expression quantification. | |||
For transcript expression we use [[http://www.sammeth.net/confluence/display/FLUX/Home Flux Capacitor]] developed at the CRG in Barcelona. | |||
===Running the pipeline=== | ===Running the pipeline=== | ||
Line 45: | Line 69: | ||
<pre> | <pre> | ||
gemtools index genome.fa | gemtools index -i genome.fa | ||
</pre> | </pre> | ||
Line 51: | Line 75: | ||
<pre> | <pre> | ||
gemtools t-index annotation.gtf -m | gemtools t-index -i genome.gem -a annotation.gtf -m 80 | ||
</pre> | </pre> | ||
Line 57: | Line 81: | ||
<pre> | <pre> | ||
gemtools rna-pipeline -f | gemtools rna-pipeline -f sample.fastq.gz -q 33 -i genome.gem -a annotation.gtf -m 110 | ||
</pre> | |||
To run the Flux Capacitor, a BAM file that is sorted by genomic position and indexed is needed. To produce it, two commands have to be run: | |||
<pre> | |||
samtools sort my_file.bam my_file_sorted | |||
samtools index my_file_sorted.bam | |||
</pre> | |||
You can then run the transcript quantifications in the following way: | |||
<pre> | |||
flux-capacitor -i sample.bam -a annotation.gtf -o sample.gtf | |||
</pre> | </pre> |
Revision as of 03:24, 15 November 2013
RNAseq
http://rnaseq.uoregon.edu/img/image1.png
Analyses workflow
An example of a standard workflow we use for RNAseq analyses can be seen [here].
Common analyses:
- expression quantitative trait loci (eQTL)
- splicing quantitative trait loci (sQTL)
- differential gene/isoform expression
Mapping
Specific variables to consider when mapping RNAseq:
- intron size
- overhang (number of bases from each side of the junction that should be covered by a certain read)
- splice site consensus (canonical, extended, non-canonical)
- donor/acceptor splice site consensus sequences
- junction “filtering”:
  - chromosome/strand
  - block order
  - min/max distance
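The junction "filtering" criteria above can be illustrated with a small awk filter. This is only a sketch under stated assumptions, not how the GEM mapper implements its filter: the function name `filter_junctions` and the six-column tab-separated input (chromosome, strand and end position of the first block, then chromosome, strand and start position of the second block) are hypothetical and chosen for illustration.

```shell
# Sketch: keep candidate junctions whose two blocks lie on the same
# chromosome and strand, appear in order, and are separated by a
# distance within [min, max].
# Hypothetical input columns (tab-separated, read from stdin):
#   1 chrom of block 1   2 strand of block 1   3 end of block 1
#   4 chrom of block 2   5 strand of block 2   6 start of block 2
filter_junctions() {
    # $1 = minimum distance, $2 = maximum distance
    awk -F '\t' -v min="$1" -v max="$2" '
        $1 != $4 { next }      # blocks on different chromosomes
        $2 != $5 { next }      # blocks on different strands
        $6 <= $3 { next }      # second block does not follow the first
        { d = $6 - $3 }        # distance spanned by the junction
        d >= min && d <= max   # print junctions within the allowed range
    '
}
```

For example, `filter_junctions 20 500000 < junctions.tsv` would keep only junctions spanning between 20 and 500000 bases; both the thresholds and the file name are placeholders.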
Input files
To perform RNAseq analysis we need:
- reference genome sequence
- reference gene annotation
- sequences
Important note: Please make sure that the contig names in your reference genome and in your annotation correspond.
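One way to check the contig names is to compare the FASTA headers with the first column of the annotation. The following is a minimal sketch, assuming a standard FASTA file (contig name is the first word after `>`) and a GTF file with the contig name in the first column; the function name `missing_contigs` is hypothetical.

```shell
# Sketch: print every contig name that is used in the annotation but
# is absent from the reference genome. No output means they correspond.
missing_contigs() {
    # $1 = genome FASTA, $2 = annotation GTF
    awk '
        NR == FNR {                       # first file: the genome
            if (/^>/) { sub(/^>/, "", $1); seen[$1] = 1 }
            next
        }
        /^#/ || NF == 0 { next }          # skip GTF comments and blank lines
        !($1 in seen) && !reported[$1]++ { print $1 }
    ' "$1" "$2"
}
```

A typical mismatch this would reveal is a genome that names its contigs `chr1`, `chr2`, ... paired with an annotation that uses `1`, `2`, ...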
Gemtools
The [GEM mapper] is a mapping program for next generation sequencing developed in collaboration between CRG and CNAG in Barcelona. Many high-performance standalone programs (splice mapper, conversion tool, etc.) are provided along with the mapper; in general, new algorithms and tools can be easily implemented on top of these.
[Gemtools] is a powerful set of high-level pipelines which greatly simplifies the use of the GEM mapper. Using gemtools one can index references and/or map several kinds of data from a simple command-line interface, without having to type complicated commands. In particular, gemtools contains a fast and accurate pipeline for mapping RNA-sequencing data.
The default gemtools RNAseq pipeline is shown [here].
Element quantification
- exon
- intron
- splice junction
- transcript
- gene
Transcript expression quantification
Transcript quantification is a complex problem. Quantifying the expression of a gene is comparatively simple: we count the RNA-seq reads that fall within the exons of the gene. Quantifying the expression of a transcript is harder, because a read can map to an exon shared by several transcripts of the same gene, so it cannot be attributed to one of them directly. The process of assigning such reads to transcripts is called read deconvolution or isoform expression quantification.
For transcript expression we use [Flux Capacitor] developed at the CRG in Barcelona.
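The ambiguity described above can be made concrete with a toy example. The sketch below is not the Flux Capacitor algorithm; it only lists, for each read, the transcripts whose exons contain the read position, so that reads compatible with more than one transcript stand out. The function name `compatible_transcripts` and both tab-separated input formats are assumptions made for illustration.

```shell
# Sketch: report which transcripts each read is compatible with.
# Hypothetical inputs (tab-separated):
#   exon table: transcript id, exon start, exon end
#   read table: read id, mapped position
compatible_transcripts() {
    # $1 = exon table, $2 = read table
    awk -F '\t' '
        NR == FNR {                        # first file: load the exons
            n++; tx[n] = $1; s[n] = $2 + 0; e[n] = $3 + 0
            next
        }
        {
            pos = $2 + 0
            hits = ""
            for (i = 1; i <= n; i++)
                if (pos >= s[i] && pos <= e[i])
                    hits = hits (hits == "" ? "" : ",") tx[i]
            # "-" marks a read that falls outside every exon
            print $1 "\t" (hits == "" ? "-" : hits)
        }
    ' "$1" "$2"
}
```

A read reported with a single transcript can be counted directly, whereas a read reported with several transcripts is exactly the case that read deconvolution has to resolve.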
Running the pipeline
The following steps are needed to run the gemtools RNAseq pipeline:
- Genome indexing:
gemtools index -i genome.fa
- Transcriptome generation and indexing:
gemtools t-index -i genome.gem -a annotation.gtf -m 80
Once those steps have completed successfully, you can run the pipeline:
gemtools rna-pipeline -f sample.fastq.gz -q 33 -i genome.gem -a annotation.gtf -m 110
To run the Flux Capacitor, a BAM file that is sorted by genomic position and indexed is needed. To produce it, two commands have to be run:
samtools sort my_file.bam my_file_sorted
samtools index my_file_sorted.bam
You can then run the transcript quantifications in the following way:
flux-capacitor -i sample.bam -a annotation.gtf -o sample.gtf
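The sorting, indexing and quantification commands can be chained in a small helper so that each step runs only if the previous one succeeded. This is a sketch rather than part of gemtools or the Flux Capacitor: the function name `quantify_bam` and the `_sorted` naming are assumptions, and it passes the sorted BAM file to `flux-capacitor` on the assumption that this is the file the quantification needs. It reuses the `samtools sort` invocation shown above, which is the older output-prefix syntax; newer samtools releases expect the output file to be given with `-o`.

```shell
# Sketch: sort and index a BAM file, then quantify transcripts.
quantify_bam() {
    bam=$1            # BAM file produced by the mapping step
    annotation=$2     # the same GTF annotation used for mapping
    base=${bam%.bam}  # file name without the .bam extension

    samtools sort "$bam" "${base}_sorted" &&
    samtools index "${base}_sorted.bam" &&
    flux-capacitor -i "${base}_sorted.bam" -a "$annotation" -o "${base}.gtf"
}
```

With the file names used on this page, the call would be `quantify_bam sample.bam annotation.gtf`, which writes the quantifications to `sample.gtf`.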