Short read toolbox
From OpenWetWare
This page lists resources for working with high-throughput sequencing data.
Short Read Workshop
Download presentations and training modules from our recent short read workshop, "An introduction to next-generation sequencing", presented at the Botany 2010 meeting in Providence, R.I.
- Meeting Agenda
- Presentation Summaries and Suggested Reading
- Example Module A: Assembling chloroplast genomes from short reads
- Example Module B: Programs and data to download
Platforms
Currently available platforms:
Anticipated technologies:
- Ion Torrent Semiconductor - Ion Torrent.
- SMRT - Pacific Biosciences.
- Nanopore - Oxford Nanopore Technologies.
Online short-read resources
- SEQanswers - Online forum for next generation sequencing.
- SEQanswers software post - Post of software available for next generation sequence data.
- SEQwiki - SEQanswers wiki list of bioinformatics applications.
- De novo tips - Blog on de novo assembly.
- UCSC Bioinformatics - UC Santa Cruz's bioinformatics server.
- Cipres - Cipres.
- GMOD - Generic model organism database (GMOD) project collection of tools.
- [1] - Collection of bioinformatic tools
- Illumina Manuals - Username: guest, password: illumina.
Sequence format information
- Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
- FASTQ - Wikipedia's FASTQ page.
- FASTA - Wikipedia's FASTA page.
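As a rough illustration of translating between FASTQ variants, the sketch below converts a quality string from the Phred+64 encoding written by Illumina pipeline versions 1.3 to 1.7 into the Phred+33 (Sanger) encoding of standard FASTQ. The function name is hypothetical and is not taken from the Short Read Toolbox scripts. It assumes the input really is Phred+64; older Solexa-scaled qualities need a different conversion.

```python
def phred64_to_phred33(quality):
    """Convert a Phred+64 quality string to Phred+33 (Sanger FASTQ)."""
    converted = []
    for char in quality:
        score = ord(char) - 64  # Phred score encoded by this character
        if score < 0:
            raise ValueError("character %r is below the Phred+64 range" % char)
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)


if __name__ == "__main__":
    # 'h' encodes Q40 in Phred+64; the same score is 'I' in Phred+33.
    print(phred64_to_phred33("hhhh"))
```

Only the quality line of each record changes; the identifier and sequence lines are copied through unchanged.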
Alignment format information
Short-read quality control software
- TileQC - Requires R, RMySQL and MySQL.
- FastQC - A quality control tool for high throughput sequence data. A Java application.
- Short Read Toolbox - Scripts for quality control of Illumina data.
Open source de novo assemblers
- Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
- Edena - 32- and 64-bit Linux.
- ABySS - Multi-threaded de novo assembly.
- Ray - Multi-threaded de novo assembly.
- QSRA - Utilizes quality scores.
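For readers new to the de Bruijn graphs mentioned for Velvet, here is a minimal sketch of how such a graph can be built from reads. It is a teaching toy, not how Velvet or any assembler listed here is implemented: it ignores reverse complements, sequencing errors and coverage, and the function name and example reads are made up for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # edge: k-mer prefix -> k-mer suffix
    return dict(graph)


if __name__ == "__main__":
    # Two overlapping toy reads; each printed line is a node and its successors.
    toy_graph = build_de_bruijn(["ACGTAC", "GTACG"], 4)
    for node, successors in sorted(toy_graph.items()):
        print(node, "->", ", ".join(successors))
```

An edge that appears more than once (here GTA to TAC) reflects a k-mer seen in more than one read. Contigs correspond to unbranched paths through the graph.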
Open source reference guided assemblers
- SOAP - Short Oligonucleotide Analysis Package.
- MAQ - Mapping and Assembly with Qualities.
- Bowtie - An ultrafast, memory-efficient short read aligner.
- BWA - Burrows-Wheeler aligner.
- RGA - Perl script that calls BLAT to assemble short reads.
Hybrid assemblers (reference guided & de novo)
- YASRA - Yet Another Short Read Aligner.
- Aakrosh Ratan dissertation - Description of YASRA.
- Liston:Computer_Scripts - Scripts for post-processing of YASRA contigs.
RNA-Seq / Transcriptome
- TopHat - A fast splice junction mapper for RNA-Seq reads.
- Cufflinks - Assembles transcripts, estimates their abundances, and tests for differential expression and regulation.
- SuperSplat - Splice junction discovery.
Assembly viewers
Alignment programs
- MAFFT - MAFFT.
- T-Coffee - T-Coffee.
- Muscle - Muscle.
- LASTZ - LASTZ, hosted at the Miller lab.
- MUMmer - MUMmer.
- Mulan - Multiple sequence alignment and visualization tool.
- VISTA - Tools for comparative genomics.
- Mauve - Multiple (bacterial) genome alignment.
Sequence query programs
- BLAST - BLAST.
- PLAN - A web application for conducting, organizing, and mining large-scale BLAST searches (limited to 1,000 queries).
- BLAT - BLAT.
Linux
Perl
A very brief example demonstrating file input/output: it reads a FASTQ file and writes each four-line record as a single tab-delimited line.
Code:
#!/usr/bin/perl
use strict;
use warnings;

my (@temp, $in, $out);
my $inf  = "data.fq";
my $outf = "data_out.fq";

open($in,  "<", $inf)  or die "Can't open $inf: $!";
open($out, ">", $outf) or die "Can't open $outf: $!";

# Each pass reads one four-line FASTQ record and writes it as one line.
while (<$in>) {
    chomp($temp[0] = $_);     # First line is the identifier.
    chomp($temp[1] = <$in>);  # Second line is the sequence.
    chomp($temp[2] = <$in>);  # Third line is '+', optionally followed by the identifier.
    chomp($temp[3] = <$in>);  # Fourth line is the quality string.
    print $out join("\t", @temp)."\n";
}

close $in  or die "$inf: $!";
close $out or die "$outf: $!";
- perlintro - Introduction to perl with links to other documentation.
- BioPerl beginners - Introduction to BioPerl (be prepared for object oriented code).
Python
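A very brief Python sketch of the same kind of file input/output: it reads a FASTQ file four lines at a time and writes each record as one tab-delimited line. The function name and the demonstration data are hypothetical, and the sketch assumes a well-formed file with exactly four lines per record.

```python
import os
import tempfile


def fastq_to_tab(in_path, out_path):
    """Write each four-line FASTQ record as one tab-separated line; return the record count."""
    count = 0
    with open(in_path) as infile, open(out_path, "w") as outfile:
        while True:
            # Identifier, sequence, '+' line and quality string, in that order.
            record = [infile.readline().rstrip("\n") for _ in range(4)]
            if not record[0]:  # empty identifier: end of file (or a blank line)
                break
            outfile.write("\t".join(record) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Made-up two-record input written to a temporary directory.
    workdir = tempfile.mkdtemp()
    source = os.path.join(workdir, "data.fq")
    target = os.path.join(workdir, "data_out.fq")
    with open(source, "w") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nHHHH\n")
    print(fastq_to_tab(source, target), "records written to", target)
```

The `with` statement closes both files even if an error occurs part-way through.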
R project
- R project - Statistical programming environment.
- Bioconductor - R for biologists (microarray and next-generation sequencing data).
- APE - Analyses of Phylogenetics and Evolution, an R package.
- HT Sequence Analysis with R and Bioconductor