Revision as of 18:18, 23 July 2010

Short read toolbox

This page lists resources for working with next-generation sequence data.

Online short-read resources

SEQanswers - Online forum for next generation sequencing.
SEQanswers software post - Post of software available for next generation sequence data.
SEQwiki - SEQanswers wiki list of bioinformatics applications.
De novo tips - Blog on de novo assembly.
UCSC Bioinformatics - UC Santa Cruz's bioinformatics server.
Cipres - Cipres.
GMOD - Generic model organism database (GMOD) project collection of tools.

List of sequence format information

Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
FASTQ - Wikipedia's FASTQ page.
FASTA - Wikipedia's FASTA page.
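
One practical difference between these formats is how quality scores are encoded. As a minimal sketch, assuming the input qualities are Phred scores stored with an ASCII offset of 64 (as in some older Illumina pipeline output) and the target is the offset-33 fastq standard, the conversion could look like the following. The function name `phred64_to_phred33` is made up for illustration, and Solexa-scaled scores from the earliest pipelines would need a different conversion that is not handled here.

```python
def phred64_to_phred33(quality):
    """Re-encode a quality string from ASCII offset 64 to ASCII offset 33.

    Assumption: every character encodes a Phred score as chr(score + 64).
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64
        if score < 0:
            # A character below '@' cannot be a valid offset-64 Phred score.
            raise ValueError(f"character {ch!r} is below the offset-64 range")
        converted.append(chr(score + 33))
    return "".join(converted)
```

Only the quality line of each record is changed; the identifier and sequence lines are copied through as they are.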

List of alignment format information

SAMtools - SAMtools.
AMOS - AMOS.
UCSC - UCSC's faq on file formats.

List of short-read quality control software

TileQC - Requires R, RMySQL and MySQL.
Short Read Toolbox - Scripts for quality control of Illumina data.

List of open source de novo assemblers

Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
Edena - 32- and 64-bit Linux.
ABySS - Multi-threaded de novo assembly.
Ray - Multi-threaded de novo assembly.

QSRA - Utilizes quality scores.
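
As a rough illustration of the de Bruijn graph idea mentioned for Velvet, the sketch below builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in the reads. The function name `de_bruijn_edges` and its arguments are made up for illustration. This is not the implementation used by any of the assemblers listed, which also have to deal with reverse complements, sequencing errors and graph simplification.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer prefix to the (k-1)-mer suffixes that follow it."""
    graph = defaultdict(list)
    for read in reads:
        # Slide a window of length k along the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # One edge per k-mer occurrence: prefix node -> suffix node.
            graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)
```

An assembler would then look for paths through such a graph to spell out contigs; reads shorter than k contribute no edges.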

List of open source reference guided assemblers

MAQ - Mapping and Assembly with Qualities.
Bowtie - Bowtie.
BWA - Burrows-Wheeler aligner.
RGA - Perl script which calls blat to assemble short reads.

Hybrid assemblers (reference guided & de novo)

YASRA - Yet Another Short Read Aligner.
Liston:Computer_Scripts - Scripts for post-processing of YASRA contigs.

List of assembly viewers

Tablet - Visualizes ACE, AFG, MAQ, SOAP, SAM and BAM formats.
SAMtools - SAMtools.

List of alignment programs

MAFFT - MAFFT.
T-Coffee - T-Coffee.
Muscle - Muscle.
LASTZ - LASTZ, hosted at the Miller lab.
MUMmer - MUMmer.
Mulan - Multiple Sequence Alignment and Visualization Tool.
VISTA - Tools for Comparative Genomics.
Mauve - Multiple (bacterial) genome alignment.

List of nucleotide sequence query programs

BLAST - BLAST.
BLAT - BLAT.

Perl

A very brief example of file input/output: it reads a fastq file and writes each four-line record as a single tab-delimited line.

Code:

#!/usr/bin/perl
use strict;
use warnings;
my (@temp, $in, $out);
my $inf = "data.fq";
my $outf = "data_out.fq";
open($in, "<", $inf) or die "Can't open $inf: $!";
open($out, ">", $outf) or die "Can't open $outf: $!";
while(<$in>){
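  chomp($temp[0]=$_); # First line is the identifier (starts with '@').
  chomp($temp[1]=<$in>); # Second line is the sequence.
  chomp($temp[2]=<$in>); # Third line is the '+' separator, optionally repeating the identifier.
  chomp($temp[3]=<$in>); # Fourth line is the quality string.
  print $out join("\t", @temp)."\n";
}
# Report the file name on failure; interpolating the handle would only print a GLOB reference.
close $in or die "Can't close $inf: $!";
close $out or die "Can't close $outf: $!";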

perlintro - Introduction to perl with links to other documentation.
BioPerl beginners - Introduction to BioPerl (be prepared for object oriented code).

Python
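
A comparable file input/output sketch in Python, offered as one possible way to do what the Perl example does: read a fastq file four lines at a time and write each record as one tab-delimited line. The function name `fastq_to_tab` is made up for illustration, and the sketch assumes plain four-line records with no wrapped sequence or quality lines.

```python
from itertools import islice


def fastq_to_tab(in_path, out_path):
    """Write each four-line fastq record as one tab-delimited line.

    Returns the number of records written.
    """
    count = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            # Take the next four lines: identifier, sequence, '+' line, quality.
            record = [line.rstrip("\n") for line in islice(fin, 4)]
            if not record:
                break  # End of file.
            if len(record) != 4:
                raise ValueError(f"truncated record after {count} complete records")
            fout.write("\t".join(record) + "\n")
            count += 1
    return count
```

Calling `fastq_to_tab("data.fq", "data_out.fq")` would mirror the file names used in the Perl example.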

R project

R project - Statistical programming environment.
Bioconductor - R for biologists (micro-array and next generation data).
APE - Analysis of phylogenetics and evolution R package.
HT Sequence Analysis with R and Bioconductor
