Short read toolbox

From OpenWetWare

This page lists resources for working with next-generation sequence data.


Online short-read resources

List of sequence format information

  • Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
  • FASTQ - Wikipedia's FASTQ page.
  • FASTA - Wikipedia's FASTA page.

List of alignment format information

List of short-read quality control software

  • TileQC - Requires R, RMySQL and MySQL.
  • FastQC - A quality control tool for high throughput sequence data. A Java application.
  • Short Read Toolbox - Scripts for quality control of Illumina data.

List of open source de novo assemblers

  • Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
  • Edena - 32- and 64-bit Linux.
  • ABySS - Multi-threaded de novo assembly.
  • Ray - Multi-threaded de novo assembly.
  • QSRA - Utilizes quality scores.

List of open source reference guided assemblers

  • SOAP - Short Oligonucleotide Analysis Package.
  • MAQ - Mapping and Assembly with Qualities.
  • Bowtie - An ultrafast, memory-efficient short-read aligner.
  • BWA - Burrows-Wheeler aligner.
  • RGA - Perl script that calls BLAT to assemble short reads.

Hybrid assemblers (reference guided & de novo)

List of assembly viewers

  • Tablet - Visualizes ACE, AFG, MAQ, SOAP, SAM and BAM formats.
  • SAMtools

List of alignment programs

  • MAFFT
  • T-Coffee
  • Muscle
  • LASTZ - Hosted at the Miller lab.
  • MUMmer
  • Mulan - Multiple sequence alignment and visualization tool.
  • VISTA - Tools for comparative genomics.
  • Mauve - Multiple (bacterial) genome alignment.

List of nucleotide sequence query programs

Perl

A very brief example to demonstrate file input/output. It reads a FASTQ file four lines at a time and writes each record as a single tab-delimited line.

Code:

#!/usr/bin/perl
use strict;
use warnings;

my @temp;
my $inf  = "data.fq";     # Input file in FASTQ format.
my $outf = "data_out.fq"; # Output file: one tab-delimited record per line.

open(my $in,  "<", $inf)  or die "Can't open $inf: $!";
open(my $out, ">", $outf) or die "Can't open $outf: $!";

# Each FASTQ record spans four lines; read them as a group.
while(<$in>){
  chomp($temp[0]=$_);    # First line is the identifier (begins with '@').
  chomp($temp[1]=<$in>); # Second line is the sequence.
  chomp($temp[2]=<$in>); # Third line begins with '+', optionally followed by the identifier.
  chomp($temp[3]=<$in>); # Fourth line is the quality string.
  print $out join("\t", @temp)."\n";
}

# Report the file name (not the filehandle) if closing fails.
close $in  or die "Can't close $inf: $!";
close $out or die "Can't close $outf: $!";

  • perlintro - Introduction to perl with links to other documentation.
  • BioPerl beginners - Introduction to BioPerl (be prepared for object oriented code).

Python
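
As a counterpart to the Perl example above, the following is a minimal sketch of the same file input/output task in Python 3. It assumes a well-formed FASTQ file with exactly four lines per record. The function name fastq_to_tab is hypothetical and chosen only for illustration.

```python
#!/usr/bin/env python3
import itertools


def fastq_to_tab(in_path, out_path):
    """Write each four-line FASTQ record as one tab-delimited line.

    Hypothetical helper for illustration; returns the number of records written.
    """
    count = 0
    with open(in_path) as fin, open(out_path, "w") as fout:
        while True:
            # Take the next four lines: identifier, sequence, '+' line, quality.
            record = [line.rstrip("\n") for line in itertools.islice(fin, 4)]
            if not record:
                break  # End of file.
            if len(record) < 4:
                # Assumption: anything other than four lines is a truncated record.
                raise ValueError(f"truncated FASTQ record after {count} records")
            fout.write("\t".join(record) + "\n")
            count += 1
    return count
```

Calling fastq_to_tab("data.fq", "data_out.fq") would process the same file names used in the Perl example. Unlike the Perl version, this sketch stops with an error on a truncated final record rather than printing a partial line.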

R project

Useful links