Short read toolbox: Difference between revisions

From OpenWetWare
 
(11 intermediate revisions by 2 users not shown)
=Short read toolbox=


This page has been created to help list resources for working with high throughput sequencing data.
This page was created to help list resources for working with high-throughput sequencing data. You can also check our individual lab pages, [[Cronn Lab]] or [[Liston:Lab]], for updates on methods.


=Short Read Workshop=
=Disambiguation=
Download presentations and training modules from our recent short read workshop, "An introduction to next-generation sequencing", presented at the Botany 2010 meeting in Providence, R.I.
Short read toolbox may refer to:
 
* [http://brianknaus.com Short read toolbox] - The website of Brian J. Knaus.
* [[media:Botany2010_workshop_agenda.pdf| Meeting Agenda]]
*[[Short read toolbox Botany2010]] - Resources provided at the Botany 2010 conference.
* [[media:Botany2010_workshop_summaries.pdf| Presentation Summaries and Suggested Reading]]
*[[Short read toolbox Botany2012]] - Resources provided at the Botany 2012 conference.
* [[media:Botany2010_workshop_training.pdf| Example Module A: Assembling chloroplast genomes from short reads]]
* [[media:Botany2010_workshop_training.pdf| Example Module B: Programs and data to download]]
 
=Platforms=
Currently available platforms:
*[http://www.illumina.com/ Illumina] - Illumina (formerly Solexa).
*[http://www.454.com/ 454] - 454/Roche.
*[http://www.appliedbiosystems.com/absite/us/en/home/applications-technologies/solid-next-generation-sequencing.html SOLiD] - ABI by Life Technologies.
 
Anticipated technologies:
*[http://www.iontorrent.com/ Ion Torrent Semiconductor]  - Ion Torrent.
*[http://www.pacificbiosciences.com/ SMRT] - Pacific BioSciences.
*[http://www.nanoporetech.com/ Nanopore] - Oxford Nanopore Technologies.
 
=Online short-read resources=
*[http://seqanswers.com/ SEQanswers] - Online forum for next generation sequencing.
*[http://seqanswers.com/forums/showthread.php?t=43 SEQanswers software post] - Post of software available for next generation sequence data.
*[http://seqanswers.com/wiki/Category:Bioinformatics_application SEQwiki] - SEQanswers wiki list of bioinformatics applications.
*[http://pathogenomics.bham.ac.uk/blog/2009/09/tips-for-de-novo-bacterial-genome-assembly/ De novo tips] - Blog on de novo assembly.
*[http://genome.ucsc.edu/index.html UCSC Bioinformatics] - UC Santa Cruz's bioinformatics server.
*[http://www.phylo.org/ Cipres] - Cipres.
*[http://gmod.org/wiki/Main_Page GMOD] - Generic model organism database (GMOD) project collection of tools.
*[http://www.biopieces.org Biopieces] - Collection of bioinformatic tools.
*[ftp://ftp.illumina.com/ Illumina Manuals] username: guest password: illumina
 
=Sequence format information=
*[http://brianknaus.com/software/srtoolbox/shortread.html Short Read Toolbox] - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
*[http://en.wikipedia.org/wiki/FASTQ_format FASTQ] - Wikipedia's FASTQ page.
*[http://en.wikipedia.org/wiki/FASTA_format FASTA] - Wikipedia's FASTA page.
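
As an illustration of how the formats above relate, here is a minimal sketch of a FASTQ to FASTA conversion in Python. It assumes strict four-line FASTQ records (no wrapped sequence or quality lines); the function name <code>fastq_to_fasta</code> and the example reads are hypothetical and are not taken from the Short Read Toolbox scripts.

```python
import io


def fastq_to_fasta(fastq_handle, fasta_handle):
    """Convert four-line FASTQ records to FASTA; return the record count.

    Assumes each record is exactly four lines: identifier, sequence,
    separator and quality (no line wrapping).
    """
    count = 0
    while True:
        header = fastq_handle.readline()
        if not header:
            break  # end of input
        header = header.rstrip("\n")
        if not header:
            continue  # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError("FASTQ identifier must start with '@': " + header)
        sequence = fastq_handle.readline().rstrip("\n")
        fastq_handle.readline()  # separator line ("+"), not needed in FASTA
        fastq_handle.readline()  # quality line, dropped in FASTA
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical two-read example held in memory instead of a file.
    example = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\nHHHH\n"
    output = io.StringIO()
    fastq_to_fasta(io.StringIO(example), output)
    print(output.getvalue(), end="")
```

The function takes file-like handles, so the same code works with files opened by <code>open()</code> or with in-memory strings as in the demo.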
 
=Alignment format information=
*[http://samtools.sourceforge.net/ SAMtools] - SAMtools.
*[http://sourceforge.net/apps/mediawiki/amos/index.php?title=AMOS AMOS] - AMOS.
*[http://genome.ucsc.edu/FAQ/FAQformat.html UCSC] - UCSC's faq on file formats.
 
=Short-read quality control software=
*[http://www.science.oregonstate.edu/~dolanp/tileqc/index.html TileQC] - Requires R, RMySQL and MySQL.
*[http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/ FastQC] - A quality control tool for high throughput sequence data. A Java application.
*[http://brianknaus.com/software/srtoolbox/shortread.html Short Read Toolbox] - Scripts for quality control of Illumina data.
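
Quality control tools like those above work from the per-base quality string in each FASTQ record. The following Python sketch shows how such a string can be decoded into Phred scores. It assumes an ASCII offset of 33 (Sanger style) by default; Illumina pipelines before version 1.8 used an offset of 64, so check your data first. The function names are hypothetical and do not correspond to any of the listed tools.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into a list of integer Phred scores.

    The offset is an assumption about the encoding: 33 for Sanger-style
    files, 64 for older Illumina files.
    """
    return [ord(char) - offset for char in quality]


def mean_quality(quality, offset=33):
    """Return the mean Phred score of a quality string (0.0 if empty)."""
    scores = phred_scores(quality, offset)
    if not scores:
        return 0.0
    return sum(scores) / len(scores)


def error_probability(phred):
    """Convert a Phred score Q to an error probability, P = 10^(-Q/10)."""
    return 10 ** (-phred / 10)


if __name__ == "__main__":
    quality = "IIII5555"  # hypothetical quality string
    print(phred_scores(quality))
    print(mean_quality(quality))
    print(error_probability(20))
```

A read could then be kept or discarded by comparing <code>mean_quality</code> against a threshold of your choosing.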
 
=Open source de novo assemblers=
*[http://www.ebi.ac.uk/~zerbino/velvet/ Velvet] - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
*[http://www.genomic.ch/edena.php Edena] - Runs on 32- and 64-bit Linux.
*[http://www.bcgsc.ca/platform/bioinfo/software/abyss ABySS] - Multi-threaded de novo assembly.
*[http://sourceforge.net/apps/mediawiki/denovoassembler/index.php?title=Main_Page Ray] - Multi-threaded de novo assembly.
 
*[http://qsra.cgrb.oregonstate.edu/ QSRA] - Utilizes quality scores.
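
To give a feel for the de Bruijn graph approach mentioned for Velvet, here is a toy Python sketch that builds such a graph from reads. It is not Velvet's implementation: it ignores reverse complements, sequencing errors and coverage, and the function name <code>de_bruijn_graph</code> is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a toy de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer found in a read adds an edge from
    its first k-1 bases (prefix) to its last k-1 bases (suffix).
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k along the read.
        for start in range(len(read) - k + 1):
            kmer = read[start:start + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


if __name__ == "__main__":
    # Two hypothetical overlapping reads.
    reads = ["ACGTC", "GTCAA"]
    for node, successors in sorted(de_bruijn_graph(reads, 3).items()):
        print(node, "->", sorted(successors))
```

An assembler would then look for paths through this graph to reconstruct contigs; that step is left out of the sketch.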
 
=Open source reference guided assemblers=
*[http://soap.genomics.org.cn/index.html SOAP] -  Short Oligonucleotide Analysis Package.
*[http://maq.sourceforge.net/ MAQ] - Mapping and Assembly with Qualities.
*[http://bowtie-bio.sourceforge.net/index.shtml Bowtie] - An ultrafast, memory-efficient short read aligner.
*[http://bio-bwa.sourceforge.net/ BWA] - Burrows-Wheeler aligner.
*[http://rga.cgrb.oregonstate.edu/ RGA] - Perl script which calls blat to assemble short reads.
 
=Hybrid assemblers (reference guided & de novo)=
*[http://www.bx.psu.edu/miller_lab/ YASRA] - Yet Another Short Read Assembler.
*[http://etda.libraries.psu.edu/theses/approved/WorldWideIndex/ETD-4698/index.html Aakrosh Ratan dissertation] - Description of YASRA.
*[[Liston:Computer_Scripts]] - Scripts for post-processing of YASRA contigs.
 
=RNA-Seq / Transcriptome=
*[http://tophat.cbcb.umd.edu/ TopHat] - A fast splice junction mapper for RNA-Seq reads.
*[http://cufflinks.cbcb.umd.edu/ Cufflinks] - Assembles transcripts, estimates their abundances, and tests for differential expression and regulation.
*[http://supersplat.cgrb.oregonstate.edu/ SuperSplat] - Splice junction discovery.
 
=Assembly viewers=
*[http://bioinf.scri.ac.uk/tablet/ Tablet] - Tablet, visualizes ACE, AFG, MAQ, SOAP, SAM and BAM formats.
*[http://samtools.sourceforge.net/ SAMtools] - SAMtools.
 
=Alignment programs=
*[http://align.bmr.kyushu-u.ac.jp/mafft/online/server/ MAFFT] - MAFFT.
*[http://www.ebi.ac.uk/Tools/t-coffee/index.html T-Coffee] - T-Coffee.
*[http://www.ebi.ac.uk/Tools/muscle/index.html Muscle] - Muscle.
*[http://www.bx.psu.edu/miller_lab/ LASTZ] - LASTZ, hosted at the Miller lab.
*[http://mummer.sourceforge.net/ MUMmer] - MUMmer.
*[http://mulan.dcode.org/ Mulan] - Multiple sequence alignment and visualization tool.
*[http://genome.lbl.gov/vista/ VISTA] - Tools for comparative genomics.
*[http://asap.ahabs.wisc.edu/software/mauve/ Mauve] - Multiple (bacterial) genome alignment.
 
=Sequence query programs=
*[http://blast.ncbi.nlm.nih.gov/Blast.cgi BLAST] - BLAST.
*[http://bioinfo.noble.org/plan/ PLAN] - A web application for conducting, organizing, and mining large-scale BLAST searches (limited to 1,000 queries).
*[http://genome.ucsc.edu/goldenPath/help/hgTracksHelp.html#BLATAlign BLAT] - BLAT.
 
=Linux=
*[[media:Essential_Linux.pdf | Essential Linux]]
 
=Perl=
A very brief example to demonstrate file input/output. It reads a FASTQ file four lines at a time and writes each record as a single tab-delimited line.
 
Code:<br>
<pre>
#!/usr/bin/perl
use strict;
use warnings;
my (@temp, $in, $out);
my $inf = "data.fq";
my $outf = "data_out.fq";
open($in, "<", $inf) or die "Can't open $inf: $!";
open($out, ">", $outf) or die "Can't open $outf: $!";
while(<$in>){
  chomp($temp[0]=$_); # First line is the identifier (starts with "@").
  chomp($temp[1]=<$in>); # Second line is the sequence.
  chomp($temp[2]=<$in>); # Third line is the separator ("+", optionally followed by the identifier).
  chomp($temp[3]=<$in>); # Fourth line is the quality string.
  print $out join("\t", @temp)."\n"; # Write one tab-delimited record per line.
}
close $in or die "Can't close $inf: $!";
close $out or die "Can't close $outf: $!";
</pre>
*[http://perldoc.perl.org/perlintro.html perlintro] - Introduction to Perl with links to other documentation.
*[http://www.bioperl.org/wiki/HOWTO:Beginners BioPerl beginners] - Introduction to BioPerl (be prepared for object-oriented code).
 
=Python=
*[http://docs.python.org/tutorial/ Python tutorial]
*[http://biopython.org/wiki/Biopython Biopython]
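
By analogy with the Perl file input/output example above, here is a minimal Python sketch of the same task: read a FASTQ file four lines at a time and write each record as one tab-delimited line. It uses only the standard library (not Biopython), assumes strict four-line records with no blank lines, and the function name <code>fastq_to_tab</code> is hypothetical. The demo writes its own small input file to a temporary directory so it can be run as is.

```python
import os
import tempfile


def fastq_to_tab(in_path, out_path):
    """Write each four-line FASTQ record as one tab-delimited line.

    Returns the number of records written. Stops at end of file
    (or at the first empty identifier line).
    """
    records = 0
    with open(in_path) as infile, open(out_path, "w") as outfile:
        while True:
            # Identifier, sequence, separator and quality lines.
            lines = [infile.readline().rstrip("\n") for _ in range(4)]
            if not lines[0]:
                break
            outfile.write("\t".join(lines) + "\n")
            records += 1
    return records


if __name__ == "__main__":
    # Create a small hypothetical input file, then convert it.
    workdir = tempfile.mkdtemp()
    inf = os.path.join(workdir, "data.fq")
    outf = os.path.join(workdir, "data_out.fq")
    with open(inf, "w") as handle:
        handle.write("@read1\nACGT\n+\nIIII\n")
    print(fastq_to_tab(inf, outf), "record(s) written to", outf)
```

The <code>with</code> statement closes both files automatically, which replaces the explicit <code>close</code> calls in the Perl version.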
 
=R project=
*[http://www.r-project.org/ R project] - Statistical programming environment.
*[http://www.bioconductor.org/ Bioconductor] - R for biologists (microarray and next-generation sequencing data).
*[http://ape.mpl.ird.fr/ APE] - Analysis of phylogenetics and evolution R package.
*[http://manuals.bioinformatics.ucr.edu/home/ht-seq/ HT Sequence Analysis with R and Bioconductor]
 
=Useful links=
*[[User:Brian J. Knaus]]
*[[Cronn Lab]]
*[[Liston:Lab | Liston Lab]]

Latest revision as of 14:38, 7 June 2012

This page was created to help list resources for working with high-throughput sequencing data. You can also check our individual lab pages, Cronn Lab or Liston:Lab, for updates on methods.

Disambiguation

Short read toolbox may refer to: