Short read toolbox: Difference between revisions
From OpenWetWare
Revision as of 18:18, 23 July 2010
Short read toolbox
This page lists resources for working with next-generation sequence data.
Online short-read resources
- SEQanswers - Online forum for next generation sequencing.
- SEQanswers software post - Post listing software available for next-generation sequence data.
- SEQwiki - SEQanswers wiki list of bioinformatics applications.
- De novo tips - Blog on de novo assembly.
- UCSC Bioinformatics - UC Santa Cruz's bioinformatics server.
- Cipres - Cipres.
- GMOD - Generic model organism database (GMOD) project collection of tools.
List of sequence format information
- Short Read Toolbox - Descriptions and examples of qseq, scarf, fastq and fasta formats. Includes scripts to translate these formats to the fastq format standard.
- FASTQ - Wikipedia's FASTQ page.
- FASTA - Wikipedia's FASTA page.
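One common step when translating older Illumina output to standard FASTQ is re-encoding the quality string. The sketch below is a minimal illustration, not taken from the Short Read Toolbox scripts. It assumes the input qualities use the Phred+64 ASCII offset (Illumina 1.3 to 1.7 style) and rewrites them with the Sanger Phred+33 offset. The function name `phred64_to_phred33` is hypothetical, and older Solexa-scaled scores are not handled.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a quality string from the Phred+64 offset to Phred+33.

    Assumption: every character encodes a Phred score as chr(score + 64).
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # Phred score under the assumed +64 offset
        if score < 0:
            raise ValueError(f"character {ch!r} is below the Phred+64 range")
        converted.append(chr(score + 33))  # same score, Sanger offset
    return "".join(converted)


if __name__ == "__main__":
    # 'h' encodes Q40 under Phred+64; Q40 under Phred+33 is 'I'.
    print(phred64_to_phred33("hhhhgfB"))
```

Only the quality line of each record changes; the identifier and sequence lines can be copied through untouched.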
List of alignment format information
- AMOS - AMOS.
- UCSC - UCSC's FAQ on file formats.
List of short-read quality control software
- TileQC - Requires R, RMySQL and MySQL.
- Short Read Toolbox - Scripts for quality control of Illumina data.
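As a rough illustration of what a simple quality-control check can look like, the sketch below computes the mean Phred score of a read and compares it to a threshold. It is not how TileQC or the Short Read Toolbox scripts work. The function names, the Phred+33 default offset and the threshold of 20 are all assumptions chosen for the example.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Return the mean Phred score of one quality string.

    Assumption: scores are encoded as chr(score + offset).
    """
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(quality: str, min_mean: float = 20.0, offset: int = 33) -> bool:
    """True if the read's mean quality reaches the (arbitrary) threshold."""
    return mean_phred(quality, offset) >= min_mean


if __name__ == "__main__":
    for q in ("IIII", "!!!!"):
        print(q, mean_phred(q), passes_filter(q))
```

A per-read mean is a coarse measure; tools that work per tile or per cycle can reveal problems that this kind of filter would miss.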
List of open source de novo assemblers
- Velvet - Implements de Bruijn graphs in C. Requires a 64-bit Linux OS.
- Edena - 32 and 64 bit Linux.
- ABySS - Multi-threaded de novo assembly.
- Ray - Multi-threaded de novo assembly.
- QSRA - Utilizes quality scores.
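For readers unfamiliar with the de Bruijn graphs mentioned for Velvet, the toy sketch below shows the basic idea: each k-mer in a read becomes an edge from its first k-1 bases to its last k-1 bases. This is only an illustration of the data structure, not Velvet's implementation. It ignores reverse complements, sequencing errors and graph simplification, and the function name `de_bruijn_edges` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Map each (k-1)-mer to the list of (k-1)-mers that follow it in some read.

    Repeated edges are kept, so list length reflects how often a k-mer was seen.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])  # prefix node -> suffix node
    return dict(graph)


if __name__ == "__main__":
    for node, successors in de_bruijn_edges(["ACGT", "CGTA"], 3).items():
        print(node, "->", successors)
```

An assembler would then look for paths through such a graph to reconstruct contigs; that step is deliberately left out here.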
List of open source reference guided assemblers
- MAQ - Mapping and Assembly with Qualities.
- Bowtie - Bowtie.
- BWA - Burrows-Wheeler aligner.
- RGA - Perl script that calls BLAT to assemble short reads.
Hybrid assemblers (reference guided & de novo)
- YASRA - Yet Another Short Read Assembler.
- Liston:Computer_Scripts - Scripts for post-processing of YASRA contigs.
List of assembly viewers
- Tablet - Visualizes ACE, AFG, MAQ, SOAP, SAM and BAM formats.
- SAMtools - SAMtools.
List of alignment programs
- MAFFT - MAFFT.
- T-Coffee - T-Coffee.
- Muscle - Muscle.
- LASTZ - LASTZ, hosted at the Miller lab.
- MUMmer - MUMmer.
- Mulan - Multiple sequence alignment and visualization tool.
- VISTA - Tools for comparative genomics.
- mauve - Multiple (bacterial) genome alignment.
List of nucleotide sequence query programs
Perl
A very brief example to demonstrate file input/output. It reads a FASTQ file four lines at a time and writes each record as a single tab-delimited line.
Code:
    #!/usr/bin/perl
    use strict;
    use warnings;

    my (@temp, $in, $out);
    my $inf  = "data.fq";
    my $outf = "data_out.fq";

    open($in,  "<", $inf)  or die "Can't open $inf: $!";
    open($out, ">", $outf) or die "Can't open $outf: $!";

    # Each FASTQ record spans four lines; write it out as one tab-delimited line.
    while (<$in>) {
        chomp($temp[0] = $_);     # First line is an identifier.
        chomp($temp[1] = <$in>);  # Second line is sequence.
        chomp($temp[2] = <$in>);  # Third line is '+', optionally followed by the identifier.
        chomp($temp[3] = <$in>);  # Fourth line is quality.
        print $out join("\t", @temp) . "\n";
    }

    # Report the file name, not the filehandle, if closing fails.
    close $in  or die "$inf: $!";
    close $out or die "$outf: $!";
- perlintro - Introduction to Perl with links to other documentation.
- BioPerl beginners - Introduction to BioPerl (be prepared for object-oriented code).
Python
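A very brief sketch of the same file input/output task as the Perl example, written in Python: read FASTQ records four lines at a time and write each as one tab-delimited line. The function names are hypothetical, and the code assumes strictly four lines per record with no blank or wrapped lines.

```python
import io
from itertools import islice


def fastq_records(handle):
    """Yield (identifier, sequence, separator, quality) tuples from a FASTQ handle."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # end of input
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(lines)


def fastq_to_tab(in_handle, out_handle):
    """Write one tab-delimited line per record; return the number of records."""
    count = 0
    for record in fastq_records(in_handle):
        out_handle.write("\t".join(record) + "\n")
        count += 1
    return count


if __name__ == "__main__":
    # In-memory demo so the sketch runs without any files on disk.
    source = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nHHHH\n")
    target = io.StringIO()
    print(fastq_to_tab(source, target), "records")
    print(target.getvalue(), end="")
```

With real files, the same function could be called as `fastq_to_tab(open("data.fq"), open("data_out.fq", "w"))`, ideally inside a `with` block so both files are closed afterwards.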
R project
- R project - Statistical programming environment.
- Bioconductor - R for biologists (microarray and next-generation data).
- APE - Analyses of Phylogenetics and Evolution, an R package.
- HT Sequence Analysis with R and Bioconductor