Wikiomics:DNA sequencing

From OpenWetWare

Current revision (17:34, 12 February 2010)

Base calling (ABI)

  • phred (http://www.phrap.com/phred/) gives more accurate base calls in the less accurate parts of a read (for example at the end of the run, at about 600 bp and beyond). Phred also assigns a probability-based quality value to each base, which allows more accurate assembly. Quality scores range from 4 to about 60; "high quality bases" are those with scores > 20. The latest beta version (0.071220.b) supports ABI_3730 as well as older ABI models (373, 377, and 3700), Molecular Dynamics MegaBACE and LI-COR 4000.


  1. To run it, first set the PHRED_PARAMETER_FILE environment variable.

Bash shell:

export PHRED_PARAMETER_FILE=/path/to/your/file/phredpar.dat
  2. To see all the options:
phred -doc | less
  3. To do simple base calling on all files in input_directory and store the SCF files in scf_output_directory:
phred -id input_directory -cd scf_output_directory

Caveat: the new SCF files will have the same names as the input files.
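
The steps and the caveat above can be wrapped in a small pre-flight check. This is only a sketch: `phred_preflight` is a hypothetical helper, not part of phred, and it assumes that keeping the output directory separate from the input directory is enough to avoid name clashes between the original traces and the new SCF files.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with phred): sanity checks before base calling.

# Succeeds only if PHRED_PARAMETER_FILE names a readable file, the input
# directory exists, and the output directory is a different directory.
phred_preflight() {
    local input_dir=$1 output_dir=$2

    if [ -z "${PHRED_PARAMETER_FILE:-}" ] || [ ! -r "$PHRED_PARAMETER_FILE" ]; then
        echo "PHRED_PARAMETER_FILE is unset or not readable" >&2
        return 1
    fi
    if [ ! -d "$input_dir" ]; then
        echo "input directory not found: $input_dir" >&2
        return 1
    fi
    # Create the output directory if it does not exist yet.
    mkdir -p "$output_dir" || return 1
    # The SCF files keep the input file names, so refuse to write them
    # into the directory that holds the original traces.
    if [ "$(cd "$input_dir" && pwd -P)" = "$(cd "$output_dir" && pwd -P)" ]; then
        echo "input and output directories must differ" >&2
        return 1
    fi
    return 0
}

# Usage (left commented out so the sketch can be sourced without phred installed):
# phred_preflight input_directory scf_output_directory &&
#     phred -id input_directory -cd scf_output_directory
```

The check resolves both paths with `pwd -P`, so a symlink pointing back at the input directory is also rejected.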


  • Long Trace & Peak Trace (http://www.nucleics.com/) from Nucleics. Claimed to increase the length of readable sequence by ca. 80 bp. A separate software module is offered for increasing the daily throughput of a capillary sequencer. (Not tested.)
  • A number of other papers describe algorithms that are supposedly superior to phred, but working software is not easily obtainable, if at all.
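
To make the phred quality scores mentioned above more concrete: a score Q corresponds to an estimated error probability P = 10^(-Q/10). That relation is the standard phred definition and is not stated in the text above, so treat the following as an illustrative sketch; `phred_error_prob` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert a phred quality score Q into the estimated
# probability that the base call is wrong, using P = 10^(-Q/10).
phred_error_prob() {
    # awk has no power-of-ten function, so compute 10^x as exp(x * ln 10).
    awk -v q="$1" 'BEGIN { printf "%.6f\n", exp(-q / 10 * log(10)) }'
}

phred_error_prob 20   # prints 0.010000, i.e. 1 error in 100 at the "high quality" threshold
```

By the same relation, the lowest score of 4 corresponds to an error probability of roughly 0.4, which is why such bases are usually trimmed or down-weighted during assembly.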

Sequence assembly

Recommended reading: http://www.cbcb.umd.edu/software/

First generation

Genome assemblers used in current genomic projects

  • JAZZ: used in-house at JGI only
  • RAMEN (not yet published as of 2010-02-12), used for the medaka and silkworm genome sequencing projects

New Programs

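  • Minimus: suitable for bacterial genomes, part of AMOS
  • AMOS (http://amos.sourceforge.net/): A Modular Open-Source Assembler
  • EULER (http://nbcr.sdsc.edu/euler/euler2/): P. Pevzner's graph algorithm, reported to produce superior contigs. Requires phrap and a patched ReAligner (ftp://ftp.cs.arizona.edu/realigner/)
  • MIRA (http://chevreux.org/projects_mira.html): the latest version, 2.9.25, enables true hybrid sequence assembly (454 data [GS20 or GS FLX] and Solexa reads combined with Sanger reads).
  • SSAKE (http://www.bcgsc.ca/platform/bioinfo/software/ssake): program for assembling millions of short sequences
  • Newbler Assembler (http://www.454.com/enabling-technology/the-software.asp): software from 454 for de novo sequence assembly.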

Experimental

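  • SHARCGS (http://sharcgs.molgen.mpg.de/index.shtml): a DNA assembly program designed for de novo assembly of 25-40mer input fragments and deep sequence coverage.
  • ALLPATHS (http://www.genome.org/cgi/content/full/18/5/810): algorithm only
  • Maq (http://maq.sourceforge.net/maq-man.shtml): maps short reads to an existing genomic sequence
  • SOAP (http://soap.genomics.org.cn/): suite of programs, no de novo assembly (yet).
  • Bowtie (http://bowtie-bio.sourceforge.net/index.shtml): "fast and memory efficient"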

The most complete list to date is at SEQanswers (http://seqanswers.com/forums/showthread.php?t=43).

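See also software from

  • GSC Software Centre (http://www.bcgsc.ca/platform/bioinfo/software/) at Canada's Michael Smith Genome Sciences Centre.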

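Sequence databases & formats

  • SRF (http://srf.sourceforge.net/): a generic format for DNA sequence data
  • The Short Read Archive at NCBI (http://www.ncbi.nlm.nih.gov/Traces/sra/)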

Short reads assembly (Solexa etc)

Contig ordering/finishing

  • Hawkeye (http://genomebiology.com/2007/8/3/R34): interactive visual analytics tool for genome assemblies

Quality control

  • TileQC (http://web.science.oregonstate.edu/~dolanp/tileqc/): R-based program for quality control of Solexa reads