Wikiomics:DNA sequencing
From OpenWetWare
Latest revision as of 15:34, 12 February 2010
Base calling (ABI)
- phred gives more accurate calls in the less accurate parts of the sequence (for example near the end of the run, from about 600 bp onward). Phred also assigns a probability/quality value to each base, which allows more accurate assembly. Quality scores range from 4 to about 60; "high quality bases" are those with scores > 20. The latest beta version (0.071220.b) supports the ABI 3730 as well as older ABI models (373, 377, and 3700), Molecular Dynamics MegaBACE and LI-COR 4000.
- To run it, you need to set the PHRED_PARAMETER_FILE environment variable.
Bash shell:
export PHRED_PARAMETER_FILE=/path/to/your/file/phredpar.dat
- To see all the options:
phred -doc | less
- To do simple base calling on all files in input_directory and store the SCF files in scf_output_directory:
phred -id input_directory -cd scf_output_directory
Caveat: the new SCF files will have the same names as the input files.
- Long Trace & Peak Trace from Nucleics. Claims to increase the length of readable sequence by ca. 80 bp. Separate software module for increasing the daily throughput of a capillary sequencer. (not tested)
- There have been a number of papers describing algorithms supposedly superior to phred, but working software is not easily obtainable, if at all.
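The quality values mentioned above follow the standard phred definition, Q = -10 * log10(P), where P is the estimated probability that the base call is wrong. The following is a minimal Python sketch of that relationship and of the "score > 20" cutoff; the function names and the example quality values are hypothetical and are not part of phred itself.

```python
def phred_to_error_prob(q):
    """Convert a phred quality score Q to an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def count_high_quality(qualities, threshold=20):
    """Count bases whose quality score is strictly above the threshold."""
    return sum(1 for q in qualities if q > threshold)


# Hypothetical per-base quality values for a short read (illustrative only).
example_qualities = [4, 12, 25, 38, 51, 20, 9]

if __name__ == "__main__":
    for q in (4, 20, 60):
        print(q, phred_to_error_prob(q))
    print("high quality bases:", count_high_quality(example_qualities))
```

Under this definition a score of 20 corresponds to an estimated 1 in 100 chance of a wrong call, which is why it is a common cutoff.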
Sequence assembly
See and read!: http://www.cbcb.umd.edu/software/
First generation
Genome assemblers used in current genomic projects
- JAZZ: used in house at JGI only
- RAMEN (not yet published as of 10-02-12), used for the medaka and silkworm genome sequencing projects
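New Programs
- AMOS A Modular Open-Source Assembler
- EULER P. Pevzner's graph algorithm producing superior contigs. Requires phrap and patched ReAligner
- MIRA latest version 2.9.25 enables true hybrid sequence assembly (454 data [GS20 or GS FLX], Solexa with Sanger reads).
- SSAKE program for assembling millions of short sequences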
- Newbler Assembler software from 454 for de novo sequence assembly.
Experimental
- SHARCGS, a DNA assembly program designed for de novo assembly of 25-40mer input fragments and deep sequence coverage.
- ALLPATHS (HTML) algorithm only
- Maq mapping short reads to an existing genomic sequence
- SOAP suite of programs, no de novo assembly (yet).
- Bowtie "fast and memory efficient"
The most complete list to date is @seqanswers
See also software from
- GSC Software Centre at Canada's Michael Smith Genome Sciences Centre.
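Several of the tools listed above (Maq, SOAP, Bowtie) map short reads to an existing reference rather than assembling them de novo. As a rough illustration of the idea only, here is a minimal Python sketch of exact-match mapping through a k-mer index. It is not how any of these programs is implemented (they handle mismatches, quality values and much larger genomes); the function names and the toy sequences are hypothetical.

```python
from collections import defaultdict


def index_reference(reference, k):
    """Map every k-mer in the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_read(read, reference, index, k):
    """Return 0-based reference positions where the whole read matches exactly."""
    seed = read[:k]  # look up the first k bases, then verify the full read
    hits = []
    for pos in index.get(seed, []):
        if reference[pos:pos + len(read)] == read:
            hits.append(pos)
    return hits


# Toy reference sequence (illustrative only).
reference = "ACGTACGTTAGC"
index = index_reference(reference, k=4)

if __name__ == "__main__":
    for read in ("ACGT", "CGTTA", "GGGG"):
        print(read, map_read(read, reference, index, k=4))
```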
Sequence databases & formats
- SRF a generic format for DNA sequence data
- The Short Read Archive @NCBI
Short reads assembly (Solexa etc)
- Velvet Paper (HTML): de Bruijn graph-based assembler from EBI (Zerbino & Birney)
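To give a feel for what a de Bruijn graph of reads looks like, here is a minimal Python sketch that builds one from a handful of reads. It is a teaching illustration under simplifying assumptions (error-free reads, one strand, no coverage tracking) and is not Velvet's implementation; the function name, the value of k and the example reads are hypothetical.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a de Bruijn graph from reads.

    Nodes are (k-1)-mers; every k-mer found in a read adds an edge
    from its (k-1)-base prefix to its (k-1)-base suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Two overlapping toy reads (illustrative only).
reads = ["ACGTAC", "GTACGG"]
graph = build_de_bruijn(reads, k=4)

if __name__ == "__main__":
    for node in sorted(graph):
        print(node, "->", sorted(graph[node]))
```

Contigs correspond to unbranched paths through such a graph; a node with more than one successor (here ACG) marks a point where the reads disagree or a repeat occurs.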
Contig ordering/finishing
- Hawkeye interactive visual analytics tool for genome assemblies
Quality control
- TileQC R-based program for quality control of Solexa reads