Wikiomics:Genome aligners

From OpenWetWare
Latest revision as of 03:50, 17 March 2011


List of programs used for large-scale DNA alignment. At the moment, the statements are mostly taken from the web sites of the programs in question.

For an introduction, read Lyons and Freeling, "How to usefully compare homologous plant genes and chromosomes as DNA sequences" (2008).

See also Frith et al., "Parameters for accurate genome alignment", BMC Bioinformatics 2010, 11:80: http://www.biomedcentral.com/1471-2105/11/80

Aligners

MUMmer

web: http://mummer.sourceforge.net/

version: MUMmer3.22.tar.gz from 2009-09-21

LAGAN Toolkit

web: http://lagan.stanford.edu/lagan_web/index.shtml

version: 2.0 from 2006

    • LAGAN
    • M-LAGAN
    • Shuffle-LAGAN

Vmatch

http://www.vmatch.de/

The non-commercial license is free of charge (requesting it requires faxing).

lastz (successor of blastz)

web site: http://www.bx.psu.edu/~rsharris/lastz/

latest stable release: 2010-Jan-12

documentation: http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html

New releases: http://www.bx.psu.edu/~rsharris/lastz/newer/

last

web: http://last.cbrc.jp/

last release: last-159.zip from 14-Feb-2011

  • Can compare two vertebrate genomes (human vs. mouse: 1 day on 1 CPU)
  • Copes more efficiently with repeat-rich sequences
  • Can align a large number of sequences (e.g. next-gen sequencing data to a genome)

Use softmasked input sequences.
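
Before building the database it can be worth checking that the input really is soft-masked, i.e. that repeats are written in lowercase. The following is only a minimal sketch: `softmasked_fraction` is a hypothetical helper name, not part of LAST, and it counts only lowercase a, c, g, t and n, so other lowercase IUPAC codes are ignored.

```shell
# Hypothetical helper (not part of LAST): print the fraction of
# lowercase (soft-masked) bases in a FASTA file.
softmasked_fraction() {
    awk '
        /^>/ { next }                           # skip FASTA header lines
        {
            total += length($0)                 # all residues on this line
            tmp = $0
            lower += gsub(/[acgtn]/, "", tmp)   # gsub returns the number of matches
        }
        END {
            if (total == 0) { print "0.0000"; exit }
            printf "%.4f\n", lower / total
        }
    ' "$1"
}

# Usage (file name taken from the commands on this page):
# softmasked_fraction genome1_sequence.fa
```

A value of 0.0000 suggests the sequence is not soft-masked (or is hard-masked with N), in which case the lowercase-masking option of lastdb would have nothing to act on.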

# Create a database from one of the genomes (the larger one?), preferably on a machine with more than 20 GB of free RAM to speed up the process
lastdb -c -s20G -v genome1_db genome1_sequence.fa

# Align the genomes, writing the output in MAF format
lastal -o genome2_vs_genome1.maf -v genome1_db genome2_sequence.fa
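
Once the alignment has finished, a quick sanity check of the MAF file can be done with awk. This is a sketch, not part of LAST: `maf_summary` is a hypothetical helper, and it assumes the standard MAF layout in which each block starts with an `a` line and the fourth field of an `s` line is the aligned size. It also assumes that the first `s` line of each block is the database (genome1) sequence.

```shell
# Hypothetical helper (not part of LAST): count alignment blocks in a MAF
# file and sum the aligned bases of the first sequence in each block.
maf_summary() {
    awk '
        $1 == "a" { blocks++; first = 1; next }          # start of a new block
        $1 == "s" && first { bases += $4; first = 0 }    # size field of the top sequence
        END { printf "blocks=%d top_sequence_bases=%.0f\n", blocks, bases }
    ' "$1"
}

# Usage (file name taken from the commands above):
# maf_summary genome2_vs_genome1.maf
```

Overlapping blocks are counted more than once, so the base total is an upper bound on the covered length rather than an exact coverage figure.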

YASS

web: http://bioinfo.lifl.fr/yass/

last release: pre-release v1.14 build Apr 15, 2010

paper: doi:10.1093/nar/gki478

Uses spaced seeds; see also the links to the hedera and iedera programs on the YASS page.

Cgaln

web: http://www.genome.ist.i.kyoto-u.ac.jp/~aln_user/cgaln/

last release: Cgaln-1.0.0.tar.gz

Two-step aligner (first at the block level, then at the nucleotide level). According to the authors, it is fast and memory-efficient, and suitable for aligning bacterial genomes and mammalian chromosomes on a desktop computer (untested, dk).

FEAST

web: http://monod.uwaterloo.ca/feast/

last release: feast-105-bin.tar.gz

More sensitive but slower than lastz. A new tool, not yet widely tested.

MAUVE

Multiple genome alignment.

web: http://asap.ahabs.wisc.edu/mauve/

last release: 2.3.1, from November 11th 2009.

Java application with a GUI. Simple to use, producing colorful graphics. The output gets cluttered when there are too many or too divergent sequences.

Spines

Software collection from Broad.

web: http://www.broadinstitute.org/science/programs/genome-biology/spines

latest release: spines-1.11.tar.gz from 2010-10-28

  • Satsuma "highly parallelized program for high-sensitivity, genome-wide synteny"
  • Papaya "an all-purpose alignment tool for less diverged sequences"
  • SLAP "context-sensitive local aligner for diverged sequences with large gaps"


Mercator

Multiple Whole-Genome Orthology Map Construction

http://www.biostat.wisc.edu/~cdewey/mercator/

latest release: cndsrc-2010.10.11.tar.gz


Enredo-Pecan-Ortheus pipeline

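Several programs used for aligning eukaryotic genomes at Ensembl.

  • Enredo: http://www.ebi.ac.uk/~jherrero/downloads/enredo/
  • Pecan: http://www.ebi.ac.uk/~bjp/pecan/
  • Ortheus: http://www.ebi.ac.uk/~bjp/ortheus/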

FSA

http://orangutan.math.berkeley.edu/fsa/

latest version: fsa-1.15.5.tar.gz (10.1 MB)

paper: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000392

Mugsy

http://mugsy.sourceforge.net/

(bacterial genomes)

AuberGene

http://www.ibi.vu.nl/programs/aubergenewww/

Probably most suitable for aligning a particular gene locus.

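Alignment visualisation

  • VISTA: http://genome.lbl.gov/vista
  • MULAN: http://mulan.dcode.org/
  • GEvo: http://synteny.cnr.berkeley.edu/CoGe/GEvo.pl
  • Gmaj: http://globin.bx.psu.edu/dist/gmaj/

Novel:

  • Strudel: http://bioinf.scri.ac.uk/strudel/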

Supporting tools

  • DAGchainer: Computing Chains of Syntenic Genes in Complete Genomes (Perl)

http://dagchainer.sourceforge.net/

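Useful links

  • How to create a synteny map between two genomes: http://synteny.cnr.berkeley.edu/wiki/index.php/SynMap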

Conservation scores

phastCons

http://compgen.bscb.cornell.edu/phast/

GERP

http://mendel.stanford.edu/SidowLab/downloads/gerp/index.html

Scone

http://ika.bwh.harvard.edu/scone/

Varia

SiPhy

web: http://www.broadinstitute.org/genome_bio/siphy/

article: http://bioinformatics.oxfordjournals.org/content/25/12/i54.short