==stuff to incorporate ==
* [http://biodev.hgen.pitt.edu/footer_php/Footerv2_0.php FOOTER] and [http://nar.oxfordjournals.org/cgi/content/abstract/33/suppl_2/W442 FOOTER paper]
== Credits ==
Revision as of 11:00, 10 December 2007
Sequence motifs - for more general background see Wikipedia's Sequence_motif article.
It is important to use a few complementary programs and to allow each program to predict several motifs. There is a threshold in the number of sequences used, above which sensitivity no longer improves.
Motif finding programs
There are obvious trade-offs between speed and accuracy/sensitivity when running these programs. A few rules based on the review by Hu et al.:
- avoid using longer sequences than necessary. Doing so increases both noise and running time.
- for programs like MEME there is a plateau in motifs found after about 10 input sequences, so one can randomly select 10 sequences out of a larger set and
then check for occurrence of the detected motifs in the remaining ones.
- scores from different programs are not comparable
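The subsampling rule above can be sketched in Python as follows. This is a minimal illustration, not part of any of the programs listed here: the function names (read_fasta, split_sample, count_hits) are hypothetical, and the occurrence check uses exact string matching, which is a simplification of how a real motif (usually a weight matrix) would be scanned.

```python
import random


def read_fasta(lines):
    """Parse FASTA lines into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_sample(records, n=10, seed=0):
    """Randomly pick n records for motif finding; return (sample, rest)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    picked = set(rng.sample(range(len(records)), min(n, len(records))))
    sample = [r for i, r in enumerate(records) if i in picked]
    rest = [r for i, r in enumerate(records) if i not in picked]
    return sample, rest


def count_hits(motif, records):
    """Count records containing an exact, case-insensitive match of motif."""
    motif = motif.upper()
    return sum(1 for _, seq in records if motif in seq.upper())
```

The sample would be written out and passed to the motif finder, and count_hits would then be called on the remaining records with a consensus string taken from the program's output.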
Gibbs sampling algorithm (stochastic), requires multiple runs to get all top hits.
- Web server:  Gibbs sampling algorithm
- command line C++, requires compilation
- to run it:
./AlignACE -i GAL.seq > test.ace
- Accessory programs for motif comparisons and motif finding
CompareACE  Compares a set of found motifs to a database of TFs (for yeast, on the web)
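Because the Gibbs sampler is stochastic, the run shown above is typically repeated several times. A minimal Python sketch of that loop follows. The helper names and output file naming are hypothetical, only the -i flag from the command above is used, and it is an assumption that each invocation starts from a different random seed by default (if it does not, the runs will be identical).

```python
import subprocess


def alignace_runs(seq_file, n_runs=5, prefix="run"):
    """Build (argv, output_path) pairs for repeated AlignACE invocations."""
    # Assumption: AlignACE reseeds itself on every start, so repeated
    # identical command lines can yield different motif sets.
    return [(["./AlignACE", "-i", seq_file], "%s%d.ace" % (prefix, i))
            for i in range(1, n_runs + 1)]


def run_all(jobs):
    """Execute each job, saving stdout to its output file (needs the binary)."""
    for argv, out_path in jobs:
        with open(out_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)
```

Calling run_all(alignace_runs("GAL.seq")) would leave run1.ace to run5.ace in the current directory, which can then be compared by hand or with the accessory programs.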
- Command line
meme -nmotifs numberof_motifs_2_find -model oops -protein -sequence your_fasta_file -outfile your_fasta_file.meme
Where 'model' could be oops/zoops/anr.
oops: One Occurrence Per Sequence
zoops: Zero or One Occurrence Per Sequence
anr: Any Number of Repetitions
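The three occurrence models can be restated as a small Python check. This is only an illustration of the definitions above: the function name is hypothetical, and MEME itself fits a statistical model rather than counting sites like this.

```python
def consistent_models(site_counts):
    """Return the occurrence models compatible with per-sequence site counts."""
    models = ["anr"]  # any number of repetitions always fits
    if all(c <= 1 for c in site_counts):
        models.append("zoops")  # zero or one occurrence in every sequence
    if all(c == 1 for c in site_counts):
        models.append("oops")  # exactly one occurrence in every sequence
    return models
```

For example, if a motif is expected in every input sequence exactly once, all three models fit, whereas a set where some sequences lack the motif rules out oops.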
./weederlauncher.out your_promoters_file.fasta MM large A M S t15
Where MM stands for Mus musculus, HS for Homo sapiens, etc. See the ./FreqFiles/ directory for more.
The output files: your_promoters_file.mix your_promoters_file.wee (output as text) your_promoters_file.html (output summary in HTML)
- Web server: 
accepts only FastA files with single sequence line
>sequence1 name
GGTGACGAC sequence1 as ONE LINE
>sequence2 name
GTAGCCTCATG sequence2 as ONE LINE
- fixed width of motifs (default 10, range: 4-50)
- binaries for Linux, Sun and Cygwin.
- create a background file:
./genomebg.linux -i your_background -o your_background.genomeBG
- run BioProspector
./BioProspector.linux -n 200 -d 1 -r 30 -i target_fasta_seqs -f your_background.genomeBG -o outputbiop1
- Web: takes into account the position of the motifs
- Command line utility called ameme:
./ameme good=your_test_set.fasta bad=set_of_random_promoter_sequences.fasta numMotifs=10 motifOutput=ameme.out outputLogo background=m2
(java + C program, requires compilation ) 
- Calculate background:
makemosaicbg -seqs your_input_sequences.fasta -mosaicClasses 1 -mosaicOrder 1 -out your_sequences_background.sbg
- Calculate motifs:
motiffinder -seqs your_input_sequences.fasta -backgroundModel your_sequences_background.sbg -numMotifs 2
- View the motifs:
Output in xms (an XML variant) format.
- MotifExplorer (Java motif viewer compatible with NestedMica): 
As of Jan 22nd 2007, works on MacOS. Problems on Windows and Ubuntu Linux.
Self Organising Maps 
- create background model (takes >15 min on a 35 MB FASTA file on a 2.26 GHz Pentium):
perl ./BackExtract.pl -seq your_background_sequences.fasta
- Create SOM with motif length 8, 10, 12, 14 & 16 nucleotides:
./SOMBRERO -t your_target_sequences.fasta -b out.back -lm 8 16 -out target_sombrero.out
- View the SOM (requires installation of Tkperl)
perl ./SOMBREROView.pl target_sombrero.out
- uses restricted fasta format!
>sequence1 name
ATGGTGACGAC sequence1 as ONE LINE
>sequence2 name
GTAGCCTCATG sequence2 as ONE LINE
- requires sequence background file
./genomebg -i inputSequenceFile -o outputDistributionFile
- running it:
./MDscan -i inseq -w 15 -f yeast_all.bg -t 10 -c 80 -r 10 -n 5 -g 1
This finds a motif of width 15 in the sequence file inseq, using yeast_all.bg as the background distribution. It finds candidate motifs from the top 10 sequences, and refines 5 iterations from the top 0 sequences. It reports the final best 5 motifs to stdout, and does not print progress messages on the way.
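Since the restricted FASTA format requires each sequence on ONE line, ordinary wrapped FASTA files need converting first. A minimal Python sketch of such a conversion is given below. The function name is hypothetical and this is not a utility shipped with any of the programs above.

```python
def to_single_line_fasta(lines):
    """Yield FASTA lines with each sequence joined onto one line."""
    seq = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if seq:
                yield "".join(seq)  # flush the previous sequence
                seq = []
            yield line
        else:
            seq.append(line)
    if seq:
        yield "".join(seq)  # flush the last sequence
```

The generator accepts any iterable of lines, so an open file handle can be passed directly and the result written out with one line per yielded item.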
YMF (not working)
- C++ program, requires compilation
- running it:
./stats stats.config 800 6 ../ymftables/yeast -sort ../examples/abf/abf1 ../examples/abf/cha1
Obtainable from author. Requires compilation
- example ( ~50 Kbp takes 70mins)
dmotif -positive positive.fna -negative negative.fna -len 9 -bkg fly_background.fna -niter 5 -nmotif 1 > dmotifoutput 2> dmotiflog
Change '-nmotif 1' to '-nmotif 10' if you want to get the top ten motifs.
- The AMADEUS Motif Discovery Tool (whole platform)
- GAME (java, genetic algorithm) 
- MotifCut (maximum density subgraphs) 
Paper: http://bioinformatics.oxfordjournals.org/cgi/reprint/22/14/e150
Motif length: fixed, between 6 and 31
- GibbsST  Not working yet
- THEME (ChIP-chip only??)
- PhyME 
- PhyloGibbs 
- PhyloCon 
Multiple algorithms / metaservers
- Credo 
Visualisation of AlignACE, DIALIGN, FootPrinter, MEME and MotifSampler results. Paper: 
- Multifinder http://the_brain.bwh.harvard.edu/MultiFinderSuppl/ (download)
- RgS-Miner (web, as of 2007.05.22: uses gene list but not sequences yet) http://rgsminer.csie.ncu.edu.tw/
- CompareACE  supporting program for AlignACE
- STAMP 
Handles 12 different output formats from a wide range of motif finding programs. Compares these motifs to known TFBS from JASPAR and other databases, including user-defined databases of motifs. Converts the output to an intermediate format accessible on the server under "X motifs loaded".
- Cistematic http://cistematic.caltech.edu/
Python package with interfaces to e.g. MEME, AlignACE, Co-Bind, and FootPrinter. Paper
Python/C++ package with Interfaces to MDscan, AlignACE, and MEME. Paper: 
- ORegAnno database as a dynamic collection of literature-curated regulatory regions, transcription factor binding sites and regulatory mutations (polymorphisms and haplotypes).
- Tompa review of 14 programs, Nat Biotech. 2005
- Maximilian Haeussler's master thesis and his wiki:
- Comparison of several programs: Hu NAR 2005 
- Erich Schwarz's list from 2002
- Motif Tool Assessment Platform (MTAP) wiki from Omaha
stuff to incorporate
- Melina II Metaserver (uses four out of five programs: CONSENSUS, MEME, Gibbs Sampler, MDscan and Weeder) by Kenta Nakai @Tokyo University. Paper
- Darek Kedra wrote this tutorial