Tregwiki:Matrix Scanning

From OpenWetWare
Revision as of 09:58, 7 February 2007 by 129.70.128.109 (talk) (some small corrections concerning possumsearch)
Transfac Match
Match is the younger brother of MatInspector. The two are almost identical except for the last step: where MatInspector uses Score = SeqScore / MaxScore, Match calculates Score = (SeqScore - MinScore) / (MaxScore - MinScore). In addition, Match uses matrix-specific thresholds instead of a global one: a hit is only reported when the threshold for that particular matrix is exceeded. Biobase calculates these thresholds and stores them in profile (prf) files that can be passed to Match. The cutoffs are chosen so that 90% of real binding sites are recognized, which makes them doubtful for matrices built from only a few known sites.
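The difference between the two normalizations can be sketched in a few lines of Python. This is only an illustration of the formulas quoted above, not the Match or MatInspector code: the toy matrix, the function names and the threshold value are all made up for the example, and treating a score equal to the threshold as a hit is an assumption.

```python
# Hypothetical toy matrix: one dict of per-nucleotide weights per position.
MATRIX = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 2, "C": 2, "G": 3, "T": 3},
]


def raw_score(matrix, site):
    """SeqScore: sum of the weights of the observed nucleotides."""
    return sum(col[base] for col, base in zip(matrix, site))


def matinspector_style(matrix, site):
    """Score = SeqScore / MaxScore."""
    max_score = sum(max(col.values()) for col in matrix)
    return raw_score(matrix, site) / max_score


def match_style(matrix, site):
    """Score = (SeqScore - MinScore) / (MaxScore - MinScore)."""
    max_score = sum(max(col.values()) for col in matrix)
    min_score = sum(min(col.values()) for col in matrix)
    return (raw_score(matrix, site) - min_score) / (max_score - min_score)


def scan(matrix, sequence, threshold):
    """Report (start, score) for every window at or above the threshold.

    The threshold is meant to be specific to this matrix, as with the
    values in a prf file; here it is just a number chosen by the caller.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        score = match_style(matrix, sequence[start:start + width])
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    print(matinspector_style(MATRIX, "TAA"))  # worst site, still above 0
    print(match_style(MATRIX, "TAA"))         # worst site maps to 0
    print(scan(MATRIX, "TTAGTC", 0.9))
```

With the Match-style formula the worst possible site always scores 0 and the best scores 1, whereas SeqScore / MaxScore only reaches 0 when MinScore happens to be 0.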
MatInspector (paper)
There no longer seems to be a version available for download on the internet. MatInspector's result is SeqScore / MaxScore, where MaxScore is the maximum possible score for a matrix and SeqScore is the score of the current sequence. SeqScore is defined as the sum, over all positions of the matrix, of (frequency of the observed nucleotide at that position * the information at that position). The information at a position is calculated as the sum of (freq * ln(freq)) over all nucleotides at that position.
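A minimal sketch of this information-weighted score, written from the description above rather than from the MatInspector code. The frequency matrix and all function names are hypothetical. One assumption is added on top of the text: the sum of freq * ln(freq) is shifted by ln(4), so that a fully conserved position gets the largest weight and a uniform position gets weight 0. Without such a shift the value is never positive.

```python
import math

# Hypothetical frequency matrix: one dict of nucleotide frequencies per position.
FREQS = [
    {"A": 1.0, "C": 0.0, "G": 0.0, "T": 0.0},      # fully conserved
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},  # uninformative
    {"A": 0.0, "C": 0.5, "G": 0.5, "T": 0.0},      # two-fold degenerate
]


def information(column):
    """Sum of freq * ln(freq) over the nucleotides, taking 0 * ln(0) as 0.

    Assumption (not in the text above): ln(4) is added so that the
    result lies between 0 (uniform) and ln(4) (fully conserved).
    """
    total = sum(f * math.log(f) for f in column.values() if f > 0)
    return total + math.log(len(column))


def seq_score(freqs, site):
    """Sum over positions of (frequency of observed base * information)."""
    return sum(col[base] * information(col) for col, base in zip(freqs, site))


def max_score(freqs):
    """Score of the best possible site: the most frequent base everywhere."""
    return sum(max(col.values()) * information(col) for col in freqs)


def matinspector_score(freqs, site):
    """Score = SeqScore / MaxScore."""
    return seq_score(freqs, site) / max_score(freqs)


if __name__ == "__main__":
    print(matinspector_score(FREQS, "ATC"))  # consensus site
    print(matinspector_score(FREQS, "CTC"))  # mismatch at the conserved position
```

The weighting means that a mismatch at a conserved position costs far more than one at a degenerate position, which contributes little or nothing to the score either way.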
TRES
Looks promising: searches for conserved matches to PWMs and finds conserved k-mers or repeats.
CREAD
CREAD is a C++ library for scanning sequences. It certainly looks good; Andrew sent a comment on it. STORM is the part of it that searches for matrices (Transfac format) in sequences (FASTA), using a suffix tree. It reads Transfac format (hurrah!) and outputs a P-value relative to background.
  • tffind searches for matrices in a multiple alignment (problem: no gaps allowed)
  • MONKEY also searches in a multiple alignment, but rates matches against a phylogenetic model (some changes are more likely than others). Open question: how does it handle gaps?
  • Possumsearch is based on an index-based algorithm called ESASearch and uses enhanced suffix arrays to speed up the scanning of matrices. It's still scanning, only faster: you can scan whole genomes in a couple of minutes. (To the best of my knowledge, UCSC was scanning its human genome without any tree.) It's REALLY fast! Various matrix match values (like Transfac's) and output formats can be returned. If you ever want to scan longer sequences, you should use Possumsearch. (Thanks to Michael Beckstette for his feedback on this.)
  • MAPPER will scan your sequence against a database of HMMs generated from Transfac matrix.dat and Jaspar.
  • ScanACE from the AlignACE package will scan your sequence against a library of motifs.
  • Clover scans and gives P-values relative to background sequences
  • MotifScanner from the Toucan package gives a P-value relative to an HMM trained on background sequences.
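Several of the tools above report a P-value relative to background. As a rough illustration of what such a number can mean, here is a generic empirical P-value in Python: the best matrix score in the target sequence is compared with the best score in each of a set of background sequences. This is not the algorithm of STORM, Clover or MotifScanner. The matrix, the sequences and the function names are hypothetical, and every sequence is assumed to be at least as long as the matrix.

```python
# Hypothetical toy matrix: one dict of per-nucleotide weights per position.
MATRIX = [
    {"A": 8, "C": 1, "G": 1, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 2, "C": 2, "G": 3, "T": 3},
]


def best_score(matrix, sequence):
    """Highest raw matrix score over all windows of the sequence."""
    width = len(matrix)
    return max(
        sum(col[base] for col, base in zip(matrix, sequence[i:i + width]))
        for i in range(len(sequence) - width + 1)
    )


def empirical_p_value(matrix, target, backgrounds):
    """Fraction of background sequences scoring at least as well as the target.

    One is added to numerator and denominator so the estimate is never
    exactly 0 with a finite background set.
    """
    observed = best_score(matrix, target)
    as_good = sum(1 for bg in backgrounds if best_score(matrix, bg) >= observed)
    return (as_good + 1) / (len(backgrounds) + 1)


if __name__ == "__main__":
    backgrounds = ["CCCCCCC", "AGTCCCC", "TTTTTTT"]
    print(empirical_p_value(MATRIX, "CCAGTCC", backgrounds))
```

In practice the background set would have to be much larger, and matched to the target in length and base composition, for the P-value to be meaningful.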