Wikiomics:Protein mass spectrometry: Difference between revisions
Darek Kedra (talk | contribs) m (→File formats) |
Darek Kedra (talk | contribs) (+test data sets, tutorial) |
||
Line 3: | Line 3: | ||
* quantification | * quantification | ||
A good introductory tutorial from USC Computational Biology group is [http://msms.cmb.usc.edu/tutorial.html here]. | |||
=Protein/peptide identification= | =Protein/peptide identification= | ||
Line 20: | Line 21: | ||
* algorithms (most commonly used): | * algorithms (most commonly used): | ||
** [http://fields.scripps.edu/sequest/index.html Sequest] $$$ | ** [http://fields.scripps.edu/sequest/index.html Sequest] $$$ | ||
** [http://www.matrixscience.com/home.html Mascot] $$$ | ** [http://www.matrixscience.com/home.html Mascot] $$$, free but limited [http://www.matrixscience.com/cgi/search_form.pl?FORMVER=2&SEARCH=MIS web server form] | ||
** [http://pubchem.ncbi.nlm.nih.gov/omssa/ OMSSA] Open Mass Spectrometry Search Algorithm, open source | ** [http://pubchem.ncbi.nlm.nih.gov/omssa/ OMSSA] Open Mass Spectrometry Search Algorithm, open source | ||
** [http://thegpm.org/ XTandem] open source effort from Canada | ** [http://thegpm.org/ XTandem] open source effort from Canada | ||
Line 84: | Line 85: | ||
* [http://p3.thegpm.org/tandem/ppp.html P3 (server)] from Global Proteomics Machine (free) | * [http://p3.thegpm.org/tandem/ppp.html P3 (server)] from Global Proteomics Machine (free) | ||
** [http://www.thegpm.org/PPP/index.html description] | ** [http://www.thegpm.org/PPP/index.html description] | ||
* [http://www.peptideatlas.org/spectrast/ SpectraST] from ISB, Seattle (not as many species/options as P3) | * [http://www.peptideatlas.org/spectrast/ SpectraST] from ISB, Seattle (not as many species/options as P3). Ca 500x faster than Sequest on the same set. | ||
[http://www.proteomecenter.org/course/spectraST.11.07.pdf lecture notes] by Henry Lam from ISB | |||
* [http://proteome.gs.washington.edu/software/bibliospec/documentation/index.html BiblioSpec] from MacCoss lab. (free for non-profit, online licence) | * [http://proteome.gs.washington.edu/software/bibliospec/documentation/index.html BiblioSpec] from MacCoss lab. (free for non-profit, online licence) | ||
** command line only | ** command line only | ||
Spectral libraries available [http://www.peptideatlas.org/speclib/ here@PeptideAtlas] | |||
=Protein quantification= | =Protein quantification= | ||
Line 113: | Line 118: | ||
* '''mzData''' (standard set by HUPO Proteomics Standard Initiative) | * '''mzData''' (standard set by HUPO Proteomics Standard Initiative) | ||
=Spectrum datasets= | |||
Good for testing programs: | |||
* [http://www.peptideatlas.org/ PeptideAtlas@ Seattle Proteome Center] | |||
* Open Proteomics Database [http://bioinformatics.icmb.utexas.edu/OPD/ OPD] | |||
* [http://www.ebi.ac.uk/pride/ppp_links.do HUPO Plasma Proteome Project files] PRIDE@EBI | |||
=Web sites= | =Web sites= | ||
Line 119: | Line 130: | ||
* [http://www.proteomecommons.org/tools.jsp Proteome Commons] collection of tools & links | * [http://www.proteomecommons.org/tools.jsp Proteome Commons] collection of tools & links | ||
* [http://www.broad.mit.edu/cancer/software/genepattern/desc/proteomics.html GenePattern] proteomics modules from Broad Inst. | * [http://www.broad.mit.edu/cancer/software/genepattern/desc/proteomics.html GenePattern] proteomics modules from Broad Inst. | ||
* [http://msms.cmb.usc.edu/ USC in LA] several programs: PepHMM, Sub-DeNovo, SuffixTree-MS. | |||
=Reviews= | =Reviews= | ||
For a good review of programs and aspects of protein identification by mass spectrometry | For a good review of programs and aspects of protein identification by mass spectrometry | ||
Line 143: | Line 154: | ||
==Other== | ==Other== | ||
Needs to be sorted out. | |||
* [http://www.bioinfo.no/software/massSorter massSorter ] | * [http://www.bioinfo.no/software/massSorter massSorter ] | ||
* [http://prospector.ucsf.edu/ ProteinProspector] | * [http://prospector.ucsf.edu/ ProteinProspector] | ||
===Experimental=== | |||
* [http://bioinformatics.genomicsolutions.com/service/prowl/sonar.html Sonar ] | * [http://bioinformatics.genomicsolutions.com/service/prowl/sonar.html Sonar ] | ||
* DeNovoID [http://proteomics.mcw.edu/denovoid web] | * DeNovoID [http://proteomics.mcw.edu/denovoid web] | ||
* SPIDER [http://ieeexplore.ieee.org/iel5/9262/29416/01332434.pdf?tp=&isnumber=&arnumber=1332434 (PDF)] de novo + homology search in other species | * SPIDER [http://ieeexplore.ieee.org/iel5/9262/29416/01332434.pdf?tp=&isnumber=&arnumber=1332434 (PDF)] de novo + homology search in other species based on a set of tags | ||
* OpenSea [http://pubs.acs.org/cgi-bin/article.cgi/jprobs/2005/4/i02/html/pr049781j.html (HTML)] Java program available from authors | * OpenSea [http://pubs.acs.org/cgi-bin/article.cgi/jprobs/2005/4/i02/html/pr049781j.html (HTML)] Java program available from authors | ||
ModifiComb [http://www.mcponline.org/cgi/content/full/5/5/935 (HTML)] (available from authors?) | |||
* [http://prix.uos.ac.kr/modi/ MODi] web server for PTMs discovery | * [http://prix.uos.ac.kr/modi/ MODi] web server for PTMs discovery | ||
* [http://llama.med.harvard.edu/cgi/SILVER/silver.cgi?id=916487 SILVER] view your spectra with LOD scores | |||
<!-- | <!-- |
Revision as of 11:41, 11 December 2007
Protein mass spectrometry can be divided into:
- identification of proteins/peptides
- quantification
A good introductory tutorial from the USC Computational Biology group is available at http://msms.cmb.usc.edu/tutorial.html.
Protein/peptide identification
Peptide Mass Fingerprinting (PMF) or (MS)
An older method, superseded by MS/MS
- algorithms:
- Mascot (gives probabilistic score)
- Aldente
- ProFound
- caveats
- no sequence information
- journals have started to require that at least one peptide of a protein identified by PMF be confirmed by MS/MS
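The PMF idea can be sketched in a few lines: digest a candidate protein in silico, compute the monoisotopic masses of the resulting peptides, and count how many observed masses fall within a tolerance. This is only an illustration under stated assumptions (no missed cleavages, no modifications, a hypothetical tolerance of 0.2 Da, neutral peptide masses); it is not the scoring used by Mascot, Aldente or ProFound.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide (free N- and C-terminus)


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed_masses, protein, tol=0.2):
    """Number of observed masses within tol (Da) of any theoretical peptide mass."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(m - t) <= tol for t in theoretical) for m in observed_masses
    )


if __name__ == "__main__":
    protein = "MEGGAYGAGKAGGAFDPYTL"
    for pep in tryptic_digest(protein):
        print(pep, round(peptide_mass(pep), 4))
```

Because only masses are compared, two different peptides with the same composition are indistinguishable, which is the "no sequence information" caveat above.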
Peptide fragment fingerprinting (PFF) or (MS/MS)
- algorithms (most commonly used):
- Sequest $$$
- Mascot $$$, free but limited web server form
- OMSSA Open Mass Spectrometry Search Algorithm, open source
- XTandem open source effort from Canada
- algorithms (other/new/experimental):
- Spectrum Mill $$$
- MASPIC
- this paper claims 5-15% more confident hits than Sequest: [1]
- InsPecT: a new search tool for variable modifications from Pevzner & Tanner @UCSD (free?)
- filtering bad quality spectra
- filtering of the results
- Trans Proteomic Pipeline [2] (free?)
- download from Sourceforge (TPP Cygwin Setup for Windows or 'Trans-Proteomic Pipeline' for Linux)
- commercial offshoot: IPP
- wiki devoted to TPP TPP_Wiki
- dynamic newsgroup: spctools-discuss
- DTASelect: appears to be in a semi-frozen state (free for non-profit use, but requires a signed MTA)
Databases
Protein databases
Use (if possible):
- IPI International Protein Index
- always use a target-decoy database (i.e. concatenated: human_IPI + reversed_human_IPI)
- decoy database creation methods:
- protein reversal (simple to perform; leaves palindromic sequences unchanged, but these are fortunately quite rare)
- MEGGAYGAGKAGGAFDPYTL => LTYPDFAGGAKGAGYAGGEM
- peptide pseudo-reversal (used in Sorcerer by Sage-N Research)
- MEGGAYGAGKAGGAFDPYTL => (trypsin digest, using MS-Digest) MEGGAYGAGK AGGAFDPYTL => GAGYAGGEMK-ALTYPDFAGG (each peptide reversed, but the trypsin cleavage site preserved; inferred from the Elias 2007 paper)
- shuffled
- MEGGAYGAGKAGGAFDPYTL => FYAGADEAGMGTYKGGAGLP (generated with SMS; results differ on each run); recommended by people at EBI
- random (i.e. creating a database of random proteins based on the amino acid frequencies in the source FASTA file)
- to create a decoy database use DBToolkit, a free standalone Java application
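The decoy creation methods listed above can be sketched as follows. This is a minimal illustration, not the DBToolkit or Sorcerer implementation. Assumptions: the pseudo-reversal cleaves after every K or R (no proline rule) and fully reverses a trailing peptide that lacks a C-terminal K/R; the "REV_" header prefix and all function names are hypothetical.

```python
import random


def reverse_protein(seq):
    """Protein reversal: the whole sequence read backwards."""
    return seq[::-1]


def pseudo_reverse(seq):
    """Reverse each tryptic peptide but keep its C-terminal K/R in place.

    Simplification: cleaves after every K or R; a trailing peptide
    without K/R is reversed completely (an assumption of this sketch).
    """
    parts, start = [], 0
    for i, aa in enumerate(seq):
        if aa in "KR":
            parts.append(seq[start:i][::-1] + aa)
            start = i + 1
    parts.append(seq[start:][::-1])
    return "".join(parts)


def shuffle_protein(seq, seed=None):
    """Shuffled decoy: same composition, random order (differs per run unless seeded)."""
    rng = random.Random(seed)
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def target_decoy(entries, make_decoy=reverse_protein, prefix="REV_"):
    """Concatenate targets with their decoys.

    entries: list of (header, sequence) tuples read from a FASTA file.
    """
    decoys = [(prefix + header, make_decoy(seq)) for header, seq in entries]
    return entries + decoys


if __name__ == "__main__":
    protein = "MEGGAYGAGKAGGAFDPYTL"
    print(reverse_protein(protein))
    print(pseudo_reverse(protein))
    print(shuffle_protein(protein))
    for header, seq in target_decoy([("example_protein", protein)]):
        print(">" + header)
        print(seq)
```

Marking decoy headers with a prefix lets hits to decoy entries be counted after the search, which is the point of searching the concatenated database.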
Modification databases
- Unimod (> 500 natural + labels)
- Delta Mass A Database of Protein Post Translational Modifications (in vivo)
- RESID detailed descriptions of > 400 modifications
Peptide Tag Searching
"Designed to characterize peptides with mutations or unexpected post-translational modifications." (from Popitam page)
- GutenTag free for non-profit, MTA required. Assigns fewer peptides than Sequest but with fewer false positives. Occupies a middle ground between mainstream search algorithms and de novo sequencing.
de novo sequence determination algorithms
Spectral matching
The idea is that if the spectrum of an unknown peptide closely matches an MS/MS spectrum in a library whose sequence/annotation has already been determined, the annotation can be transferred to the unknown peptide, much as orthologue annotation is transferred in protein sequence databases. Caveat: bad annotations also get propagated.
- P3 (server) from Global Proteomics Machine (free)
- SpectraST from ISB, Seattle (not as many species/options as P3). About 500x faster than Sequest on the same data set.
lecture notes by Henry Lam from ISB
- BiblioSpec from MacCoss lab. (free for non-profit, online licence)
- command line only
Spectral libraries are available from PeptideAtlas at http://www.peptideatlas.org/speclib/
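Spectral matching can be illustrated with a common similarity measure, the cosine (normalized dot product) of two binned spectra. This is a generic sketch, not the scoring function of P3, SpectraST or BiblioSpec. The 1 Da bin width, the function names and the library layout (a dict mapping annotation to peak list) are assumptions made for the example.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    peaks: iterable of (mz, intensity) pairs.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product of two binned spectra (0 = no overlap, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query, library):
    """Return (score, annotation) of the most similar library spectrum.

    library: dict mapping an annotation (e.g. a peptide sequence) to a peak list.
    """
    scored = (
        (cosine_similarity(query, peaks), name) for name, peaks in library.items()
    )
    return max(scored, default=(0.0, None))


if __name__ == "__main__":
    library = {
        "PEPTIDEA": [(175.1, 50.0), (304.2, 100.0), (433.2, 30.0)],
        "PEPTIDEB": [(147.1, 80.0), (260.2, 100.0), (389.3, 20.0)],
    }
    query = [(175.1, 45.0), (304.2, 90.0), (433.3, 35.0)]
    print(best_library_match(query, library))
```

The annotation returned is only as good as the library entry it came from, which is how a bad library annotation ends up propagated.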
Protein quantification
- approaches
- isotopic labeling (ICAT, iTRAQ, SILAC, 18O- or 15N-labeling)
- label-free methods
- COFRADIC
- software
- ASAPRatio from Trans Proteomic Pipeline:
"calculates the relative abundances of proteins and the corresponding confidence intervals from ICAT-type ESI-LC/MS data"
- MSQuant Parser for Mascot results for quantitation (Windows only)
Frameworks/pipelines
- Trans Proteomic Pipeline (TPP) most popular, included in Sorcerer from Sage-N Research. Windows/Cygwin/Perl or Linux based.
- OpenMS: C++ based, developed in Germany
- Sorcerer $$$ FPGA-based fast hardware solution for SEQUEST & Tandem searches with TPP on top of it.
File formats
- .bdx from Bruker Daltonics
- .dta SEQUEST/Thermo Electron. Two versions:
- single spectrum
- multiple spectra concatenated in one file
- .mgf multiple spectra, Mascot (Matrix Science)
- mzXML (used by Trans Proteomic Pipeline)
- mzData (standard set by HUPO Proteomics Standard Initiative)
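As an illustration of the text-based formats above, a minimal reader for .mgf files might look like this. It is a sketch that assumes the basic Mascot Generic Format layout: each spectrum sits between BEGIN IONS and END IONS, parameters are KEY=value lines (e.g. TITLE, PEPMASS, CHARGE), and peaks are whitespace-separated m/z and intensity pairs. Comment lines and parameters outside the spectrum blocks are ignored here; the function name and the returned dict layout are hypothetical.

```python
def read_mgf(lines):
    """Parse MGF text lines into a list of spectra.

    Each spectrum is a dict: {"params": {key: value}, "peaks": [(mz, intensity)]}.
    Parameter values are kept as strings.
    """
    spectra, current = [], None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                # Parameter line such as TITLE=..., PEPMASS=..., CHARGE=...
                key, value = line.split("=", 1)
                current["params"][key] = value
            else:
                # Peak line: m/z followed by intensity
                fields = line.split()
                current["peaks"].append((float(fields[0]), float(fields[1])))
    return spectra


if __name__ == "__main__":
    example = """BEGIN IONS
TITLE=example spectrum
PEPMASS=500.25
CHARGE=2+
100.1 10
200.2 20.5
END IONS
"""
    for spectrum in read_mgf(example.splitlines()):
        print(spectrum["params"]["TITLE"], len(spectrum["peaks"]), "peaks")
```

Passing an open file object instead of a list of lines works the same way, since the function only iterates over lines.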
Spectrum datasets
Good for testing programs:
- Open Proteomics Database OPD
- HUPO Plasma Proteome Project files PRIDE@EBI
Web sites
- UCSD (Pevzner)
- U. of Washington (MacCoss)
- Proteome Commons collection of tools & links
- GenePattern proteomics modules from Broad Inst.
- USC in LA several programs: PepHMM, Sub-DeNovo, SuffixTree-MS.
Reviews
For a good review of programs and aspects of protein identification by mass spectrometry see:
Tutorials
- Frédérique Lisacek's @Proteomics Web-based MS/MS Data Analysis on the web: Mascot, Phenyx and X!Tandem
Other tools to be sorted out
Ortholog searches using sequence tags/ambiguous sequence
$$$ programs
- ProteinLynx Global SERVER $$$, from Waters Corporation
- Phenyx from GeneBio (online web server)
Other
Needs to be sorted out.
Experimental
- DeNovoID web
- SPIDER (PDF) de novo + homology search in other species based on a set of tags
- OpenSea (HTML) Java program available from authors
- ModifiComb (HTML) (available from authors?)
Credits
- Darek Kedra wrote this tutorial