Wikiomics:Protein function prediction
Many proteins still have no known function. Automated function prediction (AFP) is an active research field with a growing community of bioinformaticians, as seen at the AFP-SIG meeting held at the ISMB 2005 conference and at the meeting held at the University of California, San Diego in 2006.
Most often, only the sequence of the protein is known, but structural genomics centers have also released hundreds of protein structures of unknown function. Some proteins come from prokaryotes, where operons make it possible to infer a protein's function from its genomic context; this is harder in eukaryotes. More generally, it is easier to predict function correctly when a protein has well-described homologs than when it belongs to a family of unknown biological role.
Of course, the notion of protein function is broad and cannot easily be encoded without a structured vocabulary. The Gene Ontology (GO) addresses this by providing a hierarchical set of keywords, called GO terms, which describe different aspects of protein function at different levels of precision. GO is becoming a standard for proteome annotation and protein function prediction.
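To make the idea of "different levels of precision" concrete, here is a minimal sketch of how a specific GO term implies its more general ancestors. The four GO identifiers are real molecular-function terms, but the `PARENTS` table is a hand-written toy excerpt with intermediate terms omitted, and the function names `ancestors` and `is_consistent` are illustrative assumptions, not part of any GO library.

```python
# Toy excerpt of the GO molecular-function hierarchy (child -> parents).
# Intermediate terms are omitted; the real ontology is a much larger graph.
PARENTS = {
    "GO:0004672": ["GO:0016301"],  # protein kinase activity -> kinase activity
    "GO:0016301": ["GO:0003824"],  # kinase activity -> catalytic activity
    "GO:0003824": ["GO:0003674"],  # catalytic activity -> molecular_function
    "GO:0003674": [],              # root of the molecular-function branch
}

def ancestors(term, parents=PARENTS):
    """Return all less specific terms implied by `term`."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen

def is_consistent(predicted, known, parents=PARENTS):
    """True if `predicted` equals `known` or is a more general version of it."""
    return predicted == known or predicted in ancestors(known, parents)
```

With this table, a prediction of "catalytic activity" for a protein known to be a protein kinase counts as correct but imprecise, whereas the reverse prediction claims more than is known.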
Current software tools follow several main strategies:
- homology search and transfer of annotations:
  - sequence alignment
  - structure alignment
- function inference by genomic context
- phylogenomic approaches
- prediction from structure using similarities that are not homology-based:
  - local sequence patterns
  - physico-chemical sequence features
  - 3D local sites
  - 3D physico-chemical features
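The first strategy in the list above, homology search followed by transfer of annotations, can be sketched as follows. This is only an illustration under stated assumptions: the input format, the E-value cutoff of 1e-5, and the scoring scheme (sum of -log10 E-values) are hypothetical choices made for this example, not the method of any particular server.

```python
import math
from collections import defaultdict

def transfer_annotations(hits, annotations, max_evalue=1e-5):
    """Score GO terms for a query from its significant homologs.

    hits: list of (hit_id, evalue) pairs, e.g. parsed from a search report.
    annotations: dict mapping hit_id -> set of GO terms.
    Returns a list of (term, score) sorted by decreasing score, where the
    score is the sum of -log10(evalue) over the hits that carry the term.
    """
    scores = defaultdict(float)
    for hit_id, evalue in hits:
        if evalue > max_evalue:
            continue  # too weak to support a transfer
        # Guard against an E-value of 0.0, which is reported for very strong hits.
        weight = -math.log10(max(evalue, 1e-180))
        for term in annotations.get(hit_id, ()):
            scores[term] += weight
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

A term carried by several strong hits ranks above a term carried by a single marginal hit, and hits above the cutoff contribute nothing.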
Servers which competed at the AFP-SIG 2005
See also the short summaries by the authors themselves at the official site of AFP-SIG 2005.
These servers predict function by transferring annotations from homologs:
And the other servers are:
- SpearMint and RuleBase (not public yet).
- PhydBac [7, 8, 9, 10, 11] analyzes bacterial proteins using genomic context.
- ProKnow searches for known 3D folds, sequences, motifs, and functional linkages.
- BLAST and PSI-BLAST [13, 14] are commonly used to search for homologous protein sequences by sequence alignment.
- Prosite [15, 16] is a searchable database of sequence patterns that are associated with some biological functions.
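Sequence patterns of the kind stored in Prosite can be scanned against a sequence by translating them into regular expressions. The sketch below handles the common pattern elements only (`x`, `[...]`, `{...}`, repeat counts, and the `<` and `>` anchors); it does not handle rarer cases such as an anchor inside square brackets. The function names are assumptions made for this example and are not part of any Prosite tool. The pattern used in the usage note is the ATP/GTP-binding site motif A (P-loop).

```python
import re

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # patterns end with a period
    translation = {
        "-": "",    # element separator
        "x": ".",   # any amino acid
        "{": "[^",  # start of excluded residues
        "}": "]",   # end of excluded residues
        "(": "{",   # start of repetition count
        ")": "}",   # end of repetition count
        "<": "^",   # N-terminal anchor
        ">": "$",   # C-terminal anchor
    }
    return "".join(translation.get(char, char) for char in pattern)

def find_motifs(pattern, sequence):
    """Return (zero-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]
```

For example, `find_motifs("[AG]-x(4)-G-K-[ST].", "MTEYKLVVVGAGGVGKSALT")` returns `[(9, "GAGGVGKS")]`. A match only says that the pattern occurs; short patterns also match by chance, so a hit is a hint rather than proof of function.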
Other protein function prediction servers
JAFA is a meta-server for protein function prediction: it aggregates the predictions of other servers. It is a good starting point because it queries five servers (GOFigure, GOblet, InterProScan, GOtcha, PhydBac) and shows where their results agree and where they differ.
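The idea of comparing the output of several servers can be sketched as below. This is not JAFA's actual algorithm, which is not described here; it is a hypothetical illustration that groups predicted GO terms by the servers supporting them. The function name and the predicted terms in the usage note are made up for the example.

```python
from collections import defaultdict

def compare_predictions(predictions):
    """Group predicted terms by the servers that support them.

    predictions: dict mapping server name -> iterable of predicted GO terms.
    Returns a list of (term, sorted list of supporting servers), ordered from
    the most widely supported term to the least.
    """
    support = defaultdict(set)
    for server, terms in predictions.items():
        for term in terms:
            support[term].add(server)
    ranked = sorted(support.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(term, sorted(servers)) for term, servers in ranked]
```

Given `{"GOFigure": ["GO:0016301", "GO:0005524"], "GOtcha": ["GO:0016301"], "PhydBac": ["GO:0016301", "GO:0003677"]}`, the term predicted by all three servers is listed first, followed by the terms that only one server reported.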
- Protein Function Prediction Server: predicts protein function from PDB structures. An enzyme/non-enzyme predictor and an enzyme class predictor are available.
- GoFigure predicts the function of a gene or protein.
- ProFunc [19, 20] performs predictions from a protein structure.
Methods using non-sequential sequence features:
These methods are based on function transfer after homology searches:
- Blast2GO 
- OntoBlast 
- GOblet [26, 27]
- GOtcha 
- Phunctioner is a method based on the association of GO terms with conserved residues in 3D structural alignments.
- Automated function prediction of genes and proteins, our local community pages
References
1. Martí-Renom MA, Ilyin VA, and Sali A. pmid:11524379.
2. Hawkins T, Luban S, and Kihara D. pmid:16672240.
3. Szafron D, Lu P, Greiner R, Wishart DS, Poulin B, Eisner R, Lu Z, Anvik J, Macdonell C, Fyshe A, and Meeuwis D. pmid:15215412.
4. Lu P, Szafron D, Greiner R, Wishart DS, Fyshe A, Pearcy B, Poulin B, Eisner R, Ngo D, and Lamb N. pmid:15608166.
5. Vinayagam A, König R, Moormann J, Schubert F, Eils R, Glatting KH, and Suhai S. pmid:15333146.
6. Wieser D, Kretschmann E, and Apweiler R. pmid:15262818.
7. Enault F, Suhre K, Abergel C, Poirot O, and Claverie JM. pmid:12855445.
8. Enault F, Suhre K, Poirot O, Abergel C, and Claverie JM. pmid:12824402.
9. Enault F, Suhre K, Poirot O, Abergel C, and Claverie JM. pmid:15215406.
10. Suhre K and Claverie JM. pmid:14681411.
11. Enault F, Suhre K, and Claverie JM. pmid:16221304.
12. Pal D and Eisenberg D. pmid:15642267.
13. Altschul SF, Gish W, Miller W, Myers EW, and Lipman DJ. pmid:2231712.
14. Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ. pmid:9254694.
15. Bucher P and Bairoch A. pmid:7584418.
16. Hulo N, Sigrist CJ, Le Saux V, Langendijk-Genevaux PS, Bordoli L, Gattiker A, De Castro E, Bucher P, and Bairoch A. pmid:14681377.
17. Dobson PD and Doig AJ. pmid:12850146.
18. Khan S, Situ G, Decker K, and Schmidt CJ. pmid:14668239.
19. Laskowski RA, Watson JD, and Thornton JM. pmid:15980588.
20. Laskowski RA, Watson JD, and Thornton JM. pmid:16019027.
21. Jensen LJ, Gupta R, Blom N, Devos D, Tamames J, Kesmir C, Nielsen H, Staerfeldt HH, Rapacki K, Workman C, Andersen CA, Knudsen S, Krogh A, Valencia A, and Brunak S. pmid:12079362.
22. Jensen LJ, Gupta R, Staerfeldt HH, and Brunak S. pmid:12651722.
23. Hobohm U and Sander C. pmid:7650738.
24. Conesa A, Götz S, García-Gómez JM, Terol J, Talón M, and Robles M. pmid:16081474.
25. Zehetner G. pmid:12824422.
26. Hennig S, Groth D, and Lehrach H. pmid:12824400.
27. Groth D, Lehrach H, and Hennig S. pmid:15215401.
28. Martin DM, Berriman M, and Barton GJ. pmid:15550167.
29. Pazos F and Sternberg MJ. pmid:15456910.
30. Storm CE and Sonnhammer EL. pmid:11836216.
31. Zmasek CM and Eddy SR. pmid:12028595.
32. Engelhardt BE, Jordan MI, Muratore KE, and Brenner SE. pmid:16217548.
33. Gouret P, Vitiello V, Balandraud N, Gilles A, Pontarotti P, and Danchin EG. pmid:16083500.
34. Friedberg I. pmid:16772267.
- Martin Jambon: introduction plus the initial list of tools and papers, put together after the AFP-SIG 2005 conference (at ISMB 2005)
- other Wikiomics authors