BioSysBio:abstracts/2007/Manuel Corpas: Difference between revisions

Revision as of 08:42, 29 September 2006


PFF – An integrated database of residues and fragments critical for protein folding

Author(s): Corpas M., Sinnott J., Thorne D., Pettifer S., Attwood T., and the PFF consortium
Affiliations: Faculty of Life Sciences and Computer Science, University of Manchester
Contact: email: corpas@bioinf.man.ac.uk
Keywords: 'Protein folding' 'Protein Data Bank' 'Folding Nucleus' 'Residue Stability'

Background/Introduction

Despite decades of work, understanding how proteins fold remains a major research challenge. This massive research effort has yielded (i) methods for predicting the likely structures that protein sequences will adopt, or for simulating the folding process itself; and (ii) databases of structural information (e.g., containing 3D coordinates, fold classifications, structure summary data, and so on). As part of the ongoing endeavour to understand the principles of protein folding, we have been involved in the development of a new, integrated structure information resource, based on a small subset of the PDB (1). The resource contains information derived from a combination of sequence analysis tools, structure analysis software and fold simulation algorithms; to make the contents more accessible to the wider community, we have also developed a user-friendly front-end for visualising the integrated data. The motivation for combining data from these various approaches is to offer insights into the role of particular types of residues and fragments in protein folding, and hence to improve our understanding of factors that are critical to the folding process in general.

Materials/Methods

As part of the European Protein Folding Fragments consortium, we have created a database of structural information (PFF) derived from 116 representative folds from the PDB. The integrated resource is augmented by tools from the UTOPIA project, which have been adapted to interactively visualise the PFF annotations on their respective 3D structures. The visual toolkit includes features for searching and browsing the dataset, and for displaying the relationships between annotated 3D structures and multiple sequence alignments. The database contains information such as the locations of tightened end fragments (TEFs) (2), foldons (3), most interacting residues (4), topohydrophobic residues (5), fingerprints (6), and stability data derived from PoPMusic (7) and Fold-X (8). For each entry, both the sequence from Swiss-Prot (9) and its corresponding nucleotide sequence are included; secondary structure assignments derived from DSSP, and atomic and internal coordinates (including pseudo-dihedral and valence angles), are also provided.

Results

A structurally annotated database has been generated in order to collect analyses derived from several unrelated algorithms. Besides the classical information expected, we propose insights into positions known or predicted to be important for the folding nucleus and potential functional sites. We also characterise fragments in terms of their stability with respect to potential mutations, or as autonomous folding units.

From an initial analysis of the data, we found, not surprisingly, that certain results were strongly correlated: e.g., residue accessibility values (denoting the degree of internal constraint on flexibility), Fold-X scores (denoting the stabilising contributions to the fold), PoPMusic values (denoting destabilising contributions), and lattice simulations (denoting the number of close neighbours or interaction partners within the fold). We used these values to synthesise a ‘folding score’.
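The abstract does not give the formula used to combine these values. As a minimal sketch of one plausible approach, the Python below standardises each per-residue feature to a z-score and averages the sign-adjusted results. The function names, the feature names, the sign convention and the toy values are all assumptions made for illustration; they are not the PFF implementation.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise per-residue values to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]


def folding_score(features, signs):
    """Average the sign-adjusted z-scores of several per-residue features.

    features: dict mapping a feature name to a list of per-residue values.
    signs:    dict mapping a feature name to +1 or -1 (hypothetical
              orientation, chosen so that all features point the same way).
    """
    names = list(features)
    length = len(features[names[0]])
    if any(len(features[name]) != length for name in names):
        raise ValueError("all features must cover the same residues")
    standardised = {name: zscores(features[name]) for name in names}
    return [
        mean([signs[name] * standardised[name][i] for name in names])
        for i in range(length)
    ]


# Toy values for a five-residue fragment (invented for illustration only).
features = {
    "accessibility": [0.10, 0.05, 0.40, 0.80, 0.60],
    "lattice_neighbours": [9, 11, 6, 2, 4],
}
# Assumed orientation: buried, highly connected residues get low scores.
signs = {"accessibility": 1, "lattice_neighbours": -1}
scores = folding_score(features, signs)
```

Averaging z-scores puts features measured in different units on a common scale, which is one simple way to exploit the fact that they are strongly correlated. A weighted combination or a principal component would be equally reasonable alternatives.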

Folding scores were created for every residue in the database and subsequently compared to the database components as shown in Fig.1. Low folding scores were found to significantly discriminate regions rich in residues annotated in the literature as belonging to the folding nucleus, which is critical for the correct folding of the protein. Folding score troughs were observed to be highly conserved, while peaks had a variable degree of conservation. We found, however, that highly conserved regions with folding score peaks are indicative of a functional role attached to the region.
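The interpretation rule described here (troughs suggest folding nucleus; conserved peaks suggest function) can be sketched as a simple per-residue classification. This sketch uses fixed thresholds in place of true peak and trough detection, and the threshold values, label names and function names are hypothetical rather than taken from the PFF resource.

```python
from itertools import groupby


def classify_residues(folding, conservation, low=-0.5, high=0.5, conserved=0.7):
    """Label each residue from its folding score and conservation score.

    The cut-offs `low`, `high` and `conserved` are illustrative assumptions.
    """
    labels = []
    for f, c in zip(folding, conservation):
        if f <= low:
            labels.append("nucleus_candidate")      # folding score trough
        elif f >= high and c >= conserved:
            labels.append("functional_candidate")   # conserved peak
        else:
            labels.append("unassigned")
    return labels


def regions(labels):
    """Collapse consecutive identical labels into (label, start, end) runs."""
    found = []
    position = 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != "unassigned":
            found.append((label, position, position + length - 1))
        position += length
    return found


# Toy scores for six residues (invented for illustration only).
folding = [-1.0, -0.8, 0.1, 0.9, 1.2, 0.8]
conservation = [0.9, 0.9, 0.5, 0.8, 0.9, 0.2]
labels = classify_residues(folding, conservation)
candidate_regions = regions(labels)
```

With these toy values, the first two residues form a nucleus candidate region and residues 3 to 4 (zero-based) form a functional candidate region; the last residue is a peak but is not conserved, so it stays unassigned.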


Fig.1 Chloramphenicol Acetyltransferase Type III (PDB code: 3cla). The folding score is shown in purple across the 2D representation of the protein. Folding score troughs (blue rectangles) suggest regions likely to form part of the folding nucleus; these correlate well with regions rich in topohydrophobic residues ('T'). The conservation score (red) relates to the alignment associated with the PDB sequence and was calculated using the Scorecons server (10). Highly conserved folding score peaks (green rectangles) indicate potential functional regions. Tightened end fragments (blue lines) tend to map to folding score peaks, independently of the conservation. At the bottom, conserved peaks and troughs were compared to manually selected motifs (black rectangles), conserved regions outside motifs (grey rectangles) and gap regions (blank), suggesting a means for automatic motif discovery and characterisation.

Conclusion

These results and their companion resources have been designed for the characterisation of fragments critical for folding identified within the PFF Consortium. A goal of PFF was to create a consensus "prediction" tool combining the strengths of different methods. Coupled with the degree of conservation of residues, a folding score was used to delineate regions that are likely to contribute to (i) the stability of the fold (and hence may contribute to the folding nucleus), and (ii) the function of the protein. The folding score offers a means of automatic motif detection, which can be used for protein family characterisation and functional/structural annotation of evolutionarily conserved regions. We present here a simple case-study to illustrate how the combined data can be used to pinpoint such motifs with potential structural and functional roles. We found that integrating the different methods does add value over using them individually.

Availability

Version 1.0 of the PFF dataset is accessible in a DSSP-flat-file format from http://www.proteinfoldingfragments.net; it is also available in an XML format through the UTOPIA toolkit. The UTOPIA visualisation tools are freely available for OS X, Windows and Linux at http://utopia.cs.manchester.ac.uk. The Web resource for calculating combined folding scores is accessible at http://umber.sbs.man.ac.uk/~corpas/db/.

References

1. Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Shindyalov, I.N. and Bourne, P.E. (2000) Nucleic Acids Res, 28, 235-242.

2. Lamarine, M., Mornon, J.P., Berezovsky, N. and Chomilier, J. (2001) Cell Mol Life Sci, 58, 492-498.

3. Maity, H., Maity, M., Krishna, M.M., Mayne, L. and Englander, S.W. (2005) Proc Natl Acad Sci U S A, 102, 4741-4746.

4. Papandreou, N., Berezovsky, I.N., Lopes, A., Eliopoulos, E. and Chomilier, J. (2004) Eur J Biochem, 271, 4762-4768.

5. Poupon, A. and Mornon, J.P. (1998) Proteins, 33, 329-342.

6. Attwood, T.K., Bradley, P., Flower, D.R., Gaulton, A., Maudling, N., Mitchell, A.L., Moulton, G., Nordle, A., Paine, K., Taylor, P. et al. (2003) Nucleic Acids Res, 31, 400-402.

7. Gilis, D. and Rooman, M. (2000) Protein Eng, 13, 849-856.

8. Schymkowitz, J.W., Rousseau, F., Martins, I.C., Ferkinghoff-Borg, J., Stricher, F. and Serrano, L. (2005) Proc Natl Acad Sci U S A, 102, 10147-10152.

9. Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M. et al. (2005) Nucleic Acids Res, 33 Database Issue, D154-159.

10. Valdar, W.S. (2002) Proteins, 48, 227-241.

