Periodicity in proteins

A few years ago I did some work looking for periodic structures in proteins. The initial motivation was to investigate an old theory, going back at least to Susumu Ohno in the 1960s, that proteins originally arose from the concatenation of much smaller peptide fragments. In his lifetime Ohno could only get together enough sequence to demonstrate a degree of periodicity in heat-shock proteins (which would be prime candidates to show it, as they are so ultra-conserved). According to the theory, the primordial proteins would have been rather simply repetitive, and a few billion years of drift and selection would then have gradually rubbed out the traces of their origins. The question is: is there anything still left that might support the primordial concatenation theory?

The answer is: mostly no. Where we could see periodicity, it seemed to be related to periodic structures (at least in those proteins where we could get a structure), and was usually fairly obviously due to selection. However, there were a few proteins with subtle periodicity at the sequence level that could not be explained away like that, so perhaps there is something in it still. Another odd thing that came out is that archaea don't have much periodicity at all, whereas eukaryotes have a lot (comparatively speaking - it's generally rare in all phyla), with eubacteria somewhere in the middle. I wondered whether the high temperatures that archaea often live in might make periodic proteins unstable.

  • Gatherer D & McEwan NR (2003) Analysis of sequence periodicity in E. coli proteins: empirical investigation of the ‘duplication and divergence’ theory of protein evolution. J. Mol. Evol. 57 149-158.
  • Gatherer D & McEwan NR (2005) Phylogenetic differences in content and intensity of periodic proteins. J. Mol. Evol. 60 447-461.

There is one thing I'd still like to do on this subject, which is to look for sub-regions of periodicity within proteins. It might be that expecting to see traces of primordial periodicity over the whole length of a protein is actually asking for too much. The only problem with this is that decreasing the window size makes false positives more likely: a shorter window has fewer positions to compare, so a chance run of matches counts for more, and scanning many windows means many more tests. Some day, I'd like to cross this last "t" in this subject.
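
To give a rough idea of what such a sub-region scan might look like, here is a minimal Python sketch. It is not the method used in the papers above: the self-match score, the shuffle test and all the names (match_fraction, best_period, window_scan) and parameters are assumptions made purely for illustration. Each window is scored by how often a residue is identical to the residue a fixed lag further along, and that score is compared with shuffled copies of the same window to get an empirical p-value.

```python
import random


def match_fraction(seq, lag):
    """Fraction of positions i where seq[i] equals seq[i + lag]."""
    n = len(seq) - lag
    if n <= 0:
        return 0.0
    return sum(seq[i] == seq[i + lag] for i in range(n)) / n


def best_period(seq, max_lag):
    """Return (lag, score) for the lag in 1..max_lag with the highest
    self-match fraction; ties go to the shorter lag."""
    scored = [(match_fraction(seq, lag), lag) for lag in range(1, max_lag + 1)]
    score, lag = max(scored, key=lambda t: (t[0], -t[1]))
    return lag, score


def window_scan(seq, window, step, max_lag, n_shuffles=200, seed=0):
    """Slide a window along seq and report (start, lag, score, p) per window.

    p is an empirical p-value: the share of shuffled copies of the window
    (same composition, random order) whose best score is at least as high
    as the real one. It is NOT corrected for the number of windows scanned.
    """
    rng = random.Random(seed)
    results = []
    for start in range(0, len(seq) - window + 1, step):
        sub = seq[start:start + window]
        lag, score = best_period(sub, max_lag)
        letters = list(sub)
        hits = 0
        for _ in range(n_shuffles):
            rng.shuffle(letters)
            _, shuffled_score = best_period(letters, max_lag)
            if shuffled_score >= score:
                hits += 1
        p = (hits + 1) / (n_shuffles + 1)
        results.append((start, lag, score, p))
    return results
```

Calling window_scan(sequence, window=28, step=7, max_lag=10) on a protein sequence string would return one tuple per window. Because every window is a separate test, the p-values would still need correcting for the number of windows scanned before any single hit could be taken seriously, which is exactly the false-positive problem described above.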

