Unnatural Amino Acids
The genetic code for the translation of RNA into protein is one of the most ancient and universal innovations in the evolution of life on earth. Nearly all life forms use the same redundant code for the incorporation of the 20 canonical amino acids into proteins. In two notable exceptions, selenocysteine and pyrrolysine, stop codons have been retooled to code for a 21st amino acid. Expansion of the genetic code to include noncanonical, or unnatural, amino acids (UAAs) holds promise for improving and diversifying protein function, generating proteins that would normally require post-translational modification, and studying the genetic code itself. Technology for the creation of proteins bearing UAAs has progressed steadily over the last ~30 years and includes both in vitro and in vivo methods.
In vitro Synthesis
Solid-phase Synthesis and Chemical Ligation
Solid-phase peptide synthesis (SPPS) was developed by Bruce Merrifield in the early 1960s, for which he received the Nobel Prize in 1984. In this method, the C-terminal amino acid is anchored via a linker to an insoluble support. Both the N-terminus and the side chain are protected from reaction. The N-terminus is typically protected by a Boc or Fmoc group, and the side chains can be protected by a variety of groups. In the first step, the N-terminus is deprotected. The next desired amino acid in the chain is then added to the column. This cycle of deprotection and addition is repeated until the chain is complete. The side chains are then deprotected, and the completed peptide is eluted from the column. This method is useful for producing peptides of up to ~50 amino acids, and noncanonical amino acids may be readily incorporated. The achievable length is limited primarily by the yield at each incorporation step, because losses compound over the length of the chain. While each incorporation step in SPPS is nearly 99% efficient, this is still substantially less efficient than the chemistry used to synthesize DNA oligos, which can be produced in lengths of over 100 bases. The difference is partially due to the numerous side reactions possible at each step in SPPS.
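The effect of per-step yield on achievable length can be illustrated with a short calculation. This is a minimal sketch that assumes every coupling step has the same independent yield, which is a simplification. The function names are hypothetical and only the ~99% figure comes from the text above.

```python
import math


def full_length_fraction(step_yield: float, n_steps: int) -> float:
    """Fraction of chains that are still correct after n_steps couplings,
    assuming each step succeeds independently with probability step_yield."""
    return step_yield ** n_steps


def max_steps(step_yield: float, target_fraction: float) -> int:
    """Largest number of coupling steps that keeps the full-length
    fraction at or above target_fraction."""
    return math.floor(math.log(target_fraction) / math.log(step_yield))


if __name__ == "__main__":
    # With ~99% per-step yield, about 60% of chains are full length
    # after 50 couplings.
    print(round(full_length_fraction(0.99, 50), 3))
    # Number of couplings before fewer than half the chains are full length.
    print(max_steps(0.99, 0.5))
```

Under this simple model, even small improvements in per-step yield translate into substantially longer accessible products, which is consistent with the length limit described above.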
Native chemical ligation is used to produce larger peptides and full proteins. This method requires uniquely reactive functionalities at the N- and C-termini of the peptides to be joined, and allows the use of unprotected peptide segments. In one method of native chemical ligation, the thiolate group of the N-terminal cysteine residue of one unprotected peptide attacks the C-terminal thioester of a second unprotected peptide. This reversible transthioesterification step is chemoselective and regioselective, and it forms a thioester-linked intermediate. The intermediate rearranges irreversibly by an intramolecular S,N-acyl shift, which results in the formation of a native peptide bond at the ligation site. This method may be repeated to make long peptides and proteins.
Synthesis via Chemically Aminoacylated tRNAs
Proteins with unnatural amino acids may also be produced biosynthetically. In this technique, truncated tRNAs are enzymatically ligated to chemically aminoacylated nucleotides, effectively decoupling the identity of the tRNA from that of the attached amino acid. One can then use a cell-free translation system to synthesize proteins with the unnatural amino acid incorporated at the codon complementary to the anticodon of the tRNA used, typically a stop codon such as Amber (UAG). The protein may then be isolated and its properties analyzed.
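The idea of reassigning a stop codon can be made concrete with a toy translation routine. This is a minimal sketch, not a model of any real translation system. The standard codon table is built from the conventional UCAG ordering, and the `amber_suppressor` parameter and the placeholder letter `X` for the unnatural amino acid are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code in UCAG order; "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna, amber_suppressor=None):
    """Translate an mRNA read in frame from its first base.

    If amber_suppressor is given, UAG is decoded as that residue
    instead of terminating translation (hypothetical suppressor tRNA).
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon == "UAG" and amber_suppressor is not None:
            protein.append(amber_suppressor)
            continue
        residue = STANDARD_CODE[codon]
        if residue == "*":
            break  # UAA, UGA, or unsuppressed UAG ends translation
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    message = "AUGGCUUAGAAAUAA"
    print(translate(message))                        # truncated at UAG
    print(translate(message, amber_suppressor="X"))  # reads through UAG
```

Without the suppressor the product is truncated at the Amber codon. With it, translation continues to the next stop codon and the placeholder residue appears at the Amber position.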
In vivo Approaches
Amino Acid Auxotroph Substitution
Strains auxotrophic for a canonical amino acid can incorporate close structural analogs of that amino acid into proteins. Cells are first grown in the presence of the canonical amino acid, then transferred to growth medium that lacks the canonical amino acid but contains an excess of a close structural analog. While the analog usually cannot sustain exponential growth, the nondividing cells remain viable and are able to overexpress proteins containing it.
Bacher et al. evolved tryptophan auxotrophic E. coli long-term via serial transfer in normally toxic tryptophan analogs. Evolved strains effectively replaced tryptophan in their genetic code with fluorotryptophan within detection limits when it was the only option available. Strains still grew better on tryptophan than any of the analogs when it was provided.
In vivo Amber Codon Suppression
In the late 1990s and early 2000s, the Schultz group at Scripps developed the technology to generate organisms with an expanded, 21-amino-acid genetic code.[10, 11, 12] UAAs have since been genetically encoded in a variety of bacteria, yeast, and human cells. These systems are diagrammed at right and generally consist of the following components:
- An orthogonal Amber stop codon (UAG) suppressor tRNA
- An evolved aminoacyl-tRNA synthetase (aaRS) that charges a specific unnatural amino acid onto the Amber suppressor tRNA
- A selectable (antibiotic resistance) marker with at least one in-frame Amber codon
- An exogenously supplied unnatural amino acid
The primary challenge in developing these systems is fulfilling the criteria of aaRS/tRNA orthogonality and aminoacylation specificity. The best starting point is to import an aaRS/tRNA pair from a different domain of life. The orthogonality of the pair must then be improved by rounds of positive selection, to obtain variants that successfully incorporate the unnatural amino acid of choice, and rounds of negative selection, to ensure that canonical amino acids are not incorporated at the Amber codon. These rounds of selection typically take place on the tRNA and synthetase individually. This process is diagrammed below.
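The logic of alternating positive and negative selection can be sketched as two filters over a library of synthetase variants. This is an illustrative toy under stated assumptions, not the published protocol. The `Variant` fields and variant names are hypothetical. The positive round is assumed to require Amber suppression for survival with the UAA present, and the negative round is assumed to kill any cell that suppresses Amber with the UAA absent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    name: str
    charges_uaa: bool        # can charge the suppressor tRNA with the UAA
    charges_canonical: bool  # can charge it with a canonical amino acid


def positive_selection(library):
    """UAA present: any variant that suppresses Amber survives,
    whether it used the UAA or a canonical amino acid."""
    return [v for v in library if v.charges_uaa or v.charges_canonical]


def negative_selection(library):
    """UAA absent: variants that still suppress Amber must be using
    canonical amino acids, so they are removed."""
    return [v for v in library if not v.charges_canonical]


if __name__ == "__main__":
    library = [
        Variant("inactive", False, False),
        Variant("promiscuous", True, True),
        Variant("canonical_only", False, True),
        Variant("uaa_specific", True, False),
    ]
    survivors = negative_selection(positive_selection(library))
    print([v.name for v in survivors])
```

Neither round is sufficient alone in this sketch: the positive round keeps promiscuous variants, and the negative round keeps inactive ones. Only the combination isolates the variant that is specific for the UAA.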
Many orthogonal tRNA/aaRS pairs have been developed, and the source organism for each pair will typically be from a different domain of life than the organism for which the pair will be engineered. Different pairs require varying degrees of engineering and directed evolution.
The tyrosyl tRNA/synthetase pair of Methanocaldococcus jannaschii, an archaeon, is one of the most commonly evolved orthogonal pairs for the incorporation of unnatural amino acids. This pair was originally chosen because the identity elements of its tyrosyl tRNA differ from those of the E. coli tyrosyl tRNA, and because the aaRS has a very minimal anticodon loop binding domain and lacks any editing mechanism that would proofread the UAA-tRNA ligation. This tRNA-aaRS pair is used for the incorporation of UAAs 1-15, 17-26, 31, 32, 34-36, 41-44, 46, and 48-50 in the figure at the bottom of the page.
A number of methanogenic archaea, including Methanosarcina barkeri, naturally encode pyrrolysine as a 21st amino acid at Amber (UAG) codons. This system is unique in that it evolved naturally, and it is highly orthogonal and efficient in bacteria even without extensive optimization. This system has been used to incorporate UAAs 40, 51, 59, 60, 61-68, and 69-71 in the figure at the bottom of the page.
While the 20 canonical amino acids are clearly sufficient for the vast diversity of form and function already observed in living systems, an expanded genetic code may still be advantageous to organisms under certain conditions, and it allows us to further manipulate and study protein properties and functions. Many of the unnatural amino acids carry chemical groups not found among the canonical amino acids and could confer unique capabilities on proteins harboring them (1-11, 17, 18, 64, 67, 68, 70, 71). Other UAAs contain in vitro or cellular probes of protein structure and function. These may be used in IR or NMR, as fluorescent probes, or may have heavy atoms for X-ray crystallography (12-33, 36, 44, 45-47, 53-63, 65, 66, 69). Still other UAAs correspond to or mimic the product of a post-translational modification, enabling the purification and study of such proteins.
Limits on Orthogonality
While papers describing incorporation systems for specific UAAs typically report a high degree of orthogonality and high fidelity of UAA incorporation, the reproducibility of these results varies widely from system to system. Fidelity of incorporation can be measured directly using mass spectrometry and N-terminal (Edman) sequencing of proteins. A standard rough measure of orthogonality is the ability of a strain carrying a selectable marker (chloramphenicol resistance, Cam) with in-frame Amber codons to grow with and without the UAA of choice. If the tRNA/aaRS pair is highly orthogonal, the bacteria should be able to grow under selection only in the presence of the UAA. If canonical amino acids can be charged by the introduced aaRS, or if an endogenous aaRS recognizes the introduced tRNA, then canonical amino acids will be incorporated at Amber codons and the bacteria will grow with or without the UAA. Different pairs fare variably in this test: in my experience, some, such as the pair for L-DOPA (21), grow better in the absence of the UAA than in its presence, while others (22, 23) display better orthogonality. These challenges may be overcome by using advanced techniques for the evolution of better orthogonal pairs, or by reengineering strains to encourage orthogonality.
Release Factor 1 (RF1) recognizes the termination codons UAA and UAG and is responsible for stopping translation at these codons. While clearly important for the proper functioning of translation, RF1 also competes with the Amber suppressor tRNA at the ribosome, which limits the amount of full-length protein produced from a gene containing an in-frame Amber codon. This problem is compounded with each additional Amber codon in the gene, leading to a rapid dropoff in the full-length protein isolated when a gene contains more than one.
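The compounding effect of this competition can be illustrated with a toy kinetic model. This is a sketch under strong assumptions: the suppressor tRNA and RF1 are treated as competing at each Amber codon with fixed relative rates, and every Amber codon is treated as independent. The rate values and function names are hypothetical and are not measurements.

```python
def suppression_probability(trna_rate: float, rf1_rate: float) -> float:
    """Chance that the suppressor tRNA, rather than RF1, decodes one
    Amber codon in a simple two-way competition."""
    return trna_rate / (trna_rate + rf1_rate)


def full_length_fraction(trna_rate: float, rf1_rate: float, n_amber: int) -> float:
    """Fraction of translation events that read through all n_amber
    in-frame Amber codons, assuming each codon is independent."""
    return suppression_probability(trna_rate, rf1_rate) ** n_amber


if __name__ == "__main__":
    # Hypothetical case in which RF1 wins three times out of four.
    for n in (1, 2, 3):
        print(n, full_length_fraction(1.0, 3.0, n))
    # Setting the RF1 rate to zero removes the competition in this model.
    print(full_length_fraction(1.0, 0.0, 3))
```

In this model the full-length yield falls geometrically with the number of Amber codons whenever RF1 competes, and the penalty disappears when the RF1 term is zero.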
Until recently, it was thought that RF1 was essential for cell survival. Several methods have recently been used to make RF1 conditionally inessential, enabling its knockout. Mukai et al. supplied, on a plasmid, copies of all seven essential genes that normally end in Amber codons, modified to end in the UAA stop codon instead. Johnson et al. "fixed" the expression of RF2, the other primary release factor in E. coli. Both of these measures enabled the knockout of RF1. The benefit of the knockout can be seen in the amount of full-length GFP produced in these strains: multiple Amber codons may be present in the GFP reading frame and still yield functional, full-length GFP when RF1 is knocked out. Furthermore, these strains do not grow in the absence of the unnatural amino acid. Mukai et al. theorize that this is because, in the absence of the unnatural amino acid, ribosomes stall at Amber codons, limiting their availability for translating other proteins and resulting in the degradation of essential proteins whose coding sequence ends in UAG. Finally, Huang et al. increased Amber suppression by overexpressing the C-terminal domain of ribosomal protein L11. The C-terminal domain is thought to bind the ribosome in competition with full-length, functional L11, permitting normal protein translation while greatly increasing the rate of Amber suppression. This measure circumvented the need for an RF1 knockout and allows multiple unnatural amino acid residues to be encoded in one gene.
Adaptation to System
Manipulating the genetic code puts strong selection pressures on the cell that are not well understood. While RF1 knockout cells initially grow very poorly in media not supplemented with unnatural amino acid, they rapidly adapt to these conditions and recover a growth rate comparable to that of cells grown in media supplemented with UAA. Even bacteria grown in media supplemented with UAA were observed to develop resistance to a lack of UAA after two weeks of daily transfers. The mechanism of this resistance has not yet been determined. These properties make Amber suppressor strains unsuitable for long-term culture under current conditions.
Fluctuation tests are used to measure the rate of mutations leading to a specific selectable phenotype, such as antibiotic resistance. Fluctuation tests to measure the rate of mutation leading to resistance to a lack of UAA have proven difficult. Rather than producing a small number of distinct colonies, plates have typically shown a mix of hazy growth and too many colonies to count. It is unclear whether this growth is due to mutation or to salvage of residual 3-iodotyrosine plated with the cells. I have been trying different plating conditions to alleviate these issues. Obtaining colonies from a fluctuation test would not only allow us to measure the mutation rate, but would also provide a number of clones that have independently evolved resistance to a lack of 3-iodotyrosine. Sequencing the plasmids, and perhaps the genomes, of these mutants would provide valuable insight into routes of adaptation to the system, and thus into potential strategies to circumvent these adaptations.
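One common way to turn fluctuation test counts into a mutation rate is the p0 method of Luria and Delbrück, sketched below. This is offered only as an illustration of the calculation, not as the analysis used here. It assumes that plates can be scored cleanly for the presence or absence of resistant colonies, which the hazy growth described above would prevent. The culture counts and cell numbers in the example are made up.

```python
import math


def p0_mutation_rate(n_cultures: int, n_without_mutants: int,
                     cells_per_culture: float) -> float:
    """Estimate mutations per cell per generation with the p0 method.

    The fraction of parallel cultures with no resistant colonies, p0,
    is taken to equal exp(-m), where m is the mean number of mutation
    events per culture. The rate is m divided by the final cell number.
    """
    p0 = n_without_mutants / n_cultures
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 method needs some, but not all, cultures without mutants")
    mutations_per_culture = -math.log(p0)
    return mutations_per_culture / cells_per_culture


if __name__ == "__main__":
    # Hypothetical experiment: 20 cultures of 1e8 cells, 10 with no colonies.
    print(p0_mutation_rate(20, 10, 1e8))
```

The method fails when every plate shows growth, since p0 is then zero, which is one reason the plating conditions matter for obtaining a usable estimate.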
- Longstaff DG, Larue RC, Faust JE, Mahapatra A, Zhang L, Green-Church KB, and Krzycki JA. pmid:17204561.
- Böck A, Forchhammer K, Heider J, and Baron C. pmid:1838215.
- Dawson PE and Kent SB. pmid:10966479.
- Native Chemical Ligation. (n.d.). In Wikipedia. Retrieved March 25, 2012, from http://en.wikipedia.org/wiki/Native_chemical_ligation
- Schnölzer M and Kent SB. pmid:1566069.
- Hecht SM, Alford BL, Kuroda Y, and Kitano S. pmid:248056.
- Noren CJ, Anthony-Cahill SJ, Griffith MC, and Schultz PG. pmid:2649980.
- Link AJ, Mock ML, and Tirrell DA. pmid:14662389.
- Bacher JM and Ellington AD. pmid:11514527.
- Liu DR and Schultz PG. pmid:10220370.
- Wang L, Xie J, and Schultz PG. pmid:16689635.
- Wang L and Schultz PG. pmid:11564556.
- Liu CC and Schultz PG. pmid:20307192.
- Liu CC and Schultz PG. pmid:20307192.
- Park HS, Hohn MJ, Umehara T, Guo LT, Osborne EM, Benner J, Noren CJ, Rinehart J, and Söll D. pmid:21868676.
- Srinivasan G, James CM, and Krzycki JA. pmid:12029131.
- Mukai T, Hayashi A, Iraha F, Sato A, Ohtake K, Yokoyama S, and Sakamoto K. pmid:20702426.
- Johnson DB, Xu J, Shen Z, Takimoto JK, Schultz MD, Schmitz RJ, Xiang Z, Ecker JR, Briggs SP, and Wang L. pmid:21926996.
- Muir TW. pmid:12626339.
- Huang Y, Russell WK, Wan W, Pai PJ, Russell DH, and Liu W. pmid:20237646.