Arking:JCAOligoTutoria22

From OpenWetWare
Revision as of 15:04, 31 January 2009 by JCAnderson (talk | contribs)

Generating parts encoding circularly permuted proteins

A circularly permuted protein is generated by (conceptually) linking together the N and C termini of a protein into a circular molecule, and then cutting it back open at a different site. In the DNA, what this boils down to is doing something like this:

Note that in this example, I've shown how you would permute a periplasmically-expressed protein, hence the prepro sequence targeting it to the periplasm. When permuting, regulatory features need to stay in the same spots; you only want to "spin around" the active peptide. If this protein weren't periplasmic, it would be even easier to permute: you'd just swap the N- and C-terminal regions and that's it.
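
If it helps to see the idea at the protein-sequence level, here is a minimal Python sketch. The function name `circularly_permute`, its parameters, and the toy sequences are made up for illustration; they are not the real T4L sequence or any library's API. The optional `leader` stands in for something like a prepro sequence that has to stay at the front.

```python
def circularly_permute(protein, cut_after, linker="", leader=""):
    """Reopen a (conceptually) circularized protein after residue `cut_after`.

    protein   : mature protein sequence, one letter per residue
    cut_after : number of residues of the original N-terminal portion
    linker    : residues used to join the original C terminus to the original N terminus
    leader    : anything that must stay at the very front (e.g. a prepro sequence)
    """
    if not 0 < cut_after < len(protein):
        raise ValueError("cut_after must fall strictly inside the protein")
    new_n_term = protein[cut_after:]   # original C-terminal portion now comes first
    new_c_term = protein[:cut_after]   # original N-terminal portion now comes last
    return leader + new_n_term + linker + new_c_term


# Toy sequences, not real proteins.
print(circularly_permute("ABCDEFGH", 3, linker="xx"))                # DEFGHxxABC
print(circularly_permute("ABCDEFGH", 3, linker="xx", leader="SIG"))  # SIGDEFGHxxABC
```

Start and stop codons are ignored here; they get handled at the DNA level when the oligos are designed.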

To illustrate how this is done, let's try making one! Let's design a circularly permuted T4 Lysozyme. I have no idea whether the product of this construction file is a functional protein or not, but you'll get the idea. First of all, grab the crystal structure. T4 Lysozyme (T4L) is heavily studied structurally, so there are tons of files available on PubMed or the PDB. Let's look at PDB ID: 3DN1.

The first question to ask is where the N and C termini are. Are they reasonably close to each other? If not, you're probably not going to be able to make this work. They don't have to be right on top of each other; you can make up for some distance with a flexible linker. They look pretty good in T4L. You also want to look for a place to cut it back open. The ideal spots are going to be large disordered loops. T4L doesn't really have one, so we'll just go with one of the loops. Gly51 looks like a reasonable spot. There are more sophisticated modeling tools that would probably be wise to use for this sort of design, but I won't get into that here.
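
One rough way to put a number on "reasonably close" is to measure the distance between the alpha carbons of the first and last residues that are actually resolved in the structure. The sketch below assumes you have the text of a PDB-format file and reads its fixed-width ATOM records. The function names and the default chain "A" are my assumptions, and no particular distance cutoff is implied.

```python
import math


def ca_coords(pdb_text, chain="A"):
    """Return {residue_number: (x, y, z)} for the C-alpha atoms of one chain."""
    coords = {}
    for line in pdb_text.splitlines():
        # PDB ATOM records are fixed-width: atom name in columns 13-16,
        # chain in column 22, residue number in 23-26, x/y/z in 31-54.
        if (line.startswith("ATOM")
                and line[12:16].strip() == "CA"
                and line[21] == chain):
            resnum = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            coords.setdefault(resnum, xyz)  # keep the first alternate location
    return coords


def termini_distance(pdb_text, chain="A"):
    """Return (first_residue, last_residue, CA-CA distance in angstroms)."""
    coords = ca_coords(pdb_text, chain)
    if len(coords) < 2:
        raise ValueError("need at least two C-alpha atoms in chain " + chain)
    first, last = min(coords), max(coords)
    return first, last, math.dist(coords[first], coords[last])
```

Keep in mind that the first and last resolved residues are not necessarily the true termini, since disordered ends are often missing from crystal structures.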

To make this thing, we need a template DNA. Berkeley iGEM 2008 has cloned T4L already and also removed the restriction sites, so we'll start with pBca1256-K112012, which you can download here.

The first step is to break up the sequence into its constituent regions, then clone and sequence those. If we had a periplasmically-targeted protein, we'd split this up into 3 parts:

In designing the component parts, you need to make sure you are grabbing the right portions of the protein. For prepro sequences, the sequence is often given as an annotation in the GenBank file. If not, you can use a prediction tool such as SignalP: http://www.cbs.dtu.dk/services/SignalP/. You need to be sure you are including the entire prepro, including the dipeptide that gets cleaved during processing.

For our purposes here, T4L is a cytoplasmic protein, so we don't have to deal with a prepro. We just need to break it into N- and C-terminal parts. We need to make sure we're cutting at the right spot, though. Let's look at the crystal structure. Often the numbering of the amino acids in crystal structures is not the same as the numbering from the start codon! So, don't go into autopilot mode when finding your cut site within the DNA. First of all, find the amino acid (Gly51) in the structure and note the peptide that starts with it. In this case, it's GRNCNG.

Now, using ApE, we'll translate the T4L CDS. Put your cursor at the BglII site of the ApE file with your source plasmid sequence and then go under ORFs > Find next. That should light up your open reading frame. Keep in mind that some genes start with GTG or even TTG, so check the annotation in your source to be sure you are really starting at the right spot. With the ORF highlighted, select ORFs > translate. Make sure the DNA: Above button is clicked and say OK.

Now, look for your peptide within the window that ApE popped up. Use your cursor to highlight the DNA above the GRNCNG peptide and copy that sequence. You can now close that translate window. Search for the copied DNA within your sequence file and highlight it. Now translate that again and make sure it translates as GRNCNG. If it doesn't, you probably grabbed the DNA 1 or 2 basepairs out of frame. Go back and redo it until you have the sequence corresponding to GRNCNG highlighted.
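
The same lookup can be scripted as a cross-check on the point-and-click version. This is a minimal sketch, not what ApE does internally. It assumes the standard genetic code, and `ORF_EXCERPT` is just the first 183 bases of the T4L ORF copied from the fragment sequences given in this tutorial. The names `translate` and `locate_peptide` are my own.

```python
# Standard genetic code, built from the usual compact 64-letter string (t, c, a, g order).
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna):
    """Translate DNA in frame 1; a trailing partial codon is ignored."""
    dna = dna.lower()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def locate_peptide(orf, peptide):
    """Return (residue number from the start codon, 0-based DNA start, DNA encoding the peptide)."""
    aa_index = translate(orf).find(peptide)
    if aa_index < 0:
        raise ValueError("peptide not found in frame 1 of this ORF")
    start = 3 * aa_index
    return aa_index + 1, start, orf[start:start + 3 * len(peptide)]


# First 183 bases of the T4L ORF (excerpt only, 30 bases per line).
ORF_EXCERPT = (
    "atgaatatatttgaaatgttacgtatagat"
    "gaaggtcttagacttaaaatctataaagac"
    "acagaaggctattacactattggcatcggt"
    "catttgcttacaaaaagtccatcacttaat"
    "gctgctaaatctgaattagataaagctatt"
    "ggg"
    "cgtaattgcaatggtgtaattacaaaagat"
)

residue, dna_start, dna = locate_peptide(ORF_EXCERPT, "GRNCNG")
cut_position = dna_start + 3  # first base after the Gly codon that begins the peptide
print(residue, dna_start, dna, cut_position)
```

Because the search runs on the translation rather than on the DNA, the hit is in frame by construction, and the residue number it reports is counted from the start codon, so you can compare it directly against the numbering in the structure file.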

OK, now we're ready to break this into two DNA sequences. First of all, copy the ORF of T4L into a new window. Next, let's break it directly 3' of the Gly51 codon. So, I'm going to start my mouse at the start codon and highlight up to the last base of the Gly51 codon. Now ctrl+x to cut, and paste in a new window. Alright, now we have our two windows corresponding to the N and C termini. The two sequences I have are:

N-terminus:

 atgaatatatttgaaatgttacgtatagatgaaggtcttagacttaaaatctataaagacacagaaggctattacactattggcatcggtcatttgcttacaaaaagtccatcacttaatgctgctaaatctgaattagataaagctattggg

C-terminus:

 cgtaattgcaatggtgtaattacaaaagatgaggctgaaaaactctttaatcaggatgttgatgctgctgttcgcggaatcctgagaaatgctaaattaaaaccggtttatgattctcttgatgcggttcgtcgctgtgcattgattaatatggttttccaaatgggagaaaccggtgtggcaggatttactaactctttacgtatgcttcaacaaaaacgctgggatgaagcagcagttaacttagctaaaagtagatggtataatcaaacacctaatcgcgcaaaacgagtcattacaacgtttagaactggcacttgggacgcgtataaaaatctataa

Now would be a good time to repeat the translation procedure on these sequences and make sure that each sequence is still in-frame and starts and stops with the right amino acids.
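
A quick scripted version of this sanity check might look like the following. It is only a sketch: the helper name `check_fragment` is hypothetical, and it looks at codons directly (frame, first and last codon, internal stops) rather than doing a full translation. The two sequences are the N- and C-terminal fragments from above, broken into 30-base lines.

```python
STOP_CODONS = {"taa", "tag", "tga"}


def check_fragment(dna):
    """Summarize the reading frame of a fragment that should start on a codon boundary."""
    dna = dna.lower()
    triplets = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return {
        "in_frame": len(dna) % 3 == 0,
        "first_codon": triplets[0],
        "last_codon": triplets[-1],
        # 1-based codon positions of any stop codon before the last codon
        "internal_stops": [i + 1 for i, c in enumerate(triplets[:-1]) if c in STOP_CODONS],
    }


N_TERM = (
    "atgaatatatttgaaatgttacgtatagat"
    "gaaggtcttagacttaaaatctataaagac"
    "acagaaggctattacactattggcatcggt"
    "catttgcttacaaaaagtccatcacttaat"
    "gctgctaaatctgaattagataaagctatt"
    "ggg"
)
C_TERM = (
    "cgtaattgcaatggtgtaattacaaaagat"
    "gaggctgaaaaactctttaatcaggatgtt"
    "gatgctgctgttcgcggaatcctgagaaat"
    "gctaaattaaaaccggtttatgattctctt"
    "gatgcggttcgtcgctgtgcattgattaat"
    "atggttttccaaatgggagaaaccggtgtg"
    "gcaggatttactaactctttacgtatgctt"
    "caacaaaaacgctgggatgaagcagcagtt"
    "aacttagctaaaagtagatggtataatcaa"
    "acacctaatcgcgcaaaacgagtcattaca"
    "acgtttagaactggcacttgggacgcgtat"
    "aaaaatctataa"
)

n_report = check_fragment(N_TERM)
c_report = check_fragment(C_TERM)
print(n_report)
print(c_report)
```

What you want to see is the N-terminal fragment starting with atg and ending on the Gly51 codon (ggg), and the C-terminal fragment starting with the Arg codon (cgt) and ending on the stop codon, with both in frame and no internal stops.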

Alright, now add your BglBrick polylinker ends to these sequences and design some oligos. You should also add/remove start and stop codons where appropriate at this step. You'll be re-using these oligos in the second part of the construction, and the termini of the final products will be set by these oligos. Here are my two construction files:

PCR Oca9393/Oca9394 on pBca1256-K112012   (187 bp, EcoRI/BamHI)
Sub into pBca9145-Bca1144#5               (EcoRI/BamHI, 2057+910, L)
Product is pBca9145-Bca9393   {N.T4L>}
----
Oca9393  Construction of cpT4L N term part
ctctgGAATTCATGAGATCTatgaatatatttgaaatgttac
Oca9394  Construction of cpT4L N term part
catgtGGATCCttacccaatagctttatctaattcag
PCR Oca9395/Oca9396 on pBca1256-K112012   (376 bp, EcoRI/BamHI)
Sub into pBca9145-Bca1144#5               (EcoRI/BamHI, 2057+910, L)
Product is pBca9145-Bca9395   {<C.T4L!}
----
Oca9395  Construction of cpT4L C term part
ctctgGAATTCATGAGATCTatgcgtaattgcaatggtgtaattac
Oca9396  Construction of cpT4L C term part
catgtGGATCCttatagatttttatacgcg
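
To show where the pieces of these oligos come from, here is a sketch that rebuilds Oca9393 and Oca9394 from the N-terminal fragment and predicts the PCR product size. The function `design_part_oligos` and its parameters are hypothetical. The two tails are simply read off the oligos above, and the annealing lengths (22 and 26 bases) are chosen to match them rather than computed from any melting-temperature rule.

```python
def reverse_complement(seq):
    """Reverse complement, preserving upper/lower case."""
    return seq.translate(str.maketrans("acgtACGT", "tgcaTGCA"))[::-1]


FWD_TAIL = "ctctgGAATTCATGAGATCT"  # 5' pad + EcoRI + ATG + BglII, as in Oca9393 and Oca9395
REV_TAIL = "catgtGGATCC"           # 5' pad + BamHI, as in Oca9394 and Oca9396


def design_part_oligos(insert, fwd_anneal, rev_anneal, add_start=False, add_stop=False):
    """Return (forward oligo, reverse oligo, expected PCR product length in bp)."""
    core = ("atg" if add_start else "") + insert + ("taa" if add_stop else "")
    forward = FWD_TAIL + core[:fwd_anneal]
    reverse = REV_TAIL + reverse_complement(core[-rev_anneal:])
    product_length = len(FWD_TAIL) + len(core) + len(REV_TAIL)
    return forward, reverse, product_length


N_TERM = (
    "atgaatatatttgaaatgttacgtatagat"
    "gaaggtcttagacttaaaatctataaagac"
    "acagaaggctattacactattggcatcggt"
    "catttgcttacaaaaagtccatcacttaat"
    "gctgctaaatctgaattagataaagctatt"
    "ggg"
)

# The N-terminal fragment keeps its own start codon and needs a stop codon added.
forward, reverse, size = design_part_oligos(N_TERM, fwd_anneal=22, rev_anneal=26, add_stop=True)
print(forward)  # ctctgGAATTCATGAGATCTatgaatatatttgaaatgttac
print(reverse)  # catgtGGATCCttacccaatagctttatctaattcag
print(size)     # 187
```

The C-terminal part works the other way around: it already ends in a stop codon but needs a start codon, so the equivalent call would use `add_start=True` and leave `add_stop` off.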

Alright, now that we have our individual parts made and sequenced, let's assemble them using SOEing to generate the circularly permuted gene. Let's design the junction between the two. First of all, we're going to want to put a linker between the two parts. The choice of linker sequence definitely matters, but the best one really has to be determined empirically. For today, let's just use the sequence GGQSGQ. A DNA sequence for that is:

 Linker
 GGAGGGcagtctgggcag

Now, let's grab the last 20bp (or so, usual PCR design rules apply) of the new N-terminal part. We'll want to remove the stop codon, so that gives us:

 N junction
 gggacgcgtataaaaatcta

Let's grab the first 20bp (or so) of the new C-terminal part. Whether to include the start codon or not is up for debate. Usually you would want to remove it:

 C junction
 aatatatttgaaatgttacg

Our forward oligo for amplifying Bca9393 will then be

 Linker.C junction
 GGAGGGcagtctgggcagaatatatttgaaatgttacg

Our reverse oligo for amplifying Bca9395 will be the reverse complement of

 N junction.Linker
 gggacgcgtataaaaatctaGGAGGGcagtctgggcag

So, that's:

 ctgcccagactgCCCTCCtagatttttatacgcgtccc
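
Reverse complements are easy to get wrong by hand, so it's worth scripting this step. The sketch below assembles both SOEing oligos from the linker and junction sequences above. The variable names are mine, and the helper preserves upper/lower case so the linker stays recognizable.

```python
def reverse_complement(seq):
    """Reverse complement, preserving upper/lower case."""
    return seq.translate(str.maketrans("acgtACGT", "tgcaTGCA"))[::-1]


LINKER = "GGAGGGcagtctgggcag"          # encodes GGQSGQ
N_JUNCTION = "gggacgcgtataaaaatcta"    # end of the new N-terminal part, stop codon removed
C_JUNCTION = "aatatatttgaaatgttacg"    # start of the new C-terminal part, start codon removed

forward_soe = LINKER + C_JUNCTION
reverse_soe = reverse_complement(N_JUNCTION + LINKER)

print(forward_soe)  # GGAGGGcagtctgggcagaatatatttgaaatgttacg
print(reverse_soe)  # ctgcccagactgCCCTCCtagatttttatacgcgtccc
```

Both oligos carry the full 18-base linker, and that shared stretch is the overlap that lets the two PCR fragments prime on each other in the SOEing step.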

Alright, now we can write up the construction file:

PCR Oca9397/Oca9394 on pBca9145-Bca9393    (182 bp, gp, =fragA)
PCR Oca9395/Oca9398 on pBca9145-Bca9395    (380 bp, gp, =fragB)
PCR Oca9395/Oca9394 on fragA + fragB       (544 bp, EcoRI/BamHI)
Sub into pBca9145-Bca1144#5                (EcoRI/BamHI, 2057+910, L)
Product is pBca9145-Bca9398   {cpT4L!}
----
Oca9397  Forward SOEing oligo for cpT4L
GGAGGGcagtctgggcagaatatatttgaaatgttacg
Oca9398  Reverse SOEing oligo for cpT4L
ctgcccagactgCCCTCCtagatttttatacgcgtccc

Check it!

For things like this you really really REALLY want to go through the construction file very carefully and inspect your final model sequence for correctness.

  • Did all the PCRs "work"?
  • Is your product sequence in the correct frame, and are starts, stops, and linkers all in the right places and correct frames?
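
One way to run through these checks is to simulate the construction file and translate the result. The following is a rough sketch under some loud assumptions: `pcr` is a naive model that only requires an exact match over the 3' 15 bases of each oligo, `soe` simply merges two fragments at their longest shared end, and each template is modeled as just the EcoRI-to-BamHI insert of the part plasmid rather than the whole plasmid. All function names are my own.

```python
def reverse_complement(seq):
    """Reverse complement, preserving upper/lower case."""
    return seq.translate(str.maketrans("acgtACGT", "tgcaTGCA"))[::-1]

BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
               for i, a in enumerate(BASES) for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(dna):
    """Translate DNA in frame 1 with the standard genetic code."""
    dna = dna.lower()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))

def pcr(forward, reverse, template, anneal=15):
    """Naive PCR: each oligo must match the template exactly over its 3' `anneal` bases."""
    t, reverse_rc = template.lower(), reverse_complement(reverse)
    start = t.index(forward.lower()[-anneal:]) + anneal  # first base after the forward oligo
    end = t.index(reverse_rc.lower()[:anneal])           # first base under the reverse oligo
    if end < start:
        raise ValueError("oligo binding sites overlap or are in the wrong order")
    return forward + template[start:end] + reverse_rc

def soe(left, right, min_overlap=15):
    """Join two fragments at the longest stretch shared by left's end and right's start."""
    for n in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-n:].lower() == right[:n].lower():
            return left + right[n:]
    raise ValueError("fragments do not overlap")

N_TERM = ("atgaatatatttgaaatgttacgtatagat" "gaaggtcttagacttaaaatctataaagac"
          "acagaaggctattacactattggcatcggt" "catttgcttacaaaaagtccatcacttaat"
          "gctgctaaatctgaattagataaagctatt" "ggg")
C_TERM = ("cgtaattgcaatggtgtaattacaaaagat" "gaggctgaaaaactctttaatcaggatgtt"
          "gatgctgctgttcgcggaatcctgagaaat" "gctaaattaaaaccggtttatgattctctt"
          "gatgcggttcgtcgctgtgcattgattaat" "atggttttccaaatgggagaaaccggtgtg"
          "gcaggatttactaactctttacgtatgctt" "caacaaaaacgctgggatgaagcagcagtt"
          "aacttagctaaaagtagatggtataatcaa" "acacctaatcgcgcaaaacgagtcattaca"
          "acgtttagaactggcacttgggacgcgtat" "aaaaatctataa")

# Assumed inserts of the two sequenced part plasmids (EcoRI site through BamHI site).
BCA9393 = "GAATTCATGAGATCT" + N_TERM + "taa" + "GGATCC"
BCA9395 = "GAATTCATGAGATCT" + "atg" + C_TERM + "GGATCC"

OCA9394 = "catgtGGATCCttacccaatagctttatctaattcag"
OCA9395 = "ctctgGAATTCATGAGATCTatgcgtaattgcaatggtgtaattac"
OCA9397 = "GGAGGGcagtctgggcagaatatatttgaaatgttacg"
OCA9398 = "ctgcccagactgCCCTCCtagatttttatacgcgtccc"

frag_a = pcr(OCA9397, OCA9394, BCA9393)
frag_b = pcr(OCA9395, OCA9398, BCA9395)
product = soe(frag_b, frag_a)  # already carries both outer oligos at its ends
orf = product[20:-11]          # drop the 20-base 5' tail and the 11-base BamHI tail
protein = translate(orf)
print(len(frag_a), len(frag_b), len(product))
print(protein)
```

If everything is in order, the three lengths agree with the sizes written in the construction file, and the translation reads as Met plus the original C-terminal portion, then the GGQSGQ linker, then the original N-terminal portion minus its Met, ending in a single stop.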