Build-a-Gene Session 5

From OpenWetWare
Revision as of 15:03, 18 August 2013

COLONY PCR

Because PCA produces many DNA molecules, not all of which are the correct size, we want to make sure that the DNA we work with from here on contains the full-length emGFP gene. Remember that each bacterial cell originally picked up one DNA molecule. As that cell grew into a colony, it passed copies of that molecule on to its descendants, so every cell in the colony carries the same DNA. Other bacterial colonies may carry DNA molecules of a different size or sequence. We can therefore screen the bacterial colonies by colony PCR to determine which ones contain a plasmid insert of the correct size for the emGFP+promoter.


1. Dispense 50 ul of water into each of 12 tubes.
2. Use a sterile toothpick to pick a bacterial colony and resuspend it in one of the tubes of water. Repeat for 11 more colonies, using a new tube for each.
3. Prepare a master mix for all PCRs by combining all reagents listed below in one tube.


Reagent           Volume
10 uM primer       8.4 ul
10 uM primer       8.4 ul
2.5 mM dNTPs      14.0 ul
Taq+buffer        95.2 ul
Total            126.0 ul


4. Add 9 ul of the master mix to each of 12 PCR tubes.
5. Add 1 ul of resuspended bacterial cells from a different colony to each PCR tube. (IT IS VERY IMPORTANT TO STORE THE REMAINDER of each cell suspension, since you will need it to grow up the clones that carry the correct insert!) Start the PCR reactions in the PCR machine.
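The master-mix volumes above work out to 0.6 ul of each primer, 1.0 ul of dNTPs and 6.8 ul of Taq+buffer per 9 ul reaction. The 126 ul total is therefore enough for 14 reactions. The two reactions beyond the 12 colonies are presumably a margin for pipetting loss, but that is an inference from the totals. The minimal Python sketch below rescales the mix to a different number of colonies. The reagent labels "primer 1" and "primer 2" are hypothetical names used only to tell the two primers apart.

```python
# Per-reaction volumes (ul) derived from the master-mix table:
# table volume / 14 reactions (126 ul total / 9 ul per reaction).
# "primer 1"/"primer 2" are hypothetical labels for the two primers.
PER_REACTION_UL = {
    "10 uM primer 1": 0.6,
    "10 uM primer 2": 0.6,
    "2.5 mM dNTPs": 1.0,
    "Taq+buffer": 6.8,
}


def master_mix(n_colonies: int, extra_reactions: int = 2) -> dict[str, float]:
    """Total volume (ul) of each reagent for n_colonies reactions.

    extra_reactions is an assumed safety margin for pipetting loss.
    """
    n = n_colonies + extra_reactions
    return {name: round(vol * n, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    for reagent, volume in master_mix(12).items():
        print(f"{reagent:<16} {volume:6.1f} ul")
```

With the default margin, `master_mix(12)` reproduces the table. `master_mix(24)` would give the volumes for screening 24 colonies.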


Reaction Conditions:

95°C, 6 minutes

30 cycles:

       95°C, 30 seconds
       55°C, 30 seconds
       72°C, 1 minute

72°C, 10 minutes
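This estimate ignores the time the thermocycler spends ramping between temperatures, which varies by machine. On that basis, the program above holds the samples at temperature for 6 + 30 × 2 + 10 = 76 minutes. The short Python sketch below encodes the program as data and sums the hold times. The step list mirrors the reaction conditions above, and the data layout is just one convenient choice.

```python
# Each stage is (number of repeats, [(temperature_C, hold_seconds), ...]).
PROGRAM = [
    (1, [(95, 6 * 60)]),                   # initial denaturation (also lyses the cells)
    (30, [(95, 30), (55, 30), (72, 60)]),  # denature, anneal, extend
    (1, [(72, 10 * 60)]),                  # final extension
]


def total_hold_seconds(program) -> int:
    """Sum the hold time of every step, counting repeats; ramp time is excluded."""
    return sum(repeats * sum(sec for _, sec in steps) for repeats, steps in program)


if __name__ == "__main__":
    print(f"Hold time: {total_hold_seconds(PROGRAM) / 60:.0f} min")
```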



GEL ELECTROPHORESIS

Now we need to check how well our colony PCR worked by running the PCR products on an agarose gel, to determine which colonies contain a plasmid carrying the full-length emGFP gene.


Pouring a Gel:

1. Weigh out 0.35 g of agarose on a piece of weigh paper and transfer it to an Erlenmeyer flask. Add 50 ml of 1x TAE (this makes a 0.7% agarose gel).
2. Place the flask in the microwave and heat until the agarose is completely transparent and colorless.
3. Remove the flask of clear agarose and allow it to cool. This will take about 10 min.
4. When the agarose has cooled but is still liquid, add 5 ul of GelRed to it.
5. Swirl the flask to mix in the GelRed, then pour the agarose into the gel tray.
6. Allow at least 20 minutes for the gel to solidify. Once it is solid, carefully remove the comb and place the gel (still on its tray) into the gel box with the wells on the same side as the black (negative) electrode. DNA is negatively charged, so it migrates away from the wells toward the red (positive) electrode.
7. Add enough 1x TAE buffer to completely cover the gel by about 1 cm.
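Step 1 uses 0.35 g of agarose in 50 ml of buffer, which gives a 0.7% (w/v) gel, since % w/v means grams per 100 ml. If you need a different gel percentage or volume, the mass of agarose scales linearly. A minimal Python sketch:

```python
def agarose_grams(percent_w_v: float, volume_ml: float) -> float:
    """Grams of agarose for a gel of percent_w_v (% w/v = g per 100 ml)."""
    return percent_w_v * volume_ml / 100.0


def gel_percent(agarose_g: float, volume_ml: float) -> float:
    """Percent (w/v) of a gel made from agarose_g grams in volume_ml of buffer."""
    return 100.0 * agarose_g / volume_ml


if __name__ == "__main__":
    print(f"{gel_percent(0.35, 50):.1f}% gel")                     # recipe in step 1
    print(f"{agarose_grams(1.0, 50):.2f} g for a 1% gel in 50 ml")
```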


Preparing your samples:

1. On a piece of parafilm, spot out 2 ul of 6x DNA loading dye for each colony PCR reaction.
2. Add 5 ul of water to each spot of dye.
3. Add 5 ul of PCR product to each spot of dye.
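Each spot ends up as 2 + 5 + 5 = 12 ul, so the 6x loading dye is diluted sixfold to a 1x working concentration. The small Python sketch below checks this dilution (C1·V1 = C2·V2) for any volumes you choose. The function name is only illustrative.

```python
def final_dye_x(stock_x: float, dye_ul: float, *other_ul: float) -> float:
    """Final dye concentration (in 'x' units) after mixing dye with other volumes."""
    total_ul = dye_ul + sum(other_ul)
    return stock_x * dye_ul / total_ul


if __name__ == "__main__":
    # 2 ul of 6x dye + 5 ul water + 5 ul PCR product
    print(final_dye_x(6, 2, 5, 5))
```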


Running a Gel:

1. Load 10 ul of the DNA ladder into the first lane of the gel.
2. Load 10 ul of each of your PCR products (mixed with water and dye) into the following lanes.
3. Place the gel lid with its electrodes on the gel box and set the voltage to 100 V.
4. Run the gel for approximately 30 minutes, or until the dye is 2/3 of the way down the gel, then take a picture.


DNA SEQUENCE ANALYSIS

Once we’ve screened our clones by colony-screening PCR to verify that they contain an insert of the correct size, we need to sequence the inserts to verify that they contain an emGFP gene and promoter without any sequence errors.


The DNA is sequenced using chain-termination sequencing (also called Sanger or cycle sequencing). Information on cycle sequencing and how it works can be found here: [http://www.dnalc.org/view/15923-Cycle-sequencing.html] and [http://www.nature.com/scitable/topicpage/the-order-of-nucleotides-in-a-gene-6525806].

When sequencing data is sent to us, we receive not only a text file containing the sequence of the DNA insert but also the data from the sequencing machine in the form of a color-coded electropherogram. The electropherogram shows the signal recorded by the sequencing detector, with the height of each peak representing the strength of the signal. We can therefore judge the quality of the sequencing data and investigate any ambiguities in the sequence. A sample electropherogram is here: [http://www.lookfordiagnosis.com/mesh_info.php?term=Sequence+Analysis%2C+Dna&lang=1#]. You will notice that the signal at the end of the electropherogram is weaker than at the beginning: the peaks become shorter and broader and are difficult to distinguish from one another. This happens because it becomes increasingly difficult to separate long DNA fragments that differ in length by only a single nucleotide.


Our emGFP gene is about 750 bp long. However, a single DNA sequencing reaction (called a sequencing "read") gives only about 700 nucleotides of sequence. We therefore sequence each clone twice: once from the beginning to the end of the emGFP gene and once from the end to the beginning. We call these the "forward" and "reverse" sequencing reads. Because the two reads overlap, and because the quality of each read drops toward its end, this ensures that we get good sequencing data across the entire gene.
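To see why two reads are enough, assume (for illustration only) that the forward read starts at the first base of the gene and the reverse read starts at the last base. Assume also that each read gives 700 usable bases. In practice, reads start at primer sites and the first bases of a read are often low quality, so the real overlap will be smaller. The Python sketch below counts how many reads cover each position of the gene.

```python
def coverage(gene_len: int, read_len: int) -> list[int]:
    """Number of reads covering each position (0-based) of the gene.

    Simplifying assumption: the forward read covers [0, read_len) and the
    reverse read covers [gene_len - read_len, gene_len).
    """
    depth = [0] * gene_len
    for i in range(min(read_len, gene_len)):                 # forward read
        depth[i] += 1
    for i in range(max(0, gene_len - read_len), gene_len):   # reverse read
        depth[i] += 1
    return depth


if __name__ == "__main__":
    depth = coverage(750, 700)
    print("uncovered positions:", depth.count(0))    # 0
    print("covered by both reads:", depth.count(2))  # 650
```

With a single 700-nucleotide read, the last 50 bases of the gene would not be covered at all. The second read fills that gap and double-checks most of the gene.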


Comparing the forward sequencing read to the desired emGFP sequence


Now we need to determine whether our clones contain a sequence that perfectly matches the emGFP gene and promoter, or whether they have DNA sequence errors. To do this, we use a bioinformatics tool called Clustal W [www.ebi.ac.uk/Tools/clustalw2/index.html].

1. Input the title of your sequence.
2. Input the sequence of the emGFP gene. The line before the emGFP sequence must be a name line of the form >Name.of.Sequence (no spaces).
3. Skip a line and input the forward sequencing read, preceded by its own >Name.of.Sequence line.
4. Click Align.
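The ">Name.of.Sequence" lines are FASTA format: each sequence begins with a header line starting with ">" followed by its name, and the sequence itself goes on the following line(s). If you prefer to prepare the Clustal W input in a script rather than by hand, a minimal Python sketch follows. The sequence names and the short sequences are placeholders, not real emGFP data.

```python
def to_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {name: sequence} as FASTA text, wrapping sequences at `width`.

    Spaces are not allowed in names, so they are replaced with dots.
    """
    lines = []
    for name, seq in records.items():
        lines.append(">" + name.replace(" ", "."))
        seq = "".join(seq.split()).upper()  # drop spaces/newlines from pasted text
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder sequences: paste your full reference and forward read instead.
    print(to_fasta({
        "emGFP.reference": "ATGGTGAGCAAGGGCGAGGAG",
        "Clone1.forward": "atggtgagcaag ggcgaggag",
    }))
```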

Clustal W gives you a scores table showing the pairwise alignment similarity score (out of 100). More importantly, it provides a DNA alignment, in which positions that are identical in the two sequences are marked with a *. Note that the sequencing read continues past the end of the emGFP gene into the vector, so that part of the read will not match the emGFP sequence.
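The * line can be reproduced from two aligned (gapped) sequences: a column gets * when both sequences have the same base there. The Python sketch below also reports a simple percent identity over the columns where neither sequence has a gap. That definition is an assumption made for illustration and may not match exactly how Clustal W computes its pairwise score.

```python
def conservation_line(a: str, b: str) -> str:
    """'*' where the aligned sequences share the same base, ' ' elsewhere."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return "".join("*" if x == y and x != "-" else " "
                   for x, y in zip(a.upper(), b.upper()))


def percent_identity(a: str, b: str) -> float:
    """Identical columns / columns where neither sequence has a gap, times 100."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    ref = "ATGGTGAGC--"
    read = "ATGGTTAGCAA"  # one mismatch; the read runs on into the vector
    print(ref)
    print(read)
    print(conservation_line(ref, read))
    print(f"{percent_identity(ref, read):.1f}% identity")
```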


Analyzing the reverse sequencing read


The reverse sequencing read is the reverse complement of the emGFP sequence, because it was read from the complementary strand of the double helix. To line it up with the emGFP gene, we must first take its reverse complement: reverse the order of the bases and replace each base with its complement.

1. Go to the Sequence Manipulation Suite [http://www.bioinformatics.org/sms2/rev_comp.html].
2. Make sure that you are on the Reverse Complement page and input your reverse sequencing read.
3. Click Submit.
4. Copy the resulting sequence and paste it into Clustal W, preceded by its own >Name.of.Sequence line.
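The reverse-complement step does two things: it reverses the order of the bases and replaces each base with its complement (A↔T, G↔C). A minimal Python equivalent follows, which may be handy if you have many reads to process. As a simplifying assumption, it supports only A, C, G, T and N (any base) and keeps N as N.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse the sequence and complement each base (A<->T, G<->C, N stays N)."""
    seq = "".join(seq.split())  # remove spaces/newlines from pasted text
    invalid = set(seq.upper()) - set("ACGTN")
    if invalid:
        raise ValueError(f"unsupported characters: {sorted(invalid)}")
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGGTGAGCAAG"))  # CTTGCTCACCAT
```

Applying the function twice returns the original sequence, which is a quick sanity check on any reverse-complement tool.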

The result should show the emGFP gene aligned with both the forward and reverse sequencing reads. At any nucleotide position where your forward and reverse reads disagree, one of the two reads is probably of higher quality than the other at that base. Bases near the beginning of a sequencing read are generally more reliable than bases near its end, and the electropherograms can help you decide which read to trust. A mutation is only recorded if the forward and reverse reads agree with each other and disagree with the building block sequence.
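The rule for recording a mutation can be written as a short function. The Python sketch below assumes the three sequences come from a Clustal W alignment that has already been trimmed to the region covered by both reads. Under that assumption, a "-" inside the alignment is a genuine insertion or deletion rather than the end of a read. The trimming step and the function name are assumptions for illustration.

```python
def call_mutations(ref: str, fwd: str, rev: str):
    """Apply the consensus rule to three aligned, equal-length sequences.

    Assumes the alignment is trimmed to the region covered by both reads.
    Returns (mutations, discrepancies):
      mutations: (column, ref_base, read_base) where fwd and rev agree
                 with each other but differ from the building block sequence
      discrepancies: columns where fwd and rev disagree; resolve these by
                     checking the electropherograms
    Column numbers are 1-based.
    """
    if not (len(ref) == len(fwd) == len(rev)):
        raise ValueError("aligned sequences must have equal length")
    mutations, discrepancies = [], []
    for col, (r, f, v) in enumerate(zip(ref.upper(), fwd.upper(), rev.upper()), start=1):
        if f != v:
            discrepancies.append(col)
        elif f != r:
            mutations.append((col, r, f))
    return mutations, discrepancies


if __name__ == "__main__":
    ref = "ATGC-AAG"
    fwd = "ATTCTAAG"  # column 3 disagrees with rev; column 5 is an insertion
    rev = "ATGCTAAG"
    print(call_mutations(ref, fwd, rev))  # ([(5, '-', 'T')], [3])
```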


Calculating the error rate


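We would like to know the overall error rate for the creation of our building blocks, since this information will help us to determine the efficiency of our method and protocols. The error rate can be calculated as follows:

$\text{error rate} = \dfrac{\text{total \# of mutations found}}{\text{total \# of nucleotides sequenced that are not vector sequence}}$

For example, if you found 13 mutations in 6 clones of a 750 bp building block (BB), then the error rate is $13/(6 \times 750) \approx 0.0029$, or roughly one error every 350 bp.

Assuming that errors occur independently along the sequence (a Poisson model), the probability of a clone being perfect is $p_c = e^{-(\text{error rate}) \cdot L}$, where $L$ is the length of the building block in bp.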
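The two formulas for the error rate and the probability of a perfect clone can be checked with a few lines of Python. Like the formula, the sketch assumes that every clone was sequenced over the full building-block length. For the example of 13 mutations in 6 clones of a 750 bp BB, it gives an error rate of about 0.0029. The probability that any given 750 bp clone is error-free then comes out at roughly 11%.

```python
import math


def error_rate(mutations: int, clones: int, length_bp: int) -> float:
    """Mutations per sequenced, non-vector nucleotide."""
    return mutations / (clones * length_bp)


def p_perfect(rate: float, length_bp: int) -> float:
    """Probability that a clone of length_bp has zero errors (Poisson model)."""
    return math.exp(-rate * length_bp)


if __name__ == "__main__":
    rate = error_rate(13, 6, 750)
    print(f"error rate = {rate:.4f}")                        # 0.0029
    print(f"p(perfect clone) = {p_perfect(rate, 750):.2f}")  # 0.11
```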