Arking:JCAOligoTutorial6: Difference between revisions

Revision as of 16:38, 10 May 2007

Sequencing Analysis

So, you've made your basic or composite part, and you think you've found a colony that contains the right product. You now want to confirm that it's right. To do that, you need to sequence the DNA in your tube and find out exactly what it is. In general, sequencing is cheaper and better when outsourced to a company or core facility. So, you send out your sample, and they send you back a sequence file. In this part of the tutorial, we're going to go through how sequencing works and how to analyze the data they send you.
Before we start, you will need ApE (A Plasmid Editor). If you haven't already done so, download it from http://www.biology.utah.edu/jorgensen/wayned/ape/. Then replace the feature annotation database within ApE with an updated version. To do this, find the directory in which you installed ApE on your computer and locate the file "Default_Features.txt". On my computer, it's in: C:\Program Files\ApE\Accessory Files\Features. Replace this file with This Version. Additionally, download the program FinchTV from http://www.softpedia.com/get/Science-CAD/FinchTV.shtml so you can view chromatograms.

How sequencing works

What you really need to know

Sequencing starts with a sample of plasmid DNA or PCR product and one DNA oligonucleotide. This is what you send to the sequencing facility. They do something called "cycle sequencing" to your sample, and email you back some data. You can expect between 400bp and 1000bp of "true" data that begins somewhere around 20-50bp into the read. Where "good" and "bad" data start and end varies from sample to sample, and we'll get into that later. The sequence you read corresponds to the region 3' of where your oligo anneals. So, you must pick the appropriate oligo to send the sequencers based on what region of the plasmid you are interested in.
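To make the "3' of where your oligo anneals" rule concrete, here is a minimal Python sketch that predicts which stretch of a template a read should cover. The function name `expected_read`, the toy template, and the default `skip` and `length` values are assumptions chosen for illustration. It also assumes the oligo sequence appears on the strand you pass in and that the template is linear.

```python
def expected_read(template, oligo, skip=35, length=800):
    """Return the stretch of template that a read primed by oligo should cover.

    skip   -- bases right after the oligo that are usually unreadable (assumed value)
    length -- how many bases of good data to expect (assumed value)
    """
    template = template.lower()
    oligo = oligo.lower()
    site = template.find(oligo)
    if site == -1:
        raise ValueError("oligo sequence not found on this strand of the template")
    start = site + len(oligo)  # index of the first base 3' of the oligo
    return template[start + skip : start + skip + length]


# Toy template: 4 bases of filler, an 8-base oligo site, then the region of interest.
toy_template = "ccgg" + "gtatcacg" + "ataaaaaaaat" + "ggcc"
print(expected_read(toy_template, "gtatcacg", skip=0, length=11))  # ataaaaaaaat
```

With the defaults left in place, a region that starts fewer than `skip` bases past the oligo would simply be missing from the result, which is the practical reason to pick an oligo that anneals a little upstream of what you want to read.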

Overview of the process

(From http://www.biochem.arizona.edu/classes/bioc471/pages/Lecture21/AMG9.1a.gif)
So, the facility is going to run a reaction similar to a PCR on your sample and then take the products generated in that reaction and load them into an instrument. The reaction is going to contain many little fragments of your sequence, and the machine will separate them to single-base resolution using capillary electrophoresis. Capillary electrophoresis is basically like the agarose gels we run in lab, but the gel is in a long narrow tube. The instrument detects each little DNA fragment by fluorescence as it comes off the column, and the resulting fluorescence trace (the "chromatogram") can be interpreted by software as a string of A's, C's, T's and G's referred to as the "calls". They will send you both a text file of calls and a chromatogram file.

How the cycling reaction works

(From http://www.ejbiotechnology.info/content/vol1/issue1/full/3/bip/Fig1.gif) The cycling reaction starts by denaturing your sample plasmid and annealing the oligo to its complementary sequence. The reaction mix is essentially a PCR mix (dNTPs, thermostable DNA polymerase, buffer), so the polymerase starts adding bases to the 3' end of the oligo. However, there are two additional components: ddNTPs (dideoxynucleotides, in "A" of the figure) and dyes. The dye will make the synthesized products visible in the electrophoresis instrument. The ddNTPs are chain terminators. Because they lack a 3' hydroxyl group, whenever one of these gets incorporated into a growing DNA strand, synthesis cannot proceed further, resulting in a truncated product. For each cycling reaction, one of the 4 ddNTPs is added. So, in the reaction with ddATP, the chains get terminated at every A, and so on for all 4 reactions. The "cycling" aspect of this is that the process of denaturing, annealing, and extending is repeated. Because there is only one oligo, each cycle copies only the original plasmid template, so the amplification is linear (not exponential).
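Chain termination is easy to picture with a few lines of Python. This is only a toy sketch: the function name `terminated_lengths` and the sequence "GATTACA" are made up for illustration, and it ignores the fact that in a real reaction termination at any given position happens only some of the time.

```python
def terminated_lengths(extension, ddntp):
    """Lengths (bases added past the oligo) of products that end in the given ddNTP.

    extension -- the sequence the polymerase would synthesize 3' of the oligo
    ddntp     -- which terminator ("A", "C", "G" or "T") is in this reaction
    """
    ddntp = ddntp.upper()
    return [i + 1 for i, base in enumerate(extension.upper()) if base == ddntp]


extension = "GATTACA"  # made-up sequence, not from a real plasmid
for dd in "ACGT":
    print(dd, terminated_lengths(extension, dd))
# A [2, 5, 7]
# C [6]
# G [1]
# T [3, 4]
```

Every length from 1 to 7 shows up in exactly one of the four reactions, which is what makes the sequence readable later.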

If we were to load these cycling reactions on a normal gel like we have in lab, you'd see something like the gel at left (from http://www.cambio.co.uk/images/html_images/sequitherm_cycle1.gif). In this gel, you can see that at every vertical position there is exactly one band, in one of the four lanes. You can therefore read off what base was present at each position.
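Reading the gel amounts to sorting the bands by size and noting which lane each one is in. Here is a small Python sketch of that idea. The name `read_ladder` and the band lengths are invented for illustration. Shorter fragments run further, so sorting by length corresponds to reading from the bottom of the gel upward.

```python
def read_ladder(lanes):
    """Reconstruct a sequence from four lanes of band sizes.

    lanes -- dict mapping each base to the list of fragment lengths seen in its lane
    """
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


# Made-up band sizes, one lane per ddNTP reaction.
lanes = {"A": [2, 5, 7], "C": [6], "G": [1], "T": [3, 4]}
print(read_ladder(lanes))  # GATTACA
```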

In practice, since the sample is run by capillary electrophoresis, you end up with a chromatogram plotting fluorescence intensity versus time like this:

(From http://site.hylabs.co.il/upload/infocenter/info_images/08062004105243@Sequencing-quality-.gif) That is an example of a chromatogram, the raw data from sequencing, which you receive with every sample. The "calling" of the bases is just an algorithm's best interpretation of what each peak of that trace corresponds to. Usually the calls are pretty accurate, but occasionally the quality of the data is poor and the calls are wrong. The remainder of this tutorial deals with how you interpret the sequencing results.
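To get a feel for why calls can go wrong, here is a deliberately simplified base caller in Python. It is not the algorithm that real base-calling software uses. The function name `call_bases`, the `min_ratio` threshold, and the intensity numbers are all assumptions for illustration. It picks the strongest of the four signals at each peak and writes "N" when the runner-up is too close to trust the call.

```python
def call_bases(peaks, min_ratio=2.0):
    """Call one base per peak from four fluorescence intensities.

    peaks     -- list of dicts mapping "A", "C", "G", "T" to an intensity
    min_ratio -- how much stronger the best signal must be than the second best
                 (assumed threshold, chosen only for this toy example)
    """
    calls = []
    for signal in peaks:
        ranked = sorted(signal.items(), key=lambda item: item[1], reverse=True)
        best_base, best = ranked[0]
        _, second = ranked[1]
        if second > 0 and best / second < min_ratio:
            calls.append("N")  # two overlapping peaks: ambiguous call
        else:
            calls.append(best_base)
    return "".join(calls)


peaks = [
    {"A": 900, "C": 30, "G": 20, "T": 40},   # clean A peak
    {"A": 50, "C": 10, "G": 700, "T": 60},   # clean G peak
    {"A": 400, "C": 20, "G": 350, "T": 30},  # A and G overlap
]
print(call_bases(peaks))  # AGN
```

The third peak is the kind of spot where it pays to look at the chromatogram yourself rather than trusting the text file.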

Interpreting a sequencing result

Go ahead and download the following 2 files:

 File:Jca387 ca998 2007-03-10 D02 005.txt (The calls)
 File:Jca387 ca998 2007-03-10 D02 005.ab1 (The chromatogram)

Open up jca387_ca998_2007-03-10_D02_005.txt in Notepad, select all the text, and paste it into a window of ApE. Hit ctrl-K. This will search through the feature database and light up any features present in the sequence. Run your cursor over some of the colored text and take a look at what's in there. You've seen this plasmid before: it's pBca9145-Bca1089, the Biobricks version 2.0 RFP basic part. You should see RFP and the 4 Biobrick 2.0 restriction sites: EcoRI, BglII, BamHI, and XhoI.
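If you want to double-check the four restriction sites outside of ApE, a plain motif search will do. The Python sketch below is not how ApE's feature database works. It just looks for the standard recognition sequences of the four enzymes. The function name `find_sites` and the test sequence are made up for illustration. All four sites are palindromic, so searching one strand is enough.

```python
# Standard recognition sequences for the four enzymes named above.
SITES = {
    "EcoRI": "GAATTC",
    "BglII": "AGATCT",
    "BamHI": "GGATCC",
    "XhoI": "CTCGAG",
}


def find_sites(seq):
    """Return a dict of enzyme name -> list of 1-based positions where its site starts."""
    seq = seq.upper()
    hits = {}
    for name, site in SITES.items():
        positions = []
        i = seq.find(site)
        while i != -1:
            positions.append(i + 1)  # report 1-based positions
            i = seq.find(site, i + 1)
        hits[name] = positions
    return hits


# Made-up test sequence containing each site once.
toy = "ttGAATTCaAGATCTccGGATCCgCTCGAGaa"
print(find_sites(toy))
# {'EcoRI': [3], 'BglII': [10], 'BamHI': [18], 'XhoI': [25]}
```

To try it on the real read, paste the contents of the .txt file in place of `toy`.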

Let's now compare this read to the sequence file you downloaded for the previous tutorials. If you can't find the file, here's the link. Open up JCASeq_pBca9145-Bca1089.str in a second window of ApE. Highlight all the sequence in the sequencing read, copy it, and search for that string of text in pBca9145-Bca1089.

Uh oh...what happened? (You should have gotten an error saying "No sequence found".) Does this mean the plasmid is wrong? Um, no, not at all. In fact, this is par for the course. This is about as good as a read gets, and we know this plasmid is perfectly fine. So, what's up? Well, go ahead and launch the ab1 file into FinchTV and let's look at the raw data.

First of all, the read begins directly 3' to the spot where the oligo anneals. In this case, the oligo was ca998 (gtatcacgaggcagaatttcag), so the first few bases should have been "ataaaaaaaat". Clearly, though, the first 35 bases of this read are total garbage. That's normal. An important take-home point from this is that if your oligo anneals closer than 50bp to the sequence you need to read, you're probably not going to get the data you want. From around 35bp in to around 800bp, this read looks really nice. Go ahead and select bases 35 to 800 of the sequence file in ApE and see if they match pBca9145-Bca1089.
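The same trim-and-search step can be sketched in Python. The function name `locate_trimmed_read` and the toy sequences are assumptions for illustration. It keeps bases `first` through `last` of the read (counted from 1, like the positions above) and looks for an exact match in the reference, which it treats as circular because it's a plasmid. An exact search like this fails on a single wrong call, and a read from an oligo on the other strand would have to be reverse-complemented first, which this sketch doesn't do.

```python
def locate_trimmed_read(read, reference, first=35, last=800):
    """Find the trimmed read in a circular reference.

    Returns the 0-based index in reference where the match starts, or -1 if none.
    """
    core = read[first - 1 : last].lower()  # bases first..last, 1-based inclusive
    if not core:
        raise ValueError("nothing left of the read after trimming")
    doubled = (reference + reference).lower()  # lets a match run across the origin
    hit = doubled.find(core)
    return -1 if hit == -1 else hit % len(reference)


# Toy data: 4 junk bases, 10 good bases that wrap around the origin, 2 junk bases.
toy_reference = "aaccggttacgtacgt"
toy_read = "nnnn" + "gtacgtaacc" + "nn"
print(locate_trimmed_read(toy_read, toy_reference, first=5, last=14))  # 10
print(locate_trimmed_read(toy_read, toy_reference, first=1, last=14))  # -1
```

The second call includes the junk at the start of the read and finds nothing, which is the same thing that happened when you searched with the whole untrimmed read.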

If you have any comments or want to report a potential error in the tutorial, please email me (Chris Anderson) at JCAnderson2167-at-gmail.com
