20.109(S15):AIV detection assay and analysis (Day7): Difference between revisions

From OpenWetWare
 
(37 intermediate revisions by 2 users not shown)
Line 33: Line 33:
#*Label 8 microcentrifuge tubes according to the samples you will prepare for RRT-PCR.
#*Label 8 microcentrifuge tubes according to the samples you will prepare for RRT-PCR.
#*Obtain the cDNA aliquots from the front lab bench (sample #1, #2, #3).
#*Obtain the cDNA aliquots from the front lab bench (sample #1, #2, #3).
#*Each reaction should contain 4 μL of cDNA, 2 μL of your diluted primer solution, and water for a total volume of 12.5 μL.  For each sample/primer set combination, prepare enough for 2.5 reactions.
#*Each reaction should contain 4 μL of cDNA (at 6.25 ng/μL), 2 μL of your diluted primer solution, and water for a total volume of 12.5 μL.  For each sample/primer set combination, prepare enough for 2.5 reactions.
#*At the front bench, add the appropriate volume of 2X Power SYBR such that the final concentration in your tubes is 1X.  Ask the teaching faculty if you are unsure of how much to add.  
#Add your reactions to the plate on the front bench according to the map.  You will add 12.5 μL to each well.
#Add your reactions to the plate on the front bench according to the map.  You will add 25 μL to each well.
#Once everyone has added their reactions, we will add 12.5 μL of the 2X SYBR Green master mix to each well.


For your reference, the RRT-PCR conditions will be those shown below:
For your reference, the PCR cycling conditions will be those shown below:


<center>
<center>
Line 63: Line 63:
Next time you will be asked to quickly analyze ''a lot'' of sequencing data. Today you'll spend a good portion of the class practicing this analysis so that you can complete it during the class period on Thursday or Friday. '''This is very important so that everyone can have access to all of the 16S rRNA sequencing data for their final report.''' Please try to work through this entire protocol today so that you are familiar with the steps you'll complete on a larger scale next time.
Next time you will be asked to quickly analyze ''a lot'' of sequencing data. Today you'll spend a good portion of the class practicing this analysis so that you can complete it during the class period on Thursday or Friday. '''This is very important so that everyone can have access to all of the 16S rRNA sequencing data for their final report.''' Please try to work through this entire protocol today so that you are familiar with the steps you'll complete on a larger scale next time.


Note: much of what is written below will also appear in the protocol for M1D8. For today's practice analysis we will use data obtained in S14 for three birds that are being repeated this year. Those birds are numbers 312, 290, and 274. Remember that you can find the characteristics of these birds on the M1D1 Talk page.
Note: For today's practice analysis we will use data obtained in S14 for three birds that are being repeated this year. Those birds are numbers 312, 290, and 274. Remember that you can find the characteristics of these birds on the M1D1 Talk page.


====Part A: Understand possible insert orientations within vector====
====Part A: Understand possible insert orientations within vector====
Line 77: Line 77:


#The data from Genewiz is available at the company website, [http://genewiz.com linked here].  
#The data from Genewiz is available at the company website, [http://genewiz.com linked here].  
#Choose the "Login" link and then use "astachow@mit.edu" and "be20109" to log in.  
#Choose the "Login" link and then use "astachow@mit.edu" and "be20109" to log in.
#At the bottom right should be a section called ''Recent Results''. Click on ''More'' to expand it, and then click the link labeled with the tracking number 10-256413247.
#*<font color = red> On M1D8 you will use "nllyell@mit.edu" and "be20109" to get your sequencing data.</font color>
#*This order was placed on 3/3/2014.
#*<font color = red> On M1D8 you will also need the information contained within [[Media:AllBirdsS2015.xlsx| '''this spreadsheet''']]. First, start with the sequencing data that indicates it is a good read. Analyze 1/2 of those samples, keeping track of which ones you completed. As you proceed, cross off the samples that you have analyzed on the spreadsheet at the front of the lab. '''Aim to analyze a total of 9 sequences --- if you finish all of the sequences for your bird, be a good colleague and help out your peers.'''</font color>
#Find the sample named ''xxxxx'' (completed using the T7 primer) and its corresponding sample ''xxxx'' (completed using the M13 primer).
#At the bottom right should be a section called ''Recent Results''. Click on ''More'' to expand it, and then search for sequencing completed between 2/2/14 and 3/2/14. Click the link labeled with the tracking number 10-256046007.
#Find the sample named ''Pnk-1-M13-F'' (completed using the M13 primer) and its corresponding sample ''Pnk-1-T7-R'' (completed using the T7 primer).  
#The quickest way to start working with a particular sequence is to follow the "View" link under the ''Seq File'' heading. For ambiguous data, you may want to look directly at the ''Trace File'' as well.
#The quickest way to start working with a particular sequence is to follow the "View" link under the ''Seq File'' heading. For ambiguous data, you may want to look directly at the ''Trace File'' as well.


Line 87: Line 88:
#Begin by downloading [[Media:PCRBlunt_20109.ape| '''this file''']], which contains the DNA sequence of the vector we are using in GenBank format. Open the file in ApE (''A plasmid Editor'', created by M. Wayne Davis at the University of Utah), which is found on your desktop. Three items of interest are highlighted: the forward priming site, the reverse priming site and the two basepairs between which your sequence should be inserted.
#Begin by downloading [[Media:PCRBlunt_20109.ape| '''this file''']], which contains the DNA sequence of the vector we are using in GenBank format. Open the file in ApE (''A plasmid Editor'', created by M. Wayne Davis at the University of Utah), which is found on your desktop. Three items of interest are highlighted: the forward priming site, the reverse priming site and the two basepairs between which your sequence should be inserted.
#*If you are using your own computer, you may download ApE for [http://biologylabs.utah.edu/jorgensen/wayned/ape/ free].   
#*If you are using your own computer, you may download ApE for [http://biologylabs.utah.edu/jorgensen/wayned/ape/ free].   
#Paste the forward sequence of the bird 312 reaction into a new ApE file. Locate where the vector ends and the insert begins; trim away the vector.
#Paste the forward sequence of the Pnk-1-M13-F reaction into a new ApE file. Locate where the vector ends and the insert begins; trim away the vector.
#*It may be easiest to find the insert by doing ''Edit'' &rarr; ''Find'' (or Apple-F) using the base pairs right before the insert should begin.
#*It may be easiest to find the insert by doing ''Edit'' &rarr; ''Find'' (or Apple-F) using the base pairs right before the insert should begin.
#Paste the reverse sequence of your first candidate into yet another ApE file. Immediately use ''Edit'' &rarr; ''Reverse Complement'' to adjust the sequence, and again trim away the vector.
#Paste the reverse sequence (Pnk-1-T7-R) into yet another ApE file. Immediately use ''Edit'' &rarr; ''Reverse Complement'' to adjust the sequence, and again trim away the vector.
#*''Why is it more convenient to work with the reverse complement when sequencing from the reverse direction?''
#*''Why is it more convenient to work with the reverse complement when sequencing from the reverse direction?''
#In ApE, use ''Tools'' &rarr; ''Align Sequence'' to find where the forward and reverse sequences overlap. Combine them into one sequence with no repeated parts; where both forward and reverse sequence have coverage of the gene, choose whatever combination has the fewest unknown based, or Ns (ideally none!).  
#In ApE, use ''Tools'' &rarr; ''Align Sequence'' to find where the forward and reverse sequences overlap. Combine them into one sequence with no repeated parts; where both forward and reverse sequence have coverage of the gene, choose whatever combination has the fewest unknown bases, or Ns (ideally none!).  
#*You may find it easiest to print out the alignment and mark up the hardcopy in order to choose where to switch from using forward to using reverse sequence. Let the base-pair numbers be your guides.
#*You may find it easiest to print out the alignment and mark up the hardcopy in order to choose where to switch from using forward to using reverse sequence. Let the base-pair numbers be your guides.
#*If you haven't already, you can map our printer by adding sleaterkinney.mit.edu to your printer preferences.
#*If you haven't already, you can map our printer by adding sleaterkinney.mit.edu to your printer preferences.
Line 98: Line 99:
#Finally, depending on the orientation of your insert, you may want to reverse complement the entire sequence. Use the original sequences of the forward and reverse 16S primers to guide your decision.  
#Finally, depending on the orientation of your insert, you may want to reverse complement the entire sequence. Use the original sequences of the forward and reverse 16S primers to guide your decision.  
#*<font color=FF33FF>'''It is important for subsequent alignment that all sequences are 5' to 3' (begin with AGA).'''</font color>
#*<font color=FF33FF>'''It is important for subsequent alignment that all sequences are 5' to 3' (begin with AGA).'''</font color>
#You must now save each sequence in .txt format. Please copy-paste the sequence into a program such as TextEdit, choose ''File'' &rarr; ''Save'', and in the pulldown menu select ''Plain Text''.
#You must now save the sequence in .txt format. Please copy-paste the sequence into a program such as TextEdit, choose ''File'' &rarr; ''Save'', and in the pulldown menu select ''Plain Text''.


====Part D: Identify species from sequences====
====Part D: Identify species from sequences====
Line 112: Line 113:
# When a particular clone is very closely matched to two different species, you might choose to define it at a higher order, such as genus or family. When a particular clone is not well-matched to any known species (perhaps representing an unidentified or undocumented species), you might also choose to define it at a higher order when submitting this information in the phylogenetics program.
# When a particular clone is very closely matched to two different species, you might choose to define it at a higher order, such as genus or family. When a particular clone is not well-matched to any known species (perhaps representing an unidentified or undocumented species), you might also choose to define it at a higher order when submitting this information in the phylogenetics program.
# Be sure to rename the Excel file according to your section day, team color, and sample ID number.  
# Be sure to rename the Excel file according to your section day, team color, and sample ID number.  
# Please post all of your .txt files (up to 8 per person) and also your Excel file to the table on today's Talk page when you have finished.
# Please post all of your .txt files (up to 8 per person) and also your Excel file to the table on the M1D8 Talk page when you have finished.
#*<font color = red>'''This MUST be completed by the end of class on M1D8, so make sure you feel comfortable with this exercise before leaving class today.'''</font color>
#*<font color = blue> At this step it is convenient to rename your .txt files to reflect the species of bacteria that you found. Please follow this example: "Klebsiella.oxytoca.TRBlu.Bird#.4." '''You MUST use this naming convention so that the class's Unifrac analysis will work later.'''
#*Please use the following 3-letter abbreviations for your colors: Red, Org, Ylw, Grn, Blu, Pnk, Prp, Sil, Wht.
</font color>
#'''Complete this step with all of your sequences (from one bird) before continuing to MEGA.''' If you are analyzing more than one bird, you'll need to do a MEGA alignment for each bird.
 
====Part E: Align sequences and construct tree====
 
For this next part you will use freely available software called Molecular Evolutionary Genetics analysis, or MEGA. Feel free to read additional information about this software at the [http://www.megasoftware.net MEGA website]. What you need should already be downloaded on your laboratory computers, or you can download onto your personal computers if you wish.
*For today you will use .txt files that were generated by your 20.109 peers during Spring 2014. You can find all of the .txt files [https://www.dropbox.com/sh/t7dm0752e5r21k5/AADjAxgDxj5xx0rvj-CHMs-ja?dl=0 at this link]. Download all of these .txt files to a folder somewhere on your computer. Next time you will use the .txt files that you generate from your own sequencing data to build phylogenetic trees. After the classwide data is available you will also align all of the data to facilitate Unifrac analysis.
#Open MEGA. In the upper left corner, click on the icon labeled ''Align'', and choose ''Edit/Build Alignment'' from the pulldown menu. This selection should open the Alignment Explorer. When you are prompted, choose "DNA" alignment of course.
#Under ''Edit'', choose ''Insert Sequence from File'' and select all of the .txt files. They should now appear in the explorer.
#When you have input all the sequences, choose ''Edit'' &rarr; ''Select All'', followed by ''Alignment'' &rarr; ''Align by Clustal-W''.
#Now choose ''Data'' &rarr; ''Save Session'' and name the alignment with your group, day, and bird name. 
#Under ''Data'', choose ''Phylogenetic Analysis''.
#*''When prompted, should you answer that the DNA is protein-coding or not protein-coding?''
#Now leave Alignment Explorer and go back to the original MEGA window.
#From the ''Phylogeny'' icon pulldown menu, select ''Construct/Test Neighbor-Joining Tree''. To proceed, click on ''Compute''.
#Finally, choose ''File'' &rarr; ''Export current tree (Newick)'' and then choose ''Image'' &rarr; ''Save as PDF file'' to document your tree. Save the tree using your group, day, and bird name as before.
#Compare your tree to [[Media:S14_data2.pdf |this tree]] to make sure you've correctly followed the protocol thus far. (This step was specific for M1D7.)
 
====Part F: Determine if there are differences between birds using Fast Unifrac====
 
Now for the quantitative analysis! This is the fun part, but because we use freeware, the process of analyzing the data can sometimes seem a bit tedious. Hang in there: the information you gain from completing the Unifrac analysis will allow you to finally determine if geography and sex contribute to differences in your bird's microbiome.
 
*We will use an online Unifrac calculator to compare birds 312, 290, and 274 using the S14 data today. For your final report, you'll want to use this tool to compare birds to determine if sample collection location (geography) or sex contribute to differences in the microbiome. Therefore, you may want to use ''all'' the data generated by the class once everyone has contributed their .txt and alignment (.mas) files.
 
#Go to the [http://unifrac.colorado.edu/ University of Colorado] Fast Unifrac website.
#The first thing you'll want to do is create a user account. Go to the ''User'' pulldown menu in the upper right-hand corner of the screen. Register for an account and then make sure you are logged in. The Galaxy site will save your workspace so you can come back to your analysis at any time.
#In order to complete the Fast Unifrac analysis, you will need three data files:
#*Your tree file that was exported in Newick format. (S14_data.nwk)
#*A text file that contains a list of your sequence names and the sample that they came from. We'll call this the ID file.
#*A text file that contains the characteristics that you are evaluating -- a category mapping file. For today all of the bird samples were collected at Carson Beach, but there are two females and one male bird.
#Let's create your ID file:
#*Go back to MEGA. In the main MEGA window, double click on the sequence alignment icon -- this is the icon with the T...A... button.
#*Once your sequence alignment opens, click on the third button from the left, which has an arrow pointing at XL. This will export your sequence alignment to an Excel file. Name it whatever you'd like and keep track of where it is being saved.
#*Open that Excel file.
#*Copy the names of your sequences (from column A) and paste them into a new Excel file. '''MEGA automatically puts a space in each strain name -- please put an underscore in place of that space.''' Now you can close the Excel file with your sequencing data.
#**<font color = red>Update after T/R section: Please make '''two''' ID files, one with spaces and one with underscores. See below for more information.</font color>
#*In column B of the new Excel file, you want to indicate what bird sample each sequence came from.
#*Now save your Excel file as a tab-delimited .txt file. Name it S14_IDfile.txt. It should look like [[Media:S14_IDfile3.txt | this file]] when you open it.
#The category mapping file lists the names of your samples, their characteristics, and a Description of the bird. Download this [[Media:S14_CatMap.txt |category mapping file]]. '''Make sure you understand how to create this file before you leave today.'''
#Upload your data by clicking on the ''Get Data'' --> ''Upload data'' links on the left hand side of the Galaxy homepage.
#*Upload your S14_data.nwk file, your ID file, and the S14_CatMap.txt file that you just downloaded. Note, you'll have to go back to ''Upload data'' between each upload. Everything is looking good if you can see green boxes on the right side of your screen.
#Now click on ''FastUniFrac'' on the left side of the screen.
#You should work through the ''Cluster Samples'', the ''Jackknife Cluster Samples'', the ''Sample Distance Matrix'', and ''PCoA'' activities using the ID file with ''spaces''. The remaining tests, ''P test significance'' and ''Unifrac significance'', require the ID file with ''underscores''. Why? Who knows. Make sure you read the text associated with each test so that you understand what each test is telling you. <font color = red> You are not required to report the results of every test -- only use those that help you to answer your research question or that you find most interesting. </font color>
#*When you run the ''Jackknife Cluster Samples'' make sure you decrease the number of samples you keep to 5.
#**Think about what is happening with this test. What do the different colors mean?
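
The ID-file step in Part F (sequence names in the first column, the bird sample in the second, saved as tab-delimited text, once with spaces and once with underscores) could also be done with a short script instead of Excel. The following is only a sketch: the function name <code>write_id_file</code> and the example sequence names are hypothetical, and the two-column layout is taken from the description above rather than from the Fast Unifrac documentation.

```python
def write_id_file(path, names_to_sample, use_underscores):
    """Write a tab-delimited ID file: sequence name, then bird sample.

    names_to_sample is a list of (sequence_name, sample) pairs.
    When use_underscores is True, spaces in each name become underscores.
    """
    with open(path, "w", newline="") as handle:
        for name, sample in names_to_sample:
            if use_underscores:
                name = name.replace(" ", "_")
            handle.write(f"{name}\t{sample}\n")

# Hypothetical sequence names; use the names exported from your own alignment.
rows = [("Genus species 1", "312"), ("Other name 2", "290")]
# write_id_file("S14_IDfile_spaces.txt", rows, use_underscores=False)
# write_id_file("S14_IDfile_underscores.txt", rows, use_underscores=True)
```

Writing both versions from the same list keeps the two ID files consistent with each other, which matters because different tests expect different versions.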


===Part 3: AIV screening analysis===
===Part 3: AIV screening analysis===
Line 127: Line 174:
#*Record the C<sub>T</sub> values for the samples you tested with the Runstadler lab primers.  Are the values similar for the three samples?  Can you hypothesize as to which sample(s) were positive and/or negative for the presence of AIV?
#*Record the C<sub>T</sub> values for the samples you tested with the Runstadler lab primers.  Are the values similar for the three samples?  Can you hypothesize as to which sample(s) were positive and/or negative for the presence of AIV?
#*Now, compare the C<sub>T</sub> values for your primers and the Runstadler lab primers.  Can you make any conclusions concerning the relative sensitivities?
#*Now, compare the C<sub>T</sub> values for your primers and the Runstadler lab primers.  Can you make any conclusions concerning the relative sensitivities?
#To report the results of your AIV detection assay, you will calculate


==Homework==
==Homework==

Latest revision as of 11:37, 5 March 2015


20.109(S15): Laboratory Fundamentals of Biological Engineering


Introduction

Real-time reverse transcription-polymerase chain reaction (RRT-PCR) allows researchers to monitor the results of PCR as amplification is occurring (this technique is also referred to as quantitative or real-time PCR). During RRT-PCR data are collected throughout the amplification process using a fluorescent dye. The fluorescent dye is highly specific for double-stranded DNA and when bound to DNA molecules the fluorescence intensity increases proportionate to the increase in double-stranded product. In contrast, the data for traditional PCR are simply observed as a band on a gel (remember back to M1D5).

The fluorescent dye binds to double-stranded DNA during the cycles of PCR. At the annealing temperature the primer (blue arrow) binds to the template (black line). During an incubation at the extension temperature the new copy of DNA (orange dashed arrow) is synthesized by the polymerase enzyme. The inactive fluorescent dye molecules present in the reaction (grey stars) bind to the newly generated double-stranded DNA and become activated (green stars). To eliminate clutter, the basepairs between the DNA strands were omitted. An animation of this process is linked here.

For the purpose of your AIV screening assay, your RRT-PCR data will be used to test the sensitivity of the primers you designed on Day 2 of this module. Primers with high sensitivity can better detect AIV from flu-positive bird samples. Why are primers with high sensitivity better? The sensitivity of the primer determines its ability to identify and bind the target sequence. If primer binding is enhanced then amplification is improved and the presence of AIV is more likely to be detected. In addition, primers with lower sensitivity can give false negatives and hinder the development of models that aid researchers in understanding AIV abundance and epizootiology in wild birds.

Example amplification curve. These data were collected by Sp14 20.109ers!
Example melt curve. These data were collected by Sp14 20.109ers!

To compare the sensitivity of your primers to the sensitivity of those currently used in the Runstadler lab you will examine the CT values from your RRT-PCR assay. The CT values are displayed as an amplification curve following RRT-PCR (these values are also given numerically). The initial cycles measure very little fluorescence due to low amounts of double-stranded DNA and are used to establish the inherent background fluorescence. As double-stranded product is produced, fluorescence is measured and the curve appears linear. This linear portion of the curve represents the exponential phase of PCR. Throughout the exponential phase, the curve should be smooth. Sharp points may be due to errors in reaction preparation or failures in the machine used to measure fluorescence. The CT, or threshold cycle, is the first cycle in which the fluorescence measurement is above background. During the later cycles the curve shows minimal increases in fluorescence due to the depletion of reagents.
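
As a rough illustration of how a CT value can be read off an amplification curve, the sketch below sets a threshold from the early-cycle background and reports the first later cycle that exceeds it. The fluorescence readings are made-up numbers, and both the baseline window (the first 5 cycles) and the threshold rule (mean plus 10 standard deviations of the baseline) are assumptions chosen for illustration, not the settings used by the instrument software.

```python
from statistics import mean, stdev

def threshold_cycle(fluorescence, baseline_cycles=5, n_sd=10.0):
    """Return the first cycle (1-based) whose signal exceeds background.

    Background is estimated from the first `baseline_cycles` readings; the
    threshold is their mean plus n_sd times their standard deviation.
    Returns None if the signal never crosses the threshold.
    """
    baseline = fluorescence[:baseline_cycles]
    threshold = mean(baseline) + n_sd * stdev(baseline)
    for cycle, signal in enumerate(fluorescence, start=1):
        if cycle > baseline_cycles and signal > threshold:
            return cycle
    return None

# Hypothetical readings for cycles 1-12 (arbitrary fluorescence units).
example_curve = [0.10, 0.11, 0.10, 0.12, 0.11, 0.12,
                 0.15, 0.24, 0.45, 0.90, 1.80, 3.50]
example_ct = threshold_cycle(example_curve)
```

Under this rule a sample with more starting template crosses the threshold earlier and so has a lower CT, which is why CT values can be compared between primer sets tested on the same samples.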

Following the RRT-PCR amplification measurements, a melt curve is completed. Melt curves assess the dissociation of double-stranded DNA while the sample is heated. As the temperature is increased, double-stranded DNA ‘melts’ as the strands dissociate. As discussed above, the fluorescent dye used in RRT-PCR associates with double-stranded DNA and fluorescence measurements will decrease as the temperature increases. In RRT-PCR, the melt curve is used to confirm that a single amplification product was generated during the reaction. If additional products were present, the melt curve would presumably show additional peaks. Why might this be true? Can you think of a scenario where two different products would produce a single peak in a melt curve?
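
Melt curves are commonly viewed as the negative rate of change of fluorescence with temperature, so that each product shows up as a peak at its melting temperature. The sketch below approximates that derivative between adjacent readings and reports local maxima. The temperatures and fluorescence values are hypothetical, and real instrument software typically smooths the data first, which this sketch does not do.

```python
def melt_peaks(temps, fluorescence):
    """Return temperatures where -dF/dT has a local maximum.

    The derivative is approximated between adjacent readings and assigned
    to the midpoint temperature of each interval.
    """
    mids, neg_slopes = [], []
    for i in range(len(temps) - 1):
        d_temp = temps[i + 1] - temps[i]
        d_fluor = fluorescence[i + 1] - fluorescence[i]
        mids.append((temps[i] + temps[i + 1]) / 2)
        neg_slopes.append(-d_fluor / d_temp)
    peaks = []
    for i in range(1, len(neg_slopes) - 1):
        if neg_slopes[i] > neg_slopes[i - 1] and neg_slopes[i] > neg_slopes[i + 1]:
            peaks.append(mids[i])
    return peaks

# Hypothetical melt data with one sharp drop between 82 and 84 degrees C.
temps = [76, 78, 80, 82, 84, 86, 88]
signal = [10.0, 9.8, 9.4, 7.5, 2.0, 1.6, 1.5]
single_product_peaks = melt_peaks(temps, signal)
```

A reaction with one product should give a single entry in the returned list; more than one entry would suggest additional products such as primer dimers.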


Protocols

Part 1: Prepare primers and set up screen for AIV in bird samples

  1. Calculate the amount of water needed for each primer (forward and reverse, separately) to give a concentration of 100 μM.
  2. Touch-spin your primers, resuspend each in the appropriate volume of sterile water, vortex, and touch-spin again.
  3. Now prepare a dilution from your archival stock. Prepare 100 μL of a solution that has each primer present at 20 μM.
    • Try the calculation on your own first. If you get stuck ask the teaching faculty for help.
    • Be sure to change tips between primers!
  4. Return the rest of your primers, plus your primer specification sheets, up front.
  5. With your primers you will screen three bird samples for AIV. You will also use the Avian Influenza A Matrix Forward and Reverse primers to screen the samples. Lastly, you will set up 'no template control' reactions with your primer set and the Runstadler Lab primer set.
    • Label 8 microcentrifuge tubes according to the samples you will prepare for RRT-PCR.
    • Obtain the cDNA aliquots from the front lab bench (sample #1, #2, #3).
    • Each reaction should contain 4 μL of cDNA (at 6.25 ng/μL), 2 μL of your diluted primer solution, and water for a total volume of 12.5 μL. For each sample/primer set combination, prepare enough for 2.5 reactions.
  6. Add your reactions to the plate on the front bench according to the map. You will add 12.5 μL to each well.
  7. Once everyone has added their reactions, we will add 12.5 μL of the 2X SYBR Green master mix to each well.
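
If you want to check your arithmetic for steps 1, 3, and 5 after trying the calculations by hand, the relationships can be sketched as below. The function names are hypothetical, the 25.0 nmol yield is a placeholder (use the amount printed on your primer specification sheet), and the dilution assumes both primers are combined in one 100 μL tube.

```python
def resuspension_volume_ul(nmol, target_um=100.0):
    """Water (uL) needed to dissolve `nmol` of primer at `target_um` (uM).

    1 uM equals 1 pmol/uL, so volume (uL) = pmol / uM = nmol * 1000 / uM.
    """
    return nmol * 1000.0 / target_um

def dilution_volumes_ul(stock_um=100.0, final_um=20.0, final_ul=100.0, n_primers=2):
    """Return (uL of each primer stock, uL of water), using C1*V1 = C2*V2."""
    each_primer = final_um * final_ul / stock_um
    water = final_ul - n_primers * each_primer
    return each_primer, water

def mix_for_reactions(n_reactions=2.5, cdna_ul=4.0, primer_ul=2.0, total_ul=12.5):
    """Scale the per-reaction recipe from the protocol to `n_reactions`."""
    water_ul = total_ul - cdna_ul - primer_ul
    return {
        "cDNA": cdna_ul * n_reactions,
        "primers": primer_ul * n_reactions,
        "water": water_ul * n_reactions,
        "total": total_ul * n_reactions,
    }

# Placeholder yield of 25.0 nmol; read the real value off your specification sheet.
water_for_stock = resuspension_volume_ul(25.0)
primer_each, water_for_dilution = dilution_volumes_ul()
mix = mix_for_reactions()
```

Preparing 2.5 reactions' worth of each mix leaves a small excess, so that two full 12.5 μL aliquots can still be pipetted despite small losses.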

For your reference, the PCR cycling conditions will be those shown below:

Stage   Cycles   Details
1       1        95°C for 10 min
2       40       95°C for 15 sec
                 60°C for 30 sec
                 72°C for 30 sec
3       1        4°C hold

Part 2: Bird microbiome practice analysis

Next time you will be asked to quickly analyze a lot of sequencing data. Today you'll spend a good portion of the class practicing this analysis so that you can complete it during the class period on Thursday or Friday. This is very important so that everyone can have access to all of the 16S rRNA sequencing data for their final report. Please try to work through this entire protocol today so that you are familiar with the steps you'll complete on a larger scale next time.

Note: For today's practice analysis we will use data obtained in S14 for three birds that are being repeated this year. Those birds are numbers 312, 290, and 274. Remember that you can find the characteristics of these birds on the M1D1 Talk page.

Part A: Understand possible insert orientations within vector

  1. Recall from Day 1 the sequences of the forward and reverse primers used to broadly amplify bacterial 16S rRNA gene segments:
    • Forward: 5' AGAGTTTGATCCTGGCTCAG
    • Reverse: 5' ACGGGCGGTGTGTACA
  2. Based on these sequences, you might expect that your insert will always begin with "AGA" and always end with "CGT." (Draw a picture to make sure you understand why the last three bases are as they are written here.)
  3. However, in blunt-end cloning, the insert – here our PCR product – can face in either orientation. Take a moment to figure out what other basepairs you might expect to see at the beginning or end of your sequenced insert.
    • The kind of cloning we are doing is called non-directional cloning. Directional cloning is possible when, for example, two different restriction enzymes are used to create overhangs that are complementary to the vector but not to each other.
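
Once you have drawn the picture for yourself, you can check your reasoning about the two possible orientations with a few lines of code. This is a minimal sketch: the run of Ns is only a stand-in for the amplified 16S region between the primers, and the helper names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

FORWARD_PRIMER = "AGAGTTTGATCCTGGCTCAG"
REVERSE_PRIMER = "ACGGGCGGTGTGTACA"

def insert_ends(middle="NNNNNNNNNN"):
    """Return (first 3, last 3) bases of the insert in both orientations.

    `middle` is a placeholder for the sequence between the two primers.
    """
    # The reverse primer anneals to the top strand, so its reverse
    # complement is what appears at the 3' end of the top strand.
    insert = FORWARD_PRIMER + middle + reverse_complement(REVERSE_PRIMER)
    flipped = reverse_complement(insert)
    return (insert[:3], insert[-3:]), (flipped[:3], flipped[-3:])

as_written, flipped = insert_ends()
```

Printing `as_written` and `flipped` shows the three-base ends to look for when deciding which way an insert sits in the vector.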

Part B: How to download a sequence

  1. The data from Genewiz is available at the company website, linked here.
  2. Choose the "Login" link and then use "astachow@mit.edu" and "be20109" to log in.
    • On M1D8 you will use "nllyell@mit.edu" and "be20109" to get your sequencing data.
    • On M1D8 you will also need the information contained within this spreadsheet. First, start with the sequencing data that indicates it is a good read. Analyze 1/2 of those samples, keeping track of which ones you completed. As you proceed, cross off the samples that you have analyzed on the spreadsheet at the front of the lab. Aim to analyze a total of 9 sequences --- if you finish all of the sequences for your bird, be a good colleague and help out your peers.
  3. At the bottom right should be a section called Recent Results. Click on More to expand it, and then search for sequencing completed between 2/2/14 and 3/2/14. Click the link labeled with the tracking number 10-256046007.
  4. Find the sample named Pnk-1-M13-F (completed using the M13 primer) and its corresponding sample Pnk-1-T7-R (completed using the T7 primer).
  5. The quickest way to start working with a particular sequence is to follow the "View" link under the Seq File heading. For ambiguous data, you may want to look directly at the Trace File as well.

Part C: Prepare sequences for analysis

  1. Begin by downloading this file, which contains the DNA sequence of the vector we are using in GenBank format. Open the file in ApE (A plasmid Editor, created by M. Wayne Davis at the University of Utah), which is found on your desktop. Three items of interest are highlighted: the forward priming site, the reverse priming site and the two basepairs between which your sequence should be inserted.
    • If you are using your own computer, you may download ApE for free.
  2. Paste the forward sequence of the Pnk-1-M13-F reaction into a new ApE file. Locate where the vector ends and the insert begins; trim away the vector.
    • It may be easiest to find the insert by doing Edit → Find (or Apple-F) using the base pairs right before the insert should begin.
  3. Paste the reverse sequence (Pnk-1-T7-R) into yet another ApE file. Immediately use Edit → Reverse Complement to adjust the sequence, and again trim away the vector.
    • Why is it more convenient to work with the reverse complement when sequencing from the reverse direction?
  4. In ApE, use Tools → Align Sequence to find where the forward and reverse sequences overlap. Combine them into one sequence with no repeated parts; where both forward and reverse sequence have coverage of the gene, choose whatever combination has the fewest unknown bases, or Ns (ideally none!).
    • You may find it easiest to print out the alignment and mark up the hardcopy in order to choose where to switch from using forward to using reverse sequence. Let the base-pair numbers be your guides.
    • If you haven't already, you can map our printer by adding sleaterkinney.mit.edu to your printer preferences.
    • Be aware that long stretches of the same base (particularly Gs and Cs) are prone to error; for example, the string "CCC" may be mis-sequenced as "CC" or "CCCC."
  5. Save this sequence as a new file called YourTeamDayYourTeamColorYourSampleID"C"CandidateNumber (e.g., WFPurple737C1).
  6. Finally, depending on the orientation of your insert, you may want to reverse complement the entire sequence. Use the original sequences of the forward and reverse 16S primers to guide your decision.
    • It is important for subsequent alignment that all sequences are 5' to 3' (begin with AGA).
  7. You must now save the sequence in .txt format. Please copy-paste the sequence into a program such as TextEdit, choose File → Save, and in the pulldown menu select Plain Text.
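
Steps 6 and 7 (orient the merged sequence so it begins with AGA, then save it as plain text) can be sketched in code as follows. This is not a replacement for ApE: it assumes the vector has already been trimmed and the forward and reverse reads already merged, and the function names are hypothetical.

```python
from pathlib import Path

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T/N)."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def orient_5_to_3(seq):
    """Return the sequence in the orientation that begins with AGA.

    Whitespace and line breaks are removed first. Raises ValueError if
    neither orientation starts with AGA, which suggests the vector was
    not trimmed correctly.
    """
    cleaned = "".join(seq.split()).upper()
    if cleaned.startswith("AGA"):
        return cleaned
    flipped = reverse_complement(cleaned)
    if flipped.startswith("AGA"):
        return flipped
    raise ValueError("neither orientation begins with AGA; check trimming")

def count_unknown(seq):
    """Count ambiguous bases (N) remaining in the sequence."""
    return seq.upper().count("N")

def save_plain_text(seq, path):
    """Write the sequence to a plain-text (.txt) file."""
    Path(path).write_text(seq + "\n", encoding="ascii")
```

Checking `count_unknown` before saving is a quick way to confirm that the forward/reverse combination you chose left as few Ns as possible.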

Part D: Identify species from sequences

  1. The "nucleotide BLAST" alignment program can be accessed through the NCBI BLAST page or directly from this link. When you have your own sequencing data you will follow the steps below for each clone, one at a time.
  2. Paste the sequence text that you prepared above into the "Query" box. If there were ambiguous areas of your sequencing results, these will be listed as "N" rather than "A" "T" "G" or "C" and it's fine to include Ns in the query.
  3. Under Choose Search Set, select "16S ribosomal RNA sequences (Bacteria and Archaea)" from the Database pulldown menu.
  4. Click on the BLAST button. Matches will be shown by vertical lines between the aligned sequences, while mismatches and gaps will be shown with a dash.
  5. Because this gene is highly conserved, a number of species should come up as highly matched. However, one should (usually) be a best choice. Think carefully here rather than blindly accepting the top species listed.
    • For example, if a partial sequence for species A comes up as the top choice, a full sequence for species B comes up as the second choice, and a full sequence for species A is the third most closely matched choice, is species A or B truly closer to your original sequence?
  6. When you have decided which is best, use the linked template to document this strain and its accession number, its associated max score, query coverage, max identity, gaps, mismatches, and full taxonomy; write down these parameters for the second most closely matched species as well. The taxonomy information can be found by clicking on the accession number and looking under the "organism" heading.
    • Taxonomy order is kingdom, phylum, class, order, family, genus, and species.
  7. When a particular clone is very closely matched to two different species, you might choose to define it at a higher order, such as genus or family. When a particular clone is not well-matched to any known species (perhaps representing an unidentified or undocumented species), you might also choose to define it at a higher order when submitting this information in the phylogenetics program.
  8. Be sure to rename the Excel file according to your section day, team color, and sample ID number.
  9. Please post all of your .txt files (up to 8 per person) and also your Excel file to the table on the M1D8 Talk page when you have finished.
    • At this step it is convenient to rename your .txt files to reflect the species of bacteria that you found. Please follow this example: "Klebsiella.oxytoca.TRBlu.Bird#.4". You MUST use this naming convention so that the class's Unifrac analysis will work later.
    • Please use the following 3-letter abbreviations for your colors: Red, Org, Ylw, Grn, Blu, Pnk, Prp, Sil, Wht.

  10. Complete this step with all of your sequences (from one bird) before continuing to MEGA. If you are analyzing more than one bird, you'll need to do a MEGA alignment for each bird.
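
Because the naming convention has to be followed exactly, a small helper can build the names for you. This is a minimal Python sketch with assumptions: the function name is hypothetical, the mapping from full color names to the 3-letter abbreviations listed above is assumed, and the reading of the example name as genus, species, section day plus color, bird ID, and a final clone number is an interpretation of "Klebsiella.oxytoca.TRBlu.Bird#.4". Check with the teaching faculty if unsure.

```python
# Assumed mapping from full color names to the 3-letter abbreviations
# listed in the protocol.
COLOR_ABBREVIATIONS = {
    "red": "Red", "orange": "Org", "yellow": "Ylw",
    "green": "Grn", "blue": "Blu", "pink": "Pnk",
    "purple": "Prp", "silver": "Sil", "white": "Wht",
}


def species_txt_name(genus, species, section_day, team_color, bird_id, clone_number):
    """Build a name like 'Klebsiella.oxytoca.TRBlu.Bird#.4'.

    bird_id is used exactly as given, so pass it in whatever form
    the class has agreed on.
    """
    color = COLOR_ABBREVIATIONS[team_color.strip().lower()]
    return f"{genus}.{species}.{section_day}{color}.{bird_id}.{clone_number}"


# Reproduces the example name from the protocol.
print(species_txt_name("Klebsiella", "oxytoca", "TR", "Blue", "Bird#", 4))
```

An unrecognized color raises a KeyError, which is intentional: a misspelled color would otherwise produce a file name that breaks the classwide analysis.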

Part E: Align sequences and construct tree

For this next part you will use freely available software called Molecular Evolutionary Genetics Analysis, or MEGA. Feel free to read additional information about this software at the MEGA website. What you need should already be downloaded on your laboratory computers, or you can download it onto your personal computers if you wish.

  • For today you will use .txt files that were generated by your 20.109 peers during Spring 2014. You can find all of the .txt files at this link. Download all of these .txt files to a folder somewhere on your computer. Next time you will use the .txt files that you generate from your own sequencing data to build phylogenetic trees. After the classwide data is available you will also align all of the data to facilitate Unifrac analysis.
  1. Open MEGA. In the upper left corner, click on the icon labeled Align, and choose Edit/Build Alignment from the pulldown menu. This selection should open the Alignment Explorer. When you are prompted, choose "DNA" alignment of course.
  2. Under Edit, choose Insert Sequence from File and select all of the .txt files. They should now appear in the explorer.
  3. When you have input all the sequences, choose Edit → Select All, followed by Alignment → Align by Clustal-W.
  4. Now choose Data → Save Session and name the alignment with your group, day, and bird name.
  5. Under Data, choose Phylogenetic Analysis.
    • When prompted, should you answer that the DNA is protein-coding or not protein-coding?
  6. Now leave Alignment Explorer and go back to the original MEGA window.
  7. From the Phylogeny icon pulldown menu, select Construct/Test Neighbor-Joining Tree. To proceed, click on Compute.
  8. Finally, choose File → Export current tree (Newick) and then choose Image → Save as PDF file to document your tree. Save the tree using your group, day, and bird name as before.
  9. Compare your tree to this tree to make sure you've correctly followed the protocol thus far. (This step was specific for M1D7.)

Part F: Determine if there are differences between birds using Fast Unifrac

Now for the quantitative analysis! This is the fun part, but because we use freeware, the process of analyzing the data can sometimes seem a bit tedious. Hang in there: the information you gain from completing the Unifrac analysis will allow you to finally determine if geography and sex contribute to differences in your birds' microbiomes.

  • We will use an online Unifrac calculator to compare birds 312, 290, and 274 using the S14 data today. For your final report, you'll want to use this tool to compare birds to determine if sample collection location (geography) or sex contribute to differences in the microbiome. Therefore, you may want to use all the data generated by the class once everyone has contributed their .txt and alignment (.mas) files.
  1. Go to the University of Colorado Fast Unifrac website.
  2. The first thing you'll want to do is create a user account. Go to the User pulldown menu in the upper right hand corner of the screen. Register for an account and then make sure you are logged in. The Galaxy site will save your workspace so you can come back to your analysis at any time.
  3. In order to complete the Fast Unifrac analysis, you will need three data files:
    • Your tree file that was exported in Newick format. (S14_data.nwk)
    • A text file that contains a list of your sequence names and the sample that they came from. We'll call this the ID file.
    • A text file that contains the characteristics that you are evaluating -- a category mapping file. For today all of the bird samples were collected at Carson Beach, but there are two females and one male bird.
  4. Let's create your ID file:
    • Go back to MEGA. In the main MEGA window, double click on the sequence alignment icon -- this is the icon with the T...A... button.
    • Once your sequence alignment opens, click on the third button from the left, which has an arrow pointing at XL. This will export your sequence alignment to an Excel file. Name it whatever you'd like and keep track of where it is being saved.
    • Open that Excel file.
    • Copy the names of your sequences (from column A) and paste them into a new Excel file. MEGA automatically puts a space within each strain name; please put an underscore in place of that space. Now you can close the Excel file with your sequencing data.
      • Update after T/R section: Please make two ID files, one with spaces and one with underscores. See below for more information.
    • In column B of the new Excel file, you want to indicate what bird sample each sequence came from.
    • Now save your Excel file as a tab-delimited .txt file. Name it S14_IDfile.txt. It should look like this file when you open it.
  5. The category mapping file lists the names of your samples, their characteristics, and a Description of the bird. Download this category mapping file. Make sure you understand how to create this file before you leave today.
  6. Upload your data by clicking on the Get Data --> Upload data links on the left hand side of the Galaxy homepage.
    • Upload your S14_data.nwk file, your ID file, and the S14_CatMap.txt file that you just downloaded. Note, you'll have to go back to Upload data between each upload. Everything is looking good if you can see green boxes on the right side of your screen.
  7. Now click on FastUniFrac on the left side of the screen.
  8. You should work through the Cluster Samples, the Jackknife Cluster Samples, the Sample Distance Matrix, and PCoA activities using the ID file with spaces. The remaining tests, P test significance and Unifrac significance, require the ID file with underscores. Why? Who knows. Make sure you read the text associated with each test so that you understand what each test is telling you. You are not required to report the results of every test -- only use those that help you to answer your research question or that you find most interesting.
    • When you run the Jackknife Cluster Samples make sure you decrease the number of samples you keep to 5.
      • Think about what is happening with this test. What do the different colors mean?
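
If you would rather not edit the ID file by hand in Excel, the two versions described in step 4 (one with spaces, one with underscores) can be generated with a short script. This is a minimal Python sketch under stated assumptions: the function names, the example sequence names, and the output file suffixes are all hypothetical. It only assumes that each line of the ID file is a sequence name, a tab, and the bird sample it came from.

```python
from pathlib import Path


def id_file_text(rows, use_underscores):
    """Return tab-delimited ID file text.

    rows is a list of (sequence_name, sample) pairs. When
    use_underscores is True, spaces in each name become underscores.
    """
    lines = []
    for name, sample in rows:
        name = name.strip()
        if use_underscores:
            name = name.replace(" ", "_")
        lines.append(f"{name}\t{sample}")
    return "\n".join(lines) + "\n"


def write_id_files(rows, stem="S14_IDfile"):
    """Write both versions; the file suffixes here are hypothetical."""
    Path(f"{stem}_spaces.txt").write_text(id_file_text(rows, False))
    Path(f"{stem}_underscores.txt").write_text(id_file_text(rows, True))


# Hypothetical sequence names paired with bird samples from today's data.
example_rows = [("Genus speciesA", "312"), ("Genus speciesB", "290")]
print(id_file_text(example_rows, use_underscores=True))
```

Calling `write_id_files(example_rows)` would save both files in the current folder, ready to upload for the tests that need each version.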

Part 3: AIV screening analysis

  1. Before you examine your data, take a moment to think about what the CT values tell you about sensitivity.
    • Would you expect more sensitive primers to have a higher or lower CT value when compared to less sensitive primers? Why?
  2. View the amplification curves and melt curves for your RRT-PCR data on the Talk page.
    • Briefly describe the appearance of your amplification curve. Do the samples cross the threshold line during the same cycle? Why or why not?
    • Briefly describe the appearance of your melt curve. Is there a single peak? Why or why not?
  3. Go to the Talk page to view the RRT-PCR CT data collected from your AIV screening assay.
    • Are CT values available for your ‘no transcript control’ sample? What does this say about the remaining data from your assay?
    • Record the CT values for the samples you tested with your primers. Are the values similar for the three samples? Can you hypothesize as to which sample(s) were positive and/or negative for the presence of AIV?
    • Record the CT values for the samples you tested with the Runstadler lab primers. Are the values similar for the three samples? Can you hypothesize as to which sample(s) were positive and/or negative for the presence of AIV?
    • Now, compare the CT values for your primers and the Runstadler lab primers. Can you make any conclusions concerning the relative sensitivities?
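
When comparing the two primer sets, it can help to turn a CT difference into an approximate fold difference. This is a minimal Python sketch, assuming ideal amplification in which the product doubles every cycle (an efficiency of 1.0); real primer efficiencies vary, so treat the result as a rough estimate. The function names and the CT values in the example are hypothetical, not class data.

```python
def delta_ct(ct_a, ct_b):
    """Difference in threshold cycle between two reactions (a minus b)."""
    return ct_a - ct_b


def fold_difference(dct, efficiency=1.0):
    """Approximate fold difference in product for a CT difference.

    efficiency=1.0 assumes perfect doubling each cycle, so the
    result is 2 ** dct; lower efficiencies give smaller folds.
    """
    return (1.0 + efficiency) ** dct


# Hypothetical CT values for the same sample with two primer sets.
ct_primer_set_1 = 28.0
ct_primer_set_2 = 25.0
dct = delta_ct(ct_primer_set_1, ct_primer_set_2)
print(dct, fold_difference(dct))
```

A sample with no CT value (no amplification detected) cannot be compared this way, which is worth keeping in mind when you look at your no transcript control.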

Homework

Due on M1D8

  1. Revise the draft of the methods section you submitted on M1D4, applying the feedback you received. In addition, include a write-up of the methods associated with the cloning you completed for this module.

Due on M2D3

  1. You will report your findings for the AIV screening portion of this module as a Primer Design Memo. Review the guidelines for this assignment and feel free to get an early start!

Reagent List

  • Your brains!
