BISC 219/2009: Mod 3 Analyzing the data on the putative transgenic and control plants

From OpenWetWare

Jump to: navigation, search
Wellesley College BISC 219 Genetics

Analyzing the data on the putative transgenic & control plants

Experimental questions for this series:

Did we successfully produce & clone transgenic tobacco plants through genetic engineering?
How does random insertion of T-DNA affect the transgenic plants?

To think about when analyzing the data:
What do the control plants’ results confirm?
Do you need both evidence for the presence of the gusA gene AND evidence of gene expression to confirm that genetically engineered plants are truly transgenic?
How might you account for possible discrepancies between the two experiments?
Do you see significant variation in gene expression among the transgenics in the enzyme activity assays? Was variability expected?
Are the two measurements of beta-glucuronidase activity similar for each plant?
If not, how do you account for small and large differences?
What are possible causes of false positives and false negatives in each of the activity assays?


The following directions are for "old" Excel (pre-2007) If you have Office 2007 or later, you will need to use a modified version of these directions.
There is a YouTube video of how to make a graph of a frequency distribution in new Excel for Mac at: [1]

Sorting Data:
Sort the data in Excel to make it easier to determine how well the two measures of gusA expression correlate with each other and with the structural evidence for the gusA gene. Later the sorted data will be easier to use to find the appropriate bin range for the bars in a histogram if you decide to show the data in that form.

To sort data: First copy and paste all the data you want to sort into a spreadsheet (Workbook is the word used in Excel for a spreadsheet). Remember that whatever you put on the far left column in the spreadsheet will be graphed on the x axis.

HINTS: Sort the control plants separately from the putative transgenic plants. Highlight all the columns of data on the transgenics that you want to sort. You should, eventually, (when we have gathered all the data) have 4 columns for this experiment in this order, left to right: plant #, enzyme activity from histochemical assay (in the form of numbers 0 ,1, 2,or 3), enzyme activity from PNPG breakdown, presence of gusA gene by pcr (+ or -). If you only have the activity data at this point leave the last data column empty and re-sort when you have it all.

  1. Open the Data menu on the Excel toolbar and choose Sort
  2. If the plant numbers aren’t sequential, first choose plant # under the SORT BY menu option.
  3. Copy and paste the data sorted by plant # into another part of your spreadsheet and highlight all four columns of it.
  4. Use the drop-down menu under SORT BY to choose the next parameter to make the primary sort.
  5. Choose either the histochemical or the PNPG breakdown assay. Then choose descending as the way to sort the data.
  6. Choose the other activity assay to be the secondary sorting criterion and descending. Do not choose plant numbers at all as a criterion for sorting.
  7. Copy and paste the newly sorted data a different area of the spreadsheet.
  8. Repeat the sort, reversing the order of importance of the 2 activity assays and copy and paste the sorted data into another area of the spreadsheet.
  9. When you have this data, sort one more time by the pcr evidence for the gene as the primary characteristic, histochemical activity as the secondary, and PNPG breakdown as tertiary (if you are allowed to name a tertiary criterion).

Repeat this sorting schema for the control data.

Questions to ask of your data:

  1. In the sorted data do you find clear evidence for your major points?
  2. Which data show that the control plants are not transgenic while the plants transformed using the vector strain of Agrobacterium tumefaciens strain LBA4404/pBI121 are?
  3. If there is, as expected, significant variability in gene expression among the transgenics, how can you best show that point from these data?
  4. If the two measurements of enzyme activity don’t give similar results in the same plant, what does that mean in terms of your confidence in the ability of each of the assays to measure beta-glucuronidase activity accurately? Do you have more confidence in one assay over the other? Why or why not?

How should you use and display these data to make your point(s)?
There are many options and no right or wrong answer, although some choices are better than others. Your goal is clarity so your reader can easily see how your data allows you to make conclusions. There are directions below for making a histogram or frequency distribution. You are not required to display your data in this format.

Making a Frequency Distribution, also called a Histogram Frequency distributions are used when you want to know the most common answer in a data set that measures the same thing in many individuals (mode) AND how common the measurements are that are outside the mode. Put another way, a histogram shows the distribution of the individuals that make up the range in the data instead of concentrating exclusively or heavily on the mean. This type of graph would probably be a good way to show some of the major points in this experiment.

The basic procedure for preparing a frequency histogram is to determine the range of the data (amount of variation or the difference between the highest and lowest values), sort the data into bins (bars on the histogram representing segments of the data), fill the bins with your data, and create a column graph or pie where the data in each bin comprise a bar or piece of the pie.

To use EXCEL to create a histogram:
First make sure that the Data Analysis Package is installed: Go to the “Tools” menu. If “Data Analysis” is one of the choices then the package is installed. If not then you must install it.

Installing Data Analysis Package. (needed for descriptive statistics, or for making a histogram).
Open the “Tools” menu. Select “Add-in”. A new menu will list several choices. Click in the box beside “Analysis Tool Pack”, and then click OK. Go back to the “Tools” menu and “Data Analysis” should be one of your choices. Open it.
If the Data Analysis package is ready to use:

  1. Click on the Data Analysis option in the Tools Menu of Excel and choose Histogram from the drop-down menu.
  2. For input range, click on the entry box and then select all of the data from one column of your spreadsheet (NOT the heading) by holding down & dragging the mouse through the data until it is all enclosed in a box with dotted lines. The input range entry should show the location of those cells in the spreadsheet.
  3. Bin range. If you leave the bin range entry box blank Excel will determine the bin size for you by dividing the range (difference between the highest and lowest measurement) equally, BUT a better option is to figure your bins out manually. To see the difference between the default and manual selection, leave the bin range blank and finish the histogram by clicking on OUTPUT RANGE and then on an empty area of the spreadsheet. When the bins and frequency are displayed there you will probably have more bins than you need and the cut off limit for each bin is unlikely to be a simple number.


Instead of letting Excel determine the bins, follow these directions for creating your own bin range column using your own sense of the data. The number of bins into which you sort the data directs the appearance of the histogram and is important in the data analysis. The number of bins is up to you. The appropriate number depends upon the amount of data, the range, and the point you want your histogram to make.

The bin range column must be in the form of a list of numbers with no hyphens. Creating a bin range column for the histochemical gus activity assay is easy. Convert the symbols -, +, ++, +++ into numbers: 0, 1, 2, 3 for all the data in the data column. Make another column somewhere else in your spreadsheet that lists these numbers 0, 1, 2, 3 in that order. That list will be the bin range column for the histochemical enzyme activity assay.

Creating a bin range column for the gus activity assay by PNPG breakdown is trickier. To decide on the other bin limits, you could divide up the rest of the data evenly into 3 bins (the same number as the number of categories of positive values in the histochemical assay) or try to decide where there are increases that best indicate significant jumps in activity level. The data does not have to be divided exactly evenly, but you have to be able to defend your bin decisions. List the largest activity unit for each bin in the column you are creating. Bins CANNOT overlap & they cannot include hyphens.

After you have created a bin range column composed of the highest value for each bin, select that column by highlighting it and make sure the location of that column of cells appears in the bin range entry.

Select the OUTPUT RANGE by clicking on the radio button to its left and then on the output range entry area. Click on any empty cell in any area of the spreadsheet where there is no data. Wherever you click will be where the histogram bin range and frequency (the number of observations that fell within the selected bin number) will be displayed in adjacent columns. These two columns will comprise the histogram graph.

Plotting the histogram bins as a graph:
To plot either assay as a histogram, use ChartWizard. Follow the directions, clicking on either a column graph or a pie as your choice of how to design the histogram. When you are asked for input range, highlight the frequency data for the assay.

You can move quickly through the prompts on the wizard to label the graph. In the tab called Titles: Enter X and Y axis labels here. Remember to include units of measurement if applicable. Do not give your histogram a title here. The title of the histogram should be part of the figure legend that is found below (not above) the graph. You may add the title later as a text box when you create the legend.

Gridlines are included as the default. If you don’t like them DESELECT “major gridlines” under the tab called “Value (Y) axis”.

Legend (this is what we would call the “Key” but Excel refers to as a legend): DESELECT “show legend” if you are graphing only one set of data and the information in it is repetitive to the axis labels or unhelpful.
Tell Excel where you’d like the graph to go.
Choose: on another sheet.
Click “Finish”. You can refine the graphs later.

To refine the histogram:

  1. To reduce the large gaps between bars of a histogram, double-click on one of the bars, choose the “options” tab, and then reduce the “gap width” until you are satisfied with the appearance.
  2. To remove the gray background double click on the background of the plot you’ve created. For “Border” and “Area”, select “NONE”.
  3. Changing the scaling. Scale can have a profound effect on the visual impact of your data (play with it if you don’t believe it) Be thoughtful about the scale. You might not want the scale to make a tiny difference look large, or then again you might. It depends upon the point you are trying to make using the data. If you want the reader to compare 2 similar graphs, he or she might prefer to see all the data using the same scale IF the units are comparable. To change the scale, specify the max and min values, and set the major unit (gridlines):Double click on the axis you wish to adjust. Decide on the minimum value. It does not always need to be zero, but often should be. Notice that you can set where the X axis crosses the Y. This is used to fix a graph in which the X axis appears to float in space, or at the top, or somewhere else unseemly.
  4. To finish your graph, double click on the axis labels to specify the font, font size, bold, etc. to make the labels maximally readable and professional looking.


If you are stuck and can’t figure out how to modify either an axis, a “legend” (this is an Excel word–“key” is a better term), a label, or the data … double click it. Clicking often brings up a menu that allows you to modify what you wish.

Now examine your completed graphs and ask yourself if your data display passes the test for good figures: a new reader should be to able see how your data allows you to make conclusions(s) and the figure legend explains the experiment clearly (without unnecessary detail). If your figure does all this, then you have designed a successful data display.
Personal tools