Andrew Forney Week 8: Difference between revisions

From OpenWetWare

Revision as of 02:22, 24 October 2010

Author: Andrew Forney

Assignment: Individual Journal 8


Electronic Lab Notebook

Normalization of Log Ratios

This portion was fairly straightforward and simply a matter of rote repetition. Being a fan of algorithms, and therefore of efficiency, I found that the fastest method was to set up all of the appropriate columns first (insert, then title). Because the titles were so similar, the pattern buffer could be reused effectively: only a few changes were needed between many copies and pastes.

Once the columns were set up, copying and pasting the pseudo-dynamic formula also went quickly, because the formulas were syntactically so similar that each copy needed only a small edit.


Statistical Analysis

This part of the analysis had a couple of rough patches but was still pretty straightforward. The first hiccup was the instruction: "Go to a new column on the right of your worksheet. Type the header "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cell of the next three columns." I was unsure what "on the right of your worksheet" really meant: the right edge of the worksheet as currently shown on screen, or just the right-most unoccupied column? I read a little further and decided that, since each Avg_LogFC column corresponded to its own four individual samples, I would place each one next to the samples it averaged. This seemed the most logical choice, as it mimicked the format of the first part of the instructions, but I then realized that the remainder of the statistical analysis section assumed that the Avg_LogFC columns were directly adjacent to one another. As such, I needed to adapt the formulas slightly to fit my differing layout, but in the end the results were the same; the difference should have made no change to the "forGenMapp" sheet.

The second minor hiccup was the formula "=TDIST(ABS(R2),degrees of freedom,2)", which gave me a #NAME? error when I tried to enter it. I looked up the error in the help documentation and facepalmed when I saw that "degrees of freedom" was a placeholder to be replaced by an integer value, in our case 2.
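As a rough cross-check of what the TDIST call computes, here is a minimal Python sketch. It is not part of the assignment, and the function name is my own invention. It relies on the fact that, for exactly 2 degrees of freedom, the two-tailed p-value of a t statistic has the closed form 1 - |t| / sqrt(t^2 + 2), so no statistics library is needed. It only covers the df = 2 case used here.

```python
import math

def two_tailed_p_df2(t_stat: float) -> float:
    """Two-tailed p-value for a t statistic with exactly 2 degrees of freedom.

    Intended to mirror =TDIST(ABS(t), 2, 2); valid only for df = 2.
    """
    t = abs(t_stat)  # same role as ABS(...) in the spreadsheet formula
    return 1.0 - t / math.sqrt(t * t + 2.0)

if __name__ == "__main__":
    # 4.303 is the familiar two-tailed 5% critical value for df = 2.
    print(two_tailed_p_df2(4.303))
```

The sign of the t statistic does not matter, which is why the spreadsheet formula wraps the cell reference in ABS before passing it to TDIST.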

As far as tips and tricks went for this section, I discovered that the copy-paste process could be sped up by CTRL-selecting columns that were not necessarily adjacent, which then allowed me to paste them in their proper order where needed. Additionally, I had always known about "Paste Special..." but never really used it. This exercise, pasting the values of the copied cells over the formulas that produced them, let me get the gist of its purpose. Other than these two points, coupled with those that I noted about the normalization step, there wasn't a whole lot to say about the process. The instructions were easy to follow aside from the couple of small issues I initially had, and I'm confident that my end result is accurate.


Sanity Check: Significant Differences
  • How many genes have p value < 0.05? The filter result found 948 records.
  • What about p < 0.01? The filter result found 235 records.
  • What about p < 0.001? The filter result found 24 records.
  • What about p < 0.0001? The filter result found 2 records!
  • Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there? The filter result found 352 records.
  • Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there? The filter result found 596 records.
  • What about an average log fold change of > 0.25 or < -0.25? (This is a more realistic value for the fold change cut-off because it represents about a 20% fold change which is about the level of detection of this technology.) The filter result found 918 records.
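
The filter steps above can also be sketched outside Excel. The Python below is only an illustration: the rows are made-up placeholder values, not the real data set, and the helper name count_genes is hypothetical. The last lines convert a log fold change of 0.25 into a plain ratio, assuming the log ratios are base 2 (the notebook does not state the base).

```python
# Hypothetical rows: (gene_id, Pvalue, Avg_LogFC_all). Placeholder values only.
rows = [
    ("gene1", 0.003, 0.80),
    ("gene2", 0.040, -0.10),
    ("gene3", 0.200, 1.50),
    ("gene4", 0.0004, -0.60),
    ("gene5", 0.030, 0.20),
]

def count_genes(rows, p_max, keep_fc=lambda fc: True):
    """Count rows with Pvalue < p_max whose log fold change passes keep_fc."""
    return sum(1 for _, p, fc in rows if p < p_max and keep_fc(fc))

significant = count_genes(rows, 0.05)
increased = count_genes(rows, 0.05, lambda fc: fc > 0)
decreased = count_genes(rows, 0.05, lambda fc: fc < 0)
large_change = count_genes(rows, 0.05, lambda fc: abs(fc) > 0.25)

# With no fold change exactly equal to zero, the two signs must add up.
assert increased + decreased == significant

# Assuming base-2 logs: a log fold change of 0.25 is a ratio of 2 ** 0.25.
ratio = 2 ** 0.25
percent_change = (ratio - 1) * 100  # roughly 19%, near the quoted 20%

print(significant, increased, decreased, large_change, round(percent_change, 1))
```

The same consistency check holds for the counts reported above: 352 + 596 = 948, which matches the number of genes with p < 0.05.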

Excel File: Exercise

Individual Excel Exercise (Excel Spreadsheet)

Individual Excel Exercise (Text Document)