Andrew Forney Week 8: Difference between revisions

Revision as of 02:22, 24 October 2010

Electronic Lab Notebook

Normalization of Log Ratios

This portion was fairly straightforward and mostly a matter of rote repetition. Being a fan of algorithms, and therefore of efficiency, I found that the fastest method was to set up the appropriate columns first (insert, then title). Because the titles were so similar, the paste buffer could be reused effectively: only a few changes were needed between many copies and pastes.

Once the columns were set, copying and pasting the pseudo-dynamic formula went quickly, because the equations were syntactically similar and needed only small changes from one column to the next.

Statistical Analysis

This part of the analysis had a couple of rough patches but was still pretty straightforward. The first hiccup was the instruction: "Go to a new column on the right of your worksheet. Type the header "Avg_LogFC_A", "Avg_LogFC_B", and "Avg_LogFC_C" into the top cell of the next three columns." I was unsure what "on the right of your worksheet" really meant: the right edge of the worksheet as currently shown on screen, or just the right-most unoccupied column? I read a little further and decided that, since each Avg_LogFC column corresponded to its own four individual samples, I would place each one next to its samples. This seemed the most logical choice because it mimicked the format of the first part of the instructions, but I then realized that the remainder of the statistical analysis section assumed that the Avg_LogFC columns were directly adjacent to one another. As such, I needed to adapt a few of the formulas to fit my differing layout, but in the end the results were the same, and the difference should have made no change to the "forGenMapp" sheet.
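The reason the layout did not matter can be shown with a minimal Python sketch: if each average is computed from named replicate columns rather than from column positions, reordering the columns cannot change the result. The replicate column names and the values below are hypothetical (and only two replicates per group are used for brevity); only the "Avg_LogFC_A" and "Avg_LogFC_B" headers come from the instructions.

```python
from statistics import mean

# Hypothetical replicate column names; the real worksheet's headers may differ.
GROUPS = {
    "Avg_LogFC_A": ["A_rep1", "A_rep2"],
    "Avg_LogFC_B": ["B_rep1", "B_rep2"],
}

def add_group_averages(row, groups=GROUPS):
    """Return a copy of row with one average column per replicate group."""
    out = dict(row)
    for avg_name, replicate_cols in groups.items():
        out[avg_name] = mean(out[col] for col in replicate_cols)
    return out

# Made-up values: the same data stored in two different column orders.
interleaved = {"A_rep1": 0.5, "A_rep2": 1.5, "B_rep1": -0.25, "B_rep2": -0.75}
reordered = {"B_rep2": -0.75, "A_rep2": 1.5, "B_rep1": -0.25, "A_rep1": 0.5}

result_a = add_group_averages(interleaved)
result_b = add_group_averages(reordered)
```

Both calls produce the same averages, which mirrors what happened in the spreadsheet once the cell references in the formulas were adjusted to the new layout.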

The second minor hiccup was the formula "=TDIST(ABS(R2),degrees of freedom,2)", which gave me a #NAME? error when I tried to enter it. I looked up the error in the help documentation and facepalmed when I saw that "degrees of freedom" was a placeholder for an integer value, in our case 2.
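As a rough sketch of what that formula computes: TDIST with a final argument of 2 returns the two-tailed probability of Student's t distribution, and ABS is needed because TDIST does not accept negative inputs. The Python below approximates the same quantity by integrating the t density numerically with Simpson's rule. The function names and the step count are my own choices for illustration, not part of the exercise, and this is not how Excel computes the value internally.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_stat, df, steps=10_000):
    """Approximate the two-tailed p-value, in the spirit of TDIST(ABS(t), df, 2)."""
    upper = abs(t_stat)
    if upper == 0:
        return 1.0
    # Composite Simpson's rule on [0, |t|]; steps must be even.
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    # The density is symmetric, so both tails together are 1 - 2 * area.
    return 1 - 2 * area

# With 2 degrees of freedom, a t statistic of about 4.303 sits near p = 0.05.
p_example = two_tailed_p(4.303, 2)
```

For 2 degrees of freedom the result can be checked against the closed form 1 - |t| / sqrt(t^2 + 2), which the numerical version should match closely.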

As far as tips and tricks went for this section, I discovered that the copy-paste process could be sped up by CTRL-selecting columns that were not necessarily adjacent, which then allowed me to paste them in their proper order where needed. Additionally, I had always known about "Paste Special..." but never really used it. This exercise, which involved pasting the values of the copied cells over the formulas that produced them, let me get the gist of its purpose. Other than these two points, along with those I noted about the normalization step, there wasn't a whole lot to say about the process. The instructions were easy to follow aside from the couple of small issues I had initially, and I'm confident that my end result is accurate.

Sanity Check: Significant Differences

How many genes have p value < 0.05? The filter result found 948 records.
What about p < 0.01? The filter result found 235 records.
What about p < 0.001? The filter result found 24 records.
What about p < 0.0001? The filter result found 2 records!

Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change greater than zero. How many are there? The filter result found 352 records.
Keeping the "Pvalue" filter at p < 0.05, filter the "Avg_LogFC_all" column to show all genes with an average log fold change less than zero. How many are there? The filter result found 596 records.
What about an average log fold change of > 0.25 or < -0.25? (This is a more realistic value for the fold change cut-off because it represents about a 20% fold change which is about the level of detection of this technology.) The filter result found 918 records.
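The filters above can be sketched in a few lines of Python. The rows here are made up for illustration and are not values from the actual worksheet; only the "Pvalue" and "Avg_LogFC_all" column names come from the exercise, and the `count_genes` helper is hypothetical.

```python
# Made-up rows for illustration only; not values from the actual worksheet.
genes = [
    {"id": "gene1", "Pvalue": 0.001, "Avg_LogFC_all": 0.80},
    {"id": "gene2", "Pvalue": 0.030, "Avg_LogFC_all": -0.10},
    {"id": "gene3", "Pvalue": 0.040, "Avg_LogFC_all": -0.60},
    {"id": "gene4", "Pvalue": 0.200, "Avg_LogFC_all": 1.20},
]

def count_genes(rows, p_max, keep=lambda fc: True):
    """Count rows with Pvalue < p_max whose Avg_LogFC_all passes keep."""
    return sum(1 for r in rows if r["Pvalue"] < p_max and keep(r["Avg_LogFC_all"]))

significant = count_genes(genes, 0.05)
up = count_genes(genes, 0.05, lambda fc: fc > 0)
down = count_genes(genes, 0.05, lambda fc: fc < 0)
beyond_cutoff = count_genes(genes, 0.05, lambda fc: fc > 0.25 or fc < -0.25)
```

As long as no gene has an average log fold change of exactly zero, the "up" and "down" counts add up to the number of significant genes, which matches the worksheet results (352 + 596 = 948). If the log ratios are base 2, which is an assumption here, a cut-off of 0.25 corresponds to a ratio of 2^0.25, or roughly 1.19, in line with the "about 20%" figure in the question.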

Excel File: Exercise

Individual Excel Exercise (Excel Spreadsheet)

Individual Excel Exercise (Text Document)

