# Dahlquist:Mathematical Modeling

### From OpenWetWare

## Current revision

Alondra, here is the link to the [supplementary information](http://www.pnas.org/content/103/35/13004/suppl/DC1) from the Belle et al. article. *- Kam D. Dahlquist 13:55, 24 February 2010 (EST)*

## Alondra's Notebook

### January 26, 2010

- Set up user page on OpenWetWare
- Saved Schade data and PDF file to flash drive

### Schade Data Organized

- Column A has the systematic names of all the genes.

Note: The next columns are presented in groups of three for a specific chip. The columns are broken down into normalized, raw, and control data (in that order).

- Columns B,C,D belong to the 257659.txt chip at time(t)=0.
- Columns E,F,G correspond to the 257720.txt chip at t=0.
- Columns H,I,J belong to the 258333.txt chip at t=0.
- Columns K,L,M belong to the 12164589.txt chip at t=10min.
- Columns N,O,P correspond to the 12164610.txt chip at t=10min.
- Columns Q,R,S belong to the 12164813.txt chip at t=10min.
- Columns T,U,V belong to the 12251647.txt chip at t=10min.
- Columns W,X,Y belong to the 12251674a.txt chip at t=10min.
- Columns Z,AA,AB belong to the 12251693.txt chip at t=10min.
- Columns AC,AD,AE belong to the 12251985a.txt chip at t=10min.
- Columns AF,AG,AH correspond to the 257384.txt chip at t=30min.
- Columns AI,AJ,AK belong to the 257657.txt chip at t=30min.
- Columns AL,AM,AN belong to the 257661.txt chip at t=30min.
- Columns AO,AP,AQ belong to the 257662.txt chip at t=30min.
- Columns AR,AS,AT belong to the 258277.txt chip at t=30min.
- Columns AU,AV,AW belong to the 258317.txt chip at t=30min.
- Columns AX,AY,AZ correspond to the 275072.txt chip at t=2hrs.
- Columns BA,BB,BC belong to the 275130.txt chip at t=2hrs.
- Columns BD,BE,BF belong to the 275146.txt chip at t=2hrs.
- Columns BG,BH,BI belong to the 275151.txt chip at t=2hrs.
- Columns BJ,BK,BL belong to the 12251635.txt chip at t=12hrs.
- Columns BM,BN,BO correspond to the 12251649.txt chip at t=12hrs.
- Columns BP,BQ,BR belong to the 12251956.txt chip at t=12hrs.
- Columns BS,BT,BU belong to the 12251987.txt chip at t=12hrs.
- Columns BV,BW,BX belong to the 185234.txt chip at t=60hrs.
- Columns BY,BZ,CA belong to the 185389.txt chip at t=60hrs.
- Columns CB,CC,CD belong to the 223342.txt chip at t=60hrs.
- Columns CE,CF,CG belong to the 223809.txt chip at t=60hrs.
- Columns CH,CI,CJ belong to the 224911.txt chip at t=60hrs.
- Columns CK,CL,CM belong to the 224973.txt chip at t=60hrs.
- Column CN is the standard name of the gene.
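
The layout above follows a regular pattern: column A, then three columns per chip, then the standard name. As a rough sketch of that pattern (not something we ran in lab), the Python below works out the Excel column letters for any chip. Numbering the chips 1 to 30 in the order listed, and the function names, are assumptions made for illustration.

```python
def column_letter(index):
    """Convert a 1-based column index to an Excel column letter (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def chip_columns(chip_number):
    """Return the (normalized, raw, control) column letters for a chip.

    Assumes chips are numbered from 1 in the order listed above, so chip 1
    starts at column B (index 2) and each chip takes three columns.
    """
    first = 2 + 3 * (chip_number - 1)
    return tuple(column_letter(first + offset) for offset in range(3))


print(chip_columns(1))    # first chip at t=0
print(chip_columns(30))   # last chip at t=60hrs
print(column_letter(2 + 3 * 30))  # column that follows the last chip
```

Reading only the first letter of each group gives the normalized column for that chip.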

### How to use MATLAB

- To use MATLAB, we first need to know where to locate our documents. The data are found under Computer, on Local Disk (C:).
- Once we are on the disk, we choose the folder gene_regulatory_modeling, where we will find the Excel data needed and the MATLAB programs that have been written.
  - When making changes to the Excel data, we must make sure that the degradation rates, production rates, and parameters are all correct.
  - Example: if we were analyzing three genes, we must have three production and regulation rates. Also, the number of 1's that appear in the network is the number of weights and thresholds that will be in our parameters. (Note: the 1's must be read from left to right until the row ends, and then we can move on to the next row.)
- Once we have chosen which Excel sheet we need, we open it.
- From there we open the MATLAB program that contains the differential equation program that we need for the particular experiment (we must open MATLAB first and then locate our document).
- From there we just have to press the Run button (once changes have been made and saved) and analyze the graphs given.
- After we run the program, there will be an output file that gives the log2 concentrations.
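
To make the note about reading the 1's concrete, here is a minimal Python sketch (separate from the MATLAB programs in the folder) that lists the 1's of a network in reading order: left to right along a row, then on to the next row. The 3-gene network and the function name are hypothetical and only illustrate the counting rule.

```python
def ones_in_reading_order(network):
    """Return the (row, column) position of every 1, reading each row
    from left to right before moving on to the next row."""
    positions = []
    for row_index, row in enumerate(network):
        for column_index, entry in enumerate(row):
            if entry == 1:
                positions.append((row_index, column_index))
    return positions


# Hypothetical network for three genes (made up for illustration).
example_network = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]

positions = ones_in_reading_order(example_network)
print(len(positions))  # how many 1's appear in the network
print(positions)       # the order in which the 1's are read
```

The length of the list is the count of 1's, and the order of the list is the order in which the corresponding entries would be written in the parameter sheet.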

### February 9, 2010

- We came into the lab and copied and pasted the Schade data into a new Excel worksheet. The numbers in the data represent the ratio between the red and green spots in the microarrays.
- In the new worksheet we transferred the data and organized it in a way that we would understand.

1) We renamed each chip. Each name consists of t for time, the time point of the sample, and the sample number. (For example, t0-1 means the first sample at t=0.)

2) We deleted the raw and control columns, leaving only the normalized columns. We also kept the IDs for the genes that were in the original data set.

- When looking at the data we noticed that the number of samples was not the same for every time point. At 10 minutes there is an extra sample. Also, we do not know which samples are paired with each other when they were dye swapped.
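
The two steps above were done by hand in Excel. As a sketch of the same idea in Python, the code below builds the new chip names and keeps only the ID and the normalized value from each group of three columns. The chip counts are taken from the column list in Schade Data Organized; the labels other than t0 (written here in minutes) and the example row are assumptions made for illustration.

```python
# (label, number of chips) for each time point. Only the "t0-1" style of
# name comes from the notebook; the other labels are hypothetical.
CHIPS_PER_TIME = [
    ("t0", 3), ("t10", 7), ("t30", 6),
    ("t120", 4), ("t720", 4), ("t3600", 6),
]


def chip_names(chips_per_time):
    """Build names such as t0-1, t0-2, ... for every chip, in column order."""
    names = []
    for label, count in chips_per_time:
        for sample in range(1, count + 1):
            names.append(f"{label}-{sample}")
    return names


def keep_normalized(row):
    """Keep the gene ID and every normalized value from one data row.

    The row is assumed to be laid out as
    [ID, normalized, raw, control, normalized, raw, control, ..., standard name].
    """
    gene_id = row[0]
    measurements = row[1:-1]           # drop the ID and the standard name
    return [gene_id] + measurements[0::3]  # every third value is normalized


names = chip_names(CHIPS_PER_TIME)
print(names[:4])

# Made-up row with two chips, just to show the column selection.
example_row = ["ID_1", 1.2, 300, 250, 0.8, 200, 250, "NAME_1"]
print(keep_normalized(example_row))
```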

### February 16, 2010

- We began by working with the worksheet that we made last week and named it logFC. In this worksheet we computed the log base 2 of each value in the data. (Note: FC means fold change.) The formula used in Excel is =LOG(B2,2).
- Then we made a new worksheet, named normalized, to normalize the data from the previous worksheet (logFC). We added two rows to calculate the average and the standard deviation of each column. The average formula is =AVERAGE(B4:B6423); it takes the average of all the genes in one column (one chip at a specific time). The standard deviation formula is =STDEV(B4:B6423); it takes the standard deviation of all the genes in that same column. Then we added a column to calculate the difference between the log value and the average value, divided by the standard deviation. The formula looks like =(B4-B$2)/B$3. (Note: B4 is the log value, B$2 is the average value, and B$3 is the standard deviation value. The dollar sign holds the row number fixed, while the column letter B is free to change when the formula is copied to another column.)
- Then we made a new worksheet, called average and p-value, by copying the previous sheet (using Paste Special to paste only values, not formulas) and keeping only the normalized columns calculated previously. To calculate the average we used the formula =AVERAGE(B2:D2). We also calculated a T statistic that tells us whether the scaled and centered average log ratio is significantly different from zero. We used a modified formula =AVERAGE(N2:P2)/(STDEV(N2:P2)/SQRT(number of replicates)). We also calculated a p-value with the equation =TDIST(ABS(R2),degrees of freedom,2). The degrees of freedom is n-1, where n is the number of replicates.
- We finally copied this information into a new worksheet called final. This will help us view the information in an organized manner.
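
The Excel steps above can also be written out in Python as a cross-check. This is only a sketch under a few assumptions: the function names are made up, the example numbers are not from the Schade data, and the two-tailed p-value is obtained by numerically integrating the Student t density, which is a stand-in for Excel's TDIST rather than the same routine.

```python
import math
import statistics


def log2_ratios(ratios):
    """Same as =LOG(value,2) applied to each ratio."""
    return [math.log2(ratio) for ratio in ratios]


def scale_and_center(column):
    """Same as =(value - average) / standard deviation for one column."""
    average = statistics.mean(column)
    spread = statistics.stdev(column)  # sample SD, like Excel's STDEV
    return [(value - average) / spread for value in column]


def t_statistic(replicates):
    """Same as =AVERAGE(range)/(STDEV(range)/SQRT(number of replicates))."""
    n = len(replicates)
    return statistics.mean(replicates) / (statistics.stdev(replicates) / math.sqrt(n))


def two_tailed_p(t, df, steps=20000):
    """Two-tailed p-value from the t distribution with df degrees of freedom.

    Integrates the t density from 0 to |t| with Simpson's rule (steps must
    be even) and returns the area left in both tails.
    """
    t = abs(t)
    if t == 0:
        return 1.0
    coefficient = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return coefficient * (1 + x * x / df) ** (-(df + 1) / 2)

    width = t / steps
    total = density(0) + density(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * width)
    central_area = total * width / 3
    return max(0.0, 1 - 2 * central_area)


# Made-up replicate values for one gene at one time point.
replicates = [1.0, 2.0, 3.0]
t_value = t_statistic(replicates)
p_value = two_tailed_p(t_value, df=len(replicates) - 1)
print(t_value, p_value)
```

With three replicates the degrees of freedom is 2, matching the n-1 rule above.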

### February 23, 2010, March 2, 2010 and March 9, 2010

- We had completed the calculations from the Schade data. We then noticed that we needed to check the degradation rates for the 15 genes in our network.
- We took the raw half-life values of the genes from the Belle paper, "Quantification of protein half-lives in the budding yeast proteome," published in 2006.
- Unfortunately, 3 of the 15 genes in our network did not have half-life values in this paper.
- To calculate the degradation rates from the half-lives we used the equation =ln(1/2)/the half-life. We compared them to Stephanie Kuelbs' degradation rates and they were similar.
- We could not leave those 3 genes with zero values because that would mislead our function.
- We decided to take the average degradation rate of 203 genes from Harbison's paper, "Transcriptional regulatory code of a eukaryotic genome," published in 2004. The genes in this paper were compared to the genes in Belle's paper.
- The main reason we decided to take the average of the 203 genes, instead of only the genes in our network, is that we might change the network later. If we changed it, values based only on the current network would no longer fit. Because the average is taken over the 203 transcription factors, it accounts for the entire network, even if we change it.
- We also took the median of the 203 genes.
- Between the 203 gene list and the 3000 gene list, 135 genes overlapped.
- Dr. Fitzpatrick suggested that, for statistical purposes, we take out the highest and lowest 10% of the values, which are considered outliers. We also wanted the average to have less variation.
- Fortunately, our average was really close to the median.
- The average value that we obtained, and that replaced the missing values for the genes without degradation rates, was -0.0193.
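
The degradation-rate calculation and the trimmed average can be sketched in Python as follows. The half-life values here are made up for illustration (they are not the Belle data), and the function names and the way the 10% cut is rounded down are assumptions.

```python
import math
import statistics


def degradation_rate(half_life):
    """Same as =ln(1/2)/half-life; the result is negative."""
    return math.log(0.5) / half_life


def trimmed_mean(values, fraction=0.10):
    """Average after removing the highest and lowest `fraction` of values.

    The number of values cut from each end is rounded down.
    """
    ordered = sorted(values)
    cut = int(len(ordered) * fraction)
    kept = ordered[cut:len(ordered) - cut]
    return statistics.mean(kept)


# Hypothetical half-lives, including one very small and one very large value.
half_lives = [2, 20, 25, 30, 35, 40, 45, 50, 60, 900]
rates = [degradation_rate(h) for h in half_lives]

print(trimmed_mean(rates))      # average without the top and bottom 10%
print(statistics.median(rates)) # median, for comparison with the average
```

Comparing the trimmed average with the median, as in the last two lines, is the same check described above.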