Natalie Williams Fall Electronic Notebook 2015

From OpenWetWare
Revision as of 14:29, 21 September 2016 by Natalie Williams (talk | contribs) (→‎November 18, 2015: Added december information)

Fall 2015

September 2015

This month consisted of meetings and getting to know new members of the research team.
We decided to go back through all of the data analysis and number crunching on the raw data. In doing so, we hope to consolidate all of the information into one master data set.

  • The protocol for this can be found [[ |here]]


Grace and I also looked over the literature that we selected last time to find data sets with production and degradation rates for mRNA.

For the sanity check:

  • WT statistical results

Sanity Check
P-value criteria    | Number of Genes | % out of 6189
p<0.05              | 2600            | 42.0%
p<0.01              | 1727            | 27.9%
p<0.001             | 1015            | 16.4%
p<0.0001            | 574             | 9.3%
Bonferroni p<0.05   | 302             | 4.9%
B&H p<0.05          | 1936            | 31.3%

  • S. paradoxus statistical results

Sanity Check
P-value criteria    | Number of Genes | % out of 6189
p<0.05              | 2513            | 40.6%
p<0.01              | 1637            | 26.5%
p<0.001             | 852             | 13.8%
p<0.0001            | 433             | 7.0%
Bonferroni p<0.05   | 232             | 3.7%
B&H p<0.05          | 1791            | 28.9%

  • dHMO1 statistical results

Sanity Check
P-value criteria    | Number of Genes | % out of 6189
p<0.05              | 1019            | 16.5%
p<0.01              | 432             | 7.0%
p<0.001             | 119             | 1.9%
p<0.0001            | 46              | 0.7%
Bonferroni p<0.05   | 25              | 0.4%
B&H p<0.05          | 114             | 1.8%
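
As a rough illustration of how the rows of a sanity check table could be tallied, here is a minimal Python sketch. It assumes the input is a plain list of unadjusted p-values, one per gene. The function names (`benjamini_hochberg`, `sanity_check`, `percent_of`) are hypothetical and are not part of the lab's protocol, and the Bonferroni and B&H corrections are the standard textbook forms rather than a transcription of the spreadsheet formulas actually used.

```python
def percent_of(count, total):
    """Percentage of genes, rounded to one decimal place as in the tables."""
    return round(100.0 * count / total, 1)


def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def sanity_check(pvalues):
    """Count genes under each criterion; returns (label, count, percent) rows."""
    m = len(pvalues)
    bonferroni = [min(1.0, p * m) for p in pvalues]
    bh = benjamini_hochberg(pvalues)
    thresholds = [("p<0.05", 0.05), ("p<0.01", 0.01),
                  ("p<0.001", 0.001), ("p<0.0001", 0.0001)]
    rows = [(label, sum(p < t for p in pvalues)) for label, t in thresholds]
    rows.append(("Bonferroni p<0.05", sum(p < 0.05 for p in bonferroni)))
    rows.append(("B&H p<0.05", sum(p < 0.05 for p in bh)))
    return [(label, n, percent_of(n, m)) for label, n in rows]
```

For example, `percent_of(2600, 6189)` gives the 42.0% reported in the first WT row.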

October 2015

October 7, 2015
  • I took the genes whose p-values in the B-H column were significant (less than 0.05) and entered them into YEASTRACT to determine which transcription factors are enriched.
  • This was done for both the dHMO1 and the S. paradoxus strains
  • In trying to pare down the network, I came up with five ways to generate different networks:
    • All genes that were not connected were removed from the network. Also, genes with no regulators (fewer than 1) that control only 1 other gene were removed
    1. Top 25 genes with the best (most significant) p-values
    2. Top 25 genes that have the highest regulation control
    3. Genes with the highest number of regulators
    4. 10 genes with the highest number of regulators and 10 genes with the lowest regulation control
    5. Genes with few regulators and low regulation control, with both values falling between 2 and 10.
October 14, 2015
  • At this week's meeting, ways to pare down the network were discussed.
  • The method was to:
    1. Delete genes that were not connected
    2. Among the genes with significant p-values from YEASTRACT, delete the one with the highest (least significant) p-value, regardless of whether it is a deletion strain
      • See how many networks result from this deletion
      • If the deletion left other genes with 0 connections to the network, those genes were to be deleted as well
    3. Next, take the deletion strains into account and make sure that they remain in the network
      • The same steps as above were followed, except that the deletion strains must be kept so that they remain in the network
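
The paring-down steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the network is represented as a set of gene names plus a set of (regulator, target) pairs, the p-values are held in a dict, and the function names `drop_unconnected` and `prune_once` are hypothetical. Keeping a protected gene even if it loses all of its connections is also an assumption, since the notes do not say what happens in that case.

```python
def drop_unconnected(nodes, edges):
    """Step 1: keep only genes that appear in at least one edge."""
    connected = {gene for edge in edges for gene in edge}
    return nodes & connected


def prune_once(nodes, edges, pvalues, protected=frozenset()):
    """Steps 2 and 3: delete the least significant unprotected gene.

    nodes:     set of gene names
    edges:     set of (regulator, target) pairs
    pvalues:   dict mapping gene name -> p-value
    protected: genes (e.g. deletion strains) that must stay in the network
    Returns (deleted_gene, orphans, remaining_nodes, remaining_edges).
    """
    candidates = sorted(g for g in nodes if g not in protected)
    if not candidates:
        return None, set(), set(nodes), set(edges)
    # The highest p-value is the least significant gene.
    target = max(candidates, key=lambda g: pvalues[g])
    new_edges = {(r, t) for (r, t) in edges if target not in (r, t)}
    remaining = nodes - {target}
    # Genes left with 0 connections are deleted too, unless protected.
    kept = drop_unconnected(remaining, new_edges) | (set(protected) & remaining)
    orphans = remaining - kept
    return target, orphans, kept, new_edges
```

Calling `prune_once` repeatedly and recording `len(nodes)` and `len(edges)` after each call would produce a deletion log of the kind kept in this notebook; passing the lab's deletion strains as `protected` corresponds to step 3.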
October 21, 2015

The first list contains the genes in the initial network, chosen without regard to whether the Dahlquist Lab has microarray data for them. Here are the genes that made it into the initial network, which contains 32 nodes and 71 edges:

  • ACE2
  • MSN2
  • SFP1
  • YHP1
  • YOX1
  • ASH1
  • ASF1
  • CSE2
  • SNF2
  • CYC8
  • MGA2
  • STB5
  • SWI5
  • YLR278C
  • CST6
  • RPN4
  • SNF6
  • MSN4
  • ABF1
  • SNF5
  • ZAP1
  • GCN4
  • TAF14
  • PHO2
  • MCM1
  • AFT2
  • HSF1
  • SKO1
  • SWI3
  • GCR2
  • SOK2
  • CIN5

Starting from these nodes, the network was pared down by repeatedly deleting the gene with the least significant p-value in the network. There were 14 subsequent deletions. The list that follows shows which gene was deleted at each step and how many nodes and edges remained after that deletion.

  • Deletion 1: SOK2
    • 31 nodes
    • 58 edges
  • Deletion 2: GCR2
    • 30 nodes
    • 56 edges
  • Deletion 3: SWI3
    • 29 nodes
    • 54 edges
  • Deletion 4: SKO1
    • 28 nodes
    • 49 edges
  • Deletion 5: HSF1
    • 26 nodes
    • 46 edges
    • CSE2 deleted as a result of deletion of HSF1
  • Deletion 6: MCM1
    • 24 nodes
    • 39 edges
    • SNF6 deleted sequentially
  • Deletion 7: PHO2
    • 23 nodes
    • 38 edges
  • Deletion 8: TAF14
    • 22 nodes
    • 37 edges
  • Deletion 9: GCN4
    • 21 nodes
    • 36 edges
  • Deletion 10: SNF5
    • 20 nodes
    • 35 edges
  • Deletion 11: CIN5
    • 19 nodes
    • 29 edges
  • Deletion 12: AFT2
    • 18 nodes
    • 28 edges
  • Deletion 13: ZAP1
    • 17 nodes
    • 27 edges
  • Deletion 14: ABF1
    • 13 nodes
    • 17 edges
    • CST6, MGA2, and SNF2 deleted as a result of ABF1 deletion

When considering the TFs studied in this lab, different deletions were made to ensure that the TFs of interest would remain in the network. The initial network had the following 35 nodes and 88 edges:

  • ACE2
  • MSN2
  • SFP1
  • YHP1
  • YOX1
  • ASH1
  • ASF1
  • CSE2
  • SNF2
  • CYC8
  • MGA2
  • STB5
  • SWI5
  • YLR278C
  • CST6
  • RPN4
  • SNF6
  • MSN4
  • ABF1
  • SNF5
  • ZAP1
  • GCN4
  • TAF14
  • PHO2
  • MCM1
  • AFT2
  • HSF1
  • SKO1
  • SWI3
  • GCR2
  • SWI4
  • CIN5
  • GLN3
  • HMO1
  • HAP4

Starting from this original network, 19 subsequent deletions were made to pare it down to 15 nodes.

  • Deletion 1: GCR2
    • 34 nodes
    • 86 edges
  • Deletion 2: SWI3
    • 33 nodes
    • 84 edges
  • Deletion 3: SKO1
    • 32 nodes
    • 78 edges
  • Deletion 4: HSF1
    • 31 nodes
    • 74 edges
  • Deletion 5: MCM1
    • 30 nodes
    • 65 edges
    • CSE2 deleted as a result of deletion of MCM1
  • Deletion 6: PHO2
    • 28 nodes
    • 64 edges
  • Deletion 7: TAF14
    • 27 nodes
    • 63 edges
  • Deletion 8: SNF5
    • 26 nodes
    • 62 edges
  • Deletion 9: MSN4
    • 25 nodes
    • 57 edges
  • Deletion 10: SNF6
    • 24 nodes
    • 56 edges
  • Deletion 11: RPN4
    • 23 nodes
    • 54 edges
  • Deletion 12: CST6
    • 22 nodes
    • 53 edges
  • Deletion 13: YLR278C
    • 21 nodes
    • 50 edges
  • Deletion 14: SWI5
    • 20 nodes
    • 48 edges
  • Deletion 15: STB5
    • 19 nodes
    • 45 edges
  • Deletion 16: MGA2
    • 18 nodes
    • 43 edges
  • Deletion 17: CYC8
    • 17 nodes
    • 37 edges
  • Deletion 18: SNF2
    • 16 nodes
    • 36 edges
  • Deletion 19: ASH1
    • 15 nodes
    • 32 edges
October 28, 2015

We took a look at the degradation rates (half-lives) of mRNA reported in the literature.

The two papers that we decided to draw data from were Wang 2002 and Shalem 2008. The following weeks will be spent using Access to pick out the half-lives of the transcription factors we want.
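
Since the papers report half-lives while the model works with degradation rates, a conversion is needed at some point. The sketch below assumes simple first-order (exponential) decay, in which case the rate is ln(2) divided by the half-life. Whether this is the conversion used downstream is an assumption, and `degradation_rate` is a hypothetical helper name.

```python
import math


def degradation_rate(half_life_min):
    """Convert an mRNA half-life in minutes to a first-order degradation rate.

    Assumes exponential decay, so rate = ln(2) / half-life (units: 1/min).
    """
    if half_life_min <= 0:
        raise ValueError("half-life must be positive")
    return math.log(2) / half_life_min
```

Under this assumption a half-life of 10 minutes corresponds to a rate of roughly 0.069 per minute.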

November 2015

November 4, 2015

We worked on pulling the data out of the data sets we each chose to work on: Grace took Wang and I took Shalem.

  • Access was used to extract the data for the transcription factors of interest (TFoI)
  • Both the Access protocol and the list of TFoI can be found on Grace's page

Results
Only one TF did not have a half-life associated with it: YPL248C.
Most of the TFs also had large differences between their two measured half-lives (in minutes). For example, YER045C had one half-life of 146.871 and a second of 12.751. Their average is 79.811, with a standard deviation of 94.837.
Many half-lives showed large discrepancies like this between the measurements (at least for the TFoI).
I created a column that lists the TFs with standard deviations less than 10. If the standard deviation was less than 10, the numerical value was kept; if it was greater than 10, the value was set to 0 for easier analysis.
The values that were set to 0 were then counted. Out of the 202 TFoI, 136 were set to 0.
To me, this suggests that we may not be able to use the measured half-life values from this source, judging by the raw numbers alone.
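
The filtering described above can be reproduced with a short Python sketch. It assumes each TF has a list of replicate half-lives and uses the sample standard deviation (n - 1 in the denominator), which matches the 94.837 reported for YER045C. The names `summarize` and `count_zeroed` are hypothetical, and treating a standard deviation of exactly 10 as "set to 0" is an assumption, since the notes only mention values below and above 10.

```python
from statistics import mean, stdev

SD_CUTOFF = 10.0  # cutoff used in the notebook entry


def summarize(half_lives):
    """Return (average, sample SD, filtered SD) for one TF's half-lives.

    The filtered SD keeps the value when it is below the cutoff and is
    set to 0 otherwise (assumption: exactly 10 is also set to 0).
    """
    avg = mean(half_lives)
    sd = stdev(half_lives)  # sample standard deviation, needs >= 2 values
    filtered = sd if sd < SD_CUTOFF else 0.0
    return avg, sd, filtered


def count_zeroed(table):
    """Count TFs whose SD meets or exceeds the cutoff.

    table: dict mapping TF name -> list of measured half-lives.
    """
    return sum(1 for values in table.values() if stdev(values) >= SD_CUTOFF)
```

Running `summarize([146.871, 12.751])` reproduces the YER045C average and standard deviation quoted above, and `count_zeroed` applied to the full table of TFoI would give the count of values set to 0.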

The data file can be seen here: Media:NW_DegRates_SpecificTFs.xlsx‎

November 11, 2015

This week, input sheets were created for the networks that we built based on the p-value criteria.
The input sheet creation protocol can be found on the GitHub wiki. Some of the sheets are blank, such as the degradation and production rate sheets. Some of the parameters are also blank because I did not know what values to input for them.

  • I may just use the values used in previous network sheets for these values (i.e. MaxIter, TolMax, etc.)

November 18, 2015

We wanted to finish creating the Input sheets before running them in MATLAB. We would try to accomplish this before our next meeting (after Thanksgiving Break)

December 2015

December 2, 2015

I finished creating my last two input sheets for the wt_added genes. We tried running our input sheets and were not

We will create a test sheet without missing values.