Desireegonzalez Week 6

From OpenWetWare

Purpose

  • The scientific purpose of this week's assignment was to create an input workbook that GRNmap can use to help determine the values of w, b, and P for the network of genes obtained from dHAP4 Profile9. The assignment also served as practice in properly formatting input workbooks and in using queries in a Microsoft Access database to collect data (i.e., production rates, degradation rates, ANOVA data) pertinent to the Profile9 genes that were mapped in a network during the Week 4/5 Assignment.

Methods

Creating a GRNmap Input Workbook

  • An Excel workbook, formatted like the Sample Workbook, needed to be created from the dHAP4 Profile9 network and microarray data so that it could be run in the GRNmap modeling software.
  • Note that the steps below must be followed exactly as written, or else GRNmap will report an error and will not run.

Creating a Production_Rates Sheet

  1. A production rate sheet containing the production rate parameters "P" for all the genes in the network was created with two columns (from left to right) entitled, "id" and "production_rate".
  2. The "id" column was copied from the gene list in column 1 of the "network" tab in the regulation matrix Excel Sheet used for GRNsight in the Week 4/5 Assignment.
  3. The production rate values were copied and pasted one by one from the following Microsoft Access database, downloaded from here.
    • Note that the genes needed to stay listed in the same order for all sheets!
    • Also note that the production rate values were all rounded to four decimal places.
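
The copy-and-round step above can also be sketched in code. This is a minimal illustration, not part of the assignment's procedure: the function name, gene names, and rate values are hypothetical, and the rates are assumed to have already been exported from the database into a dictionary.

```python
def build_rate_sheet(network_ids, rate_lookup, value_header):
    """Return sheet rows: a header row, then one [id, rate] row per gene."""
    rows = [["id", value_header]]
    for gene in network_ids:  # preserve the gene order of the network sheet
        if gene not in rate_lookup:
            raise KeyError(f"no rate found for gene {gene!r}")
        rows.append([gene, round(rate_lookup[gene], 4)])  # four decimal places
    return rows


# Hypothetical gene names and values, for illustration only.
network_ids = ["ACE2", "HAP4", "GLN3"]
example_rates = {"GLN3": 0.123456, "ACE2": 0.98761, "HAP4": 0.5}

production_sheet = build_rate_sheet(network_ids, example_rates, "production_rate")
for row in production_sheet:
    print(row)
```

Looping over the network's gene list, rather than over the database rows, is what keeps the genes in the same order on every sheet.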

Creating a Degradation_Rates Sheet

  1. A degradation rate sheet containing the degradation rates for all the genes in the network was created with two columns (from left to right) entitled, "id" and "degradation_rate".
  2. The "id" column was copied from the "id" column used in the "production_rate" worksheet, as described in the section above.
  3. Similar to the production rate values, the degradation rate values were then copied and pasted one by one from the following Microsoft Access database: Expression-and-Degradation-rate-database_2019.accdb
    • Note that, as in the production rate steps above, the genes needed to stay listed in the same order for all sheets, and the degradation rate values were all rounded to four decimal places.
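
Because the gene order has to match across sheets, a quick check like the one below could catch a mismatch before the workbook is run. It is only a sketch: sheets are assumed to be lists of rows with a header row first, and the helper name and values are hypothetical.

```python
def ids_in_same_order(*sheets):
    """True if every sheet lists the same gene ids in the same order."""
    # Skip each sheet's header row and keep only the first ("id") column.
    id_columns = [[row[0] for row in sheet[1:]] for sheet in sheets]
    return all(ids == id_columns[0] for ids in id_columns)


# Hypothetical values, for illustration only.
production = [["id", "production_rate"], ["ACE2", 0.9876], ["HAP4", 0.5]]
degradation = [["id", "degradation_rate"], ["ACE2", 0.1], ["HAP4", 0.2]]
shuffled = [["id", "degradation_rate"], ["HAP4", 0.2], ["ACE2", 0.1]]

print(ids_in_same_order(production, degradation))
print(ids_in_same_order(production, shuffled))
```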

Creating Expression Data Sheets for Individual Yeast Strains

  1. Next, the Microsoft Access database was used to run queries to obtain the ANOVA log2 data from each strain for all of the genes in our network.
  2. To begin running the queries, an Excel workbook containing the list of Profile9 network genes (the same list that was placed into the "id" columns in the steps above) was made.
    • This Excel workbook was imported as a new table in the database by clicking the "External Data" tab, selecting the Excel icon with the "up" arrow on it, clicking the "Browse" button, and selecting the Excel file with the list of Profile9 network genes.
    • After choosing the file, the option "Import the source data into a new table in the current database" was selected before clicking "OK".
    • In the pop-up window, the option "First Row Contains Column Headings" was checked before clicking "Next"; the button for "Choose my own primary key." was then selected before clicking "Next" a second time.
    • The leftmost column was checked to confirm that it was labeled "id" before clicking "Next" and typing "network" where it says "Import to Table." When these steps were done, the "Finish" button was clicked.
      • Another pop-up message, which asked whether the import steps should be saved, was closed without saving. A new table entitled "network" then appeared on the left-hand side of the screen.
  3. The next step in the process was creating queries within the Microsoft Access database to obtain the required ANOVA log2 data from all strains.
  4. To begin, the "Create" tab was clicked, followed by the selection of the icon for "Query Design".
  5. Then in the window that appeared, the "network" table was selected and the "Add" button was clicked. A second table, titled "STRAIN_log2_expression" was also selected and added.
    • This step of the query focused on one strain expression table at a time; "STRAIN" in the title was replaced by one of the following strains: wt, dGLN3, dHAP4, or dZAP1, depending on which strain the query was being run for.
  6. After adding the tables, the word "id" in the "network" table was located. The mouse was then dragged from it to the "standard_name" field in the "STRAIN_log2_expression" table and released.
    • This caused a line to appear between the two fields; this line was then right-clicked and "Join Properties" was selected.
      • In the Join Properties window, Option "2: Include ALL records from 'network' and only those records from 'STRAIN_log2_expression' where the joined fields are equal" was selected before clicking "OK".
  7. After this, the word "id" in the "network" table was clicked, dragged to the bottom of the screen into the first column next to the word "Field", and released.
  8. The same drag-and-release process was repeated, each time into the next empty column to the right, with the "STRAIN_log2_expression" fields for all replicates of the 15-minute, 30-minute, and 60-minute time points.
    • When doing this step, recall that only one strain at a time was used in the query; for example, the dragged fields would be named STRAIN_LogFC_t15-2, STRAIN_LogFC_t15-3, etc., where STRAIN is replaced by the actual name of the strain.
  9. When all the data needed was dragged down to the columns at the bottom of the page, the gray area near the table was right-clicked and in the menu that appeared, "Query Type" then "Make Table Query" were selected.
    • In the window that appeared, the name of the table was changed to "STRAIN_log2_expression_1" to distinguish it from the other tables visible.
  10. After the table was renamed, "Current Database" was selected and "OK" was clicked.
      • Recall that "STRAIN" in the title is changed to wt, dHAP4, dGLN3, or dZAP1, depending on what strain the query is being run for.
  11. Next, in the "Query Tools: Menus" tab, the exclamation point was clicked, and then "Yes" was clicked in the pop-up box that stated the number of rows being pasted into the table.
  12. A new table labeled "STRAIN_log2_expression_1" was then visible on the left-hand side of the page.
    • This table was then opened and the data were copied and pasted into a new sheet in the input workbook being created.
      • The new sheet was labeled "STRAIN_log2_expression", where "STRAIN" was changed depending on the strain being used to obtain the query data.
  13. Once copied into the input workbook, the column headers were changed to be the time at which the data were collected, without any units. For example, the 15 minute timepoint would have a column header "15" and the 30 minute timepoint would have the column header "30". Replicate data for the same timepoint should be in columns immediately next to each other and have the same column headers. For example, three replicates of the 15 minute timepoint would have "15", "15", "15" as the column headers.
  14. The query process in steps #4 to #13 above was repeated three more times, until data for all four strains (wt, dHAP4, dZAP1, and dGLN3) were collected.
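
The logic of the query and of the header renaming can be sketched outside of Access. The code below is an illustration under stated assumptions, not the database's own mechanism: tables are represented as lists of dictionaries, the function names are hypothetical, the example values are made up, and the column names are assumed to follow the STRAIN_LogFC_t15-2 pattern described above.

```python
import re


def left_join(network_ids, expression_rows, key="standard_name"):
    """Keep every network gene; attach expression values where the key matches."""
    by_name = {row[key]: row for row in expression_rows}
    value_columns = [c for c in expression_rows[0] if c != key] if expression_rows else []
    joined = []
    for gene in network_ids:
        match = by_name.get(gene, {})  # genes without a match get empty (None) values
        joined.append({"id": gene, **{c: match.get(c) for c in value_columns}})
    return joined


def timepoint_header(column_name):
    """Turn a name like 'wt_LogFC_t15-2' into the bare time point '15'."""
    found = re.search(r"_t(\d+)-\d+$", column_name)
    if found is None:
        raise ValueError(f"unrecognized column name: {column_name!r}")
    return found.group(1)


# Hypothetical rows, for illustration only.
expression = [
    {"standard_name": "HAP4", "wt_LogFC_t15-1": 0.5, "wt_LogFC_t30-1": -0.25},
    {"standard_name": "XYZ1", "wt_LogFC_t15-1": 1.5, "wt_LogFC_t30-1": 2.0},
]
joined = left_join(["ACE2", "HAP4"], expression)
headers = [timepoint_header(c) for c in ("wt_LogFC_t15-1", "wt_LogFC_t15-2", "wt_LogFC_t60-3")]
print(joined)
print(headers)
```

Genes that are in the expression table but not in the network (XYZ1 here) are dropped, while network genes with no expression record are kept with empty values, which mirrors join option 2.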

Creating a Network Sheet

  1. A network sheet was created by copying and pasting the network derived from the YEASTRACT database in the Week 4/5 Assignment.
    • All data in this matrix were left as is; only the title of the first column was changed to "cols regulators/rows targets".

Creating a Network_Weights Sheet

  1. A network_weights sheet was created by copying and pasting the network sheet edited in the previous step.
    • The weights are initial guesses which will be used by GRNmap, so the content in this sheet can be identical to the "network" sheet.
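
As a sketch of this step, the network sheet can be copied to serve as the initial weights after a basic consistency check. The check assumes that the regulator columns and target rows list the same genes in the same order; that requirement, the function name, and the example matrix are assumptions made for illustration.

```python
import copy


def make_network_weights(network_sheet):
    """Copy the network sheet so it can serve as the initial weight guesses."""
    regulators = network_sheet[0][1:]                 # gene names across the top
    targets = [row[0] for row in network_sheet[1:]]   # gene names down the side
    if regulators != targets:
        raise ValueError("regulator columns and target rows do not list the same genes in order")
    return copy.deepcopy(network_sheet)               # independent copy, safe to edit


# Hypothetical two-gene network, for illustration only.
network = [
    ["cols regulators/rows targets", "ACE2", "HAP4"],
    ["ACE2", 0, 1],
    ["HAP4", 1, 0],
]
network_weights = make_network_weights(network)
print(network_weights)
```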

Creating an Optimization_Parameters Sheet

  1. The optimization_parameters sheet for my input workbook was just copied and pasted from the sample input workbook.
    • No modifications needed to be made to row 15 since all strains were found in the network for dHAP4 Profile9.

Creating a Threshold_B Sheet

  1. A threshold_b parameters sheet was then created by making two columns on a new sheet in the input workbook.
  2. The leftmost column was labeled "id", and the standard names of the genes from the first column of the network sheet were copied and pasted into the "id" column of the threshold_b sheet.
  3. The second column was labeled "threshold_b" and then 0 was input as an initial guess value corresponding to all of the genes provided.
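
A minimal sketch of the same construction, assuming the network sheet is held as a list of rows with a header row first (the function name and example genes are hypothetical):

```python
def build_threshold_b_sheet(network_sheet, initial_guess=0):
    """Return a two-column sheet with the same initial guess for every gene."""
    rows = [["id", "threshold_b"]]
    for row in network_sheet[1:]:        # skip the header row of the network sheet
        rows.append([row[0], initial_guess])
    return rows


# Hypothetical two-gene network, for illustration only.
example_network = [
    ["cols regulators/rows targets", "ACE2", "HAP4"],
    ["ACE2", 0, 1],
    ["HAP4", 1, 0],
]
threshold_sheet = build_threshold_b_sheet(example_network)
print(threshold_sheet)
```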

Using Dynamical Systems to Model the Gene Regulatory Network

  • The model was run using GRNmap in class on February 28th, 2019.
  1. The GRNmap v1.10 code was downloaded from the GRNmap Downloads page.
    • A direct link to start downloading this file can be found here
  2. The downloaded file was then unzipped by right-clicking on the folder, clicking "7-zip", and then choosing "Extract here."
  3. MATLAB R2014b was launched.
    • Once MATLAB R2014b was running, the file GRNmodel.m from the directory of the unzipped folder was opened.
  4. To run the model in MATLAB, the green "play" arrow was clicked.
    • This prompted the selection of a file to be used; I then selected my created input workbook.
  5. After this an optimization diagnostics graphic showed the progress of the estimation, and when the run finished, expression plots were displayed.
    • When the run finished, output .xlsx and .mat files appeared in the same folder as the input workbook, along with .jpg files containing the optimization diagnostic and individual expression plots.
      • These files were saved; they can then be uploaded into GRNsight to visualize the results.
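
To give a feel for how w, b, P, and the degradation rates could enter a dynamical model, the sketch below assumes a sigmoidal production term with linear degradation, dx_i/dt = P_i / (1 + exp(-(sum_j w_ij x_j - b_i))) - d_i x_i. This form is an assumption made for illustration and is not taken from the GRNmap code. The function names and all numbers are hypothetical, and a simple forward Euler step stands in for whatever solver GRNmap actually uses.

```python
import math


def expression_rates(x, w, b, P, d):
    """Rate of change of each gene: sigmoidal production minus linear degradation."""
    rates = []
    for i in range(len(x)):
        # w[i][j] is the weight of regulator j (column) acting on target i (row).
        regulation = sum(w[i][j] * x[j] for j in range(len(x))) - b[i]
        production = P[i] / (1.0 + math.exp(-regulation))
        rates.append(production - d[i] * x[i])
    return rates


def euler_simulate(x0, w, b, P, d, t_end, dt=0.01):
    """Advance the expression levels from x0 to time t_end with forward Euler steps."""
    x = list(x0)
    for _ in range(int(round(t_end / dt))):
        rates = expression_rates(x, w, b, P, d)
        x = [xi + dt * ri for xi, ri in zip(x, rates)]
    return x


# Hypothetical two-gene example with no regulation (all weights zero).
w = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]
P = [2.0, 1.0]
d = [1.0, 0.5]
final_levels = euler_simulate([0.0, 0.0], w, b, P, d, t_end=50.0)
print(final_levels)
```

With all weights and thresholds at zero, the sigmoid equals one half, so each gene in this toy example settles where production balances degradation, at 0.5 * P_i / d_i.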

Results

  • A properly formatted input workbook containing the production_rate, degradation_rate, log2 data for all strains, network, network_weights, optimization_parameters, and threshold_b data values was created and uploaded onto Box and onto this OpenWetWare page under the "Data and Files" section below.
    • This input workbook will be used in class on February 28th to run the model using GRNmap.

Data and Files

GRNmap Input Workbook Files: Final and Sample

Microsoft Access Database and Queries

Analyzing the Model's Results

  1. For our initial runs, we estimated all three parameters w, P, and b.
    • How do the modeling results change if P is instead fixed and w and b are estimated?
    • How do the modeling results change if b is fixed and w and P are estimated?
    • How do the modeling results change if P and b are fixed, and only w is estimated?
  2. For our initial runs, we included all three microarray datasets, wt, Δgln3, and Δhap4.
    • What happens to the results if we base the estimation on just two strains (wt + one deletion strain)?
    • What happens to the results if we base the estimation on just the wt strain data?
  3. When viewing the modeling results in GRNsight, you may determine that one or more genes in the network do not appear to be doing much.
    • What happens to the modeling results if you delete this gene from the network and re-run the model? (Remember that you will have to delete references to this gene in all worksheets of the input file.)
  4. You also might think that a particular edge (regulatory relationship) is not needed. What happens if you delete that edge?
  5. What happens if you include the t90 and t120 expression data?
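
For the gene-deletion question above, removing a gene means removing its row from every id-keyed sheet and both its row and its column from the network and network_weights matrices. The helpers below are a sketch with hypothetical names and a made-up three-gene network, assuming sheets are lists of rows with a header row first.

```python
def drop_gene_from_id_sheet(sheet, gene):
    """Remove the gene's row from a sheet whose first column holds gene ids."""
    return [row for i, row in enumerate(sheet) if i == 0 or row[0] != gene]


def drop_gene_from_network(network_sheet, gene):
    """Remove both the target row and the regulator column for the gene."""
    header = network_sheet[0]
    if gene not in header[1:]:
        raise ValueError(f"gene {gene!r} is not in the network")
    col = header.index(gene, 1)          # position of the gene's regulator column
    trimmed = []
    for i, row in enumerate(network_sheet):
        if i > 0 and row[0] == gene:     # skip the gene's target row
            continue
        trimmed.append(row[:col] + row[col + 1:])
    return trimmed


# Hypothetical three-gene network, for illustration only.
full_network = [
    ["cols regulators/rows targets", "ACE2", "HAP4", "GLN3"],
    ["ACE2", 0, 1, 0],
    ["HAP4", 0, 0, 1],
    ["GLN3", 1, 0, 0],
]
smaller_network = drop_gene_from_network(full_network, "HAP4")
smaller_thresholds = drop_gene_from_id_sheet(
    [["id", "threshold_b"], ["ACE2", 0], ["HAP4", 0], ["GLN3", 0]], "HAP4"
)
print(smaller_network)
print(smaller_thresholds)
```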

Scientific Conclusion

  • In conclusion, this weekly assignment's purpose was fulfilled since it helped me gain the skills of properly making and formatting an input workbook to be used in the future with GRNmap.
  • No scientific conclusions about the accurate values for w, b, and P could be drawn since the model still needs to be run.

Acknowledgements

  • I communicated with my homework partners, Ava and Brianna, through text message to try to figure out how to do the query for the dHAP4_ANOVA data. I also emailed Dr. Dahlquist and Dr. Fitzpatrick for assistance when I was confused about the query procedure and later when I was getting error messages while trying to run the model using GRNmap.
  • In addition to communicating with the professors, I copied and edited some of the wikitext for the methods sections of this assignment from the BIOL388/S19: Week 6 page written by Dr. Dahlquist and Dr. Fitzpatrick.

Except for what is noted above, this individual journal entry was completed by me and not copied from another source. Desireegonzalez (talk) 21:36, 27 February 2019 (PST)

References

Dahlquist, K. (2018, October 24). GRNmap. Retrieved February 28, 2019, from http://kdahlquist.github.io/GRNmap/downloads/

Dahlquist, K. & Fitzpatrick, B.G. (2019, February 20). BIOL388/S19:Week 6. Retrieved February 25, 2019, from https://openwetware.org/wiki/BIOL388/S19:Week_6

Below are the links to all the Assignments and Journal Entries of the Spring 2019 Semester.

User Page: user:desireegonzalez

Template Page: template:desireegonzalez

Weekly Assignment Pages:

  1. Week 1
  2. Week 2
  3. Week 3
  4. Week 4
  5. Week 5
  6. Week 6
  7. Week 7
  8. Week 9
  9. Week 10
  10. Week 11
  11. Week 12
  12. Week 15

Individual Journal Entry Pages:

  1. desireegonzalez Week 1
  2. desireegonzalez Week 2
  3. desireegonzalez Week 3
  4. desireegonzalez Week 4/5
  5. desireegonzalez Week 6
  6. desireegonzalez Week 7
  7. desireegonzalez Week 9
  8. desireegonzalez Week 10
  9. desireegonzalez Week 11
  10. desireegonzalez Week 12
  11. desireegonzalez Week 15

Shared Journal Pages:

  1. Shared Journal Week 1
  2. Shared Journal Week 2
  3. Shared Journal Week 3
  4. Shared Journal Week 4
  5. Shared Journal Week 5
  6. Shared Journal Week 6
  7. Shared Journal Week 7
  8. Shared Journal Week 9
  9. Shared Journal Week 10
  10. Shared Journal Week 11
  11. Shared Journal Week 12
  12. Shared Journal Week 14/15