Natalie Williams: Electronic Notebook

Protocol for MATLAB

This page will help you input and run data sets from your document into an output.

Dahlquist:GRNmap

Electronic Notebook

Fall 2014

This contains all the procedures and tasks that I completed and the trials that I ran in Fall 2014.

Spring 2015

This contains all the procedures and tasks that I completed and the trials that I ran in Spring 2015. Most of the activities and notes for this semester focused on creating a poster for the various conferences that we attended in the spring.

Summer 2015

This link has all the information for what occurred over the summer. A lot of it was testing the code by changing the initial weights and the threshold b values of the input sheets.

Fall 2015

Fall 2016

I was abroad last spring semester, Spring 2016, so there are no notes or records of my experience in the lab at that time.

September 2016

September 14, 2016

This week's job was to understand and analyze the study by Neymotin et al. (2014) in order to derive degradation rates from the half-life values of the annotated genes.

Personally, I received feedback on my HNRS thesis abstract that is to be submitted on Sept. 30. I made the changes and sent them to Dr. Dahlquist.

I have worked through more of the R tutorial that Dr. Dahlquist assigned to both Brandon and me. Brandon has already coded a script to generate random matrices; our next task is to write code that plots the distributions of in-degree and out-degree as bar graphs.
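
A minimal sketch of the in-degree and out-degree tally, written in Python rather than R purely for illustration. It assumes a 0/1 adjacency matrix in which each row is a target and each column is a regulator, so row sums give in-degree and column sums give out-degree. The function names, the edge probability, and the seed are hypothetical and are not taken from Brandon's script, and the bar graph is drawn as plain text instead of a plot.

```python
import random
from collections import Counter


def random_adjacency(n_genes, edge_prob, seed=0):
    """Return an n x n 0/1 matrix; entry [target][regulator] is 1 for an edge."""
    rng = random.Random(seed)
    return [[1 if rng.random() < edge_prob else 0 for _ in range(n_genes)]
            for _ in range(n_genes)]


def degree_distributions(matrix):
    """Count how many genes have each in-degree and each out-degree."""
    in_degrees = [sum(row) for row in matrix]          # edges into each target
    out_degrees = [sum(col) for col in zip(*matrix)]   # edges out of each regulator
    return Counter(in_degrees), Counter(out_degrees)


def text_bar_chart(distribution, label):
    """Render a degree distribution as a plain-text bar chart."""
    lines = [label]
    for degree in sorted(distribution):
        lines.append(f"{degree:>3} | {'#' * distribution[degree]}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative size and density only (assumed values).
    network = random_adjacency(n_genes=15, edge_prob=0.2)
    in_dist, out_dist = degree_distributions(network)
    print(text_bar_chart(in_dist, "in-degree"))
    print(text_bar_chart(out_dist, "out-degree"))
```

Each edge is counted once as an outgoing edge and once as an incoming edge, so the two distributions always describe the same total number of edges.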

September 21, 2016

Today, Wednesday 21/9/2016, I completed my task of computing the degradation rates from Neymotin et al.'s article. I uploaded the file to the DahlquistLab repository, where it is waiting to be reviewed by Dr. Dahlquist.

For the completion of my task with the degradation rates, the following was done:

I downloaded the supplemental data (Table S5) from Neymotin et al
From Neymotin's data, I edited the following
- Alphabetized: Gene names were used for the alphabetization
  - For alphabetization, I selected the entire sheet
  - Next, I clicked the Sort button that looks like a funnel, and selected "Custom sort"
  - For custom sort, I selected the column with the gene names, for me, Column 1
  - I then sorted in ascending order, from A to Z
- Isolated Half-Lives: Created a separate sheet with only the Systematic and Gene Names and the half-lives
  - On this new sheet, I copied the Gene names and the half-lives corresponding to those genes
  - I calculated the median half-life, which will be used to calculate the degradation rate of any gene with missing data
  - The following Excel equation was used
  - =MEDIAN("Column containing half-lives")
- Degradation Rates: Created an additional sheet for calculating the degradation rate from the half-lives
  - Again, the Gene names and the half-lives were pasted into this new sheet so that the calculations could be carried out on a single page without interfering with other information or formats
  - The following equation was used to calculate the degradation rate
  - = (ln(0.5) / half-life of specific gene)
  - For genes with missing data, the equation was the following
  - = (ln(0.5) / median half-life)
I used a file previously shared with me by Dr. Dahlquist to compare this work (Neymotin) with Harbison's list of 203 TFs
I used Microsoft Access to pair the two data sets by their systematic names in order to identify whether any genes were missing data
1. I first opened a new blank database.
2. I imported the two Excel files that contained my data
  - This can be done by selecting the External Data tab and clicking the Excel icon
  - I then went through a series of instructions
    1. I browsed my computer for the file that I needed and selected it
    2. I chose the sheet to import; for me, these were Harbison's list of 203 TFs and the sheet with the degradation rates calculated from Neymotin's data
    3. Depending on your sheet's format, the first row may either include headings or go directly into your data; select the box if your first row contains column headings
    4. I skipped the next question, about field names and indexing, by clicking Next
    5. I then chose my own primary key, setting it to the first column with Systematic Names (not all genes have universal Gene names)
    6. I then clicked Finish to import.
  - Now your data should be seen as a table in Access
3. To pair the data sets together, I selected the Create tab and hit Query Design
  - When you select Query Design, a pop-up window appears showing all the tables in your current database. Choose the tables whose data you wish to pair. Close the pop-up window, and you should now see your tables with their headings listed under them.
  - Select the heading that has the information you want to pair with the other file. For me, it was the Systematic Names from Neymotin's data with the Systematic Names from Harbison's data
  - Drag the heading and match it to the heading for the other data. Right-click on the link that forms between the two headings
  - Because I only wanted the data from Neymotin's that matched Harbison's data, I selected the option that states: "Include ALL records from 'Harbison 203' and only those records from 'Neymotin degradation rates' where the joined fields are equal."
  - Press OK and you should now see an arrowhead pointing toward the Neymotin degradation rate heading
  - Now you can drag and drop the headings with the data that you want into the field below. For my query, I selected the names of Harbison's 203 TFs and then dragged down Neymotin's Systematic names as well as the calculated degradation rate to see if any genes were missing.
  - Now that the field is full, click Run to run your query.
4. A table should appear now with the data you wanted beside the heading - for me, I have the Systematic names paired together and their corresponding degradation rates in the column beside them.

- I also chose to include the calculated degradation rates from Neymotin's data in that query as well
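
The spreadsheet and Access steps above can be sketched in Python as a rough cross-check. This is an illustration under stated assumptions, not the procedure actually used: the gene identifiers and half-life values are hypothetical placeholders (not values from Table S5), and the function names are invented for this example. The join keeps every name on the TF list and attaches a rate only where the systematic names match, which mirrors the "Include ALL records from 'Harbison 203'" option.

```python
import math
from statistics import median

# Hypothetical half-lives keyed by systematic name; None marks missing data.
# These numbers are placeholders, not values from Neymotin et al.'s Table S5.
half_lives = {
    "GENE_A": 10.0,
    "GENE_B": 4.0,
    "GENE_C": None,
    "GENE_D": 20.0,
}


def degradation_rates(half_lives):
    """Apply ln(0.5) / half-life, substituting the median where data are missing."""
    known = [t for t in half_lives.values() if t is not None]
    median_half_life = median(known)  # plays the role of Excel's =MEDIAN(...)
    rates = {}
    for gene, t_half in half_lives.items():
        used = t_half if t_half is not None else median_half_life
        rates[gene] = math.log(0.5) / used
    return rates


def left_join(tf_list, rates):
    """Keep every name in tf_list; the rate is None where no match exists."""
    return [(gene, rates.get(gene)) for gene in tf_list]


if __name__ == "__main__":
    rates = degradation_rates(half_lives)
    # "GENE_Z" stands in for a TF that is absent from the half-life table.
    for gene, rate in left_join(["GENE_A", "GENE_C", "GENE_Z"], rates):
        print(gene, "missing" if rate is None else round(rate, 4))
```

Because ln(0.5) is negative, this formula yields negative numbers whose magnitude equals ln(2) divided by the half-life. Whether the sign is kept or dropped depends on how the input sheet expects the rate to be entered.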

I also revised my abstract and sent that to Dr. Dahlquist for review.

September 28, 2016

Today, I worked on some of the TRACE documentation (Numbers 5 & 6). For No. 5, I noted that it required descriptions of how coding was tested and implemented as well as software design. For those portions of the Implementation Verification (No. 5), I will have Eddie from the Coding team help me.

I finished edits of my Honors Thesis abstract and submitted it.

After talking with Dr. Dahlquist, Brandon and I will formulate the standard for the input sheets needed for the lab. The process includes:

Using the genes from last year's GRNs
Plugging those genes into YEASTRACT to get the most up-to-date connections with those genes
Uploading the matrix into GRNSight to make sure that all the genes in the GRN are connected to each other
Creating the input sheets from scratch with the new degradation rates I computed, estimated production rates, and the expression rates from the microarray data
- For any missing data points, it was decided that the average of expression levels from all available time points will be used
- To keep track of the missing time points, the filled-in cells will be highlighted in a different color so that, once GRNmap can execute with missing values, the filled-in data can be removed easily
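
A minimal sketch of the fill-in rule for missing data points, under one assumed reading: each missing value is replaced by the mean of all available values for that same gene. The function name and data layout are hypothetical, and the returned list of filled positions stands in for the cell highlighting, so that the filled-in values can be located and removed later.

```python
from statistics import mean


def fill_missing(expression):
    """Replace None with the gene's mean over available values.

    Returns the filled table and a list of (gene, column index) pairs
    recording which cells were filled in.
    """
    filled = {}
    filled_cells = []
    for gene, values in expression.items():
        available = [v for v in values if v is not None]
        if not available:
            # Nothing to average for this gene; leave the row untouched.
            filled[gene] = list(values)
            continue
        row_mean = mean(available)
        new_row = []
        for index, value in enumerate(values):
            if value is None:
                new_row.append(row_mean)
                filled_cells.append((gene, index))
            else:
                new_row.append(value)
        filled[gene] = new_row
    return filled, filled_cells


if __name__ == "__main__":
    # Placeholder log expression values, not real microarray data.
    table = {"GENE_A": [1.0, None, 3.0], "GENE_B": [0.5, 0.7, 0.9]}
    filled, cells = fill_missing(table)
    print(filled["GENE_A"], cells)
```

Keeping the filled positions in a separate list, rather than relying only on cell color, makes it straightforward to strip the substituted values once missing data are supported.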

October 2016

October 5, 2016

Today, I created the input sheets for the two strains that I have, wild-type and dCIN5, from Kayla Jackson's file. The protocol can be found on the Dahlquist GitHub repository.

To obtain the degradation rates and the log expression data for each strain, the Access protocol above was used. The data from one workbook were paired with the existing log expression data in the other workbook so that only genes in the network had their expression noted.

October 12, 2016

Today, I worked on a lot of documentation and cleaning up my various files that I have shared.

The first thing I updated was the protocol for obtaining the file with the degradation rates and the calculations that I did from the half-lives.
I updated the wiki (GitHub) with the newer protocol, which still has to be reviewed by Dr. D.
I tweaked a few files that I've uploaded to the Dahlquist Repository on GitHub

The files that I edited are the following:

wt_NEW_Input_16_Node; I changed the optimization parameters sheet to add the headings 'optimization_parameter' and 'value'
- I then updated the values for the optimization parameters, i.e. alpha and MaxIter. These values can be found under Step 11: GRNmap on Dr. Dahlquist's Microarray Data Workflow.
dCIN5_NEW_KJ_15_Node; again, I changed the optimization parameters sheet to add the headings 'optimization_parameter' and 'value' and updated the parameters' values according to the workflow mentioned above
Neymotin_Williams_TF_Comparison; I added an additional sheet for the rounded values that will be used for the degradation rates of the input sheets.

I reviewed Brandon's input sheets while he reviewed mine. Because I didn't know how thoroughly I should review his data, I started with the compilation of the log expression data. After that, I will average the numbers for the missing data to check that our calculations agree.

I had to recalculate and re-upload the degradation rates. Instead of taking the median of the TFs from Harbison's list, I took the median of all the genes on Neymotin's list. The median calculated from all of the genes in Neymotin's data was 10.2, compared to the median from the TFs in Harbison's list, which was 7.

October 19, 2016

I had to re-format and re-upload my old input sheets. The files on the Dahlquist Repository did not have the data from all the strains, which was requested. Further, I updated the degradation rates as well as the production rates for the files. I continued with the formatting of the cells, using 11-point Arial.

I then focused on trying to relocate the sources for the degradation rates that I found earlier this semester. The sites were bookmarked on my computer; however, an issue with my laptop wiped my most recently bookmarked websites. Unfortunately, I wasn't able to find the specific sources I had earlier.

October 26, 2016

Updated wt & cin5 input sheets with dHMO1 log fold expression and also changed the format such that the labels weren't capitalized

dCIN5_log2_expression --> dcin5_log2_expression

I ran a GRNmap test of the wt data and the Testing Report can be found here.

Documents

Summer 2015

To view the most updated powerpoint click here
To see the input sheet that was run for the fixed b trial, please click this link
To view the output file from this fixed b trial, click here
To see the input sheet that was run from the estimated b, please click this
To view the output file from the estimated b, click here
The powerpoint that reviews and analyzes the outputs can be viewed here

GRNmap Testings

This is the template for future reports: GRNmap Testing Report
GRNmap Testing Report: Strain Run Comparisons 2015-05-27
GRNmap Testing Report: Non-1 Initial Weight Guesses 2015-05-28
