User:The Biology Group: Difference between revisions

From OpenWetWare

Revision as of 01:58, 9 December 2009

Hey! To make things easier and more accessible, we have moved all discussion surrounding the "Biology Aspect" of our potential project to this page.

Enjoy!

Contact Information

  • Ridhi Tariyal (ridhitariyal@gmail.com)
  • Jackie Nkuebe (jnnkuebe@fas.harvard.edu)
  • Joseph Torella (jtorella@gmail.com)
  • Anugraha Raman (amraman@fas.harvard.edu)
  • Anna Turetsky (turetsky@fas.harvard.edu)

Group Description

1. Coming up with concrete examples of epistatic interactions in polygenic traits.

2. Developing a method (in conjunction with the math modeling group) to accurately predict the correlation between disease risk and SNP "hierarchy" in human disease.

3. Determining the feasibility of given models.

Final Progress

Relevant Literature

Eye color and the Prediction of Complex Phenotypes from Genotypes

Pigmentation Paper and accompanying SNP Spreadsheet

SNPs important for Eye Color determination

Approximately 34 SNPs are listed here, but studies have narrowed these down to 6 important ones. We hope that the mathematical models can do the same. The rsIDs, chromosome locations, genes, and alleles are all listed in this spreadsheet.

Aggregate GWAS Studies Spreadsheet (See tool portion for more information)

LDL and Cholesterol GWAS study and accompanying SNP Spreadsheet

Presentations

Final Presentation

Intermediate Eye-Color Presentation

Tool for finding SNPs and Relevant Literature

Formulated Data Set

We attempted to obtain genomes of actual people accompanied by phenotypic data, but we were unable to do so. The author of the eye color study told us that if we wrote a proposal we would most likely be able to obtain this data (see the Rotterdam Proposal section).

To aid the modeling group, we created data sets. The data set shown below is one for eye color. Taking 20 individuals with twelve different SNPs important to eye color, we created a genotypic matrix. After meeting with the modeling group, we learned that they wanted to deal with a binary system of zeroes and ones that would turn even continuous traits into seemingly "binary" traits, since this would be easier to model. For example, eye color would be a continuous trait, since in our model we use SNPs with genotypes that yield blue, intermediate, or brown eyes. For each SNP, the individual could be homozygous dominant, heterozygous, or homozygous recessive. To accommodate all three cases, we listed two categories for each SNP: homozygous dominant and heterozygous. We marked 1 if the individual had that genotype and 0 if not. A one in either category identifies that genotype directly, and by process of elimination two zeroes indicate homozygous recessive, so the binary system accommodates all three possibilities.
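The two-column encoding described above can be sketched in Python. This is only an illustration: the function names and the genotype labels used as strings are hypothetical, and the column order (homozygous dominant first, heterozygous second) is an assumption rather than the layout of the actual spreadsheets.

```python
# Illustrative sketch only: names and column order are assumptions,
# not the layout of the group's spreadsheets.
GENOTYPES = ("homozygous dominant", "heterozygous", "homozygous recessive")


def encode_snp(genotype):
    """Return (is_homozygous_dominant, is_heterozygous) as 0/1 flags."""
    if genotype not in GENOTYPES:
        raise ValueError(f"unknown genotype: {genotype!r}")
    return (int(genotype == "homozygous dominant"),
            int(genotype == "heterozygous"))


def decode_snp(hom_dom, het):
    """Recover the genotype from the two flags.

    Two zeroes mean homozygous recessive (process of elimination).
    """
    if hom_dom and het:
        raise ValueError("a SNP cannot be flagged in both categories")
    if hom_dom:
        return "homozygous dominant"
    if het:
        return "heterozygous"
    return "homozygous recessive"


def encode_individual(genotypes):
    """Flatten one individual's genotypes into one binary row (2 columns per SNP)."""
    row = []
    for genotype in genotypes:
        row.extend(encode_snp(genotype))
    return row
```

With twelve SNPs, each individual becomes a row of 24 zeroes and ones, and 20 such rows give the genotypic matrix.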

We then looked at the phenotypes yielded by each row: in this case blue, brown, or intermediate. We then devised simple mathematical rules using all SNPs in question to come up with values. Ranges of values were then correlated with different phenotypes. The goal of the modeling team is to see if they can generate these rules using their model.

Below are 3 separately generated data sets (3 different randomly generated matrices) that each use two rules as models:

(0-4) = Blue, (5-12) = Intermediate, (13-19) = Brown
  1. (0.5*homozygous recessive SNP1 + 2*homozygous recessive SNP3 + 3*heterozygous SNP6 + 12*heterozygous SNP10)
  2. (0.67*heterozygous SNP2 + 1.5*homozygous recessive SNP4 + 5*homozygous recessive SNP7 + 4*heterozygous SNP9 + 0.4*homozygous recessive SNP11)
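As a rough sketch of how the two rules could be evaluated, the Python below scores one individual under each rule and maps the value to a phenotype. Several things here are assumptions and not taken from the data sets: the dictionary representation, the function names, scoring each rule separately rather than combining them, and the handling of fractional values that fall between the listed ranges (for example 4.5), which this sketch assigns to the lower range.

```python
# Weights copied from the two rules; keys are (SNP number, genotype).
RULE_1 = {
    (1, "homozygous recessive"): 0.5,
    (3, "homozygous recessive"): 2.0,
    (6, "heterozygous"): 3.0,
    (10, "heterozygous"): 12.0,
}
RULE_2 = {
    (2, "heterozygous"): 0.67,
    (4, "homozygous recessive"): 1.5,
    (7, "homozygous recessive"): 5.0,
    (9, "heterozygous"): 4.0,
    (11, "homozygous recessive"): 0.4,
}


def score(genotypes, rule):
    """Sum the weights of every rule term the individual satisfies.

    genotypes maps SNP number -> genotype string (hypothetical layout).
    """
    return sum(weight for (snp, state), weight in rule.items()
               if genotypes.get(snp) == state)


def phenotype(value):
    """Map a rule value to a phenotype using the ranges above.

    ASSUMPTION: values between the listed integer ranges (e.g. 4.5)
    are assigned to the lower range; the source does not say.
    """
    if value < 5:
        return "Blue"
    if value < 13:
        return "Intermediate"
    return "Brown"
```

For example, an individual who is heterozygous at SNP6 and SNP10 scores 15 under rule 1, which falls in the Brown range.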

Data Set 1

Data Set 2

Data Set 3

Rotterdam Proposal and Other Outreach Efforts

Ridhi contacted Drs. Kayser and Liu, authors of a paper on eye color and the prediction of complex phenotypes from genotypes. When asked how to obtain an anonymous yet real data set with genotypic and corresponding phenotypic data for the purpose of testing statistical models, they told us to write a proposal to the management team of the Rotterdam Study. However, they indicated that researchers requesting the data are usually expected to meet certain requirements before it can be released, so there is a possibility that we would not be given this data set. Dr. Liu stressed that creating dummy data would not be straightforward due to linkage disequilibrium, and he suggested downloading HapMap data and creating phenotypes based on genotypes at specific loci.

Here is the proposal that was sent to the management team.

Ridhi also tried contacting Dr. Shriver for a real data set, but we have yet to receive one.

Jacqui contacted Amy Carmargo at the Broad Institute. She works on the genotyping, sequencing and haplotype determination of select candidate genes. Her paper "Association of genetic variants in KCNH2 with QT interval duration in the Framingham Heart Study" was of particular interest to us because this study had good documentation of the SNP Genotypes and Echocardiographic Phenotypes. We wanted to see if we could get a real data set from this study to test our model with.

Once it became clear that the next best thing to having the corresponding data sets from these studies would be to download HapMap data, Jacqui downloaded HaploView and was able to view data on SNPs for eye color.

In class Professor Church had mentioned the problem of chromosome location standardization. Since documentation has not been standardized, locations that are reported differently in different studies correlating SNPs with phenotypes could actually refer to the same chromosomal location. To address issues related to this, we contacted Bruce Birren, who works on genome-wide mapping and sequencing programs in humans and directed sequencing projects for microbes at the Broad Institute.

Trait-o-matic add-ons

We thought that it would be very useful if one could type in a particular SNP location and get a listing of all of the genotypes for that location for everyone in the trait-o-matic database. This would act as a first step for building a model that tested the association between SNPs and phenotypic expressions based on research. This tool was then implemented in trait-o-matic by the infrastructure group.

We also thought it would be interesting if trait-o-matic would allow us to search for SNPs that show a high minor allele frequency, and to then look for which ethnicities have the greatest variation for that SNP. To further this idea, it would be useful to be able to pick a region in the genome (based on characteristics that this region is generally known to modulate) and see a matrix revealing different allele frequencies by ethnicity. The tool above expands on the latter idea, but both ideas have yet to be implemented in trait-o-matic.
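To make the minor allele frequency idea concrete, here is a small standalone Python sketch. It is not trait-o-matic code and does not use its database: the function names, the two-letter genotype strings, and the population-to-genotypes dictionary are all hypothetical, and the SNP is assumed to have at most a few alleles, with the minor allele taken to be the second most common one.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Return {allele: frequency} for a list of two-letter genotypes like 'AG'."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: count / total for allele, count in counts.items()}


def minor_allele_frequency(genotypes):
    """Frequency of the second most common allele (0.0 if only one allele is seen)."""
    ranked = sorted(allele_frequencies(genotypes).values(), reverse=True)
    return ranked[1] if len(ranked) > 1 else 0.0


def maf_by_population(genotypes_by_population):
    """Map each population label to the minor allele frequency at one SNP."""
    return {population: minor_allele_frequency(genotypes)
            for population, genotypes in genotypes_by_population.items()}
```

Repeating `maf_by_population` over every SNP in a chosen region would give one row per SNP and one column per population, which is the kind of matrix described above.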


Future Directions and Cool Applications (Wish List)