Talk:Harvard:Biophysics 101/2009/Project: Difference between revisions

Revision as of 15:02, 7 October 2009

Assignment 4.

Phenomenal Pheno-matic

Our (Anugraha's [[http://openwetware.org/wiki/User_talk:Anugraha_Raman]] and Kelly's [[http://openwetware.org/wiki/User_talk:Kelly_Brock]]) idea for the class project is to create a computational, array-style screen that would allow us to discover new genotype-phenotype connections as well as check links that have already been documented. To do this, we would need access to the genomic data from the Personal Genome Project (PGP), as well as the test subjects' recorded phenotypes. We would then scan all of the PGP data and search for genes that turn up in multiple individuals with a certain phenotype. We could expand this to test all possible ORFs and all possible genotypes, in an approach analogous to running a DNA microarray experiment in wet-lab biology. The result would be a list of genes that are overrepresented in individuals expressing a certain phenotype. We would then use OMIM [[http://www.ncbi.nlm.nih.gov/omim/]], GeneTests [[http://www.ncbi.nlm.nih.gov/sites/GeneTests/?db=GeneTests]], and SNPedia [[http://www.snpedia.com/index.php/SNPedia]] to see whether any of those genes are already documented. All three databases might have an API that would make this part of the program easier to run; otherwise, we could look into searching the online databases with some sort of web crawler. Filtering out the documented genes would let us focus on generating novel hypotheses about which genes might be linked to which phenotypes.
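The screening step described above could be sketched roughly as follows. This is only a toy illustration under stated assumptions: the `genotypes` and `phenotypes` dictionaries, the `gene:variant` identifier format, the thresholds, and the `documented_genes` set (a local stand-in for whatever OMIM, GeneTests, or SNPedia lookups turn out to be possible) are all hypothetical and do not reflect the actual PGP data format or any database API.

```python
from collections import Counter


def overrepresented_variants(genotypes, phenotypes, trait,
                             min_carriers=2, min_ratio=2.0):
    """Return variants that are more frequent in people with `trait`.

    genotypes:  dict person -> set of variant ids (hypothetical format)
    phenotypes: dict person -> set of trait names (hypothetical format)
    """
    cases = [p for p in genotypes if trait in phenotypes.get(p, set())]
    controls = [p for p in genotypes if trait not in phenotypes.get(p, set())]
    case_counts = Counter(v for p in cases for v in genotypes[p])
    control_counts = Counter(v for p in controls for v in genotypes[p])

    hits = []
    for variant, n_case in case_counts.items():
        if n_case < min_carriers:
            continue  # must be found in multiple cases of the phenotype
        case_freq = n_case / len(cases)
        # Add-one smoothing keeps the ratio finite when no control
        # carries the variant.
        control_freq = (control_counts[variant] + 1) / (len(controls) + 1)
        ratio = case_freq / control_freq
        if ratio >= min_ratio:
            hits.append((variant, n_case, round(ratio, 2)))
    # Strongest enrichment first, ties broken by variant id.
    return sorted(hits, key=lambda h: (-h[2], h[0]))


def split_known_and_novel(hits, documented_genes):
    """Separate hits whose gene is already documented from novel ones."""
    known = [h for h in hits if h[0].split(":")[0] in documented_genes]
    novel = [h for h in hits if h[0].split(":")[0] not in documented_genes]
    return known, novel


# Toy data, invented purely for illustration.
genotypes = {
    "P1": {"geneA:v1", "geneB:v2"},
    "P2": {"geneA:v1", "geneC:v1"},
    "P3": {"geneB:v2"},
    "P4": {"geneC:v1"},
}
phenotypes = {"P1": {"trait_x"}, "P2": {"trait_x"}, "P3": set(), "P4": set()}

hits = overrepresented_variants(genotypes, phenotypes, "trait_x")
known, novel = split_known_and_novel(hits, documented_genes={"geneB"})
print(hits, known, novel)
```

A real version would need a proper statistical test and multiple-testing correction rather than a raw frequency ratio; the ratio is used here only to keep the sketch short.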

To show that our method works, we would first take PGP test subjects with known phenotypes and genotypes, and pick a gene that is widely documented, in OMIM and in the primary research literature, as producing a certain phenotype. We would then run a sequence alignment of the reference sequence from OMIM against the genotype of a PGP participant known to have that phenotype, and see whether the sequence is present. We would also double-check by entering this sequence into SNPedia to see whether it returns the known phenotype. If both checks succeed, we would have shown that the method recovers a known association, which supports using it to look for new phenotype-genotype associations. The program would then be developed for beta release.
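As a rough sketch of the validation check described above, the snippet below slides a reference sequence along each participant's sequence and reports the best ungapped placement. It is a deliberately simplified stand-in for a real sequence alignment (no gaps, no scoring matrix), and the function names, the `max_mismatches` threshold, and the toy sequences are assumptions made for illustration only.

```python
def best_ungapped_match(reference, genome):
    """Return (offset, mismatches) for the best ungapped placement of
    `reference` in `genome`, or None if it cannot fit."""
    if not reference or len(reference) > len(genome):
        return None
    best = None
    for offset in range(len(genome) - len(reference) + 1):
        window = genome[offset:offset + len(reference)]
        mismatches = sum(a != b for a, b in zip(reference, window))
        if best is None or mismatches < best[1]:
            best = (offset, mismatches)
    return best


def method_recovers_known_link(reference, genomes_with_trait, max_mismatches=0):
    """For each person known to have the trait, report whether the
    reference sequence is present within `max_mismatches`."""
    results = {}
    for person, genome in genomes_with_trait.items():
        match = best_ungapped_match(reference, genome)
        results[person] = match is not None and match[1] <= max_mismatches
    return results


# Toy sequences, invented for illustration.
reference = "GATTACA"
genomes_with_trait = {"P1": "CCGATTACATT", "P2": "CCGATCACATT"}
print(method_recovers_known_link(reference, genomes_with_trait))
```

In practice an established alignment tool would replace `best_ungapped_match`; the point here is only the shape of the check: known gene, known phenotype, confirm presence.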

As a side note, such a program could take on a variety of tangential functions. For example, we could extend it to look for polygenic traits. In principle, our algorithm would already have identified the genes that are overrepresented for a phenotype. We could then add code to look for cases in which the phenotype appears without all of the expected genes being present. This could help identify the true contributors, expanding the program's potential and driving it towards the frontier of systems biology - data-driven research.
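The polygenic check mentioned above might look something like this minimal sketch. It assumes a hypothetical `genotypes` dictionary mapping each person to the set of genes flagged in them, a list of `cases` who show the phenotype, and a list of `expected` genes from the earlier screen; none of these names come from an existing tool.

```python
def polygenic_check(genotypes, cases, expected):
    """Find cases that show the phenotype without every expected gene,
    and the expected genes shared by all cases.

    Returns (missing_by_person, shared_genes).
    """
    expected = set(expected)
    missing_by_person = {}
    for person in cases:
        missing = expected - genotypes[person]
        if missing:
            missing_by_person[person] = sorted(missing)

    # Genes present in every case are the strongest candidates for
    # being true contributors.
    shared = set(expected)
    for person in cases:
        shared &= genotypes[person]
    return missing_by_person, sorted(shared)


# Toy data, invented for illustration.
genotypes = {
    "P1": {"geneA", "geneB", "geneC"},
    "P2": {"geneA", "geneC"},
    "P3": {"geneA", "geneB"},
}
missing, shared = polygenic_check(
    genotypes, cases=["P1", "P2", "P3"], expected=["geneA", "geneB", "geneC"]
)
print(missing, shared)
```

With the toy data, `geneA` is the only gene carried by every case, while P2 and P3 each show the phenotype despite lacking one expected gene.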

To infinity - and Human 2.0 - and beyond!

Kelly and Anugraha
