User talk:Brett Thomas

From OpenWetWare

Revision as of 09:57, 13 October 2009

October 13th: My Thoughts On The Project

High Level: I keep returning to the question of how we can use artificial intelligence and input from the user community to make a gene expression tool self-learning. I think this would add a layer of analysis that traditional gene databases are missing.

The Problem(s): The central mechanism that each of these tools is trying to improve is the following: 1) identify a person's genes; 2) identify what non-genomic information can affect how those genes are expressed; 3) give the person the probabilities of future outcomes. The resources we've seen seem to map step 1) directly to step 3) and, on the whole, do a poor job of accounting for step 2).

This is actually a combination of two problems. The first is that the information is not personalized enough. Companies like 23andMe and Navigenics provide the diagnostic tool described above, but they don't seem to ask a person about their lifestyle. They give a bucket of information of the form: "You have variant 1 of 3 of this gene. If you have lifestyle A, you'll be susceptible to outcome X; if you have lifestyle B, you'll be susceptible to outcome Y." It would be better to provide targeted information of the form: "[First ask the user about their lifestyle] -> Since you have lifestyle B, you'll be susceptible to outcome Y. If you switch to lifestyle A, you'll move to outcome [somewhere between X and Y]." This is a subtle difference, and in these simple examples it doesn't seem important. But as research improves and environmental information becomes more targeted, I expect users to demand increasingly targeted information.
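
To make the "bucket" versus "targeted" distinction concrete, here is a minimal sketch. The risk table, genotype labels, lifestyles, and outcomes are hypothetical placeholders mirroring the example above, not real data or any company's actual report format; the only point is that the targeted report asks for the lifestyle first and then reports the matching outcome plus what switching would change.

```python
# Hypothetical risk table: (genotype, lifestyle) -> outcome.
# All labels are placeholders for the "variant 1 of 3", "lifestyle A/B",
# "outcome X/Y" example in the text; they are not real associations.
RISK_TABLE = {
    ("variant_1", "lifestyle_A"): "outcome_X",
    ("variant_1", "lifestyle_B"): "outcome_Y",
}


def bucket_report(genotype: str) -> dict[str, str]:
    """Untargeted: return every lifestyle -> outcome pair for this genotype."""
    return {life: out for (g, life), out in RISK_TABLE.items() if g == genotype}


def targeted_report(genotype: str, lifestyle: str) -> str:
    """Targeted: use the user's stated lifestyle, report the matching outcome,
    then describe the effect of switching to each alternative lifestyle."""
    current = RISK_TABLE[(genotype, lifestyle)]
    lines = [f"Since you have {lifestyle}, you are susceptible to {current}."]
    for life, out in bucket_report(genotype).items():
        if life != lifestyle:
            lines.append(f"If you switch to {life}, you move toward {out}.")
    return " ".join(lines)
```

For example, `targeted_report("variant_1", "lifestyle_B")` returns a single sentence pair about lifestyle B and the effect of switching to A, while `bucket_report("variant_1")` returns both branches and leaves the user to find their own.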

How can research improve to make this information more targeted? That question reveals the second problem: analyzing observational data is crucial. There seem to be two ways to work out how genes and environment determine an outcome: 1) do lab research to uncover the chemical mechanism at work in the cell; or 2) do a population study to find a correlation and then try to establish cause and effect. To me, (2) seems much more promising in the near term. But the academic research method is insufficient: it'd take decades to acquire all the information we want through
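
As a minimal sketch of option (2), assuming a simple carrier/non-carrier split and hypothetical counts, the odds ratio below quantifies the kind of correlation a population study starts from. The confidence interval uses Woolf's log method, which is one standard choice among several. This measures association only; establishing cause and effect would still require ruling out confounders such as lifestyle.

```python
import math


def odds_ratio(carrier_trait: int, carrier_no_trait: int,
               noncarrier_trait: int, noncarrier_no_trait: int):
    """Odds ratio for having a trait among carriers vs. non-carriers of a variant,
    with an approximate 95% confidence interval (Woolf's log method).
    Assumes all four cell counts are nonzero."""
    a, b = carrier_trait, carrier_no_trait
    c, d = noncarrier_trait, noncarrier_no_trait
    ratio = (a * d) / (b * c)
    # Standard error of log(OR) under Woolf's approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se)
    high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, (low, high)
```

With hypothetical counts of 30 of 100 carriers and 15 of 100 non-carriers reporting the trait, `odds_ratio(30, 70, 15, 85)` gives an odds ratio of about 2.4; an interval that excludes 1 suggests an association worth following up, not a mechanism.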

The Solution(s): So the two problems are: 1) we need more targeted information; and 2) we need the expression engine to be self-learning. I propose that these two problems can be solved together.

Relationships to Current Resources:

Asst. 3: Project Ideas

The concept: One common theme in the resources we've looked at is linking DNA data to personal data. One lesson I took from the last discussion was that dynamic data collection in the PGP would allow an important new layer of data analysis. I returned to this idea after looking at the gene identifier sites assigned this week: they were trying to link traits and genes by working around what I see as the most basic way to do this, which is looking at people's genes and then asking them whether they have a certain trait.

Within the Personal Genome Project, I think such a mechanism would work as follows: researchers propose that a certain gene is associated with a certain trait, pose the question so that the answers map to a discrete data set, and then send the question to a targeted set of PGP-ers to collect responses.
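
The targeting step can be sketched as follows. The `Participant` and `Question` classes, their fields, and the gene/variant labels are hypothetical illustrations, not part of any existing PGP system. A question carries a fixed set of answer choices (the "discrete data set") and names the gene and variants whose carriers should receive it; listing more than one variant lets researchers include a comparison group.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    participant_id: str
    genotypes: dict[str, str]  # hypothetical: gene -> variant, e.g. {"GENE1": "variant_1"}


@dataclass
class Question:
    text: str
    choices: tuple[str, ...]   # discrete answer set, so responses can be tabulated
    gene: str                  # gene the researchers hypothesize is linked to the trait
    variants: frozenset[str]   # variants of that gene whose carriers should be asked


def target_participants(question: Question, participants: list[Participant]) -> list[Participant]:
    """Select participants whose genotype at question.gene is one of the targeted variants."""
    return [p for p in participants if p.genotypes.get(question.gene) in question.variants]


def validate_answer(question: Question, answer: str) -> str:
    """Reject answers outside the fixed choices so every response maps onto the data set."""
    if answer not in question.choices:
        raise ValueError(f"answer must be one of {question.choices}")
    return answer
```

Participants with no recorded genotype for the gene are simply skipped, since `dict.get` returns `None` for them.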

Implementation: I think this could be implemented as an extension to the PGP site. I think it would take three pieces of infrastructure:

  • Researcher-facing engine: a platform that lets researchers create questionnaires and specify which users receive them. It would also email users to say "we want to ask you another question."
  • User-facing site: a secure site where users log in to answer questions quickly. This could be an app on the PGP site or a standalone site, depending on which aligns with the current rules.
  • Data: an extension to the current PGP data storage system to store the collected data. This could be integrated directly or kept as a separate relational database with linked tables.
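
For the data piece, a separate relational database with linked tables might look like the sketch below, using Python's built-in `sqlite3` module. The table and column names are hypothetical; the actual PGP storage schema is not described here. The join at the end shows the payoff of linking responses to genotypes: answers to one question can be counted per variant.

```python
import sqlite3

# Hypothetical schema: participants, their genotypes, questions, and responses.
SCHEMA = """
CREATE TABLE participant (participant_id TEXT PRIMARY KEY);
CREATE TABLE genotype (
    participant_id TEXT REFERENCES participant(participant_id),
    gene TEXT NOT NULL,
    variant TEXT NOT NULL,
    PRIMARY KEY (participant_id, gene)
);
CREATE TABLE question (
    question_id INTEGER PRIMARY KEY,
    text TEXT NOT NULL
);
CREATE TABLE response (
    question_id INTEGER REFERENCES question(question_id),
    participant_id TEXT REFERENCES participant(participant_id),
    answer TEXT NOT NULL,
    PRIMARY KEY (question_id, participant_id)
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the linked tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def answers_by_variant(conn: sqlite3.Connection, question_id: int, gene: str):
    """Join responses to genotypes and count each (variant, answer) pair for one question."""
    return conn.execute(
        """
        SELECT g.variant, r.answer, COUNT(*)
        FROM response r
        JOIN genotype g ON g.participant_id = r.participant_id AND g.gene = ?
        WHERE r.question_id = ?
        GROUP BY g.variant, r.answer
        ORDER BY g.variant, r.answer
        """,
        (gene, question_id),
    ).fetchall()
```

Keeping genotypes and responses in separate tables means new questions never require schema changes; each one just adds rows to `question` and `response`.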

Notes:

  • Accounting for privacy: I think privacy would be the biggest obstacle, particularly if we allow data to be cross-tabulated, since many PGP-ers would be easy to identify uniquely.
  • What data to collect: I think the most important initial research would be to identify exactly what data researchers would want to collect. That work
  • API: I think a natural extension is to provide an API for the public to use. This would allow other gene sites to submit a (user, gene, trait) triplet. This would be an extensive undertaking, but may be worthwhile if such a service doesn't already exist.
  • Third-party platforms: Another thought is that we could use a third-party platform to create a quick app, such as Google Health, HealthVault, or Keas (the company I worked for this summer).
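
The API and privacy notes interact: if other sites submit (user, gene, trait) triplets, any published cross-tabulation must avoid singling out individuals. One common disclosure-control technique, swapped in here as an illustration rather than a complete privacy solution, is small-cell suppression: hide any count below a threshold. The threshold of 5 is an assumption, and suppression alone does not prevent re-identification when several tables are combined.

```python
def cross_tabulate(triplets, min_cell: int = 5) -> dict:
    """Count distinct users per (gene, trait) cell from (user, gene, trait) triplets.
    Cells with fewer than min_cell users are suppressed (reported as None),
    so rare gene/trait combinations cannot point to a specific person."""
    users_per_cell: dict[tuple[str, str], set[str]] = {}
    for user, gene, trait in triplets:
        # Using a set means a user who submits the same triplet twice counts once.
        users_per_cell.setdefault((gene, trait), set()).add(user)
    return {
        cell: (len(users) if len(users) >= min_cell else None)
        for cell, users in users_per_cell.items()
    }
```

Here the gene field is assumed to carry a variant label such as `"GENE1:v1"`; a real API would need to settle that format along with authentication and consent.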

Asst. 2: Modeling Gene Mutations

Here is a link to my code. It's pretty long (I did way too much), so I didn't want to clutter this page.

Asst. 1: Modeling Exponential Growth

I have some experience with Python and Excel, so the programming part of this assignment wasn't very time-consuming for me. I'm just going to throw out a few random notes:

  • The Model: I was actually pretty confused by this model. These functions are some variant of current population * constant factor. It seems like a more appropriate general model is current population raised to the power of a constant factor. I just realized this a couple of minutes ago; I'm sure I'll reconcile the difference before class.
  • Slide: Without thinking, I copied and pasted Dr. Church's functions into Excel as written. Then, when I did the coding, I took them to mean linear growth, with A2 representing the independent variable and A3 representing the output. This was dumb... and it made the Python coding about 10X harder too :)
  • Practicality: Once I actually understood what we were doing, I was able to analyze the biological component. In short: I really don't think exponential growth is a very practical model on either a population or an evolutionary scale.
  • Population: It seems that there have to be thousands of feedback loops when analyzing growth in a population. In the rabbit example, the true growth was probably only exponential for a short time before limited food imposed a negative feedback. On the other hand, if the first rabbits crowded out competitors, that would have caused a positive feedback. The more I think about such examples, the more I think that exponential growth is more a corner case than a model.
  • Evolution: Exponential growth makes even less sense to me for evolutionary progress, because it seems the evolution of a species would "conquer" the lowest-hanging fruit first. What I mean is that the most effective increases in brain size probably came first, and then brain evolution would become subject to diminishing returns. If brain size is an indicator of progress, this would contradict the hypothesis from Slide 10: one of them has to be wrong.
  • Evolution vs. Technology: Since I'm skeptical of the exponential model of evolution, the analogy to Moore's Law becomes more interesting. Why should evolutionary and technological innovation differ? I wish I had the time to give this more thought, and I hope we can discuss it in class today. One idea is that the pressures are different: transistor technology is measured absolutely, whereas in evolution a relative advantage is probably more important than an absolute one. Another is that it is more difficult for evolution to change the fundamental building blocks of a species, while Intel can easily switch from silicon to graphite transistors if they can be abstracted to the same old x86 standard.
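
One way to reconcile the "constant factor" question and the feedback point is the sketch below. Multiplying the current population by a constant factor r each step already gives exponential growth, because the closed form P[n] = P0 * r**n puts the step count, not the population, in the exponent. The logistic variant is a swapped-in textbook model, not one from the assignment: it shrinks the growth rate as the population approaches a hypothetical carrying capacity, a simple stand-in for the food-limited negative feedback in the rabbit example.

```python
def exponential(p0: float, r: float, steps: int) -> list[float]:
    """Iterate P[n+1] = r * P[n]: multiply by a constant factor each step."""
    pops = [p0]
    for _ in range(steps):
        pops.append(pops[-1] * r)
    return pops


def closed_form(p0: float, r: float, n: int) -> float:
    """Closed form of the same recurrence: P[n] = P0 * r**n."""
    return p0 * r ** n


def logistic(p0: float, r: float, capacity: float, steps: int) -> list[float]:
    """Discrete logistic growth: the per-step growth (r - 1) is scaled by
    (1 - P / capacity), so growth stalls as P nears the capacity."""
    pops = [p0]
    for _ in range(steps):
        p = pops[-1]
        pops.append(p + (r - 1) * p * (1 - p / capacity))
    return pops
```

Starting from 2 rabbits that double each step, `exponential(2, 2, 30)` passes two billion after 30 steps, while `logistic(2, 2, 1000, 30)` tracks it early on and then levels off near the assumed capacity of 1000.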