Harvard:Biophysics 101/2007/Notebook:Xiaodi Wu/2007-5-3

From OpenWetWare


Revision as of 02:04, 3 May 2007

A proposal

Having acquired a hard-to-find but now easily updatable source for local data on genes and their loci, we can now work around the problem of not being able to read images off of BLAST output. All we need from BLAST is a locus, which Katie's script retrieves elegantly. Then, with this data, we can:

  • Query our local database for the genes in the locus (a simple MySQL query, and a very quick one, I hope)
  • Find the reference sequence (we also have a local copy of the genome, and know the exact locus--from which bp to which bp--from the blast query)
  • Compare (we've all written scripts for that)
  • Translate into protein if it is a coding sequence; otherwise, come up with some other way of expressing the change
  • Output the mutations (we also get an alignment back from blast...this is wonderful) using OMIM's notation, like this: [BRIP1, MET299ILE] (basically, [{gene name}, {amino acid}{position}{amino acid}]) and search in OMIM (we already have code for that, obviously)
  • Reap the benefits! (Also, compare it to dbSNP data.)
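
A minimal sketch of the lookup, translate, and notation steps above, to make the plan concrete. Everything named here is an assumption for illustration: the `genes` table and its columns (`name`, `chrom`, `start_bp`, `end_bp`) are hypothetical, Python's built-in `sqlite3` stands in for our MySQL database (a MySQL driver would typically use `%s` placeholders instead of `?`), and the comparison assumes in-frame, forward-strand coding sequences of equal length with substitutions only (no indels, no ambiguous bases).

```python
import sqlite3

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO)
)

# One-letter to upper-case three-letter codes, as in [BRIP1, MET299ILE].
THREE_LETTER = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
    "*": "TER",
}


def genes_in_locus(conn, chrom, start, end):
    """Names of genes overlapping chrom:start-end (hypothetical schema)."""
    rows = conn.execute(
        "SELECT name FROM genes "
        "WHERE chrom = ? AND start_bp <= ? AND end_bp >= ? "
        "ORDER BY start_bp",
        (chrom, end, start),
    ).fetchall()
    return [name for (name,) in rows]


def translate(cds):
    """Translate a coding sequence, ignoring any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, usable, 3))


def omim_mutations(gene, ref_cds, sample_cds):
    """List amino acid substitutions as [{gene}, {aa}{position}{aa}]."""
    ref_protein = translate(ref_cds.upper())
    sample_protein = translate(sample_cds.upper())
    return [
        "[%s, %s%d%s]" % (gene, THREE_LETTER[r], pos, THREE_LETTER[s])
        for pos, (r, s) in enumerate(zip(ref_protein, sample_protein), start=1)
        if r != s
    ]


if __name__ == "__main__":
    # Toy data only; GENE1 and GENE2 are made-up names and coordinates.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genes "
        "(name TEXT, chrom TEXT, start_bp INTEGER, end_bp INTEGER)"
    )
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?, ?, ?)",
        [("GENE1", "chr1", 100, 500), ("GENE2", "chr1", 900, 1200)],
    )
    print(genes_in_locus(conn, "chr1", 450, 600))
    print(omim_mutations("GENE1", "ATGGCCATG", "ATGGCCATA"))
```

The overlap test (`start_bp <= end AND end_bp >= start`) also catches genes that only partly fall inside the BLAST hit. The strings returned by `omim_mutations` are meant to be passed straight to our existing OMIM search code.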

Does this sound like a clear and workable plan to people? Are there other considerations to be factored in?

A request

Related to what Katie asked for in class: I've focused a lot on the first few stages, getting from sequence to OMIM. Regarding the 'reaping the benefits' part above, could people who've worked on the subsequent steps outline on their wiki pages a step-by-step account of what happens to this data after OMIM and what sort of results we get, just as we started doing in class, and then link to their pages from the class tasks list page? There's a lot of code and work that's evidently been poured into this effort, but it's still somewhat unclear to me what exactly it all does...

A random thought

How is the idea of putting bioinformatics data into a database and then running a query on the database considered interesting enough for a paper in 2005? How?


Very good question!! For an example of less-than-competent publications, check out the 'application notes' section of this journal. Someone analyzed how many of the applications were still functioning after one year, after two years, etc., and the results are dismal. Here is the link:

http://www.ghastlyfop.com/blog/2007/03/software-availabilty-quick-survey-using.html

Also, note that the article you linked to seems not to have been cited by anyone, and no one seems to be doing anything with it. Although citations can be a crude method for analyzing impact, in this case the measure seems to be accurate: zero impact.

Progress update

Completed the HTML interface, etc., accessible here: http://snp.med.harvard.edu/rosencrantz/
