TChan/Notebook/2007-4-16

From OpenWetWare
Revision as of 20:19, 16 April 2007

New Plan

  • INPUT: a search-ready string naming a disease or an associated gene, e.g. 'BRCA1' or "Hashimoto's Thyroiditis"
  • OUTPUT: a list of lists, each holding 1) the base site name and 2) that site's search URL for the disease/gene
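The OUTPUT shape can be pinned down with a small check. This is a hypothetical helper, not part of the notebook code; it assumes Python 3 and uses urllib.parse.urlsplit only to confirm that each URL is an absolute http(s) address with a host.

```python
from urllib.parse import urlsplit


def is_valid_site_list(site_list):
    """Hypothetical check: True if site_list is a list of [site name, URL] pairs."""
    if not isinstance(site_list, list):
        return False
    for entry in site_list:
        # Each entry must be a two-item list: [name, url]
        if not (isinstance(entry, list) and len(entry) == 2):
            return False
        name, url = entry
        if not (isinstance(name, str) and name):
            return False
        if not isinstance(url, str):
            return False
        parts = urlsplit(url)
        # Require an absolute http(s) URL with a host name
        if parts.scheme not in ("http", "https") or not parts.netloc:
            return False
    return True


print(is_valid_site_list([["Wikipedia", "http://en.wikipedia.org/wiki/Breast_cancer"]]))  # True
print(is_valid_site_list([["Wikipedia"]]))  # False
```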

Sites to be Searched

  • General Patient Info
    • eMedicine
    • Google
    • Wikipedia
    • (WHO)
  • Less Patient-Friendly But Possibly Useful Info:
    • HapMap
    • OMIM
    • GeneCards

Tasks

  1. Parse the search term into each site's search-URL format
  2. Return each site's search URL with the parsed search term filled in
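Tasks 1 and 2 can also be written as one table of URL templates plus one encoder per site. This is a minimal sketch assuming Python 3; the names SITES and build_site_list are hypothetical, and it swaps the hand-written replace() chains for urllib.parse.quote and quote_plus, which also escape characters such as '&' that would otherwise break a query string. Only three sites are shown; their templates are copied from the Code section.

```python
from urllib.parse import quote, quote_plus

# Each entry: (site name, URL template, function that encodes the search term).
# Templates are copied from the Code section of this notebook.
SITES = [
    ("eMedicine",
     "http://www.emedicine.com/cgi-bin/foxweb.exe/searchengine@/em/searchengine"
     "?boolean=and&book=all&maxhits=40&HiddenURL=&query=%s",
     lambda term: quote(term.lower())),                         # spaces -> %20
    ("Google, general search",
     "http://www.google.com/search?hl=en&q=%s&btnG=Search",
     lambda term: quote_plus(term.lower())),                    # spaces -> +
    ("Wikipedia",
     "http://en.wikipedia.org/wiki/%s",
     lambda term: quote(term.capitalize().replace(" ", "_"))),  # spaces -> _
]


def build_site_list(search_term):
    """Return [[site name, search URL], ...] for every entry in SITES."""
    return [[name, template % encode(search_term)]
            for name, template, encode in SITES]


print(build_site_list("breast cancer"))
```

For "breast cancer" this yields the same eMedicine, Google and Wikipedia URLs as the functions in the Code section. For "Hashimoto's Thyroiditis" the eMedicine query differs slightly, because quote also escapes the apostrophe as %27.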

Code

import sys

# (Temporary) search_term will get whatever the input is 
search_term = "breast cancer"

def parse_for_eMed(search_term):
    parsed_term = search_term.lower().replace(' ', '%20')
    return "http://www.emedicine.com/cgi-bin/foxweb.exe/searchengine@/em/searchengine?boolean=and&book=all&maxhits=40&HiddenURL=&query=%s" % parsed_term

def parse_for_Google_genl(search_term):
    parsed_term = search_term.lower().replace("'", '%27').replace(' ', '+')
    return "http://www.google.com/search?hl=en&q=%s&btnG=Search" % parsed_term 

def parse_for_Google_treatment(search_term):
    parsed_term = search_term.lower().replace("'", '%27').replace(' ', '+')
    return "http://www.google.com/search?hl=en&q=%s+more:condition_treatment&cx=disease_for_patients&sa=N&oi=cooptsr&resnum=0&ct=col1&cd=1" % parsed_term

def parse_for_Wikipedia(search_term):
    parsed_term = search_term.lower().capitalize().replace("'", '%27').replace(' ', '_')
    return "http://en.wikipedia.org/wiki/%s" % parsed_term

def parse_for_WHO(search_term):
    parsed_term = search_term.lower().replace("'", '%27').replace(' ', '+')
    return "http://search.who.int/search?ie=utf8&site=default_collection&client=WHO&proxystylesheet=WHO&output=xml_no_dtd&oe=utf8&q=%s&Search=Search" % parsed_term

def parse_for_GeneCards(search_term):
    parsed_term = search_term.lower().replace(" ", '+')
    # NB: This only gives a functionally correct search if the search_term is a name of a disease
    # because there are other formats for different inputs and different forms of the input
    return "http://www.genecards.org/cgi-bin/cardsearch.pl?search_type=kwd&speed=fast&search=%s#MICROCARDS" % parsed_term


def return_site_list_for_disease(search_term):
    # Currently returns site-name and URL list
    # ex. [["eMedicine", "http://www.emedicine.com/cgi-bin/foxweb.exe/searchengine@/em/searchengine?boolean=and&book=all&maxhits=40&HiddenURL=&query=parsed-term"]]
    return [["eMedicine", parse_for_eMed(search_term)],
            ["Google, general search", parse_for_Google_genl(search_term)],
            ["Google, Treatment search", parse_for_Google_treatment(search_term)],
            ["Wikipedia", parse_for_Wikipedia(search_term)],
            ["WHO", parse_for_WHO(search_term)],
            ["GeneCards", parse_for_GeneCards(search_term)]]

final_list = return_site_list_for_disease(search_term)
print(final_list)

Next Steps

  • Accessing other, not-as-easily-accessible sites: PubMed, OMIM, HapMap, GeneCards?
    • GeneCards can be done, sort of. The supported input types are the Examples listed on the GeneCards main page, so we would need to know which of those types our input is in order to get the search results we really want. However, the current format does return useful gene information for other inputs too (e.g. inputting wnt* gives a list of genes pertaining to wnt).
    • The others use a hidden, non-URL-based search system that I don't know how to deal with yet...
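For PubMed and OMIM, one possible route (not tried in this notebook) is NCBI's E-utilities, which expose Entrez searches as plain URLs. The sketch assumes Python 3 and the public esearch endpoint with its db and term parameters; parse_for_entrez is a hypothetical name chosen to match the parse_for_* functions in the Code section. Note that esearch returns an XML list of matching record IDs rather than a page meant for patients, and HapMap is not covered.

```python
from urllib.parse import urlencode

# Assumed endpoint: NCBI E-utilities "esearch" (searches an Entrez database)
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def parse_for_entrez(search_term, db):
    """Hypothetical helper: build an esearch URL for an Entrez db such as 'pubmed' or 'omim'."""
    # urlencode percent-encodes the term and turns spaces into '+'
    query = urlencode({"db": db, "term": search_term})
    return "%s?%s" % (ESEARCH, query)


print(parse_for_entrez("breast cancer", "pubmed"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=breast+cancer
```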