TChan/Notebook/2007-4-16: Difference between revisions

From OpenWetWare
===Parsing===
* Characters in the search-term will be:
** alpha
** apostrophe
** blank-space
** lowercase
* Thus, will need to check how each site handles this, in addition to fitting it within the search-URLs
* Will test each site using search-string: <code>"Hashimoto's Thyroiditis"</code>

====eMedicine====
* Tested URL:
<pre>http://www.emedicine.com/cgi-bin/foxweb.exe/searchengine@/em/searchengine?boolean=and&book=all&maxhits=40&HiddenURL=&query=hashimoto's%20thyroiditis</pre>
* Case:
** lower
* Space:
** replaced with <code>%20</code>
* Apostrophe:
** left in where it was
* Location:
# <pre>http://www.emedicine.com/cgi-bin/foxweb.exe/searchengine@/em/searchengine?boolean=and&book=all&maxhits=40&HiddenURL=&query=</pre>
# term, with replacements

====Google (General Search)====
* Google has its general search, as well as a "Treatment" search with more specific information

=====General=====
* Tested URL:
<pre>http://www.google.com/search?hl=en&q=hashimoto%27s+thyroiditis&btnG=Search</pre>
* Case:
** lower
* Space:
** replaced with <code>+</code>
* Apostrophe:
** replaced with <code>%27</code>
* Location:
# <pre>http://www.google.com/search?hl=en&q=</pre>
# term, with replacements
# <pre>&btnG=Search</pre>

=====Treatment-Specific=====
* Tested URL:
<pre>http://www.google.com/search?hl=en&q=hashimoto%27s+thyroiditis+more:condition_treatment&cx=disease_for_patients&sa=N&oi=cooptsr&resnum=0&ct=col1&cd=1</pre>
* Case:
** lower
* Space:
** replaced with <code>+</code>
* Apostrophe:
** replaced with <code>%27</code>
* Location:
# <pre>http://www.google.com/search?hl=en&q=</pre>
# term, with replacements
# <pre>+more:condition_treatment&cx=disease_for_patients&sa=N&oi=cooptsr&resnum=0&ct=col1&cd=1</pre>
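
The per-site observations in the Parsing notes (case, space, apostrophe, location) could also be held as data rather than as one function per site. The following is only a sketch of that idea: the names <code>SITE_RULES</code> and <code>build_url</code> are hypothetical and do not appear in the notebook, and the rules cover only the three URLs that were actually tested.

```python
# Hypothetical rule table: one entry per tested site.
# "prefix" and "suffix" are the fixed URL parts listed under "Location";
# "replacements" are applied in order to the lowercased search term.
SITE_RULES = {
    "eMedicine": {
        "prefix": "http://www.emedicine.com/cgi-bin/foxweb.exe/searchengine@/em/"
                  "searchengine?boolean=and&book=all&maxhits=40&HiddenURL=&query=",
        "suffix": "",
        "replacements": [(" ", "%20")],  # apostrophe is left where it was
    },
    "Google, general search": {
        "prefix": "http://www.google.com/search?hl=en&q=",
        "suffix": "&btnG=Search",
        "replacements": [("'", "%27"), (" ", "+")],
    },
    "Google, Treatment search": {
        "prefix": "http://www.google.com/search?hl=en&q=",
        "suffix": "+more:condition_treatment&cx=disease_for_patients"
                  "&sa=N&oi=cooptsr&resnum=0&ct=col1&cd=1",
        "replacements": [("'", "%27"), (" ", "+")],
    },
}


def build_url(site, search_term):
    """Lowercase the term, apply the site's replacements, wrap it in the URL."""
    rules = SITE_RULES[site]
    term = search_term.lower()
    for old, new in rules["replacements"]:
        term = term.replace(old, new)
    return rules["prefix"] + term + rules["suffix"]
```

With this layout, adding another site would mean adding one table entry, provided the site follows the same lowercase-then-replace pattern; a site with different case handling would need an extra field.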

Revision as of 19:42, 16 April 2007

New Plan

  • INPUT: a search-ready string naming a disease or an associated gene, e.g. 'BRCA1' or "Hashimoto's Thyroiditis"
  • OUTPUT: a list of lists, each pairing 1) the base site name with 2) the search URL for the disease/gene on that site

Sites to be Searched

  • General Patient Info
    • eMedicine
    • Google ('more:condition_treatment' is default)
    • Wikipedia
    • (WHO)
  • Less Patient-Friendly But Possibly Useful Info:
    • HapMap
    • OMIM
    • GeneCards


Tasks

  1. Parse the search-term for individual sites' search URLs
  2. Return the search-URL + parsed-search-terms
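
As a rough illustration of the two tasks, the escaping could also be delegated to the standard library instead of chained string replacements. This is a sketch under assumptions: it targets Python 3 (`urllib.parse`), and the helper names `encode_percent20`, `encode_plus`, and `search_url` are made up for this example rather than taken from the notebook.

```python
from urllib.parse import quote, quote_plus


def encode_percent20(term):
    """Task 1, eMedicine style: lowercase, spaces -> %20, apostrophe kept."""
    # safe="'" tells quote() not to escape the apostrophe.
    return quote(term.lower(), safe="'")


def encode_plus(term):
    """Task 1, Google style: lowercase, spaces -> +, apostrophe -> %27."""
    return quote_plus(term.lower())


def search_url(prefix, encoded_term, suffix=""):
    """Task 2: place the encoded term between the fixed parts of the URL."""
    return prefix + encoded_term + suffix


# Usage with the Google general-search URL parts:
google_url = search_url("http://www.google.com/search?hl=en&q=",
                        encode_plus("Hashimoto's Thyroiditis"),
                        "&btnG=Search")
```

One difference from plain replacement is that `quote` and `quote_plus` also escape other reserved characters (for example `&` or `#`), which matters only if search terms ever contain them.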

Code

import sys  # not used yet; kept for reading the search term from input later

# (Temporary) search_term will get whatever the input is
search_term = "Hashimoto's Thyroiditis"

def parse_for_eMed(search_term):
    # eMedicine: lowercase, spaces -> %20, apostrophe left where it was
    parsed_term = search_term.lower().replace(' ', '%20')
    return "http://www.emedicine.com/cgi-bin/foxweb.exe/searchengine@/em/searchengine?boolean=and&book=all&maxhits=40&HiddenURL=&query=%s" % parsed_term

def parse_for_Google_genl(search_term):
    # Google general search: lowercase, apostrophe -> %27, spaces -> +
    parsed_term = search_term.lower().replace("'", '%27').replace(' ', '+')
    return "http://www.google.com/search?hl=en&q=%s&btnG=Search" % parsed_term

def parse_for_Google_treatment(search_term):
    # Same term rules as the general search; the URL adds the treatment refinement
    parsed_term = search_term.lower().replace("'", '%27').replace(' ', '+')
    return "http://www.google.com/search?hl=en&q=%s+more:condition_treatment&cx=disease_for_patients&sa=N&oi=cooptsr&resnum=0&ct=col1&cd=1" % parsed_term

def parse_for_Wikipedia(search_term):
    # Wikipedia article URL: first letter capitalized, rest lowercase,
    # apostrophe -> %27, spaces -> _
    # Note: gene symbols are lowercased too ('BRCA1' -> 'Brca1'), which may not
    # match the article title exactly
    parsed_term = search_term.lower().capitalize().replace("'", '%27').replace(' ', '_')
    return "http://en.wikipedia.org/wiki/%s" % parsed_term

def return_site_list(search_term):
    # Currently returns a list of [site-name, URL] pairs
    # ex. [["eMedicine", "http://www.emedicine.com/cgi-bin/foxweb.exe/searchengine@/em/searchengine?boolean=and&book=all&maxhits=40&HiddenURL=&query=parsed-term"]]
    return [["eMedicine", parse_for_eMed(search_term)],
            ["Google, general search", parse_for_Google_genl(search_term)],
            ["Google, Treatment search", parse_for_Google_treatment(search_term)],
            ["Wikipedia", parse_for_Wikipedia(search_term)]]

final_list = return_site_list(search_term)
# Parenthesized form prints the same in Python 2 and also runs under Python 3
print(final_list)
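
The code above imports sys and marks the hard-coded search_term as temporary, which suggests the term is meant to come from outside the script eventually. One possible way to do that is sketched below; it assumes the term is passed on the command line, and the names DEFAULT_TERM, read_search_term, and format_site_list are hypothetical helpers, not part of the notebook code.

```python
import sys

# Fallback used when no term is given on the command line (same test string
# as in the notebook).
DEFAULT_TERM = "Hashimoto's Thyroiditis"


def read_search_term(argv, default=DEFAULT_TERM):
    """Join all command-line words after the script name into one search term."""
    words = argv[1:]
    return " ".join(words) if words else default


def format_site_list(site_list):
    """Render [site-name, URL] pairs one per line instead of as a raw list."""
    return "\n".join("%s: %s" % (name, url) for name, url in site_list)


if __name__ == "__main__":
    # In the full script this term would be handed to return_site_list().
    print("Search term: %s" % read_search_term(sys.argv))
```

Joining the words means a multi-word disease name can be typed without quoting each space, although an apostrophe would still need quoting in most shells.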