Harvard:Biophysics 101/2007/Notebook:Christopher Nabel/2007-5-3: Difference between revisions
From OpenWetWare
Revision as of 22:01, 2 May 2007
Tasks Due Today
Our goal for today was to establish a fully operational version of the group code. Since Mike had already written extensive code for mismatch analysis, I dedicated all of my time to a task I could reasonably complete by this date: eliminating 'false positives', that is, SNPs reported by the BLAST search that aren't actually present in the query sequence. I did this by adding constraints to the existing code that extracts SNP data. Here is the code I updated:
from xml.dom.minidom import parseString

# extracts snp data
# (get_text is the helper defined elsewhere in the group code)
def extract_snp_data(str):
    dom = parseString(str)
    variants = dom.getElementsByTagName("Hit")
    if len(variants) == 0:
        return
    parsed = []
    for v in variants:
        # now populate the struct
        hit_def = get_text(v.getElementsByTagName("Hit_def")[0].childNodes)
        # Hsp_qseq is the aligned query sequence, Hsp_hseq the aligned hit sequence
        id_query = get_text(v.getElementsByTagName("Hsp_qseq")[0].childNodes)
        id_hit = get_text(v.getElementsByTagName("Hsp_hseq")[0].childNodes)
        score = get_text(v.getElementsByTagName("Hsp_score")[0].childNodes)
        id = get_text(v.getElementsByTagName("Hit_accession")[0].childNodes)
        # extract position of the SNP from Hit Definition
        lower_bound = hit_def.find("pos=")+4
        upper_bound = hit_def.find("len=")-1
        position = int(hit_def[lower_bound:upper_bound])
        # only consider it a genuine snp if the hit score is above 100,
        # the query/hit sequences are at least as long as the position of the SNP
        # and the aligned query and hit sequences match (which covers the SNP position)
        if int(score) > 100 and position <= len(id_hit):
            if id_query == id_hit:
                parsed.append(id)
    return parsed
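The position lookup in the code above relies on the Hit definition containing a "pos=" field immediately followed by a separator and a "len=" field. Here is a minimal sketch of that one step in isolation. The helper name snp_position and the sample header are hypothetical (the header is only modeled on the pos=/len= layout the group code expects), and the guard for missing fields is my own addition rather than part of the group code.

```python
def snp_position(hit_def):
    """Return the integer between 'pos=' and 'len=' in a Hit definition.

    Returns None when either field is missing (an added safeguard; the
    group code assumes both fields are always present).
    """
    lower = hit_def.find("pos=")
    upper = hit_def.find("len=")
    if lower == -1 or upper == -1:
        return None
    # skip past "pos=" and stop one character before "len=",
    # which drops the single separator character between the fields
    return int(hit_def[lower + 4:upper - 1])


# hypothetical header, for illustration only
example = "gnl|dbSNP|rs0000000 rs=0000000|pos=256|len=511|taxid=9606"
print(snp_position(example))
```

Returning None for a header without both fields lets the caller skip that hit instead of failing on int('').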
I used our old friend, Apoe, and ran it through the old and updated versions of the group code. Here's the output from the old code: