Harvard:Biophysics 101/2007/Notebook:Christopher Nabel/2007-5-3
From OpenWetWare
Revision as of 21:59, 2 May 2007
Tasks Due Today
Our goal for today was to establish a fully operational version of the group code. Since Mike had written extensive code for mismatch analysis, I spent my time on a task I could reasonably complete by today: eliminating 'false positives', that is, SNPs reported by the BLAST search that are not actually present in the query sequence. I did this by adding constraints to the existing code that extracts SNP data. Here is the updated code:
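from xml.dom.minidom import parseString


def get_text(nodelist):
    # stand-in for the group's helper: join the text children of an XML element
    return "".join(node.data for node in nodelist if node.nodeType == node.TEXT_NODE)


# extracts snp data from a BLAST XML report
def extract_snp_data(blast_xml):
    dom = parseString(blast_xml)
    variants = dom.getElementsByTagName("Hit")
    if len(variants) == 0:
        return []
    parsed = []
    for v in variants:
        # now populate the struct
        hit_def = get_text(v.getElementsByTagName("Hit_def")[0].childNodes)
        # Hsp_qseq is the aligned query string, Hsp_hseq the aligned hit string
        query_seq = get_text(v.getElementsByTagName("Hsp_qseq")[0].childNodes)
        hit_seq = get_text(v.getElementsByTagName("Hsp_hseq")[0].childNodes)
        score = get_text(v.getElementsByTagName("Hsp_score")[0].childNodes)
        accession = get_text(v.getElementsByTagName("Hit_accession")[0].childNodes)
        # extract position of the SNP from the hit definition ("... pos=N len=M ...")
        lower_bound = hit_def.find("pos=") + 4
        upper_bound = hit_def.find("len=") - 1
        position = int(hit_def[lower_bound:upper_bound])
        # assumes pos= is 1-based and the alignment starts at the first base
        # of the hit with no gaps before the SNP
        snp_index = position - 1
        # only consider it a genuine snp if the hit score is above 100,
        # the query/hit sequences are longer than the position of the SNP
        # and the query sequence matches the hit sequence at the SNP position
        if int(score) > 100 and len(query_seq) > snp_index and len(hit_seq) > snp_index:
            if query_seq[snp_index] == hit_seq[snp_index]:
                parsed.append(accession)
    return parsed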
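The filter above compares the two sequences using the raw `pos=` value, which only lines up with the aligned strings when the alignment starts at the first base of the hit and contains no gaps. A minimal sketch of a more general mapping is below. It assumes that `pos=` is a 1-based coordinate in the hit sequence, that the alignment is on the plus strand, and that the HSP's `Hsp_hit-from` value (the 1-based hit coordinate where the alignment starts) is read alongside the other fields. The function names `parse_snp_pos`, `column_for_hit_position` and `bases_at_snp` are hypothetical and not part of the group code.

```python
import re
from typing import Optional, Tuple


def parse_snp_pos(hit_def: str) -> Optional[int]:
    # pull the integer after "pos=" in the hit definition line (assumed format)
    match = re.search(r"\bpos=(\d+)", hit_def)
    return int(match.group(1)) if match else None


def column_for_hit_position(hit_seq: str, hit_from: int, snp_pos: int) -> Optional[int]:
    """Return the alignment column holding hit coordinate snp_pos, or None.

    hit_seq is the aligned hit string and may contain '-' gaps;
    hit_from is the 1-based hit coordinate of its first non-gap base.
    Assumes a plus-strand alignment.
    """
    coord = hit_from - 1  # hit coordinate of the last non-gap base seen
    for col, base in enumerate(hit_seq):
        if base != "-":
            coord += 1
            if coord == snp_pos:
                return col
    return None  # SNP lies outside the aligned region


def bases_at_snp(query_seq: str, hit_seq: str, hit_from: int,
                 snp_pos: int) -> Optional[Tuple[str, str]]:
    # (query base, hit base) in the column that holds the SNP, or None
    col = column_for_hit_position(hit_seq, hit_from, snp_pos)
    if col is None or col >= len(query_seq):
        return None
    return query_seq[col], hit_seq[col]
```

With these helpers, the notebook's rule ("the query sequence matches the hit sequence at the SNP position") becomes `pair is not None and pair[0] == pair[1]`, where `pair` is the value returned by `bases_at_snp`. Hits whose aligned region does not cover the SNP are rejected instead of being indexed at the wrong column.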
I used our old friend, Apoe,