Harvard:Biophysics 101/2007/Notebook:Michael Wang/2007-3-15


Step 1

The first thing I did was identify the ORFs in the sequence. I could have used my ORF function from the last assignment, but I did it manually and found a start codon at position 62 and a stop codon at position 83, giving a 7-aa ORF (marked with parentheses below). Pretty short.

>example1
CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG                  
C(ATGCCACCCACAACAACTTTTTAA)AAGAATCAGACGTGTGAAGGATTCTATTCGAATTA                   
CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC                  
CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT                   
ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG                   
CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC                  
GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC
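
The manual scan in Step 1 could be automated roughly as follows. This is a minimal sketch, not the ORF function from the earlier assignment: the name `find_orfs` and the `min_aa` parameter are hypothetical, and it assumes only the forward strand matters and that an ORF runs from the first ATG in a frame to the next in-frame stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_aa=1):
    """Return sorted (start, stop, n_aa) tuples for forward-strand ORFs.

    start and stop are the 1-based positions of the first base of the
    ATG and of the stop codon; n_aa counts residues including the Met.
    """
    seq = "".join(seq.split()).upper()  # drop whitespace, normalise case
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open an ORF at the first ATG in this frame
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if n_aa >= min_aa:
                    orfs.append((start + 1, i + 1, n_aa))
                start = None  # close the ORF and keep scanning
    return sorted(orfs)

# The parenthesised stretch of example1 with one flanking base on the left
fragment = "CATGCCACCCACAACAACTTTTTAAAAG"
print(find_orfs(fragment))  # → [(2, 23, 7)]
```

Positions are reported relative to whatever string is passed in, so running it on the full example1 sequence gives coordinates in that sequence.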

Step 2

This is useless without some comparison, so I BLASTed it. It matched a sequence on human chromosome 10 with one SNP at position 201 of the query (A in the query, G in the reference).

>ref|NT_030059.12|Hs10_30314 Homo sapiens chromosome 10 genomic contig, reference assembly
Length=44617998

 Features flanking this part of subject sequence:
   3895 bp at 5' side: hypothetical protein
   425 bp at 3' side: HtrA serine peptidase 1

 Score =  787 bits (397),  Expect = 0.0
 Identities = 400/401 (99%), Gaps = 0/401 (0%)
 Strand=Plus/Plus

Query  1         CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG  60
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968870  CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG  42968929

Query  61        CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA  120
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968930  CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA  42968989

Query  121       CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC  180
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968990  CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC  42969049

Query  181       CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT  240
                 |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Sbjct  42969050  CGCGGACGCTGCCTTCGTCCGGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT  42969109

Query  241       ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG  300
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42969110  ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG  42969169

Query  301       CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC  360
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42969170  CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC  42969229

Query  361       GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC  401
                 |||||||||||||||||||||||||||||||||||||||||
Sbjct  42969230  GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC  42969270
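
Picking the mismatch out of the alignment by eye is error-prone, so here is a small sketch of how it could be located programmatically. The function name `alignment_mismatches` is hypothetical, and it assumes a gap-free alignment (as reported above, Gaps = 0/401) so that both rows advance one base at a time.

```python
def alignment_mismatches(query, subject, query_start=1, subject_start=1):
    """List (query_pos, subject_pos, query_base, subject_base) per mismatch.

    Both rows must be the same length and contain no gap characters.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return [
        (query_start + i, subject_start + i, q, s)
        for i, (q, s) in enumerate(zip(query, subject))
        if q != s
    ]

# The one alignment row above that contains a gap in the match line
query_row = "CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT"
sbjct_row = "CGCGGACGCTGCCTTCGTCCGGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT"
hits = alignment_mismatches(query_row, sbjct_row, 181, 42969050)
for query_pos, subject_pos, q_base, s_base in hits:
    print(query_pos, subject_pos, q_base, s_base)
```

The start coordinates are taken from the left-hand numbers on the Query and Sbjct rows, so the reported positions are in query and contig coordinates respectively.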

Step 3

This SNP does not fall within the ORF found in Step 1, so there does not immediately appear to be anything to worry about. However, it is worth checking it against OMIM. Searching for HtrA serine peptidase 1 on chromosome 10 gives one SNP:

http://www.ncbi.nlm.nih.gov/SNP/snp_ss.cgi?subsnp_id=16082056

This SNP matches the one we have from the BLAST comparison. Apparently it increases the risk of age-related macular degeneration, an association that has been verified in both a Hong Kong-based and a Utah-based population. If I were a physician, I would recommend that the patient seek the opinion of an optometrist and have relatives genetically tested to see if they are also at risk.

A Python implementation would follow these same steps: start with the ORF identifier from the previous assignment, then automate a BLAST search and a subsequent OMIM query for any features that BLAST identifies. To allow batch comparisons, the program should be able to read in multiple sequences from various files (implemented in previous programs).
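
For the batch-input part, a reader along these lines might be enough. It is only a sketch using the standard library, not the code from the previous programs; the names `read_fasta` and `read_fasta_files` are hypothetical. It assumes plain FASTA text and strips internal spaces, since some sequences are pasted in blocks of ten bases.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from a FASTA-formatted text stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)  # emit the previous record
            name, chunks = line[1:].strip(), []
        elif name is not None:
            # remove spaces inside the line (e.g. 10-base blocks)
            chunks.append("".join(line.split()).upper())
    if name is not None:
        yield name, "".join(chunks)

def read_fasta_files(paths):
    """Collect the records from several FASTA files into one list."""
    records = []
    for path in paths:
        with open(path) as handle:
            records.extend(read_fasta(handle))
    return records

# Usage with an in-memory stream instead of a file
demo = io.StringIO(">seqA\n CGTGGGCTGC TTCTTTCCCC\nAGGCG\n>seqB\nacgt\n")
for name, seq in read_fasta(demo):
    print(name, len(seq), seq)
```

Each record can then be passed through the ORF scan, the BLAST search, and the SNP lookup in turn.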

Here's my test sequence:

>ExampleM
 CGTGGGCTGC TTCTTTCCCC AGGCGAAGCT CAACTTCCTC CCATTGTTCT GAACCTCTGT
 GTGGACATCT TCTTTCTTCA AACGCACCAC GGTAAAATTC TCGCCTGCCT CGAAACCCCG
 CCTACCTCTG AGATCTGAGG ACGGATACTA AACGCTGGAC TTAAGGCAAT GTACACATGT
 AAGCAGGCTC TGTAGGCACT CACTCCGCCC AGGTGCGCGC GTGGCGGAGG GGGAACAGAG
 AAGCAGGACA GCTCTCCATC CTTCCCGTGT TCAGTCGTGG GAGACAACAA GAGAGGTCAC
 AGCCTGGCGA CCAAAAAGTG CGGCTAACTT CCCTGCCCAA GCTGACTTTC TCTGCAGGGT
 TCAAGGTTAA TTGTGAGGAT TTACATTCGC ATGGCACACC CGCATCCCCC TCTACGTGGA
 AATATGTCTT AACTTTCATA ACTGCCTTGC CAGCAGGGTA TTTTTCGCTA GGGGCGAAGC
 GTCCTTCGCA AGCCACCCAG CTGACCGGCA G