Harvard:Biophysics 101/2007/Notebook:Xiaodi Wu/2007-3-15: Difference between revisions

Latest revision as of 19:19, 13 March 2007

Input sequence:

>example1                                                                      
CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG                    
CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA                    
CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC                    
CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT                    
ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG                    
CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC                    
GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC

First step: Align this sequence with NCBI BLAST to find out where in the genome it lies. The single hit is shown below.

>ref|NT_030059.12|Hs10_30314 Homo sapiens chromosome 10 genomic contig, reference assembly
Length=44617998

 Features flanking this part of subject sequence:
   3895 bp at 5' side: hypothetical protein
   425 bp at 3' side: HtrA serine peptidase 1


 Score =  736 bits (398),  Expect = 0.0
 Identities = 400/401 (99%), Gaps = 0/401 (0%)
 Strand=Plus/Plus

Query  1         CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG  60
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968870  CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG  42968929

Query  61        CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA  120
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968930  CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA  42968989

Query  121       CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC  180
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42968990  CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC  42969049

Query  181       CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT  240
                 |||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||
Sbjct  42969050  CGCGGACGCTGCCTTCGTCCGGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT  42969109

Query  241       ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG  300
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42969110  ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG  42969169

Query  301       CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC  360
                 ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct  42969170  CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC  42969229

Query  361       GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC  401
                 |||||||||||||||||||||||||||||||||||||||||
Sbjct  42969230  GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC  42969270

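Conclusion -- the sequence lies on chromosome 10, between a hypothetical protein and HtrA serine peptidase 1, and matches the reference at 400 of 401 bases. The single mismatch (query base 201 is A where the reference has G) is a candidate SNP.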
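The mismatch can also be read off the alignment programmatically rather than by eye. The following is a minimal sketch, not part of the original workflow: the function name `find_mismatches` is hypothetical, and the two rows are copied from the Query 181 / Sbjct 42969050 block of the BLAST output above.

```python
def find_mismatches(query_row, sbjct_row, query_start, sbjct_start):
    """Compare two gap-free aligned rows base by base.

    Returns a list of (query_pos, query_base, sbjct_pos, sbjct_base)
    using the 1-based coordinates printed by BLAST.
    """
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    mismatches = []
    for offset, (q, s) in enumerate(zip(query_row, sbjct_row)):
        if q != s:
            mismatches.append((query_start + offset, q, sbjct_start + offset, s))
    return mismatches


# Row pair copied from the alignment block that contains the gap in the match line.
query_row = "CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT"
sbjct_row = "CGCGGACGCTGCCTTCGTCCGGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT"

mismatches = find_mismatches(query_row, sbjct_row, 181, 42969050)
print(mismatches)
```

This sketch assumes an alignment without gaps, which holds here (Gaps = 0/401); a gapped alignment would need separate position counters for query and subject.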

Second step: Locate the sequence in the genome browser to get the coordinates needed for the query in the next step. Result: 10q25, bases 124210300 to 124210800.

Third step: Using the coordinates from the genome browser, look up the known SNPs in this region at Entrez SNP and compare them to the sequence we have (query: "10[CHR] AND 124210300:124210800[CHRPOS]"). Two exist:

1: rs11200638 [Homo sapiens]
    AGCTCCGCGGACGCTGCCTTCGTCC[A/G]GCCGCAGAGGCCCCGCGGTCAGGGT

2: rs2672598 [Homo sapiens]
    CGCCGGACTGGGGGCCCGCCCGGGA[A/G]GCTCGGACTGGGCCGGGCAGGGACT
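The Entrez SNP query in the third step follows a simple pattern: a chromosome term and a position-range term joined by AND. As a small sketch, the hypothetical helper below (`chrpos_query` is not an NCBI function) only builds that query string from the genome browser coordinates; it does not contact NCBI.

```python
def chrpos_query(chromosome, start, end):
    """Build an Entrez SNP query string for a chromosome and base range."""
    if start > end:
        raise ValueError("start must not exceed end")
    return f"{chromosome}[CHR] AND {start}:{end}[CHRPOS]"


# Coordinates taken from the genome browser result in the second step.
query = chrpos_query(10, 124210300, 124210800)
print(query)
```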

Fourth step: Compare each SNP to our sequence. For the first SNP (rs11200638), we have A where the reference sequence has G. The second SNP (rs2672598) is reported on the minus strand; our sequence has T at that position, the complement of the A allele, in agreement with the reference. Then search OMIM to find out more about these loci and the implications of having these alleles (queries: "rs11200638" and "rs2672598").
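The strand comparison in this step is easy to get wrong by hand, so here is a minimal sketch of how it could be checked. The function names are hypothetical; the query is the input sequence from above and the flanks are copied from the two Entrez SNP records. The sketch looks for each SNP's flanking sequence in the query on the plus strand first, then as a reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def call_snp(query, left_flank, right_flank):
    """Locate a SNP in the query by its flanks, trying both strands.

    Returns (1-based query position, query base, allele in the SNP
    record's orientation, strand), or None if the flanks are not found.
    """
    candidates = (
        ("+", left_flank, right_flank),
        ("-", reverse_complement(right_flank), reverse_complement(left_flank)),
    )
    for strand, left, right in candidates:
        start = query.find(left)
        if start == -1:
            continue
        pos = start + len(left)  # 0-based index of the variable base
        if query[pos + 1:pos + 1 + len(right)] != right:
            continue
        base = query[pos]
        allele = base if strand == "+" else base.translate(COMPLEMENT)
        return pos + 1, base, allele, strand
    return None


# Input sequence (example1), copied from the top of this entry.
QUERY = (
    "CACCCTCGCCAGTTACGAGCTGCCGAGCCGCTTCCTAGGCTCTCTGCGAATACGGACACG"
    "CATGCCACCCACAACAACTTTTTAAAAGAATCAGACGTGTGAAGGATTCTATTCGAATTA"
    "CTTCTGCTCTCTGCTTTTATCACTTCACTGTGGGTCTGGGCGCGGGCTTTCTGCCAGCTC"
    "CGCGGACGCTGCCTTCGTCCAGCCGCAGAGGCCCCGCGGTCAGGGTCCCGCGTGCGGGGT"
    "ACCGGGGGCAGAACCAGCGCGTGACCGGGGTCCGCGGTGCCGCAACGCCCCGGGTCTGCG"
    "CAGAGGCCCCTGCAGTCCCTGCCCGGCCCAGTCCGAGCTTCCCGGGCGGGCCCCCAGTCC"
    "GGCGATTTGCAGGAACTTTCCCCGGCGCTCCCACGCGAAGC"
)

# Flanks copied from the two Entrez SNP records.
rs11200638 = call_snp(QUERY, "AGCTCCGCGGACGCTGCCTTCGTCC", "GCCGCAGAGGCCCCGCGGTCAGGGT")
rs2672598 = call_snp(QUERY, "CGCCGGACTGGGGGCCCGCCCGGGA", "GCTCGGACTGGGCCGGGCAGGGACT")
print(rs11200638)
print(rs2672598)
```

The sketch requires exact matches for both flanks, which works here because the only difference between the query and the reference is the SNP base itself; a sequence with further variants near the SNP would need a more tolerant search.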

Conclusions -- According to OMIM, homozygosity for the A allele of the first SNP (rs11200638) is associated with a tenfold increased risk of wet (neovascular) age-related macular degeneration. No information is available for the other locus (rs2672598).

Last step and physician's advice: From OMIM, it can be found that age-related macular degeneration is a disorder that arises from many factors, genetics being only one. If this individual is heterozygous for the A allele, then his or her own risk of the disease is not appreciably raised, and the remaining concern is the possibility of having children who are homozygous for the allele. If the individual is homozygous, then the higher genetic risk suggests that environmental risk factors should be reduced where possible: this means not smoking, and monitoring diet and cholesterol levels.


Part II

(Mystery sequence)

>Example by Xiaodi
ACCTGGACCCCTGTGCCTTGTATGCATCTGAAGAGGAGATCGGGCAGTTGGTGAAGCAGATGCTGGATGA
CTTTGGACCACATCGCTACATTGCCAACCTGGGCCATGGGCTTTATCCTGACATGGACCCAGAACATGTG
GGCGCCTTTGTGGATGCTGTGCATAAACACTCACGTCTGCTTCGACAGAACTGAGTGTATACCTTTACCC
TCAAGTACCACTAACACAGATGATTGATCGTTTCCAGGACAATAAAAGTTTCGGAGTTGAACTATTGTGT
AGTTTTGTTTGTGAAAGATTGTGCCCATATCCTCAGTTCTTCTTAGCCTCTGCTCCTTCCCTGGGAACCC