# User:Justinhlo


## DNA Melting Project: IAP 2007

### Ionic Strength

Use either NaCl or KCl (is there an advantage to one over the other? Does Na+ or K+ interact more with O-?). The original 19-bp sequence is fine.

Goal: show how the set-up can be used to investigate the effect of ion concentration on DNA melting parameters.

```
Control and Experimental Groups:
0 mM (control)
1 mM
10 mM (original module condition)
150 mM (physiological conditions, see
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2703503&dopt=Abstract)
1000 mM (SantaLucia paper’s conditions – I’m curious to see if we get similar results).
```

Apart from the 0 mM control, these conditions are spaced approximately logarithmically up to 1 M. This is justified because the projected dependence of ΔS and ΔH on ionic concentration is also logarithmic. The 150 mM condition may be replaced by 100 mM if that is deemed more desirable.

It may be worth looking into the 50 mM concentration, because this is what PCRs are run at (50 mM KCl, specifically), and it is also what many of the models have actually been designed for.
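To get a feel for how much the planned concentrations should matter, the 16.6*log10([Na+]) salt term used in the %GC equations can be tabulated. This is only a sketch: it assumes that term applies to our oligo, treats K+ the same as Na+, and skips the 0 mM control because the logarithm is undefined there. The function name `salt_term` is made up for illustration.

```python
import math

def salt_term(conc_mM):
    """Salt contribution to Tm (deg C) in the %GC equations: 16.6*log10([Na+] in M)."""
    if conc_mM <= 0:
        raise ValueError("log10 is undefined at 0 mM; the control needs separate treatment")
    return 16.6 * math.log10(conc_mM / 1000.0)

# Planned conditions plus the 50 mM PCR condition (0 mM control omitted)
for conc in [1, 10, 50, 150, 1000]:
    print("%5d mM: %+6.1f oC" % (conc, salt_term(conc)))
```

Under this term, every ten-fold change in salt shifts the predicted Tm by 16.6 oC, and the 1000 mM condition is the zero point.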

### Mismatches

C, T are pyrimidines (small)
A, G are purines (large)

The original 19-bp sequences that we tested were:

```
19bp perfect match:
5'- ATCAA GCAGC CATGC AAAT -3'
3'- TAGTT CGTCG GTACG TTTA -5'

19bp single-base mismatch (SNP) [T-C mismatch]:
5'- ATCAA GCATC CATGC AAAT -3'
3'- TAGTT CGTCG GTACG TTTA -5'
             ^
```

This mismatch is of the pyrimidine-pyrimidine type. I would be curious to see whether an SNP of the purine-purine type would increase the differences, for instance:

```
19bp single-base mismatch (SNP) [A-G mismatch]:
5'- ATCAA GCAGC CATGC AAAT -3'
3'- TAGTT CGTAG GTACG TTTA -5'
             ^
```

The original sequence has 8 G-C pairs, or 42%, which is in the middle of what seems to be the normal 20-60% range (for prokaryotes, anyway; see http://insilico.ehu.es/oligoweb/index2.php?m=all). I think this is a good representative sequence, and it may be redundant to bother with different G-C contents, since it is well established that the more G-C, the higher the melting temperature (presumably due to hydrogen bonding effects?). One thing I do wonder about is how G-C content relates to the ionic-strength sensitivity of the Tm values, since the three-H-bond G-C pair could interact differently with Na+/K+ than the two-H-bond A-T pair. That is probably not very important, though.
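The G-C figure is easy to re-check for any candidate oligo. A minimal sketch (the helper name `gc_content` is illustrative, not part of the module code):

```python
def gc_content(seq):
    """Return (number of G/C bases, total length) for an oligo; spaces are ignored."""
    seq = seq.upper().replace(" ", "")
    gc = seq.count("G") + seq.count("C")
    return gc, len(seq)

gc, n = gc_content("ATCAA GCAGC CATGC AAAT")
print("%d of %d bases are G or C (%.0f%%)" % (gc, n, 100.0 * gc / n))
```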

Between the 1-mismatch and many-mismatch cases, it would be nice to have a somewhat-mismatched case that could simulate the actual relationship between different alleles at homologous chromosomes' loci or a random construct trying to integrate into DNA.

```
5'- ATCAA GCAGC CATGC AAAT -3'
3'- TACTC CGTCT GAACG TCTA -5'
      ^ ^     ^  ^     ^
```
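Marking mismatches by hand is error-prone, so a small checker may help when designing further sequences. The sketch below assumes the two strands are written base-under-base as above (top 5'->3', bottom 3'->5') and labels each non-Watson-Crick pair by purine/pyrimidine type. `find_mismatches` is a hypothetical helper name.

```python
PURINES = set("AG")
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def find_mismatches(top, bottom):
    """Compare a 5'->3' top strand with a 3'->5' bottom strand written base-under-base.

    Returns a list of (index, top_base, bottom_base, kind) for each non-Watson-Crick pair.
    """
    top = top.upper().replace(" ", "")
    bottom = bottom.upper().replace(" ", "")
    if len(top) != len(bottom):
        raise ValueError("strands differ in length: %d vs %d" % (len(top), len(bottom)))
    kinds = ["pyrimidine-pyrimidine", "purine-pyrimidine", "purine-purine"]
    found = []
    for i, (t, b) in enumerate(zip(top, bottom)):
        if COMPLEMENT[t] != b:
            n_purines = (t in PURINES) + (b in PURINES)  # 0, 1 or 2
            found.append((i, t, b, kinds[n_purines]))
    return found

# The T-C SNP pair from above
for hit in find_mismatches("ATCAA GCATC CATGC AAAT", "TAGTT CGTCG GTACG TTTA"):
    print(hit)
```

Indices are 0-based from the 5' end of the top strand. Strands of unequal length raise an error instead of being silently truncated.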

The original total mismatch case used the sequence:

```
19bp total mismatch
5'- ATCAA GCAGC CATGC AAA -3'
3'- TATTC TGTTC CTGGT TTCC -5'
^^^ ^^ ^^ ^ ^^^   ^^
```

I do not know whether the top line is a typo or not; it has only 18 bases, not 19.

This is not really a "total mismatch," but even if it hybridizes (at the ends), it will probably always show up as ssDNA due to the behavior of the intercalating dye. Is it significant that the ends are where there is a pair of matching bases on either side?

[...]

### Potential Models for DNA modeling

This interesting paper lists a good number of methods (including the one we used in the module): http://bioinformatics.oxfordjournals.org/cgi/content/full/21/6/711. However, while it compares the methods thoroughly, it does not run the empirical experiments needed to see which one is actually right.

Here are a few models worth investigating:

1. The very basic equation used for very short sequences:
• Tm = (wA+xT) * 2 + (yG+zC) * 4, where w, x, y and z are the numbers of A, T, G and C bases in the sequence.
• Probably only interesting for length <20, or else the temperatures predicted are quite bad.
2. The standard G-C content equation:
• Tm= 64.9 +41*(yG+zC-16.4)/(wA+xT+yG+zC).
3. The modified G-C content equation, with more length-dependent consideration:
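• Tm = 100.5 + 41*(yG+zC-36.4)/(wA+xT+yG+zC) + 16.6*log10([Na+])
• The commonly quoted form of this equation is 100.5 + 41*(yG+zC)/(wA+xT+yG+zC) - 820/(wA+xT+yG+zC) + 16.6*log10([Na+]), which corresponds to an offset of 20 rather than 36.4, so the constant should be double-checked.
• This is the same idea as the previous equation with a salt term added. For a fixed length and salt concentration it differs from the previous result only by a constant, so it only becomes interesting when we compare lengths or salt conditions.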
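4. The Wetmur variant of the G-C method includes a % mismatch term:
• (see http://www-nmr.cabm.rutgers.edu/bioinformatics/cogs/Tm_predict.html)
• Tm = 81.5 + 16.6*log10([Na+]/(1.0 + 0.7*[Na+])) + 0.41*(%GC) - 500/(wA+xT+yG+zC) - P
• P = % mismatch (unclear if this is used as a direct number or as a decimal).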
5. The nearest-neighbor (NN) model (with ΔH and ΔS):
• This seems to be the only model here that actually tries to include entropic and enthalpic terms. That means we have limited options for fitting the actual curve.
• This one is interesting in all three cases: ionic strength affects entropy, mismatching affects enthalpy (and entropy, I guess), and length affects both.
• Maybe there are other equations for extracting ΔH and ΔS from the curve. Will look into this.
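For the Wetmur variant (item 4), the open question is whether P enters as a percentage or as a fraction. A quick sketch for the 19-mer with one mismatch shows how far apart the two readings are. It uses the Wetmur expression in the same form as the TmFive line of the program, with 50 mM salt as assumed there. `wetmur_tm` and its argument names are illustrative.

```python
import math

def wetmur_tm(gc, length, na_molar, p=0.0):
    """Wetmur-style %GC estimate (deg C); p is the mismatch term, subtracted as-is."""
    salt = 16.6 * math.log10(na_molar / (1.0 + 0.7 * na_molar))
    return 81.5 + salt + 41.0 * gc / length - 500.0 / length - p

gc, length, na = 8, 19, 0.050        # original 19-mer at 50 mM
mismatch_fraction = 1.0 / length     # one mismatched base pair

as_percent = wetmur_tm(gc, length, na, p=100.0 * mismatch_fraction)
as_decimal = wetmur_tm(gc, length, na, p=mismatch_fraction)
print("P as percent: %.1f oC" % as_percent)
print("P as decimal: %.1f oC" % as_decimal)
```

Read as a percentage, one mismatch in 19 bp lowers the estimate by about 5.3 oC. Read as a fraction, it lowers it by about 0.05 oC, which would make the term negligible. That suggests, but does not prove, that the percentage reading is the intended one.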

While these models include salt-correction terms and sequence-based terms, the only adjustment for mismatches I have found is the Wetmur term above. Perhaps we could derive an approximate "mismatch cost" term? The first mismatch is the most costly, and after that, the temperature drop per additional mismatch gets smaller, until at some point there are so many mismatches that no hybridization is possible.
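One way to make the "mismatch cost" idea concrete is a geometric decay: the first mismatch costs the most and each later one costs a fixed fraction of the one before it. The sketch below is purely hypothetical. `first_cost` and `decay` are placeholder values that would have to be fitted to our melting data, and nothing here comes from a published model.

```python
def mismatch_penalty(n_mismatches, first_cost=5.0, decay=0.7):
    """Total Tm drop (deg C) for n mismatches under a hypothetical geometric-decay cost.

    first_cost: drop caused by the first mismatch (placeholder value).
    decay: ratio of each mismatch's cost to the previous one, 0 < decay < 1 (placeholder).
    """
    return sum(first_cost * decay ** k for k in range(n_mismatches))

for n in range(0, 6):
    print("%d mismatches: -%.2f oC" % (n, mismatch_penalty(n)))
```

With these choices the total drop levels off at first_cost/(1 - decay). The model has no cutoff for "no hybridization possible", so that would need a separate threshold.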

Here is a very basic Python program that applies these five methods. The salt concentration is fixed at 50 mM, and mismatches are not yet taken into account.

```python
## Justin Lo
## January 8, 2007
## Code for predicting the Tm values for an arbitrary oligo sequence

import math
from NN import NNEnergy  ## local nearest-neighbor module (not shown here)

done = False
while not done:
    seq = input("Enter the oligo sequence >>> ")
    seq = seq.upper()  ## convert to uniform state
    seq = seq.replace(' ', '')  ## get rid of any spaces
    wxyz = [seq.count("A"), seq.count("C"), seq.count("G"), seq.count("T")]  ## A, C, G, T counts
    gc = wxyz[1] + wxyz[2]
    n = sum(wxyz)
    salt = 50  ## salt concentration in mM
    conc = 50e-6  ## oligo concentration in M

    ## Wallace rule
    TmOne = (wxyz[0] + wxyz[3])*2 + gc*4
    ## basic %GC equation
    TmTwo = 64.9 + 41*(gc - 16.4)/n
    ## %GC equation with salt term (the commonly quoted form uses an offset of 20, not 36.4)
    TmThree = 100.5 + 41*(gc - 36.4)/n + 16.6*math.log10(salt/1000)
    ## NN model: HS[0] is used as dH in kcal/mol, HS[1] as dS in cal/(mol*K)
    HS = NNEnergy(seq, salt/1000)
    TmFour = (HS[0]*1000)/(HS[1] - 1.987*math.log(4/conc)) - 273.15
    ## Wetmur %GC equation, without the mismatch term P
    TmFive = 81.5 + 16.6*math.log10((salt/1000)/(1 + 0.7*salt/1000)) + 41*gc/n - 500/n

    sym = ' oC'
    print('Wallace Method: ' + str(TmOne) + sym)
    print('Basic %GC Method: ' + str(TmTwo) + sym)
    print('%GC Method: ' + str(TmThree) + sym)
    print('NN Method: ' + str(TmFour) + sym)
    print('Wetmur %GC Method: ' + str(TmFive) + sym)
    query = input("Do you want to try another sequence? y/n >>> ")
    if query == "n":
        done = True
```
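The NN line in the program is the usual two-state expression Tm = ΔH / (ΔS - R ln(4/C)), converted from K to oC. Pulling it into a helper makes it easy to see how oligo concentration alone moves the prediction. This is a sketch only: the ΔH and ΔS values below are made-up placeholders, not output from NNEnergy, and `two_state_tm` is a hypothetical name.

```python
import math

R = 1.987  # gas constant, cal/(mol*K)

def two_state_tm(dH_kcal, dS_cal, conc):
    """Melting temperature (deg C) from dH (kcal/mol), dS (cal/(mol*K)) and oligo conc (M)."""
    return (dH_kcal * 1000.0) / (dS_cal - R * math.log(4.0 / conc)) - 273.15

# Placeholder thermodynamic values, for illustration only
dH, dS = -150.0, -400.0
for conc in [5e-6, 50e-6, 500e-6]:
    print("%g M: %.1f oC" % (conc, two_state_tm(dH, dS, conc)))
```

Higher oligo concentration gives a higher predicted Tm, so the concentration used in the experiment has to match the value passed to the formula before measured and predicted temperatures can be compared.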