# User:Justinhlo


## DNA Melting Project: IAP 2007

• There are three main areas of experimentation listed here, but I think that some will yield more interesting results than others. Maybe test all three, but only write about the 2 that would be of most interest to others hoping to implement this module?
• Also, perhaps make an estimate of the costs for the materials, such as in the AFM paper? Only if it's an appealing price, of course ..
• I will record not only set-ups, but also an approximate timeline - how many hours must be spent at each stage, etc.; maybe highlight spots where doing things precisely is very important. It might be good to give an estimate for how long the module would take?
• While MATLAB/Python code probably won't be included in the actual paper, I've seen papers that say "if you want the code, just e-mail the author" or something like that - maybe that would be a good idea in case teachers wish to provide analysis hints to students?

### Ionic Strength

The salt can be either NaCl or KCl (is there an advantage to one over the other? Does Na+ or K+ interact more with O-?). The original 19-bp sequence is fine for this.

Goal: show how the set-up can be used to investigate the significant effects of ionic particle concentrations on DNA melting parameters.

```Control and Experimental Groups:
0 mM (control)
1 mM
10 mM (original module condition)
150 mM (physiological conditions, see
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2703503&dopt=Abstract)
1000 mM (SantaLucia paper’s conditions – I’m curious to see if we get similar results).
```

The nonzero conditions are spaced approximately logarithmically between 1 mM and 1 M, with 0 mM as the control. This is justified because the projected dependence of ΔS and ΔH on ionic concentration is also logarithmic. The 150 mM condition may be replaced by 100 mM if that is deemed more desirable.
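As a rough illustration of why logarithmic spacing suits these conditions, the sketch below evaluates the 16.6 log10([Na+]) salt term that appears in the %GC models listed later on this page. The function name `salt_term` is made up for this example, and the 0 mM control is skipped because the logarithm is undefined there.

```python
import math

# Salt conditions from the list above, in mM.
# The 0 mM control is omitted: log10(0) is undefined.
conditions_mM = [1, 10, 150, 1000]

def salt_term(conc_mM):
    """Salt contribution 16.6*log10([Na+]) in degrees C, with [Na+] in M."""
    return 16.6 * math.log10(conc_mM / 1000)

for c in conditions_mM:
    print(f"{c:>5} mM: {salt_term(c):7.1f} oC")
```

Each tenfold step in concentration shifts this term by the same 16.6 degrees C, so evenly spaced decades should give evenly spaced points if the logarithmic dependence holds.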

It may be worth looking into the 50 mM concentration, because this is what PCRs are run at (50 mM KCl, specifically), and it is also what many of the models have actually been designed for.

### Mismatches

C, T are pyrimidines (small)
A, G are purines (large)

The original 19-bp sequences that we tested were:

```19bp perfect match:
5'- ATCAA GCAGC CATGC AAAT -3'
3'- TAGTT CGTCG GTACG TTTA -5'

19bp single-base mismatch (SNP) [T-C mismatch]:
5'- ATCAA GCATC CATGC AAAT -3'
3'- TAGTT CGTCG GTACG TTTA -5'
             ^
```

The mismatch is of the pyrimidine-pyrimidine type. I would be curious whether an SNP of the purine-purine type would increase the differences, for instance:

```19bp single-base mismatch (SNP) [A-G mismatch]:
5'- ATCAA GCAGC CATGC AAAT -3'
3'- TAGTT CGTAG GTACG TTTA -5'
             ^
```

The original sequence has 8 G-C pairs, or 42%, which is in the middle of what seems to be the normal 20-60% range (for prokaryotes, anyway; see http://insilico.ehu.es/oligoweb/index2.php?m=all). I think this is a good representative sequence, and it may be redundant to bother with different G-C contents, since it's pretty well established that the more G-C, the higher the melting temperature (presumably due to hydrogen bonding effects?). One thing I do wonder about is how G-C content relates to the ionic-strength sensitivity of the Tm values, since the triple-H-bond G-C pair could interact differently with Na/K than the double-H-bond A-T pair. Anyway, that probably isn't that important.
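The 42% figure can be checked with a few lines of Python. This is only a minimal sketch; `gc_fraction` is a name chosen here for illustration.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence; spaces are ignored."""
    seq = seq.upper().replace(" ", "")
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq)

original = "ATCAA GCAGC CATGC AAAT"
print(f"G-C content: {gc_fraction(original):.0%}")  # 8 of 19 bases, prints 42%
```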

Between the 1-mismatch and many-mismatch cases, it would be nice to have a somewhat-mismatched case that could simulate the actual relationship between different alleles at homologous chromosomes' loci or a random construct trying to integrate into DNA.

```19bp partial mismatch (5 mismatches):
5'- ATCAA GCAGC CATGC AAAT -3'
3'- TACTC CGTCT GAACG TCTA -5'
      ^ ^     ^  ^     ^
```
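To double-check the caret marks on these sequence pairs, a small helper can list the positions where the two strands fail to pair. This is a sketch under the assumption that the top strand is written 5'->3' and the bottom 3'->5', as in the blocks above; the names `PAIRS` and `mismatch_positions` are made up for this example.

```python
# Watson-Crick partners
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

def mismatch_positions(top, bottom):
    """Return 1-based positions where the two strands are not complementary.

    top is written 5'->3' and bottom 3'->5', so they are compared base by base.
    """
    top = top.upper().replace(" ", "")
    bottom = bottom.upper().replace(" ", "")
    if len(top) != len(bottom):
        raise ValueError("strands differ in length: %d vs %d" % (len(top), len(bottom)))
    return [i + 1 for i, (t, b) in enumerate(zip(top, bottom)) if PAIRS[t] != b]

top = "ATCAA GCAGC CATGC AAAT"
print(mismatch_positions(top, "TAGTT CGTCG GTACG TTTA"))  # perfect match: []
print(mismatch_positions(top, "TACTC CGTCT GAACG TCTA"))  # [3, 5, 10, 12, 17]
```

The length check is deliberate: a pair whose strands have different lengths raises an error instead of being silently truncated by `zip`.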

The original total mismatch case used the sequence:

```19bp total mismatch
5'- ATCAA GCAGC CATGC AAA -3'
3'- TATTC TGTTC CTGGT TTCC -5'
^^^ ^^ ^^ ^ ^^^   ^^
```

I do not know if the top line is a typo or not .. it doesn't actually have 19 bp ...

This is not really a "total mismatch," but even if it hybridizes (at the ends), it will probably always show up as ssDNA due to the behavior of the intercalating dye. Is it significant that the ends are where there is a pair of matching bases on either side? Why are all the matching bases A/T? Could this lead to self-annealing?

### Length

Most models for calculating the Tm values of DNA are only designed to be accurate for relatively short strands of DNA, ~15-50 bp. The Wallace formula, which has Tm proportional to length, obviously cannot hold up for long strands of DNA. The G-C methods level off at around 80-some degrees C for long strands, which is much more realistic. However, for long strands of DNA, most people are not interested in a melting temperature, since essentially all DNA is melted by about 90 degrees C. Thus, the region of interest corresponds to the region in which people design primers and other short segments of DNA.
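The difference in length behavior can be seen numerically. The sketch below applies the Wallace rule and the basic %GC formula (both listed in the models section of this page) to strands of increasing length; the constant 42% G-C fraction and the function names are assumptions made for this example.

```python
def wallace_tm(n_at, n_gc):
    """Wallace rule: 2 oC per A/T base plus 4 oC per G/C base."""
    return 2 * n_at + 4 * n_gc

def basic_gc_tm(n_at, n_gc):
    """Basic %GC formula: 64.9 + 41*(GC - 16.4)/N."""
    return 64.9 + 41 * (n_gc - 16.4) / (n_at + n_gc)

gc_frac = 0.42  # assumed constant G-C fraction, similar to the 19-bp sequence
for length in (19, 50, 200, 1000):
    n_gc = round(gc_frac * length)
    n_at = length - n_gc
    print(f"{length:>5} bp  Wallace: {wallace_tm(n_at, n_gc):7.1f}  %GC: {basic_gc_tm(n_at, n_gc):6.1f}")
```

The Wallace value grows without bound, while the %GC value approaches 64.9 + 41 × 0.42 ≈ 82 oC as the length term 41 × 16.4/N vanishes.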

The original lengths of 19 and 40 are okay, but it would be instructive if other lengths such as 30 or 50 were included in order to extract more H and S values and thus help with N-N validation analysis.

### Potential Models for DNA modeling

This interesting paper actually lists a good number of methods (including the one we used in the module). http://bioinformatics.oxfordjournals.org/cgi/content/full/21/6/711. However, while it compares the methods thoroughly, it does not run the empirical experiments in order to see which one is actually right.

Here are a few models worth investigating:

1. The very basic equation used for very short sequences:
• Tm= (wA+xT) * 2 + (yG+zC) * 4.
• Probably only interesting for length <20, or else the temperatures predicted are quite bad.
2. The standard G-C content equation:
• Tm= 64.9 +41*(yG+zC-16.4)/(wA+xT+yG+zC).
3. The modified G-C content equation, with more length-dependent consideration:
• Tm = 100.5 + 41*(yG+zC-36.4)/(wA+xT+yG+zC) + 16.6 log10([Na+])
• This is the same idea as the previous one unless the length changes. So unless we are comparing lengths, this will give roughly the same result as above.
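4. The Wetmur variant of the G-C method includes a % mismatch term:
• Tm = 81.5 + 16.6 log10([Na+]/(1.0 + 0.7[Na+])) + 0.41(%GC) - 500/(wA+xT+yG+zC) - P
• P = % mismatch (unclear if this is used as a direct number or as a decimal).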
5. The NN model (with H and S ..):
• Tm = H/(S - R ln(4/concentration of DNA))
• It seems that the last one here is probably the only one that actually tries to include entropic and enthalpic conditions. That means we have limited options for the fitting of the actual curve.
• This one is interesting in all three cases, as the ionic stuff affects entropy, the mismatching affects enthalpy (and entropy, I guess), and length affects both.
• Maybe there are other equations for extracting the H and S from the curve. Will look into this.
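One possible route to H and S, sketched here under the assumption that Tm can be measured at several DNA concentrations C, is to rearrange the NN formula above into 1/Tm = S/H - (R/H) ln(4/C) and fit a straight line. The function names and the numbers below are made up for illustration and are not measured data.

```python
import math

R = 1.987  # gas constant, cal/(mol K)

def nn_tm_kelvin(dH, dS, conc):
    """Tm = H / (S - R ln(4/C)), with H in cal/mol, S in cal/(mol K), C in M."""
    return dH / (dS - R * math.log(4 / conc))

def fit_H_S(concs, tms_kelvin):
    """Least-squares line through 1/Tm versus ln(4/C); returns (H, S)."""
    xs = [math.log(4 / c) for c in concs]
    ys = [1 / t for t in tms_kelvin]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    dH = -R / slope      # slope of the line is -R/H
    dS = intercept * dH  # intercept of the line is S/H
    return dH, dS

# Synthetic check with made-up values (not measured data)
true_H, true_S = -150000.0, -400.0
concs = [1e-6, 5e-6, 2e-5, 1e-4]
tms = [nn_tm_kelvin(true_H, true_S, c) for c in concs]
print(fit_H_S(concs, tms))
```

With real data the points will scatter around the line, so the spread of the fit would give a feel for how well H and S are determined.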

While these models do include salt-correction terms and sequence-based terms, I have not seen any adjustments for mismatches (correction: found the Wetmur one above). Perhaps we could derive an approximate "mismatch cost" term? The first mismatch is the most costly, and after that, the subsequent temperature drop per mismatch is less until some point where there are so many mismatches that no hybridization is possible.

Here is a very basic Python program for using these five methods. The salt concentration is fixed at 50 mM, and I have not yet included any considerations for mismatches.

```## Justin Lo
## January 8, 2007
## Code for predicting the Tm values for an arbitrary oligo sequence

from NN import NNEnergy  ## local module (not shown here); returns [H, S] for a sequence
import math

R = 1.987  ## gas constant in cal/(mol K)

done = False
while not done:
    seq = input("Enter an oligo sequence >>> ")
    seq = seq.upper()  ## convert to uniform state
    seq = seq.replace(' ', '')  ## get rid of any spaces
    ## base counts in alphabetical order: [A, C, G, T]
    wxyz = [seq.count("A"), seq.count("C"), seq.count("G"), seq.count("T")]
    gc = wxyz[1] + wxyz[2]  ## number of G and C bases
    n = sum(wxyz)  ## total length
    TmOne = (wxyz[0] + wxyz[3])*2 + gc*4
    TmTwo = 64.9 + 41*(gc - 16.4)/n
    salt = 50  ## salt concentration in mM
    TmThree = 100.5 + 41*(gc - 36.4)/n + 16.6*math.log10(salt/1000)
    HS = NNEnergy(seq, salt/1000)
    conc = 50e-6  ## DNA concentration (presumably in M)
    ## H is scaled by 1000 (presumably kcal/mol to cal/mol); result converted from K to oC
    TmFour = (HS[0]*1000)/(HS[1] - R*math.log(4/conc)) - 273.15
    ## 41*gc/n is the same as 0.41*(%GC); the mismatch term P is left out
    TmFive = 81.5 + 16.6*math.log10((salt/1000)/(1 + 0.7*salt/1000)) + 41*gc/n - 500/n
    sym = ' oC'
    print('Wallace Method: ' + str(TmOne) + sym)
    print('Basic %GC Method: ' + str(TmTwo) + sym)
    print('%GC Method: ' + str(TmThree) + sym)
    print('NN Method: ' + str(TmFour) + sym)
    print('Wetmur %GC Method: ' + str(TmFive) + sym)
    query = input("Do you want to try another sequence? y/n >>> ")
    if query == "n":
        done = True
```