User talk:Daniel M Jordan: Difference between revisions

Revision as of 10:24, 25 September 2009

Biophysics 101 Assignments

Python Code for Assignment 3

# encoding: utf-8
"""
assignment_3.py
(Biophysics 101 homework)
d m jordanus scripsit
2009.09.16
"""

from __future__ import division
import random

p53seg  = "cggagcagctcactattcacccgatgagaggggaggagagagagagaaaatgtcctttag"
p53seg += "gccggttcctcttacttggcagagggaggctgctattctccgcctgcatttctttttctg"
p53seg += "gattacttagttatggcctttgcaaaggcaggggtatttgttttgatgcaaacctcaatc"
p53seg += "cctccccttctttgaatggtgtgccccaccccccgggtcgcctgcaacctaggcggacgc"
p53seg += "taccatggcgtagacagggagggaaagaagtgtgcagaaggcaagcccggaggcactttc"
p53seg += "aagaatgagcatatctcatcttcccggagaaaaaaaaaaaagaatggtacgtctgagaat"
p53seg += "gaaattttgaaagagtgcaatgatgggtcgtttgataatttgtcgggaaaaacaatctac"
p53seg += "ctgttatctagctttgggctaggccattccagttccagacgcaggctgaacgtcgtgaag"
p53seg += "cggaaggggcgggcccgcaggcgtccgtgtggtcctccgtgcagccctcggcccgagccg"
p53seg += "gttcttcctggtaggaggcggaactcgaattcatttctcccgctgccccatctcttagct"
p53seg += "cgcggttgtttcattccgcagtttcttcccatgcacctgccgcgtaccggccactttgtg"
p53seg += "ccgtacttacgtcatctttttcctaaatcgaggtggcatttacacacagcgccagtgcac"
p53seg += "acagcaagtgcacaggaagatgagttttggcccctaaccgctccgtgatgcctaccaagt"
p53seg += "cacagacccttttcatcgtcccagaaacgtttcatcacgtctcttcccagtcgattcccg"
p53seg += "accccacctttattttgatctccataaccattttgcctgttggagaacttcatatagaat"
p53seg += "ggaatcaggatgggcgctgtggctcacgcctgcactttggctcacgcctgcactttggga"
p53seg += "ggccgaggcgggcggattacttgaggataggagttccagaccagcgtggccaacgtggt"

standard = { 'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
             'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
             'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
             'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',

             'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
             'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
             'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
             'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',

             'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
             'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
             'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
             'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',

             'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
             'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
             'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
             'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
        }

complement = {'a' : 't', 't' : 'a', 'g' : 'c', 'c' : 'g'}

# problem 1: determine GC content
print
print "1."
gc_count = p53seg.count('g') + p53seg.count('c')
total_count = len(p53seg)
percentage = gc_count / total_count * 100
print "GC count: {0}%".format(percentage)

# problem 2: find reverse complement sequence
print
print "2."
rev_compl_list = [complement[nucl] for nucl in reversed(p53seg)]
rev_compl_str = "".join(rev_compl_list)
print "reverse complement sequence:", rev_compl_str

# problem 3: find protein sequences
print
print "3."
def protein_sequence(sequence, offset):
    seq_iter = iter(sequence)
    for i in range(offset):
        next(seq_iter)
    aa_seq = ""
    try:
        while True:
            nuc1 = next(seq_iter)
            nuc2 = next(seq_iter)
            nuc3 = next(seq_iter)
            codon = nuc1 + nuc2 + nuc3
            aa_seq += standard[codon]
    except StopIteration:
        return aa_seq

plus1_sequence = protein_sequence(p53seg, 0) # saved for later comparison
print "+1 frame", plus1_sequence
print "+2 frame", protein_sequence(p53seg, 1)
print "+3 frame", protein_sequence(p53seg, 2)
print "-1 frame", protein_sequence(rev_compl_str, 0)
print "-2 frame", protein_sequence(rev_compl_str, 1)
print "-3 frame", protein_sequence(rev_compl_str, 2)

# problem 4: random mutation
print
print "4."
def mutate(sequence):
    new_sequence = ""
    mut_count = 0
    for character in sequence:
        new_char = character
        if random.random() < 0.01:
            mut_count += 1
            while new_char == character:
                new_char = random.choice(["a","c","t","g"])
        new_sequence += new_char
    print "made {0} nucleotide mutations".format(mut_count)
    return new_sequence

for i in range(4):
    print "mutation {0}".format(i + 1)
    mutated_sequence = protein_sequence(mutate(p53seg), 0)
    print mutated_sequence
    print "amino acid changes:"
    for j, (mutated, original) in enumerate(zip(mutated_sequence, plus1_sequence)):
        if mutated != original:
            print "{0}{1}{2}".format(original, j + 1, mutated)

print """
In about 1000 nucleotides, we expect about 10 nucleotide mutations.  Many of
these are silent when translated into amino acids, and every mutant typically
has at least one silent mutation.  Premature stop codons are rarer but 
definitely present.  I think the majority of mutants have no premature stops,
but many have 1 or 2.  Higher numbers are rarer.
(I wrote this note based on previous runs of this code; whatever random 
numbers it happens to have drawn this time may disagree with what I said.)
"""

My Observations on Exponentials (Due Tue 15-Sep-2009)

The exponentials for values near k = 1 look like what I expect exponentials to look like, including the fact that they are inverted when k < 1 and get steeper as k gets higher above 1. The logistic functions at the same values look like s-shaped curves, also inverted for k < 1 (though there may not actually be an s in that graph, it's difficult to tell) and getting steeper as k gets higher.
For k = 3 and above, they start to look less like smooth curves and more like sharp spikes, but still with the same general shape. It's also very difficult to tell the three curves with high k apart by eye. By contrast, the logistic curves at high k have totally different behavior from the ones at low k and are easy to tell apart from each other — they rise up to what would be the top of the s-curve and then start oscillating, with wider oscillations for higher k.
When the oscillations are large enough to take the population down to 0, the oscillation stops dead, even though it was at 1 a moment before. This models the fact that a population that's gone extinct will never come back — an obvious conclusion if we're thinking in terms of populations.
The exponential function we're modeling seems to be k^t, where t is time. The logistic function, according to google, is (1 + e^-t)^-1, but I can't figure out where the k comes in (which would have made writing it in python easier).

@@ Line 4: / Line 4: @@
 ===Python Code for Assignment 3===
-<pre><code># encoding: utf-8
+<code><pre># encoding: utf-8
 """
 assignment_3.py
@@ Line 130: / Line 130: @@
 (I wrote this note based on previous runs of this code; whatever random
 numbers it happens to have drawn this time may disagree with what I said.)
-"""</code></pre>
+"""</pre></code>
 ===My Observations on Exponentials (Due Tue 15-Sep-2009)===

User talk:Daniel M Jordan: Difference between revisions

Revision as of 10:24, 25 September 2009

Biophysics 101 Assignments

Python Code for Assignment 3

My Observations on Exponentials (Due Tue 15-Sep-2009)

Navigation menu

Page actions

Page actions

Personal tools

Navigation

Search

research

Tools