User talk:Kelly Brock: Difference between revisions

Revision as of 18:55, 25 September 2009

Hello, Kelly Brock! This is a welcome message from OpenWetWare. By the way, we've announced you on the home page! You can leave messages to any OWW member by editing their User_talk pages like this one. And don't forget to personalize your User Page so that we can get to know you better! We've included some tips below to get you started.

Personal/Lab Info

Assignment 1. Python Epicness

I organized the Python code by questions 1, 2, 3, and 4, as indicated in the comments. I redirected the output stream into a .txt file, which is what I turned in on Thursday for my answers. This was by far my favorite problem set this week! I like Python as a language because it seems like a really good mix of C, Scheme (especially the lists and dictionaries), and MATLAB. As far as the experiment itself goes, part 4 was the most interesting to me because it modeled actual mutations instead of just reporting intrinsic properties of the sequence.

<syntax = python>
# Kelly Brock
# File: BiophysAsst3P1.py
# Answer four parts of Assignment 3

import random

# For part 4, when we have to do multiple experiments
TRIALS = 6

# Input genetic sequence into memory
sequence = "cggagcagctcactattcacccgatgagaggggaggagagagagagaaaatgtcctttag"
sequence += "gccggttcctcttacttggcagagggaggctgctattctccgcctgcatttctttttctg"
sequence += "gattacttagttatggcctttgcaaaggcaggggtatttgttttgatgcaaacctcaatc"
sequence += "cctccccttctttgaatggtgtgccccaccccccgggtcgcctgcaacctaggcggacgc"
sequence += "taccatggcgtagacagggagggaaagaagtgtgcagaaggcaagcccggaggcactttc"
sequence += "aagaatgagcatatctcatcttcccggagaaaaaaaaaaaagaatggtacgtctgagaat"
sequence += "gaaattttgaaagagtgcaatgatgggtcgtttgataatttgtcgggaaaaacaatctac"
sequence += "ctgttatctagctttgggctaggccattccagttccagacgcaggctgaacgtcgtgaag"
sequence += "cggaaggggcgggcccgcaggcgtccgtgtggtcctccgtgcagccctcggcccgagccg"
sequence += "gttcttcctggtaggaggcggaactcgaattcatttctcccgctgccccatctcttagct"
sequence += "cgcggttgtttcattccgcagtttcttcccatgcacctgccgcgtaccggccactttgtg"
sequence += "ccgtacttacgtcatctttttcctaaatcgaggtggcatttacacacagcgccagtgcac"
sequence += "acagcaagtgcacaggaagatgagttttggcccctaaccgctccgtgatgcctaccaagt"
sequence += "cacagacccttttcatcgtcccagaaacgtttcatcacgtctcttcccagtcgattcccg"
sequence += "accccacctttattttgatctccataaccattttgcctgttggagaacttcatatagaat"
sequence += "ggaatcaggatgggcgctgtggctcacgcctgcactttggctcacgcctgcactttggga"
sequence += "ggccgaggcgggcggattacttgaggataggagttccagaccagcgtggccaacgtggtg"
</syntax>

<syntax = python>
# Part 1 - CG content
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

print "Kelly Brock\n"
print "Biophysics 101 Asst 3\n"
print "Part I\n\n"

# Variable to keep track of how many c's and g's we've encountered
count = 0

# Check each character in our genetic string
for i in range(0, len(sequence)):
    if sequence[i] == 'g' or sequence[i] == 'c':
        count += 1

# Compute fraction of total characters equal to c or g
answer = count*1.0/len(sequence)
print "CG fraction is: " + str(answer) + "\n"
</syntax>
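The counting loop in Part 1 can also be written with `str.count`. The sketch below is written for Python 3 (the listing above is Python 2), and the function name `gc_fraction` and the short example string are illustrative, not part of the original assignment code.

```python
def gc_fraction(seq):
    """Return the fraction of characters in seq that are 'g' or 'c'."""
    seq = seq.lower()
    if not seq:
        # Avoid dividing by zero for an empty sequence
        return 0.0
    return (seq.count("g") + seq.count("c")) / len(seq)


# First ten bases of the sequence used in the assignment
example = "cggagcagct"
print("CG fraction is:", gc_fraction(example))
```

In Python 3 the `/` operator always performs true division, so the `*1.0` trick from the Python 2 version is not needed.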

<syntax = python>
# Part 2 - Find reverse complement
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

print "\nPart II\n"

# Make list to hold reversed sequence
RevSeq = list(sequence[::-1])

# Change all values to their complements
for i in range(0,len(RevSeq)):
    if RevSeq[i] == 'c':
        RevSeq[i] = 'g'
    elif RevSeq[i] == 'g':
        RevSeq[i] = 'c'
    elif RevSeq[i] == 't':
        RevSeq[i] = 'a'
    elif RevSeq[i] == 'a':
        RevSeq[i] = 't'

# Recast our sequence back into a string
RevSeq = "".join(RevSeq)
print "Reverse Complement Sequence"
print RevSeq
</syntax>
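As an alternative to the if/elif chain in Part 2, a translation table can do the complement in one pass. This is a minimal Python 3 sketch; `COMPLEMENT` and `reverse_complement` are names chosen here for illustration and assume a lowercase sequence containing only a, c, g, and t.

```python
# Map each base to its complement: a<->t, c<->g
COMPLEMENT = str.maketrans("acgt", "tgca")


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("aacg"))
```

Applying the function twice returns the original sequence, which makes a quick sanity check.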

<syntax = python>
# Part 3 - Determining protein sequence
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

print "\nPart III\n"

# Hardcode the protein dictionary
standard = {
    'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
    'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
    'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
    'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',

    'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
    'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
    'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
    'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',

    'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
    'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
    'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
    'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',

    'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
    'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
    'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
    'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
}
</syntax>

<syntax = python>
# Make function to find protein abbreviations
# sequence is forward genetic seq, RevSeq is reverse
# complement, and posneg indicates whether we want to
# do all frames (>1) or just the positive ones (1)

def proteinabbr(sequence, RevSeq, posneg):

    # Will hold the list of one-letter abbreviations for the proteins
    # encoded by p53 in different frames
    protein = list()
    totalprot = list()

    # Top loop chooses + open frame (0) or - open frame (1)
    for l in range(0,posneg):

        # Use + open frame with normal sequence
        if l == 0:
            sign = " + "
            seq = sequence

        # The second time, use reverse complement sequence
        else:
            sign = " - "
            seq = RevSeq

        # There are 3 possible reading frames for both normal and
        # reverse complement sequences
        for m in range(0,3):
            print "\nFrame" + sign + str(m+1)

            # Go through each triple in our frame
            for i in range(m,len(seq),3):

                # Prevents error if not evenly divisible by 3
                if (i+2) < len(seq):

                    # Lookup protein value in dictionary and add it to the
                    # protein list
                    protein.append(standard[seq[i:(i+3)]])

            # Prints result as string and clears list
            print "".join(protein)
            totalprot.append(protein)
            protein = list()
    return totalprot

# Do all that we just defined and store as original, unmutated sequence
original = proteinabbr(sequence, RevSeq, 2)
</syntax>
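The three-frame translation can be sketched more compactly in Python 3. The version below builds the standard codon table from the conventional t, c, a, g ordering instead of typing out all 64 entries; `CODON_TABLE` and `translate_frames` are hypothetical names, and the function returns strings rather than printing them, which differs from `proteinabbr` above.

```python
import itertools

BASES = "tcag"
# Amino acids for codons in the order ttt, ttc, tta, ttg, tct, ... ggg
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_frames(seq):
    """Return the translations of the three forward reading frames of seq."""
    frames = []
    for offset in range(3):
        # Stop at len(seq) - 2 so that an incomplete trailing codon is skipped
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames.append("".join(CODON_TABLE[c] for c in codons))
    return frames


for number, protein in enumerate(translate_frames("atggcctaa"), start=1):
    print("Frame +" + str(number), protein)
```

To get the three negative frames, the same function can be called on the reverse complement of the sequence.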

<syntax = python>
# Part 4
# %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

print "\nPart IV\n"

# Have to do this for a certain number of trials
for j in range(0,TRIALS):

    # Make list to hold the random numbers - there should be 1% of total.
    # Sample from every index so the last base can be mutated too.
    mutspot = random.sample(range(0,len(sequence)),len(sequence)/100)

    # Make string of sequence mutable
    mutseq = list(sequence)

    # Possibilities for mutation for each character
    A = ['c','g','t']
    C = ['a','g','t']
    G = ['a','c','t']
    T = ['a','c','g']

    # Choose another nucleotide for each spot where you assigned a mutation
    for i in mutspot:
        if mutseq[i] == 'a':
            mutseq[i] = random.choice(A)
        elif mutseq[i] == 'c':
            mutseq[i] = random.choice(C)
        elif mutseq[i] == 'g':
            mutseq[i] = random.choice(G)
        else:
            mutseq[i] = random.choice(T)

    print "\nMutated Protein Sequence for Frames +1,2,3 in Trial " + str(j)
    print "\nMUTATED SEQUENCE"
    print "".join(mutseq)

    # Translate mutated string into 3-frame protein abbreviations
    mutprotseq = proteinabbr("".join(mutseq), RevSeq, 1)

    print "\nNumber of premature stop codons: "

    # Go through each ORF we computed
    for k in range(0,3):

        # We also want to see how many changes were introduced
        countmut = 0

        # We want to count how many times a new stop codon is introduced
        # into the code, compared to the original sequence. * = stop
        countstop = 0

        # Find each protein within each ORF
        for l in range(0,len(mutprotseq[k])):

            # Is it a mutation?
            if mutprotseq[k][l] != original[k][l]:
                countmut += 1

                # Did you introduce a new stop codon?
                if mutprotseq[k][l] == '*':
                    countstop += 1

        print "\nThe total number of protein mutations was " + str(countmut)
        print "Of these, " + str(countstop) + " were incorrect stop codons."
</syntax>
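The mutation step of Part 4 can be isolated into small functions, which makes it easier to test on its own. This is a Python 3 sketch under stated assumptions: the names `mutate` and `count_new_stops`, the `percent` parameter, and the optional `rng` argument are introduced here for illustration and do not appear in the original script.

```python
import random


def mutate(seq, percent=1, rng=random):
    """Return a copy of seq with percent% of its positions substituted.

    Each chosen position is replaced by one of the three other bases,
    so every chosen position really changes.
    """
    n_mut = len(seq) * percent // 100
    # Sample distinct positions from the whole sequence, last base included
    positions = rng.sample(range(len(seq)), n_mut)
    bases = list(seq)
    for pos in positions:
        bases[pos] = rng.choice([b for b in "acgt" if b != bases[pos]])
    return "".join(bases)


def count_new_stops(original_protein, mutated_protein):
    """Count positions where the mutant has a stop ('*') and the original does not."""
    return sum(
        1
        for before, after in zip(original_protein, mutated_protein)
        if after == "*" and before != "*"
    )


# A seeded generator makes a run repeatable
rng = random.Random(0)
seq = "acgt" * 50
mutant = mutate(seq, percent=1, rng=rng)
changed = sum(1 for a, b in zip(seq, mutant) if a != b)
print("Bases changed:", changed)
```

Passing a seeded `random.Random` instance is a design choice for reproducibility; calling `mutate(seq)` without it falls back to the module-level generator, as in the original script.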

Assignment 0. Python and Excel

I'm currently having technical difficulties getting Python to run: it doesn't recognize the matplotlib or numpy libraries. However, I did complete the Excel graphs. With increasing k for the first equation, the function values also increased as expected, resulting in different endpoints of the curve. For the second equation, the curve decreased to very negative numbers, like a reflection of a normal exponential curve. For the third graph, all values were zero, since (k * e^x * (1 - e^x)) is always <= 0 for x >= 0 and positive k, so the max automatically chooses zero.
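The reasoning about the third graph can be checked numerically without matplotlib or numpy. The sketch below assumes the third expression has the form max(0, k * e^x * (1 - e^x)), which the description implies but does not state outright; the function name `third_curve`, the value k = 2, and the range of x values are all illustrative choices. It is written for Python 3.

```python
import math


def third_curve(x, k):
    """Assumed form of the third expression: max(0, k * e^x * (1 - e^x))."""
    return max(0.0, k * math.exp(x) * (1.0 - math.exp(x)))


# Sample x from 0 to 5 in steps of 0.5
xs = [i * 0.5 for i in range(11)]
values = [third_curve(x, k=2.0) for x in xs]
print(values)
```

For x >= 0 the factor (1 - e^x) is never positive, so every sampled value is clipped to zero. For negative x the product is positive, so the curve would not be flat there.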
