Harvard:Biophysics 101/2007/Notebook:Christopher Nabel/2007-2-8
Assignment Due Feb 8
Proposed Program Construction
The goal of this program is to generate 10,000 10-letter strings consisting solely of the letters H and T, and to analyze these strings for overlapping stretches of H or T of varying length.
My ideal program would work in three parts. First, a loop would create a string of 10 random letters. Next, each string would be appended to a master list as soon as it is generated, before the next string is created. Once the master list is complete, I would screen it for homogeneous stretches of a single letter, of successively increasing length (1 through 10).
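The three-part plan above can be sketched as three small functions. This is a minimal sketch, not the code that was eventually submitted: the function names make_10mer, build_master_list and count_runs are hypothetical, and it assumes that overlapping stretches should be counted (so HHH contains HH twice).

```python
import random

def make_10mer(length=10):
    """Part 1: build one string of random H/T letters."""
    return ''.join(random.choice('HT') for _ in range(length))

def build_master_list(n=10000, length=10):
    """Part 2: append each new string to the master list before making the next."""
    master = []
    for _ in range(n):
        master.append(make_10mer(length))
    return master

def count_runs(master, letter, run_length):
    """Part 3: count overlapping occurrences of letter * run_length in every string."""
    target = letter * run_length
    total = 0
    for s in master:
        # slide a window one position at a time so overlapping stretches are counted
        for start in range(len(s) - run_length + 1):
            if s[start:start + run_length] == target:
                total += 1
    return total

if __name__ == '__main__':
    data = build_master_list()
    for letter in 'HT':
        for k in range(1, 11):
            print(letter * k, count_runs(data, letter, k))
```

Splitting the work this way keeps each part testable on its own; for example, count_runs can be checked on a hand-written list such as ['HHHT'] before running it on all 10,000 strings.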
Technical Difficulties
I was unable to implement this design because I ran into serious problems in the first phase of writing the program. I experimented with while and for loops, trying to add 10 randomly chosen letters to my initial string, and failed each time. I do not know how to use the join function that other members of the class seem to rely on. Once this barrier is overcome, I think I know enough to use the append function to assemble the master list and then analyze it with the code written for us in the assignment due the 6th. This is where I currently stand with this assignment; more explanation of Python and string manipulation would be greatly appreciated.
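Since join was the sticking point, here is a small generic illustration (not code from the assignment) of how str.join and list.append behave in standard Python. The string in front of .join() is the separator placed between items, so ''.join(...) glues a list of single letters into one string. Plain concatenation inside a for loop is shown as an alternative that avoids join entirely.

```python
import random

# join: the string before .join() is the separator placed between the items
letters = ['H', 'T', 'T', 'H']
print(''.join(letters))    # HTTH
print('-'.join(letters))   # H-T-T-H

# Building a random 10-mer with join: pick 10 letters, then join them
ten_mer = ''.join([random.choice('HT') for n in range(10)])
print(ten_mer, len(ten_mer))

# The same thing without join: start from an empty string and concatenate
s = ''
for n in range(10):
    s = s + random.choice('HT')
print(s, len(s))

# append: add each new string to the end of a master list
master = []
for i in range(3):
    master.append(''.join([random.choice('HT') for n in range(10)]))
print(len(master))         # 3
```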
Updated Assignment
After helpful advice from Shawn, I was able to understand some of the programming basics. Here is my completed assignment:
Code
#!/usr/bin/env python
# Load random operations for generation of random 10-mers
import random

# Create an empty list to store the 10-mers, an empty string
# for the individual 10-mers, and a reference string
# for when we sample through the data set
data = []
string = ''
refstring = ''

# Generate 10,000 10-mers using a for loop and add those to the list
for i in range(10000):
    string = ''.join([random.choice('HT') for n in range(10)])
    data.append(string)

# Iterate through the list and count up the stretches of H's and T's.
# To automate the iterations, an additional outer loop scans for
# each letter in turn.
possibilities = ['H', 'T']
print("Using method 2 from the Feb. 1 Assignment...")
for s in possibilities:
    for i in range(10):
        tally = 0  # running total of matching substrings across all 10-mers
        substr = ''.join([s for n in range(i + 1)])  # run of length i+1, e.g. 'HHH'
        for j in range(10000):
            refstring = data[j]
            pos = refstring.find(substr, 0)
            # restart each search one character after the last hit,
            # so overlapping stretches are all counted
            while pos != -1:
                tally = tally + 1
                pos = refstring.find(substr, pos + 1)
        print("%s %d" % (substr, tally))
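The inner while loop restarts each search at pos+1, which is what makes it count overlapping stretches: HHHH contains HH three times (starting at positions 0, 1 and 2). The built-in str.count counts only non-overlapping matches, so it would report 2 for the same case. A minimal sketch contrasting the two; count_overlapping is a hypothetical helper name that packages the same find loop, not part of the assignment code.

```python
def count_overlapping(s, sub):
    """Count occurrences of sub in s, allowing overlaps (same idea as the find loop)."""
    count = 0
    pos = s.find(sub)
    while pos != -1:
        count += 1
        pos = s.find(sub, pos + 1)  # resume one character later so overlaps are found
    return count

print('HHHH'.count('HH'))               # 2  (non-overlapping)
print(count_overlapping('HHHH', 'HH'))  # 3  (overlapping)
```

Using str.count in place of the find loop would therefore give smaller totals for every stretch of length 2 or more.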
Program Output
Using method 2 from the Feb. 1 Assignment...
H 49909
HH 22455
HHH 9985
HHHH 4349
HHHHH 1893
HHHHHH 786
HHHHHHH 323
HHHHHHHH 116
HHHHHHHHH 34
HHHHHHHHHH 8
T 50091
TT 22645
TTT 10122
TTTT 4515
TTTTT 1950
TTTTTT 812
TTTTTTT 313
TTTTTTTT 115
TTTTTTTTT 35
TTTTTTTTTT 6
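As a sanity check on these counts: a 10-letter string has (11 - k) starting positions for a stretch of length k, and, assuming H and T are each chosen with probability 1/2 independently, each position holds a given run of k identical letters with probability (1/2)^k. The expected overlapping count over 10,000 strings is therefore 10,000 × (11 - k) × (1/2)^k. This sketch (expected_count is a hypothetical helper) tabulates that formula for comparison with the program output.

```python
# Expected number of overlapping occurrences of a run of k identical letters
# across n_strings random strings of the given length, assuming each letter
# is H or T with probability p = 1/2, independently of the others.
def expected_count(k, n_strings=10000, length=10, p=0.5):
    positions = length - k + 1  # possible starting positions for the run
    return n_strings * positions * p ** k

for k in range(1, 11):
    print('H' * k, round(expected_count(k), 1))
```

The formula gives 50000, 22500, 10000, 4375, 1875, 781.25, 312.5, about 117.2, about 39.1 and about 9.8 for k = 1 through 10, which is close to the observed H and T counts.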