Harvard:Biophysics 101/2007/Notebook:Katie Fifer/2007-2-8

From OpenWetWare
Revision as of 15:28, 7 February 2007 by Kfifer

Script

 #!/usr/bin/env python3
 
 # Katie Fifer
 # asst2.py
 # 2/7/07
 # Description: A script that generates 10,000 strings of 10 random
 # coin flips (H or T) and outputs the tally of contiguous (overlapping)
 # stretches of 1, 2, 3, ..., 10 H's or T's in that set of
 # 10,000 10-mers.
 
 import random
 
 # set constants
 num_strings = 10000
 num_flips = 10
 max_repeat = 10
 all_strings = []
 
 # random number generation:
 # generate a list of new strings, each num_flips characters long
 for i in range(num_strings):
     new_string = ''.join([random.choice(['H', 'T']) for n in range(num_flips)])
     all_strings.append(new_string)
 
 # figure out how many overlapping stretches of H's there are. will do
 # this for each string for each substring. in other words will find
 # all instances of 'H' in each of the strings, then all instances
 # of 'HH', then 'HHH', etc.
 
 def analyze(letter):
     for i in range(max_repeat):
         # generate the substring to search for; the i + 1 accounts for
         # the fact that i starts at 0 (lengths run 1..max_repeat)
         substr = letter * (i + 1)
         # for each of the strings in the list, count the (overlapping)
         # instances of the substring just set
         total = 0
         for curr_string in all_strings:
             pos = curr_string.find(substr, 0)
             while pos != -1:
                 total = total + 1
                 # resume the search one character later so that
                 # overlapping matches are counted too
                 pos = curr_string.find(substr, pos + 1)
         print(substr, total)
 
 analyze('H')
 analyze('T')
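The same tally can be written more compactly. The sketch below is an alternative, not the script above: it counts overlapping matches with a zero-width regular-expression lookahead, `(?=...)`, which matches at every start position without consuming characters, and it draws the flips from a seeded `random.Random` so a run can be repeated exactly. The function names `count_overlapping`, `tally_runs` and `random_flips` and the seed value 2007 are illustrative assumptions; it assumes Python 3.7+ (insertion-ordered dicts).

```python
import random
import re


def count_overlapping(text, pattern):
    """Count overlapping occurrences of a literal pattern in text.

    The lookahead (?=...) is zero-width, so 'HH' is found twice in 'HHH'.
    """
    return len(re.findall('(?=' + re.escape(pattern) + ')', text))


def tally_runs(strings, letter, max_repeat=10):
    """Map each run letter*k (k = 1..max_repeat) to its total count in strings."""
    return {letter * k: sum(count_overlapping(s, letter * k) for s in strings)
            for k in range(1, max_repeat + 1)}


def random_flips(num_strings, num_flips, seed=None):
    """Generate num_strings strings of num_flips fair coin flips ('H' or 'T')."""
    rng = random.Random(seed)  # same seed -> same strings on every run
    return [''.join(rng.choice('HT') for _ in range(num_flips))
            for _ in range(num_strings)]


if __name__ == '__main__':
    strings = random_flips(10000, 10, seed=2007)  # hypothetical seed
    for letter in 'HT':
        for substr, total in tally_runs(strings, letter).items():
            print(substr, total)
```

Passing `seed=None` keeps the original behaviour of a fresh random run each time.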


Output

  • Output of one run generating 10,000 strings of 10 flips each
 H 49831
 HH 22372
 HHH 9860
 HHHH 4232
 HHHHH 1813
 HHHHHH 754
 HHHHHHH 313
 HHHHHHHH 127
 HHHHHHHHH 44
 HHHHHHHHHH 8
 T 50169
 TT 22622
 TTT 10065
 TTTT 4401
 TTTTT 1937
 TTTTTT 824
 TTTTTTT 341
 TTTTTTTT 122
 TTTTTTTTT 37
 TTTTTTTTTT 7
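As a rough sanity check on these totals, one can compute what a fair coin should give on average. Assuming independent fair flips, each 10-flip string has 10 - k + 1 positions where a particular run of k letters (e.g. `HHH`) could start, and each position matches with probability (1/2)^k. By linearity of expectation, overlaps between windows do not affect the mean, so the expected total over 10,000 strings is 10,000 × (11 - k) × (1/2)^k. The helper name `expected_run_count` is hypothetical.

```python
from fractions import Fraction


def expected_run_count(num_strings, num_flips, k):
    """Expected total overlapping occurrences of one specific run of k identical
    letters (e.g. 'HHH') across num_strings independent fair-coin strings."""
    if k > num_flips:
        return Fraction(0)
    # (num_flips - k + 1) start positions, each matching with probability (1/2)**k
    return num_strings * (num_flips - k + 1) * Fraction(1, 2) ** k


for k in range(1, 11):
    print('H' * k, float(expected_run_count(10000, 10, k)))
```

For k = 1 through 10 this gives 50000, 22500, 10000, 4375, 1875, 781.25, 312.5, 117.1875, 39.0625 and about 9.77, which sits close to both the H and T columns above; the gaps look like ordinary sampling noise, though this is an eyeball comparison rather than a formal test.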

Testing

  • Output for just 5 strings, so the counts can be double-checked by hand
 all strings: ['HHTTHTHTTT', 'THTTTTTTTH', 'TTTHHTHTHT', 'HTHTHTHTTH', 'HTTHHHTTTT']
 H 19
 HH 4
 HHH 1
 HHHH 0
 HHHHH 0
 HHHHHH 0
 HHHHHHH 0
 HHHHHHHH 0
 HHHHHHHHH 0
 HHHHHHHHHH 0
 T 31
 TT 16
 TTT 9
 TTTT 5
 TTTTT 3
 TTTTTT 2
 TTTTTTT 1
 TTTTTTTT 0
 TTTTTTTTT 0
 TTTTTTTTTT 0
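The hand check can also be automated. The sketch below recounts the five test strings with a plain sliding window (a different counting method from the script's `str.find` loop, so it acts as an independent check), prints a per-string breakdown for each run, and asserts that the totals match the output recorded above. The name `window_count` and the `recorded` dictionary are illustrative; the dictionary simply copies the nonzero rows plus a few zero rows from the table above.

```python
# The five strings printed in the test run, and some of the totals it reported.
test_strings = ['HHTTHTHTTT', 'THTTTTTTTH', 'TTTHHTHTHT', 'HTHTHTHTTH', 'HTTHHHTTTT']
recorded = {'H': 19, 'HH': 4, 'HHH': 1, 'HHHH': 0,
            'T': 31, 'TT': 16, 'TTT': 9, 'TTTT': 5, 'TTTTT': 3,
            'TTTTTT': 2, 'TTTTTTT': 1, 'TTTTTTTT': 0}


def window_count(text, pattern):
    """Count overlapping occurrences by testing every start position."""
    k = len(pattern)
    return sum(1 for i in range(len(text) - k + 1) if text[i:i + k] == pattern)


for substr, expected_total in recorded.items():
    per_string = [window_count(s, substr) for s in test_strings]
    # fail loudly if the independent recount disagrees with the recorded output
    assert sum(per_string) == expected_total, substr
    print(substr, per_string, sum(per_string))
```

For example, the `TT` row breaks down as 3, 6, 2, 1 and 4 across the five strings, which adds up to the recorded 16.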