# Harvard:Biophysics 101/2007/Notebook:Katie Fifer/2007-2-8

### From OpenWetWare

## Script

    #!/usr/bin/env python3
    # Katie Fifer
    # asst2.py
    # 2/7/07
    # Description: A script that generates 10,000 strings of 10 random
    # coin flips (H or T) and outputs the tally of contiguous (overlapping)
    # stretches of 2,3,4,5,6,7,8,9, and 10 H's or T's in that set of
    # 10,000 10-mers

    import random

    # set constants
    num_strings = 10000
    num_flips = 10
    max_repeat = 10
    all_strings = []

    # random number generation
    # generate a list of new strings
    for i in range(num_strings):
        new_string = ''.join([random.choice(['H', 'T']) for n in range(num_flips)])
        all_strings.append(new_string)

    # figure out how many overlapping stretches of H's there are. will do
    # this for each string for each substring. in other words will find
    # all instances of 'HH' in each of the strings, and then all instances
    # of 'HHH' in each of the strings etc.
    def analyze(letter):
        for i in range(max_repeat):
            # generate the substring to search for. the i + 1 is to account
            # for the fact that i starts at 0
            substr = ''.join([letter for n in range(i + 1)])
            # for each of the strings in the list, find the number of
            # instances of the substring just set (overlapping)
            total = 0
            for j in range(num_strings):
                curr_string = all_strings[j]
                count = 0
                pos = curr_string.find(substr, 0)
                while pos != -1:
                    count = count + 1
                    total = total + 1
                    # restart one position past the last hit so that
                    # overlapping matches are counted
                    pos = curr_string.find(substr, pos + 1)
            print(substr, total)

    analyze('H')
    analyze('T')
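The script counts overlapping matches by calling `find` again one position past each hit. A minimal sketch of that idea as a standalone Python 3 helper follows; the name `count_overlapping` is hypothetical and not part of the original script. The comparison with `str.count` shows why the built-in is not enough here, since it only counts non-overlapping matches.

```python
def count_overlapping(text, substr):
    """Count occurrences of substr in text, allowing overlaps.

    Hypothetical helper illustrating the find-and-advance loop.
    """
    count = 0
    pos = text.find(substr)
    while pos != -1:
        count += 1
        # Advance by one character (not by len(substr)) so that
        # overlapping matches are also found.
        pos = text.find(substr, pos + 1)
    return count


print(count_overlapping("HHHH", "HH"))  # 3: matches start at positions 0, 1, 2
print("HHHH".count("HH"))               # 2: str.count skips overlapping matches
```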

## Output

- Output of a program run that generated 10,000 strings

    H 49831
    HH 22372
    HHH 9860
    HHHH 4232
    HHHHH 1813
    HHHHHH 754
    HHHHHHH 313
    HHHHHHHH 127
    HHHHHHHHH 44
    HHHHHHHHHH 8
    T 50169
    TT 22622
    TTT 10065
    TTTT 4401
    TTTTT 1937
    TTTTTT 824
    TTTTTTT 341
    TTTTTTTT 122
    TTTTTTTTT 37
    TTTTTTTTTT 7
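One way to sanity-check these tallies is to compare them with their expected values. Assuming fair, independent flips, any window of k consecutive flips is all H (or all T) with probability (1/2)^k, and a string of 10 flips contains 10 - k + 1 such windows. The sketch below computes the resulting expectation; the function name `expected_tally` is hypothetical, and the fair-coin assumption is mine rather than something stated in the notebook.

```python
def expected_tally(k, num_strings=10000, num_flips=10):
    """Expected number of overlapping length-k runs of one letter.

    Assumes fair, independent flips: each of the (num_flips - k + 1)
    windows in a string matches with probability (1/2) ** k.
    """
    windows = num_flips - k + 1
    return num_strings * windows * 0.5 ** k


for k in range(1, 11):
    print("H" * k, round(expected_tally(k), 1))
```

For k = 2 this gives 22500, which is close to the observed 22372 (HH) and 22622 (TT).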

## Testing

- Output for just 5 strings, so you can double-check by hand

    all strings: ['HHTTHTHTTT', 'THTTTTTTTH', 'TTTHHTHTHT', 'HTHTHTHTTH', 'HTTHHHTTTT']
    H 19
    HH 4
    HHH 1
    HHHH 0
    HHHHH 0
    HHHHHH 0
    HHHHHHH 0
    HHHHHHHH 0
    HHHHHHHHH 0
    HHHHHHHHHH 0
    T 31
    TT 16
    TTT 9
    TTTT 5
    TTTTT 3
    TTTTTT 2
    TTTTTTT 1
    TTTTTTTT 0
    TTTTTTTTT 0
    TTTTTTTTTT 0
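The hand check can also be automated. The following sketch recounts the five test strings with an independent method (comparing every length-k window against the target run) and asserts that the result matches the tallies listed above. The helper name `tally` is hypothetical, and the windowed comparison is a deliberately different technique from the `find` loop in the script, so agreement between the two is a meaningful cross-check.

```python
# The five test strings from the run above.
strings = ['HHTTHTHTTT', 'THTTTTTTTH', 'TTTHHTHTHT', 'HTHTHTHTTH', 'HTTHHHTTTT']


def tally(letter, k, strings):
    """Count overlapping runs of `letter` repeated k times across all strings."""
    target = letter * k
    return sum(
        1
        for s in strings
        for i in range(len(s) - k + 1)  # every window of length k
        if s[i:i + k] == target
    )


h_counts = [tally('H', k, strings) for k in range(1, 11)]
t_counts = [tally('T', k, strings) for k in range(1, 11)]

# Tallies reported in the test output for runs of length 1 through 10.
assert h_counts == [19, 4, 1, 0, 0, 0, 0, 0, 0, 0]
assert t_counts == [31, 16, 9, 5, 3, 2, 1, 0, 0, 0]
print("hand-check tallies reproduced")
```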