# Harvard:Biophysics 101/2007/Notebook:Katie Fifer/2007-2-8

## Script

```
#!/usr/bin/env python

# Katie Fifer
# asst2.py
# 2/7/07
# Description: A script that generates 10,000 strings of 10 random
# coin flips (H or T) and outputs the tally of contiguous (overlapping)
# stretches of 2,3,4,5,6,7,8,9, and 10 H's or T's in that set of
# 10,000 10-mers. Single flips (stretches of length 1) are tallied too.

import random

# set constants
num_strings = 10000
num_flips = 10
max_repeat = 10
all_strings = []

# generate a list of new random strings
for i in range(num_strings):
    new_string = ''.join([random.choice(['H', 'T']) for n in range(num_flips)])
    all_strings.append(new_string)

# figure out how many overlapping stretches of H's there are. will do
# this for each string for each substring. in other words will find
# all instances of 'HH' in each of the strings, and then all instances
# of 'HHH' in each of the strings etc.

def analyze(letter):
    for i in range(max_repeat):
        # generate the substring to search for. the i + 1 is to account
        # for the fact that i starts at 0
        substr = ''.join([letter for n in range(i + 1)])
        # for each of the strings in the list, find the number of
        # instances of the substring just set (overlapping)
        total = 0
        for curr_string in all_strings:
            pos = curr_string.find(substr, 0)
            while pos != -1:
                total = total + 1
                # resume the search one character past the last hit so
                # that overlapping matches are counted
                pos = curr_string.find(substr, pos + 1)
        # this form of print works in both Python 2 and Python 3
        print("%s %d" % (substr, total))

analyze('H')
analyze('T')
```

## Output

• Output of one run of the program, generating 10,000 strings
```
H 49831
HH 22372
HHH 9860
HHHH 4232
HHHHH 1813
HHHHHH 754
HHHHHHH 313
HHHHHHHH 127
HHHHHHHHH 44
HHHHHHHHHH 8
T 50169
TT 22622
TTT 10065
TTTT 4401
TTTTT 1937
TTTTTT 824
TTTTTTT 341
TTTTTTTT 122
TTTTTTTTT 37
TTTTTTTTTT 7
```
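
As a rough sanity check on these tallies, the expected counts can be worked out directly. Assuming a fair coin and independent flips, each of the 10 - k + 1 starting positions in a string begins a run of k copies of a given letter with probability (1/2)^k, so the expected total over 10,000 strings is 10000 * (10 - k + 1) / 2^k. The sketch below prints those values; the helper name `expected_runs` is made up for this illustration and is not part of asst2.py.

```python
def expected_runs(k, num_flips=10, num_strings=10000):
    """Expected number of overlapping length-k runs of one letter,
    assuming a fair coin and independent flips."""
    # number of positions where a length-k window fits in one string
    positions = num_flips - k + 1
    return num_strings * positions / float(2 ** k)

for k in range(1, 11):
    print("%2d %9.2f" % (k, expected_runs(k)))
```

For k = 2 this gives 22500, against the observed 22372 (HH) and 22622 (TT); for k = 5 it gives 1875, against 1813 and 1937.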

## Testing

• Output for just 5 strings, so the counts can be double-checked by hand
```
all strings: ['HHTTHTHTTT', 'THTTTTTTTH', 'TTTHHTHTHT', 'HTHTHTHTTH', 'HTTHHHTTTT']
H 19
HH 4
HHH 1
HHHH 0
HHHHH 0
HHHHHH 0
HHHHHHH 0
HHHHHHHH 0
HHHHHHHHH 0
HHHHHHHHHH 0
T 31
TT 16
TTT 9
TTTT 5
TTTTT 3
TTTTTT 2
TTTTTTT 1
TTTTTTTT 0
TTTTTTTTT 0
TTTTTTTTTT 0
```
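
The hand check can also be repeated in code. The following is a minimal sketch that re-counts the five strings listed above with the same overlapping `find` loop; the helper names `count_overlapping` and `tally` are hypothetical and not part of asst2.py.

```python
test_strings = ['HHTTHTHTTT', 'THTTTTTTTH', 'TTTHHTHTHT',
                'HTHTHTHTTH', 'HTTHHHTTTT']

def count_overlapping(string, substr):
    """Count occurrences of substr in string, allowing overlaps."""
    count = 0
    pos = string.find(substr)
    while pos != -1:
        count += 1
        # step forward by one character only, so overlaps are found
        pos = string.find(substr, pos + 1)
    return count

def tally(strings, letter, max_repeat=10):
    """Total overlapping runs of length 1..max_repeat across all strings."""
    return [sum(count_overlapping(s, letter * k) for s in strings)
            for k in range(1, max_repeat + 1)]

print(tally(test_strings, 'H'))  # [19, 4, 1, 0, 0, 0, 0, 0, 0, 0]
print(tally(test_strings, 'T'))  # [31, 16, 9, 5, 3, 2, 1, 0, 0, 0]
```

Both lists agree with the tallies printed above for the 5-string run.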