User talk:Anna Turetsky: Difference between revisions

From OpenWetWare





Revision as of 10:42, 8 October 2009

SNPCupid

...what Joe said [http://openwetware.org/wiki/User_talk:Joseph_P._Torella]. (He pretty much wrote up all the details we discussed. Thanks, Joe!)


Human 2.0

This might be too broad (as in, I'm not thinking about one specific trait), but this takes the personal genome project one step further.

For now, the only really effective way to control a person's genes is by controlling the alleles inherited from parents. Thus, I have been thinking about the idea of playing with the statistical recombination of genes due to meiosis during reproduction to create Humans 2.0. Not only could this type of selection induce extra diversity, but it could try to prevent offspring from being homozygous recessive for a well-characterized genetic disease, eradicating those disorders.

The idea: create a genetics-based dating website where potential partners are chosen based on genetic characteristics that their offspring are likely to exhibit. Users could choose what they care about most, e.g. highest diversity of alleles, avoidance of homozygous recessive alleles, highest probability of blue eyes, dark hair, high intelligence, etc. (Though some of these would run into problems by creating less diversity...) The search would return the best genetic matches after taking into account other, more tangible considerations, such as age. Of course, this requires getting to the point where we have personal genetic information for everyone taking part, which is currently a major barrier to doing this.

I will admit, even in the future the likelihood of someone using something like this is quite small. However, perhaps it could help people decide whether to have children once they are already together (similar to, but broader than, the genetic testing that is done for Tay-Sachs likelihood in Jewish communities). With a few known diseases in mind, it would be easier to test users' genetic information as well. Eventually, perhaps we can pave the way for generations of some very healthy (and maybe smart and good-looking!) people. It might even be well-received by the anti-abortion crowd!

A first step toward this would be to document known recessive alleles and write a program that predicts the probability of a homozygous recessive offspring from the genetic information of the parents. We could also rate the different genes by severity (including whether the homozygous recessive state has a negative or positive impact, for any that are non-disease) and include that rating in the algorithm when scoring the desirability of possible combinations across the entire genome.
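As a rough illustration of that prediction step, here is a minimal Python sketch for the simplest possible case: one gene, two alleles, and plain Mendelian segregation. The genotype encoding ('A' dominant, 'a' recessive), the function names, and the severity weights are all hypothetical and chosen only for illustration; real traits are rarely this simple.

```python
from fractions import Fraction


def prob_homozygous_recessive(parent1, parent2, recessive='a'):
    """Probability that a child inherits the recessive allele from both parents.

    Each genotype is a two-character string such as 'Aa'. Assumes each parent
    passes on one of its two alleles with equal probability.
    """
    p1 = Fraction(parent1.count(recessive), 2)
    p2 = Fraction(parent2.count(recessive), 2)
    return p1 * p2


def weighted_risk(genotypes1, genotypes2, severity):
    """Severity-weighted sum of risks over all genes (a hypothetical score).

    genotypes1 and genotypes2 map gene name -> genotype string;
    severity maps gene name -> numeric weight.
    """
    total = Fraction(0)
    for gene, weight in severity.items():
        total += weight * prob_homozygous_recessive(genotypes1[gene], genotypes2[gene])
    return total
```

For two carriers, `prob_homozygous_recessive('Aa', 'Aa')` gives 1/4, the familiar Punnett-square result. A lower `weighted_risk` would then mean a "better" match under this toy scoring.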

__________________________________________

Also:

I'm tempted to steal this idea: Journal of Imaginary Genomics, 2006

(It's from this article about scientific integrity from a few years ago: One Last Question: Who Did the Work?)


Assignment for Sept. 24

(this got formatted strangely by openwetware...hopefully it still makes sense...)

seq = 'cggagcagctcactattcacccgatgagaggggaggagagagagagaaaatgtcctttag\
gccggttcctcttacttggcagagggaggctgctattctccgcctgcatttctttttctg\
gattacttagttatggcctttgcaaaggcaggggtatttgttttgatgcaaacctcaatc\
cctccccttctttgaatggtgtgccccaccccccgggtcgcctgcaacctaggcggacgc\
taccatggcgtagacagggagggaaagaagtgtgcagaaggcaagcccggaggcactttc\
aagaatgagcatatctcatcttcccggagaaaaaaaaaaaagaatggtacgtctgagaat\
gaaattttgaaagagtgcaatgatgggtcgtttgataatttgtcgggaaaaacaatctac\
ctgttatctagctttgggctaggccattccagttccagacgcaggctgaacgtcgtgaag\
cggaaggggcgggcccgcaggcgtccgtgtggtcctccgtgcagccctcggcccgagccg\
gttcttcctggtaggaggcggaactcgaattcatttctcccgctgccccatctcttagct\
cgcggttgtttcattccgcagtttcttcccatgcacctgccgcgtaccggccactttgtg\
ccgtacttacgtcatctttttcctaaatcgaggtggcatttacacacagcgccagtgcac\
acagcaagtgcacaggaagatgagttttggcccctaaccgctccgtgatgcctaccaagt\
cacagacccttttcatcgtcccagaaacgtttcatcacgtctcttcccagtcgattcccg\
accccacctttattttgatctccataaccattttgcctgttggagaacttcatatagaat\
ggaatcaggatgggcgctgtggctcacgcctgcactttggctcacgcctgcactttggga\
ggccgaggcgggcggattacttgaggataggagttccagaccagcgtggccaacgtggtg'

len = len(seq)

# length is 1020 bp


# 1. Please determine the GC content of p53seg.
# a is the counter and the loop adds 1 to 'a' every time it sees a
# c or a g in the sequence

a=0

for i in range(len):

   if seq[i] == 'c':
       a=a+1
   if seq[i] == 'g':
       a=a+1
# a, which is the number of g's and c's, is 540

print('GC content is about %d percent' % ((a*100)//len))
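The same count can be written more compactly with `str.count`. This is only an alternative sketch, not a replacement for the loop above; the function name `gc_content` is made up here, and it relies on the built-in `len`, so it would need to sit in a separate file or run before `len` is rebound to the sequence length as in this script.

```python
def gc_content(seq):
    """Return the percentage of bases in seq that are g or c (case-insensitive)."""
    seq = seq.lower()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    gc = seq.count('g') + seq.count('c')
    return 100.0 * gc / len(seq)
```

For example, `gc_content('atgc')` returns 50.0.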



# 2. Determine the DNA reverse complement of p53seg.
# comp is the array of the complement sequence and the loop appends
# the complement of each base. revcomp is the reverse comp: the
# backwards sequence of comp

comp = []

for i in range(len):

   if seq[i] == 'c':
       comp.append('g')
   if seq[i] == 'g':
       comp.append('c')
   if seq[i] == 'a':
       comp.append('t')
   if seq[i] == 't':
       comp.append('a')

revcomp = []

for i in range(len):

   revcomp.append(comp[len-1-i])

revcompseq = "".join(revcomp)

print('the reverse complement sequence is ' + revcompseq)
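The complement-then-reverse steps above can also be sketched with a lookup dictionary. The names `COMPLEMENT` and `reverse_complement` are made up for this example, and the sketch assumes a lowercase sequence containing only a, c, g and t (any other character raises a `KeyError`).

```python
# Watson-Crick pairing for lowercase DNA bases
COMPLEMENT = {'a': 't', 't': 'a', 'c': 'g', 'g': 'c'}


def reverse_complement(seq):
    """Walk the sequence backwards and complement each base."""
    return ''.join(COMPLEMENT[base] for base in reversed(seq))
```

For example, `reverse_complement('atgc')` returns `'gcat'`.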



# 3. Translate the p53seg gene into its protein
# sequence in all 6 frames (+1, +2, +3, -1, -2, -3)
# The c function turns a sequence into an array of codons; that is, 3
# bases per array element. The inputs are the sequence, the array being
# filled, and the reading-frame offset (orfstart).

def c(seq, codonarray, orfstart):

   for i in range(orfstart,len,3):
       codonarray.append(seq[i:(i+3)])

codonarray1 = []

codonarray2 = []

codonarray3 = []

c(seq, codonarray1, 0)

c(seq, codonarray2, 1)

c(seq, codonarray3, 2)

revcodonarray1 = []

revcodonarray2 = []

revcodonarray3 = []

c(revcompseq, revcodonarray1, 0)

c(revcompseq, revcodonarray2, 1)

c(revcompseq, revcodonarray3, 2)

# The p function finds the start codon and then turns all the codons after it
# into the amino acids they code, stopping when the sequence reaches a stop
# codon. It then prints the amino acid sequence of the protein.

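def p(codonarray):

   protseq = []
   start = None
   for i in range(len//3):
       if codonarray[i] == 'atg':
           start = i
           break
   # without a start codon in this frame there is nothing to translate
   if start is None:
       print('(no start codon found)')
       return
   for i in range(start, len//3):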
       if codonarray[i] == 'ttt':
           protseq.append('F')
       if codonarray[i] == 'tct':
           protseq.append('S')
       if codonarray[i] == 'tat':
           protseq.append('Y')
       if codonarray[i] == 'tgt':
           protseq.append('C')
       if codonarray[i] == 'ttc':
           protseq.append('F')
       if codonarray[i] == 'tcc':
           protseq.append('S')
       if codonarray[i] == 'tac':
           protseq.append('Y')
       if codonarray[i] == 'tgc':
           protseq.append('C')
       if codonarray[i] == 'tta':
           protseq.append('L')
       if codonarray[i] == 'tca':
           protseq.append('S')
       if codonarray[i] == 'taa':
           protseq.append('*')
           break
       if codonarray[i] == 'tga':
           protseq.append('*')
           break
       if codonarray[i] == 'ttg':
           protseq.append('L')
       if codonarray[i] == 'tcg':
           protseq.append('S')
       if codonarray[i] == 'tag':
           protseq.append('*')
           break
       if codonarray[i] == 'tgg':
           protseq.append('W')
       if codonarray[i] == 'ctt':
           protseq.append('L')
       if codonarray[i] == 'ctc':
           protseq.append('L')
       if codonarray[i] == 'cta':
           protseq.append('L')
       if codonarray[i] == 'ctg':
           protseq.append('L')
       if codonarray[i] == 'cct':
           protseq.append('P')
       if codonarray[i] == 'ccc':
           protseq.append('P')
       if codonarray[i] == 'cca':
           protseq.append('P')
       if codonarray[i] == 'ccg':
           protseq.append('P')
       if codonarray[i] == 'cat':
           protseq.append('H')
       if codonarray[i] == 'cac':
           protseq.append('H')
       if codonarray[i] == 'caa':
           protseq.append('Q')
       if codonarray[i] == 'cag':
           protseq.append('Q')
       if codonarray[i] == 'cgt':
           protseq.append('R')
       if codonarray[i] == 'cgc':
           protseq.append('R')
       if codonarray[i] == 'cga':
           protseq.append('R')
       if codonarray[i] == 'cgg':
           protseq.append('R')
       if codonarray[i] == 'att':
           protseq.append('I')
       if codonarray[i] == 'atc':
           protseq.append('I')
       if codonarray[i] == 'ata':
           protseq.append('I')
       if codonarray[i] == 'atg':
           protseq.append('M')
       if codonarray[i] == 'act':
           protseq.append('T')
       if codonarray[i] == 'acc':
           protseq.append('T')
       if codonarray[i] == 'aca':
           protseq.append('T')
       if codonarray[i] == 'acg':
           protseq.append('T')
       if codonarray[i] == 'aat':
           protseq.append('N')
       if codonarray[i] == 'aac':
           protseq.append('N')
       if codonarray[i] == 'aaa':
           protseq.append('K')
       if codonarray[i] == 'aag':
           protseq.append('K')
       if codonarray[i] == 'agt':
           protseq.append('S')
       if codonarray[i] == 'agc':
           protseq.append('S')
       if codonarray[i] == 'aga':
           protseq.append('R')
       if codonarray[i] == 'agg':
           protseq.append('R')
       if codonarray[i] == 'gtt':
           protseq.append('V')
       if codonarray[i] == 'gtc':
           protseq.append('V')
       if codonarray[i] == 'gta':
           protseq.append('V')
       if codonarray[i] == 'gtg':
           protseq.append('V')
       if codonarray[i] == 'gct':
           protseq.append('A')
       if codonarray[i] == 'gcc':
           protseq.append('A')
       if codonarray[i] == 'gca':
           protseq.append('A')
       if codonarray[i] == 'gcg':
           protseq.append('A')
       if codonarray[i] == 'gat':
           protseq.append('D')
       if codonarray[i] == 'gac':
           protseq.append('D')
       if codonarray[i] == 'gaa':
           protseq.append('E')
       if codonarray[i] == 'gag':
           protseq.append('E')
       if codonarray[i] == 'ggt':
           protseq.append('G')
       if codonarray[i] == 'ggc':
           protseq.append('G')
       if codonarray[i] == 'gga':
           protseq.append('G')
       if codonarray[i] == 'ggg':
           protseq.append('G')
   protseqjoined = "".join(protseq)
   print(protseqjoined)
   

# p prints the protein itself, so print each label first and then call p

print('the amino acid sequence for reading frame +1 is:')
p(codonarray1)

print('the amino acid sequence for reading frame +2 is:')
p(codonarray2)

print('the amino acid sequence for reading frame +3 is:')
p(codonarray3)

print('the amino acid sequence for reading frame -1 is:')
p(revcodonarray1)

print('the amino acid sequence for reading frame -2 is:')
p(revcodonarray2)

print('the amino acid sequence for reading frame -3 is:')
p(revcodonarray3)
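The long chain of if statements in p can also be expressed as a lookup table. The sketch below, written for Python 3, builds the standard genetic code from the usual t, c, a, g ordering and translates from the first atg through the first stop codon, mirroring what p does. The names `CODON_TABLE`, `translate_orf` and `six_frames` are made up for this example, and it assumes lowercase input with only a, c, g and t.

```python
from itertools import product

BASES = 'tcag'
# Standard genetic code, listed in ttt, ttc, tta, ttg, tct, ... order
AMINO_ACIDS = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
CODON_TABLE = {''.join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_orf(seq, frame):
    """Translate from the first atg in this frame (0, 1 or 2) through the first stop.

    Returns '' when the frame contains no start codon.
    """
    # only complete codons are kept; a trailing 1- or 2-base fragment is dropped
    codons = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if 'atg' not in codons:
        return ''
    protein = []
    for codon in codons[codons.index('atg'):]:
        aa = CODON_TABLE[codon]
        protein.append(aa)
        if aa == '*':
            break
    return ''.join(protein)


def six_frames(seq):
    """Return a dict mapping '+1'..'+3' and '-1'..'-3' to translated proteins."""
    revcomp = seq[::-1].translate(str.maketrans('acgt', 'tgca'))
    result = {}
    for frame in range(3):
        result['+%d' % (frame + 1)] = translate_orf(seq, frame)
        result['-%d' % (frame + 1)] = translate_orf(revcomp, frame)
    return result
```

For example, `translate_orf('atgtttgcctaa', 0)` returns `'MFA*'`.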



# 4. Please introduce single base-pair mutations (i.e. replacement of
# A by T/C/G, G by A/T/C, etc…) to the p53seg gene at a rate of 1%
# (i.e. ~1 mutation every 100 base pairs) and document the changes to the
# protein sequence (give a couple of trial results). How often do you see
# premature terminations?
# mutseqarray is the sequence with the mutations put in through the loop that
# picks a random base to change out of every 100 and then picks a random
# nucleotide to change it to. The functions of writing the codons and then
# the protein sequence are then repeated using the mutant sequence. The reading
# frame of the mutant sequence can be modified in the program code.

seqarray = []

for i in range(len):

   seqarray.append(seq[i])

import random as rd

# copy the list so that the original seqarray is left untouched
mutseqarray = list(seqarray)

mutsite = []

# one random site per 100-bp window; window j covers bases 100*j to 100*j+99
# (bases past the last full window are never mutated)
for j in range(len//100):

   mutsite.append(rd.randrange(0, 100) + 100*j)

bases = ['a', 't', 'g', 'c']

for i in range(len//100):

   # pick the new base only from the three that differ from the current one,
   # so every chosen site is a real substitution
   choices = [b for b in bases if b != mutseqarray[mutsite[i]]]
   mutseqarray[mutsite[i]] = rd.choice(choices)

mutseq = "".join(mutseqarray)

mutcodonarray = []

c(mutseq, mutcodonarray, 2)

print('the amino acid sequence for reading frame +3 after mutations is:')

p(mutcodonarray)
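To put a number on "how often do you see premature terminations?", the mutation step can be repeated many times and the early stops counted. The sketch below is one way to do that, under assumptions that differ from the script above: each base mutates independently with probability `rate` (rather than exactly one per 100-bp window), translation is taken to start at the wild-type start codon position even if that codon is itself mutated, and all function names are made up for this example.

```python
import random

STOP_CODONS = {'taa', 'tag', 'tga'}


def mutate(seq, rate, rng):
    """Replace each base, with probability `rate`, by a different base."""
    out = []
    for base in seq:
        if rng.random() < rate:
            base = rng.choice([b for b in 'acgt' if b != base])
        out.append(base)
    return ''.join(out)


def split_codons(seq, frame):
    """Split seq into complete codons starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def first_stop(codons, start):
    """Index of the first stop codon at or after `start`, or len(codons) if none."""
    for i in range(start, len(codons)):
        if codons[i] in STOP_CODONS:
            return i
    return len(codons)


def premature_stop_fraction(seq, frame, trials=1000, rate=0.01, seed=0):
    """Fraction of mutated copies whose first stop comes earlier than the wild type's."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    wild = split_codons(seq, frame)
    if 'atg' not in wild:
        raise ValueError('no start codon in this frame')
    start = wild.index('atg')
    wild_stop = first_stop(wild, start)
    hits = 0
    for _ in range(trials):
        mutant = split_codons(mutate(seq, rate, rng), frame)
        if first_stop(mutant, start) < wild_stop:
            hits += 1
    return hits / float(trials)
```

Calling `premature_stop_fraction(seq, 2)` on the p53 segment would estimate the rate for reading frame +3; the result depends on the seed and the number of trials.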


Assignment for Sept. 15

Not only have I not done much programming, but I haven't really done graphing in Excel, which made this a more confusing assignment than I think it should have been. I think I need to see during class how to graph the logistic growth curve. For the exponential growth, as others commented, whether it is growth or decay depends on whether k is greater than or less than 1, respectively.
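To make the growth-versus-decay point concrete, here is a small sketch. The assignment's exact formulas are not reproduced on this page, so the forms are assumptions: the exponential model is taken as x(t) = x0 * k**t (so k > 1 grows and 0 < k < 1 decays), and the logistic curve is the standard closed form with growth rate r and carrying capacity K. The function and parameter names are made up for this example.

```python
import math


def exponential(x0, k, t):
    """x(t) = x0 * k**t: grows when k > 1, decays when 0 < k < 1."""
    return x0 * k ** t


def logistic(x0, r, K, t):
    """Standard logistic curve starting at x0 and levelling off at K."""
    return K / (1 + ((K - x0) / x0) * math.exp(-r * t))
```

Evaluating either function over a range of t values gives the columns one would otherwise build in a spreadsheet before plotting.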