User talk:Anugraha Raman

From OpenWetWare
Revision as of 03:47, 13 October 2009 by Anugraha Raman (talk | contribs)

October 13/20 2009

  • editing

Step 1: We would need access to the PGP sequences

Step 2: Find gene of choice on OMIM (insert specific findings)

Step 3: Look at Rs662799b (prevents weight gain from high fat diets)

Step 4:

October 8 2009

Our idea was to use OMIM, Gene Tests, and SNPedia in order to find new linkages between genotype sequences and their corresponding phenotypes. We also wanted to attempt to find the minimal number of genotypic sequences that would correspond to a complex phenotypic trait. To demonstrate the system, we would first show our program working with known genotype-phenotype linkages. See the (Project Talk Page[[1]]) page under projects for a more detailed summary.

September 29 Due Assignment

Homo Sapiens 2.0 Application incorporating analytic and synthetic ideas

The premise of the major idea is that the environment and our lifestyle play major roles in disease onset probability due to their effect on our Epigenetic patterns. The Homo Sapiens toolkit should include the means for our species to test itself for specific diseases due to these epigenetic factors.

PBS Nova aired a show titled “Tale of Two Mice” that focused on Epigenetics. It featured genetically identical mice having the same sex and age, found to be phenotypically distinct due to a methyl-rich diet. Specific DNA regions becoming hyper-methylated can lead to onset of cancer. Amongst humans, one twin getting cancer and the other not, can be explained by diet and environmental factors that resulted in Methylation and eventually cancer.

Today the only known natural modification of human DNA is via DNA Methylation. This Methylation affects the Cytosine base (C) only when it is followed by a Guanine (G), i.e. only at CpG sites. When promoter CpG islands become methylated, the associated gene becomes permanently silenced.
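Since methylation is restricted to CpG sites, a first computational step is simply locating and counting them. The sketch below is a minimal illustration, not part of the original assignment: the function name `cpg_stats` is hypothetical, and the observed/expected ratio is the commonly used CpG island statistic (CpG count times length, divided by C count times G count), assumed here as a reasonable starting feature.

```python
def cpg_stats(seq):
    """Return (CpG count, GC fraction, observed/expected CpG ratio) for a DNA string.

    Hypothetical helper for illustration; the ratio is
    (number of CpG * length) / (number of C * number of G).
    """
    s = seq.lower()
    n = len(s)
    c = s.count("c")
    g = s.count("g")
    # "cg" cannot overlap with itself, so str.count gives the exact site count
    cpg = s.count("cg")
    gc_fraction = (c + g) / n if n else 0.0
    expected = (c * g) / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return cpg, gc_fraction, ratio
```

A CpG-rich stretch gives a ratio well above that of a sequence with few or no CpG sites, which is one simple way to flag candidate promoter islands before looking at methylation data.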

Wet-Lab methylation “profiling” studies have shown a characteristic set of aberrantly methylated genes with varying CpG island methylation patterns in specific cancer tumors. One of the challenges faced by the lab techniques is degradation of 90% of incubated DNA. The conditions necessary for complete conversion, such as long incubation times, elevated temperature, and high Bisulphite concentration, can lead to this degradation.

An immediate small step towards the Homo Sapiens 2.0 goal of self testing for epigenetic factor based diseases is trying to predict algorithmically whether a specific gene is methylation prone or resistant.

References

  1. Herceg Z. Epigenetics and cancer: towards an evaluation of the impact of environmental and dietary factors. Mutagenesis. 2007 Mar;22(2):91-103. DOI:10.1093/mutage/gel068 | PubMed ID:17284773 | HubMed [Paper0]

    example environment and lifestyle linkage to epigenetics

    // Small Step 1: Predict algorithmically if a specific gene is Methylation prone or resistant

  2. Bird A. DNA methylation patterns and epigenetic memory. Genes Dev. 2002 Jan 1;16(1):6-21. DOI:10.1101/gad.947102 | PubMed ID:11782440 | HubMed [Paper1]
  3. Fang F, Fan S, Zhang X, and Zhang MQ. Predicting methylation status of CpG islands in the human brain. Bioinformatics. 2006 Sep 15;22(18):2204-9. DOI:10.1093/bioinformatics/btl377 | PubMed ID:16837523 | HubMed [Paper2]
  4. Feltus FA, Lee EK, Costello JF, Plass C, and Vertino PM. Predicting aberrant CpG island methylation. Proc Natl Acad Sci U S A. 2003 Oct 14;100(21):12253-8. DOI:10.1073/pnas.2037852100 | PubMed ID:14519846 | HubMed [Paper3]

    Medium sized Step 2: ‘Count’ and curate Methylation levels for specific genes which are normal and diseased

  5. Ongenaert M, Van Neste L, De Meyer T, Menschaert G, Bekaert S, and Van Criekinge W. PubMeth: a cancer methylation database combining text-mining and expert annotation. Nucleic Acids Res. 2008 Jan;36(Database issue):D842-6. DOI:10.1093/nar/gkm788 | PubMed ID:17932060 | HubMed [Paper4]
  6. Seike M, Gemma A, Hosoya Y, Hemmi S, Taniguchi Y, Fukuda Y, Yamanaka N, and Kudoh S. Increase in the frequency of p16INK4 gene inactivation by hypermethylation in lung cancer during the process of metastasis and its relation to the status of p53. Clin Cancer Res. 2000 Nov;6(11):4307-13. PubMed ID:11106248 | HubMed [Paper5]

    Lung Cancer example: CDKN2A gene showing normal methylation ~0% and diseased methylation around ~40%

  7. Toyooka S, Toyooka KO, Miyajima K, Reddy JL, Toyota M, Sathyanarayana UG, Padar A, Tockman MS, Lam S, Shivapurkar N, and Gazdar AF. Epigenetic down-regulation of death-associated protein kinase in lung cancers. Clin Cancer Res. 2003 Aug 1;9(8):3034-41. PubMed ID:12912953 | HubMed [Paper6]

    Another Lung Cancer example: DAPK1 gene showing normal methylation ~4% and diseased methylation around ~40%


    // Large sized Step 3: Predict Methylation levels based on variables - tbd


    // Much larger sized Step 4: Create in vivo logic based “counter” that will light up when it detects biomarkers within range of disease based on Methylation levels

  8. Friedland AE, Lu TK, Wang X, Shi D, Church G, and Collins JJ. Synthetic gene networks that count. Science. 2009 May 29;324(5931):1199-202. DOI:10.1126/science.1172005 | PubMed ID:19478183 | HubMed [Paper7]
  9. Rinaudo K, Bleris L, Maddamsetti R, Subramanian S, Weiss R, and Benenson Y. A universal RNAi-based logic evaluator that operates in mammalian cells. Nat Biotechnol. 2007 Jul;25(7):795-801. DOI:10.1038/nbt1307 | PubMed ID:17515909 | HubMed [Paper8]

    Final large sized Step 5: Make the step 4 setup into a kit and let Homo Sapiens test themselves

  10. Douglas SM, Dietz H, Liedl T, Högberg B, Graf F, and Shih WM. Self-assembly of DNA into nanoscale three-dimensional shapes. Nature. 2009 May 21;459(7245):414-8. DOI:10.1038/nature08016 | PubMed ID:19458720 | HubMed [Paper9]
  11. Andersen ES, Dong M, Nielsen MM, Jahn K, Subramani R, Mamdouh W, Golas MM, Sander B, Stark H, Oliveira CL, Pedersen JS, Birkedal V, Besenbacher F, Gothelf KV, and Kjems J. Self-assembly of a nanoscale DNA box with a controllable lid. Nature. 2009 May 7;459(7243):73-6. DOI:10.1038/nature07971 | PubMed ID:19424153 | HubMed [Paper10]


Useful links

Week 3 Assignment

Problems 1,2 and 3 were done within one Python script and problem 4 in a separate script. The tutorial Biopython Tutorial helped me in understanding how to proceed with the functions that had to be written for this assignment!

Problems 1 2 and 3 output

Problem 4 output showing changing sequences


The fourth problem was very interesting! I had a blast working on it. Hopefully it is done correctly :)

The length of the given sequence is 1020 base pairs. For every 100 base pairs the script randomly tries to mutate a to t/g/c, t to a/g/c, etc. with a probability of 0.01. I then made the script run this 100 times and called it a simulation. The script did 800 such simulations and aggregated the results in the plot shown below:

Simulations Output
Simulation Analysis

As you can see, the plot produced after 800 simulations of 100 sets of single base pair evolutionary mutations, as described in assignment 3b problem 4, shows about four to five (4.65) premature terminations for every 1020 mutations.
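The simulation loop described above can be sketched without Biopython as follows. This is a simplified illustration under stated assumptions, not the script that produced the plot: the names `mutate_pass`, `count_stops` and `simulate` are hypothetical, the sequence is assumed to contain only a/c/g/t, and stops are counted in the +1 reading frame only.

```python
import random

STOP_CODONS = {"taa", "tag", "tga"}

def mutate_pass(bases, rng, tries=100, prob=0.01):
    """One pass: make `tries` attempts, each mutating one random base with probability `prob`."""
    for _ in range(tries):
        if rng.random() < prob:
            pos = rng.randrange(len(bases))  # upper bound is exclusive, so every position is reachable
            # pick any base other than the current one
            bases[pos] = rng.choice([b for b in "acgt" if b != bases[pos]])

def count_stops(bases):
    """Count stop codons in reading frame +1 (complete codons only)."""
    return sum(
        "".join(bases[i:i + 3]) in STOP_CODONS
        for i in range(0, len(bases) - 2, 3)
    )

def simulate(seq, passes=100, seed=None):
    """Run `passes` mutation passes on a copy of `seq` and return the final stop-codon count."""
    rng = random.Random(seed)
    bases = list(seq.lower())
    for _ in range(passes):
        mutate_pass(bases, rng)
    return count_stops(bases)
```

Calling `simulate` many times with different seeds and averaging the counts gives the kind of aggregate shown in the plot. Using a private `random.Random(seed)` instance keeps each run reproducible without touching the global random state.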


  • Reading in the Input Sequence

I created a simple FASTA type text file to read in the p53seg sequence that was provided.

p53 seg sequence text file

<syntax type="python">

from Bio import SeqIO

# read the p53seg sequence from the FASTA-formatted text file
input_file = open('p53seg.txt', 'r')
for cur_record in SeqIO.parse(input_file, "fasta"):
    my_seq = cur_record.seq

</syntax>


  • Problem 1 (GC Content)

The GC content % required that a float be used in the denominator to get results.

<syntax type="python">

# GC count done explicitly, i.e. problem #1 in this assignment set
# Get the number of Guanines in the sequence
g_count = cur_record.seq.count('g')
# Get the number of Cytosines in the sequence
c_count = cur_record.seq.count('c')
# Get the length of the sequence
seq_count = len(cur_record)
# use float in denominator to get the decimal answer for GC%
gc_percent = ((g_count + c_count) / float(seq_count)) * 100
print('GC % is: ' + str(gc_percent))
</syntax>
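For comparison, here is a minimal dependency-free sketch of the same calculation for Python 3, where `/` is already true division and the explicit `float()` cast is no longer needed. The function name `gc_percent` and the lower-casing step are assumptions added for illustration.

```python
def gc_percent(seq):
    """Return the GC content of a DNA string as a percentage (0.0 for an empty string)."""
    s = seq.lower()  # accept upper- or lower-case input
    if not s:
        return 0.0
    # in Python 3, "/" always performs floating-point division
    return 100 * (s.count("g") + s.count("c")) / len(s)
```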

  • Problem 2 (Reverse Complement)

The reverse complement was obtained by simply using the Seq.reverse_complement() function.

<syntax type="python">

# get the reversed complement of the sequence, i.e. problem #2 in this assignment set
rev_seq = my_seq.reverse_complement()
print('DNA reverse complement of p53seg is: ')
output_file.write('<br><br>DNA reverse complement of p53seg is: ')
print(rev_seq)
output_file.write(str(rev_seq))
</syntax>

  • Problem 3 (Frame Translation)

The third problem was done two ways. The first way was to use the Standard table provided by Harris in the 3b word document. The second was to use the standard definition in the Bio.Data CodonTable.py file using the default implementation Seq.translate(sequence).

<syntax type="python">
# Standard translation from Biophys101_assign3b.doc
standard3b = {
    'ttt': 'F', 'tct': 'S', 'tat': 'Y', 'tgt': 'C',
    'ttc': 'F', 'tcc': 'S', 'tac': 'Y', 'tgc': 'C',
    'tta': 'L', 'tca': 'S', 'taa': '*', 'tga': '*',
    'ttg': 'L', 'tcg': 'S', 'tag': '*', 'tgg': 'W',
    'ctt': 'L', 'cct': 'P', 'cat': 'H', 'cgt': 'R',
    'ctc': 'L', 'ccc': 'P', 'cac': 'H', 'cgc': 'R',
    'cta': 'L', 'cca': 'P', 'caa': 'Q', 'cga': 'R',
    'ctg': 'L', 'ccg': 'P', 'cag': 'Q', 'cgg': 'R',
    'att': 'I', 'act': 'T', 'aat': 'N', 'agt': 'S',
    'atc': 'I', 'acc': 'T', 'aac': 'N', 'agc': 'S',
    'ata': 'I', 'aca': 'T', 'aaa': 'K', 'aga': 'R',
    'atg': 'M', 'acg': 'T', 'aag': 'K', 'agg': 'R',
    'gtt': 'V', 'gct': 'A', 'gat': 'D', 'ggt': 'G',
    'gtc': 'V', 'gcc': 'A', 'gac': 'D', 'ggc': 'G',
    'gta': 'V', 'gca': 'A', 'gaa': 'E', 'gga': 'G',
    'gtg': 'V', 'gcg': 'A', 'gag': 'E', 'ggg': 'G'
}

def translate_dna(seq):
    """ translates tri-nucleotide sequences (codon) to its one letter amino acid """
    aa_translation = ""
    for codon_loc in range(0, len(seq), 3):
        # if you do not find the codon translation i.e partial codon
        # or something else replace with ?
        aa_translation = aa_translation + standard3b.get(str(seq[codon_loc:codon_loc+3]), "?")
    return aa_translation
</syntax>

Frames +1, +2, -1 and -2 can be done simply as shown below.

<syntax type="python">
plusone_seq = seq
amino_seq1 = translate_dna(plusone_seq)
print('(+1) frame translation is: ')
print(amino_seq1)
# ...
# +2 Frame
#
# original sequence minus the first nucleic acid in the sequence
plustwo_seq = seq[1:]
amino_seq2 = translate_dna(plustwo_seq)
# ...
r_seq = seq.reverse_complement()
# -1 Frame
#
# original sequence reversed
minusone_seq = r_seq
amino_seq4 = translate_dna(minusone_seq)
# ...
# -2 Frame
#
# reversed sequence minus the first nucleic acid
minustwo_seq = r_seq[1:]
amino_seq5 = translate_dna(minustwo_seq)
</syntax>

Way 2, using the standard table predefined in Biopython, is shown below.

<syntax type="python">
# Method 2 ===>Using the Standard table defined
# in Bio.Data CodonTable.py
# by using the default Seq.translate
# +1 Frame
# using the translate method in Bio.Seq
# implemented in Libs/sitepackages/Bio/Seq.py
plusone_seq = seq
amino_seq1 = Seq.translate(plusone_seq)
</syntax>
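The frame-by-frame code above generalizes to a loop over offsets. The sketch below is a self-contained illustration in plain Python 3 (no Biopython), not the assignment script: the names `CODON_TABLE`, `reverse_complement`, `translate` and `six_frames` are hypothetical, the table is built from the standard genetic code in T, C, A, G order (the same assignments as the dictionary above), and the ±3 frames are added for completeness.

```python
BASES = "tcag"
# standard genetic code, listed with first, second, then third base cycling through t, c, a, g
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.lower().translate(COMPLEMENT)[::-1]

def translate(seq):
    """Translate codon by codon; partial or unknown codons become '?'."""
    s = seq.lower()
    return "".join(CODON_TABLE.get(s[i:i + 3], "?") for i in range(0, len(s), 3))

def six_frames(seq):
    """Return a dict mapping '+1'..'+3' and '-1'..'-3' to their translations."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(seq[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames
```

Dropping the first `offset` bases before translating is the same trick as `seq[1:]` for the +2 frame above; the minus frames apply it to the reverse complement.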

  • Problem 4 (Single bp mutation simulation to detect early terminations)

Many functions were defined for this script; however, the two that are key are mutate_singlebp and findstops.

<syntax type="python">
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
from random import *
import os

# Functions defined in this script file are as follows:
# writeheader(myfile)  : Writes a specific HTML header using the myfile handle
# writefooter(myfile)  : Writes a specific HTML footer using the myfile handle
# writerunsummary(myfile) : Writes a specific set of summary information using the myfile handle
# mutate_singlebp(seq, random_seed, forevery, prob) : Mutates a single base pair for forevery
# location range using a probability of prob; the mutable sequence is updated in place
# writeAA(myfile, seq,stop_locs) : Writes the amino acids using the myfile handle
# findstops(seq)  : Finds the stop locations in the given DNA sequence
</syntax>

mutate_singlebp and findstops

<syntax type="python">
# mutate_singlebp mutates a single base pair for forevery location range using a
# probability of prob
def mutate_singlebp(seq, random_seed=0, forevery=100, prob=0.01):
    # reset seed
    if random_seed == 0:
        seed()
    else:
        seed(random_seed)
    for x in range(0, forevery):
        # randrange needs integer bounds; r == 0 happens with probability prob
        r = randrange(0, int(round(1 / prob)))
        if r == 0:
            # randrange excludes its upper bound, so len(seq) covers the last base too
            mutate_pos = randrange(0, len(seq))
            old_base = seq[mutate_pos]
            if old_base == 'a':
                mutated_base = choice(['c', 't', 'g'])
            elif old_base == 'c':
                mutated_base = choice(['a', 't', 'g'])
            elif old_base == 't':
                mutated_base = choice(['a', 'c', 'g'])
            else:  # 'g'
                mutated_base = choice(['a', 'c', 't'])
            # mutable sequence seq is updated
            seq[mutate_pos] = mutated_base
        # end if
    # end for
# end of function mutate_singlebp

def findstops(seq):
    stop_array = []
    start = 0
    stop_pos = 0
    # translate once, then scan the protein for '*' (stop) symbols
    protein = Seq.translate(seq)
    while stop_pos != -1:
        stop_pos = protein.find('*', start)
        start = stop_pos + 1
        stop_array = stop_array + [stop_pos]
    # end of while
    # drop the -1 appended by the final, unsuccessful search
    stop_array.remove(-1)
    return stop_array
# end of function findstops
</syntax>

calling the functions

<syntax type="python">
# I have to create a mutable sequence to use the mutate_singlebp function
rand_seq = my_seq.tomutable()
# calling the function that will do most of the heavy lifting
mutate_singlebp(rand_seq)
# I have to convert the mutable sequence back to a normal DNA sequence in order to use the translate() method
new_seq = Seq(rand_seq.toseq().tostring(), IUPAC.unambiguous_dna)
b = findstops(new_seq)
</syntax>

Week 2 Assignment

With the first graph (exponential), with larger values of k the graph increased much faster. In fact, as you can see when you compare the "red triangles" to the "red circles", the exponential curve associated with the red triangles (k=4.03) dwarfs the exponential curve associated with the red circles (k=.9) so much that the "red circles" curve appears to be almost linear in the top graph.

Exponential Graphs


In the second graph (logistic), with larger values of k, the graph not only grows faster, but also starts leveling off sooner. If we were relating this to population growth, larger k values result in the population reaching its carrying capacity sooner.

Logistics and Exponential Graphs
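The effect of k on both curves can be checked numerically. The exact equations used in the assignment are not shown on this page, so the sketch below assumes the textbook continuous-time forms; the function names, the starting population of 1, the carrying capacity of 100 and the 90% threshold are all illustrative assumptions.

```python
import math

def exponential(n0, k, t):
    """Exponential growth: N(t) = N0 * exp(k * t)."""
    return n0 * math.exp(k * t)

def logistic(n0, k, t, capacity):
    """Logistic growth: N(t) = K / (1 + ((K - N0) / N0) * exp(-k * t))."""
    return capacity / (1 + ((capacity - n0) / n0) * math.exp(-k * t))

def time_to_fraction(n0, k, capacity, fraction=0.9, dt=0.01):
    """Step forward in time until the logistic curve reaches `fraction` of capacity."""
    t = 0.0
    while logistic(n0, k, t, capacity) < fraction * capacity:
        t += dt
    return t

# compare the two k values discussed above
for k in (0.9, 4.03):
    print(k, exponential(1, k, 5), time_to_fraction(1, k, 100))
```

Under these assumptions the larger k gives a far bigger exponential value at the same time point and a shorter time to reach 90% of the carrying capacity, matching the behaviour described for both graphs.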

The last graph (not shown) is meant to correct for "negative population." I am still having trouble using the max function to evaluate individual values in my array.
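One possible way around the max problem: Python's built-in `max` compares its arguments as whole objects, so it has to be applied to each value separately. The sketch below is a minimal illustration using a plain list; the function name `clamp_nonnegative` is hypothetical. With NumPy arrays, `numpy.maximum(arr, 0)` performs the same elementwise comparison in one call.

```python
def clamp_nonnegative(values):
    """Replace every negative value with 0, leaving the other values unchanged."""
    # max(v, 0) is evaluated once per element rather than on the whole list
    return [max(v, 0) for v in values]
```

For example, `clamp_nonnegative([3, -1.5, 0, 2])` keeps 3, 0 and 2 and turns -1.5 into 0, so a plotted population never dips below zero.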