Harvard:Biophysics 101/2007/Notebook:Denizkural/2007-2-6

From OpenWetWare

Assignment due February 6

Here is the code for my assignment:

#!/usr/bin/env python

from Bio import GenBank
from Bio.Seq import translate

# We can create a GenBank object that will parse a raw record
# This facilitates extracting specific information from the sequences
record_parser = GenBank.FeatureParser()

# NCBIDictionary is an interface to GenBank
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank', parser = record_parser)

# If you pass NCBIDictionary a GenBank id, it will download that record
parsed_record = ncbi_dict['124484046']

print "GenBank id:", parsed_record.id

# Extract the sequence from the parsed_record
s = parsed_record.seq.tostring()
print "total sequence length:", len(s)

# Longest run of consecutive T's to search for
max_repeat = 9

# Translate the sequence into a protein, starting at the first base
# (the CDS feature of this record gives codon_start=1)
my_protein = translate(s)

print "protein length:", len(my_protein)
print 'protein translation is: \n%s' % my_protein

# Method 1: str.count() counts non-overlapping occurrences only
print "\nmethod 1"
for i in range(max_repeat):
    substr = 'T' * (i + 1)
    print substr, s.count(substr)

# Method 2: restart the search one base after each hit, so that
# overlapping occurrences are counted as well
print "\nmethod 2"
for i in range(max_repeat):
    substr = 'T' * (i + 1)
    count = 0
    pos = s.find(substr)
    while pos != -1:
        count += 1
        pos = s.find(substr, pos + 1)
    print substr, count

print "\nNow we would like to print raw records:"

# Create a new dictionary without a parser, so that a lookup
# returns the raw GenBank text instead of a parsed record
ncbi_dict = GenBank.NCBIDictionary('nucleotide', 'genbank')
gb_record = ncbi_dict['124484046']

print '\n%s' %gb_record
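
The two counting loops above differ only in how they treat overlapping matches, which is why the TT and TTT counts in the output differ between the methods. The sketch below illustrates this on a short made-up string rather than the GenBank record. The function names count_nonoverlapping and count_overlapping and the test string are assumptions for illustration, not part of the assignment. It uses only built-in string methods and should run under both Python 2 and Python 3.

```python
def count_nonoverlapping(seq, motif):
    # str.count() resumes scanning after the end of each match,
    # so matches that share bases are counted only once.
    return seq.count(motif)


def count_overlapping(seq, motif):
    # Restart the search one position after the start of each match,
    # so matches that share bases are all counted.
    count = 0
    pos = seq.find(motif)
    while pos != -1:
        count += 1
        pos = seq.find(motif, pos + 1)
    return count


example = "ATTTTG"  # made-up string with a single run of four T's
for n in range(1, 5):
    motif = "T" * n
    print("%s %d %d" % (motif,
                        count_nonoverlapping(example, motif),
                        count_overlapping(example, motif)))
# prints:
# T 4 4
# TT 2 3
# TTT 1 2
# TTTT 1 1
```

For a single base and for the longest run the two counts agree. They differ only when a motif can match at positions that share bases.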

And here is the output:

GenBank id: AM491363.1
total sequence length: 1496
protein length: 498
protein translation is: 
PSMAFRVHSRNGKSYTFLISSDYERAEWRENIREQQKKCFRSFSLTSVELQMPTNSC
VKLQTVHSIPLTINKEDDESPGLYGFLNVIVHSATGFKQSSNLYCTLEVDSFGYFVN
KAKTRVYRDTAEPNWNEEFEIELEGSQTLRILCYEKCYNKTKIPKEDGESTDRLMGK
GQVQLDPQALQDRDWQRTVIAMNGIEVKLSVKFNSREFSLKRMPSRKQTGVLGVKIA
VVTKRERSKVPYIVRQCVEEIERRGMEEVGIYRVSGVATDIQALKAAFDVKALQRPV
ASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSITKGEKLRV
LGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEHLLSSGING
SFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAELVHHHST
VADGLITTLHYPAPKRNKPSVYGVSPNYDKWEMERTDITMKH

method 1
T 290
TT 39
TTT 9
TTTT 3
TTTTT 0
TTTTTT 0
TTTTTTT 0
TTTTTTTT 0
TTTTTTTTT 0

method 2
T 290
TT 48
TTT 12
TTTT 3
TTTTT 0
TTTTTT 0
TTTTTTT 0
TTTTTTTT 0
TTTTTTTTT 0

Now we would like to print raw records:

LOCUS       AM491363                1496 bp    mRNA    linear   PRI 13-FEB-2007
DEFINITION  Homo sapiens partial mRNA for bcr-abl1 e19a2 chimeric protein.
ACCESSION   AM491363
VERSION     AM491363.1  GI:124484046
KEYWORDS    bcr-abl1 e19a2 chimeric protein; BCR-ABL1 gene.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;
            Catarrhini; Hominidae; Homo.
REFERENCE   1
  AUTHORS   Burmeister,T. and Reinhardt,R.
  TITLE     A multiplex PCR for improved detection of all known BCR-ABL fusion
            transcripts
  JOURNAL   Unpublished
REFERENCE   2  (bases 1 to 1496)
  AUTHORS   Burmeister,T.
  TITLE     Direct Submission
  JOURNAL   Submitted (02-FEB-2007) Burmeister T., Medizinische Klinik III,
            Charite Universitaetsmedizin Berlin, CBF, Hindenburgdamm 30, 12200
            Berlin, GERMANY
FEATURES             Location/Qualifiers
     source          1..1496
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /cell_type="leukocyte"
                     /note="fusion of BCR exon 19 and ABL1 exon 2"
     source          1..835
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /map="22q11"
     source          836..1496
                     /organism="Homo sapiens"
                     /mol_type="mRNA"
                     /db_xref="taxon:9606"
                     /map="9q34"
     gene            <1..>1496
                     /gene="BCR-ABL1 e19a2"
     CDS             <1..>1496
                     /gene="BCR-ABL1 e19a2"
                     /function="tyrosine kinase, oncogene"
                     /codon_start=1
                     /product="bcr-abl1 e19a2 chimeric protein"
                     /protein_id="CAM33013.1"
                     /db_xref="GI:124484047"
                     /translation="PSMAFRVHSRNGKSYTFLISSDYERAEWRENIREQQKKCFRSFS
                     LTSVELQMPTNSCVKLQTVHSIPLTINKEDDESPGLYGFLNVIVHSATGFKQSSNLYC
                     TLEVDSFGYFVNKAKTRVYRDTAEPNWNEEFEIELEGSQTLRILCYEKCYNKTKIPKE
                     DGESTDRLMGKGQVQLDPQALQDRDWQRTVIAMNGIEVKLSVKFNSREFSLKRMPSRK
                     QTGVLGVKIAVVTKRERSKVPYIVRQCVEEIERRGMEEVGIYRVSGVATDIQALKAAF
                     DVKALQRPVASDFEPQGLSEAARWNSKENLLAGPSENDPNLFVALYDFVASGDNTLSI
                     TKGEKLRVLGYNHNGEWCEAQTKNGQGWVPSNYITPVNSLEKHSWYHGPVSRNAAEHL
                     LSSGINGSFLVRESESSPGQRSISLRYEGRVYHYRINTASDGKLYVSSESRFNTLAEL
                     VHHHSTVADGLITTLHYPAPKRNKPSVYGVSPNYDKWEMERTDITMKH"
     variation       158
                     /gene="BCR-ABL1 e19a2"
                     /note="T->C"
                     /replace="t"
     variation       667
                     /gene="BCR-ABL1 e19a2"
                     /note="C->T"
                     /replace="c"
     variation       1171
                     /gene="BCR-ABL1 e19a2"
                     /note="T->C"
                     /replace="t"
     variation       1426
                     /gene="BCR-ABL1 e19a2"
                     /note="A->T"
                     /replace="a"
ORIGIN      
        1 cccagcatgg ccttcagggt gcacagccgc aacggcaaga gttacacgtt cctgatctcc
       61 tctgactatg agcgtgcaga gtggagggag aacatccggg agcagcagaa gaagtgtttc
      121 agaagcttct ccctgacatc cgtggagctg cagatgccga ccaactcgtg tgtgaaactc
      181 cagactgtcc acagcattcc gctgaccatc aataaggaag atgatgagtc tccggggctc
      241 tatgggtttc tgaatgtcat cgtccactca gccactggat ttaagcagag ttcaaatctg
      301 tactgcaccc tggaggtgga ttcctttggg tattttgtga ataaagcaaa gacgcgcgtc
      361 tacagggaca cagctgagcc aaactggaac gaggaatttg agatagagct ggagggctcc
      421 cagaccctga ggatactgtg ctatgaaaag tgttacaaca agacgaagat ccccaaggag
      481 gacggcgaga gcacggacag actcatgggg aagggccagg tccagctgga cccgcaggcc
      541 ctgcaggaca gagactggca gcgcaccgtc atcgccatga atgggatcga agtaaagctc
      601 tcggtcaagt tcaacagcag ggagttcagc ttgaagagga tgccgtcccg aaaacagaca
      661 ggggtcctcg gagtcaagat tgctgtggtc accaagagag agaggtccaa ggtgccctac
      721 atcgtgcgcc agtgcgtgga ggagatcgag cgccgaggca tggaggaggt gggcatctac
      781 cgcgtgtccg gtgtggccac ggacatccag gcactgaagg cagccttcga cgtcaaagcc
      841 cttcagcggc cagtagcatc tgactttgag cctcagggtc tgagtgaagc cgctcgttgg
      901 aactccaagg aaaaccttct cgctggaccc agtgaaaatg accccaacct tttcgttgca
      961 ctgtatgatt ttgtggccag tggagataac actctaagca taactaaagg tgaaaagctc
     1021 cgggtcttag gctataatca caatggggaa tggtgtgaag cccaaaccaa aaatggccaa
     1081 ggctgggtcc caagcaacta catcacgcca gtcaacagtc tggagaaaca ctcctggtac
     1141 catgggcctg tgtcccgcaa tgccgctgag catctgctga gcagcgggat caatggcagc
     1201 ttcttggtgc gtgagagtga gagcagtcct ggccagaggt ccatctcgct gagatacgaa
     1261 gggagggtgt accattacag gatcaacact gcttctgatg gcaagctcta cgtctcctcc
     1321 gagagccgct tcaacaccct ggccgagttg gttcatcatc attcaacggt ggccgacggg
     1381 ctcatcacca cgctccatta tccagcccca aagcgcaaca agccctctgt ctatggtgtg
     1441 tcccccaact acgacaagtg ggagatggaa cgcacggaca tcaccatgaa gcacaa
//
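
The protein length of 498 reported above follows from the sequence length: 1496 bases contain 498 complete codons, with two bases left over at the 3' end. The sketch below shows that arithmetic with a hand-written translation using the standard genetic code. It is not Biopython's implementation. The helper translate_dna and the table names are assumptions for illustration, and the helper handles only the unambiguous bases A, C, G and T.

```python
# Standard genetic code, with codons ordered by T, C, A, G at each position.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

CODON_TABLE = {}
index = 0
for b1 in BASES:
    for b2 in BASES:
        for b3 in BASES:
            CODON_TABLE[b1 + b2 + b3] = AMINO_ACIDS[index]
            index += 1


def translate_dna(seq):
    # Hypothetical helper: translate in frame 1 and ignore any
    # incomplete codon at the end. Raises KeyError on ambiguous bases.
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# First 18 bases of the ORIGIN section above
print(translate_dna("cccagcatggccttcagg"))  # prints PSMAFR

# Last 11 bases of the record: three codons plus two leftover bases
print(translate_dna("atgaagcacaa"))  # prints MKH

print(1496 // 3)  # prints 498
```

The first six residues and the last three agree with the /translation qualifier in the record.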