Chris Rhodes Week 8: Difference between revisions

From OpenWetWare

Revision as of 16:00, 19 October 2011

For today's lab we will be working out of the Bioinformatics for Dummies, 2nd edition, book, performing selected activities from Chapters 2, 4, 5, and 6 but modifying the protocols to fit the current website formats and to use HIV-1 gp120.

In Class Activities

Retrieving Protein Sequences

  • The protein retrieved in this exercise is HIV-1 gp120. I searched for it by going to ExPASy and searching "HIV gp120 envelope protein" in the UniProtKB database, but a verified, standalone gp120 entry could not be found. The gp120 protein sequence was instead taken from a gp160 entry, which contains the gp120 sequence. The UniProtKB entry of the gp160 protein used is found here, and the sequence of the gp120 protein, shown as the highlighted residues within the gp160 protein sequence, is found here: http://www.uniprot.org/blast/?about=P04578[33-511] (this address could not be properly hyperlinked because the [33-511] text conflicts with the wiki link format).
  • The FASTA form of the gp120 protein sequence was retrieved from the entry page and is shown here:
>sp|P04578|33-511
KLWVTVYYGVPVWKEATTTLFCASDAKAYDTEVHNVWATHACVPTDPNPQEVVLVNVTENFNMWKNDMVEQMHEDIISL
WDQSLKPCVKLTPLCVSLKCTDLKNDTNTNSSSGRMIMEKGEIKNCSFNISTSIRGKVQKEYAFFYKLDIIPIDNDTTSY
KLTSCNTSVITQACPKVSFEPIPIHYCAPAGFAILKCNNKTFNGTGPCTNVSTVQCTHGIRPVVSTQLLLNGSLAEEEVVI
RSVNFTDNAKTIIVQLNTSVEINCTRPNNNTRKRIRIQRGPGRAFVTIGKIGNMRQAHCNISRAKWNNTLKQIASKLRE
QFGNNKTIIFKQSSGGDPEIVTHSFNCGGEFFYCNSTQLFN
  • From the list of gp160 proteins found when searching for gp120 in the first step, 5 sequences were chosen for the multiple retrieval exercise. The UniProt ID numbers of the 5 sequences are P04578, P03377, P03375, P35961, and P05877. From the options for downloading the sequences I chose the FASTA format. The txt version of the combined sequence FASTA file can be found here
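The retrieval steps above (reading a FASTA file, taking residues 33-511 out of a longer entry, and linking to an address that contains square brackets) can be sketched in a few lines of Python. This is only a minimal sketch, not the procedure used in class: the function names `parse_fasta`, `subsequence`, and `encoded_range_url` are my own, the toy record is made up rather than real gp160 data, and percent-encoding the brackets is an assumption about what would get around the wiki linking problem.

```python
from urllib.parse import quote


def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header line to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


def subsequence(seq, start, end):
    """Return residues start..end, using 1-based inclusive positions."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("range falls outside the sequence")
    return seq[start - 1:end]


def encoded_range_url(base, accession, start, end):
    """Percent-encode the '[start-end]' suffix so the brackets survive linking."""
    return base + quote(f"{accession}[{start}-{end}]", safe="")


# Toy record (hypothetical, not real gp160 data) to show the 1-based indexing.
toy = ">toy|EXAMPLE\nMKLWVTVYYG\nVPVWKEATTT\n"
records = parse_fasta(toy)
fragment = subsequence(records["toy|EXAMPLE"], 3, 8)
link = encoded_range_url("http://www.uniprot.org/blast/?about=", "P04578", 33, 511)
print(fragment)
print(link)
```

With a real gp160 record in place of the toy one, `subsequence(seq, 33, 511)` would give the gp120 portion, since database positions count from 1 and include both ends.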

Reading a Swiss-Prot Entry

  • As with the first activity, a UniProt-verified gp120 protein could not be found, so I will be working with the gp160 entry instead.
  • The UniProt entry of the gp160 protein used can be found here
  • The entry itself is very in-depth and contains a lot of information. Some of the major features of the entry include:
    • The protein name along with the names of the proteins that result from the cleavage of the original protein
    • The protein sequence, source organism, and in this case the viral host.
    • In-depth description of known functions and mechanisms of function.
    • Known secondary structure and ontologies.
    • Details about protein domains, cellular location, additional miscellaneous information.
    • References to the studies and labs used to create all the information found in the UniProt protein entry.

ORFing your DNA sequence

  • The NCBI ORF Finder can be found here
  • The sequence used for this exercise was found by searching the NCBI nucleotide database for gp120 of HIV-1. The NCBI entry page for the sequence chosen can be found here and the FASTA form is shown below:
>gi|328550457|gb|JF701706.1| HIV-1 isolate gp120_Oct_10 from USA vpu protein (vpu) and envelope glycoprotein (env) genes, partial cds
CAGAAAGAGCAGAAGACAGTGGCAATGAGAGTGAAGGGGATCAGGAAGAATTATCAGCACTTGTGGACAT
GGGGCATCATGCTCCTTGGGATGTTAATGATCTGTAGTGCTGCAGGAAATTGGTGGGTCACAGTCTATTA
TGGAGTRCCTGTGTGGAAAGAAGCAACCACCACTCTATTTTGTGCATCAGATGCTAAAGCATATGATACA
GAGGTACATAATGTTTGGGCCACACATGCCTGTGTACCCACAGACCCCAACCCACAAGAAATATTATTGA
AAAATGTGACAGAAAATTTTAACATGTGGAAAAATGGCATGGTAGAACAAATGCATGAGGATATAATCAG
TTTATGGGATCAAAGCCTAAAGCCATGTGTGAAATTAACCCCACTCTGTGTTACTTTAAATTGCACTARC
TTGAATGTTACTAATACCACTGCTACTAACACAACGAATAATGGCGGGACAACAATGGCGGGAGAAATGA
GAAACTGCTCTTTCAATGTCACCACAAGCATAGGAAATAGGAGACAAAAAGAATATGCGCTTTTGTATAA
ACATGATATAGTACCAATAGATAATAGTACYAACTATATACTAATAAGTTGTAACACCTCAGTCATTACA
CAGGCCTGTCCAAAGATATCCTTTGAACCAATTCCCATACATTATTGTGCCCCAGCTGGTTTTGCGATTC
TAAAGTGTAAYGAGAAGAAGTTCAATGGCACAGGACCATGTAAAAATGTCAGCACAGTACAATGTACACA
TGGAATTAAGCCAGTAGTATCAACTCAACTGTTGTTAAATGGCAGTCTAGCAGAAGAAGAGGTAGTAATT
AGATCTGAAAATTTCACAAACAATGCTAAAACCATAATAGTACAGCTAAACAGTCCTGTATTAATTAATT
GTACAAGACCCAACAACAATACAAGAAAAGGTATACGGATAGGACCAGGGAGAACATTCWTTGCAACAGA
AAGAATAATAGGAGATATAAGACAAGCACATTGYAATCTTAGTAGAGAACAATGGAATAACACTTTAGAA
AAGGTAGCTGCAAAATTAAGAGAACAATTTGAAAATAAGACAATAATCTTTAATCACTCCTCAGGAGGGG
ACCCAGAAATTGTAATGCACAGTTTTAATTGTGGAGGRGAATTTTTCTATTGTAATACAACACAGCTGTT
TAATAGTACTTGGAATAGTACAGGGTCAAATAACRCTAAAGGAGATGAMGTTATCACACTCCCATGCAGA
ATAAAACAAATTGTAAATATGTGGCAGGAAGTAGGAAAAGCAATGTATGCCCCTCCCATCAGWGGACAAA
TTAATTGTTCGTCAAATATTACAGGGCTGCTATTAACAAGAGACGGYGGTAATAATAATAACMTCCAAAA
TGAGACCTTCAGACCTGGAGGAGGAAATATGAAGGACAATTGGAGAAGTGAATTATATAAATATAAAGTA
GTAAARATACAACCATTAGGA

  • The ORFs for the gp120 sequence were analyzed by pasting the gp120 sequence into the ORF Finder box and pressing OrfFind.
  • The ORF Finder output for the gp120 sequence is shown below:

  • The results of the ORF Finder show the amino acid sequences that would be produced by translating the sequence in each of the six reading frames (three on each strand). In this case, since the sequence used comes from a gp120 isolate, the frame containing the longest or most representative amino acid sequence can usually be assumed to hold the correct ORF, or at least the ORF most likely to be biologically relevant.
  • Based on the results of the ORF Finder for the gp120 sequence, it can be assumed that the +1 ORF is the most likely to be biologically relevant for the sequence.
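The reasoning above (translate all six reading frames, then pick the one with the longest uninterrupted amino acid sequence) can be sketched in Python. This is a rough illustration and not how the NCBI ORF Finder itself works: it assumes the standard genetic code, treats the longest stop-free stretch as the "ORF" without requiring a start codon, translates any codon containing an ambiguity code (such as the R, Y, W, and M letters in the sequence above) as X, and uses my own frame labels, which may not match the tool's numbering on the reverse strand. All function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Complement table that also covers IUPAC ambiguity codes.
COMPLEMENT = str.maketrans("ACGTRYMKWSBDHVN", "TGCAYRKMWSVHDBN")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; codons not in the table become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate the three forward and three reverse-strand reading frames."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def longest_open_stretch(protein):
    """Longest run of residues between stop codons ('*')."""
    return max(protein.split("*"), key=len)


def best_frame(dna):
    """Return (frame label, longest stop-free stretch) across all six frames."""
    frames = six_frame_translations(dna)
    stretches = {name: longest_open_stretch(p) for name, p in frames.items()}
    name = max(stretches, key=lambda n: len(stretches[n]))
    return name, stretches[name]


# Toy sequence (made up for illustration, not taken from JF701706.1).
toy_dna = "ATGCTAAATGATCTAAATGAC"
print(six_frame_translations(toy_dna))
print(best_frame(toy_dna))
```

Running `best_frame` on the full JF701706.1 sequence would be one way to cross-check the +1 call, keeping in mind that the X residues from ambiguous codons do not break a stretch in this sketch.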

Working with a Single Protein Sequence


HIV Structure Project

Links

  1. Chris Rhodes User Page
  2. Week 2 Journal
  3. Week 3 Journal
  4. Week 4 Journal
  5. Week 5 Journal
  6. Week 6 Journal
  7. Week 7 Journal
  8. Week 8 Journal
  9. Week 9 Journal
  10. Week 10 Journal
  11. Week 11 Journal
  12. Week 12 Journal
  13. Week 13 Journal
  14. Week 14 Journal
  15. Home Page
  16. Week 5 Assignment Page
  17. Week 6 Assignment Page
  18. Week 7 Assignment Page
  19. Week 8 Assignment Page
  20. Week 9 Assignment Page
  21. Week 10 Assignment Page
  22. Week 11 Assignment Page
  23. Week 12 Assignment Page
  24. Week 13 Assignment Page
  25. Week 14 Assignment Page