Angela A. Garibaldi Week 8

From OpenWetWare

Retrieving Protein Sequences

  1. Go to UniProt (www.expasy.org/sprot).
  2. Enter dUTPase in the search window. This returns more than 3 relevant sequences, so we had to look further; we found DUT_ECOLI (P06968) on page 4.
  3. Scroll down to find the amino acid sequence in FASTA format.
  • If your starting information is not enough to find the protein sequence you want:
  1. Use the advanced search option. A separate advanced search no longer exists; instead, click the add and search button, and a drop-down menu appears with the same search options described in Figure 2-16 of Bioinformatics for Dummies.
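The same single-entry retrieval can be scripted. This is a minimal sketch, assuming UniProt's REST endpoint `https://rest.uniprot.org/uniprotkb/{accession}.fasta` (not part of the original walkthrough) and the usual `>sp|ACCESSION|ENTRY_NAME description` header layout; `fetch_fasta` and `parse_single_fasta` are hypothetical helper names.

```python
import urllib.request

# Assumed UniProt REST endpoint for one entry in FASTA format.
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_fasta(accession):
    """Download one UniProtKB entry as FASTA text (needs network access)."""
    with urllib.request.urlopen(UNIPROT_FASTA_URL.format(accession=accession)) as resp:
        return resp.read().decode("utf-8")


def parse_single_fasta(text):
    """Split one FASTA record into its header fields and its sequence."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header, seq_lines = lines[0], lines[1:]
    # Assumed header layout: >db|ACCESSION|ENTRY_NAME description
    # ("sp" = Swiss-Prot/reviewed, "tr" = TrEMBL/unreviewed).
    db, accession, rest = header[1:].split("|", 2)
    entry_name, _, description = rest.partition(" ")
    return {
        "db": db,
        "accession": accession,
        "entry_name": entry_name,
        "description": description,
        "sequence": "".join(seq_lines),  # join the wrapped sequence lines
    }
```

For example, `parse_single_fasta(fetch_fasta("P06968"))["entry_name"]` should give `DUT_ECOLI` if the endpoint behaves as assumed.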

Retrieving a List of Related Protein Sequences

  1. Open UniProt's advanced search as described above.
  2. Because the advanced search interface is completely different, TrEMBL can no longer be deselected. Instead, select Reviewed - Yes, which restricts results to Swiss-Prot entries.
  3. Enter dUTPase in the search again. There is no longer a "description" field. The search yields many results.
  4. Since there are more than 211 results in total, we selected the entire first page of sequences (25).
  5. In the newer version, click Retrieve at the bottom right corner instead of the french button.
  6. Once retrieved, the sequences are placed in a list that you can add to; below the list, choose the format you want the sequences in (FASTA is available). You no longer have to copy and paste them into a document.
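A rough scripted equivalent of retrieving a page of reviewed hits, assuming UniProt's REST search endpoint (`https://rest.uniprot.org/uniprotkb/search`) with `query`, `format`, and `size` parameters. The current query syntax is assumed to write the Reviewed filter as `reviewed:true`; `build_search_url`, `fetch_text`, and `split_fasta` are hypothetical names.

```python
import urllib.parse
import urllib.request

# Assumed UniProt REST search endpoint.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, size=25):
    """Build a search URL asking for one page of `size` results as FASTA."""
    params = {"query": query, "format": "fasta", "size": size}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def fetch_text(url):
    """Download a URL as text (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def split_fasta(text):
    """Return a list of (header, sequence) pairs from multi-FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```

Under those assumptions, `split_fasta(fetch_text(build_search_url("dUTPase AND reviewed:true")))` would return up to 25 (header, sequence) pairs, mirroring the first page selected above.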

Reading a Swiss-Prot Entry

This time we skipped the example and did the activity using HIV gp120.

  1. Select Reviewed - Yes. Our overall query for these results was: HIV gp120 AND reviewed:yes
  2. We selected the first entry in the list:
Entry Name: ENV_HV1H2 
Accession Number: P04578
  • Scroll down to Sequence Annotation - Region to look at the V3 sequence specifically.
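To pull out the V3 region programmatically, one option is the entry's JSON form. This sketch assumes the endpoint `https://rest.uniprot.org/uniprotkb/{accession}.json`, a top-level `sequence.value` string, a `features` list whose items carry `type`, `description`, and `location.start.value` / `location.end.value`, and a Region feature described exactly as `V3`; all of these are assumptions to check against the live entry. UniProt feature positions are 1-based and inclusive, hence the `start - 1` slice. `fetch_entry` and `region_sequence` are hypothetical names.

```python
import json
import urllib.request

# Assumed UniProt REST endpoint for one entry in JSON format.
ENTRY_JSON_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def fetch_entry(accession):
    """Download one UniProtKB entry as a parsed JSON dict (needs network access)."""
    with urllib.request.urlopen(ENTRY_JSON_URL.format(accession=accession)) as resp:
        return json.load(resp)


def region_sequence(sequence, features, name="V3"):
    """Return (start, end, subsequence) for each Region feature described as `name`."""
    hits = []
    for feat in features:
        if feat.get("type") == "Region" and feat.get("description") == name:
            start = feat["location"]["start"]["value"]
            end = feat["location"]["end"]["value"]
            # Convert 1-based inclusive positions to a Python slice.
            hits.append((start, end, sequence[start - 1:end]))
    return hits
```

Usage, under the same assumptions: `entry = fetch_entry("P04578")`, then `region_sequence(entry["sequence"]["value"], entry["features"])`.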

ORFing your DNA Sequences

  1. Go to NCBI ORF Finder
  2. Input a DNA sequence for practice
I entered the following sequence:
>S7V1-1
GAGATAGTAATTAGATCTGCCAATTTCACGGACAATACTAAGACCATAATAGTACAGCTGAATGTATCTG
TAGAAATTAATTGTACGAGACCCAACAACAATACAAGAAAAAGTATACCTATAGGACCAGGGAGAGCATT
TTATGCTACAGGAGAAATAATAGGGAATATAAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAAT
AACACTTTAAAACAGATAGCTACAAAATTAAGAAAACAATTTGAGAATAAAACAATAGTCTTTAATCAAT
CCTCA
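ORF Finder runs this search on the web; the core idea can be sketched in a few lines. This is a simplified illustration, not ORF Finder's actual algorithm or settings: it assumes the standard genetic code and ATG as the only start codon, scans all three frames on both strands, and reports only ORFs closed by a stop codon. The 30-residue minimum (`min_aa`) is an arbitrary illustrative default, and `find_orfs` is a hypothetical name.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Reverse complement of an uppercase ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate whole codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_aa=30):
    """Return (strand, frame, protein) for each ATG...stop ORF of at least min_aa residues."""
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            protein = translate(s[frame:])
            start = None
            for i, aa in enumerate(protein):
                if aa == "M" and start is None:
                    start = i  # the most upstream ATG opens the ORF
                elif aa == "*" and start is not None:
                    if i - start >= min_aa:
                        # Protein excludes the stop codon itself.
                        orfs.append((strand, frame + 1, protein[start:i]))
                    start = None  # the stop codon closes the ORF
    return orfs
```

Because only ATG-started ORFs are reported, a partial gene fragment may return few or none; the stop-free frames themselves are easier to inspect with a plain six-frame translation.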

Compare your results with the Swiss-Prot entry you found for the protein above to work out what the output means. ExPASy also has a translation tool you can use here.

  • Based on the ExPASy translation tool, the following amino acid sequence was the only viable ORF. All other reading frames had stop codons within the first few codons.

E I V I R S A N F T D N T K T I I V Q L N V S V E I N C T R P N N N T R K S I P I G P G R A F Y A T G E I I G N I R Q A H C N I S R A K W N N T L K Q I A T K L R K Q F E N K T I V F N Q S S
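The six-frame translation can also be checked locally. This sketch assumes the standard genetic code; `translate` and `six_frames` are hypothetical helper names, and the frame labels (+1 to +3 forward, -1 to -3 on the reverse complement) are a convention chosen here. It translates the practice sequence in all six frames and prints the position of the first stop codon in each. Frame +1 reproduces the stop-free sequence above.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order; "*" marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The practice sequence >S7V1-1 from above.
PRACTICE_SEQ = (
    "GAGATAGTAATTAGATCTGCCAATTTCACGGACAATACTAAGACCATAATAGTACAGCTGAATGTATCTG"
    "TAGAAATTAATTGTACGAGACCCAACAACAATACAAGAAAAAGTATACCTATAGGACCAGGGAGAGCATT"
    "TTATGCTACAGGAGAAATAATAGGGAATATAAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAAT"
    "AACACTTTAAAACAGATAGCTACAAAATTAAGAAAACAATTTGAGAATAAAACAATAGTCTTTAATCAAT"
    "CCTCA"
)


def translate(seq):
    """Translate whole codons from position 0; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Map frame labels (+1..+3, -1..-3) to their protein translations."""
    rc = seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frames(PRACTICE_SEQ)
for label, protein in frames.items():
    first_stop = protein.find("*")  # -1 means the frame has no stop codon
    status = "no stop" if first_stop == -1 else f"first stop at codon {first_stop + 1}"
    print(label, status, protein)
```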

Working with a single protein sequence