Angela A. Garibaldi Week 8
From OpenWetWare
Retrieving Protein Sequences
- Go to UniProt (www.expasy.org/sprot).
- Enter dUTPase in the search window. This returns more than 3 relevant sequences; DUT_ECOLI (P06968) was found on page 4 of the results.
- Scroll down to the amino acid sequence, which is available in FASTA format.
- If your starting information is not enough to find the protein sequence you seek, use the advanced search option. The separate advanced search no longer exists. Instead, click the add and search button, and a drop-down menu is displayed that gives the same search options as described in Figure 2-16 of Bioinformatics for Dummies.
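Once an accession number is known, the FASTA record can also be handled in a script rather than copied by hand. The following is a minimal sketch, assuming that UniProt's REST service at rest.uniprot.org serves entries as `<accession>.fasta` (this endpoint is not part of the steps above). The helper names `fasta_url`, `parse_fasta` and `fetch_fasta` are hypothetical.

```python
from urllib.request import urlopen

# Assumption: UniProt's REST service serves entries under this base URL.
UNIPROT_BASE = "https://rest.uniprot.org/uniprotkb"


def fasta_url(accession):
    """Build the (assumed) FASTA download URL for one accession."""
    return f"{UNIPROT_BASE}/{accession}.fasta"


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fetch_fasta(accession):
    """Download and parse one entry (requires network access)."""
    with urlopen(fasta_url(accession)) as response:
        return parse_fasta(response.read().decode("utf-8"))
```

Calling `fetch_fasta("P06968")` would be expected to return a single (header, sequence) pair for DUT_ECOLI, provided the endpoint behaves as assumed.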
Retrieving a List of Related Protein Sequences
- Go to the advanced search in UniProt as described above.
- Because the advanced search is completely different now, TrEMBL cannot be deselected. Instead, select Reviewed - Yes as an alternative.
- Enter dUTPase in the search again. There is no longer a "description" field. The search yields many possibilities.
- Since there are more than 211 total possibilities, we selected the entire first page of sequences (25).
- In the newer version, click Retrieve at the bottom right corner instead of the French button.
- Once retrieved, the sequences are put into a list that you can add to. Below the list, choose the format you want the sequences in. You no longer have to copy and paste into a document. FASTA format is available.
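The same kind of filtered search can be expressed as a URL. The following is only a sketch: it assumes UniProt's REST search endpoint at rest.uniprot.org, and it assumes that this endpoint spells the reviewed filter as `reviewed:true`. Neither appears in the steps above, and the function name `build_search_url` is hypothetical. The default page size of 25 mirrors the first page of results selected above.

```python
from urllib.parse import urlencode

# Assumption: UniProt's REST search endpoint; not part of the original steps.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(term, reviewed_only=True, size=25):
    """Build a search URL that asks for FASTA output."""
    query = term
    if reviewed_only:
        # Assumption: the REST API writes the Reviewed - Yes filter this way.
        query = f"({term}) AND (reviewed:true)"
    params = {"query": query, "format": "fasta", "size": size}
    return f"{SEARCH_URL}?{urlencode(params)}"


url = build_search_url("dUTPase")
```

The resulting `url` can be opened in a browser or fetched with any HTTP client. The FASTA text that comes back would hold up to 25 reviewed entries, if the endpoint behaves as assumed.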
Reading a Swiss-Prot Entry
This time we skipped the example and did the activity using HIV gp120.
- Select Reviewed - Yes. Our overall query to achieve these results: HIV gp120 AND reviewed:yes
- We selected the first option in the list
Entry Name: ENV_HV1H2 Accession Number: P04578
- Scroll down to Sequence Annotation - Region to look at the V3 sequence specifically.
ORFing your DNA Sequences
- Go to NCBI ORF Finder
- Input a DNA sequence for practice
I input the following sequence:
>S7V1-1
GAGATAGTAATTAGATCTGCCAATTTCACGGACAATACTAAGACCATAATAGTACAGCTGAATGTATCTG
TAGAAATTAATTGTACGAGACCCAACAACAATACAAGAAAAAGTATACCTATAGGACCAGGGAGAGCATT
TTATGCTACAGGAGAAATAATAGGGAATATAAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAAT
AACACTTTAAAACAGATAGCTACAAAATTAAGAAAACAATTTGAGAATAAAACAATAGTCTTTAATCAAT
CCTCA
Compare your results with the Swiss-Prot entry you found for the protein above to decipher what the output means. ExPASy also has a translation tool you can use.
- Based on the ExPASy tool, the following amino acid sequence was the only viable ORF. All other frames had stop codons within the first few codons.
E I V I R S A N F T D N T K T I I V Q L N V S V E I N C T R P N N N T R K S I P I G P G R A F Y A T G E I I G N I R Q A H C N I S R A K W N N T L K Q I A T K L R K Q F E N K T I V F N Q S S
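The frame-by-frame translation can be reproduced with a short script. This is a minimal sketch that assumes the standard genetic code. It is not how the ExPASy tool or ORF Finder is implemented, and the helper names are hypothetical. It translates the three forward frames and the three reverse-complement frames, marking stop codons with `*`.

```python
# Standard genetic code, with codons ordered by base in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, offset=0):
    """Translate complete codons starting at offset; unknown codons become X."""
    codons = (dna[i:i + 3] for i in range(offset, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return translations keyed by frame label (+1..+3, -1..-3)."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna, offset)
        frames[f"-{offset + 1}"] = translate(rc, offset)
    return frames


# The S7V1-1 practice sequence entered above.
S7V1_1 = (
    "GAGATAGTAATTAGATCTGCCAATTTCACGGACAATACTAAGACCATAATAGTACAGCTGAATGTATCTG"
    "TAGAAATTAATTGTACGAGACCCAACAACAATACAAGAAAAAGTATACCTATAGGACCAGGGAGAGCATT"
    "TTATGCTACAGGAGAAATAATAGGGAATATAAGACAAGCACATTGTAACATTAGTAGAGCAAAATGGAAT"
    "AACACTTTAAAACAGATAGCTACAAAATTAAGAAAACAATTTGAGAATAAAACAATAGTCTTTAATCAAT"
    "CCTCA"
)

frames = six_frames(S7V1_1)
for label, protein in frames.items():
    print(label, protein)
```

Frame +1 reproduces the amino acid sequence listed above with no `*` characters, while frame +2 hits a stop at its second codon (TAG).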