Jmenzago Week 13
Purpose
The purpose of this assignment is to use various bioinformatics tools to analyze the spike (S) glycoprotein of SARS-CoV-2 and better understand its structure-function relationship.
Combined Methods/Results
Converting the DNA sequence to a protein sequence
- The sequence converted was the gene encoding the surface glycoprotein [Severe acute respiratory syndrome coronavirus 2]
>spike protein (Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1) DNA sequence ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCA ATTACCCCCTGCATACACTAATTCTTTCACACGTGGTGTTTATTACCCTGACAAAGTTTTCAGATCCTCA GTTTTACATTCAACTCAGGACTTGTTCTTACCTTTCTTTTCCAATGTTACTTGGTTCCATGCTATACATG TCTCTGGGACCAATGGTACTAAGAGGTTTGATAACCCTGTCCTACCATTTAATGATGGTGTTTATTTTGC TTCCACTGAGAAGTCTAACATAATAAGAGGCTGGATTTTTGGTACTACTTTAGATTCGAAGACCCAGTCC CTACTTATTGTTAATAACGCTACTAATGTTGTTATTAAAGTCTGTGAATTTCAATTTTGTAATGATCCAT TTTTGGGTGTTTATTACCACAAAAACAACAAAAGTTGGATGGAAAGTGAGTTCAGAGTTTATTCTAGTGC GAATAATTGCACTTTTGAATATGTCTCTCAGCCTTTTCTTATGGACCTTGAAGGAAAACAGGGTAATTTC AAAAATCTTAGGGAATTTGTGTTTAAGAATATTGATGGTTATTTTAAAATATATTCTAAGCACACGCCTA TTAATTTAGTGCGTGATCTCCCTCAGGGTTTTTCGGCTTTAGAACCATTGGTAGATTTGCCAATAGGTAT TAACATCACTAGGTTTCAAACTTTACTTGCTTTACATAGAAGTTATTTGACTCCTGGTGATTCTTCTTCA GGTTGGACAGCTGGTGCTGCAGCTTATTATGTGGGTTATCTTCAACCTAGGACTTTTCTATTAAAATATA ATGAAAATGGAACCATTACAGATGCTGTAGACTGTGCACTTGACCCTCTCTCAGAAACAAAGTGTACGTT GAAATCCTTCACTGTAGAAAAAGGAATCTATCAAACTTCTAACTTTAGAGTCCAACCAACAGAATCTATT GTTAGATTTCCTAATATTACAAACTTGTGCCCTTTTGGTGAAGTTTTTAACGCCACCAGATTTGCATCTG TTTATGCTTGGAACAGGAAGAGAATCAGCAACTGTGTTGCTGATTATTCTGTCCTATATAATTCCGCATC ATTTTCCACTTTTAAGTGTTATGGAGTGTCTCCTACTAAATTAAATGATCTCTGCTTTACTAATGTCTAT GCAGATTCATTTGTAATTAGAGGTGATGAAGTCAGACAAATCGCTCCAGGGCAAACTGGAAAGATTGCTG ATTATAATTATAAATTACCAGATGATTTTACAGGCTGCGTTATAGCTTGGAATTCTAACAATCTTGATTC TAAGGTTGGTGGTAATTATAATTACCTGTATAGATTGTTTAGGAAGTCTAATCTCAAACCTTTTGAGAGA GATATTTCAACTGAAATCTATCAGGCCGGTAGCACACCTTGTAATGGTGTTGAAGGTTTTAATTGTTACT TTCCTTTACAATCATATGGTTTCCAACCCACTAATGGTGTTGGTTACCAACCATACAGAGTAGTAGTACT TTCTTTTGAACTTCTACATGCACCAGCAACTGTTTGTGGACCTAAAAAGTCTACTAATTTGGTTAAAAAC AAATGTGTCAATTTCAACTTCAATGGTTTAACAGGCACAGGTGTTCTTACTGAGTCTAACAAAAAGTTTC TGCCTTTCCAACAATTTGGCAGAGACATTGCTGACACTACTGATGCTGTCCGTGATCCACAGACACTTGA GATTCTTGACATTACACCATGTTCTTTTGGTGGTGTCAGTGTTATAACACCAGGAACAAATACTTCTAAC 
CAGGTTGCTGTTCTTTATCAGGATGTTAACTGCACAGAAGTCCCTGTTGCTATTCATGCAGATCAACTTA CTCCTACTTGGCGTGTTTATTCTACAGGTTCTAATGTTTTTCAAACACGTGCAGGCTGTTTAATAGGGGC TGAACATGTCAACAACTCATATGAGTGTGACATACCCATTGGTGCAGGTATATGCGCTAGTTATCAGACT CAGACTAATTCTCCTCGGCGGGCACGTAGTGTAGCTAGTCAATCCATCATTGCCTACACTATGTCACTTG GTGCAGAAAATTCAGTTGCTTACTCTAATAACTCTATTGCCATACCCACAAATTTTACTATTAGTGTTAC CACAGAAATTCTACCAGTGTCTATGACCAAGACATCAGTAGATTGTACAATGTACATTTGTGGTGATTCA ACTGAATGCAGCAATCTTTTGTTGCAATATGGCAGTTTTTGTACACAATTAAACCGTGCTTTAACTGGAA TAGCTGTTGAACAAGACAAAAACACCCAAGAAGTTTTTGCACAAGTCAAACAAATTTACAAAACACCACC AATTAAAGATTTTGGTGGTTTTAATTTTTCACAAATATTACCAGATCCATCAAAACCAAGCAAGAGGTCA TTTATTGAAGATCTACTTTTCAACAAAGTGACACTTGCAGATGCTGGCTTCATCAAACAATATGGTGATT GCCTTGGTGATATTGCTGCTAGAGACCTCATTTGTGCACAAAAGTTTAACGGCCTTACTGTTTTGCCACC TTTGCTCACAGATGAAATGATTGCTCAATACACTTCTGCACTGTTAGCGGGTACAATCACTTCTGGTTGG ACCTTTGGTGCAGGTGCTGCATTACAAATACCATTTGCTATGCAAATGGCTTATAGGTTTAATGGTATTG GAGTTACACAGAATGTTCTCTATGAGAACCAAAAATTGATTGCCAACCAATTTAATAGTGCTATTGGCAA AATTCAAGACTCACTTTCTTCCACAGCAAGTGCACTTGGAAAACTTCAAGATGTGGTCAACCAAAATGCA CAAGCTTTAAACACGCTTGTTAAACAACTTAGCTCCAATTTTGGTGCAATTTCAAGTGTTTTAAATGATA TCCTTTCACGTCTTGACAAAGTTGAGGCTGAAGTGCAAATTGATAGGTTGATCACAGGCAGACTTCAAAG TTTGCAGACATATGTGACTCAACAATTAATTAGAGCTGCAGAAATCAGAGCTTCTGCTAATCTTGCTGCT ACTAAAATGTCAGAGTGTGTACTTGGACAATCAAAAAGAGTTGATTTTTGTGGAAAGGGCTATCATCTTA TGTCCTTCCCTCAGTCAGCACCTCATGGTGTAGTCTTCTTGCATGTGACTTATGTCCCTGCACAAGAAAA GAACTTCACAACTGCTCCTGCCATTTGTCATGATGGAAAAGCACACTTTCCTCGTGAAGGTGTCTTTGTT TCAAATGGCACACACTGGTTTGTAACACAAAGGAATTTTTATGAACCACAAATCATTACTACAGACAACA CATTTGTGTCTGGTAACTGTGATGTTGTAATAGGAATTGTCAACAACACAGTTTATGATCCTTTGCAACC TGAATTAGACTCATTCAAGGAGGAGTTAGATAAATATTTTAAGAATCATACATCACCAGATGTTGATTTA GGTGACATCTCTGGCATTAATGCTTCAGTTGTAAACATTCAAAAAGAAATTGACCGCCTCAATGAGGTTG CCAAGAATTTAAATGAATCTCTCATCGATCTCCAAGAACTTGGAAAGTATGAGCAGTATATAAAATGGCC ATGGTACATTTGGCTAGGTTTTATAGCTGGCTTGATTGCCATAGTAATGGTGACAATTATGCTTTGCTGT ATGACCAGTTGCTGTAGTTGTCTCAAGGGCTGTTGTTCTTGTGGATCCTGCTGCAAATTTGATGAAGACG 
ACTCTGAGCCAGTGCTCAAAGGAGTCAAATTACATTACACATAA
- Paste the FASTA sequence of the spike gene into the text box on NCBI Open Reading Frame Finder (ORFfinder), then click "Submit"
- Translated protein sequence:
>lcl|ORF1 MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHS TQDLFLPFFSNVTWFHAIHVSGTNGTKRFDNPVLPFNDGVYFASTEKSNI IRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPFLGVYYHKNNK SWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGY FKIYSKHTPINLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLT PGDSSSGWTAGAAAYYVGYLQPRTFLLKYNENGTITDAVDCALDPLSETK CTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASV YAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSF VIRGDEVRQIAPGQTGKIADYNYKLPDDFTGCVIAWNSNNLDSKVGGNYN YLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYFPLQSYGFQPT NGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTG VLTESNKKFLPFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITP GTNTSNQVAVLYQDVNCTEVPVAIHADQLTPTWRVYSTGSNVFQTRAGCL IGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLG AENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECS NLLLQYGSFCTQLNRALTGIAVEQDKNTQEVFAQVKQIYKTPPIKDFGGF NFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGDIAARDLI CAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAM QMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQD VVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGR LQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLM SFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGT HWFVTQRNFYEPQIITTDNTFVSGNCDVVIGIVNNTVYDPLQPELDSFKE ELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDL QELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSC GSCCKFDEDDSEPVLKGVKLHYT
- Results on page:
- The correct reading frame for these results is ORF1
- The correct reading frame is the one that begins with the start codon (AUG), continues in frame to a stop codon, covers the sequence for the entire protein, and runs from 5' to 3'
- AUG is shown as "M" (methionine) in the translated output
- The correct reading frame can be checked against the NCBI protein record
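The reading-frame check described above can be sketched in Python. This is a simplified stand-in, not how ORFfinder itself works: it assumes the standard genetic code, scans only the three forward frames (ORFfinder also scans the reverse strand), and reports only ORFs that open with ATG and close at a stop codon. The helper names (`clean_fasta`, `translate`, `find_orfs`) are hypothetical.

```python
from itertools import product

# Standard genetic code; codons enumerated in T, C, A, G order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def clean_fasta(text: str) -> str:
    """Drop '>' header lines, keep only nucleotide letters, and convert U to T."""
    body = "".join(line for line in text.splitlines() if not line.startswith(">"))
    return "".join(ch for ch in body.upper() if ch in "ACGTU").replace("U", "T")


def translate(dna: str) -> str:
    """Translate codon by codon; '*' marks a stop and a trailing partial codon is ignored."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def find_orfs(dna: str, min_aa: int = 30) -> list:
    """Return (frame, start_nt, protein) for ATG...stop ORFs in the forward frames, longest first."""
    orfs = []
    for frame in range(3):
        protein = translate(dna[frame:])
        start = None
        for i, aa in enumerate(protein):
            if aa == "M" and start is None:
                start = i  # first ATG in this stretch opens the ORF
            elif aa == "*" and start is not None:
                if i - start >= min_aa:
                    orfs.append((frame + 1, frame + 3 * start, protein[start:i]))
                start = None  # the stop codon closes the ORF
    return sorted(orfs, key=lambda orf: len(orf[2]), reverse=True)


# First 68 nt of the spike gene above.
print(translate("ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTCTAGTCAGTGTGTTAATCTTACAACCAGAACTCA"))
# prints MFVFLVLLPLVSSQCVNLTTRT
```

Run on the full spike sequence (with the FASTA header on its own line), the longest forward ORF from `find_orfs` would be expected to start with MFVFLVLL and end with ...KLHYT, matching ORF1 above.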
Determining what is already known about the S protein
- Information about the S protein was found using the UniProt Knowledgebase (UniProtKB)
- Run a search with "SARS-CoV" as the only keyword
- Produced 70 reviewed results and 748 unreviewed results
- Run a search using the accession number "P59594" (the SARS-CoV spike glycoprotein entry)
- Produced one reviewed result
- Information provided in the database entry:
- Protein name, gene, and the organism it is from
- Function of each of its subunits (S1, S2, S2')
- Alternative names and taxonomy of the protein
- Subcellular location of the protein
- Pathology and biotech
- Post-translational modification and processing
- Protein interaction
- Structure
- Family and domains
- Amino acid and genetic sequences
- Similar proteins
- Cross-references
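The entry viewed above can also be downloaded programmatically, which becomes useful once several sequences (such as the ACE2 sequences planned for the research project) need to be collected. A minimal sketch, assuming UniProt's current REST endpoint `https://rest.uniprot.org/uniprotkb/{accession}.fasta`, which postdates this 2020 entry; `fetch_uniprot_fasta` and `parse_fasta` are hypothetical helper names, and only the Python standard library is used.

```python
import urllib.request


def fetch_uniprot_fasta(accession: str) -> str:
    """Download one UniProtKB entry in FASTA format (needs network access).

    Assumes the REST endpoint https://rest.uniprot.org/uniprotkb/{accession}.fasta.
    """
    url = f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def parse_fasta(text: str) -> list:
    """Split FASTA text into (header, sequence) pairs, joining wrapped sequence lines."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:  # close the final record
        records.append((header, "".join(chunks)))
    return records
```

For example, `parse_fasta(fetch_uniprot_fasta("P59594"))` should return a single (header, sequence) pair for the entry searched above. The download is kept separate from the parser so the parsing logic can be checked without network access.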
Analyzing the S protein sequence
- The protein sequence analyzed was taken from Yan et al. (2020)
- PDB ID: 6M17
- Corresponds to the 2019-nCoV RBD/ACE2-B0AT1 complex discussed in the paper
- The FASTA file has three sections, one for each component of the complex
- The S protein (RBD) sequence is the third entry in the file
- Paste the FASTA sequence for the S protein into the text box on the PredictProtein server and click "PredictProtein"
- Results:
- The sequence was 223 amino acids long
- There were 90 aligned proteins
- There were 31 matched PDB structures
- PredictProtein also provides structure and function annotations
- Structure annotations available:
- Secondary structure and solvent accessibility
- Transmembrane helices
- Protein disorder and flexibility
- Disulphide bridges
- Function annotations available:
- Effect of point mutations
- Gene ontology terms
- Subcellular localization
- Binding sites
- Whereas UniProt provides a collection of known information about a protein, PredictProtein offers a variety of predictions about a protein's structure and function, including how they might change if part of the sequence is altered.
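Picking the third entry out of the 6M17 FASTA file and checking its length before pasting it into PredictProtein can be sketched as follows. This assumes the file has been saved locally and, as noted above, that the S protein (RBD) is its third record; `fasta_records` and `summarize` are hypothetical helper names.

```python
from collections import Counter


def fasta_records(text: str) -> list:
    """Split multi-record FASTA text into (header, sequence) pairs in file order."""
    records = []
    for block in text.split(">")[1:]:  # each record starts with '>'
        header, _, body = block.partition("\n")
        records.append((header.strip(), "".join(body.split())))
    return records


def summarize(sequence: str) -> dict:
    """Sequence length plus the three most frequent residues."""
    return {"length": len(sequence),
            "most_common": Counter(sequence).most_common(3)}
```

With the file saved as `6M17.fasta` (a hypothetical name), `fasta_records(open("6M17.fasta").read())[2]` would give the third entry; given the PredictProtein result above, `summarize` would be expected to report a length of 223.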
Image of the predicted features for the S protein:
Analyzing a 3D model of the S protein
- The structure used for this task was PDB entry 6M17 from Yan et al. (2020)
- The structure is the 2019-nCoV RBD/ACE2-B0AT1 complex, not just the S protein
- The model was analyzed using iCn3D, the NCBI Structure viewer
- Search for protein on NCBI using PDB ID (6M17 for Yan et al.)
- Click on "full-feature 3D viewer" to interact with protein
- Rotate the 3D model so that it is oriented in the same way as Figure 4A from Yan et al. (2020)
- There are two RBD-PD complexes in the entire structure; it does not matter which one is used to match Figure 4A
- To hide other tertiary structures:
- Select the structure by clicking on it
- On the tabs at the top, go to View->Hide Selection
- To change the color of a structure:
- Select a structure
- On the tabs at the top, go to Color->Unicolor->select desired color
- To show N and C termini:
- Select a structure
- On the tabs at the top, go to View->Label->N- & C- Termini
- To highlight secondary structures in a complex:
- Select a structure
- On the tabs at the top, go to Color->Secondary->select desired color
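The observation that 6M17 contains two RBD-PD complexes can also be cross-checked outside iCn3D by counting residues per chain in the coordinate file. A minimal sketch, assuming a legacy PDB-format file, whose fixed columns put the chain ID in column 22, the residue number in columns 23-26, and the insertion code in column 27; large cryo-EM entries are sometimes distributed only as mmCIF, in which case a CIF parser would be needed instead. `residues_per_chain` is a hypothetical helper name.

```python
from collections import defaultdict


def residues_per_chain(pdb_text: str) -> dict:
    """Count distinct residues per chain from ATOM records (legacy PDB fixed columns)."""
    residues = defaultdict(set)
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):  # HETATM records (e.g. glycans) are skipped
            chain_id = line[21]                # column 22
            res_seq = line[22:26].strip()      # columns 23-26
            insertion_code = line[26:27]       # column 27 (may be blank)
            residues[chain_id].add((res_seq, insertion_code))
    return {chain: len(res) for chain, res in sorted(residues.items())}
```

Because the ACE2-B0AT1 complex in 6M17 is described as a dimer, one would expect each of the three components to appear as two chains; this is an expectation to check, not a verified result.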
Figure 4A from Yan et al. (2020)
- Top left - Figure 4A from Yan et al. (2020)
- Top right - Replicate of whole RBD-PD complex in Figure 4A
- Alpha helices and beta sheets not labeled like in Figure 4A
- It is difficult to see beta3 and beta4 unless the secondary structures are different colors
- For labeled secondary structures, see image with colored secondary structures below
- Bottom left - Replicate of RBD-PD bridge from Figure 4A
- Unable to show polar interactions
- Bottom right - Replicate of RBD-PD bridge rotated 180 degrees from Figure 4A
- Unable to show polar interactions
- Image shows the secondary structures and N and C termini of the RBD-PD complex from Figure 4A
- SARS-CoV-2-RBD
- No alpha helices
- Beta sheets in yellow
- N and C termini labeled in yellow
- ACE2-PD
- Alpha helices in red
- Beta sheets in green
- N terminus labeled in cyan
- C terminus does not appear in figure from Yan et al. (2020), so it is not labeled in the replicate
- The image partly agrees with the results generated by PredictProtein
- Most of the S protein (RBD) was predicted to consist of loops (grey), with about 25% made of beta strands (yellow), which is reflected in the image above
- About 5% was predicted to be alpha helices, which does not match the image above, where no helices appear
- The ACE2-PD was predicted to be roughly half loops and half helices, with about 5% strands
- This appears consistent with the image above, which shows a roughly even mix of loops and helices (red), with a small amount of strands (green)
- All amino acids discussed in the paper were located in one of the four labeled secondary structures (marked in blue)
- Primarily alpha1
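The percentage comparisons above amount to counting residues in each secondary-structure state, and agreement between a prediction and an observed structure is commonly summarized per residue as Q3. A minimal sketch, assuming both the prediction and the observed assignment are available as equal-length three-state strings (H = helix, E = strand, anything else = loop). Whether PredictProtein or iCn3D export exactly this format is an assumption, and mapping every other symbol to loop is a simplification, not the standard reduction of 8-state DSSP codes. The helper names are hypothetical.

```python
from collections import Counter

HELIX, STRAND, LOOP = "H", "E", "L"


def to_three_state(states: str) -> str:
    """Map each per-residue symbol to H (helix), E (strand) or L (everything else)."""
    return "".join(s if s in (HELIX, STRAND) else LOOP for s in states.upper())


def ss_fractions(states: str) -> dict:
    """Percentage of residues in each of the three states, rounded to 0.1."""
    if not states:
        raise ValueError("empty secondary-structure string")
    counts = Counter(to_three_state(states))
    return {label: round(100 * counts[label] / len(states), 1)
            for label in (HELIX, STRAND, LOOP)}


def q3(predicted: str, observed: str) -> float:
    """Per-residue agreement (Q3, in percent) between two equal-length state strings."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and the same length")
    pairs = zip(to_three_state(predicted), to_three_state(observed))
    return 100 * sum(p == o for p, o in pairs) / len(predicted)
```

For a made-up string, `ss_fractions("HHEELLLLLL")` returns `{"H": 20.0, "E": 20.0, "L": 60.0}`; these toy values are not the RBD prediction.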
Beginning your research project
- When comparing the RBD-PD complexes of SARS-CoV and SARS-CoV-2, Yan et al. (2020) found that certain residue differences lead to weaker interactions. Walls et al. (2020) note that when SARS re-emerged in 2003-04, after the 2002 outbreak, the virus had a weaker interaction with ACE2 and patients showed milder symptoms. This research project will further explore the role that specific amino acids play in the structure-function relationship of SARS-CoV-2 and ACE2.
- ACE2 sequences from humans, mice, and bats will be analyzed
- Humans, because this virus is currently causing a pandemic in humans
- Mice, because they have not been affected by the virus, possibly because of how their ACE2 interacts with the S protein
- Bats, because it is theorized that the virus jumped from them to humans
- Sequences will be taken from UniProt and compared through a multiple sequence alignment. Their 3D structures will also be compared using iCn3D or similar programs to see whether any apparent structural differences would weaken or prevent binding to the S protein
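Before running a full multiple sequence alignment, it can help to see how a pairwise global alignment scores two sequences. A minimal sketch of the classic Needleman-Wunsch algorithm with simple match/mismatch scores and a linear gap penalty; real protein alignments would use a substitution matrix such as BLOSUM62, and this is not the multiple-alignment method the project will rely on. The scores and function names are illustrative assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment with a linear gap penalty; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)

    def pair(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical, non-gap positions as a percentage of all alignment columns."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100 * same / len(aligned_a)
```

For example, `needleman_wunsch("ACGT", "AGT")` returns `(2, "ACGT", "A-GT")`, and the percent identity of that alignment is 75.0. The quadratic table is fine for ACE2-length sequences (roughly 800 residues).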
Scientific Conclusion
The purpose of this assignment was to use various bioinformatics tools to analyze the spike glycoprotein of SARS-CoV-2. Working with these tools to visualize the protein and predict its features from its sequence built familiarity with the software and with how the tools can be used together to answer research questions. Using a few of these tools to take a closer look at the role that polar residues play in the strength of the interaction between ACE2 and SARS-CoV-2 can provide insight into the structure-function relationship of the two.
Acknowledgements
- My homework partners for the week were Drew Cartmel and Nicholas Yeo
- We communicated multiple times through Zoom or text messages to brainstorm about our research project
- I followed the instructions on BIOL368/S20:Week 13 to complete this assignment
- Syntax for links to the software and protein structures used for this assignment was copied from this page
- Citation for Walls et al. (2020) copied from this page
- DNA sequence for spike protein (Severe acute respiratory syndrome coronavirus 2 isolate Wuhan-Hu-1) used in task 1 copied from this page
- I used MediaWiki Help:Images to learn how to resize the images on this page.
- Except for what is noted above, this individual journal entry was completed by me and not copied from another source.
Jmenzago (talk) 23:01, 22 April 2020 (PDT)
References
- MediaWiki. (2020). Help:Images. Retrieved April 22, 2020, from https://www.mediawiki.org/wiki/Help:Images.
- OpenWetWare. (2020). BIOL368/S20:Week 13. Retrieved April 20, 2020, from https://openwetware.org/wiki/BIOL368/S20:Week_13.
- NCBI. (2020). 6M17: The 2019-nCoV RBD/ACE2-B0AT1 complex. Retrieved April 22, 2020, from https://www.ncbi.nlm.nih.gov/Structure/pdb/6M17.
- NCBI. (n.d.). Home - ORFfinder - NCBI. Retrieved April 22, 2020, from https://www.ncbi.nlm.nih.gov/orffinder/.
- NCBI. (n.d.). Surface glycoprotein [Severe acute respiratory syndrome coronavirus 2]. Retrieved April 22, 2020, from https://www.ncbi.nlm.nih.gov/protein/1791269090.
- PredictProtein. (2020). RostLab. Retrieved April 22, 2020, from https://open.predictprotein.org/.
- RCSB PDB. (2020). 6M17: The 2019-nCoV RBD/ACE2-B0AT1 complex. Retrieved April 22, 2020, from https://www.rcsb.org/structure/6M17.
- Walls, A. C., Park, Y. J., Tortorici, M. A., Wall, A., McGuire, A. T., & Veesler, D. (2020). Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell. doi: 10.1016/j.cell.2020.02.058.
- Yan, R., Zhang, Y., Li, Y., Xia, L., Guo, Y., & Zhou, Q. (2020). Structural basis for the recognition of SARS-CoV-2 by full-length human ACE2. Science, 367(6485), 1444-1448. doi: 10.1126/science.abb2762.