BIOL368/S19 Week 4


Purpose

The purpose of this assignment is to use GenBank to find genetic sequences, to learn how to find protein sequences and compare them with one another, and to create phylogenetic trees and compare them.

Methods and results

Part 1: GenBank

  • The GenBank records for Coronavirus BtRs-BetaCoV/YN2018B were chosen, and both the full record and the FASTA-formatted sequence were viewed.
    • The accession number of the sequence was recorded: MK211376
    • The information provided in the GenBank record was noted.
    • The nucleotide sequence was downloaded in FASTA format to my local hard drive.
    • The file was then opened in a word processor to confirm that it had downloaded and was in FASTA format. In FASTA format, each sequence is preceded by a label line that begins with the greater-than sign (>).
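The FASTA check described above can also be done programmatically. The following is a minimal sketch, assuming the file contents are already available as a string; the function name `parse_fasta` and the toy record are hypothetical and are not the MK211376 record.

```python
def parse_fasta(text):
    """Return a list of (label, sequence) pairs from FASTA-formatted text."""
    records = []
    label, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new label line closes the previous record, if there was one.
            if label is not None:
                records.append((label, "".join(chunks)))
            label, chunks = line[1:], []
        elif label is None:
            # Sequence text before any '>' label means the file is not FASTA.
            raise ValueError("sequence data found before any '>' label")
        else:
            chunks.append(line)
    if label is not None:
        records.append((label, "".join(chunks)))
    return records


# Toy input (hypothetical label and sequence, for illustration only).
example = ">demo_record toy sequence\nATGCATGC\nGGCC\n"
records = parse_fasta(example)
```

If the text does not start with a label line, the function raises an error, which gives the same confirmation as opening the file and looking for the leading greater-than sign.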


  • Links were provided to the individual spike protein sequences corresponding to each of the viral genome records listed in the Data & Tools section.
    • I was assigned a nucleotide sequence accession number from Figure 2 in class.
      • spike glycoprotein [SARS coronavirus GD03T0013]
    • The GenBank record associated with that sequence was found, and a hyperlink to it was added to the list of sequences in the Data & Tools section.
    • The spike protein accession number in GenBank was recorded: AAS10463
    • A hyperlink to the spike protein record was added to the list of sequences in the Data & Tools section, formatted in the same way as the existing entries.
    • The assigned protein sequence was downloaded in FASTA format, as was done for the whole genome sequence.
    • The protein sequence was added to the "Talk" page.
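The download in this entry was done by hand through the GenBank web page. As an alternative, the sketch below assumes the NCBI E-utilities `efetch` endpoint is used to request a record in FASTA format; the helper names `efetch_fasta_url` and `fetch_fasta` are hypothetical, and the request itself needs network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI E-utilities efetch service.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="protein"):
    """Build an efetch URL that asks for one record as plain-text FASTA."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def fetch_fasta(accession, db="protein"):
    """Download the FASTA text for an accession (requires network access)."""
    with urlopen(efetch_fasta_url(accession, db)) as response:
        return response.read().decode("utf-8")


# URL for the spike protein accession recorded above.
url = efetch_fasta_url("AAS10463")
```

For a nucleotide record such as MK211376, the same helper could be called with `db="nuccore"`.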

>AAS10463.1 spike glycoprotein [SARS coronavirus GD03T0013]
MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLPFYSNVTGFHTINHTFDDPVIPFK
DGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFELCDNPFFVVSKPMGTRTHTMIFDNAFNCTFEYISDAFSLDVSEKS   
GNFKHLREFVFKNKDGFLYVYKGYQPIDVVRDLPSGFNTLKPIFKLPLGINITNFRAILTAFSPAQDTWGTSAAAYFVGYLKPTTFMLKYDEN  
GTITDAVDCSQNPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSVYAWERKRISNCVADYSVLYNSTSFST  
FKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCVLAWNTRNIDATSTGNYNYKYRYLRHGKLRPFER   
DISNVPFSPDGKPCTPPAPNCYWPLNGYGFYTTSGIGYQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRF   
QPFQQFGRDVSDFTDSVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTLIHAEQLTPAWRIYSTGNNVFQTQAGCLI
GAEHVDTSYECDIPIGAGICASYHTVSSLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNFSISITTEVMPVSMAKTSVDCNMYICGDSTECA  
NLLLQYGSFCRQLNRALSGIAAEQDRNTREVFVQVKQMYKTPTLKDFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLG 
DINARDLICAQKFNGLTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAIS 
QIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAA  
TKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITT  
DNTFVSGNCDVVIGIINNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQEEIDRLNEVAKNLNESLIDLQELGKYEQYIK   
WPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT 

Figure 1. Protein sequence for AAS10463

Part 2: Creating a phylogenetic tree with Phylogeny.fr

  • Phylogeny.fr, a free, simple-to-use web service dedicated to reconstructing and analyzing phylogenetic relationships between molecular sequences, was used to analyze the sequence data.
  1. In the browser, www.phylogeny.fr was opened. Further down the page, in the section labeled ‘Phylogeny analysis’, the text ‘One Click’ was clicked.
  2. The large text field labeled ‘Upload your set of sequences in FASTA, EMBL, or NEXUS format’ was clicked. The list of sequences from the talk page was copied and pasted into the field with Command-V, and the “Submit” button was clicked.
  3. A page named Alignment results appeared. After the alignment was complete, a page named Phylogeny results appeared, followed by a page named Tree rendering results. These pages were used later in the methods. For this part, the numbered tabs located just beneath the text One Click Mode were found, and the tab labeled 3. Alignment was clicked. Individual positions were color-coded to indicate their conservation, that is, how similar the sequences are to each other at that position. Blue highlights indicated high conservation, gray highlights indicated lower conservation, and white highlights indicated little if any conservation.
  4. Near the bottom of the page, under Outputs, Alignment in Clustal format was clicked. This displayed the alignment in a text-only format in which each position's conservation is indicated by a symbol underneath the alignment block (“*” for invariant, “:” for highly conserved, “.” for weakly conserved, and a space for not conserved). The entire alignment was then copied and pasted into the individual journal entry, and a space character was placed at the beginning of each line so that the sequences line up properly on the page.
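To illustrate how the conservation line in step 4 relates to the aligned rows, here is a minimal sketch that marks only invariant columns with “*”. It does not reproduce the “:” and “.” classes, because those depend on Clustal's residue-group tables. The function name `invariant_markers` and the toy rows are hypothetical and are not taken from the class alignment.

```python
def invariant_markers(aligned):
    """Return a marker line with '*' under every invariant alignment column."""
    if not aligned:
        raise ValueError("at least one aligned row is required")
    length = len(aligned[0])
    if any(len(row) != length for row in aligned):
        raise ValueError("aligned rows must have equal length")
    markers = []
    for column in zip(*aligned):
        first = column[0]
        # A column is invariant when every row has the same residue
        # and that residue is not a gap character.
        same = first != "-" and all(res == first for res in column)
        markers.append("*" if same else " ")
    return "".join(markers)


# Toy aligned rows (hypothetical fragments; '-' marks a gap).
toy = ["GVYYPD", "GVYYND", "GVYY-D"]
line = invariant_markers(toy)
```

Counting the “*” characters in the returned line (for example with `line.count("*")`) gives the number of invariant positions in the block.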


>QHD43416.1 surface glycoprotein [Severe acute respiratory syndrome coronavirus 2]
MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHSTQDLFLPFFSNVTWFHAIHV
SGTNGTKRFDNPVLPFNDGVYFASTEKSNIIRGWIFGTTLDSKTQSLLIVNNATNVVIKVCEFQFCNDPF
LGVYYHKNNKSWMESEFRVYSSANNCTFEYVSQPFLMDLEGKQGNFKNLREFVFKNIDGYFKIYSKHTPI
NLVRDLPQGFSALEPLVDLPIGINITRFQTLLALHRSYLTPGDSSSGWTAGAAAYYVGYLQPRTFLLKYN
ENGTITDAVDCALDPLSETKCTLKSFTVEKGIYQTSNFRVQPTESIVRFPNITNLCPFGEVFNATRFASV
YAWNRKRISNCVADYSVLYNSASFSTFKCYGVSPTKLNDLCFTNVYADSFVIRGDEVRQIAPGQTGKIAD
YNYKLPDDFTGCVIAWNSNNLDSKVGGNYNYLYRLFRKSNLKPFERDISTEIYQAGSTPCNGVEGFNCYF
PLQSYGFQPTNGVGYQPYRVVVLSFELLHAPATVCGPKKSTNLVKNKCVNFNFNGLTGTGVLTESNKKFL
PFQQFGRDIADTTDAVRDPQTLEILDITPCSFGGVSVITPGTNTSNQVAVLYQDVNCTEVPVAIHADQLT
PTWRVYSTGSNVFQTRAGCLIGAEHVNNSYECDIPIGAGICASYQTQTNSPRRARSVASQSIIAYTMSLG
AENSVAYSNNSIAIPTNFTISVTTEILPVSMTKTSVDCTMYICGDSTECSNLLLQYGSFCTQLNRALTGI
AVEQDKNTQEVFAQVKQIYKTPPIKDFGGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDC
LGDIAARDLICAQKFNGLTVLPPLLTDEMIAQYTSALLAGTITSGWTFGAGAALQIPFAMQMAYRFNGIG
VTQNVLYENQKLIANQFNSAIGKIQDSLSSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDI
LSRLDKVEAEVQIDRLITGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLM
SFPQSAPHGVVFLHVTYVPAQEKNFTTAPAICHDGKAHFPREGVFVSNGTHWFVTQRNFYEPQIITTDNT
FVSGNCDVVIGIVNNTVYDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVA
KNLNESLIDLQELGKYEQYIKWPWYIWLGFIAGLIAIVMVTIMLCCMTSCCSCLKGCCSCGSCCKFDEDD
SEPVLKGVKLHYT
>AAP13441.1 S protein [SARS coronavirus Urbani]
MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLPFYSNVTGFH
TINHTFGNPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFELCDNPFFAV
SKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHLREFVFKNKDGFLYVYKGYQPIDVVRDLP
SGFNTLKPIFKLPLGINITNFRAILTAFSPAQDIWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCSQ
NPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSVYAWERKKISNCVA
DYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCV
LAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDISNVPFSPDGKPCTPPALNCYWPLNDYGFYTTTGIG
YQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVSDFTD
SVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAIHADQLTPAWRIYSTGNNVFQ
TQAGCLIGAEHVDTSYECDIPIGAGICASYHTVSLLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNF
SISITTEVMPVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQM
YKTPTLKYFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGL
TVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFN
KAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLIT
GRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYV
PSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGNCDVVIGIINNTVY
DPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ
YIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>QDF43825.1 spike glycoprotein [Coronavirus BtRs-BetaCoV/YN2018B]
MKLLVLVFATLVSSYTIEKCTDFDDRTPPSNTQFLSSHRGVYYPDDIFRSNVLHLVQDHFLPFDSNVTRF
ITFGLNFDNPIIPFRDGVYFAATEKSNVIRGWVFGSTMNNKSQSVIIMNNSTNLVIRACNFELCDNPFFV
VLRSNNTQIPSYIFNNAFNCTFEYVSKDFNLDIGEKPGNFKDLREFVFRNKDGFLHVYSGYQPISAASGL
PTGFNALKPIFKLPLGINITNFRTLLTAFPPNPGYWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCS
QNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTFPSVYAWERKRISNCV
ADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGC
VLAWNTRNIDATSTGNYNYKYRSLRHGKLRPFERDISNVPFSPDGKPCTPPAFNCYWPLNDYGFFTTNGI
GYQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVSDFT
DSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAIHADQLTPAWRIYSTGNNVF
QTQAGCLIGAEHVDTSYECDIPIGAGICASYHTVSSLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTN
FSISITTEVMPVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
MYKTPTLKDFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNG
LTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQF
NKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLI
TGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTY
VPSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGSCDVVIGIINNTV
YDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE
QYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>AFS88936.1 S protein [Human betacoronavirus 2c EMC/2012]
MIHSVFLLMFLLTPTESYVDVGPDSVKSACIEVDIQQTFFDKTWPRPIDVSKADGIIYPQGRTYSNITIT
YQGLFPYQGDHGDMYVYSAGHATGTTPQKLFVANYSQDVKQFANGFVVRIGAAANSTGTVIISPSTSATI
RKIYPAFMLGSSVGNFSDGKMGRFFNHTLVLLPDGCGTLLRAFYCILEPRSGNHCPAGNSYTSFATYHTP
ATDCSDGNYNRNASLNSFKEYFNLRNCTFMYTYNITEDEILEWFGITQTAQGVHLFSSRYVDLYGGNMFQ
FATLPVYDTIKYYSIIPHSIRSIQSDRKAWAAFYVYKLQPLTFLLDFSVDGYIRRAIDCGFNDLSQLHCS
YESFDVESGVYSVSSFEAKPSGSVVEQAEGVECDFSPLLSGTPPQVYNFKRLVFTNCNYNLTKLLSLFSV
NDFTCSQISPAAIASNCYSSLILDYFSYPLSMKSDLSVSSAGPISQFNYKQSFSNPTCLILATVPHNLTT
ITKPLKYSYINKCSRLLSDDRTEVPQLVNANQYSPCVSIVPSTVWEDGDYYRKQLSPLEGGGWLVASGST
VAMTEQLQMGFGITVQYGTDTNSVCPKLEFANDTKIASQLGNCVEYSLYGVSGRGVFQNCTAVGVRQQRF
VYDAYQNLVGYYSDDGNYYCLRACVSVPVSVIYDKETKTHATLFGSVACEHISSTMSQYSRSTRSMLKRR
DSTYGPLQTPVGCVLGLVNSSLFVEDCKLPLGQSLCALPDTPSTLTPRSVRSVPGEMRLASIAFNHPIQV
DQLNSSYFKLSIPTNFSFGVTQEYIQTTIQKVTVDCKQYVCNGFQKCEQLLREYGQFCSKINQALHGANL
RQDDSVRNLFASVKSSQSSPIIPGFGGDFNLTLLEPVSISTGSRSARSAIEDLLFDKVTIADPGYMQGYD
DCMQQGPASARDLICAQYVAGYKVLPPLMDVNMEAAYTSSLLGSIAGVGWTAGLSSFAAIPFAQSIFYRL
NGVGITQQVLSENQKLIANKFNQALGAMQTGFTTTNEAFQKVQDAVNNNAQALSKLASELSNTFGAISAS
IGDIIQRLDVLEQDAQIDRLINGRLTTLNAFVAQQLVRSESAALSAQLAKDKVNECVKAQSKRSGFCGQG
THIVSFVVNAPNGLYFMHVGYYPSNHIEVVSAYGLCDAANPTNCIAPVNGYFIKTNNTRIVDEWSYTGSS
FYAPEPITSLNTKYVAPQVTYQNISTNLPPPLLGNSTGIDFQDELDEFFKNVSTSIPNFGSLTQINTTLL
DLTYEMLSLQQVVKALNESYIDLKELGNYTYYNKWPWYIWLGFIAGLVALALCVFFILCCTGCGTNCMGK
LKCNRCCDRYEEYDLEPHKVHVH
>YP_001039953.1 spike glycoprotein [Tylonycteris bat coronavirus HKU4]
MTLLMCLLMSLLIFVRGCDSQFVDMSPASNTSECLESQVDAAAFSKLMWPYPIDPSKVDGIIYPLGRTYS
NITLAYTGLFPLQGDLGSQYLYSVSHAVGHDGDPTKAYISNYSLLVNDFDNGFVVRIGAAANSTGTIVIS
PSVNTKIKKAYPAFILGSSLTNTSAGQPLYANYSLTIIPDGCGTVLHAFYCILKPRTVNRCPSGTGYVSY
FIYETVHNDCQSTINRNASLNSFKSFFDLVNCTFFNSWDITADETKEWFGITQDTQGVHLYSSRKGDLYG
GNMFRFATLPVYEGIKYYTVIPRSFRSKANKREAWAAFYVYKLHQLTYLLDFSVDGYIRRAIDCGHDDLS
QLHCSYTSFEVDTGVYSVSSYEASATGTFIEQPNATECDFSPMLTGVAPQVYNFKRLVFSNCNYNLTKLL
SLFAVDEFSCNGISPDSIARGCYSTLTVDYFAYPLSMKSYIRPGSAGNIPLYNYKQSFANPTCRVMASVL
ANVTITKPHAYGYISKCSRLTGANQDVETPLYINPGEYSICRDFSPGGFSEDGQVFKRTLTQFEGGGLLI
GVGTRVPMTDNLQMSFIISVQYGTGTDSVCPMLDLGDSLTITNRLGKCVDYSLYGVTGRGVFQNCTAVGV
KQQRFVYDSFDNLVGYYSDDGNYYCVRPCVSVPVSVIYDKSTNLHATLFGSVACEHVTTMMSQFSRLTQS
NLRRRDSNIPLQTAVGCVIGLSNNSLVVSDCKLPLGQSLCAVPPVSTFRSYSASQFQLAVLNYTSPIVVT
PINSSGFTAAIPTNFSFSVTQEYIETSIQKVTVDCKQYVCNGFTRCEKLLVEYGQFCSKINQALHGANLR
QDESVYSLYSNIKTTSTQTLEYGLNGDFNLTLLQVPQIGGSSSSYRSAIEDLLFDKVTIADPGYMQGYDD
CMKQGPQSARDLICAQYVSGYKVLPPLYDPNMEAAYTSSLLGSIAGAGWTAGLSSFAAIPFAQSMFYRLN
GVGITQQVLSENQKLIANKFNQALGAMQTGFTTSNLAFSKVQDAVNANAQALSKLASELSNTFGAISSSI
SDILARLDTVEQDAQIDRLINGRLISLNAFVSQQLVRSETAARSAQLASDKVNECVKSQSKRNGFCGSGT
HIVSFVVNAPNGFYFFHVGYVPTNYTNVTAAYGLCNNNNPPLCIAPIDGYFITNQTTTYSVDTEWYYTGS
SFYKPEPITQANSRYVSSDVKFDKLENNLPPPLLENSTDVDFKDELEEFFKNVTSHGPNFAEISKINTTL
LDLSDEMAMLQEVVKQLNDSYIDLKELGNYTYYNKWPWYVWLGFIAGLVALLLCVFFLLCCTGCGTSCLG
KMKCKNCCDSYEEYDVEKIHVH
>QDF43820.1 spike glycoprotein [Coronavirus BtRs-BetaCoV/YN2018A]
MKILIFAFLVTLVEAQEGCGIISRKPQPKMAQVSSSRRGVYYNDDIFRSDVLHLTQDYFLPFDSNLTQYF
SLNVDSDRYTYFDNPILDFGDGVYFAATEKSNVIRGWIFGSTFDNTTQSAVIVNNSTHIIIRVCNFNLCK
EPMYTVSRGTQQSSWVYQSAFNCTYDRVERSFQLDTAPKTGNFKDLREYVFKNRDGFLSVYQTYTAVNLP
RGLPIGFSVLRPILKLPFGINITSYRVVMAMFSQTTSNFLPESAAYYVGNLKYTTFMLRFNENGTITDAI
DCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVVRFPNITNRCPFDKVFNASRFPNVYAWERTKIS
DCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQVAPGETGVIADYNYKLPDDF
TGCVIAWNTAKQDTGHYYYRSHRKTKLKPFERDLSSDDGNGVYTLSTYDFNPNVPVAYQATRVVVLSFEL
LNAPATVCGPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGRDTSDFTDSVRDPQTLEILDI
TPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAIRADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVN
ASYECDIPIGAGICASYHTASTLRSVGQKSIVAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVSM
AKTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGVALEQDKNTQEVFAQVKQMYKTPAIKDFGGFN
FSQILPDPSKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGLTVLPPLLTDDMIA
AYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTT
STALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQ
LIRAAEIRASANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPAI
CHEGKAYFPREGVFVSNGTFWFITQRNFYSPQIITTDNTFVAGNCDVVIGIINNTVYDPLQPELDSFKEE
LDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFI
AGLIAIVMATILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>AAZ67052.1 spike protein [Bat SARS CoV Rp3/2004]
MKILILAFLASLAKAQEGCGIISRKPQPKMAQVSSSRRGVYYNDDIFRSNVLHLTQDYFLPFDSNLTQYF
SLNVDSDRFTYFDNPILDFGDGVYFAATEKSNVIRGWIFGSTFDNTTQSAVIVNNSTHIIIRVCNFNLCK
EPMYTVSRGAQQSSWVYQSAFNCTYDRVEKSFQLDTAPKTGNFKDLREYVFKNRDGFLSVYQTYTAVNLP
RGLPIGFSVLRPILKLPFGINITSYRVVMAMFSQTTSNFLPESAAYYVGNLKYTTFMLSFNENGTITNAI
DCAQNPLAELKCTIKNFNVSKGIYQTSNFRVSPTQEVIRFPNITNRCPFDKVFNATRFPNVYAWERTKIS
DCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQVAPGETGVIADYNYKLPDDF
TGCVIAWNTAKQDQGQYYYRSHRKTKLKPFERDLSSDENGVRTLSTYDFYPSVPVAYQATRVVVLSFELL
NAPATVCGPKLSTQLVKNQCVNFNFNGLKGTGVLTESSKRFQSFQQFGRDTSDFTDSVRDPQTLEILDIS
PCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPAAIHADQLTPAWRVYSTGTNVFQTQAGCLIGAEHVNA
SYECDIPIGAGICASYHTASTLRSVGQKSIVAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVSMA
KTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQMYKTPAIKDFGGFNF
SQILPDPSKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDISARDLICAQKFNGLTVLPPLLTDEMIAA
YTAALVSGTATAGWTFGAGSALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTS
TALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQL
IRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAIC
HEGKAYFPREGVFVSNGTSWFITQRNFYSPQIITTDNTFVAGSCDVVIGIINNTVYDPLQPELDSFKEEL
DKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIA
GLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>QDF43835.1 spike glycoprotein [Coronavirus BtRs-BetaCoV/YN2018D]
MKVLIVLLCLGLVTAQDGCGHISTKPQPLLDKFSSSRRGVYYNDDIFRSDVLHLTQDYFLPFDTNLTRYL
SFNMDSATKVYFDNPTLPFGDGIYFAATEKSNVVRGWIFGSTMDNTTQSAIIVNNSTHIIIRVCYFNLCK
EPMYAISNEQHYKSWVYQNAYNCTYDRVEQSFQLDTAPQTGNFKDLREYVFKNKDGFLSVYNAYSPIDIP
RGLPVGFSVLKPILKLPIGINITSFKVVMSMFSRTTSNFLPEVAAYFVGNLKYSTFMLNFNENGTITDAI
DCAQNPLSELKCTIKNFNVSKGIYQTSNFRVSPTHEVIRFPNITNRCPFDKVFNASRFPNVYAWERTKIS
DCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQVAPGETGVIADYNYKLPDDF
TGCVIAWNTAKQDQGQYYYRSSRKTKLKPFERDLTSDENGVRTLSTYDFYPNVPIEYQATRVVVLSFELL
NAPATVCGPKLSTGLVKNQCVNFNFNGLRGTGVLTDSSKRFQSFQQFGRDTSDFTDSVRDPQTLEILDIT
PCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTAIRADQLTPAWRVYSTGINVFQTQAGCLIGAEHVNA
SYECDIPIGAGICASYHTASTLRSVGQKSIVAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVSMS
KTSVDCTMYICGDSQECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQMYKTPAIKDFGGFNF
SQILPDPSKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGLTVLPPLLTDDMIAA
YTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTS
TALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQL
IRAAEIRASANLAATKMSECVLGQSKRVDFCGRGYHLMSFPQAAPHGVVFLHVTYVPSQEKNFTTAPAIC
HEGKAYFPREGVFVSNGTSWFITQRNFYSPQIITTDNTFVAGSCDVVIGIINNTVYDPLQPELDSFKEEL
DKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIA
GLIAIVMATILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>ALK02457.1 spike protein [SARS-like coronavirus WIV16]
MFIFLFFLTLTSGSDLESCTTFDDVQAPNYPQHSSSRRGVYYPDEIFRSDTLYLTQDLFLPFYSNVTGFH
TINHRFDNPVIPFKDGVYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFELCDNPFFAV
SKPTGTQTHTMIFDNAFNCTFEYISDSFSLDVAEKSGNFKHLREFVFKNKDGFLYVYKGYQPIDVVRDLP
SGFNILKPIFKLPLGINITNFRAILTAFLPAQDTWGTSAAAYFVGYLKPATFMLKYDENGTITDAVDCSQ
NPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTFPSVYAWERKRISNCVA
DYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFTGCV
LAWNTRNIDATQTGNYNYKYRSLRHGKLRPFERDISNVPFSPDGKPCTPPAFNCYWPLNDYGFYITNGIG
YQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVLDFTD
SVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAIHADQLTPSWRVYSTGNNVFQ
TQAGCLIGAEHVDTSYECDIPIGAGICASYHTVSSLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNF
SISITTEVMPVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQM
YKTPTLKDFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGL
TVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFN
KAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLIT
GRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYV
PSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGSCDVVIGIINNTVY
DPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ
YIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>ABD75323.1 spike protein [Bat SARS CoV Rf1/2004]
MKILIFAFLVTLVKAQEGCGVINLRTQPKLTQVSSSRRGVYYNDDIFRSDVLHLTQDYFLPFHSNLTQYF
SLNIESDKIVYFDNPILKFGDGVYFAATEKSNVIRGWVFGSTFDNTTQSAIIVNNSTHIIIRVCYFNLCK
DPMYTVSAGTQKSSWVYQSAFNCTYDRVEKSFQLDTSPKTGNFTDLREFVFKNRDGFFTAYQTYTPVNLL
RGLPSGLSVLKPILKLPFGINITSFRVVMAMFSKTTSNYVPESAAYYVGNLKQSTFMLSFNQNGTIVDAV
DCSQDPLAELKCTTKSFNVSKGIYQTSNFRVSPVTEVVRFPNITNLCPFDKVFNATRFPSVYAWERTKIS
DCVADYTVFYNSTSFSTFNCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQVAPGQTGVIADYNYKLPDDF
TGCVIAWNTAKQDVGSYFYRSHRSSKLKPFERDLSSEENGVRTLSTYDFNQNVPLEYQATRVVVLSFELL
NAPATVCGPKLSTSLVKNQCVNFNFNGFKGTGVLTDSSKTFQSFQQFGRDASDFTDSVRDPQTLRILDIS
PCSFGGVSVITPGTNTSSAVAVLYQDVNCTDVPRTIQADQLAPSWRVYTTGPYVFQTQAGCLIGAEHVNA
SYQCDIPIGAGICASYHTASHLRSTGQKSIVAYTMSLGAENSVAYANNSIAIPTNFSISVTTEVMPVSMA
KTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALSGIAVEQDKNTQEVFAQVKQMYKTPTIRDFGGFNF
SQILPDPLKPTKRSFIEDLLYNKVTLADAGFMKQYADCLGGINARDLICAQKFNGLTVLPPLLTDDMIAA
YTAALISGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAITQIQESLTTTS
TALGKLQDVVNQNAQALNTLVKQLSSNFGAISSALNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQL
IRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYVPSQEKNFTTAPAIC
HEGKAYFPREGVFVSNGSSWFITQRNFYSPQIITTDNTFVAGSCDVVIGIINNTVYDPLQPELDSFKQEL
DKYFKNHTSPDVDLGDISGINASVVDIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIA
GLVGLFMAIILLCYFTSCCSCCKGMCSCGSCCRFDEDDSEPVLKGVKLHYT


>AAS10463.1 spike glycoprotein [SARS coronavirus GD03T0013]
MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLPFYSNVTGFH
TINHTFDDPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFELCDNPFFVV
SKPMGTRTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHLREFVFKNKDGFLYVYKGYQPIDVVRDLP
SGFNTLKPIFKLPLGINITNFRAILTAFSPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCSQ
NPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSVYAWERKRISNCVA
DYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCV
LAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDISNVPFSPDGKPCTPPAPNCYWPLNGYGFYTTSGIG
YQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVSDFTD
SVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTLIHAEQLTPAWRIYSTGNNVFQ
TQAGCLIGAEHVDTSYECDIPIGAGICASYHTVSSLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNF
SISITTEVMPVSMAKTSVDCNMYICGDSTECANLLLQYGSFCRQLNRALSGIAAEQDRNTREVFVQVKQM
YKTPTLKDFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGL
TVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFN
KAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLIT
GRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYV
PSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGNCDVVIGIINNTVY
DPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQEEIDRLNEVAKNLNESLIDLQELGKYEQ
YIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>AAP13567.1 putative E2 glycoprotein precursor [SARS coronavirus CUHK-W1]
MFIFLLFLTLTSGSDLDRCTTFDDVQAPNYTQHTSSMRGVYYPDEIFRSDTLYLTQDLFLPFYSNVTGFH
TINHTFDNPVIPFKDGIYFAATEKSNVVRGWVFGSTMNNKSQSVIIINNSTNVVIRACNFELCDNPFFAV
SKPMGTQTHTMIFDNAFNCTFEYISDAFSLDVSEKSGNFKHLREFVFKNKDGFLYVYKGYQPIDVVRDLP
SGFNTLKPIFKLPLGINITNFRAILTAFSPAQDTWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCSQ
NPLAELKCSVKSFEIDKGIYQTSNFRVVPSGDVVRFPNITNLCPFGEVFNATKFPSVYAWERKKISNCVA
DYSVLYNSTFFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFMGCV
LAWNTRNIDATSTGNYNYKYRYLRHGKLRPFERDISNVPFSPDGKPCTPPALNCYWPLNDYGFYTTTGIG
YQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVSDFTD
SVRDPKTSEILDISPCSFGGVSVITPGTNASSEVAVLYQDVNCTDVSTAIHADQLTPAWRIYSTGNNVFQ
TQAGCLIGAEHVDTSYECDIPIGAGICASYHTVSLLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTNF
SISITTEVMPVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAAEQDRNTREVFAQVKQM
YKTPTLKYFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNGL
TVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFN
KAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLIT
GRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYV
PSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGNCDVVIGIINNTVY
DPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQ
YIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>AVP78031.1 spike protein [Bat SARS-like coronavirus]
MLFFLFLQFALVNSQCVNLTGRTPLNPNYTNSSQRGVYYPDTIYRSDTLVLSQGYFLPFYSNVSWYYSLT
TNNAATKRTDNPILDFKDGIYFAATEHSNIIRGWIFGTTLDNTSQSLLIVNNATNVIIKVCNFDFCYDPY
LSGYYHNNKTWSIREFAVYSSYANCTFEYVSKSFMLNISGNGGLFNTLREFVFRNVDGHFKIYSKFTPVN
LNRGLPTGLSVLQPLVELPVSINITKFRTLLTIHRGDPMPNNGWTAFSAAYFVGYLKPRTFMLKYNENGT
ITDAVDCALDPLSETKCTLKSLTVQKGIYQTSNFRVQPTQSVVRFPNITNVCPFHKVFNATRFPSVYAWE
RTKISDCIADYTVFYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRFSEVRQVAPGQTGVIADYNYK
LPDDFTGCVIAWNTAKQDVGNYFYRSHRSTKLKPFERDLSSDENGVRTLSTYDFNPNVPLEYQATRVVVL
SFELLNAPATVCGPKLSTQLVKNQCVNFNFNGLKGTGVLTDSSKRFQSFQQFGKDASDFIDSVRDPQTLE
ILDITPCSFGGVSVITPGTNTSLEVAVLYQDVNCTDVPTTIHADQLTPAWRIYATGTNVFQTQAGCLIGA
EHVNASYECDIPIGAGICASYHTASILRSTSQKAIVAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVM
PVSMAKTSVDCTMYICGDSIECSNLLLQYGSFCTQLNRALSGIAIEQDKNTQEVFAQVKQIYKTPPIKDF
GGFNFSQILPDPSKPSKRSFIEDLLFNKVTLADAGFIKQYGDCLGGISARDLICAQKFNGLTVLPPLLTD
EMIAAYTAALISGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKLIANQFNSAIGKIQES
LTSTASALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTY
VTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQSAPHGVVFLHVTYIPSQEKNFTT
APAICHEGKAHFPREGVFVSNGTHWFVTQRNFYEPKIITTDNTFVSGNCDVVIGIINNTVYDPLQPELDS
FKEELDKYFKNHTSPDIDLGDISGINASVVNIQKEIDRLNEVARNLNESLIDLQELGKYEQYIKWPWYVW
LGFIAGLIAIVMVTILLCCMTSCCSCLKGCCSCGSCCKFDEDDSEPVLKGVKLHYT
>ABD75332.1 spike protein [Bat SARS CoV Rm1/2004]
MKVLIFALLFSLAKAQEGCGIISRKPQPKMEKVSSSRRGVYYNDDIFRSDVLHLTQDYFLPFDSNLTQYF
SLNIDSNKYTYFDNPILDFGDGVYFAATEKSNVIRGWIFGSSFDNTTQSAIIVNNSTHIIIRVCNFNLCK
EPMYTVSKGTQQSSWVYQSAFNCTYDRVEKSFQLDTAPKTGNFKDLREYVFKNKGGFLRVYQTYTAVNLP
RGFPAGFSVLRPILKLPFGINITSYRVVMTMFSQFNSNFLPESAAYYVGNLKYTTFMLSFNENGTITDAV
DCSQNPLAELKCTIKNFNVSKGIYQTSNFRVTPTQEVVRFPNITNRCPFDKVFNASRFPNVYAWERTKIS
DCVADYTVLYNSTSFSTFKCYGVSPSKLIDLCFTSVYADTFLIRSSEVRQVAPGETGVIADYNYKLPDDF
TGCVIAWNTAQQDQGQYYYRSYRKEKLKPFERDLSSDENGVYTLSTYDFYPSIPVEYQATRVVVLSFELL
NAPATVCGPKLSTQLVKNQCVNFNFNGLRGTGVLTTSSKRFQSFQQFGRDTSDFTDSVRDPQTLEILDIS
PCSFGGVSVITPGTNASSEVAVLYQDVNCTDVPTSIHADQLTPAWRVYSTGVNVFQTQAGCLIGAEHVNA
SYECDIPIGAGICASYHTASVLRSTGQKSIVAYTMSLGAENSIAYANNSIAIPTNFSISVTTEVMPVSIA
KTSVDCTMYICGDSLECSNLLLQYGSFCTQLNRALTGIAIEQDKNTQEVFAQVKQMYKTPAIKDFGGFNF
SQILPDPSKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDISARDLICAQKFNGLTVLPPLLTDEMIAA
YTAALVSGTATAGWTFGAGSALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQFNKAISQIQESLTTTS
TALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLITGRLQSLQTYVTQQL
IRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTYVPSQERNFTTAPAIC
HEGKAYFPREGVFVSNGTSWFITQRNFYSPQIITTDNTFVAGNCDVVIGIINNTVYDPLQPELDSFKEEL
DKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYEQYIKWPWYVWLGFIA
GLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>AGZ48818.1 spike protein [Bat SARS-like coronavirus Rs3367]
MKLLVLVFATLVSSYTIEKCLDFDDRTPPANTQFLSSHRGVYYPDDIFRSNVLHLVQDHFLPFDSNVTRF
ITFGLNFDNPIIPFKDGIYFAATEKSNVIRGWVFGSTMNNKSQSVIIMNNSTNLVIRACNFELCDNPFFV
VLKSNNTQIPSYIFNNAFNCTFEYVSKDFNLDLGEKPGNFKDLREFVFRNKDGFLHVYSGYQPISAASGL
PTGFNALKPIFKLPLGINITNFRTLLTAFPPRPDYWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCS
QNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTFPSVYAWERKRISNCV
ADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFTGC
VLAWNTRNIDATQTGNYNYKYRSLRHGKLRPFERDISNVPFSPDGKPCTPPAFNCYWPLNDYGFYITNGI
GYQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVSDFT
DSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAIHADQLTPSWRVYSTGNNVF
QTQAGCLIGAEHVDTSYECDIPIGAGICASYHTVSSLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTN
FSISITTEVMPVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
MYKTPTLKDFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNG
LTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQF
NKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLI
TGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTY
VPSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGSCDVVIGIINNTV
YDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEINRLNEVAKNLNESLIDLQELGKYE
QYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
>AGZ48806.1 spike protein [Bat SARS-like coronavirus RsSHC014]
MKLLVLVFATLVSSYTIEKCLDFDDRTPPANTQFLSSHRGVYYPDDIFRSNVLHLVQDHFLPFDSNVTRF
ITFGLNFDNPIIPFRDGIYFAATEKSNVIRGWVFGSTMNNKSQSVIIMNNSTNLVIRACNFELCDNPFFV
VLKSNNTQIPSYIFNNAFNCTFEYVSKDFNLDLGEKPGNFKDLREFVFRNKDGFLHVYSGYQPISAASGL
PTGFNALKPIFKLPLGINITNFRTLLTAFPPRPDYWGTSAAAYFVGYLKPTTFMLKYDENGTITDAVDCS
QNPLAELKCSVKSFEIDKGIYQTSNFRVAPSKEVVRFPNITNLCPFGEVFNATTFPSVYAWERKRISNCV
ADYSVLYNSTSFSTFKCYGVSATKLNDLCFSNVYADSFVVKGDDVRQIAPGQTGVIADYNYKLPDDFLGC
VLAWNTNSKDSSTSGNYNYLYRWVRRSKLNPYERDLSNDIYSPGGQSCSAVGPNCYNPLRPYGFFTTAGV
GHQPYRVVVLSFELLNAPATVCGPKLSTDLIKNQCVNFNFNGLTGTGVLTPSSKRFQPFQQFGRDVSDFT
DSVRDPKTSEILDISPCSFGGVSVITPGTNTSSEVAVLYQDVNCTDVPVAIHADQLTPSWRVYSTGNNVF
QTQAGCLIGAEHVDTSYECDIPIGAGICASYHTVSSLRSTSQKSIVAYTMSLGADSSIAYSNNTIAIPTN
FSISITTEVMPVSMAKTSVDCNMYICGDSTECANLLLQYGSFCTQLNRALSGIAVEQDRNTREVFAQVKQ
MYKTPTLKDFGGFNFSQILPDPLKPTKRSFIEDLLFNKVTLADAGFMKQYGECLGDINARDLICAQKFNG
LTVLPPLLTDDMIAAYTAALVSGTATAGWTFGAGAALQIPFAMQMAYRFNGIGVTQNVLYENQKQIANQF
NKAISQIQESLTTTSTALGKLQDVVNQNAQALNTLVKQLSSNFGAISSVLNDILSRLDKVEAEVQIDRLI
TGRLQSLQTYVTQQLIRAAEIRASANLAATKMSECVLGQSKRVDFCGKGYHLMSFPQAAPHGVVFLHVTY
VPSQERNFTTAPAICHEGKAYFPREGVFVFNGTSWFITQRNFFSPQIITTDNTFVSGSCDVVIGIINNTV
YDPLQPELDSFKEELDKYFKNHTSPDVDLGDISGINASVVNIQKEIDRLNEVAKNLNESLIDLQELGKYE
QYIKWPWYVWLGFIAGLIAIVMVTILLCCMTSCCSCLKGACSCGSCCKFDEDDSEPVLKGVKLHYT
  • After going back to the previous screen, tab 6. Tree Rendering was clicked, and the phylogenetic tree of the five sequences was displayed.

  • Comparison of the tree to the multiple sequence alignment, and the differences in the sequences:

The class sequence alignment is consistent with the phylogenetic tree above. For example, (QDF43825.1) and (AGZ48818.1) are closely related in the phylogenetic tree, and their sequences are also very similar. When (QDF43825.1) is compared to (ALK02457.1), there are similarities between the two, but these sequences differ from each other more than (QDF43825.1) and (AGZ48818.1) do. This makes sense in light of the phylogenetic tree: (QDF43825.1) and (ALK02457.1) branch from the same point but are further from each other than (QDF43825.1) and (AGZ48818.1) are.
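The comparison above was made by eye. One way to put a number on "very similar" is percent identity between two aligned rows, sketched below. This assumes the rows come from the same alignment and skips any column where either row has a gap; other definitions use a different denominator. The function name and the toy rows are hypothetical and are not taken from the class data.

```python
def percent_identity(row_a, row_b):
    """Percent of gap-free aligned columns where two rows share a residue."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = matches = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            continue  # columns with a gap in either row are not counted
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy aligned rows (hypothetical; '-' marks a gap).
toy = {
    "seq_A": "MKLLVLVF-T",
    "seq_B": "MKLLVLVFAT",
    "seq_C": "MFIFLLFLTL",
}
close_pair = percent_identity(toy["seq_A"], toy["seq_B"])
distant_pair = percent_identity(toy["seq_A"], toy["seq_C"])
```

A pair that sits close together on the tree would be expected to give a higher value than a pair that branches further apart, as `close_pair` and `distant_pair` do for the toy rows.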

The sequence shown in Figure 3 of the paper is conserved, and this conservation is more prevalent in the class sequences than in the paper. The sequence provided in the article has different amino acids when compared to the class sequences.

The tree from the article used genomic sequences, whereas the tree from the class sequences used spike protein sequences. Both trees have two branches. The phylogenetic tree from the paper appears to have an outgroup containing one sequence (BtSCoV PDF2386), whereas the tree obtained from the class sequences has an outgroup containing two sequences, (AAP13567.1) and (AAS10463.1). The tree from the class sequences included all of the sequences, whereas the tree from the paper included only some of them.

  • Whether Wan et al. (2020) provided enough information in their paper for their analysis to be reproduced was then assessed:

The paper did not provide sufficient information for the analysis to be replicated. The methods section was very limited, and it is hard to tell from the methods alone how the authors obtained their results. There is also little information about why they chose to include some results but not others. Therefore, I do not believe this paper is replicable.

Conclusion

In this exercise, I learned how to obtain sequence data and compare it using multiple sequence alignments. This is an important skill for a biologist, because analyzing sequences and phylogenetic trees reveals a lot about where species, proteins, viruses, etc. come from and how they are related. This provides essential information that could help me when conducting my own research.

Acknowledgements

  1. GenBank for the SARS Coronavirus Urbani DNA sequences and information related to assigned
  2. I copied and modified the protocol shown on the Week 4 page.
  3. I acknowledge the instructions from Week 4 [1].
  4. I discussed questions with my partner Kam Taghizadeh.
  5. Phylogeny.fr was used to generate the phylogenetic trees.
  6. I referred to data in the Wan et al. (2020) paper.
  7. Dr. Kam D. Dahlquist, Ph.D. helped me in understanding the homework.

"Except for what is noted above, this individual journal entry was completed by me and not copied from another source." Falghane (talk) 17:19, 1 October 2020 (PDT)

References

  1. OpenWetWare. (2020). BIOL368/F20:Week 4. Retrieved October 1, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_4
  2. OpenWetWare. (2020). BIOL368/F20:Week 1. Retrieved October 1, 2020, from https://openwetware.org/wiki/BIOL368/F20:Week_1
  3. http://www.phylogeny.fr/simple_phylogeny.cgi
  4. Ncbi.nlm.nih.gov. 2020. Genbank Overview. [online] Available at: https://www.ncbi.nlm.nih.gov/genbank/ [Accessed 2 October 2020].
  5. http://www.phylogeny.fr/
  6. Wan, Y., Shang, J., Graham, R., Baric, R. and Li, F., 2020. Receptor Recognition By The Novel Coronavirus From Wuhan: An Analysis Based On Decade-Long Structural Studies Of SARS Coronavirus.