Zrusso Biol 368 week 3
From OpenWetWare
Exploring HIV Evolution Electronic Notebook
- Found the GenBank accession numbers in the article, but it lists only the first and last accession number of each group of sequences, so only four are given explicitly.
- Can navigate among the sequences by changing the URL, but as far as I can tell there is no way to step directly between the different sequences from the paper.
- Chose GenBank accession number AF016821.
- The title states that it was taken from subject 2, visit 4, clone 5, from the USA.
- Downloaded 5 FASTA sequences into a single Word document for use in the Biology Workbench.
- Created a Biology Workbench user ID and tried to upload my sequences, but the Browse and Upload buttons kept kicking me back to the homepage, so I copied and pasted the sequences in manually instead.
- Performed a ClustalW multiple sequence alignment on the 5 sequences to get an idea of how closely related they are.
- Rooted tree diagram of my 5 sequences
- Multiple sequence alignment
- Created a new session for the cross-subject HIV comparison.
- Saved all the subject sequences to .txt files first, then uploaded them into the Biology Workbench.
- Did a preliminary analysis of three sequences from each of 4 different subjects as an unrooted tree.
- As expected, the clones from each subject were more closely related to each other than to clones from other subjects, but subjects 2 and 5 were more closely related to each other than to subjects 3 or 13.
- I picked subject 2 clones 1, 2, 3; subject 3 clones 2, 3, 4; subject 5 clones 6, 7, 8; and subject 13 clones 2, 3, 4.
- Subject 5 seems to show the greatest variation among his/her clonal variants, with clone 6 branching off much higher up the tree than the other two, which sit at the end.
- I picked subjects 7, 2, and 11 to determine S and theta as well as the min and max differences.
- Figuring out how to determine theta is turning out to be difficult, since it involves a harmonic sum.
- Found and linked a website that will evaluate sums written in sigma notation.
- Imported the alignments and used CLUSTALDIST to create a distance matrix.
- Subject 2 had a min score of 0.004 and a max score of 0.011. After multiplying by the number of bases present, this is 1 min and 3 max.
- Subject 7 had a min score of 0.007 and a max score of 0.063. After multiplying by the number of bases present, this is 2 min and 18 max.
- Subject 11 had a min score of 0.003 and a max score of 0.024. After multiplying by the number of bases present, this is 1 min and 7 max.
- Compared subjects 2 and 7: min score was 0.081 and max score was 0.116. After multiplying by the number of bases present, this is 23 min and 33 max.
- Compared subjects 2 and 11: min score was 0.121 and max score was 0.145. After multiplying by the number of bases present, this is 34 min and 41 max.
- Compared subjects 7 and 11: min score was 0.131 and max score was 0.167. After multiplying by the number of bases present, this is 37 min and 48 max.
- Excel spreadsheet of the data can be found here
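
The theta and min/max difference calculations in the notes above can be sketched in a few lines of Python. This is only a rough sketch under two assumptions that the notes do not state outright: that theta is Watterson's estimator (S divided by the harmonic sum 1 + 1/2 + ... + 1/(n-1), where n is the number of sequences), and that the alignment is 285 bases long, a value inferred from the score-to-count conversions recorded above. The function names are hypothetical.

```python
def harmonic_sum(k):
    """Return 1 + 1/2 + ... + 1/k."""
    return sum(1.0 / i for i in range(1, k + 1))


def watterson_theta(s, n):
    """Assumed definition: theta = S / (harmonic sum over i = 1..n-1).

    s is the number of segregating sites, n is the number of sequences.
    """
    if n < 2:
        raise ValueError("need at least two sequences")
    return s / harmonic_sum(n - 1)


def score_to_differences(score, n_bases):
    """Convert a pairwise distance score into a whole number of differing bases."""
    return round(score * n_bases)


# Assumption: alignment length inferred from the notes, not stated in them.
ASSUMED_N_BASES = 285

# (min score, max score) pairs copied from the notebook entries.
scores = {
    "subject 2": (0.004, 0.011),
    "subject 7": (0.007, 0.063),
    "subject 11": (0.003, 0.024),
    "subjects 2 vs 7": (0.081, 0.116),
    "subjects 2 vs 11": (0.121, 0.145),
    "subjects 7 vs 11": (0.131, 0.167),
}

for label, (lo, hi) in scores.items():
    min_diff = score_to_differences(lo, ASSUMED_N_BASES)
    max_diff = score_to_differences(hi, ASSUMED_N_BASES)
    print(f"{label}: {min_diff} min, {max_diff} max")
```

To get theta for one subject, call `watterson_theta(s, n)` with that subject's S and clone count; the harmonic sum is handled by `harmonic_sum`, so no sigma-notation calculator is needed.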