# KP Ramirez Week 5

## Assignment

Question: Are there differences in HIV-1 diversity or divergence between participants with high CD4 T cell variability between visits and participants with linear ‘progression’ (defined as CD4 T cell counts that fall rapidly, or linearly, over time, with slope ~ -1)?

Prediction: We predict that participants with high variability in T cell count between visits will show lower HIV-1 diversity and divergence than participants with linear progression. This prediction rests on two assumptions:

• (1) High diversity and divergence of HIV-1 variants indicate a more rapidly progressing virus (and thus a steadily falling CD4 T cell count).
• (2) High variability in T cell count indicates that a participant’s immune system was able to manage the virus better than that of a participant with steadily falling CD4 counts. If we do see high diversity in participants with high variability in T cell count between visits, we predict that the mutations will be predominantly synonymous rather than non-synonymous (which we would expect to see in linear progressors).

Subjects Chosen:

• Linear progressors (slope: -1): Subjects 4, 10
• High variability between visits: Subjects 12, 8
• Low variability between visits: Subject 5

Article: Early viral load and CD4+ T cell count, but not percentage of CCR5+ or CXCR4+ CD4+ T cells, are associated with R5-to-X4 HIV type 1 virus evolution.*
1. PMID 12803997 [Paper1]

## Notes

Divergence and diversity

• Divergence is how different the sequences are from each other. For example, if you have 10 sequences that differ at only one position, they are not very divergent. If they differ at, say, 10 base pairs, they are much more divergent.
• On the unrooted tree, the length of each branch shows how far apart the sequences are: sequences on longer branches are more divergent than those on shorter branches.
• S counts every position in the multiple sequence alignment at which the sequences differ. There were 25 sequences in the alignment.
• S tells you DIVERSITY. It's a rough estimate of how different those sequences are from each other.
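
The S count described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `count_segregating_sites` and the toy alignment are made up for this example (they are not data from the study), and gap characters are ignored here, which may not match how the class tool treats them.

```python
def count_segregating_sites(alignment):
    """Count alignment columns in which at least two sequences differ."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    s = 0
    # zip(*alignment) walks the alignment one column at a time.
    for column in zip(*alignment):
        # Assumption: gaps ('-') are skipped rather than counted as a difference.
        bases = {base.upper() for base in column if base != "-"}
        if len(bases) > 1:
            s += 1
    return s


# Hypothetical toy alignment of 4 sequences, 10 positions each.
toy_alignment = [
    "ATGCATGCAT",
    "ATGCATGCAT",
    "ATGAATGCAT",  # differs at position 4
    "ATGCATGTAT",  # differs at position 8
]

print(count_segregating_sites(toy_alignment))  # prints 2
```

With a real alignment of 25 sequences, the same function would be called on the list of 25 aligned strings.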

Then you calculate θ, an estimate of the average genetic pairwise difference, that is, the average number of differences between any two sequences. Comparing a single pair of sequences gives an absolute number of differences, but what we want to know is the average difference across pairs. To get θ, take the S value and divide it by the harmonic sum (1 + 1/2 + ... + 1/(n-1), where n is the number of sequences).
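
As a rough sketch of that calculation, the snippet below divides S by the harmonic sum for n sequences. Dividing S by this sum is what is commonly called Watterson's estimator. The function names and the S value of 20 are hypothetical, chosen only to illustrate the arithmetic; n = 25 matches the number of sequences mentioned in these notes.

```python
def harmonic_sum(n):
    """Return 1 + 1/2 + ... + 1/(n - 1) for n sequences."""
    if n < 2:
        raise ValueError("need at least two sequences")
    return sum(1.0 / i for i in range(1, n))


def theta_from_s(s, n):
    """Estimate theta as S divided by the harmonic sum for n sequences."""
    return s / harmonic_sum(n)


# Hypothetical example: S = 20 differing positions among n = 25 sequences.
n_sequences = 25
s_value = 20
print(f"harmonic sum = {harmonic_sum(n_sequences):.3f}")
print(f"theta = {theta_from_s(s_value, n_sequences):.3f}")
```

Because the harmonic sum grows slowly with n, θ corrects S for sample size: the same S spread over more sequences gives a smaller θ.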

• The last measure was the minimum and maximum difference, found with the clustdist tool, which creates a pairwise distance matrix. The minimum is the smallest number in the matrix that is not 0 (zeros appear, for example, where a sequence is compared with itself), and the maximum is the largest number. To find the min and max between two subjects, you have to import the sequences from both subjects into the same matrix.
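
Reading the min and max off a distance matrix can also be done programmatically. The sketch below assumes the matrix has already been copied into a square list of lists; the function name `min_max_nonzero` and the toy distances are hypothetical and do not come from the clustdist output.

```python
def min_max_nonzero(matrix):
    """Return (smallest nonzero distance, largest distance) from a square matrix."""
    n = len(matrix)
    # Only look above the diagonal: the matrix is symmetric, and the
    # diagonal holds each sequence compared with itself (distance 0).
    values = [
        matrix[i][j]
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] > 0
    ]
    if not values:
        raise ValueError("no nonzero distances found")
    return min(values), max(values)


# Hypothetical 3 x 3 distance matrix for three sequences.
toy_matrix = [
    [0.00, 0.02, 0.05],
    [0.02, 0.00, 0.04],
    [0.05, 0.04, 0.00],
]

print(min_max_nonzero(toy_matrix))  # prints (0.02, 0.05)
```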