Drummond:PopGen

Current revision (22:40, 28 March 2011)

the drummond lab

Introduction

Here I will treat some basic questions in population genetics. For personal reasons, I tend to include all the algebra.

Per-generation and instantaneous growth rates

What is the relationship between per-generation growth rates and the Malthusian parameter, the instantaneous rate of growth?

Let $n_i(t)$ be the number of organisms of type $i$ at time $t$, and let $R$ be the per-capita reproductive rate per generation. If $t$ counts generations, then

$n_i(t+1) = n_i(t)R\!$
and
$n_i(t) = n_i(0)R^t.\!$

Now we wish to move to the case where t is continuous and real-valued. As before,

$n_i(t+1) = n_i(t)R\!$
but now
$\begin{aligned}
n_i(t+\Delta t) &= n_i(t)R^{\Delta t}\\
n_i(t+\Delta t) - n_i(t) &= n_i(t)R^{\Delta t} - n_i(t)\\
\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t} &= \frac{n_i(t)R^{\Delta t} - n_i(t)}{\Delta t}\\
\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t} &= n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}\\
\lim_{\Delta t \to 0} \left[\frac{n_i(t+\Delta t) - n_i(t)}{\Delta t}\right] &= \lim_{\Delta t \to 0} \left[ n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}\right]\\
\frac{d n_i(t)}{dt} &= n_i(t) \lim_{\Delta t \to 0} \left[\frac{R^{\Delta t} - 1}{\Delta t}\right]\\
\frac{d n_i(t)}{dt} &= n_i(t) \ln R
\end{aligned}$

where the last simplification follows from L'Hôpital's rule. Explicitly, let ε = Δt. Then

$\begin{aligned}
\lim_{\Delta t \to 0} \left[\frac{R^{\Delta t} - 1}{\Delta t}\right] &= \lim_{\epsilon \to 0} \left[\frac{R^{\epsilon} - 1}{\epsilon}\right]\\
&= \lim_{\epsilon \to 0} \left[\frac{\frac{d}{d\epsilon}\left(R^{\epsilon} - 1\right)}{\frac{d}{d\epsilon}\epsilon}\right]\\
&= \lim_{\epsilon \to 0} \left[\frac{R^{\epsilon}\ln R}{1}\right]\\
&= \ln R \lim_{\epsilon \to 0} \left[R^{\epsilon}\right]\\
&= \ln R
\end{aligned}$
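The limit can also be checked numerically. The short Python sketch below is only an illustration: the value $R = 2$ and the helper name `difference_quotient` are assumptions made for the example and do not come from the notes.

```python
import math

def difference_quotient(R, dt):
    """Return (R**dt - 1) / dt, the quantity whose limit as dt -> 0 is ln R."""
    return (R ** dt - 1.0) / dt

R = 2.0  # assumed example value: the population doubles each generation

# Shrink dt and watch the quotient approach ln R.
for dt in (1.0, 0.1, 0.001, 1e-6):
    q = difference_quotient(R, dt)
    print(f"dt={dt:g}  quotient={q:.6f}  ln R={math.log(R):.6f}")
```

With $\Delta t = 1$ the quotient is just $R - 1$, the per-generation increase; as $\Delta t$ shrinks it should settle toward $\ln R$.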

The solution to the equation

$\frac{d n_i(t)}{dt} = n_i(t) \ln R$
is
$n_i(t) = n_i(0) e^{t\ln R} = n_i(0) R^{t}.\!$
Note that the continuous case and the original discrete-generation case agree for all integer values of $t$. We can define the instantaneous growth rate $r = \ln R$ for convenience.
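As a quick sanity check of this agreement, the following Python sketch iterates the discrete recursion and evaluates the continuous solution at the same integer times. The starting size and the value of $R$ are made-up example numbers, and the function names are hypothetical.

```python
import math

def discrete_growth(n0, R, generations):
    """Iterate n(t+1) = n(t) * R for a whole number of generations."""
    n = n0
    for _ in range(generations):
        n *= R
    return n

def continuous_growth(n0, R, t):
    """Evaluate n(t) = n(0) * exp(t * ln R) for any real t >= 0."""
    return n0 * math.exp(t * math.log(R))

n0, R = 100.0, 1.5  # assumed example values

# The two columns should match at every integer t.
for t in range(6):
    print(t, discrete_growth(n0, R, t), continuous_growth(n0, R, t))

# Only the continuous form is defined between generations, e.g. at t = 2.5.
print(continuous_growth(n0, R, 2.5))
```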

Continuous rate of change

If two organisms grow at different rates, how do their proportions in the population change over time?

Let $r_1$ and $r_2$ be the instantaneous rates of increase of type 1 and type 2, respectively. Then

${dn_i(t) \over dt} = r_i n_i(t).$
With the total population size
$n(t) = n_1(t) + n_2(t)\!$
we have the proportion of type 1
$p(t) = {n_1(t) \over n(t)}.$
Define the fitness advantage
$s \equiv s_{12} = r_1 - r_2\!$
Given our interest in understanding the change in gene frequencies, our goal is to compute the rate of change of $p(t)$.

$\begin{aligned}
{\partial p(t) \over \partial t} &= {\partial \over \partial t}\left({n_1(t) \over n(t)}\right)\\
&= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}{\partial n(t) \over \partial t}\\
&= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}\left({\partial n_1(t) \over \partial t} + {\partial n_2(t) \over \partial t}\right)\\
&= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + r_2 n_2(t)\right)\\
&= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + (r_1-s)(n(t)-n_1(t))\right)\\
&= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n(t) - s n(t) + s n_1(t)\right)\\
&= {n_1(t) \over n(t)^2}\left(s n(t) - s n_1(t)\right)\\
&= s{n_1(t) \over n(t)}\left(1 - {n_1(t) \over n(t)}\right)\\
&= s p(t)(1-p(t))
\end{aligned}$

This result says that the proportion of type 1, p, changes most rapidly when p = 0.5 and most slowly when p is very close to 0 or 1.
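The equation $dp/dt = s\,p(1-p)$ can be compared numerically against the proportion computed directly from two exponentially growing populations. The Python sketch below does this with a simple forward-Euler integrator; the growth rates, starting counts, step count, and function names are all assumptions chosen for illustration.

```python
import math

def exact_proportion(n1_0, n2_0, r1, r2, t):
    """p(t) = n_1(t) / (n_1(t) + n_2(t)), each type growing exponentially."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)

def integrate_proportion(p0, s, t_end, steps=10000):
    """Integrate dp/dt = s * p * (1 - p) with fixed-step forward Euler."""
    dt = t_end / steps
    p = p0
    for _ in range(steps):
        p += dt * s * p * (1.0 - p)
    return p

# Assumed example values: type 1 starts rare but grows slightly faster.
r1, r2 = 0.7, 0.6
n1_0, n2_0 = 10.0, 990.0
s = r1 - r2
p0 = n1_0 / (n1_0 + n2_0)
t_end = 50.0

p_exact = exact_proportion(n1_0, n2_0, r1, r2, t_end)
p_numeric = integrate_proportion(p0, s, t_end)
print(p_exact, p_numeric)
```

Forward Euler is used here only because it is short; the two printed values should agree up to the integrator's discretization error, and any standard ODE solver could be substituted.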

Evolution is linear on a log-odds scale

The logit function $\mathrm{logit} (p) = \ln {p \over 1-p}$, which maps $p \in (0,1)$ to $\mathbb{R}$, induces a more natural space for considering changes in frequencies. Rather than tracking the proportion of type 1 or 2, we instead track their log odds. In logit terms, with $L_p(t) \equiv \mathrm{logit} (p(t))\!$,

$\begin{aligned}
{\partial L_p(t) \over \partial t} &= {\partial \over \partial t}\left(\ln {p(t) \over 1-p(t)}\right)\\
&= {\partial \over \partial t}\left(\ln {n_1(t) \over n_2(t)}\right)\\
&= {\partial \over \partial t}\ln \left({n_1(0) \over n_2(0)} e^{st}\right)\\
&= s
\end{aligned}$

This differential equation $L_p'(t) = s$ has the solution

$L_p(t) = L_p(0) + st\!$

showing that the log-odds of finding type 1 changes linearly in time, increasing if s > 0 and decreasing if s < 0.
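To see the linearity concretely, the Python sketch below computes the log-odds from two exponentially growing populations and compares it with $L_p(0) + st$ at a few time points. The numbers and the helper names `logit` and `proportion` are illustrative assumptions.

```python
import math

def logit(p):
    """Log-odds ln(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def proportion(n1_0, n2_0, r1, r2, t):
    """Proportion of type 1 when both types grow exponentially."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)

# Assumed example values.
r1, r2 = 0.7, 0.6
n1_0, n2_0 = 10.0, 990.0
s = r1 - r2
L0 = logit(proportion(n1_0, n2_0, r1, r2, 0.0))

# Each row: time, log-odds computed from the populations, and L_p(0) + s*t.
for t in (0.0, 10.0, 20.0, 30.0):
    L_t = logit(proportion(n1_0, n2_0, r1, r2, t))
    print(t, L_t, L0 + s * t)
```

The second and third columns should coincide up to floating-point rounding, while the proportion itself follows an S-shaped curve over the same interval.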

Diffusion approximation

Insert math here.

Statistical analysis of relative growth rates

We have three strains, $i$, $j$ and $r$, where $r$ is a reference strain. Strains $i$ and $j$ have fitness $w_i = e^{r_i}$ and $w_j=e^{r_j}$. Define the selection coefficient $s_{ij} = \ln \frac{w_i}{w_j} = r_i - r_j$ as usual. We have data consisting of triples ($g$ = number of generations, $n_i$ = number of cells of type $i$, $n_r$ = number of cells of type $r$), which we reduce to pairs ($g$, $p_{ir} = n_i / n_r$).

What is the best estimate, and error, on $s_{ij}$?

Model

Assuming exponential growth, $\ln p_{ir} =$

Let $\Pr(s_{ij}=t) = \mathcal{N}(t;\mu_{ij}, \sigma^2_{ij})$.
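One possible point estimate, offered only as a sketch and not as the method these notes intend, uses the log-odds result derived earlier: under exponential growth the log ratio $\ln(n_i/n_r)$ changes linearly in $g$ with slope $s_{ir} = r_i - r_r$, and $s_{ij} = s_{ir} - s_{jr}$. The Python code below fits that slope by ordinary least squares on synthetic, noise-free data. The data values, the choice of least squares, and the function name `fit_log_ratio_slope` are all assumptions, and the sketch does not address the error on the estimate.

```python
import math

def fit_log_ratio_slope(generations, ratios):
    """Ordinary least-squares slope and intercept of ln(ratio) against generations."""
    y = [math.log(p) for p in ratios]
    n = len(generations)
    g_mean = sum(generations) / n
    y_mean = sum(y) / n
    sxx = sum((g - g_mean) ** 2 for g in generations)
    sxy = sum((g - g_mean) * (yi - y_mean) for g, yi in zip(generations, y))
    slope = sxy / sxx
    intercept = y_mean - slope * g_mean
    return slope, intercept

# Synthetic, noise-free data made up for illustration.
g = [0, 5, 10, 15, 20]
s_ir_true, s_jr_true = 0.05, 0.02
p_ir = [1.0 * math.exp(s_ir_true * t) for t in g]  # n_i / n_r over time
p_jr = [0.5 * math.exp(s_jr_true * t) for t in g]  # n_j / n_r over time

s_ir, _ = fit_log_ratio_slope(g, p_ir)
s_jr, _ = fit_log_ratio_slope(g, p_jr)
s_ij = s_ir - s_jr  # follows from s_ij = r_i - r_j
print(s_ir, s_jr, s_ij)
```

With real, noisy counts the fitted slopes would carry uncertainty, which is where a likelihood or Bayesian treatment would come in.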