Drummond:PopGen

From OpenWetWare
Latest revision as of 19:40, 28 March 2011

Introduction

Here I will treat some basic questions in population genetics. For personal reasons, I tend to include all the algebra.

Per-generation and instantaneous growth rates

What is the relationship between per-generation growth rates and the Malthusian parameter, the instantaneous rate of growth?

Let [math]\displaystyle{ n_i(t) }[/math] be the number of organisms of type [math]\displaystyle{ i }[/math] at time [math]\displaystyle{ t }[/math], and let [math]\displaystyle{ R }[/math] be the per-capita reproductive rate per generation. If [math]\displaystyle{ t }[/math] counts generations, then

[math]\displaystyle{ n_i(t+1) = n_i(t)R\! }[/math]
and
[math]\displaystyle{ n_i(t) = n_i(0)R^t.\! }[/math]

Now we wish to move to the case where [math]\displaystyle{ t }[/math] is continuous and real-valued. As before,

[math]\displaystyle{ n_i(t+1) = n_i(t)R\! }[/math]
but now
[math]\displaystyle{ n_i(t+\Delta t)\! }[/math] [math]\displaystyle{ =n_i(t)R^{\Delta t}\! }[/math]
[math]\displaystyle{ n_i(t+\Delta t) - n_i(t)\! }[/math] [math]\displaystyle{ = n_i(t)R^{\Delta t} - n_i(t)\! }[/math]
[math]\displaystyle{ \frac{n_i(t+\Delta t) - n_i(t)}{\Delta t} }[/math] [math]\displaystyle{ =\frac{n_i(t)R^{\Delta t} - n_i(t)}{\Delta t} }[/math]
[math]\displaystyle{ \frac{n_i(t+\Delta t) - n_i(t)}{\Delta t} }[/math] [math]\displaystyle{ =n_i(t) \frac{R^{\Delta t} - 1}{\Delta t} }[/math]
[math]\displaystyle{ \lim_{\Delta t \to 0} \left[{n_i(t+\Delta t) - n_i(t) \over \Delta t}\right] }[/math] [math]\displaystyle{ =\lim_{\Delta t \to 0} \left[ n_i(t) \frac{R^{\Delta t} - 1}{\Delta t}\right] }[/math]
[math]\displaystyle{ \frac{d n_i(t)}{dt} }[/math] [math]\displaystyle{ =n_i(t) \lim_{\Delta t \to 0} \left[\frac{R^{\Delta t} - 1}{\Delta t}\right] }[/math]
[math]\displaystyle{ \frac{d n_i(t)}{dt} }[/math] [math]\displaystyle{ =n_i(t) \ln R\! }[/math]

where the last simplification follows from L'Hôpital's rule. Explicitly, let [math]\displaystyle{ \epsilon=\Delta t }[/math]. Then

[math]\displaystyle{ \lim_{\Delta t \to 0} \left[{R^{\Delta t} - 1 \over \Delta t}\right] }[/math] [math]\displaystyle{ = \lim_{\epsilon \to 0} \left[\frac{R^{\epsilon} - 1}{\epsilon}\right] }[/math]
[math]\displaystyle{ =\lim_{\epsilon \to 0} \left[\frac{\frac{d}{d\epsilon}\left(R^{\epsilon} - 1\right)}{\frac{d}{d\epsilon}\epsilon}\right] }[/math]
[math]\displaystyle{ =\lim_{\epsilon \to 0} \left[\frac{R^{\epsilon}\ln R}{1}\right] }[/math]
[math]\displaystyle{ =\ln R \lim_{\epsilon \to 0} \left[R^{\epsilon}\right] }[/math]
[math]\displaystyle{ =\ln R\! }[/math]
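As a quick numerical sanity check of this limit, the difference quotient [math]\displaystyle{ (R^{\epsilon}-1)/\epsilon }[/math] can be evaluated for shrinking [math]\displaystyle{ \epsilon }[/math] and compared with [math]\displaystyle{ \ln R }[/math]. This is a sketch: the value R = 2 is purely illustrative and the helper name growth_difference_quotient is hypothetical.

```python
import math

def growth_difference_quotient(R: float, eps: float) -> float:
    """Return (R**eps - 1) / eps, whose limit as eps -> 0 is ln R."""
    return (R ** eps - 1.0) / eps

R = 2.0  # illustrative per-generation reproductive rate (doubling)
for eps in (1.0, 0.1, 0.01, 0.001):
    # The quotient approaches ln R as eps shrinks.
    print(f"eps={eps:<6} quotient={growth_difference_quotient(R, eps):.6f}")
print(f"ln R = {math.log(R):.6f}")
```

Because [math]\displaystyle{ R^{\epsilon} }[/math] is convex in [math]\displaystyle{ \epsilon }[/math], the quotient decreases toward [math]\displaystyle{ \ln R }[/math] as [math]\displaystyle{ \epsilon }[/math] shrinks.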

The solution to the equation

[math]\displaystyle{ \frac{d n_i(t)}{dt} = n_i(t) \ln R }[/math]
is
[math]\displaystyle{ n_i(t) = n_i(0) e^{t\ln R} = n_i(0) R^{t}.\! }[/math]
Note that the continuous case and the original discrete-generation case agree for all integer values of [math]\displaystyle{ t }[/math]. We can define the instantaneous growth rate [math]\displaystyle{ r = \ln R }[/math] for convenience.
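To see the agreement at integer [math]\displaystyle{ t }[/math] concretely, the sketch below evaluates both forms and asserts that they match at t = 0, ..., 4, while the continuous form also gives values between generations. The helper names and the values n(0) = 100, R = 1.5 are illustrative assumptions.

```python
import math

def n_discrete(n0: float, R: float, t: int) -> float:
    """Discrete generations: n(t) = n(0) * R**t."""
    return n0 * R ** t

def n_continuous(n0: float, r: float, t: float) -> float:
    """Continuous time: n(t) = n(0) * exp(r t), with r = ln R."""
    return n0 * math.exp(r * t)

n0, R = 100.0, 1.5   # illustrative values, not data
r = math.log(R)      # instantaneous growth rate
for t in range(5):
    # The two models agree (up to rounding) at every integer generation.
    assert math.isclose(n_discrete(n0, R, t), n_continuous(n0, r, t))

# Only the continuous model defines a value between generations.
print(n_continuous(n0, r, 2.5))
```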

Continuous rate of change

If two organisms grow at different rates, how do their proportions in the population change over time?

Let [math]\displaystyle{ r_1 }[/math] and [math]\displaystyle{ r_2 }[/math] be the instantaneous rates of increase of type 1 and type 2, respectively. Then

[math]\displaystyle{ {dn_i(t) \over dt} = r_i n_i(t). }[/math]
With the total population size
[math]\displaystyle{ n(t) = n_1(t) + n_2(t)\! }[/math]
we have the proportion of type 1
[math]\displaystyle{ p(t) = {n_1(t) \over n(t)} }[/math]
Define the fitness advantage
[math]\displaystyle{ s \equiv s_{12} = r_1 - r_2\! }[/math]
Given our interest in understanding the change in gene frequencies, our goal is to compute the rate of change of [math]\displaystyle{ p(t) }[/math].
[math]\displaystyle{ {\partial p(t) \over \partial t} }[/math] [math]\displaystyle{ = {\partial \over \partial t}\left({n_1(t) \over n(t)}\right) }[/math]
[math]\displaystyle{ = {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}{\partial n(t) \over \partial t} }[/math]
[math]\displaystyle{ = {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}\left({\partial n_1(t) \over \partial t} + {\partial n_2(t) \over \partial t}\right) }[/math]
[math]\displaystyle{ = {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + r_2 n_2(t)\right) }[/math]
[math]\displaystyle{ = {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + (r_1-s)(n(t)-n_1(t))\right) }[/math]
[math]\displaystyle{ = {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n(t) -s n(t) + s n_1(t)\right) }[/math]
[math]\displaystyle{ = {n_1(t) \over n(t)^2}\left(s n(t) - s n_1(t)\right) }[/math]
[math]\displaystyle{ = s{n_1(t) \over n(t)}\left(1 - {n_1(t) \over n(t)}\right) }[/math]
[math]\displaystyle{ = s p(t)(1-p(t))\! }[/math]
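The derivation can be checked numerically: build [math]\displaystyle{ p(t) }[/math] directly from the two exponentials, approximate its derivative with a central finite difference, and compare with [math]\displaystyle{ s p(1-p) }[/math]. This is a sketch with made-up starting counts and growth rates; proportion is a hypothetical helper.

```python
import math

def proportion(t, n1_0, n2_0, r1, r2):
    """p(t) = n1(t) / (n1(t) + n2(t)) with n_i(t) = n_i(0) exp(r_i t)."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)

n1_0, n2_0 = 10.0, 90.0   # illustrative starting counts
r1, r2 = 0.7, 0.5         # illustrative instantaneous growth rates
s = r1 - r2               # fitness advantage s_12

t, h = 3.0, 1e-5
# Central finite difference approximates dp/dt at time t.
dp_dt_numeric = (proportion(t + h, n1_0, n2_0, r1, r2)
                 - proportion(t - h, n1_0, n2_0, r1, r2)) / (2 * h)
p = proportion(t, n1_0, n2_0, r1, r2)
dp_dt_formula = s * p * (1 - p)
print(dp_dt_numeric, dp_dt_formula)
```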

This result says that the proportion of type 1, [math]\displaystyle{ p }[/math], changes most rapidly when [math]\displaystyle{ p=0.5 }[/math] and most slowly when [math]\displaystyle{ p }[/math] is very close to 0 or 1.
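Analytically, [math]\displaystyle{ \frac{d}{dp}\left[p(1-p)\right] = 1-2p }[/math], which vanishes at [math]\displaystyle{ p = 1/2 }[/math]. A brute-force scan over a grid of [math]\displaystyle{ p }[/math] values shows the same thing, and that the rate is zero at both boundaries. The selection coefficient s = 0.1 and the helper name rate_of_change are illustrative.

```python
def rate_of_change(p: float, s: float) -> float:
    """dp/dt = s p (1 - p), from the derivation above."""
    return s * p * (1 - p)

s = 0.1  # illustrative selection coefficient
grid = [i / 100 for i in range(101)]          # p = 0.00, 0.01, ..., 1.00
rates = [rate_of_change(p, s) for p in grid]
p_fastest = grid[rates.index(max(rates))]     # p at which p changes fastest
print(p_fastest)
```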

Evolution is linear on a log-odds scale

The logit function [math]\displaystyle{ \mathrm{logit} (p) = \ln {p \over 1-p} }[/math], which maps [math]\displaystyle{ p \in (0,1) }[/math] onto [math]\displaystyle{ \mathbb{R} }[/math], gives a more natural scale on which to follow changes in frequency. Rather than tracking the proportion of type 1 or 2, we instead track their log odds. In logit terms, with [math]\displaystyle{ L_p(t) \equiv \mathrm{logit} (p(t))\! }[/math],

[math]\displaystyle{ {\partial L_p(t) \over \partial t} }[/math] [math]\displaystyle{ = {\partial \over \partial t}\left(\ln {p(t) \over 1-p(t)}\right) }[/math]
[math]\displaystyle{ = {\partial \over \partial t}\left(\ln {n_1(t) \over n_2(t)}\right) }[/math]
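[math]\displaystyle{ = {\partial \over \partial t}\left(\ln {n_1(0) e^{r_1 t} \over n_2(0) e^{r_2 t}}\right) }[/math]
[math]\displaystyle{ = {\partial \over \partial t}\left(\ln {n_1(0) \over n_2(0)} + st\right) }[/math]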
[math]\displaystyle{ = s. \! }[/math]

This differential equation [math]\displaystyle{ L_p'(t) = s }[/math] has the solution

[math]\displaystyle{ L_p(t) = L_p(0) + st\! }[/math]

showing that the log-odds of finding type 1 changes linearly in time, increasing if [math]\displaystyle{ s\gt 0 }[/math] and decreasing if [math]\displaystyle{ s\lt 0 }[/math].
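A minimal check of this linearity, under the same exponential-growth assumption: compute the log-odds of type 1 at successive integer times and confirm that each step equals [math]\displaystyle{ s }[/math] and that the starting value is [math]\displaystyle{ \ln (n_1(0)/n_2(0)) }[/math]. The starting counts and rates are illustrative, and logit and proportion are hypothetical helper names.

```python
import math

def logit(p: float) -> float:
    """Log-odds ln(p / (1 - p)); defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def proportion(t, n1_0, n2_0, r1, r2):
    """Proportion of type 1 when both types grow exponentially."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)

n1_0, n2_0, r1, r2 = 1.0, 999.0, 0.9, 0.6   # illustrative: rare, fitter type 1
s = r1 - r2
L = [logit(proportion(t, n1_0, n2_0, r1, r2)) for t in range(10)]
steps = [b - a for a, b in zip(L, L[1:])]    # each step should equal s
print(steps)
```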

Diffusion approximation

Insert math here.

Statistical analysis of relative growth rates

We have three strains, [math]\displaystyle{ i }[/math], [math]\displaystyle{ j }[/math] and [math]\displaystyle{ r }[/math], where [math]\displaystyle{ r }[/math] is a reference strain. Strains [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] have fitness [math]\displaystyle{ w_i = e^{r_i} }[/math] and [math]\displaystyle{ w_j=e^{r_j} }[/math]. Define the selection coefficient [math]\displaystyle{ s_{ij} = \ln \frac{w_i}{w_j} = r_i - r_j }[/math] as usual. Our data consist of triples ([math]\displaystyle{ g= }[/math]number of generations, [math]\displaystyle{ n_i= }[/math]number of cells of type [math]\displaystyle{ i }[/math], [math]\displaystyle{ n_r= }[/math]number of cells of type [math]\displaystyle{ r }[/math]), which we reduce to pairs ([math]\displaystyle{ g }[/math], [math]\displaystyle{ p_{ir}= n_i/n_r }[/math]).

What is the best estimate of [math]\displaystyle{ s_{ij} }[/math], and what is its error?

Model

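Assuming exponential growth, [math]\displaystyle{ \ln p_{ir}(g) = \ln p_{ir}(0) + s_{ir} g }[/math], where [math]\displaystyle{ s_{ir} = r_i - r_r }[/math]; this is the log-odds result above with types [math]\displaystyle{ i }[/math] and [math]\displaystyle{ r }[/math] in place of types 1 and 2, and time counted in generations. Since [math]\displaystyle{ s_{ij} = (r_i - r_r) - (r_j - r_r) = s_{ir} - s_{jr} }[/math], competing [math]\displaystyle{ i }[/math] and [math]\displaystyle{ j }[/math] separately against the reference is enough to estimate [math]\displaystyle{ s_{ij} }[/math].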
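Because [math]\displaystyle{ \ln p_{ir} }[/math] should be a straight line in [math]\displaystyle{ g }[/math] whose slope is the selection coefficient of [math]\displaystyle{ i }[/math] relative to the reference, one way to estimate that slope from the [math]\displaystyle{ (g, p_{ir}) }[/math] pairs is ordinary least-squares regression of [math]\displaystyle{ \ln p_{ir} }[/math] on [math]\displaystyle{ g }[/math]. This sketch adds an assumption not in the text: the log ratios scatter around the line with independent Gaussian noise of constant variance, in which case the least-squares slope is also the maximum-likelihood slope. The function name fit_log_ratio_slope and the noise-free example data are hypothetical, not results.

```python
import math

def fit_log_ratio_slope(gens, ratios):
    """Least-squares fit of ln p_ir = a + s_ir * g.

    Returns (s_hat, se), where se is the usual OLS standard error of the
    slope. Assumes (hypothetically) additive Gaussian noise on ln p_ir and
    at least three points with distinct generation counts.
    """
    if len(gens) != len(ratios) or len(gens) < 3:
        raise ValueError("need >= 3 matched (g, p_ir) pairs")
    y = [math.log(p) for p in ratios]           # work on the log scale
    n = len(gens)
    g_bar = sum(gens) / n
    y_bar = sum(y) / n
    sxx = sum((g - g_bar) ** 2 for g in gens)
    sxy = sum((g - g_bar) * (yi - y_bar) for g, yi in zip(gens, y))
    s_hat = sxy / sxx                           # fitted slope = estimate of s_ir
    a_hat = y_bar - s_hat * g_bar               # fitted intercept = ln p_ir(0)
    rss = sum((yi - a_hat - s_hat * g) ** 2 for g, yi in zip(gens, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return s_hat, se

# Hypothetical, noise-free example: p_ir(0) = 0.5 and s_ir = 0.05 per generation.
gens = [0, 5, 10, 15, 20]
ratios = [0.5 * math.exp(0.05 * g) for g in gens]
s_hat, se = fit_log_ratio_slope(gens, ratios)
print(s_hat, se)
```

With real counts the standard error would be nonzero; here it is essentially zero because the example points lie exactly on the line.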

Model the selection coefficient as normally distributed, with density [math]\displaystyle{ \Pr(s_{ij}=x) = \mathcal{N}(x;\mu_{ij}, \sigma^2_{ij}) }[/math].
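If [math]\displaystyle{ s_{ir} }[/math] and [math]\displaystyle{ s_{jr} }[/math] are estimated separately against the reference, then [math]\displaystyle{ s_{ij} = (r_i - r_r) - (r_j - r_r) = s_{ir} - s_{jr} }[/math]. Assuming the two estimates are independent and normal (an assumption of this sketch, not stated in the text), [math]\displaystyle{ \mu_{ij} }[/math] is the difference of their means and [math]\displaystyle{ \sigma^2_{ij} }[/math] the sum of their variances. The sketch uses Python's statistics.NormalDist; combine_via_reference and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def combine_via_reference(mu_ir, se_ir, mu_jr, se_jr):
    """Normal distribution for s_ij = s_ir - s_jr.

    Assumes (hypothetically) independent, normally distributed estimates.
    """
    mu_ij = mu_ir - mu_jr                          # means subtract
    sigma_ij = math.sqrt(se_ir ** 2 + se_jr ** 2)  # variances add
    return NormalDist(mu_ij, sigma_ij)

# Hypothetical estimates, e.g. slopes from two separate competitions against r.
dist = combine_via_reference(0.05, 0.003, 0.02, 0.004)
z = NormalDist().inv_cdf(0.975)                    # about 1.96
interval = (dist.mean - z * dist.stdev, dist.mean + z * dist.stdev)
print(dist.mean, dist.stdev, interval)
```

The printed interval is a symmetric 95% interval for [math]\displaystyle{ s_{ij} }[/math] under the stated normal, independent-errors assumption.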

Maximum-likelihood approach

Add text.

Bayesian approach

Add text.