# Drummond:PopGen

### From OpenWetWare

Current revision: 21:40, 28 March 2011 (22 intermediate revisions not shown)

Line 1: | Line 1: | ||

{{Drummond_Top}} | {{Drummond_Top}} | ||

<div style="width: 750px"> | <div style="width: 750px"> | ||

+ | ==Introduction== | ||

+ | Here I will treat some basic questions in population genetics. For personal reasons, I tend to include all the algebra. | ||

+ | |||

==Per-generation and instantaneous growth rates== | ==Per-generation and instantaneous growth rates== | ||

<p> | <p> | ||

- | Let <math>n_i(t)</math> be the number of organisms of type <math>i</math> at time <math>t</math>, and let <math>R</math> be the ''per-capita reproductive rate | + | What is the relationship between per-generation growth rates and the Malthusian parameter, the instantaneous rate of growth? |

+ | </p> | ||

+ | <p> | ||

+ | Let <math>n_i(t)</math> be the number of organisms of type <math>i</math> at time <math>t</math>, and let <math>R</math> be the ''per-capita reproductive rate per generation''. If <math>t</math> counts generations, then | ||

:<math>n_i(t+1) = n_i(t)R\!</math> | :<math>n_i(t+1) = n_i(t)R\!</math> | ||

and | and | ||

Line 60: | Line 66: | ||

is | is | ||

:<math>n_i(t) = n_i(0) e^{t\ln R} = n_i(0) R^{t}.\!</math> | :<math>n_i(t) = n_i(0) e^{t\ln R} = n_i(0) R^{t}.\!</math> | ||

- | Note that the continuous case and the original discrete-generation case agree for all values of <math>t</math>. We can define the ''instantaneous rate | + | Note that the continuous case and the original discrete-generation case agree for all integer values of <math>t</math>. We can define the ''instantaneous growth rate'' <math>r = \ln R</math> for convenience. |

</p> | </p> | ||

+ | |||

+ | ==Continuous rate of change== | ||

+ | <p> | ||

+ | If two organisms grow at different rates, how do their proportions in the population change over time? | ||

+ | </p> | ||

+ | <p> | ||

+ | Let <math>r_1</math> and <math>r_2</math> be the instantaneous rates of increase of type 1 and type 2, respectively. Then | ||

+ | :<math>{dn_i(t) \over dt} = r_i n_i(t).</math> | ||

+ | With the total population size | ||

+ | :<math>n(t) = n_1(t) + n_2(t)\!</math> | ||

+ | we have the proportion of type 1 | ||

+ | :<math>p(t) = {n_1(t) \over n(t)}</math> | ||

+ | Define the fitness advantage | ||

+ | :<math>s \equiv s_{12} = r_1 - r_2\!</math> | ||

+ | |||

+ | Given our interest in understanding the change in gene frequencies, our goal is to compute the rate of change of <math>p(t)</math>. | ||

+ | :{| | ||

+ | |<math>{\partial p(t) \over \partial t}</math> | ||

+ | |<math>= {\partial \over \partial t}\left({n_1(t) \over n(t)}\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}{\partial n(t) \over \partial t}</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= {\partial n_1(t) \over \partial t}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}\left({\partial n_1(t) \over \partial t} + {\partial n_2(t) \over \partial t}\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + r_2 n_2(t)\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + (r_1-s)(n(t)-n_1(t))\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n(t) - s n(t) + s n_1(t)\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= {n_1(t) \over n(t)^2}\left(s n(t) - s n_1(t)\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= s{n_1(t) \over n(t)}\left(1 - {n_1(t) \over n(t)}\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= s p(t)(1-p(t))\!</math> | ||

+ | |} | ||

+ | This result says that the proportion of type 1, <math>p</math>, changes most rapidly when <math>p=0.5</math> and most slowly when <math>p</math> is very close to 0 or 1. | ||

+ | |||

+ | ==Evolution is linear on a log-odds scale== | ||

+ | The logit function <math>\mathrm{logit} (p) = \ln {p \over 1-p}</math>, which maps <math>p \in (0,1)</math> to <math>\mathbb{R}</math>, induces a more natural space for considering changes in frequencies. Rather than tracking the proportion of type 1 or 2, we instead track their log odds. In logit terms, with <math>L_p(t) \equiv \mathrm{logit} (p(t))\!</math>, | ||

+ | |||

+ | :{| | ||

+ | |<math>{\partial L_p(t) \over \partial t} </math> | ||

+ | |<math>= {\partial \over \partial t}\left(\ln {p(t) \over 1-p(t)}\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= {\partial \over \partial t}\left(\ln {n_1(t) \over n_2(t)}\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= {\partial \over \partial t}\ln\left({n_1(0) \over n_2(0)} e^{st}\right)</math> | ||

+ | |- | ||

+ | | | ||

+ | |<math>= s. \!</math> | ||

+ | |} | ||

+ | |||

+ | This differential equation <math>L_p'(t) = s</math> has the solution | ||

+ | |||

+ | :<math>L_p(t) = L_p(0) + st\!</math> | ||

+ | |||

+ | showing that the log-odds of finding type 1 changes linearly in time, increasing if <math>s>0</math> and decreasing if <math>s<0</math>. | ||

+ | |||

+ | ==Diffusion approximation== | ||

+ | Insert math here. | ||

+ | |||

+ | ==Statistical analysis of relative growth rates== | ||

+ | We have three strains, <math>i</math>, <math>j</math> and <math>r</math>, where <math>r</math> is a reference strain. | ||

+ | Strains <math>i</math> and <math>j</math> have fitness <math>w_i = e^{r_i}</math> and <math>w_j=e^{r_j}</math>. Define the selection coefficient <math>s_{ij} = \ln \frac{w_i}{w_j} = r_i - r_j</math> as usual. | ||

+ | We have data consisting of triples (<math>g=</math>number of generations, <math>n_i=</math>number of cells of type <math>i</math>, <math>n_r=</math>number of cells of type <math>r</math>). | ||

+ | We have data consisting of pairs (<math>g=</math>number of generations, <math>p_{ir}= n_i/n_r</math>) where <math>n_i</math>=number of cells of type <math>i</math> and <math>n_r=</math>number of cells of type <math>r</math>. | ||

+ | |||

+ | What is the best estimate, and error, on <math>s_{ij}</math>? | ||

+ | |||

+ | ===Model=== | ||

+ | Assuming exponential growth, <math>\ln p_{ir} = </math> | ||

+ | |||

+ | Let <math>\Pr(s_{ij}=t) = \mathcal{N}(t;\mu_{ij}, \sigma^2_{ij})</math>. | ||

+ | |||

+ | ===Maximum-likelihood approach=== | ||

+ | Add text. | ||

+ | |||

+ | ===Bayesian approach=== | ||

+ | Add text. |

## Current revision

## Introduction

Here I will treat some basic questions in population genetics. For personal reasons, I tend to include all the algebra.

## Per-generation and instantaneous growth rates

What is the relationship between per-generation growth rates and the Malthusian parameter, the instantaneous rate of growth?

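Let *n*_{i}(*t*) be the number of organisms of type *i* at time *t*, and let *R* be the *per-capita reproductive rate per generation*. If *t* counts generations, then

:<math>n_i(t+1) = n_i(t)R</math>

and

:<math>n_i(t) = n_i(0)R^{t}.</math>

Now we wish to move to the case where *t* is continuous and real-valued.
As before,

:<math>n_i(t+\Delta t) = n_i(t)R^{\Delta t}</math>

so that

:<math>\begin{align}
{dn_i(t) \over dt} &= \lim_{\Delta t \to 0} {n_i(t+\Delta t) - n_i(t) \over \Delta t} \\
&= \lim_{\Delta t \to 0} {n_i(t)R^{\Delta t} - n_i(t) \over \Delta t} \\
&= n_i(t) \lim_{\Delta t \to 0} {R^{\Delta t} - 1 \over \Delta t} \\
&= n_i(t) \ln R
\end{align}</math>

where the last simplification follows from L'Hôpital's rule. Explicitly, let ε = Δ*t*. Then

:<math>\lim_{\epsilon \to 0} {R^{\epsilon} - 1 \over \epsilon} = \lim_{\epsilon \to 0} {R^{\epsilon} \ln R \over 1} = \ln R.</math>

The solution to the equation

:<math>{dn_i(t) \over dt} = n_i(t) \ln R</math>

is

:<math>n_i(t) = n_i(0) e^{t\ln R} = n_i(0) R^{t}.</math>

Note that the continuous case and the original discrete-generation case agree for all integer values of *t*. We can define the *instantaneous growth rate* *r* = ln *R* for convenience.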
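As a quick numerical check of the relationship *r* = ln *R*, the sketch below compares generation-by-generation multiplication with the continuous solution. The function names and the values `n0 = 100` and `R = 2` are illustrative assumptions, not part of the original notes.

```python
import math

def discrete_growth(n0, R, generations):
    """Apply n(t+1) = n(t) * R once per generation."""
    n = n0
    for _ in range(generations):
        n *= R
    return n

def continuous_growth(n0, R, t):
    """Evaluate n(t) = n(0) * exp(r * t) with r = ln R, for any real t."""
    r = math.log(R)
    return n0 * math.exp(r * t)

# Illustrative values (assumptions): 100 founders, doubling each generation.
n0, R = 100.0, 2.0

# The two descriptions agree at every integer number of generations.
for t in range(6):
    assert math.isclose(discrete_growth(n0, R, t), continuous_growth(n0, R, t))

# Only the continuous form is defined between generations.
half_generation = continuous_growth(n0, R, 0.5)
```

The continuous form also gives a value between generations, where the discrete recursion says nothing; here `half_generation` equals `n0 * sqrt(R)`.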

## Continuous rate of change

If two organisms grow at different rates, how do their proportions in the population change over time?

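Let *r*_{1} and *r*_{2} be the instantaneous rates of increase of type 1 and type 2, respectively. Then

:<math>{dn_i(t) \over dt} = r_i n_i(t).</math>

With the total population size

:<math>n(t) = n_1(t) + n_2(t)</math>

we have the proportion of type 1

:<math>p(t) = {n_1(t) \over n(t)}.</math>

Define the fitness advantage

:<math>s \equiv s_{12} = r_1 - r_2.</math>

Given our interest in understanding the change in gene frequencies, our goal is to compute the rate of change of *p*(*t*).

:<math>\begin{align}
{dp(t) \over dt} &= {d \over dt}\left({n_1(t) \over n(t)}\right) \\
&= {dn_1(t) \over dt}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}{dn(t) \over dt} \\
&= {dn_1(t) \over dt}\left({1 \over n(t)}\right) + n_1(t){-1 \over n(t)^2}\left({dn_1(t) \over dt} + {dn_2(t) \over dt}\right) \\
&= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + r_2 n_2(t)\right) \\
&= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n_1(t) + (r_1-s)(n(t)-n_1(t))\right) \\
&= {r_1 n_1(t) \over n(t)} - {n_1(t) \over n(t)^2}\left(r_1 n(t) - s n(t) + s n_1(t)\right) \\
&= {n_1(t) \over n(t)^2}\left(s n(t) - s n_1(t)\right) \\
&= s{n_1(t) \over n(t)}\left(1 - {n_1(t) \over n(t)}\right) \\
&= s p(t)(1-p(t))
\end{align}</math>

The fifth line substitutes *r*_{2} = *r*_{1} − *s* and *n*_{2}(*t*) = *n*(*t*) − *n*_{1}(*t*).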

This result says that the proportion of type 1, *p*, changes most rapidly when *p* = 0.5 and most slowly when *p* is very close to 0 or 1.
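The logistic form d*p*/d*t* = *s* *p* (1 − *p*) can be checked numerically against the two exponentially growing populations it was derived from. The sketch below is an illustration under assumed values (*r*_{1} = 0.7, *r*_{2} = 0.6, an initial proportion of 1%); the function names and the choice of a fourth-order Runge-Kutta integrator are mine, not the original author's.

```python
import math

def proportion_from_counts(n1_0, n2_0, r1, r2, t):
    """p(t) = n_1(t) / n(t), with each type growing as n_i(0) * exp(r_i * t)."""
    n1 = n1_0 * math.exp(r1 * t)
    n2 = n2_0 * math.exp(r2 * t)
    return n1 / (n1 + n2)

def proportion_from_ode(p0, s, t, steps=10000):
    """Integrate dp/dt = s * p * (1 - p) from 0 to t with classical RK4."""
    def f(p):
        return s * p * (1.0 - p)
    h = t / steps
    p = p0
    for _ in range(steps):
        k1 = f(p)
        k2 = f(p + 0.5 * h * k1)
        k3 = f(p + 0.5 * h * k2)
        k4 = f(p + h * k3)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p

# Illustrative values (assumptions): type 1 starts at 1% with s = r1 - r2 = 0.1.
r1, r2 = 0.7, 0.6
p_counts = proportion_from_counts(10.0, 990.0, r1, r2, 50.0)
p_ode = proportion_from_ode(0.01, r1 - r2, 50.0)
```

Both routes give the same proportion to within integration error, and only the difference *s* = *r*_{1} − *r*_{2} enters the second one: the absolute growth rates drop out.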

## Evolution is linear on a log-odds scale

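The logit function <math>\mathrm{logit}(p) = \ln {p \over 1-p}</math>, which maps <math>p \in (0,1)</math> to <math>\mathbb{R}</math>, induces a more natural space for considering changes in frequencies. Rather than tracking the proportion of type 1 or 2, we instead track their log odds. In logit terms, with <math>L_p(t) \equiv \mathrm{logit}(p(t))</math>,

:<math>\begin{align}
{dL_p(t) \over dt} &= {d \over dt}\left(\ln {p(t) \over 1-p(t)}\right) \\
&= {d \over dt}\left(\ln {n_1(t) \over n_2(t)}\right) \\
&= {d \over dt}\ln\left({n_1(0) \over n_2(0)} e^{st}\right) \\
&= s
\end{align}</math>

where the third line uses <math>n_i(t) = n_i(0)e^{r_i t}</math> and <math>s = r_1 - r_2</math>.

This differential equation *L*_{p}'(*t*) = *s* has the solution

:<math>L_p(t) = L_p(0) + st</math>

showing that the log-odds of finding type 1 changes linearly in time, increasing if *s* > 0 and decreasing if *s* < 0.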
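Because the log-odds are linear in time, L_p(t) = L_p(0) + st, both the proportion at a given time and the time needed to reach a given proportion have closed forms. The following is a minimal sketch; the helper names and the example values (*s* = 0.1, a sweep from 1% to 99%) are assumptions chosen for illustration.

```python
import math

def logit(p):
    """Log-odds of a proportion p in the open interval (0, 1)."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Inverse of logit: maps a real log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))

def proportion_at(p0, s, t):
    """p(t) obtained from L_p(t) = L_p(0) + s * t."""
    return inv_logit(logit(p0) + s * t)

def time_to_reach(p0, p_target, s):
    """Time for the log-odds to move from logit(p0) to logit(p_target)."""
    return (logit(p_target) - logit(p0)) / s

# Illustrative values (assumptions): sweep from 1% to 99% with s = 0.1.
t_sweep = time_to_reach(0.01, 0.99, 0.1)
p_end = proportion_at(0.01, 0.1, t_sweep)
```

With these values the sweep time is 2 ln(99) / *s*, which makes the inverse dependence on *s* explicit: halving the fitness advantage doubles the time needed.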

## Diffusion approximation

Insert math here.

## Statistical analysis of relative growth rates

We have three strains, *i*, *j* and *r*, where *r* is a reference strain.
Strains *i* and *j* have fitness <math>w_i = e^{r_i}</math> and <math>w_j = e^{r_j}</math>, where the subscripted *r*_{i} is the instantaneous growth rate of strain *i* (not the reference strain *r*). Define the selection coefficient <math>s_{ij} = \ln \frac{w_i}{w_j} = r_i - r_j</math> as usual.
We have data consisting of triples (*g* = number of generations, *n*_{i} = number of cells of type *i*, *n*_{r} = number of cells of type *r*).
Each triple reduces to a pair (*g*, *p*_{ir}), where *p*_{ir} = *n*_{i} / *n*_{r} is the ratio of type *i* cells to reference cells.

What is the best estimate, and error, on *s*_{ij}?

### Model

Assuming exponential growth, ln*p*_{ir} =

Let <math>\Pr(s_{ij}=t) = \mathcal{N}(t;\mu_{ij}, \sigma^2_{ij})</math>.
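One way to make the estimation question concrete, offered as an assumption rather than as the model intended above: if both strains grow exponentially, the log ratio ln *p*_{ir} is linear in time, so its slope against *g* estimates the selection coefficient of strain *i* relative to the reference (in per-generation units). The sketch below fits that slope by ordinary least squares and reports its standard error. The function name `fit_log_ratio` is hypothetical, and the data are synthetic values invented purely for illustration.

```python
import math

def fit_log_ratio(gs, ratios):
    """Least-squares fit of ln(ratio) = intercept + slope * g.

    Returns (slope, intercept, standard error of the slope).
    Needs at least three points so the residual variance is defined.
    """
    ys = [math.log(p) for p in ratios]
    n = len(gs)
    g_mean = sum(gs) / n
    y_mean = sum(ys) / n
    sxx = sum((g - g_mean) ** 2 for g in gs)
    sxy = sum((g - g_mean) * (y - y_mean) for g, y in zip(gs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * g_mean
    rss = sum((y - (intercept + slope * g)) ** 2 for g, y in zip(gs, ys))
    slope_se = math.sqrt(rss / (n - 2) / sxx)
    return slope, intercept, slope_se

# Synthetic data (not measurements): true slope 0.05, small fixed perturbations.
gs = [0.0, 5.0, 10.0, 15.0, 20.0]
noise = [0.01, -0.02, 0.015, -0.01, 0.005]
ratios = [math.exp(0.1 + 0.05 * g + e) for g, e in zip(gs, noise)]

s_ir, log_ratio_0, s_ir_se = fit_log_ratio(gs, ratios)
```

Since *s*_{ij} = *r*_{i} − *r*_{j}, an estimate for the pair (*i*, *j*) follows by subtracting the fitted slope for strain *j* against the reference from the fitted slope for strain *i* against the same reference. If the two fits are independent, their variances add.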

### Maximum-likelihood approach

Add text.