User:Timothee Flutre/Notebook/Postdoc/2011/11/10
From OpenWetWare
(→Bayesian model of univariate linear regression for QTL detection: add info for binary phenotypes) 
m (→Bayesian model of univariate linear regression for QTL detection) 

(5 intermediate revisions not shown.)  
Line 290:  Line 290:  
There are many equivalent ways to write the likelihood, the usual one being:  There are many equivalent ways to write the likelihood, the usual one being:  
  <math>y_i \; \overset{i.i.d}{\sim} \; Bernoulli(p_i)  +  <math>y_i | p_i \; \overset{i.i.d}{\sim} \; \mathrm{Bernoulli}(p_i)</math> with the [http://en.wikipedia.org/wiki/Logodds log-odds] (logit function) being <math>\mathrm{ln} \frac{p_i}{1 - p_i} = \mu + a \, g_i + d \, \mathbf{1}_{g_i=1}</math> 
  +  Let's use <math>X_i^T=[1 \; g_i \; \mathbf{1}_{g_i=1}]</math> to denote the <math>i</math>th row of the design matrix <math>X</math>. We can also keep the same definition as above for <math>B=[\mu \; a \; d]^T</math>. Thus we have:  
<math>p_i = \frac{e^{X_i^TB}}{1 + e^{X_i^TB}}</math>  <math>p_i = \frac{e^{X_i^TB}}{1 + e^{X_i^TB}}</math>  
Line 298:  Line 298:  
As the <math>y_i</math>'s can only take <math>0</math> and <math>1</math> as values, the likelihood can be written as:  As the <math>y_i</math>'s can only take <math>0</math> and <math>1</math> as values, the likelihood can be written as:  
  <math>\mathcal{L}(B) = \prod_{i=1}^N p_i^{y_i} (1-p_i)^{1-y_i}</math>  +  <math>\mathcal{L}(B) = \mathsf{P}(Y | X, B) = \prod_{i=1}^N p_i^{y_i} (1-p_i)^{1-y_i}</math> 
  We still use the same prior as above for <math>B</math> (but there is no <math>\tau</math> anymore)  +  We still use the same prior as above for <math>B</math> (but there is no <math>\tau</math> anymore), so that: 
  <math>  +  <math>B | \Sigma_B \sim \mathcal{N}_3(0, \Sigma_B)</math> 
  +  where <math>\Sigma_B</math> is a 3 x 3 matrix with <math>[\sigma_\mu^2 \; \sigma_a^2 \; \sigma_d^2]</math> on the diagonal and 0 elsewhere.  
  +  As above, the Bayes factor is used to compare the two models:  
  <math>\mathsf{P} (YX) = \  +  <math>\mathrm{BF} = \frac{\mathsf{P}(Y | X, M1)}{\mathsf{P}(Y | X, M0)} = \frac{\mathsf{P}(Y | X, a \neq 0, d \neq 0)}{\mathsf{P}(Y | X, a=0, d=0)} = \frac{\int \mathsf{P}(B) \, \mathsf{P}(Y | X, B) \, \mathrm{d}B}{\int \mathsf{P}(\mu) \, \mathsf{P}(Y | X, \mu) \, \mathrm{d}\mu}</math> 
  +  The interesting point here is that there is no way to analytically calculate these integrals (marginal likelihoods). Therefore, we will use [http://en.wikipedia.org/wiki/Laplace_approximation Laplace's method] to approximate them, as in Guan & Stephens (2008).  
  +  Starting with the numerator:  
  <math>\mathsf{P} (YX) = \int \exp \left(  +  <math>\mathsf{P}(Y|X,M1) = \int \exp \left[ N \left( \frac{1}{N} \mathrm{ln} \, \mathsf{P}(B) + \frac{1}{N} \mathrm{ln} \, \mathsf{P}(Y | X, B) \right) \right] \mathsf{d}B</math> 
  +  <math>\mathsf{P}(Y|X,M1) = \int \exp \left\{ N \left[ \frac{1}{N} \mathrm{ln} \left( (2 \pi)^{-\frac{3}{2}} \, \frac{1}{\sigma_\mu \sigma_a \sigma_d} \, \exp\left( -\frac{1}{2} \left( \frac{\mu^2}{\sigma_\mu^2} + \frac{a^2}{\sigma_a^2} + \frac{d^2}{\sigma_d^2} \right) \right) \right) + \frac{1}{N} \sum_{i=1}^N \left( y_i \, \mathrm{ln} (p_i) + (1-y_i) \, \mathrm{ln} (1-p_i) \right) \right] \right\} \mathsf{d}B</math>  
  <math>  +  Let's use <math>f</math> to denote the function inside the exponential: 
  +  <math>\mathsf{P}(Y|X,M1) = \int \exp \left( N \; f(B) \right) \mathsf{d}B</math>  
  +  The function <math>f</math> is defined by:  
  <math>\  +  <math>f: \mathbb{R}^3 \rightarrow \mathbb{R}</math> 
  +  <math>f(B) = \frac{1}{N} \left( -\frac{3}{2} \mathrm{ln}(2 \pi) - \frac{1}{2} \mathrm{ln}|\Sigma_B| - \frac{1}{2} B^T \Sigma_B^{-1} B \right) + \frac{1}{N} \sum_{i=1}^N \left( y_i \, X_i^T B - \mathrm{ln}(1 + e^{X_i^TB}) \right)</math>  
  This  +  This function will then be used to approximate the integral, like this: 
  <math>\  +  <math>\mathsf{P}(Y|X,M1) \approx N^{-3/2} (2 \pi)^{3/2} |H(B^\star)|^{-1/2} e^{N f(B^\star)}</math> 
  <math>  +  where <math>H</math> is the [http://en.wikipedia.org/wiki/Hessian_matrix Hessian] of <math>f</math> and <math>B^\star = [\mu^\star a^\star d^\star]^T</math> is the point at which <math>f</math> is maximized. 
  <math>  +  We therefore need to find <math>B^\star</math>. As it maximizes <math>f</math>, we need to calculate the first derivatives of <math>f</math>. Let's do this the univariate way: 
  +  <math>\frac{\partial f}{\partial \beta} = - \frac{\beta}{N \, \sigma_\beta^2} + \frac{1}{N} \sum_{i=1}^N \left(\frac{y_i}{p_i} - \frac{1-y_i}{1-p_i} \right) \frac{\partial p_i}{\partial \beta}</math>  
  +  where <math>\beta</math> is <math>\mu</math>, <math>a</math> or <math>d</math>.  
  +  A simple form for the first derivatives of <math>p_i</math> also exists when writing <math>p_i = e^{X_i^TB} (1 + e^{X_i^TB})^{-1}</math>:  
  finding the maximums: iterative procedure or generic solver > to do  +  <math>\frac{\partial p_i}{\partial \beta} = \left[ e^{X_i^TB} (1 + e^{X_i^TB})^{-1} - e^{X_i^TB} \left( e^{X_i^TB} (1 + e^{X_i^TB})^{-2} \right) \right] \frac{\partial X_i^TB}{\partial \beta}</math> 
+  
+  <math>\frac{\partial p_i}{\partial \beta} = \left[ \frac{e^{X_i^TB} (1 + e^{X_i^TB}) - (e^{X_i^TB})^2}{(1 + e^{X_i^TB})^2} \right] \frac{\partial X_i^TB}{\partial \beta}</math>  
+  
+  <math>\frac{\partial p_i}{\partial \beta} = \left[ p_i (1 - p_i) \right] \frac{\partial X_i^TB}{\partial \beta}</math>  
+  
+  where <math>\frac{\partial X_i^TB}{\partial \beta}</math> is equal to <math>1, \, g_i, \, \mathbf{1}_{g_i=1}</math> when <math>\beta</math> corresponds respectively to <math>\mu, \, a, \, d</math>.  
+  
+  This simplifies the first derivatives of <math>f</math> into:  
+  
+  <math>\frac{\partial f}{\partial \beta} = - \frac{\beta}{N \, \sigma_\beta^2} + \frac{1}{N} \sum_{i=1}^N (y_i - p_i ) \frac{\partial X_i^TB}{\partial \beta}</math>  
+  
+  When setting <math>\frac{\partial f}{\partial \beta}(\beta^\star) = 0</math>, we observe that <math>\beta^\star</math> is present not only alone but also inside the sum, in the <math>p_i</math>'s: indeed <math>p_i</math> is a nonlinear function of <math>B</math>. This means that an iterative procedure is required, typically [http://en.wikipedia.org/wiki/Newton_method_in_optimization Newton's method].  
+  
+  To use it, we need the second derivatives of <math>f</math>:  
+  
+  <math>\frac{\partial^2 f}{\partial \beta^2} = - \frac{1}{N \, \sigma_\beta^2} + \frac{1}{N} \sum_{i=1}^N \left[ - p_i(1-p_i) \left( \frac{\partial X_i^TB}{\partial \beta} \right)^2 + (y_i-p_i)\frac{\partial^2 X_i^TB}{\partial \beta^2} \right]</math>  
+  
+  The second derivatives of <math>X_i^TB</math> are all equal to 0:  
+  
+  <math>\frac{\partial^2 f}{\partial \beta^2} = - \frac{1}{N \, \sigma_\beta^2} - \frac{1}{N} \sum_{i=1}^N p_i(1-p_i) \left( \frac{\partial X_i^TB}{\partial \beta} \right)^2</math>  
+  
+  Note that the second derivatives of <math>f</math> are strictly negative. More generally, the full matrix of second derivatives of <math>f</math>, <math>- \frac{1}{N} \left( \Sigma_B^{-1} + \sum_{i=1}^N p_i(1-p_i) X_i X_i^T \right)</math>, is negative definite. Therefore, <math>f</math> is strictly concave, which means that it has a unique global maximum, at <math>B^\star</math>. As a consequence, we can legitimately use Laplace's method to approximate the integral around its maximum.  
+  
+  finding the maximums: iterative procedure, update equations or generic solver > to do  
implementation: in R > to do  implementation: in R > to do 
Revision as of 17:41, 3 February 2013
Bayesian model of univariate linear regression for QTL detection

This page aims to help people like me, interested in quantitative genetics, to get a better understanding of some Bayesian models, most importantly the impact of the modeling assumptions as well as the underlying maths. It starts with a simple model, and gradually increases the scope to relax assumptions. See references to scientific articles at the end.
where β_{1} is in fact the additive effect of the SNP, noted a from now on, and β_{2} is the dominance effect of the SNP, d = ak. Let's now write the model in matrix notation:
This gives the following multivariate Normal distribution for the phenotypes:
Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the regression has a single response, Y. It is usual to keep the term "multivariate" for the case where there is a matrix of responses (i.e. multiple phenotypes). The likelihood of the parameters given the data is therefore:
A Gamma distribution for τ:
which means:
And a multivariate Normal distribution for B:
which means:
Let's neglect the normalization constant for now:
Similarly, let's keep only the terms in B for the moment:
We expand:
We factorize some terms:
Importantly, let's define:
We can see that Ω^{T} = Ω, which means that Ω is a symmetric matrix. This is particularly useful here because we can use the following equality: Ω^{ − 1}Ω^{T} = I.
It now becomes easy to factorize completely:
We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:
Similarly to the equations above:
But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:
Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on B!):
As we used a conjugate prior for τ, we know that we expect a Gamma distribution for the posterior. Therefore, we can take τ^{N / 2} out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:
We recognize the conditional posterior of B. This allows us to use the fact that the pdf of the Normal distribution integrates to one:
We finally recognize a Gamma distribution, allowing us to write the posterior as:
where
Here we recognize the formula to integrate the Gamma function:
And we now recognize a multivariate Student's t-distribution:
We hence can write:
We want to test the following null hypothesis:
In Bayesian modeling, hypothesis testing is performed with a Bayes factor, which in our case can be written as:
We can shorten this into:
Note that, compared to frequentist hypothesis testing, which focuses on the null, the Bayes factor requires explicitly modeling the data under the alternative. This makes a big difference when interpreting the results (see below). Let's start with the numerator:
First, let's calculate what is inside the integral:
Using the formula obtained previously and doing some algebra gives:
Now we can integrate out τ (note the small typo in equation 9 of supplementary text S1 of Servin & Stephens):
Inside the integral, we recognize the almost-complete pdf of a Gamma distribution. As it has to integrate to one, we get:
We can use this expression also under the null. In this case, as we need neither a nor d, B is simply μ, Σ_{B} is σ_{μ}^{2} and X is a vector of 1's. We can also define Ω_{0} = ((σ_{μ}^{2})^{ − 1} + N)^{ − 1}. In the end, this gives:
We can therefore write the Bayes factor:
When the Bayes factor is large, we say that there is enough evidence in the data to support the alternative. Indeed, the Bayesian testing procedure corresponds to measuring support for the specific alternative hypothesis compared to the null hypothesis. Importantly, note that, for a frequentist testing procedure, we would say that there is enough evidence in the data to reject the null. However we wouldn't say anything about the alternative as we don't model it. The threshold to say that a Bayes factor is large depends on the field. It is possible to use the Bayes factor as a test statistic when doing permutation testing, and then control the false discovery rate. This can give an idea of a reasonable threshold.
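The permutation idea mentioned above can be sketched as follows. This is only an illustration under stated assumptions: `perm.pvalue` and `stand.in.stat` are hypothetical names, and the squared correlation is a stand-in statistic used to keep the block self-contained. In practice one would pass a function returning the (log10) Bayes factor instead.

```r
## Hypothetical helper: permutation p-value for any test statistic
## stat.fun(G, Y) for which larger values mean more evidence.
perm.pvalue <- function(G, Y, stat.fun, nperm=1000){
  stopifnot(length(G) == length(Y))
  obs <- stat.fun(G, Y)
  ## permuting Y breaks any genotype-phenotype association
  perms <- replicate(nperm, stat.fun(G, sample(Y)))
  ## the "+1" terms count the observed statistic as one permutation
  (1 + sum(perms >= obs)) / (1 + nperm)
}

## stand-in statistic (assumption): replace with a Bayes factor function
stand.in.stat <- function(G, Y) cor(G, Y)^2

set.seed(1859) # arbitrary seed
G <- rbinom(n=200, size=2, prob=0.3)
Y <- 1 * G + rnorm(n=200)
pval <- perm.pvalue(G, Y, stand.in.stat, nperm=199)
```

When many SNPs are tested, the resulting p-values could then be adjusted, for instance with `p.adjust(pvals, method="BH")` from the stats package, to control the false discovery rate.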
Such a question is never easy to answer. But note that not all hyperparameters are equally important, especially in typical quantitative genetics applications. For instance, we are mostly interested in those that determine the magnitude of the effects, σ_{a} and σ_{d}, so let's deal with the others first. As explained in Servin & Stephens, the posteriors for τ and B change appropriately with shifts (y + c) and scaling (y × c) in the phenotype when taking their limits. This also gives us a new Bayes factor, the one used in practice (see Guan & Stephens, 2008):
Now, for the important hyperparameters, σ_{a} and σ_{d}, it is usual to specify a grid of values, i.e. M pairs (σ_{a},σ_{d}). For instance, Guan & Stephens used the following grid:
Then, we can average the Bayes factors obtained over the grid using, as a first approximation, equal weights:
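As a minimal sketch of this averaging step, assuming the Bayes factors are available on the log10 scale (as returned by default by the `BF` function of this page), the mean can be computed without overflow by factoring out the largest term. The helper name `log10.weighted.mean.BF` and the input values are made up for illustration.

```r
## Hypothetical helper: log10 of the weighted average of M Bayes factors
## given on the log10 scale; weights default to 1/M (equal weights).
log10.weighted.mean.BF <- function(log10.BFs, weights=NULL){
  M <- length(log10.BFs)
  if(is.null(weights))
    weights <- rep(1/M, M)
  stopifnot(length(weights) == M, abs(sum(weights) - 1) < 1e-8)
  ## factor out the largest term so that 10^x cannot overflow
  max.l10 <- max(log10.BFs)
  max.l10 + log10(sum(weights * 10^(log10.BFs - max.l10)))
}

log10.BFs <- c(1.2, 3.5, 2.8) # made-up values, one per grid point
log10.avg <- log10.weighted.mean.BF(log10.BFs)
```

Weights estimated from a hierarchical model could be passed through the `weights` argument instead of the default.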
In eQTL studies, the weights can be estimated from the data using a hierarchical model (see below), by pooling all genes together as in Veyrieras et al (PLoS Genetics, 2010).
```r
## Bayes factor (Servin & Stephens) for one SNP, in the limit of
## uninformative priors on mu and tau
BF <- function(G=NULL, Y=NULL, sigma.a=NULL, sigma.d=NULL, get.log10=TRUE){
  stopifnot(! is.null(G), ! is.null(Y), ! is.null(sigma.a), ! is.null(sigma.d))
  stopifnot(length(Y) == length(G)) # check lengths before subsetting
  subset <- complete.cases(Y) & complete.cases(G) # drop missing data
  Y <- Y[subset]
  G <- G[subset]
  N <- length(G)
  X <- cbind(rep(1,N), G, G == 1) # columns: intercept, additive, dominance
  inv.Sigma.B <- diag(c(0, 1/sigma.a^2, 1/sigma.d^2))
  inv.Omega <- inv.Sigma.B + t(X) %*% X
  inv.Omega0 <- N
  tY.Y <- t(Y) %*% Y
  log10.BF <- as.numeric(0.5 * log10(inv.Omega0) -
                         0.5 * log10(det(inv.Omega)) -
                         log10(sigma.a) - log10(sigma.d) -
                         (N/2) * (log10(tY.Y - t(Y) %*% X %*% solve(inv.Omega) %*% t(X) %*% cbind(Y)) -
                                  log10(tY.Y - N*mean(Y)^2)))
  if(get.log10)
    return(log10.BF)
  else
    return(10^log10.BF)
}
```

In the same vein as what is explained here, we can simulate data under different scenarios and check the BFs:

```r
N <- 300   # play with it
PVE <- 0.1 # play with it
grid <- c(0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2)
MAF <- 0.3
G <- rbinom(n=N, size=2, prob=MAF)
tau <- 1
a <- sqrt((2/5) * (PVE / (tau * MAF * (1-MAF) * (1-PVE))))
d <- a / 2
mu <- rnorm(n=1, mean=0, sd=10)
## tau is a precision, hence the error standard deviation is 1/sqrt(tau)
Y <- mu + a * G + d * (G == 1) + rnorm(n=N, mean=0, sd=1/sqrt(tau))
for(m in 1:length(grid))
  print(BF(G, Y, grid[m], grid[m]/4))
```
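There are many equivalent ways to write the likelihood, the usual one being:

<math>y_i | p_i \; \overset{i.i.d}{\sim} \; \mathrm{Bernoulli}(p_i)</math>

with the log-odds (logit function) being:

<math>\mathrm{ln} \frac{p_i}{1 - p_i} = \mu + a \, g_i + d \, \mathbf{1}_{g_i=1}</math>

Let's use <math>X_i^T=[1 \; g_i \; \mathbf{1}_{g_i=1}]</math> to denote the ith row of the design matrix X. We can also keep the same definition as above for <math>B=[\mu \; a \; d]^T</math>. Thus we have:

<math>p_i = \frac{e^{X_i^TB}}{1 + e^{X_i^TB}}</math>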
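To make this model concrete, here is a small simulation sketch of binary phenotypes. The seed, sample size and the values chosen for μ, a and d are arbitrary assumptions, not taken from the notebook.

```r
set.seed(1859)                        # arbitrary seed, for reproducibility only
N <- 500
MAF <- 0.3
G <- rbinom(n=N, size=2, prob=MAF)    # genotypes coded 0, 1, 2
X <- cbind(1, G, as.numeric(G == 1))  # rows are X_i^T = [1, g_i, 1_{g_i=1}]
B <- c(-0.5, 0.8, 0.2)                # hypothetical values for mu, a, d
p <- as.vector(plogis(X %*% B))       # p_i = e^{X_i^T B} / (1 + e^{X_i^T B})
Y <- rbinom(n=N, size=1, prob=p)      # y_i | p_i ~ Bernoulli(p_i)
```

Here `plogis` is the logistic function from the stats package, so that `log(p / (1 - p))` recovers the linear predictor `X %*% B`.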
As the y_{i}'s can only take 0 and 1 as values, the likelihood can be written as:

<math>\mathcal{L}(B) = \mathsf{P}(Y | X, B) = \prod_{i=1}^N p_i^{y_i} (1-p_i)^{1-y_i}</math>
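A possible way to code the logarithm of this likelihood, shown only as a sketch: the function name `loglik.logistic` and the tiny data set are made up. It uses the identity y_i ln(p_i) + (1 - y_i) ln(1 - p_i) = y_i X_i^T B - ln(1 + e^{X_i^T B}), and compares the result with `dbinom`.

```r
## Hypothetical helper: log-likelihood of the logistic model
loglik.logistic <- function(B, X, Y){
  eta <- as.vector(X %*% B)       # linear predictors X_i^T B
  sum(Y * eta - log1p(exp(eta)))  # sum_i [ y_i X_i^T B - ln(1 + e^{X_i^T B}) ]
}

## tiny made-up data set, for illustration only
G <- c(0, 1, 2, 1, 0, 2)
Y <- c(0, 1, 1, 0, 0, 1)
X <- cbind(1, G, as.numeric(G == 1))
B <- c(-0.5, 0.8, 0.2)            # hypothetical values for mu, a, d

ll <- loglik.logistic(B, X, Y)
## same quantity via the Bernoulli pmf
ll.check <- sum(dbinom(Y, size=1, prob=as.vector(plogis(X %*% B)), log=TRUE))
```

Note that `exp(eta)` overflows for very large linear predictors, so a more careful version would be needed in extreme cases.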
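We still use the same prior as above for B (but there is no τ anymore), so that:

<math>B | \Sigma_B \sim \mathcal{N}_3(0, \Sigma_B)</math>

where Σ_{B} is a 3 x 3 matrix with <math>[\sigma_\mu^2 \; \sigma_a^2 \; \sigma_d^2]</math> on the diagonal and 0 elsewhere. As above, the Bayes factor is used to compare the two models:

<math>\mathrm{BF} = \frac{\mathsf{P}(Y | X, M1)}{\mathsf{P}(Y | X, M0)} = \frac{\mathsf{P}(Y | X, a \neq 0, d \neq 0)}{\mathsf{P}(Y | X, a=0, d=0)} = \frac{\int \mathsf{P}(B) \, \mathsf{P}(Y | X, B) \, \mathrm{d}B}{\int \mathsf{P}(\mu) \, \mathsf{P}(Y | X, \mu) \, \mathrm{d}\mu}</math>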
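The interesting point here is that there is no way to analytically calculate these integrals (marginal likelihoods). Therefore, we will use Laplace's method to approximate them, as in Guan & Stephens (2008). Starting with the numerator:

<math>\mathsf{P}(Y|X,M1) = \int \exp \left[ N \left( \frac{1}{N} \mathrm{ln} \, \mathsf{P}(B) + \frac{1}{N} \mathrm{ln} \, \mathsf{P}(Y | X, B) \right) \right] \mathsf{d}B</math>

<math>\mathsf{P}(Y|X,M1) = \int \exp \left\{ N \left[ \frac{1}{N} \mathrm{ln} \left( (2 \pi)^{-\frac{3}{2}} \, \frac{1}{\sigma_\mu \sigma_a \sigma_d} \, \exp\left( -\frac{1}{2} \left( \frac{\mu^2}{\sigma_\mu^2} + \frac{a^2}{\sigma_a^2} + \frac{d^2}{\sigma_d^2} \right) \right) \right) + \frac{1}{N} \sum_{i=1}^N \left( y_i \, \mathrm{ln} (p_i) + (1-y_i) \, \mathrm{ln} (1-p_i) \right) \right] \right\} \mathsf{d}B</math>

Let's use f to denote the function inside the exponential:

<math>\mathsf{P}(Y|X,M1) = \int \exp \left( N \; f(B) \right) \mathsf{d}B</math>

The function f is defined by:

<math>f: \mathbb{R}^3 \rightarrow \mathbb{R}</math>

<math>f(B) = \frac{1}{N} \left( -\frac{3}{2} \mathrm{ln}(2 \pi) - \frac{1}{2} \mathrm{ln}|\Sigma_B| - \frac{1}{2} B^T \Sigma_B^{-1} B \right) + \frac{1}{N} \sum_{i=1}^N \left( y_i \, X_i^T B - \mathrm{ln}(1 + e^{X_i^TB}) \right)</math>

This function will then be used to approximate the integral, like this:

<math>\mathsf{P}(Y|X,M1) \approx N^{-3/2} (2 \pi)^{3/2} |H(B^\star)|^{-1/2} e^{N f(B^\star)}</math>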
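where H is the Hessian of f and <math>B^\star = [\mu^\star \; a^\star \; d^\star]^T</math> is the point at which f is maximized. We therefore need to find <math>B^\star</math>. As it maximizes f, we need to calculate the first derivatives of f. Let's do this the univariate way:

<math>\frac{\partial f}{\partial \beta} = - \frac{\beta}{N \, \sigma_\beta^2} + \frac{1}{N} \sum_{i=1}^N \left(\frac{y_i}{p_i} - \frac{1-y_i}{1-p_i} \right) \frac{\partial p_i}{\partial \beta}</math>

where β is μ, a or d. A simple form for the first derivatives of p_{i} also exists when writing <math>p_i = e^{X_i^TB} (1 + e^{X_i^TB})^{-1}</math>:

<math>\frac{\partial p_i}{\partial \beta} = \left[ e^{X_i^TB} (1 + e^{X_i^TB})^{-1} - e^{X_i^TB} \left( e^{X_i^TB} (1 + e^{X_i^TB})^{-2} \right) \right] \frac{\partial X_i^TB}{\partial \beta}</math>

<math>\frac{\partial p_i}{\partial \beta} = \left[ \frac{e^{X_i^TB} (1 + e^{X_i^TB}) - (e^{X_i^TB})^2}{(1 + e^{X_i^TB})^2} \right] \frac{\partial X_i^TB}{\partial \beta}</math>

<math>\frac{\partial p_i}{\partial \beta} = \left[ p_i (1 - p_i) \right] \frac{\partial X_i^TB}{\partial \beta}</math>

where <math>\frac{\partial X_i^TB}{\partial \beta}</math> is equal to <math>1, \, g_i, \, \mathbf{1}_{g_i=1}</math> when β corresponds respectively to <math>\mu, \, a, \, d</math>. This simplifies the first derivatives of f into:

<math>\frac{\partial f}{\partial \beta} = - \frac{\beta}{N \, \sigma_\beta^2} + \frac{1}{N} \sum_{i=1}^N (y_i - p_i ) \frac{\partial X_i^TB}{\partial \beta}</math>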
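The function f and its first derivatives can be sketched in R and checked against finite differences. The function names, the data and the prior standard deviations in `sigmas` (in particular a finite σ_μ of 10) are assumptions made for the sake of the example.

```r
## Hypothetical helpers: f(B) and its gradient for the logistic model with
## prior B ~ N(0, diag(sigmas^2)), where sigmas = (sigma.mu, sigma.a, sigma.d)
f.logistic <- function(B, X, Y, sigmas){
  N <- nrow(X)
  eta <- as.vector(X %*% B)
  log.prior <- - (length(B)/2) * log(2 * pi) - sum(log(sigmas)) -
    0.5 * sum(B^2 / sigmas^2)
  (log.prior + sum(Y * eta - log1p(exp(eta)))) / N
}
grad.f.logistic <- function(B, X, Y, sigmas){
  N <- nrow(X)
  p <- as.vector(plogis(X %*% B))
  ## - beta / (N sigma_beta^2) + (1/N) sum_i (y_i - p_i) dX_i^T B / dbeta
  as.vector(- B / sigmas^2 + t(X) %*% (Y - p)) / N
}

## made-up data and parameter values
set.seed(1859)
G <- rbinom(n=100, size=2, prob=0.3)
X <- cbind(1, G, as.numeric(G == 1))
Y <- rbinom(n=100, size=1, prob=0.4)
sigmas <- c(10, 0.4, 0.1)
B <- c(-0.3, 0.5, 0.1)

## central finite differences, one coordinate at a time
num.grad <- sapply(seq_along(B), function(k){
  h <- 1e-6
  e <- replace(rep(0, length(B)), k, h)
  (f.logistic(B + e, X, Y, sigmas) - f.logistic(B - e, X, Y, sigmas)) / (2 * h)
})
ana.grad <- grad.f.logistic(B, X, Y, sigmas)
```

If the derivation is right, `num.grad` and `ana.grad` agree up to the finite-difference error.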
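When setting <math>\frac{\partial f}{\partial \beta}(\beta^\star) = 0</math>, we observe that <math>\beta^\star</math> is present not only alone but also inside the sum, in the p_{i}'s: indeed p_{i} is a nonlinear function of B. This means that an iterative procedure is required, typically Newton's method. To use it, we need the second derivatives of f:

<math>\frac{\partial^2 f}{\partial \beta^2} = - \frac{1}{N \, \sigma_\beta^2} + \frac{1}{N} \sum_{i=1}^N \left[ - p_i(1-p_i) \left( \frac{\partial X_i^TB}{\partial \beta} \right)^2 + (y_i-p_i)\frac{\partial^2 X_i^TB}{\partial \beta^2} \right]</math>

The second derivatives of <math>X_i^TB</math> are all equal to 0:

<math>\frac{\partial^2 f}{\partial \beta^2} = - \frac{1}{N \, \sigma_\beta^2} - \frac{1}{N} \sum_{i=1}^N p_i(1-p_i) \left( \frac{\partial X_i^TB}{\partial \beta} \right)^2</math>

Note that the second derivatives of f are strictly negative. More generally, the full matrix of second derivatives of f, <math>- \frac{1}{N} \left( \Sigma_B^{-1} + \sum_{i=1}^N p_i(1-p_i) X_i X_i^T \right)</math>, is negative definite. Therefore, f is strictly concave, which means that it has a unique global maximum, at <math>B^\star</math>. As a consequence, we can legitimately use Laplace's method to approximate the integral around its maximum.

finding the maximum: iterative procedure, update equations or generic solver > to do

implementation: in R > to do

finding the effect sizes and their std error: to do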
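Until these items are filled in, here is one possible sketch of the whole procedure. It is not the author's implementation: it uses plain Newton iterations without step-size control, and the name `laplace.log.marglik`, the simulated data and the prior standard deviations (notably a finite σ_μ of 10) are assumptions. Working with N f rather than f absorbs the powers of N in the approximation.

```r
## Hypothetical helper: log of the Laplace approximation to the marginal
## likelihood of a logistic model with prior B ~ N(0, diag(sigmas^2))
laplace.log.marglik <- function(X, Y, sigmas, max.iter=100, tol=1e-10){
  K <- ncol(X)
  inv.Sigma.B <- diag(1 / sigmas^2, nrow=K)
  neg.hessian <- function(B){ # minus the Hessian of N * f at B
    p <- as.vector(plogis(X %*% B))
    inv.Sigma.B + t(X) %*% (X * (p * (1 - p)))
  }
  B <- rep(0, K)
  for(iter in 1:max.iter){
    p <- as.vector(plogis(X %*% B))
    grad <- as.vector(t(X) %*% (Y - p) - inv.Sigma.B %*% B) # gradient of N * f
    step <- solve(neg.hessian(B), grad)
    B <- B + step # Newton update: B - H^{-1} grad, with H = - neg.hessian
    if(max(abs(step)) < tol)
      break
  }
  eta <- as.vector(X %*% B)
  N.f <- - (K/2) * log(2 * pi) - sum(log(sigmas)) -
    0.5 * sum(B^2 / sigmas^2) + sum(Y * eta - log1p(exp(eta)))
  ## ln[(2 pi)^{K/2} |det(-Hessian of N*f)|^{-1/2} e^{N f(B*)}]
  log.det <- as.numeric(determinant(neg.hessian(B), logarithm=TRUE)$modulus)
  list(B.star=B, log.marglik=(K/2) * log(2 * pi) - 0.5 * log.det + N.f)
}

## simulated data with made-up parameter values
set.seed(1859)
N <- 500
G <- rbinom(n=N, size=2, prob=0.3)
X <- cbind(1, G, as.numeric(G == 1))
Y <- rbinom(n=N, size=1, prob=as.vector(plogis(X %*% c(-0.5, 0.8, 0.2))))
sigmas <- c(10, 0.4, 0.1) # hypothetical sigma.mu, sigma.a, sigma.d

fit1 <- laplace.log.marglik(X, Y, sigmas)                      # model M1
fit0 <- laplace.log.marglik(X[, 1, drop=FALSE], Y, sigmas[1])  # model M0
log10.BF <- (fit1$log.marglik - fit0$log.marglik) / log(10)
```

A generic optimizer such as `optim` could replace the Newton loop. The inverse of the negative Hessian at `B.star` would also give approximate posterior variances for the effect sizes.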
to do
