User:Timothee Flutre/Notebook/Postdoc/2011/11/10
From OpenWetWare
(→Bayesian model of univariate linear regression for QTL detection: fix typo sign lambda*) |
(→Bayesian model of univariate linear regression for QTL detection: start adding formula for BF) |
| Line 35: | Line 35: | ||
<math>Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)</math> | <math>Y | X, \tau, B \sim \mathcal{N}(XB, \tau^{-1} I_N)</math> | ||
| - | Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the | + | Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the regression has a single response, <math>Y</math>. |
| + | It is usual to keep the term "multivariate" for the case where there is a matrix of responses (i.e. multiple phenotypes). | ||
The likelihood of the parameters given the data is therefore: | The likelihood of the parameters given the data is therefore: | ||
| Line 164: | Line 165: | ||
| - | * '''Bayes Factor''': to do | + | * '''Bayes Factor''': one way to answer our goal above ("is there an effect of the genotype on the phenotype?") is to do [http://en.wikipedia.org/wiki/Hypothesis_testing hypothesis testing]. |
| + | We want to test the following [http://en.wikipedia.org/wiki/Null_hypothesis null hypothesis]: | ||
| + | <math>H_0: \; a = d = 0</math> | ||
| - | * ''' | + | In Bayesian modeling, hypothesis testing is performed with a [http://en.wikipedia.org/wiki/Bayes_factor Bayes factor], which in our case can be written as: |
| + | |||
| + | <math>BF = \frac{\mathsf{P}(Y | X, a \neq 0, d \neq 0)}{\mathsf{P}(Y | X, a = 0, d = 0)}</math> | ||
| + | |||
| + | We can shorten this into: | ||
| + | |||
| + | <math>BF = \frac{\mathsf{P}(Y | X)}{\mathsf{P}_0(Y)}</math> | ||
| + | |||
| + | Note that, compared to frequentist hypothesis testing, which focuses on the null, the Bayes factor requires explicitly modeling the data under the alternative. | ||
| + | |||
| + | Let's start with the numerator: | ||
| + | |||
| + | <math>\mathsf{P}(Y | X) = \int \mathsf{P}(\tau) \mathsf{P}(Y | X, \tau) \mathsf{d}\tau</math> | ||
| + | |||
| + | First, let's calculate what is inside the integral: | ||
| + | |||
| + | <math>\mathsf{P}(Y | X, \tau) = \frac{\mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)}{\mathsf{P}(B | Y, X, \tau)}</math> | ||
| + | |||
| + | Using the formula obtained previously and doing some algebra gives: | ||
| + | |||
| + | <math>\mathsf{P}(Y | X, \tau) = \left( \frac{\tau}{2 \pi} \right)^{\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \exp\left( -\frac{\tau}{2} (Y^TY - Y^TX\Omega X^TY) \right)</math> | ||
| + | |||
| + | Now we can integrate out <math>\tau</math> (note the small typo in equation 9 of supplementary text S1 of Servin & Stephens): | ||
| + | |||
| + | <math>\mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \frac{\left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}}}{\Gamma(\frac{\kappa}{2})} \int \tau^{\frac{N+\kappa}{2}-1} \exp \left( -\frac{\tau}{2} (Y^TY - Y^TX\Omega X^TY + \lambda) \right) \mathsf{d}\tau</math> | ||
| + | |||
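| + | Inside the integral, we recognize the kernel of a Gamma distribution with shape <math>\frac{N+\kappa}{2}</math> and rate <math>\frac{1}{2}(Y^TY - Y^TX\Omega X^TY + \lambda)</math>. | ||
| + | As the full pdf has to integrate to one, the integral equals the inverse of its normalization constant, and we get: | ||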
| + | |||
| + | <math>\mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}} \frac{\Gamma(\frac{N+\kappa}{2})}{\Gamma(\frac{\kappa}{2})} \left( \frac{Y^TY - Y^TX\Omega X^TY + \lambda}{2} \right)^{-\frac{N+\kappa}{2}}</math> | ||
| + | |||
| + | |||
| + | * '''Choosing the hyperparameters''': | ||
invariance properties motivate the use of limits for some "unimportant" hyperparameters | invariance properties motivate the use of limits for some "unimportant" hyperparameters | ||
| Line 174: | Line 209: | ||
| - | * '''R code''': to do | + | * '''R code''': |
| + | |||
| + | to do | ||
<!-- ##### DO NOT edit below this line unless you know what you are doing. ##### --> | <!-- ##### DO NOT edit below this line unless you know what you are doing. ##### --> | ||
Revision as of 15:09, 22 November 2012
Bayesian model of univariate linear regression for QTL detection

See Servin & Stephens (PLoS Genetics, 2007).
where β1 is the additive effect of the SNP, denoted a from now on, and β2 is the dominance effect of the SNP, d = ak. Let's now write the model in matrix notation:
This gives the following multivariate Normal distribution for the phenotypes:
Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the regression has a single response, Y. It is usual to keep the term "multivariate" for the case where there is a matrix of responses (i.e. multiple phenotypes). The likelihood of the parameters given the data is therefore:
A Gamma distribution for τ:
which means:
And a multivariate Normal distribution for B:
which means:
Let's neglect the normalization constant for now:
Similarly, let's keep only the terms in B for the moment:
We expand:
We factorize some terms:
Importantly, let's define:
We can see that <math>\Omega^T = \Omega</math>, which means that <math>\Omega</math> is a symmetric matrix. This is particularly useful here because we can use the following equality: <math>\Omega^{-1}\Omega^T = I</math>.
It now becomes easy to factorize completely:
We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:
Similarly to the equations above:
But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:
Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on B!):
As we used a conjugate prior for <math>\tau</math>, we know that we expect a Gamma distribution for the posterior. Therefore, we can take <math>\tau^{N/2}</math> out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:
We recognize the conditional posterior of B. This allows us to use the fact that the pdf of the Normal distribution integrates to one:
We finally recognize a Gamma distribution, allowing us to write the posterior as:
where
Here we recognize the formula to integrate the Gamma function:
And we now recognize a multivariate Student's t-distribution:
Hence we can write:
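Bayes Factor: one way to answer our goal above ("is there an effect of the genotype on the phenotype?") is to do hypothesis testing.

We want to test the following null hypothesis:

<math>H_0: \; a = d = 0</math>

In Bayesian modeling, hypothesis testing is performed with a Bayes factor, which in our case can be written as:

<math>BF = \frac{\mathsf{P}(Y | X, a \neq 0, d \neq 0)}{\mathsf{P}(Y | X, a = 0, d = 0)}</math>

We can shorten this into:

<math>BF = \frac{\mathsf{P}(Y | X)}{\mathsf{P}_0(Y)}</math>

Note that, compared to frequentist hypothesis testing, which focuses on the null, the Bayes factor requires explicitly modeling the data under the alternative.

Let's start with the numerator:

<math>\mathsf{P}(Y | X) = \int \mathsf{P}(\tau) \mathsf{P}(Y | X, \tau) \mathsf{d}\tau</math>

First, let's calculate what is inside the integral:

<math>\mathsf{P}(Y | X, \tau) = \frac{\mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)}{\mathsf{P}(B | Y, X, \tau)}</math>

Using the formula obtained previously and doing some algebra gives:

<math>\mathsf{P}(Y | X, \tau) = \left( \frac{\tau}{2 \pi} \right)^{\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \exp\left( -\frac{\tau}{2} (Y^TY - Y^TX\Omega X^TY) \right)</math>

Now we can integrate out <math>\tau</math> (note the small typo in equation 9 of supplementary text S1 of Servin & Stephens):

<math>\mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \frac{\left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}}}{\Gamma(\frac{\kappa}{2})} \int \tau^{\frac{N+\kappa}{2}-1} \exp \left( -\frac{\tau}{2} (Y^TY - Y^TX\Omega X^TY + \lambda) \right) \mathsf{d}\tau</math>

Inside the integral, we recognize the kernel of a Gamma distribution with shape <math>\frac{N+\kappa}{2}</math> and rate <math>\frac{1}{2}(Y^TY - Y^TX\Omega X^TY + \lambda)</math>. As the full pdf has to integrate to one, the integral equals the inverse of its normalization constant, and we get:

<math>\mathsf{P}(Y | X) = (2\pi)^{-\frac{N}{2}} \left( \frac{|\Omega|}{|\Sigma_B|} \right)^{\frac{1}{2}} \left( \frac{\lambda}{2} \right)^{\frac{\kappa}{2}} \frac{\Gamma(\frac{N+\kappa}{2})}{\Gamma(\frac{\kappa}{2})} \left( \frac{Y^TY - Y^TX\Omega X^TY + \lambda}{2} \right)^{-\frac{N+\kappa}{2}}</math>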
invariance properties motivate the use of limits for some "unimportant" hyperparameters
average BF over grid
to do
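
The closed-form expression for P(Y | X) obtained after integrating out τ can be evaluated directly on the log scale. Below is a minimal sketch in Python (standard library only); it is not the R code planned for this entry. Several things in it are assumptions rather than part of the derivation: the function names (`solve_and_logdet`, `log_marginal`, `log_bayes_factor`), the coding of the dominance column as a heterozygote indicator, a diagonal Σ_B, the illustrative hyperparameter values, and the computation of the null marginal P_0(Y) by applying the same formula to an intercept-only design.

```python
import math


def solve_and_logdet(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting.

    Returns (z, log|det A|). A is assumed non-singular (here it is
    symmetric positive definite).
    """
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    logdet = 0.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        logdet += math.log(abs(M[c][c]))
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * z[k] for k in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z, logdet


def log_marginal(y, X, prior_vars, kappa, lam):
    """log P(Y | X) with tau and B integrated out.

    prior_vars: diagonal of Sigma_B (assumption: Sigma_B is diagonal).
    kappa, lam: hyperparameters of the Gamma(kappa/2, lam/2) prior on tau.
    """
    n, p = len(y), len(prior_vars)
    # A = Sigma_B^{-1} + X'X, so that Omega = A^{-1}
    A = [[sum(X[i][j] * X[i][k] for i in range(n))
          + (1.0 / prior_vars[j] if j == k else 0.0)
          for k in range(p)] for j in range(p)]
    b = [sum(X[i][j] * y[i] for i in range(n)) for j in range(p)]  # X'Y
    z, logdet_A = solve_and_logdet(A, b)  # z = Omega X'Y
    # Y'Y - Y'X Omega X'Y + lambda
    quad = sum(v * v for v in y) - sum(bj * zj for bj, zj in zip(b, z)) + lam
    # log(|Omega| / |Sigma_B|)
    log_det_ratio = -logdet_A - sum(math.log(v) for v in prior_vars)
    return (-0.5 * n * math.log(2 * math.pi)
            + 0.5 * log_det_ratio
            + 0.5 * kappa * math.log(lam / 2)
            + math.lgamma((n + kappa) / 2) - math.lgamma(kappa / 2)
            - 0.5 * (n + kappa) * math.log(quad / 2))


def log_bayes_factor(y, g, prior_vars=(100.0, 1.0, 0.25), kappa=1.0, lam=1.0):
    """log BF of the alternative (mu, a, d) against the null (mu only).

    g: genotypes as allele dose (0, 1 or 2). The default hyperparameter
    values are illustrative assumptions only.
    """
    # Columns: intercept, allele dose, heterozygote indicator (assumed coding)
    X_alt = [[1.0, float(gi), 1.0 if gi == 1 else 0.0] for gi in g]
    X_null = [[1.0] for _ in g]
    return (log_marginal(y, X_alt, prior_vars, kappa, lam)
            - log_marginal(y, X_null, prior_vars[:1], kappa, lam))
```

For instance, `log_bayes_factor([0, 0, 10, 10, 20, 20], [0, 0, 1, 1, 2, 2])` is positive because the phenotype tracks the allele dose, whereas `log_bayes_factor([1, -1, 1, -1, 1, -1], [0, 0, 1, 1, 2, 2])` is negative: the genotype explains nothing, so the alternative only pays for its two extra parameters. Working with `math.lgamma` and log-determinants avoids overflow in the Gamma functions and powers when N is large.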

the (quantitative) phenotypes (e.g. expression levels at a given gene), and
the genotypes at a given SNP (encoded as allele dose: 0, 1 or 2).


