Bayesian model of univariate linear regression for QTL detection
See Servin & Stephens (PLoS Genetics, 2007).
where β₁ is the additive effect of the SNP, denoted a from now on, and β₂ is the dominance effect of the SNP, d = ak.
Let's now write the model in matrix notation:
This gives the following multivariate Normal distribution for the phenotypes:
Even though we can write the likelihood as a multivariate Normal, I still keep the term "univariate" in the title because the covariance matrix of Y | X, τ, B is determined by a single real number, τ.
The likelihood of the parameters given the data is therefore:
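As a sketch of the model this section describes, the block below writes it out under explicit assumptions that are not stated in the text: N individuals, a genotype g_i coded as an allele count in {0, 1, 2}, a heterozygote indicator carrying the dominance effect, and τ acting as the precision (inverse variance) of the errors. The symbols μ, g_i, E and I_N are notation introduced here for illustration.

```latex
\begin{align*}
y_i &= \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i = 1} + \epsilon_i,
  \qquad \epsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \tau^{-1}),
  \quad i = 1, \dots, N \\
Y &= XB + E, \qquad B = (\mu, a, d)^T, \qquad E \sim \mathcal{N}_N(0, \tau^{-1} I_N) \\
Y \mid X, \tau, B &\sim \mathcal{N}_N(XB, \tau^{-1} I_N) \\
\mathcal{L}(\tau, B) = \mathsf{P}(Y \mid X, \tau, B)
  &= \left( \frac{\tau}{2\pi} \right)^{N/2}
     \exp\left( -\frac{\tau}{2} (Y - XB)^T (Y - XB) \right)
\end{align*}
```

Under this reading, X is an N × 3 matrix whose columns are a column of ones, the allele counts and the heterozygote indicators.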
A Gamma distribution for τ:
And a multivariate Normal distribution for B:
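One conjugate specification consistent with the rest of the derivation is sketched below. The hyperparameter names κ, λ, σ_μ, σ_a, σ_d and the matrix Σ_B are assumptions introduced for illustration, as is the shape/rate parameterization of the Gamma distribution; the prior on B is scaled by τ so that the posterior stays in closed form.

```latex
\begin{align*}
\tau &\sim \Gamma\left( \frac{\kappa}{2}, \frac{\lambda}{2} \right)
  \quad \text{(shape, rate)} \\
B \mid \tau &\sim \mathcal{N}_3\left( 0, \tau^{-1} \Sigma_B \right),
  \qquad \Sigma_B = \mathrm{diag}\left( \sigma_\mu^2, \sigma_a^2, \sigma_d^2 \right)
\end{align*}
```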
Here and in the following, we neglect all constants (e.g. the normalization constant, Y^T Y, etc.):
We use the prior and likelihood and keep only the terms in B:
We factorize some terms:
Importantly, let's define:
We can see that Ω^T = Ω, which means that Ω is a symmetric matrix. This is particularly useful here because we can use the following equality: Ω^{-1} Ω^T = I.
It now becomes easy to factorize completely:
We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:
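The completing-the-square step can be checked numerically. The sketch below assumes Ω = (Σ_B^{-1} + X^T X)^{-1}, a prior B | τ ~ N(0, τ^{-1} Σ_B), and a design matrix with columns (1, allele count, heterozygote indicator); under those assumptions the conditional posterior is Normal with mean Ω X^T Y and covariance τ^{-1} Ω. The function names and all numeric values are hypothetical.

```python
import numpy as np

def design_matrix(genotypes):
    """Build X with columns: intercept, allele count, heterozygote indicator."""
    g = np.asarray(genotypes, dtype=float)
    return np.column_stack([np.ones_like(g), g, (g == 1).astype(float)])

def omega_matrix(X, Sigma_B):
    """Omega = (Sigma_B^{-1} + X^T X)^{-1} (assumed definition)."""
    return np.linalg.inv(np.linalg.inv(Sigma_B) + X.T @ X)

def conditional_posterior_B(Y, X, Sigma_B, tau):
    """Mean and covariance of the Normal distribution B | Y, X, tau."""
    Omega = omega_matrix(X, Sigma_B)
    return Omega @ X.T @ Y, Omega / tau

# Simulated data; every numeric value below is illustrative only.
rng = np.random.default_rng(0)
N, tau = 200, 4.0
X = design_matrix(rng.integers(0, 3, size=N))
Y = X @ np.array([1.0, 0.5, 0.1]) + rng.normal(scale=tau ** -0.5, size=N)
Sigma_B = np.diag(np.array([10.0, 0.4, 0.1]) ** 2)

Omega = omega_matrix(X, Sigma_B)
post_mean, post_cov = conditional_posterior_B(Y, X, Sigma_B, tau)

# Completing the square: both sides must agree for any value of B.
B = rng.normal(size=3)
lhs = (Y - X @ B) @ (Y - X @ B) + B @ np.linalg.inv(Sigma_B) @ B
rhs = ((B - post_mean) @ np.linalg.inv(Omega) @ (B - post_mean)
       + Y @ Y - Y @ X @ Omega @ X.T @ Y)
```

The terms of `rhs` that do not involve B are exactly the ones dropped as constants when only the kernel in B is kept.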
Similarly to the equations above:
But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:
Again, we use the priors and likelihoods specified above (but everything inside the integral is kept inside it, even if it doesn't depend on B!):
As we used a conjugate prior for τ, we know that we expect a Gamma distribution for the posterior. Therefore, we can take τ^{N/2} out of the integral and start guessing what looks like a Gamma distribution. We also factorize inside the exponential:
We recognize the conditional posterior of B. This allows us to use the fact that the pdf of the Normal distribution integrates to one:
We finally recognize a Gamma distribution, allowing us to write the posterior as:
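Under the same assumed priors (τ ~ Gamma(κ/2, λ/2) in shape/rate form, B | τ ~ N(0, τ^{-1} Σ_B)), integrating B out leaves a Gamma posterior for τ with shape (N + κ)/2 and rate (Y^T Y − Y^T X Ω X^T Y + λ)/2. The sketch below computes these two parameters on simulated data; the function name and all numeric values are hypothetical.

```python
import numpy as np

def tau_posterior_params(Y, X, Sigma_B, kappa, lam):
    """Shape and rate of the Gamma posterior tau | Y, X (assumed priors)."""
    N = Y.shape[0]
    Omega = np.linalg.inv(np.linalg.inv(Sigma_B) + X.T @ X)
    XtY = X.T @ Y
    shape = (N + kappa) / 2.0
    rate = 0.5 * (Y @ Y - XtY @ Omega @ XtY + lam)
    return shape, rate

# Simulated data; every numeric value below is illustrative only.
rng = np.random.default_rng(1)
N, tau_true = 500, 4.0
g = rng.integers(0, 3, size=N).astype(float)
X = np.column_stack([np.ones(N), g, (g == 1).astype(float)])
Y = X @ np.array([1.0, 0.5, 0.1]) + rng.normal(scale=tau_true ** -0.5, size=N)
Sigma_B = np.diag(np.array([10.0, 0.4, 0.1]) ** 2)

shape, rate = tau_posterior_params(Y, X, Sigma_B, kappa=2.0, lam=2.0)
posterior_mean_tau = shape / rate  # mean of a Gamma(shape, rate) distribution
```

With a few hundred individuals the posterior mean of τ should sit near the precision used to simulate the data, which gives a quick sanity check on the algebra.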
Here we recognize the integral that defines the Gamma function:
And we now recognize a multivariate Student's t-distribution:
We hence can write:
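As a sketch of the result, assuming the priors τ ~ Γ(κ/2, λ/2) and B | τ ~ N(0, τ^{-1} Σ_B) and the definition Ω = (Σ_B^{-1} + X^T X)^{-1} (none of these symbols is fixed by the text itself), integrating τ out of the joint posterior gives a multivariate Student's t-distribution with N + κ degrees of freedom:

```latex
\begin{align*}
\mathsf{P}(B \mid Y, X)
  &= \int \mathsf{P}(B \mid Y, X, \tau) \, \mathsf{P}(\tau \mid Y, X) \, d\tau \\
  &\propto \int \tau^{\frac{N + \kappa + 3}{2} - 1}
     \exp\left( -\frac{\tau}{2} \left[ \lambda^\ast
       + (B - \Omega X^T Y)^T \Omega^{-1} (B - \Omega X^T Y) \right] \right) d\tau \\
  &\propto \left( 1 + \frac{(B - \Omega X^T Y)^T \Omega^{-1} (B - \Omega X^T Y)}
       {\lambda^\ast} \right)^{-\frac{N + \kappa + 3}{2}},
  \qquad \lambda^\ast = Y^T Y - Y^T X \Omega X^T Y + \lambda \\
B \mid Y, X &\sim \mathcal{S}_{N + \kappa}\left( \Omega X^T Y, \;
     \frac{\lambda^\ast}{N + \kappa} \, \Omega \right)
\end{align*}
```

The exponent −(N + κ + 3)/2 matches the general form −(ν + p)/2 of a p-variate t density with ν = N + κ and p = 3.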
invariance properties motivate the use of limits for some "unimportant" hyperparameters
average BF over grid
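A minimal sketch of averaging the Bayes factor over a grid follows. It assumes the priors τ ~ Gamma(κ/2, λ/2) and B | τ ~ N(0, τ^{-1} Σ_B), a null model containing only the mean, and a marginal likelihood proportional to |Ω|^{1/2} |Σ_B|^{-1/2} (Y^T Y − Y^T X Ω X^T Y + λ)^{-(N+κ)/2}, which is what integrating B and τ out gives under those priors. The limits on the "unimportant" hyperparameters are only approximated here by a large σ_μ and small κ, λ. The function names, the grid of (σ_a, σ_d) values and the simulated data are all hypothetical.

```python
import numpy as np

def log_marginal(Y, X, prior_sd, kappa, lam):
    """log P(Y | X), up to terms shared by models with the same N, kappa, lam."""
    sd = np.asarray(prior_sd, dtype=float)
    Omega = np.linalg.inv(np.diag(1.0 / sd ** 2) + X.T @ X)
    XtY = X.T @ Y
    quad = Y @ Y - XtY @ Omega @ XtY + lam
    _, logdet_Omega = np.linalg.slogdet(Omega)
    return (0.5 * logdet_Omega - np.sum(np.log(sd))
            - 0.5 * (Y.shape[0] + kappa) * np.log(quad))

def log_bayes_factor(Y, genotypes, sigma_a, sigma_d,
                     sigma_mu=1e3, kappa=1e-3, lam=1e-3):
    """log BF of the additive + dominance model against the mean-only model."""
    g = np.asarray(genotypes, dtype=float)
    X0 = np.ones((g.size, 1))
    X1 = np.column_stack([np.ones_like(g), g, (g == 1).astype(float)])
    return (log_marginal(Y, X1, [sigma_mu, sigma_a, sigma_d], kappa, lam)
            - log_marginal(Y, X0, [sigma_mu], kappa, lam))

def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    values = np.asarray(values, dtype=float)
    m = values.max()
    return m + np.log(np.mean(np.exp(values - m)))

# Simulated data with a real additive effect; all values are illustrative.
rng = np.random.default_rng(2)
N = 200
genotypes = rng.integers(0, 3, size=N)
Y = 1.0 + 0.5 * genotypes + rng.normal(scale=0.5, size=N)

# Hypothetical grid of prior standard deviations (sigma_a, sigma_d).
grid = [(0.05, 0.0125), (0.1, 0.025), (0.2, 0.05), (0.4, 0.1)]
log_bfs = [log_bayes_factor(Y, genotypes, sa, sd) for sa, sd in grid]
log_bf_avg = log_mean_exp(log_bfs)  # log of the BF averaged over the grid
```

Averaging is done on the Bayes factors themselves, not on their logarithms, so the log-mean-exp helper avoids overflow when individual Bayes factors are very large.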