# User:Timothee Flutre/Notebook/Postdoc/2011/11/10


## Revision as of 12:46, 21 November 2012


## Bayesian model of univariate linear regression for QTL detection

See Servin & Stephens (PLoS Genetics, 2007).

• Data: let's assume that we obtained data from N individuals. We denote by $y_1,\ldots,y_N$ the (quantitative) phenotypes (e.g. expression level at a given gene), and by $g_1,\ldots,g_N$ the genotypes at a given SNP (as allele dose, 0, 1 or 2).

• Goal: we want to assess the evidence in the data for an effect of the genotype on the phenotype.

• Assumptions: the relationship between genotype and phenotype is linear; the individuals are not genetically related; there are no hidden confounding factors in the phenotypes.

• Likelihood: $\forall i \in \{1,\ldots,N\}, \; y_i = \mu + \beta_1 g_i + \beta_2 \mathbf{1}_{g_i=1} + \epsilon_i \text{ with } \epsilon_i \overset{i.i.d}{\sim} \mathcal{N}(0,\tau^{-1})$

where $\beta_1$ is in fact the additive effect of the SNP, denoted $a$ from now on, and $\beta_2$ is the dominance effect of the SNP, $d = a k$.

Let's now write in matrix notation:

$Y = X B + E \text{ where } B = [ \mu \; a \; d ]^T$

which gives the following conditional distribution for the phenotypes:

$Y | X, B, \tau \sim \mathcal{N}(XB, \tau^{-1} I_N)$
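To make the matrix notation concrete, here is a minimal sketch in Python (NumPy) that builds $X$ from allele doses and simulates $Y$ from the likelihood above. The function names, the sample size, the seed and the values chosen for $B$ and $\tau$ are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def design_matrix(genotypes):
    """Return the N x 3 matrix X with columns (1, g_i, 1_{g_i = 1})."""
    g = np.asarray(genotypes, dtype=float)
    return np.column_stack([np.ones_like(g), g, (g == 1).astype(float)])

def simulate_phenotypes(X, B, tau, rng):
    """Draw Y ~ N(XB, tau^{-1} I_N), i.e. noise with standard deviation tau^{-1/2}."""
    noise = rng.normal(loc=0.0, scale=1.0 / np.sqrt(tau), size=X.shape[0])
    return X @ B + noise

rng = np.random.default_rng(1859)
g = rng.integers(0, 3, size=200)       # allele doses 0, 1 or 2 (arbitrary frequencies)
X = design_matrix(g)
B_true = np.array([5.0, 0.5, 0.1])     # [mu, a, d], arbitrary values
Y = simulate_phenotypes(X, B_true, tau=4.0, rng=rng)
```

Each row of `X` is $[1 \; g_i \; \mathbf{1}_{g_i=1}]$, so that `X @ B_true` reproduces $\mu + a g_i + d \mathbf{1}_{g_i=1}$ for every individual.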

• Priors: conjugate

$\tau \sim \Gamma(\kappa/2, \, \lambda/2)$

$B | \tau \sim \mathcal{N}(\vec{0}, \, \tau^{-1} \Sigma_B) \text{ with } \Sigma_B = diag(\sigma_{\mu}^2, \sigma_a^2, \sigma_d^2)$

• Joint posterior:

$\mathsf{P}(\tau, B | Y, X) = \mathsf{P}(\tau | Y, X) \mathsf{P}(B | Y, X, \tau)$

• Conditional posterior of B:

$\mathsf{P}(B | Y, X, \tau) = \frac{\mathsf{P}(B, Y | X, \tau)}{\mathsf{P}(Y | X, \tau)}$

$\mathsf{P}(B | Y, X, \tau) = \frac{\mathsf{P}(B | \tau) \mathsf{P}(Y | X, B, \tau)}{\int \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B) \mathsf{d}B}$

Here and in the following, we neglect all terms that are constant with respect to $B$ (e.g. the normalization constant, $Y^TY$, etc.):

$\mathsf{P}(B | Y, X, \tau) \propto \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B)$

We use the prior and likelihood and keep only the terms in B:

$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} B^T \Sigma_B^{-1} B\right) \exp\left(-\frac{\tau}{2} (Y-XB)^T(Y-XB)\right)$

We expand:

$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} \left(B^T \Sigma_B^{-1} B - Y^TXB - B^TX^TY + B^TX^TXB\right)\right)$

We factorize some terms:

$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} \left(B^T (\Sigma_B^{-1} + X^TX) B - Y^TXB - B^TX^TY\right)\right)$

Let's define $\Omega = (\Sigma_B^{-1} + X^TX)^{-1}$. We can see that $\Omega^T = \Omega$, which means that $\Omega$ is a symmetric matrix. This is particularly useful here because we can use the following equality: $\Omega^{-1}\Omega^T = \Omega^T\Omega^{-1} = I$.

$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} \left(B^T \Omega^{-1} B - (X^TY)^T\Omega^T\Omega^{-1}B - B^T\Omega^{-1}\Omega^TX^TY\right)\right)$

This now becomes easy to factorize completely, up to a term that does not depend on $B$:

$\mathsf{P}(B | Y, X, \tau) \propto \exp\left(-\frac{\tau}{2} (B - \Omega X^TY)^T\Omega^{-1}(B - \Omega X^TY)\right)$

We recognize the kernel of a Normal distribution, allowing us to write the conditional posterior as:

$B | Y, X, \tau \sim \mathcal{N}(\Omega X^TY, \tau^{-1} \Omega)$
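As a numerical illustration of this result, the sketch below computes $\Omega$, the posterior mean $\Omega X^TY$ and the posterior covariance $\tau^{-1} \Omega$ on simulated data, then draws one value of $B$ from the conditional posterior. The function name `conditional_posterior_B`, the prior variances and the simulation settings are assumptions made for the example only.

```python
import numpy as np

def conditional_posterior_B(X, Y, tau, sigma2_B):
    """Mean and covariance of B | Y, X, tau
    under the prior B | tau ~ N(0, tau^{-1} diag(sigma2_B))."""
    Sigma_B_inv = np.diag(1.0 / np.asarray(sigma2_B, dtype=float))
    Omega = np.linalg.inv(Sigma_B_inv + X.T @ X)
    mean = Omega @ X.T @ Y      # Omega X^T Y
    cov = Omega / tau           # tau^{-1} Omega
    return mean, cov

# Simulated data (arbitrary settings)
rng = np.random.default_rng(0)
N = 500
g = rng.integers(0, 3, size=N)
X = np.column_stack([np.ones(N), g, (g == 1)]).astype(float)
B_true = np.array([5.0, 0.5, 0.1])   # [mu, a, d]
tau = 4.0
Y = X @ B_true + rng.normal(scale=tau ** -0.5, size=N)

sigma2_B = [100.0, 1.0, 1.0]         # hypothetical prior variances for mu, a, d
mean, cov = conditional_posterior_B(X, Y, tau, sigma2_B)
B_draw = rng.multivariate_normal(mean, cov)   # one draw from B | Y, X, tau
```

Note that the posterior mean does not depend on $\tau$; only the covariance does. With large prior variances, $\Sigma_B^{-1}$ becomes negligible next to $X^TX$ and the mean approaches the ordinary least-squares estimate.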

• Posterior of τ:

Similarly to the equations above:

$\mathsf{P}(\tau | Y, X) \propto \mathsf{P}(\tau) \mathsf{P}(Y | X, \tau)$

But now, to handle the second term, we need to integrate over B, thus effectively taking into account the uncertainty in B:

$\mathsf{P}(\tau | Y, X) \propto \mathsf{P}(\tau) \int \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B) \mathsf{d}B$
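The integral over $B$ can be checked numerically before working it out by hand. The sketch below relies on a standard Gaussian result that is not derived in this entry: since $Y = XB + E$ with $B$ and $E$ independent and Normal given $\tau$, we have $Y | X, \tau \sim \mathcal{N}(\vec{0}, \tau^{-1} (I_N + X \Sigma_B X^T))$. It compares this closed form with Bayes' rule, $\mathsf{P}(Y | X, \tau) = \mathsf{P}(B | \tau) \mathsf{P}(Y | X, \tau, B) / \mathsf{P}(B | Y, X, \tau)$, evaluated at an arbitrary $B$. The helper `mvn_logpdf` and all numerical values are assumptions made for the example.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log-density of a multivariate Normal, written out with NumPy only."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (x.size * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(42)
N, tau = 50, 2.0
g = rng.integers(0, 3, size=N)
X = np.column_stack([np.ones(N), g, (g == 1)]).astype(float)
Sigma_B = np.diag([10.0, 1.0, 0.25])   # arbitrary prior variances
Y = rng.normal(size=N)                 # the identity holds for any Y

# Closed form: Y | X, tau ~ N(0, tau^{-1} (I_N + X Sigma_B X^T))
log_marginal = mvn_logpdf(Y, np.zeros(N), (np.eye(N) + X @ Sigma_B @ X.T) / tau)

# Same quantity from Bayes' rule, evaluated at an arbitrary B
Omega = np.linalg.inv(np.linalg.inv(Sigma_B) + X.T @ X)
B = np.array([0.3, -1.0, 2.0])
log_prior = mvn_logpdf(B, np.zeros(3), Sigma_B / tau)          # P(B | tau)
log_lik = mvn_logpdf(Y, X @ B, np.eye(N) / tau)                # P(Y | X, tau, B)
log_post = mvn_logpdf(B, Omega @ X.T @ Y, Omega / tau)         # P(B | Y, X, tau)
log_bayes = log_prior + log_lik - log_post
```

Up to floating-point error, `log_marginal` and `log_bayes` should agree whatever value of `B` is plugged in, which also serves as a check on the conditional posterior of $B$ obtained earlier.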