User:Timothee Flutre/Notebook/Postdoc/2012/08/16
From OpenWetWare
Revision as of 18:25, 4 September 2012
Variational Bayes approach for the mixture of Normals
The latent variables induce dependencies between all the parameters of the model. This makes it difficult to find the parameters that maximize the likelihood. An elegant solution is to introduce a variational distribution of parameters and latent variables, which leads to a re-formulation of the classical EM algorithm. But let's show it directly in the Bayesian paradigm.
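The constant C is here to remind us that q has the constraint of being a distribution, i.e. of summing to 1, which can be enforced by a Lagrange multiplier. We can then use the concavity of the logarithm (Jensen's inequality) to derive a lower bound of the marginal log-likelihood:
<math>\mathrm{ln} \, p(\mathbf{y}|K) \ge \int_\mathbf{z} \int_\Theta \, \mathrm{d}\mathbf{z} \, \mathrm{d}\Theta \; q(\mathbf{z},\Theta) \; \mathrm{ln} \, \frac{p(\mathbf{y},\mathbf{z},\Theta|K)}{q(\mathbf{z},\Theta)}</math>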
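Let's call this lower bound <math>\mathcal{F}(q)</math>. Its gap to the marginal log-likelihood is exactly the Kullback-Leibler (KL) divergence between q and the joint posterior of the latent variables and parameters:
<math>\mathrm{ln} \, p(\mathbf{y}|K) = \mathcal{F}(q) + \mathrm{KL}\left( q(\mathbf{z},\Theta) \, \| \, p(\mathbf{z},\Theta|\mathbf{y},K) \right)</math>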
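A toy case helps to check these two statements numerically. The following minimal Python sketch uses a hypothetical model that is not the notebook's mixture of Normals. It has a single observation y, a binary latent variable z and a parameter theta restricted to a three-point grid, so every integral becomes a finite sum and ln p(y) can be computed exactly (the conditioning on K is dropped). For a randomly drawn q, it checks that F(q) never exceeds ln p(y) and that F(q) + KL(q || posterior) recovers ln p(y). The model, its numbers and the helper names are illustrative assumptions.

```python
import math
import random

# Hypothetical toy model, NOT the notebook's mixture: one observation y,
# a binary latent variable z and a parameter theta restricted to a 3-point
# grid, so that every integral over (z, theta) becomes a finite sum.
Z_VALUES = [0, 1]
THETA_GRID = [-1.0, 0.0, 1.0]
Y_OBS = 0.5
STATES = [(z, theta) for z in Z_VALUES for theta in THETA_GRID]


def log_normal_pdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def log_joint(z, theta):
    """ln p(y, z, theta) = ln p(theta) + ln p(z) + ln p(y | z, theta)."""
    log_prior_theta = -math.log(len(THETA_GRID))  # uniform prior on the grid
    log_prior_z = math.log(0.5)                    # equal prior weights for z
    mean = theta if z == 0 else theta + 2.0        # arbitrary component means
    return log_prior_theta + log_prior_z + log_normal_pdf(Y_OBS, mean, 1.0)


def log_evidence():
    """Exact ln p(y), obtained by summing the joint over all (z, theta)."""
    return math.log(sum(math.exp(log_joint(*s)) for s in STATES))


def lower_bound(q):
    """F(q) = sum over (z, theta) of q * ln[p(y, z, theta) / q]."""
    return sum(q[s] * (log_joint(*s) - math.log(q[s])) for s in STATES if q[s] > 0)


def kl_to_posterior(q):
    """KL(q || p(z, theta | y)), using ln p(z, theta | y) = ln p(y, z, theta) - ln p(y)."""
    log_py = log_evidence()
    return sum(q[s] * (math.log(q[s]) - log_joint(*s) + log_py) for s in STATES if q[s] > 0)


def random_q(rng):
    """A random distribution over (z, theta), not assumed to factorize."""
    weights = [rng.random() for _ in STATES]
    total = sum(weights)
    return {s: w / total for s, w in zip(STATES, weights)}


if __name__ == "__main__":
    q = random_q(random.Random(0))
    posterior = {s: math.exp(log_joint(*s) - log_evidence()) for s in STATES}
    print("ln p(y)            =", log_evidence())
    print("F(q)               =", lower_bound(q))
    print("F(q) + KL          =", lower_bound(q) + kl_to_posterior(q))
    print("F(exact posterior) =", lower_bound(posterior))
```

With a random q, the first and third printed values agree and the second is smaller. Evaluating F at the exact posterior closes the gap, which is the "tight if and only if the KL divergence is zero" statement.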
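From this, it is clear that <math>q(\mathbf{z},\Theta)</math> approximates the joint posterior, and therefore the lower bound will be tight if and only if this approximation is exact and the KL divergence is zero. In practice, we have to make the following crucial assumption of independence on q in order for the calculations to be analytically tractable:
<math>q(\mathbf{z},\Theta) = q_{\mathbf{z}}(\mathbf{z}) \, q_\Theta(\Theta)</math>
This means that the lower bound becomes a functional of the two distributions <math>q_{\mathbf{z}}</math> and <math>q_\Theta</math>:
<math>\mathcal{F}(q_{\mathbf{z}}, q_\Theta) = \int_\mathbf{z} \int_\Theta \, \mathrm{d}\mathbf{z} \, \mathrm{d}\Theta \; q_{\mathbf{z}}(\mathbf{z}) \, q_\Theta(\Theta) \; \mathrm{ln} \, \frac{p(\mathbf{y},\mathbf{z},\Theta|K)}{q_{\mathbf{z}}(\mathbf{z}) \, q_\Theta(\Theta)}</math>
As we ultimately aim at inferring the parameters and latent variables that maximize the marginal log-likelihood, we will use the calculus of variations to find the functions <math>q_{\mathbf{z}}</math> and <math>q_\Theta</math> that maximize <math>\mathcal{F}</math>.
This naturally leads to a procedure very similar to the EM algorithm where, at the E step, we calculate the expectations of the parameters with respect to the variational distributions <math>q_{\mathbf{z}}</math> and <math>q_\Theta</math>.
* '''Updates for <math>q_\mathbf{z}</math>''':
We start by writing the functional derivative of <math>\mathcal{F}</math> with respect to <math>q_{\mathbf{z}}</math>:
<math>\frac{\partial \mathcal{F}}{\partial q_{\mathbf{z}}} = \int_\Theta \, \mathrm{d}\Theta \; q_\Theta(\Theta) \; \left[ \frac{\partial}{\partial q_{\mathbf{z}}} \left( \int_\mathbf{z} \, \mathrm{d}\mathbf{z} \; \left( q_{\mathbf{z}}(\mathbf{z}) \mathrm{ln} \, p(\mathbf{y},\mathbf{z}|\Theta,K) - q_{\mathbf{z}}(\mathbf{z}) \mathrm{ln} \, q_{\mathbf{z}}(\mathbf{z}) \right) \right) \right] + C_{\mathbf{z}}</math>
<math>\frac{\partial \mathcal{F}}{\partial q_{\mathbf{z}}} = \int_\Theta \, \mathrm{d}\Theta \; q_\Theta(\Theta) \; \left[ \mathrm{ln} \, p(\mathbf{y},\mathbf{z}|\Theta,K) - \mathrm{ln} \, q_{\mathbf{z}}(\mathbf{z}) - 1 \right] + C_{\mathbf{z}}</math>
where <math>C_{\mathbf{z}}</math> gathers the Lagrange multiplier enforcing the normalization of <math>q_{\mathbf{z}}</math> and all the terms that do not depend on <math>\mathbf{z}</math> (those involving only <math>p(\Theta|K)</math> and <math>q_\Theta</math>).
Then we set this functional derivative to zero. We also make use of a frequent assumption, namely that the variational distribution fully factorizes over the individual latent variables, <math>q_{\mathbf{z}}(\mathbf{z}) = \prod_{n=1}^N q_{z_n}(z_n)</math> (mean-field assumption). As the augmented likelihood factorizes over observations, <math>\mathrm{ln} \, p(\mathbf{y},\mathbf{z}|\Theta,K) = \sum_{m=1}^N \mathrm{ln} \, p(y_m,z_m|\Theta,K)</math>, the terms with <math>m \neq n</math> do not depend on <math>z_n</math> and can be absorbed into the constant:
<math>\frac{\partial \mathcal{F}}{\partial q_{\mathbf{z}}} \bigg|_{q_{\mathbf{z}}^{(t+1)}} = 0 \Longleftrightarrow \forall \, n \; \mathrm{ln} \, q_{z_n}^{(t+1)}(z_n) = \left( \int_\Theta \, \mathrm{d}\Theta \; q_\Theta(\Theta) \; \mathrm{ln} \, p(y_n,z_n|\Theta,K) \right) - 1 + C_{z_n}</math>
TODO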
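Exponentiating the stationarity condition for <math>q_{z_n}</math> shows that <math>q_{z_n}(z_n) \propto \exp\left( \int_\Theta \, \mathrm{d}\Theta \; q_\Theta(\Theta) \; \mathrm{ln} \, p(y_n,z_n|\Theta,K) \right)</math>. The constant <math>-1 + C_{z_n}</math> is fixed by requiring the probabilities to sum to 1 over the K components. Below is a minimal Python sketch of this update under two labelled assumptions. First, it assumes the usual parametrization of the mixture by weights w_k, means mu_k and precisions tau_k. Second, it approximates the expectation over <math>q_\Theta</math> by averaging over draws from <math>q_\Theta</math> supplied by the caller. With conjugate priors these expectations are available in closed form, so the Monte Carlo average only stands in for them here. The names `update_q_z` and `ThetaDraw` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# One draw of Theta = (weights w_k, means mu_k, precisions tau_k).
# ASSUMPTION: this parametrization and the Monte Carlo approximation of the
# expectation over q_Theta are illustrative; they are not taken from the notes.
ThetaDraw = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def log_normal_pdf(y: float, mean: float, precision: float) -> float:
    """ln Normal(y; mean, 1/precision)."""
    return 0.5 * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * (y - mean) ** 2


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable ln(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def update_q_z(y: Sequence[float], theta_draws: List[ThetaDraw]) -> List[List[float]]:
    """Hypothetical helper returning r[n][k] = q_{z_n}(z_nk = 1).

    ln q_{z_n}(z_nk = 1) = E_{q_Theta}[ln w_k + ln Normal(y_n; mu_k, 1/tau_k)] - 1 + C_{z_n},
    where -1 + C_{z_n} is fixed by requiring sum_k r[n][k] = 1.
    """
    n_draws = len(theta_draws)
    n_components = len(theta_draws[0][0])
    resp = []
    for y_n in y:
        # Monte Carlo estimate of the expected log joint for each component k.
        expected = [
            sum(math.log(w[k]) + log_normal_pdf(y_n, mu[k], tau[k])
                for w, mu, tau in theta_draws) / n_draws
            for k in range(n_components)
        ]
        log_norm = log_sum_exp(expected)  # -1 + C_{z_n} equals -log_norm
        resp.append([math.exp(e - log_norm) for e in expected])
    return resp


if __name__ == "__main__":
    # Two made-up draws from q_Theta for K = 2 components.
    draws = [([0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]),
             ([0.4, 0.6], [-1.8, 2.1], [1.2, 0.9])]
    for row in update_q_z([-2.1, 0.0, 1.9], draws):
        print([round(r, 3) for r in row])
```

Normalizing with log-sum-exp makes the Lagrange constant explicit. It also avoids underflow when the expected log-densities are very negative, which happens quickly for observations far from a component.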


