User:Timothee Flutre/Notebook/Postdoc/2012/08/16

From OpenWetWare
Revision as of 11:29, 31 August 2012 by Timothee Flutre (talk | contribs) (→‎Variational Bayes approach for the mixture of Normals: fix error prior \mu_k + add link precision)

Variational Bayes approach for the mixture of Normals

  • Motivation: I have described on another page the basics of mixture models and the EM algorithm in a frequentist context. It is worth reading before continuing. Here I am interested in the Bayesian approach as well as in a specific variational method (nicknamed "Variational Bayes").


  • Data: we have N univariate observations, [math]\displaystyle{ y_1, \ldots, y_N }[/math], gathered into the vector [math]\displaystyle{ \mathbf{y} }[/math].


  • Assumptions: we assume the observations to be exchangeable and distributed according to a mixture of K Normal distributions. The parameters of this model are the mixture weights ([math]\displaystyle{ w_k }[/math]), the means ([math]\displaystyle{ \mu_k }[/math]) and the precisions ([math]\displaystyle{ \tau_k }[/math]) of the mixture components, all gathered into [math]\displaystyle{ \Theta = \{w_1,\ldots,w_K,\mu_1,\ldots,\mu_K,\tau_1,\ldots,\tau_K\} }[/math]. The weights are subject to two constraints: [math]\displaystyle{ \sum_{k=1}^K w_k = 1 }[/math] and [math]\displaystyle{ \forall k \; w_k \gt 0 }[/math].


  • Observed likelihood: [math]\displaystyle{ p(\mathbf{y} | \Theta, K) = \prod_{n=1}^N p(y_n|\Theta,K) = \prod_{n=1}^N \sum_{k=1}^K w_k Normal(y_n;\mu_k,\tau_k^{-1}) }[/math]
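
As an illustration of the observed likelihood, here is a minimal Python sketch that evaluates its logarithm, [math]\displaystyle{ \sum_{n=1}^N \log \sum_{k=1}^K w_k Normal(y_n;\mu_k,\tau_k^{-1}) }[/math]. The function names (normal_logpdf, observed_loglik) are only illustrative and do not come from this notebook. The inner sum is computed with the log-sum-exp trick, which is an implementation choice made here to avoid numerical underflow.

```python
import math

def normal_logpdf(y, mu, tau):
    """Log density of Normal(y; mean=mu, variance=1/tau), tau being a precision."""
    return 0.5 * (math.log(tau) - math.log(2.0 * math.pi)) - 0.5 * tau * (y - mu) ** 2

def observed_loglik(y, w, mu, tau):
    """log p(y | Theta, K) for a mixture of K Normals.

    y: list of N observations; w, mu, tau: lists of length K
    (weights, means, precisions).
    """
    total = 0.0
    for y_n in y:
        # log(w_k) + log Normal(y_n; mu_k, 1/tau_k) for each component k
        terms = [math.log(w_k) + normal_logpdf(y_n, mu_k, tau_k)
                 for w_k, mu_k, tau_k in zip(w, mu, tau)]
        # log-sum-exp over the K components
        m = max(terms)
        total += m + math.log(sum(math.exp(t - m) for t in terms))
    return total
```

For instance, observed_loglik([0.1, 4.8], [0.5, 0.5], [0.0, 5.0], [1.0, 1.0]) evaluates the log-likelihood of two observations under an equal-weight mixture of two unit-precision components.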


  • Latent variables: let's introduce N latent variables, [math]\displaystyle{ z_1,\ldots,z_N }[/math], gathered into the vector [math]\displaystyle{ \mathbf{z} }[/math]. Each [math]\displaystyle{ z_n }[/math] is a vector of length K with a single 1 indicating the component to which the [math]\displaystyle{ n^{th} }[/math] observation belongs, and K-1 zeroes.


  • Augmented likelihood: [math]\displaystyle{ p(\mathbf{y},\mathbf{z}|\Theta,K) = \prod_{n=1}^N p(y_n,z_n|\Theta,K) = \prod_{n=1}^N p(z_n|\Theta,K) p(y_n|z_n,\Theta,K) = \prod_{n=1}^N \prod_{k=1}^K w_k^{z_{nk}} Normal(y_n;\mu_k,\tau_k^{-1})^{z_{nk}} }[/math]
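
The augmented likelihood is simpler to evaluate than the observed one, because each indicator vector [math]\displaystyle{ z_n }[/math] selects a single component and the sum over k inside the logarithm disappears. The sketch below, with the illustrative name augmented_loglik, assumes that each [math]\displaystyle{ z_n }[/math] is given as a list of K zeroes and ones with exactly one 1.

```python
import math

def normal_logpdf(y, mu, tau):
    """Log density of Normal(y; mean=mu, variance=1/tau), tau being a precision."""
    return 0.5 * (math.log(tau) - math.log(2.0 * math.pi)) - 0.5 * tau * (y - mu) ** 2

def augmented_loglik(y, z, w, mu, tau):
    """log p(y, z | Theta, K) = sum_n sum_k z_nk [log w_k + log Normal(y_n; mu_k, 1/tau_k)].

    y: list of N observations; z: list of N one-hot lists of length K;
    w, mu, tau: lists of length K (weights, means, precisions).
    """
    total = 0.0
    for y_n, z_n in zip(y, z):
        for z_nk, w_k, mu_k, tau_k in zip(z_n, w, mu, tau):
            # only the component flagged by z_n contributes
            if z_nk == 1:
                total += math.log(w_k) + normal_logpdf(y_n, mu_k, tau_k)
    return total
```

With y = [0.0] and z = [[1, 0]], only the first component enters the result, whatever the parameters of the second one.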


  • Priors: we choose conjugate ones
    • for the parameters: [math]\displaystyle{ \forall k \; \mu_k | \tau_k \sim Normal(\mu_0,(\tau_0 \tau_k)^{-1}) }[/math] and [math]\displaystyle{ \forall k \; \tau_k \sim Gamma(\alpha,\beta) }[/math]
    • for the latent variables and the mixture weights: [math]\displaystyle{ \forall n \; z_n | \mathbf{w} \sim Multinomial_K(1,\mathbf{w}) }[/math] and [math]\displaystyle{ \mathbf{w} \sim Dirichlet(\gamma) }[/math]
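
To make the generative model concrete, here is a hedged Python sketch that draws parameters, latent variables and observations from the priors and the likelihood above. Several points are assumptions not stated on this page: the Gamma prior is read as shape [math]\displaystyle{ \alpha }[/math] and rate [math]\displaystyle{ \beta }[/math], [math]\displaystyle{ \gamma }[/math] is passed as a list of K positive concentrations, and the function name sample_from_prior and its seed argument are purely illustrative.

```python
import random

def sample_from_prior(N, K, mu_0, tau_0, alpha, beta, gamma, seed=None):
    """Draw (w, mu, tau, z, y) from the mixture-of-Normals generative model.

    Assumes Gamma(alpha, beta) is shape/rate and gamma is a list of K
    positive Dirichlet concentrations.
    """
    rng = random.Random(seed)
    # tau_k ~ Gamma(shape=alpha, rate=beta); gammavariate expects a scale, hence 1/beta
    tau = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(K)]
    # mu_k | tau_k ~ Normal(mu_0, (tau_0 * tau_k)^-1); gauss expects a standard deviation
    mu = [rng.gauss(mu_0, (tau_0 * tau_k) ** -0.5) for tau_k in tau]
    # w ~ Dirichlet(gamma), obtained by normalising independent Gamma(gamma_k, 1) draws
    g = [rng.gammavariate(gamma_k, 1.0) for gamma_k in gamma]
    g_sum = sum(g)
    w = [g_k / g_sum for g_k in g]
    z, y = [], []
    for _ in range(N):
        # z_n ~ Multinomial_K(1, w), stored as a one-hot list of length K
        k = rng.choices(range(K), weights=w)[0]
        z.append([1 if j == k else 0 for j in range(K)])
        # y_n | z_n ~ Normal(mu_k, 1/tau_k)
        y.append(rng.gauss(mu[k], tau[k] ** -0.5))
    return w, mu, tau, z, y
```

For example, sample_from_prior(N=100, K=3, mu_0=0.0, tau_0=0.1, alpha=2.0, beta=1.0, gamma=[1.0, 1.0, 1.0], seed=1859) returns one simulated data set together with the parameters and indicators that generated it.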