User:Timothee Flutre/Notebook/Postdoc/2012/01/02

==Learn about the multivariate Normal and matrix calculus==


''(Caution: this is my own quick-and-dirty tutorial; see the references at the end for presentations by professional statisticians.)''
 
* '''Motivation''': when we measure things, we often have to measure several properties for each item. For instance, we extract a sample of cells from each person and measure the expression level of all genes in that sample. We therefore have, for each person, a vector of measurements, which leads us to the world of multivariate statistics.


* '''Data''': we have <math>N</math> observations, denoted <math>X = (x_1, x_2, ..., x_N)</math>, each being of dimension <math>P</math>. This means that each <math>x_i</math> is a vector belonging to <math>\mathbb{R}^P</math>.

* '''Model''': we suppose that the <math>x_i</math> are independent and identically distributed according to a multivariate Normal distribution <math>N_P(\mu, \Sigma)</math>. <math>\mu</math> is the P-dimensional mean vector, and <math>\Sigma</math> the PxP covariance matrix. If <math>\Sigma</math> is positive definite (which we will assume), the density function for a given x is: <math>f(x/\mu,\Sigma) = (2 \pi)^{-P/2} |\Sigma|^{-1/2} exp(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu))</math>, with <math>|M|</math> denoting the determinant of a matrix and <math>M^T</math> its transpose.


* '''Likelihood''': as usual, we will start by writing down the likelihood of the data, the parameters being <math>\theta=(\mu,\Sigma)</math>:
<math>L(\theta) = \mathbb{P}(X/\theta)</math>
As the observations are independent:
<math>L(\theta) = \prod_{i=1}^N f(x_i / \theta)</math>
It is easier to work with the log-likelihood:
<math>l(\theta) = ln(L(\theta)) = \sum_{i=1}^N ln( f(x_i / \theta) )</math>


<math>l(\theta) = -\frac{NP}{2} ln(2\pi) - \frac{N}{2}ln(|\Sigma|) - \frac{1}{2} \sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu)</math>
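
As a sanity check on this formula, here is a small numerical sketch (assuming Python with numpy and scipy; the dimensions, seed and parameter values are arbitrary choices made for the sketch, not part of the entry). It simulates <math>N</math> observations from <math>N_P(\mu, \Sigma)</math> and compares the log-likelihood computed from the formula above with the value obtained by summing the log-densities returned by scipy.
<pre>
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1859)

# toy dimensions and parameters (arbitrary choices for this sketch)
N, P = 100, 3
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(P, P))
Sigma = A @ A.T + P * np.eye(P)      # ensures Sigma is positive definite

# simulate the N observations x_1, ..., x_N (one per row of X)
X = rng.multivariate_normal(mu, Sigma, size=N)

# log-likelihood computed directly from the formula above
diff = X - mu                        # row i contains (x_i - mu)^T
quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
loglik_formula = (- 0.5 * N * P * np.log(2 * np.pi)
                  - 0.5 * N * np.log(np.linalg.det(Sigma))
                  - 0.5 * quad.sum())

# same quantity obtained by summing the log-densities with scipy
loglik_scipy = multivariate_normal.logpdf(X, mean=mu, cov=Sigma).sum()

print(loglik_formula, loglik_scipy)  # the two numbers should agree
</pre>
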
* '''ML estimation''': as usual, to find the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum-likelihood estimates] of the parameters, we need to differentiate the log-likelihood with respect to each parameter, and then find the values of the parameters at which these derivatives are zero. However, in the case of multivariate distributions, this requires knowing a bit of [http://en.wikipedia.org/wiki/Matrix_calculus matrix calculus], which is not always straightforward...
... <TO DO> ...
* '''MLE method 1''': from Magnus and Neudecker (third edition, Part Six, Chapter 15, Section 3, p.353). First, they rewrite the log-likelihood, noting that <math>(x_i-\mu)^T \Sigma^{-1} (x_i-\mu)</math> is a scalar, i.e. a 1x1 matrix, and is therefore equal to its [http://en.wikipedia.org/wiki/Trace_%28linear_algebra%29 trace]:
<math>\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) = \sum_{i=1}^N tr( (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) )</math>
As the trace is invariant under cyclic permutations (<math>tr(ABC) = tr(BCA) = tr(CAB)</math>):
<math>\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) = \sum_{i=1}^N tr( \Sigma^{-1} (x_i-\mu) (x_i-\mu)^T )</math>
The trace is also a linear map (<math>tr(A+B) = tr(A) + tr(B)</math>):
<math>\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) = tr( \sum_{i=1}^N \Sigma^{-1} (x_i-\mu) (x_i-\mu)^T )</math>
And finally:
<math>\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) = tr( \Sigma^{-1} \sum_{i=1}^N (x_i-\mu) (x_i-\mu)^T )</math>
As a result:
<math>l(\theta) = -\frac{NP}{2} ln(2\pi) - \frac{N}{2}ln(|\Sigma|) - \frac{1}{2} tr(\Sigma^{-1} Z)</math> with <math>Z=\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T</math>
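
This rewriting with the trace is easy to check numerically; below is a minimal sketch (assuming numpy; the data are simulated with arbitrary values, purely for illustration) comparing the sum of quadratic forms with <math>tr(\Sigma^{-1} Z)</math>.
<pre>
import numpy as np

rng = np.random.default_rng(2012)

# arbitrary toy data, only used to check the identity
N, P = 50, 4
mu = rng.normal(size=P)
A = rng.normal(size=(P, P))
Sigma = A @ A.T + P * np.eye(P)      # positive definite
X = rng.multivariate_normal(mu, Sigma, size=N)

Sigma_inv = np.linalg.inv(Sigma)
diff = X - mu

# left-hand side: sum over i of (x_i - mu)^T Sigma^{-1} (x_i - mu)
lhs = np.einsum('ij,jk,ik->', diff, Sigma_inv, diff)

# right-hand side: tr(Sigma^{-1} Z) with Z = sum over i of (x_i - mu)(x_i - mu)^T
Z = diff.T @ diff
rhs = np.trace(Sigma_inv @ Z)

print(np.allclose(lhs, rhs))         # True: both expressions coincide
</pre>
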
We can now write the first [http://en.wikipedia.org/wiki/Differential_of_a_function differential] of the log-likelihood:
<math>d_{\theta}l(\theta) = - \frac{N}{2} d(ln(|\Sigma|)) - \frac{1}{2} d(tr(\Sigma^{-1} Z))</math>
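To go further, the continuation would presumably rely on standard identities of matrix differential calculus, such as the following two (both can be found in Magnus and Neudecker):
<math>d \, ln(|\Sigma|) = tr(\Sigma^{-1} \, d\Sigma)</math>
<math>d(\Sigma^{-1}) = - \Sigma^{-1} \, (d\Sigma) \, \Sigma^{-1}</math>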
... <TO DO> ...
* '''References''':
** Magnus and Neudecker, ''Matrix differential calculus with applications in statistics and econometrics'' (third edition, 2007)
** Wand, ''Vector differential calculus in statistics'' (The American Statistician, 2002)

