User:Timothee Flutre/Notebook/Postdoc/2012/01/02

==Learn about the multivariate Normal and matrix calculus==


''(Caution: this is my own quick-and-dirty tutorial; see the references at the end for presentations by professional statisticians.)''
 
* '''Motivation''': when we measure things, we often have to measure several properties for each item. For instance, we extract a sample of cells from each person and measure the expression level of all genes in that sample. We therefore have, for each person, a vector of measurements, which leads us to the world of multivariate statistics.


* '''Data''': we have <math>N</math> observations, denoted <math>X = (x_1, x_2, ..., x_N)</math>, each being of dimension <math>P</math>. This means that each <math>x_i</math> is a vector belonging to <math>\mathbb{R}^P</math>.

* '''Model''': we suppose that the <math>x_i</math> are independent and identically distributed according to a multivariate Normal distribution <math>N_P(\mu, \Sigma)</math>. <math>\mu</math> is the P-dimensional mean vector, and <math>\Sigma</math> the PxP covariance matrix. If <math>\Sigma</math> is positive definite (which we will assume), the density function for a given x is: <math>f(x/\mu,\Sigma) = (2 \pi)^{-P/2} |\Sigma|^{-1/2} exp(-\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu))</math>, with <math>|M|</math> denoting the determinant of a matrix and <math>M^T</math> its transpose.


* '''Likelihood''': as usual, we will start by writing down the likelihood of the data, the parameters being <math>\theta=(\mu,\Sigma)</math>:
<math>L(\theta) = \mathbb{P}(X/\theta)</math>
As the observations are independent:
<math>L(\theta) = \prod_{i=1}^N f(x_i / \theta)</math>
It is easier to work with the log-likelihood:
<math>l(\theta) = ln(L(\theta)) = \sum_{i=1}^N ln( f(x_i / \theta) )</math>


<math>l(\theta) = -\frac{NP}{2} ln(2\pi) - \frac{N}{2}ln(|\Sigma|) - \frac{1}{2} \sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu)</math>
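
As a sanity check on this formula, here is a small numerical sketch (assuming Python with numpy and scipy; the dimensions, seed and parameter values are arbitrary choices made for the sketch, not part of the entry). It simulates <math>N</math> observations from <math>N_P(\mu, \Sigma)</math> and compares the log-likelihood computed from the formula above with the value obtained by summing the log-densities returned by scipy.
<pre>
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1859)

# toy dimensions and parameters (arbitrary choices for this sketch)
N, P = 100, 3
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(P, P))
Sigma = A @ A.T + P * np.eye(P)      # ensures Sigma is positive definite

# simulate the N observations x_1, ..., x_N (one per row of X)
X = rng.multivariate_normal(mu, Sigma, size=N)

# log-likelihood computed directly from the formula above
diff = X - mu                        # row i contains (x_i - mu)^T
quad = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)
loglik_formula = (- 0.5 * N * P * np.log(2 * np.pi)
                  - 0.5 * N * np.log(np.linalg.det(Sigma))
                  - 0.5 * quad.sum())

# same quantity obtained by summing the log-densities with scipy
loglik_scipy = multivariate_normal.logpdf(X, mean=mu, cov=Sigma).sum()

print(loglik_formula, loglik_scipy)  # the two numbers should agree
</pre>
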
* '''ML estimation''': as usual, to find the [http://en.wikipedia.org/wiki/Maximum_likelihood maximum-likelihood estimates] of the parameters, we need to differentiate the log-likelihood with respect to each parameter, and then find the values of the parameters at which these derivatives are zero. However, in the case of multivariate distributions, this requires knowing a bit of [http://en.wikipedia.org/wiki/Matrix_calculus matrix calculus], which is not always straightforward...
... <TO DO> ...
* '''MLE method 1''': from Magnus and Neudecker (third edition, Part Six, Chapter 15, Section 3, p.353). First, they rewrite the log-likelihood, noting that <math>(x_i-\mu)^T \Sigma^{-1} (x_i-\mu)</math> is a scalar, i.e. a 1x1 matrix, and is therefore equal to its [http://en.wikipedia.org/wiki/Trace_%28linear_algebra%29 trace]:
<math>\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) = \sum_{i=1}^N tr( (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) )</math>
As the trace is invariant under cyclic permutations (<math>tr(ABC) = tr(BCA) = tr(CAB)</math>):
<math>\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) = \sum_{i=1}^N tr( \Sigma^{-1} (x_i-\mu) (x_i-\mu)^T )</math>
The trace is also a linear map (<math>tr(A+B) = tr(A) + tr(B)</math>):
<math>\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) = tr( \sum_{i=1}^N \Sigma^{-1} (x_i-\mu) (x_i-\mu)^T )</math>
And finally:
<math>\sum_{i=1}^N (x_i-\mu)^T \Sigma^{-1} (x_i-\mu) = tr( \Sigma^{-1} \sum_{i=1}^N (x_i-\mu) (x_i-\mu)^T )</math>
As a result:
<math>l(\theta) = -\frac{NP}{2} ln(2\pi) - \frac{N}{2}ln(|\Sigma|) - \frac{1}{2} tr(\Sigma^{-1} Z)</math> with <math>Z=\sum_{i=1}^N(x_i-\mu)(x_i-\mu)^T</math>
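
This rewriting with the trace is easy to check numerically; below is a minimal sketch (assuming numpy; the data are simulated with arbitrary values, purely for illustration) comparing the sum of quadratic forms with <math>tr(\Sigma^{-1} Z)</math>.
<pre>
import numpy as np

rng = np.random.default_rng(2012)

# arbitrary toy data, only used to check the identity
N, P = 50, 4
mu = rng.normal(size=P)
A = rng.normal(size=(P, P))
Sigma = A @ A.T + P * np.eye(P)      # positive definite
X = rng.multivariate_normal(mu, Sigma, size=N)

Sigma_inv = np.linalg.inv(Sigma)
diff = X - mu

# left-hand side: sum over i of (x_i - mu)^T Sigma^{-1} (x_i - mu)
lhs = np.einsum('ij,jk,ik->', diff, Sigma_inv, diff)

# right-hand side: tr(Sigma^{-1} Z) with Z = sum over i of (x_i - mu)(x_i - mu)^T
Z = diff.T @ diff
rhs = np.trace(Sigma_inv @ Z)

print(np.allclose(lhs, rhs))         # True: both expressions coincide
</pre>
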
We can now write the first [http://en.wikipedia.org/wiki/Differential_of_a_function differential] of the log-likelihood:
<math>d_{\theta}l(\theta) = - \frac{N}{2} d(ln(|\Sigma|)) - \frac{1}{2} d(tr(\Sigma^{-1} Z))</math>
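To go further, the continuation would presumably rely on standard identities of matrix differential calculus, such as the following two (both can be found in Magnus and Neudecker):
<math>d \, ln(|\Sigma|) = tr(\Sigma^{-1} \, d\Sigma)</math>
<math>d(\Sigma^{-1}) = - \Sigma^{-1} \, (d\Sigma) \, \Sigma^{-1}</math>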
... <TO DO> ...
* '''References''':
** Magnus and Neudecker, ''Matrix differential calculus with applications in statistics and econometrics'' (third edition, 2007)
** Wand, ''Vector differential calculus in statistics'' (The American Statistician, 2002)

