The Bayesian Approach

The Bayesian Philosophy

The other way of looking at data is the Bayesian approach. It is still a controversial interpretation of the concept of probability –if you read the chapter I have posted, you will see that in my opinion it is the only consistent, honest interpretation. A Bayesian probability is commonly seen as a ‘degree of belief’ and a distribution is seen as a summary of what we know about a phenomenon. At the heart of the theory lies a theorem (Bayes’ theorem) and an update rule.

• What is Bayes’ theorem?
• What is the meaning of the terms involved in the theorem?
• What is Bayes’ update rule? What is its meaning?

An Example of Bayesian Parameter Estimation

One of the easiest applications of Bayesian analysis is the estimation of the parameters of a statistical distribution. To see how it is done let us deal with a simple example: the exponential distribution

• What is the exponential distribution of parameter λ?
• Write the general expression for likelihood of the sample vector X=(x1,…,xN)
• It is customary to assign to scale parameters such as λ a prior distribution P(λ)=1/ λ. Write the posterior of λ
• Find the general expression of the maximum of the posterior
• Bonus Question: P(λ)=1/ λ is not normalised… What can be done about it?
• Draw 10 samples x1,…,x10 from the exponential distribution λ =1
• For these 10 samples plot the posterior of λ
• Calculate its maximum, its expected value and its standard deviation

• Zheng Da Clinton Goh 11:25, 12 August 2008 (EDT):This is the newest file of what I've done. Apparently the part where you have to draw samples from a Gaussian of mean 1 and std 1 was done wrongly, but the main parts of the exercise are at the back. Will check on this. I've also successfully created a plot of posterior of mean and std for a Gaussian (looks quite cool), and marginalised it the easy way! Hard way is to do the actual integral, but there's not a need for this cause we're using MATLAB =). Enjoy!

Influence of the Quantity of Data and the Hypothesis

• Re-apply the analysis with 50 samples, 100 samples.
• What do you observe?
• I want to estimate λ with a 5% error: how many samples do I need?
• Now, Draw 10, 50, 100 samples with a Gaussian distribution of mean 1 and standard deviation 1.
• Apply the previous Bayesian analysis to these samples
• What do you observe?

Interlude: The Bayesian Philosophy Pt 2

As you must surely have realised there are several ways to estimate the parameters from their posterior distribution. This may seem confusing but there are very good reasons for this. The first reason is that although the Bayesian approach is closely linked to decision theory, it is not solely concerned with reaching decisions. The main purpose is the construction of the posterior distributions –the summaries of what we know. The posteriors will be updated upon arrival of new data. There is therefore no point in reaching a decision – which inevitably leads to a loss of knowledge – until all the collectable data are in and all the possibilities are estimated. Secondly, a decision of the kind we are interested in for our iGEM project is not the only possible application of the construction of the posterior. For instance the construction of the posterior might be a simple intermediary step in a more complex algorithm - the Bayesian approach is highly modular. Imagine we conduct a small experiment in order to learn something about a phenomenon – say the growth rate of a colony. If the experiment is very thorough and very well-controlled we can reasonably hope to obtain a reliable estimate of the growth rate. Else, it is dangerous to trust the data too much. On the other hand, it would be wasteful to get rid of the data altogether. But we can still use them as a source of prior knowledge. In practice, the posterior of the growth rate is then used as a prior in the new, more complex process. The lesson of the story therefore is:

• The posterior is what matters
• Be flexible
• Do not make ‘hard’ choices unless and until you are forced to

Bibliographical Research on Bayesian Parameter Estimation

Now for a problem like ours we need to reach a decision at some point! As you must surely have realised there are several ways to estimate the parameters from their posterior distribution. This may seem confusing but there is a very good reason for this: there are several ways to estimate the parameters because there are several ways to ask the question ‘what is the parameter?’. In decision theory the outcome to the decision process is characterised by a series of desirable properties – the properties of interest being encoded into a utility/loss function.

• What is a utility/loss function?
• What is a Bayesian estimator?
• What is MAP? What are the pros and cons of the method?

Note: The use of a Bayesian estimator is the approach that I encourage you to adopt, but if you find another -justifiable- way to ask the question ‘what is the parameter?’ to the posterior and you can get an answer , by all means try!

A More Complex Example of Bayesian Parameter Estimation

The exponential distribution only had one parameter to estimate. With several parameters things get a it more complicated. Now let us deal with a slightly more complex example: the Gaussian distribution. Let us call m and σ its parameters. We assign to σ a prior distribution P(σ)=1/ σ and to m a uniform distribution P(m)=1. Again the normalisation constants are overlooked

• Write the general expression for likelihood of the sample vector X
• Write the posterior of (m,σ) - Apply MAP to the posterior of (m,σ)
• Draw 10, 50, 100 samples X=(x1,…,xN) from a Gaussian of mean 10, deviation 5. For these samples plot the posterior of (m,σ).
• Calculate its maximum, its expected value and its Hessian

So far, so good: we have simply gone from one dimension to two. However, there is another possibility: we can simply ring ourselves back to one dimension for each of the parameter of the model through marginalisation of the other parameters.

• What is marginalisation?
• Marginalise σ to get the posterior of m
• For the samples you have drawn, plot the Posterior of m.
• And estimate its maximum, mean and standard deviation
• Do the same with the posterior of σ

Bayesian Model Comparison

The real power of the Bayesian approach does not lie in the way it estimates parameters – it is often close the standard way- but in the systematic way it performs higher-level operations such as model comparison. A beautiful framework was developed by D.McKay - See McKay’s book chapters 27 and 28- that explains among other things one of the oldest principles of science: Occam’s razor

• What is Occam’s razor?

To see how this works, let us go back to the example of the exponential distribution. Use for the data D the 50 samples you have drawn

• Estimate the evidence of your data D – use a Laplace approximation
• Now consider a uniform distribution between 0 and 10. What is the evidence of the data for such a model?
• Which one is the better model? (please prove it is the exponential…)

The estimation of the evidence can be made much more simpler ( to the price of precision of course) by considering the degenerate case and fix the parameters to their estimated value

• What is the Evidence?
• Is the exponential model still the better model?

We now have a way to match models to data and rate the models versus these data…We have everything we need!

Multilayered Bayesian Data Analysis

We are now going to combine all the bits that you have developed into a bi-layered Bayesian data analysis. At the first layer, the routine will associate to a model of your choice the best parameters. At the second layer it will determine among a library of model which one is best supported by the data.

Generation of Data First you will need some data

• Generate motility data according to the model of your choice
• Split them into velocity, run-time, rotation-time and angle
• Do the same with another – unrealistic model.

First Layer We are only going to analyse the velocity during the run-phase.

• Identify several plausible models for the velocity during the run-phase.
• Estimate the parameter(s) for each model
• Estimate the reliability of your predictions

Second Layer For the velocity during the run-phase:

• Compute the Evidence for each candidate model
• Determine the best supported model

Design of Experiment While we are at it, let us have a look at the design of experiment. We want to estimate the parameters with a 10% precision.

• What does it mean in terms of posterior?
• For some of your candidate models, how many samples do you need to achieve such precision? Hint: Use synthetic data.

Unfortunately we cannot be sure of the quality of the data. Let us have a look at how bad data degrade the quality of the first layer.

• Mix the data generated by your realistic model with data generated with your unrealistic model (use a ratio good/bad of 10 , 5 then 2)
• Under the assumption of the correct model, estimate the parameters of the model. Compute the corresponding Evidence.