Project Description
Bayesian networks (BNs) are an area in which I want to develop a working knowledge. The aim of this project is to identify particular research directions that employ BNs or extend BN work.
Project Goals:
 Review BN literature
 Identify state-of-the-art algorithms and tools
 Discover applications of BNs in life sciences
Literature Review
Review of Bayesian Basics
Copied from Weng-Keen Wong, 2005.
Probability Primer
 A random variable refers to an event with some degree of uncertainty about the outcome of that event
 "Frequentist" definition: probability is the relative frequency with which an outcome occurs if the process is repeated a large number of times under similar conditions
 "Bayesian" definition: probability is the degree of belief in an outcome
 Conditional probability P(A = true | B = true)
 Out of all the outcomes in which B is true, what fraction also have A equal to true? Read: "probability of A conditioned on B" or "probability of A given B"
 E.g.
 H = "have a headache"
 F = "coming down with flu"
 P(H = true) = 1/10, P(F = true) = 1/40, P(H = true | F = true) = 1/2
 "Headaches are rare, flu is rarer, but if you're coming down with flu, there's a 50-50 chance you'll have a headache"
 Joint probability distribution P(A = true, B = true)
 The probability that both A = true and B = true.
 P(H = true | F = true) = P(H = true, F = true) / P(F = true): the probability that both occur divided by the probability of the conditioning event
 A joint distribution can involve any number of random variables, e.g. P(A = true, B = true, C = true)
 For every combination of values of the variables, you need to know how probable that combination is
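As a quick check of the headache/flu example, the sketch below uses only the three probabilities quoted in the notes and the definition P(H | F) = P(H, F) / P(F). It rearranges the definition to get the joint probability, then applies the same definition with H and F swapped to get P(F = true | H = true). Python's `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

# Values quoted in the headache/flu example.
p_h = Fraction(1, 10)          # P(H = true)
p_f = Fraction(1, 40)          # P(F = true)
p_h_given_f = Fraction(1, 2)   # P(H = true | F = true)

# Rearranging P(H | F) = P(H, F) / P(F) gives the joint probability.
p_h_and_f = p_h_given_f * p_f  # P(H = true, F = true)

# The same definition with H and F swapped: P(F | H) = P(H, F) / P(H).
p_f_given_h = p_h_and_f / p_h

print(p_h_and_f)    # 1/80
print(p_f_given_h)  # 1/8
```

Half of flu cases come with a headache, but only 1 in 8 headaches comes with flu, because headaches are four times as common as flu.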
A   B   C   P(A,B,C)
F   F   F   0.1
F   F   T   0.2
... ... ... ...
 The probabilities of these combinations need to sum to 1. Once you have the joint probability distribution, you can calculate any probability involving A, B and C.
 E.g. P(A=true) = sum of P(A,B,C) over the rows with A=true
 P(A=true, B=true | C=true) = P(A=true, B=true, C=true) / P(C=true)
 For k Boolean random variables you need a table of size 2^{k}
 Independence reduces the number of table entries.
 For n coin flips with joint distribution P(C_{1}, ..., C_{n}), if the coin flips are not independent you need 2^{n} table entries
 If they are independent, then P(C_{1}, ..., C_{n}) = Π^{n}_{i=1} P(C_{i})
 Each P(C_{i}) has two table entries, for a total of 2n values
 A and B are conditionally independent given C if any of the following hold:
 P(A,B | C) = P(A | C) P(B | C)
 P(A | B,C) = P(A | C)
 P(B | A,C) = P(B | C)
 Once C is known, also knowing A tells me nothing more about B. There are two possibilities: A doesn't influence B, or C already provides all the information that A would provide.
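The joint-table operations above (marginalizing, conditioning, and checking conditional independence) can be sketched in a few lines. The table here is not the one from the notes, whose remaining rows are elided: it is a hypothetical 2^{3}-row table built as P(C) P(A | C) P(B | C), so A and B are conditionally independent given C by construction. All numbers are made up for illustration.

```python
from itertools import product
import math

# Hypothetical numbers (not from the notes): the joint table is built as
# P(C) * P(A | C) * P(B | C), so A and B are conditionally independent given C.
p_c = {True: 0.4, False: 0.6}
p_a_given_c = {True: 0.7, False: 0.2}   # P(A = true | C = c)
p_b_given_c = {True: 0.5, False: 0.1}   # P(B = true | C = c)

def bern(p_true, value):
    """Probability that a Boolean variable takes `value` when P(true) = p_true."""
    return p_true if value else 1.0 - p_true

# Full joint table: one entry per (a, b, c) combination, 2**3 = 8 rows.
joint = {
    (a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
    for a, b, c in product([True, False], repeat=3)
}
assert math.isclose(sum(joint.values()), 1.0)  # rows must sum to 1

def prob(pred):
    """Sum the joint entries whose (a, b, c) row satisfies `pred`."""
    return sum(p for row, p in joint.items() if pred(*row))

# Marginal: sum the rows with A = true.
p_a = prob(lambda a, b, c: a)
# Conditional: P(A=true, B=true | C=true) = P(A=true, B=true, C=true) / P(C=true).
p_ab_given_c = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: c)

def cond_indep_a_b_given_c(tol=1e-12):
    """Check P(A,B | C) = P(A | C) P(B | C) for every value combination."""
    for a, b, c in product([True, False], repeat=3):
        pc = prob(lambda x, y, z: z == c)
        lhs = joint[(a, b, c)] / pc
        pa = prob(lambda x, y, z: x == a and z == c) / pc
        pb = prob(lambda x, y, z: y == b and z == c) / pc
        if abs(lhs - pa * pb) > tol:
            return False
    return True

print(round(p_a, 4))             # 0.4
print(round(p_ab_given_c, 4))    # 0.35
print(cond_indep_a_b_given_c())  # True

# Table sizes for n Boolean variables: full joint vs. fully independent.
n = 10
print(2 ** n, 2 * n)  # 1024 20
```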
Bayesian Networks
A Bayesian network is made up of:
 A directed acyclic graph
 A conditional probability distribution table for each node.
These tables contain the conditional probability distribution P(X_{i} | Parents(X_{i})) for each node X_{i} in the graph. This only includes the node's immediate parents, not higher ancestors. For Boolean variables, a node with k parents has a table of 2^{k+1} probabilities, but because P(X_{i} = true | ...) and P(X_{i} = false | ...) sum to 1 for each parent combination, only 2^{k} need to be stored.
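A minimal sketch of one node's table, assuming a hypothetical Boolean node X with k = 2 Boolean parents (U, V) and made-up probabilities: only the 2^{k} values P(X = true | parents) are stored, and the other 2^{k} entries are recovered as complements.

```python
from itertools import product

k = 2  # number of Boolean parents (hypothetical node X with parents U, V)

# Stored numbers: P(X = true | U = u, V = v), one per parent combination (2**k).
cpt_true = {
    (True, True): 0.9,
    (True, False): 0.6,
    (False, True): 0.3,
    (False, False): 0.05,
}

def p_x(x_value, parent_values):
    """P(X = x_value | parents); the false case is 1 - P(X = true | parents)."""
    p = cpt_true[parent_values]
    return p if x_value else 1.0 - p

# The full table over (x, u, v) has 2**(k + 1) entries, all derived from cpt_true.
full_table = {
    (x, parents): p_x(x, parents)
    for x in (True, False)
    for parents in product((True, False), repeat=k)
}
print(len(cpt_true), len(full_table))  # 4 8
```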
Properties:
 Encodes conditional independence relationships between variables in its graph structure
 Compact representation of the joint probability distribution over the variables
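To make "compact" concrete, the sketch below counts independent numbers for a hypothetical five-node DAG over Boolean variables, using 2^{k} stored values per node with k parents, and compares that with the 2^{n} - 1 independent entries of a full joint table. The graph and the function names are illustrative assumptions.

```python
# Hypothetical DAG over Boolean variables: each node maps to its list of parents.
parents = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["B", "C"],
    "E": ["D"],
}

def bn_parameter_count(parents):
    """Independent numbers stored by a BN: 2**k per Boolean node with k parents."""
    return sum(2 ** len(ps) for ps in parents.values())

def full_joint_parameter_count(n):
    """Independent numbers in a full joint table over n Boolean variables."""
    return 2 ** n - 1  # 2**n entries, minus one because they sum to 1

print(bn_parameter_count(parents), full_joint_parameter_count(len(parents)))  # 11 31

# A chain X1 -> X2 -> ... -> X30 shows how the gap grows with n.
chain = {f"X{i}": ([f"X{i-1}"] if i > 1 else []) for i in range(1, 31)}
print(bn_parameter_count(chain), full_joint_parameter_count(30))  # 59 1073741823
```

The saving depends on the graph being sparse: a node with many parents still needs an exponentially large table.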
Conditional independence (the Markov condition): given its parents, a node X is conditionally independent of its non-descendants. Using the Markov condition, we can compute the joint probability distribution over all variables in the BN as:
 P(X_{1}=x_{1}, ..., X_{n}=x_{n}) = Π^{n}_{i=1} P(X_{i}=x_{i} | Parents(X_{i}))
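The factorization can be sketched directly: multiply one CPT entry per node. The network A -> B, B -> C, B -> D and its numbers are hypothetical, and the dictionary layout (`parents`, `cpt`) is just one convenient encoding, not a library API.

```python
from itertools import product

# Hypothetical network A -> B, B -> C, B -> D over Boolean variables
# (structure and numbers are illustrative, not taken from the notes).
parents = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",)}
# cpt[node][parent_values] = P(node = true | parents = parent_values)
cpt = {
    "A": {(): 0.3},
    "B": {(True,): 0.8, (False,): 0.1},
    "C": {(True,): 0.6, (False,): 0.2},
    "D": {(True,): 0.7, (False,): 0.05},
}

def joint_probability(assignment):
    """P(X1=x1, ..., Xn=xn) = product over i of P(Xi = xi | Parents(Xi))."""
    result = 1.0
    for node, ps in parents.items():
        p_true = cpt[node][tuple(assignment[p] for p in ps)]
        result *= p_true if assignment[node] else 1.0 - p_true
    return result

example = {"A": True, "B": True, "C": False, "D": True}
print(round(joint_probability(example), 4))  # 0.3 * 0.8 * 0.4 * 0.7 = 0.0672

# Sanity check: the factorized joint sums to 1 over all 2**4 assignments.
total = sum(
    joint_probability(dict(zip(parents, values)))
    for values in product([True, False], repeat=len(parents))
)
print(round(total, 10))  # 1.0
```

The final sum over all 16 assignments is a cheap check that the CPTs define a valid joint distribution.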
Inference:
Using a BN to compute probabilities is called inference. The usual form is P(X | E), where X = query variables and E = evidence variables.
Exact inference is feasible in small to medium-sized networks; large networks require approximate techniques. A network can also have many unobserved values.
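The sketch below answers a query P(X | E) in two ways on a hypothetical four-node network: exact inference by enumeration (summing the factorized joint over all hidden variables) and rejection sampling, one simple approximate technique. Both are standard illustrations chosen here; the notes do not say which exact or approximate methods the tutorial covers, and the network and numbers are made up.

```python
import random
from itertools import product

# Hypothetical network A -> B, B -> C, B -> D (illustrative numbers only).
parents = {"A": (), "B": ("A",), "C": ("B",), "D": ("B",)}
cpt = {  # cpt[node][parent_values] = P(node = true | parents)
    "A": {(): 0.3},
    "B": {(True,): 0.8, (False,): 0.1},
    "C": {(True,): 0.6, (False,): 0.2},
    "D": {(True,): 0.7, (False,): 0.05},
}
nodes = list(parents)  # insertion order lists parents before children

def joint(assign):
    """Product of P(Xi = xi | Parents(Xi)) over all nodes."""
    p = 1.0
    for n in nodes:
        pt = cpt[n][tuple(assign[q] for q in parents[n])]
        p *= pt if assign[n] else 1.0 - pt
    return p

def enumerate_query(query, evidence):
    """Exact P(query = true | evidence): sum the joint over all hidden variables."""
    hidden = [n for n in nodes if n != query and n not in evidence]
    weights = {True: 0.0, False: 0.0}
    for q_val in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assign = {**evidence, query: q_val, **dict(zip(hidden, values))}
            weights[q_val] += joint(assign)
    return weights[True] / (weights[True] + weights[False])

def rejection_sample(query, evidence, n_samples, rng):
    """Approximate P(query = true | evidence) by forward sampling and
    discarding samples that disagree with the evidence."""
    kept = hits = 0
    for _ in range(n_samples):
        s = {}
        for n in nodes:  # each node is sampled after its parents
            s[n] = rng.random() < cpt[n][tuple(s[q] for q in parents[n])]
        if all(s[e] == v for e, v in evidence.items()):
            kept += 1
            hits += s[query]
    return hits / kept

exact = enumerate_query("A", {"D": True})
approx = rejection_sample("A", {"D": True}, 100_000, random.Random(0))
print(round(exact, 4), round(approx, 2))
```

Enumeration visits every assignment of the hidden variables, so its cost grows exponentially with their number, which is why large networks call for approximate methods. Here the exact answer is about 0.68 and the sampled estimate should land close to it; rejection sampling throws away every sample that contradicts the evidence, so it degrades when the evidence is unlikely.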
Design:
Either a domain expert designs the BN structure, or you can try to learn it from data.
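Learning structure from data is often done with a score-based search. As a minimal sketch of that idea (the BIC score, the synthetic data, and the two-candidate search are all assumptions for illustration), the code below scores "A -> B" against "no edge" on data generated so that B depends on A.

```python
import math
import random

# Hypothetical synthetic data in which B depends on A (illustrative numbers).
rng = random.Random(42)
data = []
for _ in range(1000):
    a = rng.random() < 0.5
    b = rng.random() < (0.9 if a else 0.2)
    data.append((a, b))

def bern_loglik(count_true, count_total):
    """Log-likelihood of Boolean observations at the maximum-likelihood P(true)."""
    ll = 0.0
    for c in (count_true, count_total - count_true):
        if c > 0:
            ll += c * math.log(c / count_total)
    return ll

def bic_score(data, b_has_parent_a):
    """BIC = log-likelihood at the ML parameters - (num_params / 2) * log(N)."""
    n = len(data)
    n_a = sum(a for a, _ in data)
    ll = bern_loglik(n_a, n)  # P(A): 1 parameter
    params = 1
    if b_has_parent_a:  # structure A -> B: one P(B | A = a) per value of A
        for a_val in (True, False):
            rows = [b for a, b in data if a == a_val]
            ll += bern_loglik(sum(rows), len(rows))
        params += 2
    else:  # structure with no edge: A and B independent
        ll += bern_loglik(sum(b for _, b in data), n)
        params += 1
    return ll - params / 2 * math.log(n)

scores = {"no edge": bic_score(data, False), "A -> B": bic_score(data, True)}
best = max(scores, key=scores.get)
print(best)  # A -> B
```

B -> A would receive exactly the same BIC score as A -> B, since both encode the same independencies; practical structure learners search over many candidate DAGs rather than two.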
