User:Nuri Purswani/Network/Introduction/Types

From OpenWetWare
Revision as of 06:09, 14 June 2010 by Nuri Purswani (talk | contribs) (→‎Ordinary Differential Equations)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigationJump to search
Algorithms for Biological Network Reconstruction from data

Home Introduction Synthetic Datasets Methods Algorithms Results Discussion References
Introduction

The task of biological network inference can be done at different scales (Bansal et al. 2006) :

  • Inference of gene-sequence interactions.
  • Inference of gene-gene interactions.

The methods considered in this project focus on gene-gene interactions, although the implementations are defined in a more general fashion, suitable for solving any system identification problem. The four main types of existing network reconstruction algorithms, along with their assumptions and limitations are summarised below:

Clustering

The principle between this type of method is that genes that are co-expressed are likely to be functionally related. Quantification of "relatedness" between genes requires calculation employing some form of distance measure. Common distance measures include: eucledian distance, pearson correlation, spellman-rank...(Bansal et al. 2007). The main limitations to these methods include:

  • Their inability to precisely infer gene-gene interactions: genes may be co-expressed, but clustering methods are unable to identify the precise interactions between genes.
  • "There is no one size that fits all": Clustering methods impose a particular structure on the dataset of interest[3]. So out of the various published algorithms, there is no guarantee that they will return the correct structure of the datasets.

Bayesian Networks

Bayesian networks are graph based models that capture relations of conditional independence between variables (Ben-Gal et al. 2007). Markovian assumptions apply to them (Markov 1971), which states that any given node in a directed acyclic graph is independent of its non-descendants, given its parents. This means that:

  • Static bayesian networks (i.e. networks that infer biological relationships from steady state data) are limited in their usefulness to interpret biological information. One of the common characteristics of biological processes is the presence of feedback, which is not captured by this rule of conditional independence.
  • Dynamic bayesian networks overcome this problem, by using time-series data as input.
  • Advantage-they are able to cope with noisy data and stochasticity

Bayesian inference algorithms commonly utilise optimisation procedures such as simulated annealing, expectation maximization algorithm, etc, to maximise a score of observing a particular data structure given a set of parameters. Following from Bayes' rule:

[math]\displaystyle{ P(G|D)= \frac{P(D|G)*P(G)}{P(D)} }[/math]

Where [math]\displaystyle{ P(G|D) }[/math] is the score we are trying to maximise for a parameter [math]\displaystyle{ G }[/math] given the observed data [math]\displaystyle{ D }[/math]. Scoring schemes also include steps that penalize connections such as the bayesian information criterion, aic and dirichlet (Bansal et al. 2007) These will be re-visited again in the next section.

Information Theory

Information theoretic approaches use the mutual information measure to quantify dependences between a pair of genes i and j. Depending on the value of the threshold set, the edges of the dependency are set to zero or 1. The general formula for mutual information is:

[math]\displaystyle{ M_{ij}=H_i+H_j-H_{ij} }[/math]

where [math]\displaystyle{ H_i }[/math] is the entropy of a given quantity:

[math]\displaystyle{ -\sum_{i=1}^{n}p(x_i)log(p(x_i)) }[/math]

Common algorithms that use this network inference method include ARACNE, CLR and MRNET (Olsen et al. 2008).

Ordinary Differential Equations

These involve inference of parameters [math]\displaystyle{ A, B, C, D }[/math] for models in the form:

[math]\displaystyle{ \frac{dx}{dt}=Ax+By }[/math]

[math]\displaystyle{ \frac{dy}{dt}=Cy+Dx }[/math]

where [math]\displaystyle{ x }[/math] and [math]\displaystyle{ y }[/math] are species in the biological network we are trying to infer. See Alche-Buc et al. 2007. and the literature review