User:Nuri Purswani/NetworkReconstruction/Algorithms

From OpenWetWare
Algorithms for Biological Network Reconstruction from data


Algorithms

< THIS SECTION WILL BECOME METHODS

Synthetic Datasets

The State Space Representation

The general form of a linear state space model is given by:

[math]\displaystyle{ \dot{x}=Ax + Bu }[/math]
[math]\displaystyle{ y=Cx + Du }[/math]

Where:

  • [math]\displaystyle{ x }[/math] is a k-dimensional vector of hidden variables, with time series [math]\displaystyle{ x_{1:T} }[/math].
  • [math]\displaystyle{ y }[/math] is a p-dimensional vector of observed variables, with time series [math]\displaystyle{ y_{1:T} }[/math].
  • [math]\displaystyle{ u }[/math] is a d-dimensional vector of external driving inputs, with time series [math]\displaystyle{ u_{1:T} }[/math].
  • [math]\displaystyle{ A_{k \times k} }[/math] is the state matrix, describing the dynamics of the [math]\displaystyle{ k }[/math] hidden states.
  • [math]\displaystyle{ B_{k \times d} }[/math] is a matrix that describes the effects of the [math]\displaystyle{ d }[/math] inputs on the hidden states.
  • [math]\displaystyle{ C_{p \times k} }[/math] is an "emission" matrix, describing the relationship between the state and the output.
  • [math]\displaystyle{ D_{p \times d} }[/math] is a matrix that describes the direct effects of the [math]\displaystyle{ d }[/math] inputs on the observations (e.g. mRNA concentrations).
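
To make the roles of the four matrices concrete, the model above can be simulated numerically. The following is a minimal sketch using forward Euler integration. The function name `simulate_state_space`, the step size `dt` and the example matrices are illustrative assumptions, not values taken from this project.

```python
import numpy as np

def simulate_state_space(A, B, C, D, u, x0, dt):
    """Integrate dx/dt = A x + B u, y = C x + D u with forward Euler.

    u has shape (T, d); returns states of shape (T, k) and observations (T, p).
    """
    T = u.shape[0]
    x = np.zeros((T, A.shape[0]))
    y = np.zeros((T, C.shape[0]))
    x[0] = x0
    for t in range(T):
        # Observation at time t depends on the current state and input.
        y[t] = C @ x[t] + D @ u[t]
        if t + 1 < T:
            # Euler step for the hidden state.
            x[t + 1] = x[t] + dt * (A @ x[t] + B @ u[t])
    return x, y

# Hypothetical example: k = 2 hidden states, d = 1 input, p = 1 observation.
A = np.array([[-1.0, 0.0],
              [1.0, -2.0]])
B = np.array([[1.0],
              [0.0]])
C = np.array([[0.0, 1.0]])
D = np.zeros((1, 1))
u = np.ones((2000, 1))  # unit step input held for 20 time units
x, y = simulate_state_space(A, B, C, D, u, x0=np.zeros(2), dt=0.01)
```

For this stable example the observation settles at the value obtained by solving Ax + Bu = 0, which is 0.5 here.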

The gene expression input data was modelled using this framework, with variations that depended on the algorithm that was employed. The next section will explain how the state space representation can be applied to a real gene network.

The Ring Network

Consider a real network containing six genes that regulate each other as shown in Figure 1.A. If we assume that a real microarray experiment can measure only three of the six genes (Figure 1.B), this network translates to the state space representation in Eq (1).

Figure 1: (To do: change the figure to make it more representative of the equation.)
A. Each blue node represents a gene in this ring network.
B. The red nodes correspond to the measured states, and blue nodes correspond to the hidden states.

Since this particular example has three measured states and three hidden states, the system can be represented as follows:

[math]\displaystyle{ \begin{pmatrix}\dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3\end{pmatrix}= \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} }[/math]
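
As an illustration of how a six-gene ring splits into measured and hidden parts, the sketch below builds a full state matrix and partitions it. The figure is not reproduced here, so the direction of regulation, the choice of measured genes (`measured`) and the numerical values of `decay` and `coupling` are all assumptions made for the example.

```python
import numpy as np

n_genes = 6
decay, coupling = -1.0, 0.5  # assumed values, not taken from the source

# Ring: gene i is regulated by gene i-1 (gene 0 by gene 5).
A_full = decay * np.eye(n_genes)
for i in range(n_genes):
    A_full[i, (i - 1) % n_genes] = coupling

measured = [0, 2, 4]  # assumed choice of measured genes
hidden = [1, 3, 5]

# Output matrix that picks the measured genes out of the full state.
C = np.eye(n_genes)[measured]

# Reorder so measured genes come first, then split into blocks.
order = measured + hidden
A_perm = A_full[np.ix_(order, order)]
A11, A12 = A_perm[:3, :3], A_perm[:3, 3:]  # measured <- measured, measured <- hidden
A21, A22 = A_perm[3:, :3], A_perm[3:, 3:]  # hidden <- measured, hidden <- hidden
```

With this particular choice, `A11` contains only the decay terms: every measured gene is driven by a hidden gene, so all interactions between measured genes pass through the hidden states.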

Summary of datasets

Algorithms

Rangel et al 2004

Description

Read here for more detail about this method.

Input Requirements

  • Time series data
  • >20 repetitions of an experiment

Outputs

  • Estimate of network parameters
  • Estimate of the number of hidden variables

Assumptions

Capabilities

Limitations

Beal et al 2005

Stan, Gonçalves and Warnick 2008

Input Requirements

This algorithm is capable of dealing with:

  • Steady State data
  • Time series data

At present, it has only been tested on synthetically generated data sets, because no suitable experimental data are available. In principle, it can recover partial network structure from both steady state and time series data when experiments are performed in a systematic way, where:

  • If a network contains p measured species, p experiments must be performed.
  • Each experiment must independently control a measured species.
  • Each experiment requires an external step input.

Available biological datasets do not contain microarray expression data collected in this systematic way, so inputs for this algorithm have been artificially generated from ODE models. (Future work: to test the validity of this algorithm, it is essential to design experiments in this way, perhaps starting with synthetic networks and then moving to other systems.)
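
The experimental design described above can be mimicked on a synthetic ODE model. The sketch below is only an illustration: the four-state network, the function name `step_experiment` and the integration settings are assumptions, not the models used in this project. It runs one step-input experiment per measured species and collects the steady-state outputs.

```python
import numpy as np

def step_experiment(A, B, C, input_index, amplitude=1.0, dt=0.01, t_end=40.0):
    """Simulate dx/dt = A x + B u, y = C x for a step on one input; return y over time."""
    n_steps = int(round(t_end / dt))
    u = np.zeros(B.shape[1])
    u[input_index] = amplitude  # only this input is switched on
    x = np.zeros(A.shape[0])
    y = np.zeros((n_steps, C.shape[0]))
    for t in range(n_steps):
        y[t] = C @ x
        x = x + dt * (A @ x + B @ u)  # forward Euler step
    return y

# Assumed toy network: states 0-2 are measured, state 3 is hidden.
A = np.array([[-1.0, 0.0, 0.0, 0.8],
              [0.6, -1.0, 0.0, 0.0],
              [0.0, 0.7, -1.0, 0.0],
              [0.0, 0.0, 0.5, -1.0]])
C = np.eye(4)[:3]
B = np.eye(4)[:, :3]  # input j acts directly on measured species j only
p = C.shape[0]

# p experiments; column j holds the steady-state outputs of experiment j.
Y_ss = np.column_stack([step_experiment(A, B, C, j)[-1] for j in range(p)])
```

For a stable linear model the columns of `Y_ss` approach the steady-state gain, which can be computed directly as `-C @ np.linalg.solve(A, B)` and used as a check on the simulation.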

Dynamical Structure Reconstruction for LTI networks

Need to add more detail to the section

Outputs

  • Given steady state data in the relevant input format, it can recover the Boolean structure between observed nodes.
  • Given time series data, it can recover the dynamical structure function between observed nodes.
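
The steady-state case can be sketched numerically. The example below is not the authors' implementation. It assumes the steady-state relation Y = QY + PU between observed nodes, with P diagonal (each input controls exactly one measured species) and Q having a zero diagonal, so that the inverse of the steady-state gain G satisfies G^{-1} = P^{-1}(I - Q). The function name, the tolerance `tol` and the toy network are assumptions.

```python
import numpy as np

def reconstruct_steady_state_structure(G, tol=1e-8):
    """Recover Q(0), P(0) and the Boolean structure from the steady-state gain G.

    Assumes P is diagonal and Q has a zero diagonal, so G^{-1} = P^{-1} (I - Q).
    """
    M = np.linalg.inv(G)
    m_diag = np.diag(M)
    P = np.diag(1.0 / m_diag)    # M_ii = 1 / P_ii
    Q = -M / m_diag[:, None]     # Q_ij = -M_ij / M_ii for i != j
    np.fill_diagonal(Q, 0.0)
    boolean_structure = np.abs(Q) > tol
    return Q, P, boolean_structure

# Assumed toy network: states 0-2 are measured, state 3 is hidden.
A = np.array([[-1.0, 0.0, 0.0, 0.8],
              [0.6, -1.0, 0.0, 0.0],
              [0.0, 0.7, -1.0, 0.0],
              [0.0, 0.0, 0.5, -1.0]])
C = np.eye(4)[:3]
B = np.eye(4)[:, :3]             # input j acts on measured species j only
G = -C @ np.linalg.solve(A, B)   # steady-state gain from the p step experiments
Q, P, structure = reconstruct_steady_state_structure(G)
```

In this example the link from node 2 to node 0 passes through the hidden state, yet it appears as an edge between observed nodes, which is what "structure between observed nodes" means here. With noisy data the exact inverse and fixed threshold used in this sketch would need to be replaced by something more robust.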

Assumptions

  • Sparsity assumption: the sparsest network structure capable of producing the observed behaviour is taken to be the most likely candidate structure. This may not hold in biological networks, where connections are the result of previous generations of trial and error.
  • The method treats gene regulatory networks as linear time-invariant (LTI) systems.

Capabilities

  • From steady state data, it can recover the Boolean structure of the network between the observed nodes.
  • From time series data, it can recover both the Boolean structure and the dynamical structure function between the observed nodes.
  • It can cope with non-linearities of the form A/(B+z^n) (Hill functions), where z is a variable in the system.
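
For reference, the non-linearity mentioned in the last point can be written as a short function. The sketch below evaluates the term and its slope with respect to z. The slope is shown only to illustrate how such a term behaves locally around an operating point. The function names and the example values are assumptions.

```python
import numpy as np

def hill_term(z, A, B, n):
    """Evaluate the non-linearity A / (B + z**n) for concentration z >= 0."""
    z = np.asarray(z, dtype=float)
    return A / (B + z ** n)

def hill_slope(z, A, B, n):
    """Derivative of A / (B + z**n) with respect to z."""
    z = np.asarray(z, dtype=float)
    return -A * n * z ** (n - 1) / (B + z ** n) ** 2

# Hypothetical parameter values for illustration.
value = hill_term(1.0, A=2.0, B=1.0, n=2)
slope = hill_slope(1.0, A=2.0, B=1.0, n=2)
```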

LDST

Input Requirements

  • Data input must fit this particular structural model:
  • Matrix A must be stable. That is, all eigenvalues of the state matrix A must have magnitude less than 1.
  • The state of the system must be controllable.
  • The state of the system must be observable.
  • It does not require step inputs.
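
The three conditions on the model can be checked numerically with standard rank tests. The following is a minimal sketch, assuming a discrete-time model so that stability means eigenvalue magnitudes below 1. The function names and the example matrices are illustrative.

```python
import numpy as np

def is_stable_discrete(A):
    """True if every eigenvalue of A lies strictly inside the unit circle."""
    return bool(np.all(np.abs(np.linalg.eigvals(A)) < 1.0))

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(k-1) B] column-wise."""
    blocks = [B]
    for _ in range(A.shape[0] - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def observability_matrix(A, C):
    """Stack [C; CA; ...; C A^(k-1)] row-wise."""
    blocks = [C]
    for _ in range(A.shape[0] - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def check_requirements(A, B, C):
    """Report stability, controllability and observability of (A, B, C)."""
    k = A.shape[0]
    return {
        "stable": is_stable_discrete(A),
        "controllable": bool(np.linalg.matrix_rank(controllability_matrix(A, B)) == k),
        "observable": bool(np.linalg.matrix_rank(observability_matrix(A, C)) == k),
    }

# Hypothetical two-state example.
A = np.array([[0.5, 0.1],
              [0.0, 0.3]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
report = check_requirements(A, B, C)
```

Rank tests on these matrices are sensitive to scaling, so for larger or poorly conditioned systems the result should be treated with some caution.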

Outputs

  • A network structure containing estimates for the parameters A, B, C, D, R, Q and the dimension of the hidden space k.
  • A directed acyclic graph obtained by relevant hypothesis estimation from the CB+D matrix describing gene-gene interactions.
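
The role of the CB+D matrix can be seen by substitution. The sketch below assumes the feedback form of the model, in which the driving input at each step is the previous expression vector: x_t = A x_{t-1} + B y_{t-1} and y_t = C x_t + D y_{t-1}, with noise omitted. Under that assumption, y_t = CA x_{t-1} + (CB + D) y_{t-1}, so CB+D collects the direct and hidden-state-mediated effects of each gene on every other gene. The function names and random parameter values are illustrative, and the hypothesis testing step that turns this matrix into a graph is not shown.

```python
import numpy as np

def interaction_matrix(B, C, D):
    """Gene-gene interaction matrix CB + D, of shape (p, p)."""
    return C @ B + D

def one_step(A, B, C, D, x_prev, y_prev):
    """One noise-free step of the assumed feedback model."""
    x_next = A @ x_prev + B @ y_prev
    y_next = C @ x_next + D @ y_prev
    return x_next, y_next

# Hypothetical parameters: k = 2 hidden states, p = 3 genes.
rng = np.random.default_rng(0)
k, p = 2, 3
A = 0.5 * rng.standard_normal((k, k))
B = 0.5 * rng.standard_normal((k, p))
C = rng.standard_normal((p, k))
D = 0.5 * rng.standard_normal((p, p))
x_prev = rng.standard_normal(k)
y_prev = rng.standard_normal(p)

x_next, y_next = one_step(A, B, C, D, x_prev, y_prev)
M = interaction_matrix(B, C, D)
```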

Assumptions

These come from Beal et al 2005:

  • In the LDS model they use, the dynamics and output processes are assumed to be time invariant, which is not realistic.
  • The times at which data are sampled are not evenly spaced: there may be non-linearities in the transcriptional processes that are not captured.
  • Time slices may be missing, and the method cannot cope with stationary data.
  • Noise dynamics are assumed to be Gaussian, but this is not necessarily the case.


Capabilities

  • It can cope only with linear systems, not with non-linearities.
  • It is capable of inferring the dimension of the hidden space (whether the inferred dimension is correct is a separate question).

REVISIT THOSE POINTS IN THE DISCUSSION

Variational Bayesian

Inputs

Outputs

Assumptions

Capabilities