Algorithms for Biological Network Reconstruction from data

< THIS SECTION WILL BECOME METHODS

Synthetic Datasets

The State Space Representation

The general form of a state space model is given by:

[math]\displaystyle{ \dot{x}=Ax + Bu }[/math]
[math]\displaystyle{ y=Cx + Du }[/math]

Where:

[math]\displaystyle{ x }[/math] is a k-dimensional vector of hidden variables, observed over time as [math]\displaystyle{ x_{1:T} }[/math].
[math]\displaystyle{ y }[/math] is a p-dimensional vector of observed variables, [math]\displaystyle{ y_{1:T} }[/math].
[math]\displaystyle{ u }[/math] is a d-dimensional vector of external driving inputs, [math]\displaystyle{ u_{1:T} }[/math].
[math]\displaystyle{ A_{k \times k} }[/math] is the state matrix, describing the dynamics of the [math]\displaystyle{ k }[/math] hidden states.
[math]\displaystyle{ B_{k \times d} }[/math] is a matrix describing the effect of the [math]\displaystyle{ d }[/math] inputs on the hidden states.
[math]\displaystyle{ C_{p \times k} }[/math] is an "emission" matrix, describing the relationship between the hidden state and the output.
[math]\displaystyle{ D_{p \times d} }[/math] is a matrix describing the direct effect of the [math]\displaystyle{ d }[/math] inputs on the observations (e.g. mRNA concentrations).
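To make the roles of the four matrices concrete, the following is a minimal sketch that integrates the model above with a forward Euler step. The function name `simulate_state_space`, the step size `dt` and the numerical values are illustrative assumptions, not taken from any of the algorithms discussed on this page.

```python
import numpy as np

def simulate_state_space(A, B, C, D, u, x0, dt):
    """Integrate x' = Ax + Bu with forward Euler and emit y = Cx + Du.

    u has shape (T, d); returns hidden states (T, k) and observations (T, p).
    """
    T = u.shape[0]
    k, p = A.shape[0], C.shape[0]
    x = np.zeros((T, k))
    y = np.zeros((T, p))
    x[0] = x0
    for t in range(T):
        y[t] = C @ x[t] + D @ u[t]
        if t + 1 < T:
            x[t + 1] = x[t] + dt * (A @ x[t] + B @ u[t])
    return x, y

# Illustrative values only: k = 2 hidden states, p = 1 observation, d = 1 input.
A = np.array([[-1.0, 0.0],
              [1.0, -2.0]])
B = np.array([[1.0],
              [0.0]])
C = np.array([[0.0, 1.0]])
D = np.zeros((1, 1))
u = np.ones((2000, 1))  # unit step input held for the whole run
x, y = simulate_state_space(A, B, C, D, u, x0=np.zeros(2), dt=0.01)
```

Forward Euler is used only because it is short; a dedicated ODE solver would be a better choice for stiff or non-linear models.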

The gene expression input data were modelled using this framework, with variations depending on the algorithm employed. The next section explains how the state space representation can be applied to a real gene network.

The Ring Network

Consider a network of six genes that regulate each other as shown in Figure 1.A. If we assume that a real microarray experiment can measure only three of the six genes (Figure 1.B), the network translates to the state space representation in Eq (1).

Figure 1:
(To do: change the figure to make it more representative of the equation.)
A. Each blue node represents a gene in this ring network.
B. The red nodes correspond to the measured states, and blue nodes correspond to the hidden states.

Since this particular example has three hidden states and three measured states, the system can be represented as follows:

[math]\displaystyle{ \begin{pmatrix}\dot{x}_1 \\ \dot{x}_2 \\ \dot{x}_3\end{pmatrix}= \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + Bu }[/math]
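As a complementary view, the ring network can also be written with all six genes in the state vector and a selection matrix that picks out the measured ones. The sketch below assumes a hypothetical choice of measured genes, a unit degradation rate and a regulation strength of 0.5; none of these values come from Figure 1.

```python
import numpy as np

n_genes = 6
measured = [0, 2, 4]  # hypothetical choice of the three measured genes

# Ring topology: gene i is regulated by gene i-1 (wrapping around),
# and each gene degrades at unit rate so that the linear system is stable.
A_full = -1.0 * np.eye(n_genes)
for i in range(n_genes):
    A_full[i, (i - 1) % n_genes] = 0.5

# Selection matrix: each row picks out one measured gene.
C_full = np.zeros((len(measured), n_genes))
for row, gene in enumerate(measured):
    C_full[row, gene] = 1.0
```

Multiplying `C_full` by the six-dimensional state returns only the three measured expression levels; the remaining genes play the role of hidden states.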

Summary of datasets

Algorithms

Rangel et al 2004

Description

Read here for more detail about this method.

Input Requirements

Time series data
>20 repetitions of an experiment

Outputs

Estimate of network parameters
Estimate of the number of hidden variables

Assumptions

Capabilities

Limitations

Beal et al 2005

Stan, Gonçalves and Warnick 2008

Input Requirements

This algorithm is capable of dealing with:

Steady State data
Time series data

At present, it has only been tested on synthetically generated datasets, because suitable experimental data are not available. In principle, it can recover partial network structure from both steady state and time series data when experiments are performed in a systematic way, where:

If a network contains p measured species, p experiments must be performed.
Each experiment must independently control one measured species.
External step inputs are required.

Available biological datasets do not contain microarray expression data collected in this systematic way, so inputs for this algorithm have been artificially generated from these ODE models. (Future work point: to test the validity of this algorithm, it is essential to design experiments in this way. Maybe start with synthetic networks and then move to something else.)
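The experimental design described above can be sketched on a synthetic linear network as follows. The function name `step_experiments`, the convention that the first `n_measured` states are the measured species, and the example matrix are assumptions made for illustration; the ODE models actually used to generate the inputs are not reproduced here.

```python
import numpy as np

def step_experiments(A, n_measured, n_steps=3000, dt=0.01):
    """Run one unit-step experiment per measured species (forward Euler).

    The first n_measured states are taken to be the measured species.
    Experiment j applies a unit step that directly drives measured species j only.
    Returns an array of shape (n_measured, n_steps, n_measured).
    """
    n = A.shape[0]
    data = np.zeros((n_measured, n_steps, n_measured))
    for j in range(n_measured):
        b = np.zeros(n)
        b[j] = 1.0  # input j enters at measured species j
        x = np.zeros(n)
        for t in range(n_steps):
            data[j, t] = x[:n_measured]
            x = x + dt * (A @ x + b)
    return data

# Illustrative stable 3-state network with 2 measured species and 1 hidden state.
A = np.array([[-1.0, 0.0, 0.5],
              [0.8, -1.0, 0.0],
              [0.0, 0.6, -1.0]])
data = step_experiments(A, n_measured=2)
steady_state = data[:, -1, :]  # row j: final measured response in experiment j
```

The full array `data` plays the role of the time series input, while `steady_state` keeps only the final values and corresponds to the steady state input.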

Dynamical Structure Reconstruction for LTI networks

Need to add more detail to the section


Outputs

If provided with steady state data in the relevant input format, it can recover the Boolean structure between observed nodes.
If provided with time series data, it can recover the dynamical structure function between observed nodes.
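The steady state case can be illustrated with a short sketch. It assumes the usual dynamical structure relation G = (I - Q)^{-1} P, where Q has a zero diagonal and P is diagonal because each input controls exactly one measured species. Under that assumption, the diagonal of G^{-1} gives P^{-1}, and Q follows as I - P G^{-1}. The function name `steady_state_structure` and the tolerance are hypothetical, and this is not presented as the authors' implementation.

```python
import numpy as np

def steady_state_structure(G0, tol=1e-8):
    """Recover Q(0), P(0) and the Boolean structure from the steady-state gain G0.

    Assumes G0 = (I - Q0)^{-1} P0 with P0 diagonal and Q0 zero on its diagonal.
    G0[i, j] is the steady-state response of measured species i to a unit
    step on input j.
    """
    G_inv = np.linalg.inv(G0)
    P0 = np.diag(1.0 / np.diag(G_inv))
    Q0 = np.eye(G0.shape[0]) - P0 @ G_inv
    np.fill_diagonal(Q0, 0.0)  # remove round-off on the diagonal
    return Q0, P0, np.abs(Q0) > tol

# Illustrative fully measured network (no hidden states, B = C = I).
A = np.array([[-1.0, 0.0, 0.5],
              [0.8, -1.0, 0.0],
              [0.0, 0.6, -1.0]])
G0 = -np.linalg.inv(A)  # steady-state gain of x' = Ax + u, y = x
Q0, P0, structure = steady_state_structure(G0)
```

In this fully measured example the recovered Boolean structure coincides with the off-diagonal non-zero pattern of `A`. With hidden states, the entries of `Q0` describe net effects between observed nodes, which may pass through unobserved ones.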

Assumptions

Sparsity assumption: the sparsest network structure capable of producing the observed behaviour is taken to be the most likely candidate structure. This may not necessarily hold in biological networks, where connections are the result of previous generations of trial and error.
The method treats gene regulatory networks as linear time-invariant (LTI) systems.

Capabilities

From steady state data, it can recover the Boolean structure of the network between the observed nodes.
From time series data, it can recover both the Boolean structure and the dynamical structure function between the observed nodes.
It can cope with non-linearities of the form A/(B+z^n) (Hill functions), where z is a variable in the system.
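For reference, the Hill-type term mentioned above can be written as a small function. The lowercase names `a`, `b` and `n` stand for A, B and n in the expression A/(B+z^n) and are used only to avoid confusion with the state space matrices; the parameter values are illustrative.

```python
import numpy as np

def hill_term(z, a, b, n):
    """Hill-type non-linearity a / (b + z**n), decreasing in z for z >= 0."""
    return a / (b + np.power(z, n))

z = np.linspace(0.0, 5.0, 6)          # illustrative concentrations 0, 1, ..., 5
response = hill_term(z, a=2.0, b=1.0, n=2)
```

The term equals a/b when z is zero and falls towards zero as z grows, with larger n giving a sharper switch.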

LDST

Input Requirements

Data input must fit this particular structural model:
Matrix A must be stable. That is, all eigenvalues of the state matrix A must have magnitude less than 1.
The state of the system must be controllable.
The state of the system must be observable.
Step inputs are not required.
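The three structural requirements above can be checked numerically for a candidate model. This is a minimal sketch using the textbook rank tests, assuming the discrete-time reading of the stability condition (eigenvalue magnitudes below 1); the function name `check_lds_requirements` and the example matrices are hypothetical.

```python
import numpy as np

def check_lds_requirements(A, B, C):
    """Return (stable, controllable, observable) for a discrete-time LTI model."""
    k = A.shape[0]
    # Stability: every eigenvalue of A lies strictly inside the unit circle.
    stable = bool(np.all(np.abs(np.linalg.eigvals(A)) < 1.0))
    # Controllability matrix [B, AB, ..., A^(k-1) B] must have rank k.
    ctrb = np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(k)])
    # Observability matrix [C; CA; ...; C A^(k-1)] must have rank k.
    obsv = np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(k)])
    controllable = bool(np.linalg.matrix_rank(ctrb) == k)
    observable = bool(np.linalg.matrix_rank(obsv) == k)
    return stable, controllable, observable

# Illustrative 2-state example.
A = np.array([[0.5, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
result = check_lds_requirements(A, B, C)
```

Rank tests of this kind are sensitive to scaling for larger systems, so they are best treated as a quick sanity check.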

Outputs

A network structure containing estimates for the parameters A, B, C, D, R, Q and the dimension of the hidden space k.
A directed acyclic graph obtained by relevant hypothesis estimation from the CB+D matrix describing gene-gene interactions.
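To see why CB+D summarises gene-gene interactions, the sketch below assumes a feedback form of the model in which the previous observations act as the inputs: x_t = A x_{t-1} + B y_{t-1} and y_t = C x_t + D y_{t-1}, with noise omitted. Substituting the first equation into the second gives y_t = CA x_{t-1} + (CB + D) y_{t-1}, so CB+D collects the direct and hidden-state-mediated effect of each gene on every other gene one step later. This model form is an assumption here, and the random matrices are placeholders rather than estimates.

```python
import numpy as np

def gene_gene_interactions(B, C, D):
    """Return CB + D, the one-step effect of y_{t-1} on y_t in the assumed model."""
    return C @ B + D

rng = np.random.default_rng(0)
k, p = 2, 4                       # illustrative sizes: 2 hidden states, 4 genes
A = rng.normal(size=(k, k))
B = rng.normal(size=(k, p))
C = rng.normal(size=(p, k))
D = rng.normal(size=(p, p))
W = gene_gene_interactions(B, C, D)

# One noise-free step of the assumed model, written two equivalent ways.
x_prev, y_prev = rng.normal(size=k), rng.normal(size=p)
y_direct = C @ (A @ x_prev + B @ y_prev) + D @ y_prev
y_via_W = C @ A @ x_prev + W @ y_prev
```

Deciding which entries of `W` count as genuine interactions requires a statistical test on the estimates, which is not shown here.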

Assumptions

These come from Beal et al 2005:

In the LDS model used, the dynamics and output processes are assumed to be time-invariant, which is not realistic.
The times at which data are sampled are not linearly spaced, and there may be non-linearities in the transcriptional processes that are not studied.
There may be missing time slices, and the method cannot cope with stationary data.
Noise dynamics are assumed to be Gaussian, but this is not necessarily the case.

Capabilities

It can cope only with linear systems, not with non-linearities.
It is capable of inferring the dimension of the hidden space (whether the inferred dimension is correct is a separate question).
