User:Robert M. MacCallum/WTFGSB Reportback

From OpenWetWare

Revision as of 11:30, 7 December 2009


Wellcome Trust Functional Genomics and Systems Biology Workshop

30 November to 1 December 2009

More details and programme

Day one

Edison Liu: ‘Integrative Study of Estrogen Receptor Biology in Human Cancer’

Estrogen receptor (ER) binding site analysis (ChIP and bioinformatics): "Cosmic" score, correlation with RNA Pol II binding and H3K4meX marks.

Some functional binding sites are 1 Mb away from the gene!! Only 9% fall within a 5 kb "promoter".

Cool ChIA-PET (ChiA-seq) method to determine chromosomal loops.

Looping for efficient transcription, grouping of coregulated genes ("looped out" genes don't respond to ER)

Johan Rung: ‘A multi-stage genome-wide association study detects a novel risk locus near IRS1 for type 2 diabetes, insulin resistance, and hyperinsulinemia’

GWAS for type 2 diabetes

F Pradezynski: ‘Systems Level Approach of Hepatitis C Virus Infection’

Y2H between various virus proteomes and human proteins.

Many human pathways interfered with, in particular the ones you'd expect (interferon response)

Seems to be a remarkable number of targets (100s) from such a few viral proteins.

Chris Bakal: ‘Describing the Systems Architecture of Cell Morphogenesis’

Wounding, cell morphology, image analysis -> 100+ feature profile of each cell's morphology.

"canalised" morphology space (jumps between states)

Keith Baggerly: ‘The Importance Of Reproducibility In High-Throughput Biology: A Case Study’

Reproducibility in hi-thru biology

This was a fascinating story of a genuine attempt to reproduce a diagnostic/predictive approach using microarray data (for sensitivity to cancer drugs).

The data was in GEO but when analysed again, the gene lists, heat maps etc were completely different.

Eventually an "off by one" error was found, caused equally by pasting data from Excel and by the lack of documentation for the software (an R package).

Later papers from the offending authors had further errors (mislabelled drugs, figures repeated from earlier work). Letters to the editor were answered with "we did it again and got the same results" (can you believe it!).

In the end, the study had led to clinical trials and so Baggerly and colleagues published a proper paper exposing the problems in a statistical journal. Soon after that the medical journals were on the case and the trials were stopped.

Nick Luscombe: ‘Nucleoporins, chromosomal organisation and gene regulation.’

Nuclear lamins known to tether transcriptionally inactive DNA

Nucleoporins now shown to be assoc with active gene expression.

Also through ChIP some proteins bind to enable X chromosome dosage compensation.

Mark Gerstein: ‘Understanding Protein Function on a Genome-scale using Networks’

A review of several years' network work, including some analysis of the Venter ocean sample sequences (map to pathways, correlate with environmental factors using some canonical ..... method (is this like bi-clustering?))

Yoram Louzoun: 'Immunomic analysis of viruses CD8+ T cell epitope repertoire'

Not in programme.

Mentioned an epitope prediction approach called SIR (Size of Immune Repertoire score) which models MHC peptide binding.

Early viral proteins have fewer epitopes than late proteins.

Some scheduled speakers didn't speak in this session.

Day two, session one

Seth Grant: ‘System Biology of The Synapse and Behaviour’

Complexity of post-synaptic molecular machinery (several thousand proteins). Conserved in invertebrates (50% of proteins) and single-celled organisms (25%). Evolution of the machinery (including plasticity) preceded the evolution of synapses.

Very slow evolution.

Many diseases.

Caleb Webber: ‘Identifying CNV genes that contribute to developmental delay and autism’

CNV in mouse

What's special about pathological CNVs? (vs. benign)

Human CNVs look up mouse phenotypes (somehow!)


Florian Markowetz: ‘Mapping Dynamic Histone Acetylation Patterns to Gene Expression in Nanog-depleted Murine Embryonic Stem Cells’

ES cell histone modifications

days 1, 3 and 5 of ES development: 4 analyses

  1. protein MS
  2. ChIP-chip (histone)
  3. RNA Pol II
  4. microarrays

day 0 nanog TF downreg -> network of TFs

clustering of smoothed histone profiles (around TSS)

when mRNA is upregulated, small local acetylation around the TSS; when mRNA is downregulated, wider deacetylation around the TSS.

increased genome-wide correlation between histone acetylation and gene expression over time (more at day 5 than at day 1)

predict gene expr from histone acetylation using LOTS of ML methods (in R)
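A minimal sketch of the two analyses above (per-day correlation, and predicting expression from acetylation). It assumes each gene is summarised by one acetylation value around its TSS and one log expression value. Plain least-squares regression stands in for the "LOTS of ML methods", since the actual methods and features aren't recorded in these notes. All function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_line(xs, ys):
    """Ordinary least squares for expr = a + b * acetylation; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - b * mx, b


def cv_mse(xs, ys, k=5):
    """k-fold cross-validated mean squared error of the linear predictor.

    Genes are assigned to folds by index (i % k); shuffle them first in practice.
    """
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((ys[i] - (a + b * xs[i])) ** 2 for i in test)
    return sum(errors) / len(errors)
```

Suppose `acet` and `expr` are hypothetical dicts of per-gene lists keyed by day. Then `{day: pearson(acet[day], expr[day]) for day in (1, 3, 5)}` would show whether the correlation rises over time. To compare predictors, you would swap other models into `cv_mse`.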

Grant Belgard: ‘Transcriptome-Wide Functional Anatomy of Mouse Cortical Layering Revealed Through Deep Sequencing’

brain transcriptomics

by sequencing

6 layers of neocortex

many cell types spanning several layers

paired end 50bp reads

(you get some intronic reads)

some intergenic regions detected (a few percent of reads)

layer specific genes, various layers show various GO enrichments.
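Layer-specific GO enrichment of this kind is commonly assessed with a one-sided hypergeometric test. Below is a minimal sketch that assumes a list of layer-specific genes and a background of expressed genes. The test the speaker actually used, and any multiple-testing correction, aren't given in these notes.

```python
from math import comb


def go_enrichment_p(k, n, K, N):
    """One-sided hypergeometric p-value P(X >= k).

    N: background genes (e.g. all expressed genes)
    K: background genes annotated with the GO term
    n: layer-specific genes
    k: layer-specific genes annotated with the GO term
    """
    # comb() returns 0 when asked for more items than exist, so impossible terms vanish
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return hits / comb(N, n)
```

As a hypothetical example, 12 annotated genes among 200 layer-specific genes, against 300 annotated genes in a 15,000-gene background, would be strongly enriched. The expected count in that case is 200 × 300 / 15000 = 4.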

John Hogenesch: ‘A journey through the clock network’

Circadian clock genes through hi-thru func genomics. nice robot video.

siRNA screen (seems to be tunable to desired knockdown level)

clock pathway is robust - surprising lack of lethal knock outs

Day two, session two

Peter Hoen: ‘Functional Genomics as a Readout In Therapy Development’

(standing in for Gert-Jan van Ommen)

Duchenne muscular dystrophy

antisense therapy

Andrew Teschendorff: ‘Pathway-Centric Classification of Breast Cancer’

classification of breast cancer

Dan Geschwind: ‘Human-Specific Transcriptional Regulation of CNS Development Genes by FOXP2’

transcriptional regulation of CNS development genes by FoxP2

looked at human vs chimp regulation of genes (microarray) in a cell line.

many genes respond differently (up and down)

But why? The 2 amino acid differences are not in the known DNA-binding domain

6 genes regulated via proximal promoter (luciferase reporter)

validated in vivo

HACNS: human-accelerated conserved non-coding sequences (look this up)

Horvath's weighted gene co-expression network analysis (WGCNA)
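At its core, WGCNA builds a soft-thresholded co-expression adjacency, then a topological overlap matrix (TOM) that is clustered into modules. Here is a small pure-Python sketch of those two steps. It assumes the unsigned adjacency a_ij = |cor(x_i, x_j)|^beta with beta = 6, a commonly used default. Module detection (hierarchical clustering of 1 - TOM) is omitted, and the real analysis uses Horvath's WGCNA R package.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def soft_adjacency(profiles, beta=6):
    """Unsigned adjacency a_ij = |cor(x_i, x_j)| ** beta, with a_ii = 0."""
    n = len(profiles)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            adj[i][j] = adj[j][i] = abs(pearson(profiles[i], profiles[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij = sum over u of a_iu * a_uj (u = i, j add nothing since a_ii = 0);
    k_i = sum over u of a_iu is the connectivity of gene i.
    """
    n = len(adj)
    k = [sum(row) for row in adj]
    tom = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            l_ij = sum(adj[i][u] * adj[u][j] for u in range(n))
            tom[i][j] = tom[j][i] = (l_ij + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom
```

The soft power keeps weak correlations in the network but pushes them toward zero, rather than cutting at a hard threshold. TOM then rewards pairs of genes that also share neighbours.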

Recent paper showing two mitochondrial network types in neurons (synaptic and cell body)

Compare human vs chimp networks

Douglas Kell: ‘The cellular uptake of pharmaceutical drugs: a problem not of biophysics but of systems biology’

Suit and tie alert!

networks described unambiguously (SBML), with ChEBI, SMILES etc. for small molecules.

uptake of drugs, via transporters (proteins).

Day two, session three

Genevieve Konopka: ‘Comparative Gene Expression in Primate Brain Using Nextgen Sequencing’

Language genes

Can't do multi-species (human, chimp, macaque) on a human affy chip.

Next gen sequencing! Four brain regions.

"Sequencing wins"

Networks from WGCNA

Tom Freeman: ‘Identification of Expression Networks in Immunity’

Networks in immunity

focus: macrophage

mentioned proteasome (did I see that on map wrt immunity?)

graphical markup for pathways

some kind of flow simulations through them

BioLayout Express software - looks good (has enrichment analysis built in)

Day two, session four

Frank Holstege: ‘Understanding regulatory circuitry through expression-profile phenotypes’


1200 regulatory components (TFs, kinases, chromatin remodelers, RNA processing) -> mutations and expression microarrays

GASSCO dye correction algorithm (two colour!)

done so far: the deletome

some kinases have no diff expr, is it because they are inactive in standard conditions or is it because of redundancy?

They used synthetic genetic interaction prediction to choose pairs

find signals!

some kinases redundant with phosphatase! it's cross talk between two pathways (somehow).

different types of redundancy:

  1. complete
  2. quantitative (double has more effect than single(s))
  3. incongruent (effects in single are not in double)
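One way to make the three redundancy types concrete is a hypothetical classifier that compares the genes changing in each single mutant with those changing in the double mutant. Using the set of changed genes as the "effect" is an assumption made for this sketch. It is not necessarily how the Holstege lab scored redundancy.

```python
def classify_redundancy(changed_a, changed_b, changed_ab):
    """Hypothetical classifier for a pair of regulators A and B.

    Each argument is the set of genes whose expression changes in the
    single mutant a, the single mutant b, or the double mutant ab.
    """
    singles = set(changed_a) | set(changed_b)
    double = set(changed_ab)
    if not singles:
        # no effect in either single mutant
        return "complete" if double else "no phenotype"
    if singles - double:
        # some single-mutant effects disappear in the double mutant
        return "incongruent"
    if double - singles:
        # the double mutant changes everything the singles do, plus more
        return "quantitative"
    return "no redundancy detected"
```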

also used the data for protein complex prediction

Stefan Weimann: ‘Modeling and Experimental Testing of Cell Cycle Regulation by the ErbB-Protein and miRNA Network in Breast Cancer’

new targets for drug resistant breast cancer

ErbB signalling network

the drug is an ErbB2 antibody

Luis Serrano: ‘Systems Biology of a Small Bacterium’

Mycoplasma pneumoniae

689 ORFs + 44 RNAs

free living

maybe only 10-11 TFs (E. coli 100 or so)

full complement of chromatin remodelling

plan was to do loads of -omics + electron microscopy

transcriptomics: arrays 62 conditions, tiling array

detailed look at transcripts (reverse strand ncRNA, no idea of mechanism) multiple TSSs

in operons encoding 4 genes, you don't see the same mRNA level for all four; each gene has a different level, somehow...

same SOS response as B. subtilis, but without the TFs! very interesting

plenty of regulatory complexity

metabolome: KEGG didn't work out, had to do lots of manual work to build metabolic map. defined minimal medium.

the reactions are known to be there, but 10-12 of the enzymes are unknown

200 molecules per protein per cell

so small that you're "living in a stochastic world": each reaction is like rolling a die, so how it survives is an interesting question.
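The "stochastic world" point can be illustrated with a minimal Gillespie stochastic simulation of one protein that has constant synthesis and first-order degradation. The rates are hypothetical, chosen so that the mean copy number is k_syn / k_deg = 200 molecules, the figure quoted above. For this simple birth-death process the stationary copy number is Poisson, so single trajectories wander by roughly sqrt(200) ≈ 14 molecules around the mean.

```python
import random


def gillespie_birth_death(k_syn=20.0, k_deg=0.1, n0=200, t_end=500.0, seed=1):
    """Exact stochastic simulation of 0 -> P (rate k_syn), P -> 0 (rate k_deg * n).

    Returns the event times and the copy number after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        birth, death = k_syn, k_deg * n
        total = birth + death
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t >= t_end:
            break
        if rng.random() * total < birth:  # pick which reaction fires
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


def time_average(times, counts, t_end):
    """Time-weighted mean copy number over [0, t_end]."""
    area = 0.0
    for i, n in enumerate(counts):
        t_next = times[i + 1] if i + 1 < len(times) else t_end
        area += n * (t_next - times[i])
    return area / t_end
```

The long-run time average comes out close to 200, but any snapshot of a single cell can be noticeably off. With tens rather than hundreds of molecules, the relative noise grows (roughly as 1/sqrt(mean)).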

Day three, session one

Jurg Bahler: ‘Differential marking of intronic and exonic DNA regions with respect to RNA polymerase II occupancy, histone density, and H3K36me3 MODIFICATION patterns’

Pre- and post-splicing levels measured with RNA-seq.

Splicing efficiency regulated
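A common way to quantify splicing efficiency from RNA-seq compares, at each intron, the reads that are spliced (exon-exon junction reads) with those that are unspliced (reads spanning an exon-intron boundary). Here is a hedged sketch under that assumption. The metric used in the talk isn't given in these notes, and the intron IDs and the `min_reads` cutoff are hypothetical.

```python
def splicing_efficiency(junction_reads, unspliced_reads, min_reads=10):
    """Per-intron spliced fraction: junction / (junction + unspliced).

    junction_reads, unspliced_reads: dicts mapping intron ID -> read count.
    Introns with fewer than min_reads informative reads are skipped.
    """
    result = {}
    for intron, spliced in junction_reads.items():
        unspliced = unspliced_reads.get(intron, 0)
        total = spliced + unspliced
        if total >= min_reads:
            result[intron] = spliced / total
    return result
```

For example, `splicing_efficiency({"intronA": 90, "intronB": 5}, {"intronA": 10, "intronB": 2})` gives intronA a spliced fraction of 0.9 and skips intronB, which has only 7 reads.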

Co-transcriptional splicing. Look for relationship between splicing and chromatin - H3K36me3 lower in introns.

Measured transcript levels after adding a transcription-blocking compound to measure decay; however, such drugs have side effects. Better to measure Pol II occupancy and RNA abundance and estimate decay with a formula.
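The notes don't record the formula, but a standard steady-state argument gives one. If dR/dt = s - d*R and synthesis s is taken to be proportional to Pol II occupancy P, then at steady state d = s/R, which is proportional to P/R. Below is a minimal sketch under those assumptions. The proportionality constant is unknown, so the sketch returns decay rates relative to the median gene. Half-life is ln(2)/d, so relative half-lives are simply the reciprocals.

```python
from statistics import median


def relative_decay_rates(polii, abundance):
    """Decay rate per gene up to an unknown constant, scaled so the median is 1.

    Assumes steady state (dR/dt = 0) and synthesis proportional to Pol II
    occupancy, so d_g = s_g / R_g is proportional to polii[g] / abundance[g].
    """
    raw = {g: polii[g] / abundance[g] for g in polii if abundance.get(g, 0) > 0}
    scale = median(raw.values())
    return {g: r / scale for g, r in raw.items()}


def relative_half_lives(decay):
    """Half-life = ln(2) / d, so relative half-lives are reciprocals of relative rates."""
    return {g: 1.0 / d for g, d in decay.items() if d > 0}
```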

Looked at response to oxidative stress. See patterns of expression with stable transcription.

Jaak Vilo: ‘Network reconstruction and mining of high-throughput data’

Network reconstruction. Analysis tools. GraphWeb NAR db issue.

"MEM": query for similar expression across multiple datasets. You could put everything in together (like VB expression maps), but poor-quality data could be included. Instead they post-analyse separate queries (one per dataset) using ranks and compute a p-value for enrichment of low ranks. "Low" depends on a rank threshold: try all thresholds and keep the lowest p-value.

"multi-experiment matrix"

web tool may have Anopheles Affymetrix data (they get it from ArrayExpress)

really nice annotation cloud mouse-over!

Adler et al., Genome Biology 2009 (in press)
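The MEM rank-threshold scan described above can be sketched as follows. The sketch makes three assumptions. First, each dataset contributes one normalised rank in (0, 1] for the candidate gene, where small means highly similar. Second, under the null these ranks are uniform. Third, the number of datasets ranking the gene within a threshold t is therefore Binomial(D, t). This is a simplified reading of the notes, not MEM's published statistic. Also, the minimum over thresholds is optimistic and would itself need calibrating (e.g. by permutation).

```python
from math import comb


def binom_sf(m, n, p):
    """P(X >= m) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(m, n + 1))


def min_threshold_pvalue(norm_ranks):
    """Scan rank thresholds for one candidate gene across D datasets.

    norm_ranks: the gene's rank in each dataset divided by the number of
    genes, so values lie in (0, 1] and small means "highly similar".
    Returns (lowest p-value, threshold at which it occurs).
    """
    d = len(norm_ranks)
    best_p, best_t = 1.0, 1.0
    for t in sorted(set(norm_ranks)):
        m = sum(1 for r in norm_ranks if r <= t)  # datasets ranking the gene within t
        p = binom_sf(m, d, t)
        if p < best_p:
            best_p, best_t = p, t
    return best_p, best_t
```

A gene ranked in the top 1% in half of ten datasets, and near the bottom in the rest, still gets a very small p-value at t = 0.01. This is the point of scanning thresholds instead of averaging ranks: a few poor-quality datasets don't wash out the signal.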

Annelies Fieuw: ‘Integrative analysis of coding and non-coding gene expression and copy numbers in neuroblastoma’


look for more genes implicated in pathology

looked for correlated m(i)RNA expression and genome copy number

Geoffrey Faulkner: ‘Transposed Elements are Massively Transcribed in Mammalian Cells’


transposable elements make up 1/2 of the human genome, but only ~100 are mobile, mostly Alu SINEs and L1 LINEs. something about neurons recently in Nature

plenty more immobile

most transposon insertions are incomplete and disrupt genes.

CAGE: 25 bp 5' tags; TE promoters are somehow found through sequencing.

the TEs could provide alternative TSSs: 100,000 possible, 700 validated with ESTs, about half validated with another sequencing run

see correlated expression between TE and nearby gene

likely to be positive regulators

RNAi against TE transcripts - have phenotypes (myoblast morphology)


Eileen Furlong: ‘Making global predictions of cis-regulatory activity’

need map of cis-regulatory elements and their inputs

lots of ChIP-chip through development.

usually two antibodies for each TF (for consensus) and strict FDR

>19k peaks!

look for combinatorial binding

they've found 8000 CRMs (for mesoderm development) 2000 target genes

further expts determine +ve or -ve effect of CRMs

recommends Nature Genetics v36 2006, Reinitz group: models of TF networks for the eve stripe 2 enhancer

80% of literature CRMs are in the chip set ("atlas")

predict 5 different expression patterns from binding signatures with an SVM, then predict expression of the ChIP CRMs and (I guess) validate experimentally: 4 classes, 80% validated.
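Here is a minimal sketch of classifying CRMs into expression-pattern classes from their TF-binding signatures. It uses a linear SVM trained one-vs-rest with Pegasos-style sub-gradient descent, in pure Python. The feature encoding (one occupancy value per TF), the class labels and the training data are all hypothetical. A real analysis would more likely use a library SVM (e.g. libsvm or scikit-learn), with kernel and parameter tuning.

```python
def _dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos-style sub-gradient training of a linear SVM (labels +1 / -1).

    A constant 1.0 feature is appended so the bias is learned as a weight
    (and, for simplicity, regularised with the others). Samples are visited
    in order rather than sampled at random.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for x, label in zip(data, y):
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * _dot(w, x) < 1  # hinge loss is active
            w = [(1.0 - eta * lam) * wi for wi in w]
            if violated:
                w = [wi + eta * label * xi for wi, xi in zip(w, x)]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per expression-pattern class."""
    return {
        c: train_linear_svm(X, [1 if lab == c else -1 for lab in labels], **kwargs)
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Assign the class whose SVM gives the largest decision value."""
    xb = list(x) + [1.0]
    return max(models, key=lambda c: _dot(models[c], xb))
```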

One combinatorial TF binding code does not have one output (from their data)

Day three, session two

Wolfgang Huber: ‘Detecting genetic interactions and multiparametric dynamic phenotypes in RNAi perturbation microscopy imaging assays’

transcriptome characterisation in yeast

nucleosome depleted regions are shared in bidirectional promoters.

in yeast, antisense transcripts interfere with sense transcription; they also interfere with histone modification (somehow) and can also activate transcription

48 (wild?) strains and transcriptomics ((look for association with SNPs))

genes with antisense xscripts have more diff regulation (across strains, env conditions, species)

genes with antisense are more often OFF

anticorrelation of sense/antisense xscripts

is it "opportunistic" transcription? (pol finds open chromatin and xscribes) (Stuhl review?)

more TF binding sites for coding than antisense

strain data QTL for expression

  • coding: 75% distal effects
  • antisense: 50/50 local/distal (not sure of interpretation)

transcripts usually extend 100bp into the promoter of the opposite transcript. this is necessary for anti-correlated xscript levels

TATA also contributes to regulated expression

"likely true for higher euks"

no evidence for translation so far

Felix Naef: ‘Rhythmic protein-DNA interactomes and circadian transcription regulatory networks’

understand gene expression programs under circadian control (e.g. genes expressed in heart at 10am, same kind of thing in liver)

known "E box" element (bound by the TF BMAL1?) is driven by the circadian system, but it can't drive all downstream effects because of their timing. Is there another element?

something special about tandem arrangement of E-box element. CLOCK/BMAL1 heterodimer binding

Bussemaker 2001 cis-reg network algorithm?

Phase specificity problem not solved yet.

Caroline Brorsson: ‘A Genome-Wide SNPxSNP Search for Epistasis Identifies Gene-Gene Interactions in Type 1 Diabetes’

Type 1 diabetes

12000 cases, 13000 controls -> 40 regions associated with T1D

mostly immune genes

small effects of each variant

looking for epistasis in GWAS

Mark McCarthy: ‘The End of the Beginning: Genetic Success and the Long Road to Functional Inference’


plenty of loci found now (~20?) but no decent prediction: AUC with variants only = 0.6, with BMI + age = 0.78! The variants explain around 5% of predisposition.
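For reference, the AUC quoted above is the probability that a randomly chosen case gets a higher risk score than a randomly chosen control. A value of 0.5 is chance and 1.0 is perfect separation; this is the normalised Mann-Whitney U statistic. Below is a minimal sketch. The risk scores would come from whatever model combines the variants (and BMI and age), and that model isn't described here.

```python
def auc(case_scores, control_scores):
    """AUC = P(case > control) + 0.5 * P(tie), by comparing every case/control pair."""
    wins = 0.0
    for s in case_scores:
        for c in control_scores:
            if s > c:
                wins += 1.0
            elif s == c:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))
```

The pairwise loop is O(cases × controls), which is fine for illustration. For GWAS-sized samples you would compute it from ranks instead.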

Agilent array for CNVs: none found for T2D or some other diseases.

look for rarer (1%) variations...

Day three, session three

Mathias Uhlén

goal: antibodies against all proteins

very important resource, look it up.

have got an epitope predictor - training data available

working towards a subcellular "index" for all proteins.

first draft proteome by 2014

long term goal, paired antibodies (shouldn't rely on one!)

all Abs available through Sigma.

antibody specificity; proteins 10^7 dynamic range in cells; paired antibodies against different epitopes make it more specific

looking into FRET paired antibodies - seems to work.

don't have paired for the majority

some kind of validation with commercial Abs: 40% work nicely??? wiki-based, community-based validation of antibodies.

systems level analysis:

how many proteins are tissue specific (one cell type) < 2%

or at a larger level (say, brain) = 10%

some new tissue specific genes found though.

how many prots in a cell? ~65% of all

brain had fewest

levels of proteins do distinguish cell types

antibody "array" using beads, multiplexed 384 abs x 384 samples in one run!

used on blood plasma - personalised medicine and biomarkers

going to do 20 diseases (400 patients per disease) biobanks

George Koumbaris

breakpoint analysis of X-chromosome disorders

J-B Cazier

metabolome from fat, plasma, urine, etc with NMR spectra (40,000 points)

  1. through time: (no liver)
  2. multiple animals
  3. treatments
  4. species


300 SNPs

found a locus (on chr14) and a metabolite (benzoate, a gut microbial metabolite)

Gavin Sherlock

yeast evolution (in the lab)

count and isolate clones in a population.

R, Y and G dyes in the population

equal proportions, glucose limitation (for selection)

haploid, no sex, several 100 generations (I think)

observe "clonal interference"

isolate clones (FACS, then split into 7, then fitness measurement, then Solexa sequencing)

see mutants and amplifications (saw a hexose transporter amplification)

clones arising later have more mutations

each of 5 lineages were distinct (no shared mutations) even if same colour

some mutations may be hitchhikers (not adaptive) ; some kind of sex under selection somehow lets you figure out which mutations are adaptive

Day three, session four

Steve Oliver

more yeast evolution: hemizygous mutants in competition with each other. most genes in one copy give happy yeast, but some show haploinsufficiency and some others are haploproficient.

these are "high flux" genes

after whole genome duplication, they are more likely to stay in two copies.

more conserved

Chris Pacheco


Alvis Brazma

9000 raw data files from Affy U133A in GEO and ArrayExpress

After QC, 5372 samples remained (206 studies, 163 labs -> 369 conditions)

only about 25% "normal"

18000 genes

PCA: main component blood vs rest, second axis malignancy

cell lines are very different

3rd is tissue of origin

first 3 components explain 37% of the variability

also a MDS (like Tom Freeman's graphs)

h clustering

6 main classes: brain, muscle, x, y, z, q (beyond this, the signal is weak - maybe lab effects)

take a leukaemia cluster, figure out genes

also introduced the Gene Expression Atlas at EBI
