User:Robert M. MacCallum/WTFGSB Reportback
Revision as of 13:36, 2 December 2009
Wellcome Trust Functional Genomics and Systems Biology Workshop
30 November to 1 December 2009
Estrogen (or is it EGF) receptor (ER) binding site analysis (ChIP and bioinf) - "Cosmic" score, correlation with RNA PolII binding and H3K4meX marks.
Some functional binding is 1 Mb away from the gene!! Only 9% in the 5 kb "promoter".
Cool ChIA-PET (ChiA-seq) method to determine chromosomal loops.
Looping for efficient transcription, grouping of coregulated genes ("looped out" genes don't respond to ER)
GWAS for type 2 diabetes
Y2H between various virus proteomes and human proteins.
Many human pathways interfered with, in particular the ones you'd expect (interferon response)
Seems to be a remarkable number of targets (100s) from such a few viral proteins.
Wounding, cell morphology, image analysis -> 100+ feature profile of cell morphology.
"canalised" morphology space (jumps between states)
Reproducibility in hi-thru biology
More to come on this
Nuclear lamins known to tether transcriptionally inactive DNA
Nucleoporins now shown to be assoc with active gene expression.
Also through ChIP some proteins bind to enable X chromosome dosage compensation.
A review of several years' network work. Including some Venter ocean sample sequence analysis (map to pathways, correlate with environmental factors with some canonical ..... method (is this like bi-clustering?))
Day two, session one
Complexity of post-synaptic molecular machinery (several thousand proteins). Conserved in invertebrates (50% of prots) and single celled (25%). Evolution of the machinery (including plasticity) preceded evolution of synapses.
Very slow evolution.
CNV in mouse
What's special about pathological CNVs? (vs. benign)
Human CNVs look up mouse phenotypes (somehow!)
ES cell histone modifications
days 1, 3, 5 of ES development - 4 analyses:
- Protein MS
- ChIP-chip histone
- RNA Pol II
- Microarrays
day 0 nanog TF downreg -> network of TFs
clustering of smoothed histone profiles (around TSS)
when mRNA upreg, small local acetylation around TSS; when mRNA down, wider deacetylation around TSS.
increased correlation between H acet and gene expression through time (more at day 5 than day 1) genome-wide
predict gene expr from histone acetylation using LOTS of ML methods (in R)
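The per-day correlation between histone acetylation and expression could be computed along these lines. This is only a sketch: the talk used R, the numbers below are toy values and not the speaker's data, and `pearson` and `correlation_by_day` are names made up for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_by_day(profiles):
    """profiles maps day -> (acetylation per gene, expression per gene)."""
    return {day: pearson(acet, expr) for day, (acet, expr) in profiles.items()}

# Toy values only (hypothetical): four genes measured on two days.
toy = {
    1: ([1, 2, 3, 4], [2, 1, 4, 3]),
    5: ([1, 2, 3, 4], [1, 2, 3, 4]),
}
by_day = correlation_by_day(toy)
```

With real data, each list would hold one value per gene (e.g. acetylation signal summed around the TSS), and a rising coefficient from day 1 to day 5 would match the trend noted above.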
6 layers of neocortex
many cell types spanning several layers
paired end 50bp reads
(you get some intronic reads)
some intergenic regions detected (a few percent of reads)
layer specific genes, various layers show various GO enrichments.
Circadian clock genes through hi-thru func genomics. nice robot video.
siRNA screen (seems to be tunable to desired knockdown level)
clock pathway is robust - surprising lack of lethal knock outs
Day two, session two
(standing in for Gert-Jan van Ommen)
Duchenne muscular dystrophy
classification of breast cancer
transcriptional regulation of CNS development genes by FoxP2
looked at human vs chimp regulation of genes (microarray) in a cell line.
many genes respond differently (up and down)
But why? The 2 AA diffs are not in known DNA binding domain
6 genes regulated via proximal promoter (luciferase reporter)
validated in vivo
haCNS human accelerated conserved non-coding sequences (look this up)
Horvath weighted gene co-expression network analysis. WGCNA
Recent paper showing two mitochondrial network types in neurons (synaptic and cell body)
Compare human vs chimp networks
Suit and tie alert!
networks described in unambiguous fashion, SBML, ChEBI SMILES etc for small molecules.
uptake of drugs, via transporters (proteins).
Day two, session three
Can't do multi-species (human, chimp, macaque) on a human affy chip.
Next gen sequencing! Four brain regions.
Networks from WGCNA
Networks in immunity
mentioned proteasome (did I see that on map wrt immunity?)
graphical markup for pathways
some kind of flow simulations through them
biolayout express software - looks good (has enrichment analysis built in)
Day two, session four
1200 regulatory components, TFs, kinases, ch remodelers, RNA processing -> mutations and expression microarrays
GASSCO dye correction algorithm (two colour!)
done so far deletome
some kinases have no diff expr, is it because they are inactive in standard conditions or is it because of redundancy?
They use some synthetic genetic interaction prediction to choose pairs
some kinases redundant with phosphatase! it's cross talk between two pathways (somehow).
different types of redundancy:
- quantitative (double has more effect than single(s))
- incongruent (effects in single are not in double)
also used the data for protein complex prediction
new targets for drug resistant breast cancer
ErbB signalling network
the drug is an ErbB2 antibody
689 ORFs + 44 RNAs
maybe only 10-11 TFs (E. coli 100 or so)
full complement of chromatin remodelling
plan was to do loads of -omics + electron microscopy
transcriptomics: arrays 62 conditions, tiling array
detailed look at transcripts (reverse strand ncRNA, no idea of mechanism) multiple TSSs
where you have operons encoding 4 genes, you don't just see mRNA of all four, you get different levels of each gene, somehow...
same SOS response as subtilis, but without the TFs! very interesting
plenty of regulatory complexity
metabolome: KEGG didn't work out, had to do lots of manual work to build metabolic map. defined minimal medium.
know reactions are there, but 10-12 enzymes are not known
200 molecules per protein per cell
so small that you're "living in a stochastic world" - each reaction is like rolling a die; how it survives is an interesting question.
Day three, session one
Pre-post splicing levels measured with RNA seq.
Splicing efficiency regulated
Co-transcriptional splicing. Look for relationship between splicing and chromatin - H3K36me3 lower in introns.
Measured transcript levels after transcription blocking compound - measure decay, however drugs have side-effects. Better to measure PolII occupancy and RNA abundance and estimate decay with a formula.
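The notes don't record the actual formula. One plausible reading (an assumption, not necessarily what the speaker used) is a steady-state model where synthesis is proportional to PolII occupancy P with an unknown scaling constant c, R is RNA abundance and δ is the decay rate:

```latex
\frac{dR}{dt} = cP - \delta R = 0
\quad\Rightarrow\quad
\delta = \frac{cP}{R},
\qquad
t_{1/2} = \frac{\ln 2}{\delta}
```

Under this model a transcript with high PolII occupancy but low abundance must decay quickly, and no transcription-blocking drug is needed.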
Looked at response to oxidative stress. See patterns of expression with stable transcription.
Network reconstruction. Analysis tools. GraphWeb NAR db issue.
"MEM" query similar expression in multiple datasets. You could put everything in together (like VB expr maps) but there could be crap data included. Instead they do a post-analysis of separate queries (one per dataset) using ranks. P-value for enrichment of low ranks. "Low" depends on a rank threshold - try all and find lowest p-value.
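The "try all thresholds and find lowest p-value" idea can be sketched as below. This is a stand-in using a binomial tail test, which is an assumption: the notes don't say which statistic the tool actually uses, and the function names are made up for illustration.

```python
from math import comb

def binom_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

def best_threshold_pvalue(ranks, n_genes):
    """ranks: the rank of one gene in each dataset's separate query
    (1 = most similar to the query gene), out of n_genes genes.

    Every observed rank is tried as the threshold t. Under the null, a
    gene falls at or below rank t in one dataset with probability
    t / n_genes, so the number of datasets with rank <= t is binomial.
    Returns (lowest p-value, threshold that gave it).
    """
    n = len(ranks)
    best_p, best_t = 1.0, None
    for t in sorted(set(ranks)):
        k = sum(r <= t for r in ranks)  # datasets where the gene ranks "low"
        p = binom_tail(k, n, t / n_genes)
        if p < best_p:
            best_p, best_t = p, t
    return best_p, best_t

# Hypothetical gene: near the top in three datasets, unremarkable in two.
p_hit, t_hit = best_threshold_pvalue([1, 2, 3, 5000, 9000], 10000)
```

Because the minimum is taken over several thresholds, the result is optimistic and would need a correction (or a permutation test) before being read as a true p-value. The two noisy datasets barely hurt the score, which is the point of querying each dataset separately.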
web tool may have anopheles affy data (they get it from ArrayExpress)
really nice annotation cloud mouse-over!
Adler et al Genome Biology 2009 in press
look for more genes implicated in pathology
looked for correlated m(i)RNA expression and genome copy number
1/2 human genome. only 100 mobile though. mostly Alu SINEs and L1 LINEs. something about neurons recently in Nature
plenty more immobile
most transposon insertions are incomplete and disrupt genes.
CAGE - 25bp 5' tags, somehow find TE promoters through sequencing.
the TEs could provide alternative TSSs: 100,000 possible, 700 valid with ESTs, about half validated with another sequencing run
see correlated expression between TE and nearby gene
likely to be positive regulators
RNAi against TE transcripts - have phenotypes (myoblast morphology)
need map of cis-regulatory elements and their inputs
lots of chip-chip through development.
usually two antibodies for each TF (for consensus) and strict FDR
look for combinatorial binding
they've found 8000 CRMs (for mesoderm development), 2000 target genes
further expts determine +ve or -ve effect of CRMs
recommends Nature Genetics v36 2006, Reinitz group - models of TF networks for eve stripe 2 enhancer
80% of literature CRMs are in the chip set ("atlas")
predict 5 different expression patterns from binding signatures with SVM, predict expression of chip CRMs and I guess validate experimentally, 4 classes 80% validated.
One combinatorial TF binding code does not have one output (from their data)
Day three, session two
transcriptome characterisation in yeast
nucleosome depleted regions are shared in bidirectional promoters.
in yeast, antisense transcripts interfere with sense, also interferes with H modification (somehow) and also activating transcription
48 (wild?) strains and transcriptomics ((look for association with SNPs))
genes with antisense xscripts have more diff regulation (across strains, env conditions, species)
genes with antisense are more often OFF
anticorrelation of sense/antisense xscripts
is it "opportunistic" transcription? (pol finds open chromatin and xscribes) (Struhl review?)
more TF binding sites for coding than antisense
strain data QTL for expression
- coding: 75% distal effects
- antisense: 50/50 local/distal (not sure of interpretation)
transcripts usually extend 100bp into the promoter of the opposite transcript. this is necessary for anti-correlated xscript levels
TATA also contributes to regulated expression
"likely true for higher euks"
no evidence for translation so far
understand gene expression programs under circadian control (e.g. genes expressed in heart at 10am, same kind of thing in liver)
known "E box" element (driven by TF BMAL1?) driven by circadian system, but it can't drive all downstream effects because of their timing - is there another element?
something special about tandem arrangement of E-box element. CLOCK/BMAL1 heterodimer binding
Bussemaker 2001 cis-reg network algorithm?
Not solved phase specificity problem yet.
Type 1 diabetes
12000 cases, 13000 controls -> 40 regions associated with T1D
mostly immune genes
small effects of each variant
looking for epistasis in GWAS
plenty of loci found now (~20?) but no decent prediction, AUC with variants only = 0.6, with BMI+Age=0.78! explains around 5% of predisposition
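For reference, AUC is the probability that a randomly chosen case gets a higher risk score than a randomly chosen control (0.5 = coin toss, 1.0 = perfect). A minimal sketch, with made-up risk-allele counts as the score (hypothetical data, not from the talk):

```python
def auc(case_scores, control_scores):
    """Fraction of case/control pairs where the case scores higher.

    Ties count as half a win, which is the usual convention.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Hypothetical risk-allele counts for four cases and four controls.
toy_auc = auc([3, 2, 2, 1], [2, 1, 1, 0])
```

An AUC of 0.6 from the variants alone means the genetic score ranks a case above a control only 60% of the time, which is why BMI and age still do better.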
agilent array for CNVs, none found for T2D or some other diseases.
look for rarer (1%) variations...
Day three, session three
goal: antibodies against all proteins http://proteinatlas.org
very important resource, look it up.
have got an epitope predictor - training data available
working towards a subcellular "index" for all proteins.
first draft proteome by 2014
long term goal, paired antibodies (shouldn't rely on one!)
all Abs available through sigma.
antibody specificity; proteins 10^7 dynamic range in cells; paired antibodies against different epitopes make it more specific
looking into FRET paired antibodies - seems to work.
don't have paired for the majority
some kind of validation with commercial abs, 40% work nicely??? wiki based community based validation of antibodies.
systems level analysis:
how many proteins are tissue specific (one cell type) < 2%
or at a larger level (say, brain) = 10%
some new tissue specific genes found though.
how many prots in a cell? ~65% of all
brain had fewest
levels of proteins do distinguish cell types
antibody "array" using beads, multiplexed 384 abs x 384 samples in one run!
used on blood plasma - personalised medicine and biomarkers
going to do 20 diseases (400 patients per disease) biobanks
breakpoint analysis of X-chromosome disorders
metabolome from fat, plasma, urine, etc with NMR spectra (40,000 points)
- through time: (no liver)
- multiple animals
found a locus (on chr14) and a metabolite (benzoate, a gut microbial metabolite)
yeast evolution (in the lab)
count and isolate clones in a population.
R, Y and G dyes in population
equal proportions, glucose limitation (for selection)
haploid, no sex, several 100 generations (I think)
observe "clonal interference"
isolate clones (facs and then split into 7 then fitness measure then solexa sequencing)
see mutants and amplifications (saw hexose transporter amplic)
clones arising later have more mutations
each of 5 lineages were distinct (no shared mutations) even if same colour
some mutations may be hitchhikers (not adaptive) ; some kind of sex under selection somehow lets you figure out which mutations are adaptive
Day three, session four
more yeast evolution. hemizygous mutants in competition with each other. most genes in one copy give happy yeast, some show haploinsufficiency. some others are haploproficient.
these are "high flux" genes
after whole genome duplication, they are more likely to stay in two copies.
9000 raw data files from Affy U133A in GEO and ArrayExpress
After QC, 5372 samples remained (206 studies, 163 labs -> 369 conditions)
only about 25% "normal"
PCA: main component blood vs rest, second axis malignancy
cell lines are very different
3rd is tissue of origin
first 3 components explain 37% of the variability
also a MDS (like Tom Freeman's graphs)
6 main classes: brain, muscle, x, y, z, q (beyond this, the signal is weak - maybe lab effects)
take a leukaemia cluster, figure out genes
also introduced gene expression atlas at ebi