Dec09 EisenNotes Eisen Notes

From OpenWetWare

Day 2

Tom

  • Could do RNA families
  • Some question as to which clustering methods to use to build families, because Dongying, Morgan and others are not all doing things the same way
  • Sam says might want to use a phylogeny-driven clustering method
  • See MCL paper here
  • Sam: could flag things that hit a superfamily but not any of the known subfamilies

Database needs

Morgan novelty

Biotorrents

Josh 2 Microbial Ranges

  • Using distance decay relationships to measure microbial ranges
  • Eisen notes: also see some material on "Field Guides to Microbes"
  • Josh says need random sampling of space
  • Eisen says need random sampling of niche
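
A minimal sketch of a distance-decay calculation, to make the idea above concrete. It assumes each sample is a hypothetical ((x, y), set of taxa) pair, uses Jaccard similarity, and summarizes the relationship with an ordinary least-squares slope. The input format, the similarity measure and the function names are all assumptions for illustration, not the approach Josh presented.

```python
from itertools import combinations
from math import hypot

def jaccard(a, b):
    """Jaccard similarity between two sets of taxa."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def distance_decay_slope(samples):
    """Least-squares slope of community similarity against geographic distance.

    samples: list of ((x, y), set_of_taxa) pairs (hypothetical input format).
    A negative slope means similarity drops as sites get farther apart.
    """
    dists, sims = [], []
    for (p, taxa_p), (q, taxa_q) in combinations(samples, 2):
        dists.append(hypot(p[0] - q[0], p[1] - q[1]))
        sims.append(jaccard(taxa_p, taxa_q))
    n = len(dists)
    mean_d = sum(dists) / n
    mean_s = sum(sims) / n
    sxy = sum((d - mean_d) * (s - mean_s) for d, s in zip(dists, sims))
    sxx = sum((d - mean_d) ** 2 for d in dists)
    return sxy / sxx

# Toy data: three sites on a line that share fewer taxa as distance grows.
samples = [
    ((0.0, 0.0), {"A", "B", "C"}),
    ((1.0, 0.0), {"B", "C", "D"}),
    ((2.0, 0.0), {"C", "D", "E"}),
]
slope = distance_decay_slope(samples)
```

How the sites are chosen matters as much as the fit: the slope only describes the part of space (or niche) that was actually sampled.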

Null models

  • Can randomize in lots of different ways
  • For tree randomization, the tips can all be individual reads or can be collapsed into OTUs
  • Katie had some good comments about WHY to use various null models: fixing one variable, or fixing both ...
  • General benefit of a null model: it gives you a default to test against
  • Josh suggests that having an explicit alternative model to test against is very powerful
  • Eisen says it would be good to have someone write a review about these issues in the context of metagenomics
  • Josh proposes a likelihood ratio test for assessing models
  • Can build a model where one assumes sampling from all taxa is independent, but that is probably not the case
  • Can we predict what organisms are missing? Are there parts of the tree we have not sampled?
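
One of the many possible randomizations can be sketched as follows. This example fixes one variable (the number of taxa in the sample) and randomizes which taxa are drawn from the pool, then asks how often a random draw is at least as phylogenetically clustered as the observed sample. The statistic (mean pairwise distance), the function names and the toy distances are assumptions chosen for illustration only.

```python
import random
from itertools import combinations

def mean_pairwise_distance(taxa, dist):
    """Mean distance over all pairs of taxa; dist maps frozenset({a, b}) to a distance."""
    pairs = list(combinations(sorted(taxa), 2))
    return sum(dist[frozenset(p)] for p in pairs) / len(pairs)

def null_p_value(observed_taxa, all_taxa, dist, n_iter=999, seed=0):
    """Return (observed statistic, fraction of draws at least as clustered).

    Null model (an assumption for this sketch): the number of taxa in the
    sample is held fixed and their identities are drawn at random from the pool.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(observed_taxa, dist)
    pool = sorted(all_taxa)
    k = len(observed_taxa)
    hits = 0
    for _ in range(n_iter):
        draw = rng.sample(pool, k)
        if mean_pairwise_distance(draw, dist) <= observed:
            hits += 1
    # Count the observed sample itself so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)

# Toy pool: two tight clades (A, B) and (C, D) that are far from each other.
dist = {frozenset(p): 4.0 for p in combinations("ABCD", 2)}
dist[frozenset("AB")] = 1.0
dist[frozenset("CD")] = 1.0

observed, p = null_p_value({"A", "B"}, set("ABCD"), dist)
```

Swapping in a different randomization (for example shuffling reads rather than OTUs) only changes how `draw` is generated, which makes it easy to compare null models side by side.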

Sam simulations

  • Multiple things being varied
  • Using AMPHORA as a backbone to some components of the simulations
  • METASIM used as part of the simulation
    • Possibly doing some weird things w/ which regions of the gene are covered, but this may not matter much
    • Comparing trees w/ FastTree but not yet using bootstraps
    • Many ways to compare trees
    • Unweighted
      • Robinson-Foulds (partition) metric
      • Path difference (nodal distance) metric
      • Disagreement metric
    • Weighted
    • Absolute or normalized
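
As an illustration of the first unweighted metric listed above, here is a small sketch of the Robinson-Foulds (partition) distance, with an optional normalized form. Trees are written as nested tuples of leaf names, which is a hypothetical representation picked to keep the example self-contained; it is not the format used in Sam's simulations.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree`, appending each internal node's leaf set to `out`."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves

def splits(tree):
    """Non-trivial bipartitions of the leaves implied by the tree (treated as unrooted)."""
    clades = []
    all_leaves = leaf_sets(tree, clades)
    result = set()
    for clade in clades:
        other = all_leaves - clade
        # Splits with a side of size 0 or 1 occur in every tree, so skip them.
        if len(clade) >= 2 and len(other) >= 2:
            result.add(frozenset([clade, other]))
    return result, all_leaves

def robinson_foulds(t1, t2, normalized=False):
    """Number of bipartitions found in exactly one of the two trees."""
    s1, leaves1 = splits(t1)
    s2, leaves2 = splits(t2)
    if leaves1 != leaves2:
        raise ValueError("trees must share the same leaf set")
    d = len(s1 ^ s2)
    if normalized:
        # 2 * (n - 3) is the maximum for two fully resolved unrooted trees.
        max_d = 2 * (len(leaves1) - 3)
        return d / max_d if max_d > 0 else 0.0
    return d

t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
rf_absolute = robinson_foulds(t1, t2)
rf_normalized = robinson_foulds(t1, t2, normalized=True)
```

This version ignores branch lengths entirely; a weighted comparison would need branch lengths attached to each split.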

What do we need to know about protein families?

  • Phylogenetic informativeness
    • Taxa ID
    • PD calculation
  • Relative abundance informativeness
    • Evenness in copy # is key
  • OTU identification value
  • Josh suggests a statistical model of how well we have sampled genomes and how likely they are to predict future data
    • can do this by taxonomy
    • or ecology
  • Predictability is important, not just "evenness", for example
  • Aaron suggests integrating gene and species tree
  • Do all parts of a gene give the same answer?
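
For the PD calculation mentioned above, a rough sketch of Faith's phylogenetic diversity on a rooted tree is given here: the summed length of every branch on the paths from the chosen taxa to the root. The tree encoding (a dict from each node to its parent and branch length) and all names are assumptions made for illustration.

```python
def faith_pd(taxa, parent):
    """Sum of branch lengths connecting `taxa` to the root.

    parent: dict mapping node -> (parent_node, branch_length); the root has
    no entry. This is a hypothetical encoding chosen for the sketch.
    """
    seen = set()
    total = 0.0
    for node in taxa:
        # Walk toward the root, counting each branch only once.
        while node in parent and node not in seen:
            seen.add(node)
            up, length = parent[node]
            total += length
            node = up
    return total

# Toy tree: root splits into X and C; X splits into A and B.
parent = {
    "A": ("X", 1.0),
    "B": ("X", 2.0),
    "X": ("root", 1.0),
    "C": ("root", 3.0),
}
pd_ab = faith_pd({"A", "B"}, parent)
pd_all = faith_pd({"A", "B", "C"}, parent)
```

Running the same calculation on trees built from different protein families would be one way to compare how much phylogenetic signal each family carries.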

James, Microbial Diversity

  • Spatially structured community assembly
    • Distance decay
    • Taxa area
    • Do the same rules apply at small and large scales?
    • Relative abundance is important in the model in terms of sampling individuals
  • Some sources of noise
    • Doing too wide a breadth of taxa at once
    • Doing too many types of environments at once
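
A taxa-area relationship is commonly summarized with a power law, S = c * A^z, fitted on log-transformed values. The sketch below assumes that form and uses made-up data; the function name and the numbers are illustrative and are not taken from James's talk.

```python
from math import exp, log

def fit_taxa_area(areas, richness):
    """Fit S = c * A**z by least squares on log-transformed values; return (c, z)."""
    xs = [log(a) for a in areas]
    ys = [log(s) for s in richness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    z = sxy / sxx
    c = exp(mean_y - z * mean_x)
    return c, z

# Made-up data generated from S = 5 * A**0.25, so the fit should recover it.
areas = [1.0, 10.0, 100.0, 1000.0]
richness = [5.0 * a ** 0.25 for a in areas]
c, z = fit_taxa_area(areas, richness)
```

Fitting small-scale and large-scale samples separately and comparing the two z values is one simple way to ask whether the same rule holds across scales.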

Phylogenetic diversity

OTU Discussion

  • Introduction
    • What is an OTU?
    • Mostly looked at w/ ss-rRNA (16S bacteria, 18S eukaryotes)
    • Problems w/ PCR
    • Tom shows outline of new pipeline