Moore Notes 11 17 10 PI

Group Call

PhylOTU manuscript is resubmitted

Niche mapping figures
- Range maps are really just for abundant organisms
- Hard limit of occurring at 10 sites in the world to do niche mapping
  - What global abundance does this correspond to? Can we quantify it?
- Worse for microbes, but happens with other organisms too
- Would be interesting to run on OTUs, but there are a lot of issues with that
- Omission rates figure
  - only pick up taxa that are "closely" related to things we've seen before (50% RDP cutoff)
  - Wouldn't be a problem if the rare taxa / endemics scale spatially with total richness (not likely to be true - isn't for macroorganisms)
  - Main issue is whether the patterns we describe may be biased by only using common taxa
  - Focus on order or family most likely
    - But what can we say about the range of a taxon at this high level of taxonomy?
    - Higher level (order) for summary/meta-maps vs. finer level for maps of individual taxa
    - Scale of the map is important too
- There is literature on how biogeographic patterns change at different levels of taxonomic resolution (e.g., species/genus ratios)
  - Not just abundant organisms
  - Could we make quantitative comparisons with marine macro organisms?
    - Census of Marine Life (http://www.coml.org/), with data available at http://www.iobis.org/
    - Tittensor paper
    - Aquamaps.org
    - Qualitative comparisons might be sufficient
- Will use an RDP bootstrap cutoff of 50%
  - 99% of sequences are v6 pyrotags, and this is a good cutoff for them
  - agrees well with Greengenes
- Guillaume is error checking the code
- Writing up paper
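
One way to approach the question above about what the 10-site limit corresponds to in terms of global abundance is to count, for each taxon, how many sites it occurs at and what share of all reads it accounts for. The sketch below is only an illustration: the `(taxon, site, n_reads)` record layout and the function name `summarize_site_filter` are assumptions, not part of the actual niche-mapping code.

```python
from collections import defaultdict

MIN_SITES = 10  # hard limit for niche mapping mentioned in the notes


def summarize_site_filter(counts, min_sites=MIN_SITES):
    """Summarize which taxa pass the site-count limit.

    counts: iterable of (taxon, site, n_reads) records.
    This record layout is hypothetical, chosen for illustration.
    """
    sites_per_taxon = defaultdict(set)
    reads_per_taxon = defaultdict(int)
    for taxon, site, n_reads in counts:
        if n_reads > 0:  # only count sites where the taxon was observed
            sites_per_taxon[taxon].add(site)
            reads_per_taxon[taxon] += n_reads

    # Taxa observed at enough distinct sites to be mapped
    kept = {t for t, s in sites_per_taxon.items() if len(s) >= min_sites}

    total_taxa = len(sites_per_taxon)
    total_reads = sum(reads_per_taxon.values())
    kept_reads = sum(reads_per_taxon[t] for t in kept)
    return {
        "kept_taxa": kept,
        "fraction_taxa_kept": len(kept) / total_taxa if total_taxa else 0.0,
        "fraction_reads_kept": kept_reads / total_reads if total_reads else 0.0,
    }
```

Comparing `fraction_taxa_kept` with `fraction_reads_kept` gives a rough sense of how strongly the maps are weighted towards common taxa; running it at several values of `min_sites` would show how sensitive that is to the cutoff.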

AMPHORA-2
- BLAST step to pull out sequences in a family
  - HMMER3 is fast enough to use the profile HMM, but is then less sensitive
  - Currently using MCL clustering to identify groups within list of sequences and use one sequence per group
  - Other ideas:
    - Build tree and pick representatives using maxPD
      - Can oversample clades with long branches
    - Emit sequences from HMM
      - Do the simulated sequences represent the family well? (since sequences are "averages", not real sequences)
    - Kimmen's software should do this, but is geared towards clustering at a higher level (subfamilies within a superfamily)
      - May have some of the same problems as maxPD
    - Prune tree from inside out - need an algorithm
    - Would node imbalance statistic be useful here?
    - Build a simple, fast feature-based classifier
  - Goal: to pick sequences that are optimal for detecting different phylogenetic lineages within a family
- Focus on input and outputs for each module, so that other methods can be swapped in at any step
  - Build a modular workflow that can be easily extended
- Pplacer is running to build trees
- Next
  - More families
  - Better modules
- Steve can be a tester when the time comes
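
The "one sequence per group" step, combined with the goal of swappable modules, could be sketched as below. This is a minimal illustration, not the AMPHORA-2 code: it assumes the clustering output has one cluster per line with tab-separated sequence identifiers, and the selection rule (take the longest sequence) is a placeholder, since the notes do not say how the representative is chosen. All function names are hypothetical.

```python
def parse_clusters(text):
    """Parse clusters from text: one cluster per line, members tab-separated.

    This input layout is an assumption about the clustering output.
    """
    return [line.split("\t") for line in text.splitlines() if line.strip()]


def longest_sequence(members, sequences):
    """Placeholder selection rule: the longest sequence in the cluster.

    Members are sorted first so ties resolve to the first identifier
    alphabetically, which keeps the result deterministic.
    """
    return max(sorted(members), key=lambda m: len(sequences[m]))


def pick_representatives(clusters, sequences, choose=longest_sequence):
    """Return one representative identifier per cluster.

    `choose` is the swappable part: any function taking
    (members, sequences) and returning one member can be passed in.
    """
    return [choose(members, sequences) for members in clusters]
```

Because the selection rule is passed in as `choose`, a tree-based or HMM-based picker could replace `longest_sequence` without changing the code that reads clusters or consumes the representatives.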
