Moore Notes 11 17 10 PI
From OpenWetWare
Group Call
- PhylOTU manuscript is resubmitted
- Niche mapping figures
- Range maps are really just for abundant organisms
- Hard limit of occurring at 10 sites in the world to do niche mapping
- What global abundance does this correspond to? Can we quantify it?
- Worse for microbes, but happens with other organisms too
- Would be interesting to run on OTUs, but there are a lot of issues with that
- Omission rates figure
- Only pick up taxa that are "closely" related to things we've seen before (50% RDP cutoff)
- Wouldn't be a problem if the rare taxa / endemics scaled spatially with total richness (not likely to be true - it isn't for macroorganisms)
- Main issue is whether the patterns we describe may be biased by only using common taxa
- Focus on order or family most likely
- But what can we say about the range of a taxon at this high level of taxonomy?
- Higher level (order) for summary/meta-maps vs. finer level for maps of individual taxa
- Scale of the map is important too
- There is literature on how biogeographic patterns change at different levels of taxonomic resolution (e.g., species/genus ratios)
- Not just abundant organisms
- Could we make quantitative comparisons with marine macroorganisms?
- Census of Marine Life: http://www.coml.org/ with data here: http://www.iobis.org/
- Tittensor paper
- Aquamaps.org
- Qualitative comparisons might be sufficient
- Will use an RDP bootstrap cutoff of 50%
- 99% of sequences are V6 pyrotags, and this is a good cutoff for them
- Agrees well with Greengenes
- Guillaume is error-checking the code
- Writing up paper
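The two thresholds discussed above (RDP bootstrap cutoff of 50%, and at least 10 sites per taxon for niche mapping) can be sketched as a simple filter. This is a minimal illustration only: the record format `(site_id, taxon, bootstrap)`, the function name `mappable_taxa`, and the choice to treat both thresholds as inclusive are assumptions, not the group's actual code.

```python
from collections import defaultdict

BOOTSTRAP_CUTOFF = 0.50  # RDP bootstrap cutoff from the notes (assumed inclusive)
MIN_SITES = 10           # minimum number of sites for niche mapping (assumed inclusive)


def mappable_taxa(records, bootstrap_cutoff=BOOTSTRAP_CUTOFF, min_sites=MIN_SITES):
    """Return {taxon: set of site ids} for taxa that pass both filters.

    records: iterable of (site_id, taxon, bootstrap) tuples.
    This input format is hypothetical and used only for illustration.
    """
    sites_by_taxon = defaultdict(set)
    for site_id, taxon, bootstrap in records:
        # Drop classifications below the bootstrap cutoff before counting sites.
        if bootstrap >= bootstrap_cutoff:
            sites_by_taxon[taxon].add(site_id)
    # Keep only taxa observed at enough distinct sites.
    return {t: s for t, s in sites_by_taxon.items() if len(s) >= min_sites}
```

Counting how many taxa are dropped by the site filter at each taxonomic level would give a rough handle on how strongly the maps are biased toward common taxa.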
- AMPHORA-2
- BLAST step to pull out sequences in a family
- HMMER3 is fast enough to use the profile HMM, but is less sensitive then
- Currently using MCL clustering to identify groups within list of sequences and use one sequence per group
- Other ideas:
- Build tree and pick representatives using maxPD
- Can oversample clades with long branches
- Emit sequences from HMM
- Do the simulated sequences represent the family well? (since sequences are "averages", not real sequences)
- Kimmen's software should do this, but is geared towards clustering at a higher level (subfamilies within a superfamily)
- May have some of the same problems as maxPD
- Prune tree from inside out - need an algorithm
- Would node imbalance statistic be useful here?
- Build a simple, fast feature-based classifier
- Goal: to pick sequences that are optimal for detecting different phylogenetic lineages within a family
- Focus on input and outputs for each module, so that other methods can be swapped in at any step
- Build a modular workflow that can be easily extended
- Pplacer is running to build trees
- Next
- More families
- Better modules
- Steve can be a tester when the time comes
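The maxPD idea listed under "Other ideas" can be sketched as a greedy selection on a rooted tree: repeatedly pick the leaf whose path to the root adds the most branch length not already covered. This is a minimal sketch under stated assumptions, not the AMPHORA-2 implementation. The tree encoding (`parent` and `length` dictionaries), the function name `greedy_max_pd`, and the rooted definition of phylogenetic diversity are all choices made here for illustration.

```python
def greedy_max_pd(parent, length, k):
    """Greedily pick up to k leaves that maximize rooted phylogenetic diversity.

    parent: {node: parent node, or None for the root}   (hypothetical encoding)
    length: {node: length of the branch to its parent}  (root may be absent)
    """
    internal = {p for p in parent.values() if p is not None}
    leaves = sorted(n for n in parent if n not in internal)
    covered = set()  # nodes whose branch to the parent is already counted
    chosen = []

    def gain(leaf):
        # Branch length this leaf would add: walk toward the root until a covered node.
        total, node = 0.0, leaf
        while node is not None and node not in covered:
            total += length.get(node, 0.0)
            node = parent[node]
        return total

    for _ in range(min(k, len(leaves))):
        best = max((leaf for leaf in leaves if leaf not in chosen), key=gain)
        chosen.append(best)
        # Mark the newly covered path so later gains exclude it.
        node = best
        while node is not None and node not in covered:
            covered.add(node)
            node = parent[node]
    return chosen


# Toy tree: root -> X (1.0), root -> C (5.0); X -> A (1.0), X -> B (3.0)
parent = {"root": None, "X": "root", "C": "root", "A": "X", "B": "X"}
length = {"X": 1.0, "C": 5.0, "A": 1.0, "B": 3.0}
print(greedy_max_pd(parent, length, 2))  # ['C', 'B']
```

In the toy tree the long-branched leaves C and B are picked first, which illustrates the concern noted above that maxPD can oversample clades with long branches.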