Moore Notes 12 1 10

Group Call

PhylOTU paper accepted at PLoS Comp Bio
- Tom is handling proofs

Protein Db/AMPHORA 2
- Guillaume is parsing results
- Pipeline is fairly stable
  - Testing on GOS data (for Steve's paper): new code, but still using the old gene list
  - Parallelized tree building program
- Working with Aaron on a short read strategy
  - No unassembled reads are being picked up by the BLAST search (25 representatives for each marker)
  - With Illumina data
  - New marker list
  - Tom: it is possible that those markers are not present in the data set - check with Aaron
  - Sam: start with conditions where you do recover the hits, then back off
  - Assembly with Velvet, then run assembled contigs through AMPHORA

Niche mapping
- Tom's analysis of the correlation between 1/2-degree remote data and observed metadata at sampling sites (plots)
  - Unit scale is not identical for all pairs (hopefully they are proportional)
  - If either data set had a NULL value, data point was thrown out
    - Shorelines had a lot of NULL values
    - Silicate had a lot of NULL values (Josh: should be similar to phosphate - check)
  - All correlations are significant and positive (except silicate)
    - Temperature and salinity are the strongest
  - Time of sampling is important
    - Positive correlations suggest that averages are robust
    - Should check if variances are correlated too
    - James: local and long-term averages are probably both important for determining community structure
  - Did filter MICROBIS for <150m
  - Many unexpected data points in MICROBIS, e.g., very high temperatures, which might be a units problem
- Josh added variability of environmental variables as layers in maxent
  - Variance has been shown to impact richness
  - Obtained remote variability data on temperature and salinity
    - Other predictors have variability for some locations, but not good coverage of ocean
  - Re-ran all analyses
    - Maps are not changed much
    - Stronger evidence for high richness in regions with mixing and boundary currents
    - Low-diversity regions off W. Africa and Arabia are still there; these are likely due to low sampling in those environments
- Verifying richness predictions against Chao estimates at locations with a lot of sampling (plots)
  - Maxent and Chao are similar for order, class, and phylum, but agreement is not great for family
  - Chao is generally below maxent
    - Makes sense, since Chao estimates a lower bound on richness
  - One outlier point is probably the same sample in all four plots - check
  - Sites with many reads are right on the correlation line
    - Chao is close to real richness at these locations, but is off when there are not many reads (expected)
  - Previously wanted to use family (based on RDP analysis), but order might be better
  - Can test correlation statistically
    - Use a weighted regression to account for different sampling depths
    - Could throw out samples with too few reads for Chao to be accurate (but this will probably depend on true richness)
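
A minimal sketch of the statistical check proposed above, under stated assumptions. The notes say only "Chao", so the bias-corrected Chao1 estimator is assumed here. Using the number of reads per site as the regression weight is also an assumption, as are the function names and the toy values, which are illustrative and not project data.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts.

    Assumption: the notes do not say which Chao estimator was used.
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # observed richness
    f1 = sum(1 for c in counts if c == 1)    # taxa seen exactly once
    f2 = sum(1 for c in counts if c == 2)    # taxa seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def weighted_linear_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b).

    Each point contributes in proportion to its weight, so sites with
    few reads (less reliable Chao estimates) influence the fit less.
    """
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


if __name__ == "__main__":
    # Toy per-taxon read counts for one hypothetical site.
    print(chao1([10, 5, 1, 1, 1, 2]))

    # Hypothetical per-site values: Chao estimate, maxent prediction, read depth.
    chao = [1.0, 2.0, 3.0, 4.0]
    maxent = [2.0, 4.0, 6.0, 100.0]
    reads = [1.0, 1.0, 1.0, 0.0]  # a zero weight drops the last site entirely
    print(weighted_linear_fit(chao, maxent, reads))
```

Setting a site's weight to zero is equivalent to throwing that sample out, so the same fit covers both options discussed: down-weighting shallow samples, or excluding them below a read-count cutoff.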
