Moore Notes 12 1 10

From OpenWetWare

Group Call

  • PhylOTU paper accepted at PLoS Comp Bio
    • Tom is handling proofs
  • Protein Db/AMPHORA 2
    • Guillaume is parsing results
    • Pipeline is fairly stable
      • Testing on GOS data (for Steve's paper) - new code, but still using the old gene list
      • Parallelized tree building program
    • Working with Aaron on a short read strategy
      • No unassembled reads are getting picked up by BLAST search with 25 representatives for each marker
      • With Illumina data
      • New marker list
      • Tom: it is possible that those markers are not present in the data set - check with Aaron
      • Sam: start with conditions where you do recover the hits, then back off
      • Assembly with Velvet, then run assembled contigs through AMPHORA
  • Niche mapping
    • Tom's analysis of the correlation between 1/2-degree remote data and observed metadata at sampling sites (plots)
      • Unit scale is not identical for all pairs (hopefully they are proportional)
      • If either data set had a NULL value, the data point was thrown out
        • Shorelines had a lot of NULL values
        • Silicate had a lot of NULL values (Josh: should be similar to phosphate - check)
      • All correlations are significant and positive (except silicate)
        • Temperature and salinity are the strongest
      • Time of sampling is important
        • Positive correlations suggest that averages are robust
        • Should check if variances are correlated too
        • James: Local and long-term averages are probably both important for determining community structure
      • MICROBIS was filtered for <150 m
      • Many unexpected data points in MICROBIS, e.g., very high temperatures (might be a units problem)
    • Josh added variability of environmental variables as layers in maxent
      • Variance has been shown to impact richness
      • Obtained remote variability data on temperature and salinity
        • Other predictors have variability for some locations, but not good coverage of ocean
      • Re-ran all analyses
        • Maps are not changed much
        • Stronger evidence for high richness in regions with mixing and boundary currents
        • Low-diversity regions off W. Africa and Arabia are still there - these are likely due to low sampling in those environments
    • Verifying richness predictions using locations with a lot of sampling (versus Chao) (plots)
      • Maxent and Chao are similar for order, class, and phylum. Not great for family
      • Chao is generally below maxent
        • Makes sense since Chao is unbiased for the lower bound of richness
      • One outlier point is probably the same sample in all four plots - check
      • Sites with many reads are right on the correlation line
        • Chao is close to real richness at these locations, but is off when there are not many reads (expected)
      • Previously wanted to use family (based on RDP analysis), but order might be better
      • Can test correlation statistically
        • Use a weighted regression to account for different sampling depths
        • Could throw out samples with too few reads for Chao to be accurate (but this will probably depend on true richness)
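
The last two points (comparing maxent with Chao, and weighting by sampling depth) could be sketched roughly as below. This is only an illustration under stated assumptions: it uses the bias-corrected Chao1 formula, weights each site by its total read count (one possible reading of "sampling depth"), and runs on made-up counts and made-up maxent values. The function names `chao1` and `weighted_fit` are hypothetical and are not taken from the group's pipeline.

```python
import numpy as np


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts."""
    counts = np.asarray(counts)
    counts = counts[counts > 0]
    s_obs = counts.size            # observed taxa
    f1 = np.sum(counts == 1)       # singletons
    f2 = np.sum(counts == 2)       # doubletons
    return s_obs + f1 * (f1 - 1) / (2.0 * (f2 + 1))


def weighted_fit(x, y, w):
    """Weighted least-squares line y = a + b*x plus weighted Pearson r.

    Returns (intercept, slope, r). Weights are normalized to sum to 1.
    """
    x, y, w = (np.asarray(v, dtype=float) for v in (x, y, w))
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    vx = np.sum(w * (x - mx) ** 2)
    vy = np.sum(w * (y - my) ** 2)
    slope = cov / vx
    intercept = my - slope * mx
    r = cov / np.sqrt(vx * vy)
    return intercept, slope, r


# Made-up per-taxon read counts for three hypothetical sites.
sites = {
    "site_a": [10, 5, 1, 1, 1, 2, 2],
    "site_b": [40, 30, 22, 9, 3, 2, 1],
    "site_c": [3, 1, 1, 1, 1],
}
chao = [chao1(c) for c in sites.values()]
reads = [sum(c) for c in sites.values()]   # assumed weight: total reads per site
maxent = [9.0, 8.0, 12.0]                  # hypothetical maxent richness values

intercept, slope, r = weighted_fit(chao, maxent, reads)
print(intercept, slope, r)
```

Weighting by reads downweights the shallowly sampled sites where Chao is expected to be off, rather than discarding them with a hard read-count cutoff.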