Moore Notes 12 1 10
From OpenWetWare
Group Call
- PhylOTU paper accepted at PLoS Comp Bio
- Tom is handling proofs
- Protein Db/AMPHORA 2
- Guillaume is parsing results
- Pipeline is fairly stable
- Testing on GOS data (for Steve's paper): new code, but still using the old gene list
- Parallelized tree building program
- Working with Aaron on a short read strategy
- No unassembled reads are getting picked up by BLAST search with 25 representatives for each marker
- With Illumina data
- New marker list
- Tom: it is possible that those markers are not present in the data set - check with Aaron
- Sam: start with conditions where you do recover the hits, then back off
- Assembly with Velvet, then run assembled contigs through AMPHORA
- Niche mapping
- Tom's analysis of the correlation between 1/2-degree remote data and observed metadata at sampling sites (plots)
- Unit scale is not identical for all pairs (hopefully they are proportional)
- If either data set had a NULL value, data point was thrown out
- Shorelines had a lot of NULL values
- Silicate had a lot of NULL values (Josh: should be similar to phosphate - check)
- All correlations are significant and positive (except silicate)
- Temperature and salinity are the strongest
- Time of sampling is important
- Positive correlations suggest that averages are robust
- Should check if variances are correlated too
- James: Local and long term average are probably both important for determining community structure
- Did filter MICROBIS for <150m
- Many unexpected data points in MICROBIS, e.g., very high temperatures (might be a units problem)
- Josh added variability of environmental variables as layers in maxent
- Variance has been shown to impact richness
- Obtained remote variability data on temperature and salinity
- Other predictors have variability for some locations, but not good coverage of ocean
- Re-ran all analyses
- Maps are not changed much
- Stronger evidence for high richness in regions with mixing and boundary currents
- Low diversity off W. Africa and Arabia is still there; this is likely due to low sampling in those environments
- Verifying richness predictions against Chao estimates at locations with a lot of sampling (plots)
- Maxent and Chao are similar for order, class, and phylum. Not great for family
- Chao is generally below maxent
- Makes sense, since Chao estimates a lower bound on richness
- One outlier point is probably the same sample in all four plots - check
- Sites with many reads are right on the correlation line
- Chao is close to real richness at these locations, but is off when there are not many reads (expected)
- Previously wanted to use family (based on RDP analysis), but order might be better
- Can test correlation statistically
- Use a weighted regression to account for different sampling depths
- Could throw out samples with too few reads for Chao to be accurate (but this will probably depend on true richness)
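A minimal Python sketch of the Chao comparison and weighted regression discussed above, offered as an illustration only. The notes do not say which Chao variant was used; this assumes the bias-corrected Chao1 form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of taxa seen exactly once and exactly twice. The function names and the choice of read count as the regression weight are hypothetical.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-taxon read counts.

    Assumption: the notes only say "Chao"; the bias-corrected Chao1
    variant is used here because it stays defined when no taxon is
    seen exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def weighted_linear_fit(xs, ys, weights):
    """Weighted least-squares line y = slope * x + intercept.

    Each squared residual is multiplied by its weight, so sites with
    larger weights (e.g. more reads) influence the fit more.
    """
    total = sum(weights)
    mean_x = sum(w * x for w, x in zip(weights, xs)) / total
    mean_y = sum(w * y for w, y in zip(weights, ys)) / total
    sxy = sum(w * (x - mean_x) * (y - mean_y)
              for w, x, y in zip(weights, xs, ys))
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
    if sxx == 0:
        raise ValueError("x values have no weighted variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical per-taxon read counts for one site (not real data).
example_counts = [10, 5, 1, 1, 2, 2, 0]
example_estimate = chao1(example_counts)
```

With maxent richness per site as `xs`, Chao estimates as `ys`, and reads per site as `weights`, sites with few reads pull less on the fitted line, which is one way to account for different sampling depths without discarding samples outright.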
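A minimal Python sketch of the remote-versus-observed correlation described under Niche mapping, assuming a Pearson correlation (the notes do not name the statistic) and that NULL values are represented as `None`. The function name and the sample values are hypothetical. Pearson's r is unchanged by a positive linear rescaling of either variable, so proportional unit differences between the two data sets would not affect it.

```python
from math import sqrt


def pearson_dropping_nulls(remote, observed):
    """Pearson r between paired values, skipping incomplete pairs.

    A pair is thrown out if either value is None, mirroring the rule
    in the notes. Returns (r, number_of_pairs_used).
    """
    pairs = [(r, o) for r, o in zip(remote, observed)
             if r is not None and o is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_r = sum(r for r, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    sxy = sum((r - mean_r) * (o - mean_o) for r, o in pairs)
    sxx = sum((r - mean_r) ** 2 for r, _ in pairs)
    syy = sum((o - mean_o) ** 2 for _, o in pairs)
    if sxx == 0 or syy == 0:
        raise ValueError("one variable is constant across complete pairs")
    return sxy / sqrt(sxx * syy), n


# Hypothetical temperatures: remote in Celsius, observed in Fahrenheit.
remote_temp = [10.0, 12.0, None, 15.0, 20.0]
observed_temp = [50.0, 53.6, 60.0, None, 68.0]
r, pairs_used = pearson_dropping_nulls(remote_temp, observed_temp)
```

Reporting the number of pairs used alongside r makes it easy to spot variables, such as silicate or shoreline sites, where many NULL values leave few points behind the correlation.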