Moore Notes 1 21 15

From OpenWetWare

Discussion of TARA Oceans data

  • Participants: Katie, Josh, Tom, Guillaume, Stephen
  • Stephen:
    • Data embargo issue
    • Updated summary (slides)
  • Analysis discussion:
    • What do we want to do with the data?
      • Start with aims of proposal
    • How to preprocess?
      • They will likely release eggNOG abundances
      • They may map reads to assemblies (gene catalog)
      • Do we need something more/different?
        • Database
        • Classification thresholds
        • AGS (average genome size) normalization
    • DIAMOND vs. RAPSearch2
      • Do a quick comparison (correlation) of bit scores
        • If highly correlated, we can reuse the previously identified thresholds
    • Many (667) samples to process
      • Prioritize the prokaryote size fraction, then protists, then viruses
      • Prioritize open ocean (all?), surface waters (approximately 216 samples)
      • Start with metagenomes
    • Josh will look at ecological variability (MESS plots)
      • Can we do global predictions?
      • Are there samples we would drop and therefore do not need to run for read classification?
    • How much QC is needed?
      • Stephen: QC probably hasn't been done, but it is also not necessary
      • Better to keep track of quality and use that info downstream
      • Illumina looks better than 454
      • Could QC one library and compare protein family abundances pre- and post-QC
    • Are the size fractions reliable?
    • Stephen will start AGS analyses right away
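
Sketch for the DIAMOND vs. RAPSearch2 bit score comparison discussed above. This is only a minimal illustration, not an agreed pipeline. It assumes both tools were run on the same reads against the same database and that both outputs are in a 12-column BLAST-style tabular layout with the bit score in the last column (an assumption to check against the actual output files). The function names and the toy rows are hypothetical. Hits are paired by (query, subject), and the Pearson correlation is computed over the hits found by both tools.

```python
import csv
import io
import math


def read_bit_scores(handle):
    """Map (query, subject) -> best bit score from BLAST-style tabular rows.

    Assumes 12 tab-separated columns with the bit score in column 12;
    lines starting with '#' are treated as comments and skipped.
    """
    scores = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        key = (row[0], row[1])
        bits = float(row[11])
        # Keep only the best-scoring alignment for each query/subject pair.
        if bits > scores.get(key, float("-inf")):
            scores[key] = bits
    return scores


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def compare_bit_scores(handle_a, handle_b):
    """Return (Pearson r, number of shared hits) for two tabular outputs."""
    a = read_bit_scores(handle_a)
    b = read_bit_scores(handle_b)
    shared = sorted(set(a) & set(b))
    r = pearson([a[k] for k in shared], [b[k] for k in shared])
    return r, len(shared)


def make_row(query, subject, bits):
    """Build one toy 12-column row; all values except the keys and bit score are filler."""
    fields = [query, subject, "90.0", "100", "5", "0",
              "1", "100", "1", "100", "1e-20", str(bits)]
    return "\t".join(fields)


if __name__ == "__main__":
    # Toy data only; real input would be the two tools' output files.
    tool_a = io.StringIO("\n".join(make_row(f"q{i}", f"s{i}", b)
                                   for i, b in enumerate([50, 100, 150], 1)))
    tool_b = io.StringIO("\n".join(make_row(f"q{i}", f"s{i}", b)
                                   for i, b in enumerate([55, 105, 155, 80], 1)))
    r, n_shared = compare_bit_scores(tool_a, tool_b)
    print(f"Pearson r = {r:.3f} over {n_shared} shared hits")
```

Hits reported by only one tool are dropped from the correlation, so the number of shared hits should be reported alongside r; a high r over a small shared set would not justify reusing the thresholds.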