Moore Notes 1 21 15
From OpenWetWare
Discussion of TARA Oceans data
- Participants: Katie, Josh, Tom, Guillaume, Stephen
- Stephen:
- Data embargo issue
- Updated summary (slides)
- Analysis discussion:
- What do we want to do with the data?
- Start with aims of proposal
- How to preprocess?
- They will likely release eggNOG abundances
- They may map reads to assemblies (gene catalog)
- Do we need something more/different?
- Database
- Classification thresholds
- AGS (average genome size) normalization
- DIAMOND vs. RAPSearch2
- Do a quick comparison (correlation) of bit scores
- If highly correlated, can use previously identified thresholds
- Many (667) samples to process
- Prioritize the prokaryote size fraction, then protists, then viruses
- Prioritize open ocean (all?), surface waters (approximately 216 samples)
- Start with metagenomes
- Josh will look at ecological variability (MESS plots)
- Can we do global predictions?
- Are there samples we would drop and therefore do not need to run for read classification?
- How much QC is needed?
- Stephen: QC probably hasn't been done, but it is also not necessary
- Better to keep track of quality and use that info downstream
- Illumina looks better than 454
- Could QC one library and compare protein family abundances pre and post QC
- Are the size fractions reliable?
- Stephen will start AGS analyses right away
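The notes do not say exactly how AGS normalization will be done. One common form, assumed here, is the convention used by MicrobeCensus: divide the total bases sequenced by the sample's average genome size (AGS) to get genome equivalents, then express each protein family as reads per kilobase of family length per genome equivalent (RPKG). This is a minimal Python sketch under that assumption; the function names are hypothetical and this is not the project's pipeline.

```python
from typing import Dict


def genome_equivalents(total_bp: float, ags_bp: float) -> float:
    """Genome copies sequenced: total bases divided by average genome size (AGS)."""
    if ags_bp <= 0:
        raise ValueError("average genome size must be positive")
    return total_bp / ags_bp


def rpkg(read_count: float, family_length_bp: float, total_bp: float, ags_bp: float) -> float:
    """Reads per kilobase of family length per genome equivalent (assumed AGS normalization)."""
    length_kb = family_length_bp / 1000.0
    return read_count / length_kb / genome_equivalents(total_bp, ags_bp)


def normalize_profile(
    counts: Dict[str, float],
    lengths_bp: Dict[str, float],
    total_bp: float,
    ags_bp: float,
) -> Dict[str, float]:
    """Apply rpkg() to every protein family in one sample's read-count table."""
    return {
        family: rpkg(count, lengths_bp[family], total_bp, ags_bp)
        for family, count in counts.items()
    }
```

With made-up numbers, 1,000 reads on a 1,500 bp family in a sample with 5e9 bp sequenced and an AGS of 2.5e6 bp gives 2,000 genome equivalents and an RPKG of 1000 / 1.5 / 2000, about 0.33. Because the denominator counts genomes rather than reads, the value tracks a family's average copy number per genome, so samples whose communities differ in genome size can be compared more fairly.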
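The quick DIAMOND vs. RAPSearch2 bit-score comparison could be run along these lines. This is a minimal Python sketch, assuming both tools were run on the same reads against the same database and their hits written as tab-separated, BLAST-style tabular files with the bit score in the 12th column (the usual m8 layout; check each tool's documentation before relying on the column index). It keeps each read's best hit from each tool and reports Pearson and Spearman correlations over the reads both tools classified. All function and file names are hypothetical.

```python
import csv
import math
from typing import Dict, List, Tuple

Hits = Dict[str, Tuple[str, float]]


def load_best_hits(path: str, bitscore_col: int = 11) -> Hits:
    """Return {read_id: (subject_id, bit_score)}, keeping each read's top-scoring hit.

    Assumes query in column 1, subject in column 2 and bit score in column 12
    (0-based index 11). Lines starting with '#' are skipped as comments.
    """
    best: Hits = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if not row or row[0].startswith("#"):
                continue
            query, subject = row[0], row[1]
            score = float(row[bitscore_col])
            if query not in best or score > best[query][1]:
                best[query] = (subject, score)
    return best


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            result[order[k]] = (start + end) / 2 + 1
        start = end + 1
    return result


def compare_bitscores(hits_a: Hits, hits_b: Hits, same_subject_only: bool = True) -> Dict[str, float]:
    """Correlate the two tools' bit scores over reads that both of them classified."""
    shared = sorted(set(hits_a) & set(hits_b))
    if same_subject_only:
        # Only compare reads whose best hit is the same reference protein in both tools.
        shared = [r for r in shared if hits_a[r][0] == hits_b[r][0]]
    if len(shared) < 2:
        raise ValueError("need at least two shared reads to correlate")
    xs = [hits_a[r][1] for r in shared]
    ys = [hits_b[r][1] for r in shared]
    return {
        "n_reads": len(shared),
        "pearson": pearson(xs, ys),
        # Spearman correlation is the Pearson correlation of the ranks.
        "spearman": pearson(ranks(xs), ranks(ys)),
    }
```

A call might look like `compare_bitscores(load_best_hits("sample_diamond.m8"), load_best_hits("sample_rapsearch2.m8"))`, with hypothetical file names. Both coefficients are reported because a high correlation alone does not mean the scores are on the same scale: if one tool's bit scores are consistently offset or scaled, the previously identified thresholds would likely need rescaling rather than direct reuse.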
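The suggestion to QC one library and compare protein family abundances before and after QC could be summarized with a single dissimilarity score. This is a minimal Python sketch under stated assumptions: the family counts for the unfiltered and QC'd library are already available as dictionaries (the notes do not specify the QC tool or classifier), and Bray-Curtis dissimilarity on relative abundances is an illustrative choice of metric, not one agreed on in the meeting. Function names are hypothetical.

```python
from typing import Dict, List, Tuple

Profile = Dict[str, float]


def relative_abundance(counts: Profile) -> Profile:
    """Scale family counts so they sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("profile has no counts")
    return {family: c / total for family, c in counts.items()}


def bray_curtis(p: Profile, q: Profile) -> float:
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 when no families are shared."""
    families = set(p) | set(q)
    num = sum(abs(p.get(f, 0.0) - q.get(f, 0.0)) for f in families)
    den = sum(p.get(f, 0.0) + q.get(f, 0.0) for f in families)
    return num / den


def largest_shifts(p: Profile, q: Profile, top: int = 5) -> List[Tuple[str, float]]:
    """Families whose relative abundance changed most from p to q (ties broken by name)."""
    families = set(p) | set(q)
    shifts = [(f, q.get(f, 0.0) - p.get(f, 0.0)) for f in families]
    shifts.sort(key=lambda item: (-abs(item[1]), item[0]))
    return shifts[:top]


def compare_qc(pre_counts: Profile, post_counts: Profile, top: int = 5):
    """Return (Bray-Curtis dissimilarity, largest per-family shifts) for pre- vs post-QC counts."""
    pre = relative_abundance(pre_counts)
    post = relative_abundance(post_counts)
    return bray_curtis(pre, post), largest_shifts(pre, post, top)
```

A dissimilarity near 0 would suggest QC barely changes the family profile, which would fit the view that tracking quality and using it downstream may be enough; `largest_shifts` points to the families driving any difference.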