User:Morgan G. I. Langille/Notebook/Project management: Difference between revisions

From OpenWetWare
==Darpa==


===Manuscripts===
*phone call on Friday
#Association paper showing Miguel and Xingpeng methods. Led by Miguel & Xingpeng
#Microbial geographical paper incorporating environment and location. Is it distance or habitat? Using GOS data. Unknown genes? Xingpeng & Morgan
#Unknown genes?


==Erebus==

Revision as of 16:57, 27 January 2011

Halophiles

Erin and Andrew are getting ortholog sets for all halophile genomes. Need to make a list of things to be done for the Roche genome paper.

  1. Redo CRISPR analysis for new NCBI genomes.
  2. Look at homologs of genes identified in the new Science metabolic paper.

Darpa

  • phone call on Friday

Erebus

  • got matrix and built cluster using vegdist
  • need to think about ways to identify pfams whose counts differ from each other and from whole genomes.
    • take pfam counts from all completed genomes, get a distribution, then ask whether a single count is normal or not, taking into account multiple test correction
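
One way the "is a single count normal" question could be sketched, assuming an empirical background distribution per Pfam family and Benjamini-Hochberg as the multiple test correction (the notes do not name a specific correction, so that choice is an assumption). This is plain Python rather than the R scripts mentioned elsewhere on this page, and the function names are hypothetical:

```python
from bisect import bisect_left, bisect_right


def empirical_p_value(sorted_background, observed):
    """Two-sided empirical p-value of one count against a background.

    sorted_background: counts of one Pfam family across completed genomes,
    sorted ascending (assumed input layout).
    """
    n = len(sorted_background)
    # How many background counts are <= observed, and how many are >= observed.
    n_le = bisect_right(sorted_background, observed)
    n_ge = n - bisect_left(sorted_background, observed)
    # Add-one smoothing keeps the p-value from being exactly zero.
    p_low = (n_le + 1) / (n + 1)
    p_high = (n_ge + 1) / (n + 1)
    return min(1.0, 2 * min(p_low, p_high))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Each family would get one p-value from `empirical_p_value`, and the whole list would then go through `benjamini_hochberg` before deciding which counts look unusual.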

Protein family stuff with Steve

  • Need to rewrite R scripts to calculate correlations and ecological distances from the matrix (this is needed for the erebus project as well).
    • Look at online notebook for some code, as well as darpa wiki (for the correlation stuff). I emailed Steve asking for his R scripts.
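
Until the R scripts turn up, the two calculations can be sketched in plain Python. This assumes the matrix is a list of rows (one row per sample or family, one column per count), uses Bray-Curtis as the ecological distance (the default method of vegan's vegdist) and Pearson as the correlation; neither choice is stated in the notes, and the function names are hypothetical:

```python
from math import sqrt


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two count vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    # Two all-zero vectors are treated as identical here (an assumption).
    return numerator / denominator if denominator else 0.0


def pearson(u, v):
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    sd_u = sqrt(sum((a - mean_u) ** 2 for a in u))
    sd_v = sqrt(sum((b - mean_v) ** 2 for b in v))
    if sd_u == 0 or sd_v == 0:
        return float("nan")
    return cov / (sd_u * sd_v)


def pairwise(matrix, func):
    """Apply func to every pair of rows and return a square matrix."""
    n = len(matrix)
    return [[func(matrix[i], matrix[j]) for j in range(n)] for i in range(n)]
```

For example, `pairwise(counts, bray_curtis)` gives a sample-by-sample distance matrix that could feed a clustering step, and `pairwise(counts, pearson)` gives the matching correlation matrix.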


Rough Ideas

Starting with PFAM counts across all GOS samples

  • Looking at samples
    • alpha diversity of GOS samples (measure total protein diversity in each sample)
      • provide a listing of the most diverse samples and indicate whether those are environmentally related
    • beta diversity of GOS samples (are the samples related... presumably yes)
      • show a tree and possibly a network describing the relatedness of the samples
    • estimate total number of different protein families for each sample and all samples combined using the Chao index?
  • Looking at families
    • alpha diversity: which families are the richest (not that interesting) and which are the most diverse (interesting and informative)
      • provide list of most diverse families and maybe suggest why those are so diverse?
    • beta diversity -> do the groupings tell us anything (e.g. are they similar function, similar localization, etc.)
      • map to GO terms to see if similar function
    • Chao index
      • estimate total number of proteins for each family in the ocean (what is the most prevalent)
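
The alpha diversity and Chao ideas above could be sketched as follows, assuming Shannon entropy as the diversity measure and the bias-corrected Chao1 estimator as the "chao index" (the notes name neither variant explicitly). Input is a list of counts, e.g. Pfam family counts within one GOS sample; the function names are hypothetical:

```python
from math import log


def shannon_diversity(counts):
    """Shannon entropy (natural log) of a list of counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 estimate of total richness.

    Uses the number of singletons (f1) and doubletons (f2):
    S_obs + f1 * (f1 - 1) / (2 * (f2 + 1)).
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
```

Richness is just the number of non-zero counts, so a family or sample can be rich without being diverse; comparing `chao1` with the observed richness gives a rough idea of how much is still unsampled.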

Collaboration with Steve would be a comparison between diversity measurements: taxonomic vs phylogenetic vs functional