User:Morgan G. I. Langille/Notebook/Project management
From OpenWetWare
Revision as of 12:39, 31 January 2011
Halophiles
Need to make a list of things to be done for the Roche genome paper.
- Organize files for the Roche genomes and the newly completed NCBI halophile genomes.
- Run Mugsy (whole-genome multiple alignment) on all genomes.
- Redo the CRISPR analysis for the new NCBI genomes.
- Look at homologs of the genes identified in the new metabolic paper in Science.
DARPA
- Phone call on Friday.
Erebus
- Got the matrix and built a clustering from distances computed with vegdist.
- Need to think about ways to identify Pfams whose counts differ between samples and from their counts in whole genomes.
- Take the Pfam counts from all completed genomes to get a distribution, then ask whether a single count is normal or not, taking multiple-testing correction into account.
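A minimal Python sketch of the count test described above, assuming the counts have already been normalized so that a sample is comparable to a single genome (e.g. per genome equivalent). It uses an empirical two-sided p-value against the reference genomes and a Benjamini-Hochberg correction across Pfams; both are one reasonable choice, not a settled method. The function names and toy Pfam data are hypothetical. In R, `p.adjust(p, method = "BH")` gives the same adjustment.

```python
"""Flag Pfam counts that look unusual relative to completed genomes.

Hypothetical sketch: `reference` maps each Pfam ID to its counts across
completed genomes; `observed` maps each Pfam ID to the count in one
genome or sample (assumed already normalized to be comparable).
"""

from typing import Dict, List


def empirical_p(obs: float, ref: List[float]) -> float:
    """Two-sided empirical p-value of `obs` against the reference counts."""
    n = len(ref)
    # +1 in numerator and denominator so p is never exactly 0
    upper = (1 + sum(1 for r in ref if r >= obs)) / (n + 1)
    lower = (1 + sum(1 for r in ref if r <= obs)) / (n + 1)
    return min(1.0, 2 * min(upper, lower))


def benjamini_hochberg(pvals: Dict[str, float]) -> Dict[str, float]:
    """Benjamini-Hochberg adjusted p-values (controls false discovery rate)."""
    items = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(items)
    adjusted: Dict[str, float] = {}
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        key, p = items[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[key] = running_min
    return adjusted


def unusual_pfams(reference: Dict[str, List[float]],
                  observed: Dict[str, float],
                  alpha: float = 0.05) -> List[str]:
    """Pfam IDs whose observed count is unusual at FDR `alpha`."""
    pvals = {pf: empirical_p(observed[pf], reference[pf]) for pf in observed}
    qvals = benjamini_hochberg(pvals)
    return sorted(pf for pf, q in qvals.items() if q <= alpha)


if __name__ == "__main__":
    # Toy data: 199 hypothetical reference genomes per Pfam
    reference = {
        "PF00001": [3.0, 4.0, 5.0] * 66 + [4.0],
        "PF00002": [8.0, 10.0, 12.0] * 66 + [10.0],
    }
    observed = {"PF00001": 40.0, "PF00002": 10.0}
    print(unusual_pfams(reference, observed))  # prints ['PF00001']
```

With n reference genomes the smallest attainable p-value is 2/(n+1), so a few dozen genomes leave little power once the correction is applied.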
Protein family stuff with Steve
- Correlate sample similarities computed from taxon counts vs. Pfam counts (Bray-Curtis).
- Need to rewrite the R scripts to calculate correlations and ecological distances from the matrix (this is needed for the Erebus project as well).
- Look at the online notebook for some code, as well as the DARPA wiki (for the correlation code). I emailed Steve asking for his R scripts.
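One way to correlate the two sets of sample similarities is a Mantel test: build one Bray-Curtis distance matrix from taxon counts and another from Pfam counts, correlate their upper triangles, and judge significance by permuting the samples of one matrix. Below is a stdlib-only Python sketch with made-up toy matrices; in R the equivalent would be vegan's `vegdist()` (Bray-Curtis by default) followed by `mantel()`.

```python
import random
from itertools import combinations
from math import sqrt
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den if den else 0.0


def distance_matrix(samples: Sequence[Sequence[float]]) -> Matrix:
    """Symmetric Bray-Curtis matrix; rows of `samples` are samples."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def upper_triangle(d: Matrix, order: List[int]) -> List[float]:
    """Upper-triangle entries of `d` with rows/columns reordered by `order`."""
    return [d[order[i]][order[j]] for i, j in combinations(range(len(order)), 2)]


def mantel(d1: Matrix, d2: Matrix, permutations: int = 999,
           seed: int = 0) -> Tuple[float, float]:
    """Pearson Mantel statistic and one-sided permutation p-value."""
    identity = list(range(len(d1)))
    x = upper_triangle(d1, identity)
    r_obs = pearson(x, upper_triangle(d2, identity))
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        perm = identity[:]
        rng.shuffle(perm)  # permute samples (rows and columns together)
        if pearson(x, upper_triangle(d2, perm)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical counts: 4 samples x 3 taxa, and 4 samples x 3 Pfams
    taxon = [[10, 0, 5], [8, 2, 5], [0, 10, 1], [1, 9, 0]]
    pfam = [[20, 1, 9], [18, 3, 10], [1, 22, 2], [2, 20, 1]]
    r, p = mantel(distance_matrix(taxon), distance_matrix(pfam))
    print(f"Mantel r = {r:.3f}, p = {p:.3f}")
```

With only a handful of samples there are very few distinct permutations, so the p-value is only meaningful for realistic numbers of GOS samples.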
Rough Ideas
Starting with Pfam counts across all GOS samples:
- Looking at samples
- Alpha diversity of GOS samples (measure the total protein diversity in each sample).
- Provide a list of the most diverse samples and indicate whether those are environmentally related.
- Beta diversity of GOS samples (are the samples related? Presumably yes).
- Show a tree and possibly a network describing the relatedness of the samples.
- Estimate the total number of different protein families for each sample and for all samples combined (using the Chao index?).
- Looking at families
- Alpha diversity: which families are the richest (not that interesting) and which are the most diverse (interesting and informative)?
- Provide a list of the most diverse families and maybe suggest why they are so diverse.
- Beta diversity: do the groupings tell us anything (e.g., similar function, similar localization)?
- Map to GO terms to see whether grouped families share a similar function.
- Chao index
- Estimate the total number of proteins for each family in the ocean (which family is the most prevalent?).
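A sketch of the per-sample alpha-diversity numbers from the list above, assuming each sample is a vector of Pfam counts in a fixed Pfam order (the sample names and counts are hypothetical). Richness counts observed families, Shannon reflects evenness as well as richness (the "diverse vs. rich" distinction), and the bias-corrected Chao1 estimator extrapolates unseen families from singletons (F1) and doubletons (F2) as S_obs + F1(F1-1)/(2(F2+1)). Pooling the samples gives the "all samples combined" estimate.

```python
from math import log
from typing import Dict, List, Sequence


def richness(counts: Sequence[int]) -> int:
    """Number of Pfams observed at least once."""
    return sum(1 for c in counts if c > 0)


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over observed Pfams."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def chao1(counts: Sequence[int]) -> float:
    """Bias-corrected Chao1 estimate of total Pfam richness."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return richness(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def pooled(samples: Dict[str, Sequence[int]]) -> List[int]:
    """Element-wise sum of counts over all samples (same Pfam order assumed)."""
    return [sum(col) for col in zip(*samples.values())]


if __name__ == "__main__":
    # Hypothetical counts for five Pfams in two samples
    samples = {"GS001": [10, 5, 1, 1, 2], "GS002": [0, 3, 1, 7, 2]}
    for name, counts in samples.items():
        print(name, richness(counts), round(shannon(counts), 3), chao1(counts))
    print("pooled Chao1:", chao1(pooled(samples)))
```

For example, `chao1([10, 5, 1, 1, 2, 0])` gives 5 + 2*1/(2*2) = 5.5. Applying the same functions to families would need a different count vector per family (e.g. counts per sequence cluster), which is an assumption not settled in these notes.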
Collaboration with Steve would be a comparison of diversity measurements based on taxonomic, phylogenetic, and functional data.