User:Morgan G. I. Langille/Notebook/Project management: Difference between revisions

From OpenWetWare

Revision as of 11:48, 26 January 2011

Halophiles

Erin and Andrew are getting ortholog sets for all halophile genomes.
Need to make a list of things to be done for the Roche genome paper.

Re-do CRISPR analysis for new NCBI genomes.
Look at homologs of genes identified in new Science metabolic paper.

Darpa

Manuscripts

Association paper showing Miguel and Xingpeng methods. Led by Miguel & Xingpeng
Microbial geographical paper incorporating environment and location. Is it distance or habitat? Using GOS data. Unknown genes? Xingpeng & Morgan
Unknown genes?

Erebus

~~Got 3 samples shared on MG-RAST. Need to run them through the Pfam pipeline.~~
Downloaded 3 samples, gzipped them, and started run on genbeo using hmmscan.
sample 1A 132K reads
sample 3A 500k reads
sample XB 580k reads
A quick measurement suggests about 30K sequences are being matched per day.
- Without splitting, a single sample will take up to 20 days!
- Split the samples and started the runs over again.
re-wrote hmmscan_to_hmmscan.pl
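
A rough sketch of the run-time arithmetic and of splitting a sample into chunks. It assumes the 30K sequences/day rate holds for every run and that chunks run in parallel; the function names and the chunk count of 10 are hypothetical, not what was actually used on genbeo.

```python
import math

def days_to_scan(n_reads, reads_per_day=30_000, n_chunks=1):
    """Estimated days when reads are split evenly over n_chunks parallel runs."""
    largest_chunk = math.ceil(n_reads / n_chunks)
    return largest_chunk / reads_per_day

def split_fasta(lines, n_chunks):
    """Distribute FASTA records round-robin into n_chunks lists of lines."""
    chunks = [[] for _ in range(n_chunks)]
    record_index = -1
    for line in lines:
        if line.startswith(">"):
            record_index += 1  # a header line starts a new record
        if record_index < 0:
            continue  # ignore anything before the first header
        chunks[record_index % n_chunks].append(line)
    return chunks

# Read counts taken from the notes above.
samples = {"1A": 132_000, "3A": 500_000, "XB": 580_000}
for name, n_reads in samples.items():
    unsplit = days_to_scan(n_reads)
    split = days_to_scan(n_reads, n_chunks=10)  # 10 chunks is an assumption
    print(f"sample {name}: {unsplit:.1f} days unsplit, {split:.1f} days in 10 chunks")
```

The largest sample (XB) comes out at roughly 19 days unsplit, which matches the 20-day estimate.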

Protein family stuff with Steve

trying to find data that I deleted...not looking good.
Need to check the following locations:
- work computer
- old tarball that is being copied to /share/eisen-d1 (this is from Jan, 2010, so not a good option, but would have the main pfam vs GOS dataset)
- pfam vs camera/gos matrix is on darpa wiki
- try to recover manually from old image?
Need the following (missing) files:

hmmscan of pfams vs GOS (or "camera") -> this is on darpa wiki
perl files to convert hmmscan output to a matrix of pfam counts (this is needed for the erebus project as well, maybe rewrite)
R scripts to calculate correlations and ecological distances from matrix (this needed for erebus project as well)
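
If the perl converter has to be rewritten, a minimal sketch of the hmmscan-to-matrix step could look like the following. It is written in Python rather than perl, assumes the runs were saved with hmmscan's `--tblout` option (target name in column 1, query name in column 3, full-sequence E-value in column 5), and uses an E-value cutoff of 1e-5 that is only a placeholder. The function names are hypothetical and this is not the lost script.

```python
from collections import defaultdict

def count_pfam_hits(tblout_lines, max_evalue=1e-5):
    """Count hits per Pfam family in one sample's hmmscan --tblout lines."""
    counts = defaultdict(int)
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#' comment/header lines
        fields = line.split()
        family = fields[0]          # target name (the Pfam model)
        evalue = float(fields[4])   # full-sequence E-value
        if evalue <= max_evalue:    # cutoff is an assumption
            counts[family] += 1
    return dict(counts)

def build_matrix(per_sample_counts):
    """Return (families, samples, rows) with one row of counts per family."""
    samples = sorted(per_sample_counts)
    families = sorted({f for c in per_sample_counts.values() for f in c})
    rows = [[per_sample_counts[s].get(f, 0) for s in samples] for f in families]
    return families, samples, rows

# Toy input (made up for illustration, truncated to the first columns).
toy = [
    "# target name  accession  query name  accession  E-value  score  bias",
    "7tm_1  PF00001.21  read1  -  1e-10  50.0  0.1",
    "7tm_1  PF00001.21  read2  -  1e-2   10.0  0.1",
    "Pkinase  PF00069.25  read3  -  1e-8  40.0  0.0",
]
print(build_matrix({"sampleA": count_pfam_hits(toy)}))
```

The resulting rows can be written out as a tab-separated table for the R scripts that compute correlations and ecological distances.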

Rough Ideas

Starting with PFAM counts across all GOS samples

Looking at samples
- alpha diversity of GOS samples (measure total protein diversity in each sample)
  - provide a listing of the most diverse samples and indicate whether those are environmentally related
- beta diversity of GOS samples (are the samples related...presumably yes)
  - show a tree and possibly a network describing the relatedness of the samples
- estimate total number of different protein families for each sample and all samples combined using chao index?
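
A minimal sketch of the per-sample measures listed above, assuming each sample is a list of Pfam family counts. The Shannon index stands in for "alpha diversity" here (other indices would work too), and the Chao1 estimate uses the bias-corrected form S_obs + F1(F1-1)/(2(F2+1)), where F1 and F2 are the numbers of families seen exactly once and twice. The function names are hypothetical.

```python
import math

def shannon(counts):
    """Shannon diversity (natural log) of a list of family counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 estimate of the total number of families."""
    s_obs = sum(1 for c in counts if c > 0)   # families observed at all
    f1 = sum(1 for c in counts if c == 1)     # singletons
    f2 = sum(1 for c in counts if c == 2)     # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Toy counts for one sample (made up for illustration).
sample = [10, 5, 1, 1, 2, 0]
print(shannon(sample), chao1(sample))
```

For the "all samples combined" estimate, the same `chao1` call can be applied to the per-family row sums of the matrix.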
Looking at families
- alpha diversity. Which families are the richest (not that interesting) and which are the most diverse (interesting and informative)?
  - provide list of most diverse families and maybe suggest why those are so diverse?
- beta diversity -> do the groupings tell us anything (e.g. are they similar function, similar localization, etc.)
  - map to GO terms to see if similar function
- chao index
  - estimate total number of proteins for each family in the ocean (what is the most prevalent)
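
For the beta-diversity groupings, one possible starting point is a pairwise dissimilarity between family profiles (counts of one family across all samples). The sketch below uses Bray-Curtis, which is an assumption: the notes do not say which ecological distance was used. Counts are compared raw, so normalizing by sample size first may be advisable. The family names are toy values.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def pairwise_distances(profiles):
    """Distances for every unordered pair of named profiles."""
    names = sorted(profiles)
    return {
        (a, b): bray_curtis(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Toy profiles: counts of each family across three samples (made up).
profiles = {
    "famA": [2, 0, 4],
    "famB": [1, 1, 4],
    "famC": [0, 6, 0],
}
for pair, dist in pairwise_distances(profiles).items():
    print(pair, round(dist, 3))
```

The same functions apply unchanged to sample profiles (columns of the matrix) for the sample-level tree or network.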

Collaboration with Steve would be a comparison between diversity measurements using taxonomic vs phylogenetic vs functional approaches.
