Moore Notes 7 6 11
Group Call
- Protein Db
- Update from Guillaume
- Re-running crashed jobs
- PFAM clustering
- What distance and algorithm?
- Compare to what? Homology similarities and families
- Maybe save for a later paper
- Need to compare to PFAM: provides functional info and comparison of amount of clustering
- Should also compare to COGs (have phylogenetic context)
- Why is this db/paper different from existing protein databases?
- Full length gene families
- Derived from bacterial genomes
- High-throughput, automated, easily updated with new genomes, open
- Generation of full-length protein families and models
- Description of workflow
- Description of database
- Database accessibility
- Statistical assessment of the families
- Family size distribution
- Family PD distribution
- Precision and recall distributions (local vs. global)
- The relationship between families in homology space
- Which families have models that recruit the same sequences?
- Cytoscape-like network map of family homology (see attached image)
- Clusters may represent superfamilies
- The relationship between families in functional space
- Hierarchical clustering of families by their PFAM annotations
- Can clusters be partitioned into broad-based functional groups?
- The overlap between these relationships
- Can we quantify the amount of overlap between the homology clusters and the functional clusters?
- What does this tell us about the evolution of function across superfamilies?
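
One possible way to quantify the overlap between the homology clusters and the functional clusters is a pair-counting score such as the adjusted Rand index (ARI). This is only a sketch: the choice of ARI is an assumption, not something decided on the call, and the family names and cluster labels below are hypothetical placeholders. A score near 1 means the two clusterings group families the same way; a score near 0 means agreement is about what chance would give.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting agreement between two clusterings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the clusterings trivially agree

    # Pairs of items placed together in both clusterings, in A, and in B.
    pairs_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / total_pairs  # agreement expected by chance
    max_index = (pairs_a + pairs_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. every item in a single cluster
    return (pairs_both - expected) / (max_index - expected)


# Hypothetical assignments: family -> homology cluster, family -> functional cluster.
homology = {"fam1": "H1", "fam2": "H1", "fam3": "H2", "fam4": "H2"}
function = {"fam1": "F1", "fam2": "F1", "fam3": "F2", "fam4": "F3"}

families = sorted(homology)
ari = adjusted_rand_index(
    [homology[f] for f in families],
    [function[f] for f in families],
)
print(f"ARI between homology and functional clusters: {ari:.3f}")
```

The score only depends on which families are grouped together, not on the label names, so homology and functional clusters can use unrelated naming schemes.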
- To do
- Make an outline (Tom, Katie)
- Introduction (Morgan)
- Describe workflow and metrics (Dongying, Guillaume, Jonathan)
- Compare to PFAM
- Compare to COGs or describe differences
- Finish statistical analyses
- Search vs. metagenomes and/or new genomes (compare to PFAM or COGs?)
- Update from Guillaume
- GBMF proposal request
- Overhead issue
- Katie will start outlining; Jonathan is back next week