Metagenomics¹, the study of the genomes of many microbes in an environment simultaneously, has the potential to revolutionize our understanding of the hidden yet incredibly important world of microorganisms. This potential has been highlighted by a series of recent metagenomics-based studies [1-8] and by multiple government reports, in particular the recent National Academy of Sciences (NAS) report “The New Science of Metagenomics – Revealing the Secrets of our Microbial Planet.” The great potential of metagenomics comes with enormous challenges in the analysis of the data². These challenges include the fragmentary nature of sequence data; the sparse sampling of genomes, populations, and communities; and the unknown phylogenetic diversity and ecological structure of the communities being sampled. Methods designed for the analysis of single-organism genomes simply do not work well on data sets sampled from complex ecological communities. To develop new methods, the NAS report suggested, and we agree, that integrated approaches are needed in which interdisciplinary teams of researchers both ask scientific questions and develop new data analysis tools.
Here we propose building exactly such an integrated, interdisciplinary effort, bringing together three labs with different relevant areas of expertise including statistics (to deal with the sparse sampling), comparative genomics (because the data is genomic in nature), evolutionary biology (to assess phylogenetic and genomic diversity), and ecological theory (to examine community structure).
The research we propose covers three major topics considered of fundamental importance in metagenomic studies³: biodiversity, evolutionary dynamics, and statistical measures. Our proposed work should lead to novel insights into microbial ecology and evolution. In addition, at the core of all of our work is the development of novel mathematical, statistical, and computational methods for analyzing metagenomic data. Since these methods will be of use to the research community at large, we propose to work closely with the CAMERA team to make the methods broadly available through the CAMERA database.
¹ We use the term metagenomics to refer to shotgun sequencing of DNA from environmental samples.
² The NAS report identifies five challenges in metagenomics: the need for interdisciplinary teams, the role of government, methods development, the complexities of data analysis, and the need for databases.
³ The NAS report identifies four key questions: how can we find new functions, how diverse is life, how do microbes evolve, and what role do microbes play in the health of their hosts.
Jonathan Eisen: Evolutionary and comparative genomics, metagenomics, phylogenetics
Katherine Pollard: Statistical and computational genomics
Jessica Green: Applied and theoretical ecology, microbial community structure
Proposed research areas
- Metagenomics-based characterization of microbial biodiversity
  - Guidelines and weightings for using different gene families in metagenomics-based diversity assays
  - Searching for novel phylogenetic types in metagenomic data
  - Metagenomic analysis of community phylogenetic structure
  - Estimating biodiversity from metagenomic samples
- Metagenomic studies of microbial evolutionary dynamics
  - Molecular evolutionary dynamics of gene families
  - Population genomics
- Statistical metagenomics: correlation analysis of sequence data and metadata