Metagenomics¹, the study of the genomes of many microbes in an environment simultaneously, has the potential to revolutionize our understanding of the hidden yet incredibly important world of microorganisms. This potential has been highlighted by a series of recent metagenomics-based studies [1-8] as well as multiple government reports, including in particular the recent National Academy of Sciences (NAS) report “The New Science of Metagenomics – Revealing the Secrets of our Microbial Planet.” The great potential of metagenomics comes with enormous challenges in the analysis of the data². These challenges include the fragmentary nature of sequence data; the sparse sampling of genomes, populations and communities; and the unknown phylogenetic diversity and ecological structure of the communities being sampled. Methods designed for analysis of single-organism genomes simply do not work well on data sets sampled from complex ecological communities. To develop new methods, the NAS report suggested (and we agree) that integrated approaches are needed in which interdisciplinary teams of researchers both ask scientific questions and develop new data analysis tools.
Here we propose building exactly such an integrated, interdisciplinary effort, bringing together three labs with different relevant areas of expertise including statistics (to deal with the sparse sampling), comparative genomics (because the data is genomic in nature), evolutionary biology (to assess phylogenetic and genomic diversity), and ecological theory (to examine community structure).
The research we propose covers three major topics considered of fundamental importance in metagenomic studies³: biodiversity, evolutionary dynamics, and statistical measures. Our proposed work should lead to novel insights into microbial ecology and evolution. In addition, at the core of all of our work is the development of novel mathematical, statistical and computational methods for analyzing metagenomic data. Since these methods will be of use to the research community at large, we propose to work closely with the CAMERA team to make the methods broadly available through the CAMERA database.
¹ We use the term metagenomics to refer to shotgun sequencing of DNA from environmental samples.
² The NAS report identifies five challenges in metagenomics: the need for interdisciplinary teams, the role of government, methods development, the complexities of data analysis and the need for databases.
³ The NAS report identifies four key questions: how can we find new functions, how diverse is life, how do microbes evolve and what role do microbes play in the health of their hosts.
Jonathan Eisen: Evolutionary and comparative genomics, metagenomics, phylogenetics
Katherine Pollard: Statistical and computational genomics
Jessica Green: Applied and theoretical ecology, microbial community structure
Proposed research areas
- Metagenomic-based characterization of microbial biodiversity
  - Guidelines and weightings for using different gene families in metagenomic-based diversity assays
  - Searching for novel phylogenetic types in metagenomic data
  - Metagenomic analysis of community phylogenetic structure
  - Estimating biodiversity from metagenomic samples
- Metagenomic studies of microbial evolutionary dynamics
  - Molecular evolutionary dynamics of gene families
  - Population genomics
- Statistical metagenomics: Correlation analysis of sequence data and metadata
Although these projects can each be considered separate activities, they are highly interdependent, and the interdisciplinary nature of the labs involved is critical for the success of the project. For example, we propose to use phylogenetic analysis to search for new types of organisms and genes (project 1.2 led by Eisen). The results of this phylogenetic analysis will also be used to assess the phylogenetic structure of communities (project 1.3 led by Green) and to study evolution of gene families across environments (project 2.2 led by Pollard). Similarly, the statistical methods developed for comparative metagenomics (project 3 led by Pollard) will be used in the population genomic studies (project 2.1 led by Eisen) as well as in the development of biodiversity estimators (project 1.4 led by Green). To achieve this integration, we plan to maintain active feedback among the PIs and across the projects, and we propose a management plan to guide these interactions. We believe that by taking this integrated approach, both in terms of the research topics and by combining separate fields of study, we will not only make important scientific discoveries about microbial communities but also develop novel methods and approaches of great utility to the metagenomics community.