Survey of metagenomic data

From OpenWetWare


Survey of CAMERA Metagenomic Data

Joshua Ladau (with help from Samantha Riesenfeld)

A preliminary step of the iSEEM project is to use CAMERA to assemble information about existing metagenomic data. We compiled information on the types of metadata that are available for each project on CAMERA by consulting (i) individual metadata files for each project (on each project's webpage) and (ii) the listing of metadata on the CAMERA Project Samples webpage. We also assembled information on sequence data for each project by consulting files available for download on each project's webpage, published papers for each project, and the File Server Download page.

Results

Our findings are summarized in the Metadata and Sequencing Tables below.

  • Metadata Table (image file: Metadata Table.jpg)
  • Sequencing Table (image file: Sequencing Table.jpg)

The metadata in the individual project files and those compiled on the Project Samples page are generally similar, although in a few cases metadata are available in only one location. The available types of metadata fall into two broad categories: attributes of samples and attributes of the locations from which samples were collected. In the former category, most projects provide date, location, and sample size information. In some cases, the projects also include information on the sampling procedure used (e.g., filter size), the number of organisms sampled (e.g., bacterial abundance), and the volume of substrate sampled. With regard to the location attributes, almost all studies provide information on the temperature, habitat type, and depth of samples. Other attributes of the abiotic and biotic environment, such as major ion concentrations and biomass estimates, are also often included. See the Metadata Table above.
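The comparison between the two metadata sources described above can be illustrated with a minimal Python sketch. It is not the procedure used for this survey. The function name `compare_sources`, the project name `project_A`, and the field names are all hypothetical placeholders and do not reflect actual CAMERA contents.

```python
from typing import Dict, Set


def compare_sources(
    project_files: Dict[str, Set[str]],
    samples_page: Dict[str, Set[str]],
) -> Dict[str, Dict[str, Set[str]]]:
    """For each project, split metadata fields by the source(s) listing them."""
    report = {}
    # Cover every project that appears in either source.
    for project in sorted(set(project_files) | set(samples_page)):
        in_files = project_files.get(project, set())
        on_page = samples_page.get(project, set())
        report[project] = {
            "both": in_files & on_page,
            "project_file_only": in_files - on_page,
            "samples_page_only": on_page - in_files,
        }
    return report


# Hypothetical placeholder data (assumption), not actual CAMERA metadata.
project_files = {"project_A": {"date", "location", "temperature", "filter_size"}}
samples_page = {"project_A": {"date", "location", "temperature", "depth"}}

report = compare_sources(project_files, samples_page)
for project, fields in report.items():
    print(project, {name: sorted(values) for name, values in fields.items()})
```

Sets are used because the question is only whether a field is present in a source, not how often or in what order it appears.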

With respect to the sequence data, all projects except the Ocean Viruses and Moore Microbial Sequencing projects provide raw reads, amino acid sequences, open reading frames, and ribosomal sequences. Six projects also provide assemblies. See the Sequencing Table above.

Conclusions and Recommendations

Overall, our experience working with CAMERA was positive. In general, the website is clearly designed, reliable, and fast. However, we did identify some ways in which the website could be improved. First, the File Server Download page could be better organized, with data sets perhaps grouped by project or by data type (e.g., nucleotide sequences or amino acid sequences). Given the size of many of the files, it would also be very helpful to have direct access to the database or command-line access to the directories that contain the flat files. Second, while many files are accompanied by helpful documentation, additional documentation would at times be useful. For instance, it is unclear whether the ORF files available for different projects are equivalent.

We believe that the metadata and sequence data available on CAMERA will be useful in the development and application of new approaches to metagenomic data analysis. Cataloguing the available data is an important first step toward that end.