BioMicro:NextSEEK


SEEK

The diversity of data types and the need to link them are major challenges in systems biology; SEEK was built to address both. SEEK is a data management system that stores a wide range of biological metadata and handles many sample types, unlike repositories that accept only one specific type of data.

SEEK stores experiments and connects each one to its protocol and metadata. Because all data in an experiment is inherently linked, SEEK's approach to storing data makes it easy to visualize the connections between all of the data. Experimental assays, which describe experimental procedures, are connected to samples and models in the system. The figure below shows all of the IMPAcTB assay connections between the respective sample types. Data is uploaded to SEEK in Excel spreadsheets and then linked to its protocol, which is stored as a document. With its data sharing and management features, SEEK supports both individual scientists and groups of scientists.

For those interested in learning more about SEEK, this paper is a good starting point.

NExtSEEK

While SEEK met many of our needs, there were still features we wanted to add. We therefore created NExtSEEK, an implementation of SEEK that retains many of the original data storage and sharing features. Like SEEK, NExtSEEK stores samples in tables and protocols in documents. The main difference is that experimental assays act as connectors between sample types. For example, in the diagram below, the mouse sample type is connected to the DNA library sample type by the ear punch assay, because the library was extracted from the mouse via an ear punch. Sample types in NExtSEEK are connected by assays, while individual samples are connected to the samples directly upstream and downstream of them through the “parent” metadata attribute. In the example above, each DNA library would be connected to the specific mouse it was extracted from. Together, these connections produce a diagram that outlines the workflow of the entire experiment, including the different data types that were generated and the protocols used to generate them.
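The type-level and sample-level connections described above can be sketched as a small data model. This is a minimal Python illustration, not NExtSEEK's actual schema: the `Sample` class, the `ASSAY_LINKS` list, the sample IDs, and the helper functions are hypothetical names chosen for the example. Only the “parent” attribute and the mouse, ear punch, and DNA library example come from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One row of a sample table (hypothetical structure for illustration)."""
    sample_id: str
    sample_type: str
    parent: Optional[str] = None  # ID of the sample directly upstream, if any
    metadata: Dict[str, str] = field(default_factory=dict)


# Type-level connections: (upstream sample type, assay, downstream sample type).
ASSAY_LINKS = [("Mouse", "Ear punch", "DNA library")]

# Sample-level connections: each DNA library names its source mouse as parent.
samples = {
    s.sample_id: s
    for s in [
        Sample("MOUSE-1", "Mouse"),
        Sample("MOUSE-2", "Mouse"),
        Sample("LIB-1", "DNA library", parent="MOUSE-1"),
        Sample("LIB-2", "DNA library", parent="MOUSE-1"),
        Sample("LIB-3", "DNA library", parent="MOUSE-2"),
    ]
}


def lineage(sample_id: str, samples: Dict[str, Sample]) -> List[str]:
    """Follow 'parent' links upstream, from the given sample to its root."""
    chain: List[str] = []
    current: Optional[str] = sample_id
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle in parent links at {current}")
        chain.append(current)
        current = samples[current].parent
    return chain


def children(sample_id: str, samples: Dict[str, Sample]) -> List[str]:
    """Return the IDs of samples directly downstream of the given sample."""
    return sorted(s.sample_id for s in samples.values() if s.parent == sample_id)


def assay_between(upstream_type: str, downstream_type: str) -> Optional[str]:
    """Look up the assay that connects two sample types, if one is defined."""
    for up, assay, down in ASSAY_LINKS:
        if up == upstream_type and down == downstream_type:
            return assay
    return None
```

Here `lineage("LIB-3", samples)` walks from a library back to its mouse, and `children("MOUSE-1", samples)` lists the libraries extracted from one mouse. Keeping the assay links at the type level and the parent links at the sample level mirrors the two kinds of connection described in the text.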



The generalizability of NExtSEEK's data storage makes it simple to transfer data to our public repository, FAIRDOMHub, or to another external repository. The flexibility of the system also makes it easy to add new sample types that are not already supported.



Storing rich metadata in NExtSEEK can also enable new findings, as it is theorized that combining multiple datasets can lead to new discoveries. For example, by working with the IMPAcTB consortium, we can collect many different types of experimental data, in the hope that these single-omics datasets can be combined in a multi-omic analysis that reveals something no single dataset could. This is referred to as “meta-analysis” across multiple scientific studies, which are often performed by different researchers. It would not be possible without a well-curated database with rich metadata describing the samples that span the different studies. NExtSEEK also lets users query by metadata values, so scientists can focus on specific groups of data.
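As a rough illustration of querying by metadata values, the sketch below filters sample records that span several studies. It is plain Python over in-memory dictionaries, not the NExtSEEK query interface; the `query` function, the attribute names (`study`, `tissue`, `data_type`), and the records themselves are invented for the example.

```python
from typing import Any, Dict, List

# Hypothetical sample records drawn from different studies.
records: List[Dict[str, Any]] = [
    {"id": "S1", "study": "Study A", "tissue": "lung", "data_type": "transcriptomics"},
    {"id": "S2", "study": "Study A", "tissue": "spleen", "data_type": "transcriptomics"},
    {"id": "S3", "study": "Study B", "tissue": "lung", "data_type": "proteomics"},
    {"id": "S4", "study": "Study B", "tissue": "lung", "data_type": "transcriptomics"},
]


def query(samples: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    """Return the samples whose metadata matches every given key/value pair."""
    return [
        sample
        for sample in samples
        if all(sample.get(key) == value for key, value in criteria.items())
    ]


# All lung samples, regardless of which study or data type they come from.
lung_samples = query(records, tissue="lung")

# Narrow further to one data type across studies.
lung_rna = query(records, tissue="lung", data_type="transcriptomics")
```

Because every record carries the same kinds of metadata attributes, one filter can pull matching samples out of different studies, which is the property a cross-study analysis depends on.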


For those interested in learning more about the technical aspects of NExtSEEK and the architecture of the data storage, this paper outlines the details behind the system.

FAIRDOMHub

FAIRDOMHub is the publicly available implementation of our data management system. A FAIRDOMHub page encompasses all the associated metadata, protocols, and data files of a specific study. Because of this structure, releasing the data to the public only requires linking to the corresponding FAIRDOMHub page in the paper. Data files are not stored on this page; instead, the metadata has attributes that point to where the data is stored in external repositories. Each of our sample types associated with data files points to a repository tailored to that type of data, as shown below. For example, we store images on OMERO, a repository built for storing images such as tissue scans and microscopy images. If a researcher were publishing image data, every image sample in FAIRDOMHub would be connected to the OMERO link that stores the actual image, while also containing the associated metadata. The FAIRDOMHub page also shows a diagram that encompasses all the sample types and protocols used in the study, making it easy to visualize the workflow.
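The idea that metadata records point to data held elsewhere can be sketched as follows. This is an assumption-laden illustration rather than the FAIRDOMHub data model: the `data_link` attribute name, the sample records, and the `example.org` URLs are placeholders, and `links_by_host` is a hypothetical helper.

```python
from typing import Dict, List, Optional
from urllib.parse import urlparse

# Hypothetical metadata records; the files themselves live in external repositories.
samples: List[Dict[str, Optional[str]]] = [
    {"id": "IMG-1", "type": "Image", "data_link": "https://omero.example.org/image/1"},
    {"id": "IMG-2", "type": "Image", "data_link": "https://omero.example.org/image/2"},
    {"id": "SEQ-1", "type": "Sequencing", "data_link": "https://sequences.example.org/run/1"},
    {"id": "MOUSE-1", "type": "Mouse", "data_link": None},  # no data file attached
]


def links_by_host(records: List[Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group sample IDs by the host of the repository their data link points to."""
    grouped: Dict[str, List[str]] = {}
    for record in records:
        link = record.get("data_link")
        if not link:
            continue  # sample types without data files carry metadata only
        host = urlparse(link).netloc
        grouped.setdefault(host, []).append(record["id"])
    return grouped


repositories = links_by_host(samples)
```

In this sketch the study page would hold only the records above, while each `data_link` resolves to the repository suited to that kind of data.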

To see the public data on FAIRDOMHub, check out studies published by Impact and SRP.