Synthetic Biology:Semantic web ontology

From OpenWetWare

This is a part of the effort to provide a standardized, extensible, scalable and machine-processable interface for the Registry of Standard Biological Parts. The ideas of the Semantic Web seem to provide a solution to this problem. The success of developing a Synthetic Biology ontology depends in part on a good definition of the BioBricks abstraction hierarchy.

Meetings

First meeting

Held on Tuesday (9/20/05) at 3pm, room 68-674

Participants

(please put your name here):

To be discussed

Notes

Revised and extended Ilya's notes. Feel free to add/edit further. --Reshma 14:45, 23 Sep 2005 (EDT)

One of the lessons I took away from the first meeting is that there is no one-size-fits-all ontology that we can use for the registry. Depending on your perspective or your goal, you care about different pieces of information and, more importantly, you likely want a different way of organizing that information. For instance, if you are fabricating a system, then you likely care about basic parts and composite parts, and you would like to know which parts of your system have already been assembled so that you can take advantage of them. If instead you are a system designer, then you might care about getting the list of devices that meet certain performance characteristics, and you don't necessarily care whether these devices are made up of basic parts or composite parts. If you are the person in charge of the Registry inventory, then you care not just about part BBa_B0015, but about every physical copy of BBa_B0015 and where it is located. Thus, for each particular application, there is likely a slightly different hierarchy that makes it easy to ask your particular question of interest. So for the registry semantic web ontology, we don't envision a single hierarchy but rather multiple parallel hierarchies, each of which might be tailored to a different application space.

Inevitably, we won't be able to predict every possible thing that someone will want to do with the registry. Therefore, the best we can do is try to design the semantic web ontology in such a way that it makes it as easy as possible for other people to use the information in our registry. The example Randy presented is that someone develops a method for computing mRNA stability. They go and crawl the registry and predict the stability of every mRNA in the registry and list it somewhere. Later, we could crawl the web and notice that someone has all this new information that points to our parts and point from our parts to their data. In order to facilitate situations like these, we want to make the mapping of new ontologies onto our existing ontology as easy as possible so that others can make use of our information and we can make use of theirs.

So how do the design decisions we make now affect the ease with which different ontologies can be mapped onto each other? You could imagine running into problems if the lowest level of our ontology is not low enough. For instance, suppose the lowest level (or most fundamental unit) of the registry ontology is a BioBrick part and its associated part number. If someone later wanted to create an inventory system, their job would be quite difficult because a single BioBrick part could have multiple physical instantiations. There could be a copy in the Endy lab, one in the Knight lab and one at Caltech. Thus, the group making the inventory system would have to figure out some way of mapping a single BioBrick part to multiple locations. Perhaps this is doable. But what happens if there is a mutation in the Knight lab copy of the part? Do you associate that information with the BioBrick part or with a particular location? The answer we came up with is that the information should be associated with a physical instance of a part. That physical instance has properties such as a location and a particular sequence.
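
A minimal Python sketch of this distinction, assuming hypothetical class and field names (Part, PhysicalInstance, designed_sequence, observed_sequence) and placeholder sequences that are not real registry data:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    """An abstract BioBrick part, identified by its part number."""
    part_number: str
    designed_sequence: str


@dataclass(frozen=True)
class PhysicalInstance:
    """One physical copy of a part; location and sequence live here."""
    part: Part
    location: str
    observed_sequence: str

    def has_mutation(self) -> bool:
        # A copy counts as mutated when its sequence differs from the design.
        return self.observed_sequence != self.part.designed_sequence


# Placeholder sequences, not the real BBa_B0015 sequence.
b0015 = Part("BBa_B0015", "ATGC")
copies = [
    PhysicalInstance(b0015, "Endy lab", "ATGC"),
    PhysicalInstance(b0015, "Knight lab", "ATGA"),  # hypothetical mutation
    PhysicalInstance(b0015, "Caltech", "ATGC"),
]

# The mutation is a fact about one copy, not about the part itself.
mutated_locations = [c.location for c in copies if c.has_mutation()]
print(mutated_locations)
```

Because the mutation is recorded on the instance, an inventory system can be layered on top without changing what is known about the part itself.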

Our conclusion from this example is that, to make the mapping of one ontology onto another as easy as possible, we need to make the lowest level of the registry ontology as fundamental and basic as possible. Thus our plan of action was to develop a list of primitives or atomic classes, define relationships between them, and work from the bottom to the top.

Some of the parallel ontologies/hierarchies that might be useful are

  • Physical ontology
    • Basic parts: piece of DNA, has a sequence associated with it
    • Composite parts: a series of basic parts and an assembly scheme
  • Design ontology
    • Parts: something with a particular molecular function
    • Devices: something that can be composed with other devices
    • Systems: something that can't be composed with other devices
  • Assembly standards
    • BioBricks standard assembly
    • BioBricks++
    • De novo synthesis
  • Performance standards
    • PoPS 1.0 standard (for instance)
  • Inventory

Other related questions we had are ...

  • How many classes of standards can you imagine: assembly standards, performance standards, other?
  • Is a part restricted to one piece of DNA or can it span two? Is it independent of the assembly scheme? Probably yes.
  • A composite part is defined not only by the list of basic parts but also by the assembly method. Are two composite parts the same if they contain the same basic parts? Probably only if they were assembled under the same standard.
  • How should we define devices?
  • Is there a usefulness to the device-system distinction?
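
The tentative answer to the composite-part question above can be sketched in Python. This is only an illustration: the CompositePart class, its field names and the part_A/part_B labels are hypothetical, and equality is taken to mean "same ordered basic parts and same assembly standard".

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CompositePart:
    """A composite part: an ordered series of basic parts plus a standard."""
    basic_parts: Tuple[str, ...]
    assembly_standard: str


# part_A and part_B are placeholder labels, not real registry entries.
a = CompositePart(("part_A", "part_B"), "BioBricks standard assembly")
b = CompositePart(("part_A", "part_B"), "BioBricks standard assembly")
c = CompositePart(("part_A", "part_B"), "De novo synthesis")

# The generated __eq__ compares both fields, so the standard matters.
print(a == b, a == c)
```

Making the assembly standard part of the identity is a design choice; an ontology that treats it as an annotation instead would compare only basic_parts.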

Separate spaces or sets of hierarchies

Physical (DNA sequence, assembly methods):

#BioBrick rdf:type rdfs:Class .
#Composite rdfs:subClassOf #BioBrick .
#Basic rdfs:subClassOf #BioBrick .
#Composite #contains #Basic .

Design (performance characteristics):

#Parts rdf:type rdfs:Class .
#Devices rdf:type rdfs:Class .
#Systems rdf:type rdfs:Class .
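
As a rough illustration of how the two spaces can coexist, the statements above can be held as plain Python tuples. The sketch assumes that Composite and Basic are meant as subclasses of BioBrick, and the #BBa_B0015 typing statements are hypothetical additions showing a single resource that appears in both hierarchies.

```python
RDF_TYPE = "rdf:type"
RDFS_CLASS = "rdfs:Class"
RDFS_SUBCLASS_OF = "rdfs:subClassOf"

triples = {
    # Physical space
    ("#BioBrick", RDF_TYPE, RDFS_CLASS),
    ("#Composite", RDFS_SUBCLASS_OF, "#BioBrick"),
    ("#Basic", RDFS_SUBCLASS_OF, "#BioBrick"),
    # Design space
    ("#Parts", RDF_TYPE, RDFS_CLASS),
    ("#Devices", RDF_TYPE, RDFS_CLASS),
    ("#Systems", RDF_TYPE, RDFS_CLASS),
    # Hypothetical: one resource classified in both spaces at once.
    ("#BBa_B0015", RDF_TYPE, "#Basic"),
    ("#BBa_B0015", RDF_TYPE, "#Parts"),
}


def types_of(resource, graph):
    """Return the classes a resource is directly asserted to belong to."""
    return sorted(o for s, p, o in graph if s == resource and p == RDF_TYPE)


print(types_of("#BBa_B0015", triples))
```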

Second meeting

Held on Friday (9/23/05) at 10am, room 68-121.

Participants

To be discussed

Basic terms for describing the biological parts and their properties.

Notes

Synthetic Biology:SWO2minutes

Registry features

  • Registry_Wish_List
  • Subpart Search: search for parts that match a portion of this part or this sequence of parts. A software agent would take a part name and, using the ontology definitions, query other registries via their semantic web interfaces (no need to know about the schema: e.g., just say "need all <#part>s that match a <#component> of the given <#part>"). A software agent can search anyone's registry if they use a common ontology: simply follow URLs (or use a query language) and add triples to the local RDF store.
  • Superpart Search: search for parts that contain the given parts
  • What about sub- and superpart searches in distributed registries?
  • Search for function (case insensitive): repressor, reporter, inverter, etc.
  • What are the available (instances of) parts? Are they used in any devices already? (This saves time when constructing an expression device.) Problem: different names for exactly the same DNA sequence
  • What kinds of devices/systems have been built?
  • ?
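
A minimal sketch of the subpart and superpart searches listed above, assuming a local registry held as a Python dictionary that maps a part name to its ordered list of components. All names (device_A, promoter_x and so on) are placeholders, and "match" is taken to mean a contiguous run of components; a distributed version would gather the same data from other registries first.

```python
from typing import Dict, List, Sequence


def contains_run(haystack: Sequence[str], needle: Sequence[str]) -> bool:
    """True if needle occurs as a contiguous run inside haystack."""
    n = len(needle)
    return any(
        list(haystack[i:i + n]) == list(needle)
        for i in range(len(haystack) - n + 1)
    )


def superpart_search(registry: Dict[str, List[str]], query: Sequence[str]) -> List[str]:
    """Names of parts whose component list contains the query run."""
    return sorted(name for name, parts in registry.items()
                  if contains_run(parts, query))


def subpart_search(registry: Dict[str, List[str]], name: str) -> List[str]:
    """Names of other parts that match a portion of the named part."""
    target = registry[name]
    return sorted(other for other, parts in registry.items()
                  if other != name and contains_run(target, parts))


# Placeholder registry contents.
registry = {
    "device_A": ["promoter_x", "rbs_y", "cds_z", "BBa_B0015"],
    "device_B": ["rbs_y", "cds_z"],
    "device_C": ["promoter_x", "cds_z"],
}

print(superpart_search(registry, ["rbs_y", "cds_z"]))
print(subpart_search(registry, "device_A"))
```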

Semantic Web

RDF/XML, RDF Schema and OWL

"an initiative to enable cross-platform data exchange and reuse through well-defined ontologies and a common XML-based framework."

"The goal of the Semantic Web initiative is to create a universal medium for the exchange of data where data can be shared and processed by automated tools as well as by people." [10]

  • allows modeling of real things, not just documents or database tables (knowledge representation)
  • consists of statements about resources in the form of triples:
SUBJECT -> PROPERTY -> VALUE
-OR-
SUBJECT -> PREDICATE -> OBJECT
  • identifies every resource with a globally unique URI: don't say "color", say <http://example.com/2005/std6#col>
  • allows “serendipitous reuse”: integration with data sources in other fields (“web join”)
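
The points above can be illustrated with a small Python sketch in which a graph is simply a set of (subject, predicate, object) tuples. The namespaces, property names and values are all invented for illustration; the second graph stands in for a third party (such as the mRNA stability example from the first meeting) publishing statements about a registry part.

```python
# Hypothetical namespaces; globally unique URIs avoid name clashes.
REG = "http://example.org/registry#"
LAB = "http://example.net/stability#"

# Statements published by the registry itself.
registry_graph = {
    (REG + "BBa_B0015", REG + "partType", "terminator"),
}

# Statements published elsewhere about the same resource (made-up value).
third_party_graph = {
    (REG + "BBa_B0015", LAB + "predictedStability", "0.5"),
}

# A "web join": since both sources use the same URI for the part,
# merging them is just a set union of triples.
merged = registry_graph | third_party_graph


def describe(subject, graph):
    """Collect every property/value pair known about a subject."""
    return {p: o for s, p, o in graph if s == subject}


for prop, value in sorted(describe(REG + "BBa_B0015", merged).items()):
    print(prop, "->", value)
```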

Resources

  1. A No-Nonsense Guide to Semantic Web Specs for XML People Part I and Part II - good intro for those familiar with XML
  2. W3C Semantic Web Activity (links to RDF, OWL, etc)
  3. Semantic Web primer from 2000 at xml.com
  4. Berners-Lee - Semantic Web Life Sciences - BioIT World
  5. Web Services - Semantic Web by Tim Berners-Lee
  6. Introduction to the Semantic Web and RDF by A.M. Kuchling
  7. Wikipedia article on Semantic Web
  8. Semantic Web tutorials from W3C
  9. Semantic Web tutorial using N3
  10. Primer: Getting into RDF & Semantic Web using N3
  11. Semantic Web: interview with Tim Berners-Lee
  12. W3C press release
  13. Design Issues @ W3C - web architecture and metadata
  14. Notions & Notations of the Semantic Web - MIT 6.898 Fall Seminar Course
  15. Semantic Web news aggregator

Resource Description Framework (RDF)

Used for making statements about facts

<http://www.example.org/index.html>  has a creator whose value is John Smith

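The RDF terms for the various parts of the statement are:

  • the subject is the URL http://www.example.org/index.html
  • the predicate is the word "creator"
  • the object is the phrase "John Smith"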

Beware of thinking of RDF as a format for serializing objects. The semantic web is different - it is weblike.

  • Any document can (potentially) say anything about anything. There is no set of "slots" or "attributes" for a class. The properties defined in a schema are not the only properties which one can use to describe something which is in that class.
  • An object can be in many classes. When you create a semantic web document about something, others can deduce more things about it, in vocabularies you have never heard of.
  • Entity-Relationship and UML diagrams are useful for describing RDF -- so long as you remember the above.

From http://www.w3.org/2000/10/swap/doc/formats.
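
As an illustration, the creator statement at the start of this section can be written out as a single line in N-Triples style. The sketch below assumes that the predicate is the Dublin Core creator property and that "John Smith" is a plain literal; the helper name to_ntriples is hypothetical and handles only the simple cases shown.

```python
def to_ntriples(subject: str, predicate: str, obj: str, literal: bool = True) -> str:
    """Format one statement as an N-Triples style line."""
    if literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {obj_term} ."


line = to_ntriples(
    "http://www.example.org/index.html",
    "http://purl.org/dc/elements/1.1/creator",  # assumed predicate URI
    "John Smith",
)
print(line)
```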

Resources

  1. RDF Primer at W3C
  2. Wikipedia page
  3. RDF @ W3C - a lot of links to resources
  4. RDF Made Easy - a short tutorial
  5. Intro to RDF and Jena RDF API
  6. RDF Tutorial @ W3C - a lengthy presentation
  7. Practical RDF - O'Reilly book, decent but not great
  8. RDF FAQ @ W3C
  9. RDF Data Access Use Cases and Requirements
  10. Relational Databases on the Semantic Web
  11. RDF Tutorial from the University of Lyon
  12. RDF Concepts and Abstract Syntax
  13. RDF semantics
  14. RDF Test Cases
  15. RDF/XML Syntax Specification
  16. RDF Vocabulary Reference

Software

RDF Schema

  • ‘semantically extends’ RDF to enable us to talk about classes of resources, and the properties that will be used with them
  • provides the means to describe application specific RDF vocabularies
  • provides schema information as additional descriptions of resources, but does not prescribe how these descriptions should be used by an application
  • describes classes (corresponds to the generic concept of a Type or Category) and properties

RDF schemas differ somewhat from XML schemas (such as DTDs or W3C XML Schemas) in that they do not define a permissible syntax but instead classes, properties, and their interrelation: they operate directly at the data model level, rather than the syntax level.
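
One way to see the difference is that RDF Schema statements such as rdfs:domain and rdfs:range act as a basis for inference rather than as syntax checks. The Python sketch below assumes a hypothetical #contains property whose domain is #Composite and whose range is #BioBrick, and applies only those two rules; #part_1 and #part_2 are placeholder resources.

```python
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"

# Hypothetical schema for a "contains" property.
schema = {
    ("#contains", RDFS_DOMAIN, "#Composite"),
    ("#contains", RDFS_RANGE, "#BioBrick"),
}

# Data that never states the class of either resource.
data = {
    ("#part_1", "#contains", "#part_2"),
}


def infer_types(schema, data):
    """Derive rdf:type statements from domain and range declarations."""
    domains = {(s, o) for s, p, o in schema if p == RDFS_DOMAIN}
    ranges = {(s, o) for s, p, o in schema if p == RDFS_RANGE}
    inferred = set()
    for s, p, o in data:
        for prop, cls in domains:
            if prop == p:
                inferred.add((s, RDF_TYPE, cls))  # subject is in the domain
        for prop, cls in ranges:
            if prop == p:
                inferred.add((o, RDF_TYPE, cls))  # object is in the range
    return inferred


for triple in sorted(infer_types(schema, data)):
    print(triple)
```

Nothing is rejected here: a statement that uses #contains is accepted, and the schema is used to conclude something new about its subject and object.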

Resources

  1. RDF Vocabulary Description Language 1.0: RDF Schema - W3C Recommendation
  2. RDF Schema Vocabulary Reference

Software


Web Ontology Language (OWL)

Used to define relationships between vocabularies and other constraints. As the Semantic Web is inherently distributed, OWL must allow for information to be gathered from distributed sources.

OWL adds the ability to indicate when two classes or properties are identical. OWL declarations provide additional information to let rule-checking and theorem-proving systems work with RDF data.

  • e.g. the 'ancestor' property is transitive.
  • If (X ancestor Y) and (Y ancestor Z) are true, a system could infer that (X ancestor Z) is also true.
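
The inference in this example can be sketched in a few lines of Python. This is a naive fixed-point computation over (subject, object) pairs for a single transitive property, not how any particular OWL reasoner works.

```python
def transitive_closure(pairs):
    """Repeatedly add (x, z) whenever (x, y) and (y, z) are both present."""
    closure = set(pairs)
    while True:
        new = {(x, z)
               for (x, y) in closure
               for (y2, z) in closure
               if y == y2}
        if new <= closure:  # nothing left to add: fixed point reached
            return closure
        closure |= new


asserted = {("X", "Y"), ("Y", "Z")}
inferred = transitive_closure(asserted)
print(sorted(inferred))
```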

Resources

  1. Wikipedia page
  2. OWL @ W3C - links
  3. OWL Web Ontology Language Guide
  4. OWL Web Ontology Language Overview
  5. OWL Web Ontology Language Reference
  6. OWL Web Ontology Language Semantics and Abstract Syntax
  7. OWL Vocabulary reference
  8. OWL Implementations as of December 2003 (Historical)

Software

Query

Resources

  1. RDF Query Survey - query language survey
  2. Redland Rasqal RDF Query Demonstration

RDQL

  1. RDQL - A Query Language for RDF
  2. RDQL Tutorial from phpxmlclasses project
  3. A Programmer's Introduction to RDQL - Jena tutorial

SPARQL

  1. SPARQL Query Language for RDF
  2. SPARQL Query Results XML Format

Ontologies

Essentially, a formal description of objects and their interrelationships. Described using RDF Schema and/or OWL.

Examples:

  1. Dublin Core provides a vocabulary to describe bibliographic metadata
  2. Gene Ontology provides a controlled vocabulary to describe gene and gene product attributes in any organism. GO terms are organized in directed acyclic graphs (DAGs), which differ from hierarchies in that a 'child' (more specialized term) can have many 'parents' (less specialized terms). GO terms are connected by 'is a' (generalizations) and 'part of' (composition) relationships.
  3. Sequence Ontology: features on a nucleotide or protein sequence
  4. BioPAX: biological pathway data
  5. UniProt (planning)
  6. SBML uses CellML metadata to describe its elements. See also a message on SBML forum.
  7. BioModels database and Systems Biology Ontologies (SBO) project
  8. Open Biomedical Ontologies
  9. Bio-Ontologies
  10. Ontologies for molecular biology and bioinformatics
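
The directed acyclic graph structure described for the Gene Ontology above can be sketched in Python as a list of typed edges. The term names are placeholders rather than real GO terms, and the acyclicity check relies on the standard library graphlib module (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical terms; each edge is (child, relation, parent).
edges = [
    ("term_C", "is_a", "term_A"),
    ("term_C", "part_of", "term_B"),  # a second parent for the same child
    ("term_A", "is_a", "root"),
    ("term_B", "is_a", "root"),
]


def parents(term, edges):
    """Direct parents of a term, together with the relation type."""
    return sorted((rel, parent) for child, rel, parent in edges
                  if child == term)


def is_acyclic(edges):
    """True if the term graph has no cycles, i.e. it is a valid DAG."""
    graph = {}
    for child, _rel, parent in edges:
        graph.setdefault(child, set()).add(parent)
    try:
        # Consuming static_order() raises CycleError when a cycle exists.
        tuple(TopologicalSorter(graph).static_order())
    except CycleError:
        return False
    return True


print(parents("term_C", edges))
print(is_acyclic(edges))
```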

Diagram of synthetic biology ontology v0.01 (developed using existing terminology described on the Registry website):

Resources

  1. GO annotation wiki (from Sri)
  2. RDF Schemas directory
  3. Ontology Development 101: A Guide to Creating Your First Ontology

Software

  • Ontology editors
    • Protégé is a free, open source ontology editor and knowledge-base framework
    • OilEd is an ontology editor allowing the user to build ontologies using DAML+OIL
    • Ontolingua
    • Chimaera is a software system that supports users in creating and maintaining distributed ontologies on the web

Implementation

Miscellaneous

Ontology

Contact: Ilya Sytchev