Revision as of 15:50, 4 February 2008

Data Exchange Standards

Questions

Discuss and answer these questions concerning data exchange standards:

What is the data model needed to describe a biobrick?

Once the data model is firmly in place, the format should follow as the one that best implements that data model. For example, if we settle on an RDF-like 'everything is a relationship triplet' approach, then some format that can handle these triplets would be most appropriate. In addition, with a model like this, there are XML-based and more human-readable formats that can both implement the model equally well.
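To make the "model first, format second" point concrete, here is a minimal sketch in Python. It assumes the model is nothing more than a list of (subject, predicate, object) triples. The part names, the predicate names and both serializations are invented for illustration and are not a proposed standard. The same triples survive a round trip through a line-based text format and through a small XML format.

```python
import xml.etree.ElementTree as ET

# Hypothetical triples (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("BBa_0001", "type", "biobrick"),
    ("BBa_0001", "sequence", "AAACCCGGG"),
    ("BBa_0001", "contains", "BBa_0003"),
]


def to_lines(triples):
    """Serialize triples as one tab-separated statement per line."""
    return "\n".join("\t".join(t) for t in triples)


def from_lines(text):
    """Parse the tab-separated format back into a list of triples."""
    return [tuple(line.split("\t")) for line in text.splitlines() if line]


def to_xml(triples):
    """Serialize the same triples as a small, made-up XML vocabulary."""
    root = ET.Element("triples")
    for s, p, o in triples:
        ET.SubElement(root, "triple", subject=s, predicate=p).text = o
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the made-up XML vocabulary back into a list of triples."""
    root = ET.fromstring(text)
    return [(e.get("subject"), e.get("predicate"), e.text) for e in root]
```

Both round trips return the original list, so in this toy setting the choice of format does not change what can be expressed.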

I think that tying ourselves to a format too early will keep us from forming a clear model, and will push us to hack up the format later. It is best to settle the model first, then the format.

So the thing to think about in a model is: what types of relationships do we want to convey?

Inheritance (where a particular part was derived from, and by whom = link + data)
Characterization (something quantitative about that part by itself = data)
Plays well with others (what other parts can this one interact with - with possible data associated with this interaction = link + data)
...
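The relationship types listed above could be sketched as a single record type with an optional link and optional data. This is only an illustration: the class name, field names, relationship kinds and all example values below are assumptions, not an agreed model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Relationship:
    subject: str                   # the part being described
    kind: str                      # e.g. "derived_from", "characterization", "interacts_with"
    target: Optional[str] = None   # the "link" half; None for data-only relationships
    data: dict = field(default_factory=dict)  # the "data" half; may be empty


# Invented example records, one per relationship type from the list.
RELATIONSHIPS = [
    Relationship("BBa_0001", "derived_from", target="BBa_0999", data={"by": "example-lab"}),
    Relationship("BBa_0001", "characterization", data={"length_bp": 9}),
    Relationship("BBa_0001", "interacts_with", target="BBa_0003", data={"note": "hypothetical"}),
]


def linked_parts(relationships, part):
    """Return the sorted names of all parts that `part` links to."""
    return sorted(
        {r.target for r in relationships if r.subject == part and r.target is not None}
    )
```

Here `linked_parts(RELATIONSHIPS, "BBa_0001")` collects the targets of the inheritance and interaction records and skips the characterization record, which carries data but no link.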

What is the best format / technology for exchange?

Suggestions

Please fill in these sections with details

create a new XML format

adapt existing CellML, SBML XML formats

create a custom file format

use Turtle/N3 notation for semantic web documents

Example of Turtle/N3

I somewhat share the reservation about completely new file formats, but the poor readability and general nastiness of XML is also an issue. A good solution, IMO, would be to use the Turtle format (a subset of "Notation 3", or N3) developed by the semantic web folks. It is concise, human-readable and editable (I used it myself some years ago) *AND* is equivalent to RDF/XML. That means there is a well-defined translation back and forth, and many libraries and tools do the conversion. Being semantic web, it also solves the linking problem (everything is a link).

I'll cook up some small example and send it around later today. Quick Preview:

# ... skipping namespace definitions for rdf, owl, bbf, harvard ...

# define a biobrick hosted at this address
:BBa_0001
      rdf:type        bbf:biobrick;
      bbf:sequence    "AAACCCGGG";
      bbf:contains    :BBa_0003, harvard:BBa_J1000, :BBa_00010;
.

# add information to a biobrick defined elsewhere
harvard:BBa_J1000
      owl:sameAs      :BBa_0999;
.

OK, one can argue about human-readability, but it is at least possible to understand and edit these documents (and it is much better than the equivalent XML).
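As a rough illustration of the "everything is a link" idea, the sketch below expands prefixed names such as harvard:BBa_J1000 into full addresses and follows a sameAs statement to a single representative address. The prefix URLs are placeholders, since the preview skips the namespace definitions. Treating sameAs as a one-directional alias is a simplification made only for this example.

```python
# Hypothetical prefix map; the URLs are placeholders, not real registries.
PREFIXES = {
    "": "http://example.org/local/",
    "harvard": "http://example.org/harvard/",
}

# sameAs statements as (alias, target) pairs, taken from the preview.
SAME_AS = [("harvard:BBa_J1000", ":BBa_0999")]


def expand(name, prefixes=PREFIXES):
    """Turn a prefixed name like 'harvard:BBa_J1000' into a full address."""
    prefix, _, local = name.partition(":")
    return prefixes[prefix] + local


def canonical(name, same_as=SAME_AS):
    """Follow sameAs statements from `name` until no further alias applies."""
    aliases = {expand(a): expand(b) for a, b in same_as}
    uri = expand(name)
    seen = set()
    while uri in aliases and uri not in seen:  # `seen` guards against alias cycles
        seen.add(uri)
        uri = aliases[uri]
    return uri
```

With these placeholder prefixes, `canonical("harvard:BBa_J1000")` and `canonical(":BBa_0999")` resolve to the same address, which is what lets one document add information to a part defined in another.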

The BioBricks Foundation:Standards/Technical: Difference between revisions
