BBF RFC 30


Add your comments to RFC 30 here. See BBF_RFC_20 for an example of how comments can be formatted and signed.

Michal Galdzicki, 24/04/09

Raik, This RFC is really helpful. I appreciate the direction it gives for definitions of data formats and your support of RDF/OWL for this purpose. Below I detail some nit-picky comments that came up while reading. I have also included a paragraph which fits as a conclusion. I will consider RFC30 while drafting RFC 31, and that may result in some more comments.

5.2

"Whenever appropriate, extension authors SHOULD re-use definitions from well supported other RDF ontologies"

Consider: "Whenever appropriate, extension authors SHOULD re-use definitions from well established RDF/OWL ontologies, as they constitute standards for other domains of science."

5.3

"In any case, a owl:sameAs link SHOULD connect the new standard back to the RDF document of the original proposal."

Consider that owl:sameAs is a symmetric property, so the link will be interpreted to mean sameAs reciprocally.
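To illustrate the reciprocity point, a single owl:sameAs statement already implies its reverse. A minimal Turtle sketch (the bbf: and orig: namespaces and the term names are hypothetical):

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix bbf:  <http://example.org/bbf#> .        # hypothetical BBF namespace
@prefix orig: <http://example.org/proposal#> .   # hypothetical original proposal

# One direction is enough: because owl:sameAs is symmetric, a reasoner
# will also infer  orig:Promoter owl:sameAs bbf:Promoter .
bbf:Promoter owl:sameAs orig:Promoter .
```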

Consider another, self-defined versioning scheme (an active research area, I believe).

   *Raik 08:42, 25 April 2009 (EDT):
   I was assuming the BBF would accept the new ontology without changes but may
   prefer to have the elements defined in the BBF name space (which makes life
   easier for everyone). I guess, in this case owl:sameAs would be ok because the
   two copies really are mutually the same. Versioning is indeed a whole different
   issue though... for data *and* for schemata.  

5.4

"The data documents SHOULD be serialized to XML but, depending on the situation, other formats, like Turtle/N3 or JSON MAY be preferred."

Consider adding: " If another serialization is used the format chosen MUST NOT lose or leave out information in the conversion from the original XML serialization."
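As an illustration of a lossless conversion, the same triple can round-trip between RDF/XML and Turtle without dropping information (part identifier and property name are invented for the example):

```xml
<!-- RDF/XML: one triple about a hypothetical part -->
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:bbf="http://example.org/bbf#">
  <rdf:Description rdf:about="http://example.org/part/BBa_X0001">
    <bbf:shortDescription>example promoter</bbf:shortDescription>
  </rdf:Description>
</rdf:RDF>
```

```turtle
# Turtle: the identical triple, just serialized differently
@prefix bbf: <http://example.org/bbf#> .

<http://example.org/part/BBa_X0001> bbf:shortDescription "example promoter" .
```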

5.6

"That means a simple HTTP GET MUST serve the document just as it would serve an html formatted web page about it. That also means data access SHALL NOT require the initialization of web services or any other kind of remote procedure calls."

Potentially a technical contradiction

An HTTP GET may itself technically be considered a web service. The phrase "SHALL NOT require" does not tell me what actions to take when interpreting the RFC document. Personally, I agree with the sentiment I think you are expressing: as a first choice, data should be served in a way everyone can use.

5.6

"Software that consumes Synthetic Biology data records MUST be able to open, parse and interpret RDF documents. The software SHOULD, at least, parse XML-formatted RDF documents. Support of more specialised and readable formats like turtle/N3 is RECOMMENDED."

This is a little confusing:

1. MUST parse RDF (at least one serialization)

a. SHOULD parse XML-RDF

b. RECOMMENDED Turtle / N3

According to RFC 0, 'The word "SHOULD" or the adjective "RECOMMENDED" mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.' Therefore (a) and (b) are equivalent in objective terms, so you are not actually recommending one over the other.

One solution: change to "MUST parse RDF/XML".

Another solution: change to say Turtle/N3 MAY be supported.

5.6

"At least in the long term, data stores are also RECOMMENDED to support the SPARQL W3C standard for more complex queries."

SPARQL endpoints are web services, as far as I understand. http://semanticweb.org/wiki/SPARQL_endpoint
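For reference, a complex query of the kind SPARQL enables might look like the following sketch (the namespace, class, and property names are hypothetical):

```sparql
PREFIX bbf: <http://example.org/bbf#>

# Find up to ten promoter parts and return their descriptions.
SELECT ?part ?desc
WHERE {
  ?part a bbf:Promoter ;
        bbf:shortDescription ?desc .
}
LIMIT 10
```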

This refers back to the confusion that may arise from "SHALL NOT require the initialization of web services".

6.1

"Soap" Change to SOAP

*Raik 08:43, 25 April 2009 (EDT):

Thanks a lot for the comments! Looks like I was a bit too quick in pushing this out. I suggest we collect some more and then write a joint RFC that replaces RFC 30.

Douglas Densmore 20:27, 28 April 2009 (EDT)

Disclaimer: I am not very familiar with RDF :)

In general I like the RFC and do not have any concrete reasons why I would not support it. The broad goals of establishing a core data model and a way to share it and extend it are mine as well. Again, I really am a consumer of data at the end of the day. In fact, ultimately I want to get tools built so I can explore the intersection of bio and EECS through a set of much more abstract representations of biological data. I want to make tools that help us get standards in place, and I think this is a step in the right direction.

Section by section comments:

4

You mention that "third party RDF data will therefore rarely immediately map into a pre-existing relational database schemas". So is the expectation that folks using a relational database have another layer on top which allows RDF data to populate their database? Would this be the same software that serves up their data as RDF?

You mention 4 things needed:

  • RDF definition of core data model - I see that there is an RFC for PoBoL now. We have created one as well (#33). I would be more than happy to capture ours as RDF. I would be interested to see how easy it is for two data models to exist in the RDF space. Your RFC makes it sound as if this is possible and that they could extend each other. Is this true?
  • Some guidelines for extensions - I agree that this is key.
  • Recommendations for data publication and synchronization - I am not clear on what this means. Do you mean the technical aspect of this (e.g. race conditions, data coherency) or do you mean from a "community organization" standpoint?
  • Software or servers that can read or write RDF - Sign Clotho up!
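On the question of whether two data models can coexist and extend each other: in RDF/OWL, classes from independently developed ontologies can be related after the fact, e.g. with owl:equivalentClass or rdfs:subClassOf, without changing either ontology. A hypothetical Turtle sketch (all namespaces and class names are invented for illustration):

```turtle
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix pobol: <http://example.org/pobol#> .    # hypothetical PoBoL namespace
@prefix c33:   <http://example.org/rfc33#> .    # hypothetical RFC 33 namespace

# Bridging statements relating the two models; these could even be
# published by a third party in a separate document.
c33:Part          owl:equivalentClass pobol:Part .
c33:CompositePart rdfs:subClassOf     pobol:Part .
```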


5.1

"Biobrick" as a term is used a lot. In my brief exposure to biology this seems like a loaded term and one that is not meaningful to all synthetic biologists (not to mention all biologists). Is something more generic warranted? I am all for bioBricks but it seems like biobricks and their assembly can be represented in more flexible schemes as well. I only mention this since some folks are wary of my tools thinking that they only work for folks doing biobricks.

5.2

Can you only extend the core model (as opposed to remove items from it)? What if fields are not used? Does something still adhere to the core data model in that case? Is there a notion of backward compatibility? Expressiveness? Basically I would like to know how we are going to compare data models and have tools that require more than the data model provides.

5.4

You state "RDF data documents SHOULD thus be hosted at permanent immutable locations". How robust is this? What if a server goes down? Does RDF have any notion of data distribution or replication? Has anyone built a system that traverses the RDF "space" and creates mirror sites? Just wondering....

5.6

Speaking for Clotho, I am fine with the requirements for software and can commit to getting this capability into Clotho this summer.



Right now we are capturing our idea of a data model for RFC 33. I can volunteer to specify it as RDF to get feedback. It would be great if those making a more PoBoL-based version could do so as well.

Also, if Java code exists to process and use RDF, it would be great if someone could point me at it. Thanks.
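One widely used Java library for working with RDF is Jena. A minimal, hedged sketch of reading a document and re-serializing it as Turtle (the URL is hypothetical, and this assumes the Jena jars are on the classpath; package names reflect the pre-Apache com.hp.hpl.jena layout):

```java
import com.hp.hpl.jena.rdf.model.Model;
import com.hp.hpl.jena.rdf.model.ModelFactory;

public class RdfExample {
    public static void main(String[] args) {
        // Create an empty in-memory model and read an RDF/XML document into it.
        Model model = ModelFactory.createDefaultModel();
        model.read("http://example.org/data/part.rdf"); // hypothetical URL

        // Re-serialize the same data as Turtle.
        model.write(System.out, "TURTLE");
    }
}
```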