The BioBricks Foundation:Standards/Technical: Difference between revisions

From OpenWetWare
 
(34 intermediate revisions by 6 users not shown)
=BBF Technical Standards Mailing List=

Join or check out the BBF technical standards mailing list and archives [http://www.biobricks.org/mailman/listinfo/standards here]!

=BBF Technical Standards RFC Process=

[http://openwetware.org/wiki/The_BioBricks_Foundation:RFC BBF RFCs] <-- read and learn how to contribute a BBF RFC here!

if the above link is broken see: http://dspace.mit.edu/handle/1721.1/43714/browse?type=dateissued
[http://openwetware.org/wiki/The_BioBricks_Foundation:RFC?action=edit]

=Active Technical Standards Projects=

*[[The_BioBricks_Foundation:Standards/Technical/Exchange | Data exchange standards]]
**Synthetic Biology Open Language (SBOL) http://sbolstandard.org '''Active 2012'''
*[[The_BioBricks_Foundation:Standards/Technical/Formats | Biobrick&trade; Formats (aka Physical Composition)]]
*[[The BioBricks Foundation:Standards/Technical/Measurement | Measurement standards]]
*[[The_BioBricks_Foundation:Standards/Technical/Resources | Technical Resources]]
*[[The_BioBricks_Foundation:Standards/Technical/BBF_WetLab_Challenges | BBF WetLab Challenges]] <-- is this really a standards topic?
*[[The_BioBricks_Foundation:Standards/Technical/E.coli promoter standard]]

= Data Exchange Standards =

This working group aims to define standards for the description of biobricks and formats / technologies for the exchange (or networking) of biobrick-related data.

This falls into the following questions (Discuss and answer!):

0. Aim / Application scenarios

1. What is a Biobrick?

2. What is the data model needed to describe a biobrick?

3. What is the best format / technology for exchange?

== Aim / Application scenarios for this standard ==

Application scenarios [please discuss]:
 
* data exchange between local / central part registries
<sub>
Example: "We have a local registry and want to publish the finished Biobricks to the MIT registry."
See the [http://brickit.wiki.sourceforge.net/ BrickIt project] for an example local registry system.
</sub>
 
* download biobrick data into local computer programs
<sub>
Example: "We want to simulate the behavior of device X and Y with the Gepasi program." or "We want to develop bio-circuit design programs."
</sub>
 
* find suitable parts
<sub>
Example: "I need a 10-fold PoPS amplifier (input range 0 - 8 PoPS) that works in S. cerevisiae at 25 °C; response time doesn't matter, but the protein production load needs to stay below 100000 AA consumed; sub-components must not interfere with the MAPK pathway [enter reactions]."
</sub>
 
* distributed annotation of Biobricks
<sub>
Example: "We have measured the toxicity of 1000 BioBricks from MIT and two other registries. Can we cross-link this data with the registries?"
</sub>
 
== What is a Biobrick? ==
 
=== Definition ===
 
A final definition is beyond the scope of this group. For data exchange purposes we adopt the following draft:
 
* BioBricks™ are standard DNA parts that encode basic biological functions. [http://www.biobricks.org/ see BBF home]
* A BioBrick has a unique DNA sequence.
* Basic parts are defined by this DNA sequence.
* Composite parts are defined as a "sequence" of basic BioBricks, along with the intervening "scar" sequences.
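
As a rough illustration of the draft definition above, the following Python sketch derives the sequence of a composite part from its basic parts. The class names, part IDs, sequences and the scar value `TACTAGAG` are assumptions made for this example; the page itself does not fix any of them, and real scars depend on the assembly format in use.

```python
from dataclasses import dataclass
from typing import List

# Assumed scar for illustration; real scars depend on the assembly format used.
SCAR = "TACTAGAG"


@dataclass(frozen=True)
class BasicPart:
    part_id: str
    sequence: str  # a basic part is defined by its DNA sequence


@dataclass(frozen=True)
class CompositePart:
    part_id: str
    parts: List[BasicPart]  # ordered "sequence" of basic BioBricks

    def sequence(self, scar: str = SCAR) -> str:
        # Join the basic part sequences with a scar between each neighbouring pair.
        return scar.join(p.sequence for p in self.parts)


# Hypothetical parts, not real registry entries.
promoter = BasicPart("BBa_X0001", "TTGACA")
rbs = BasicPart("BBa_X0002", "AGGAGG")
device = CompositePart("BBa_X0003", [promoter, rbs])
print(device.sequence())  # TTGACATACTAGAGAGGAGG
```

Storing only the ordered list of basic parts keeps the composite consistent with its components; the full sequence is computed on demand rather than stored twice.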
 
=== Issue: BioBrick formats ===
 
(Raik)
You can have the "same" Biobrick in different formats, e.g. with prefix/suffix from one of the two suggested protein fusion formats. Now the sequence is exactly the same, but having a sample of biobrick X with biofusion flanks may be of no use if the other biobricks in your freezer are formatted differently. *Does a different prefix / suffix create a different biobrick?* To the assembling experimentalist in the lab it does; to the user of gene synthesis it doesn't really; the system designer or analyst couldn't care less...
 
=== Issue: closely related BioBricks ===
 
(Mac)
Should there be a one-to-one relationship between a part's functional definition and its sequence?  What if you introduce a silent mutation into a BioBrick - is there a "different sequence, different part" doctrine, even if the two are functionally equivalent? ... Is this a source code vs. compiled code issue?
 
(Raik)
We currently seem to follow the unspoken rule that a part is defined by its exact DNA sequence. Any modification creates a new part, which is logical to the experimentalist because it maps a biobrick to exactly one DNA fragment (which you either have in your freezer or not) and vice versa. Options:
 
* keep/fix the sequence-based definition but introduce relations like "ortholog to", "equivalent to", etc.
* define "reference biobricks" and link variants to them
* find a more abstract definition ... and create the concept of BB 'implementation' or 'instance'.
 
(Mac)
Perhaps we could do both?  Assuming a biobrick always has one and only one DNA sequence, perhaps we could build the data model to support organizing biobricks into families or sets of functionally related parts?  Each family could have one canonical biobrick associated with it that works, is available, and exemplifies the function that the family is supposed to have.
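
One possible reading of this proposal, sketched in Python: every BioBrick keeps exactly one sequence, and a family groups functionally related parts around one canonical member. All names, IDs and sequences here are hypothetical and only meant to make the idea concrete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BioBrick:
    part_id: str
    sequence: str  # exactly one DNA sequence per BioBrick


@dataclass
class Family:
    name: str
    canonical: BioBrick  # the working, available reference part
    members: List[BioBrick] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The canonical part always belongs to its own family.
        self.add(self.canonical)

    def add(self, brick: BioBrick) -> None:
        # Avoid listing the same part twice.
        if all(m.part_id != brick.part_id for m in self.members):
            self.members.append(brick)


original = BioBrick("BBa_X0010", "ATGGCTTAA")
silent = BioBrick("BBa_X0011", "ATGGCCTAA")  # GCT -> GCC, both code for alanine
family = Family("example reporter", canonical=original)
family.add(silent)
print([m.part_id for m in family.members])  # ['BBa_X0010', 'BBa_X0011']
```

With this shape, a silent mutation still yields a new part with its own ID, but the family records that the two parts are meant to do the same job.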
 
== What is the data model needed to describe a biobrick? ==
 
Following Ralph's and Barry's mails, Raik suggests splitting this into the following sub-topics (re-organize at leisure).
 
=== minimal Biobrick information ===
 
The set of minimal information aims to (1) uniquely identify a biobrick, (2) provide sufficient detail for its application and handling in the lab and during assembly, (3) describe its origin/source and references for human study.
 
* unique ID
* DNA sequence / basic building blocks
* format ??? (see issue above)
* short description for humans
* long description for humans
* target chassis
* "collaborating"/complementing biobricks if any
* feature annotation
 
* experience flag
* ? bug tracker ?
* ? version / supersedes / history ?
 
* source GenBank ID if applicable (with position?)
* source organism
* source lab/person
* references (web / literature)
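
To make the list above more tangible, here is a minimal Python sketch of a record holding this information. The field names, the choice of which fields are mandatory, and the restriction of sequences to A, C, G and T are all assumptions for illustration; the open items (format, bug tracker, version history) are left out or optional on purpose.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BioBrickRecord:
    # identification and handling
    unique_id: str
    sequence: str
    short_description: str
    long_description: str = ""
    part_format: Optional[str] = None  # open question, see the format issue
    target_chassis: Optional[str] = None
    complementing_parts: List[str] = field(default_factory=list)
    features: List[str] = field(default_factory=list)
    # origin and references
    source_genbank_id: Optional[str] = None
    source_organism: Optional[str] = None
    source_lab: Optional[str] = None
    references: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the record looks usable."""
        found = []
        if not self.unique_id:
            found.append("missing unique ID")
        if not self.sequence or set(self.sequence.upper()) - set("ACGT"):
            found.append("sequence must be non-empty and contain only A, C, G, T")
        if not self.short_description:
            found.append("missing short description")
        return found


record = BioBrickRecord("BBa_X0001", "AAACCCGGG", "example part")
print(record.problems())  # []
```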
 
=== Biobrick classification ===
 
Categorization and anything that helps (1) fishing this part out of the registry and (2) deciding what extra information may be needed.
 
==== Intrinsic Classification ====
 
Intrinsic classification covers those aspects of Biobrick classification which are defined by the Biobricks themselves.  For these the primary focus is defining the vocabularies used to describe Biobricks to the outside world.  Broadly speaking, this can include:
 
* Identifiers
* Biobrick taxonomy: defining types or species of Biobricks based on composition, function, etc.
 
Possible intrinsic classifiers include:
 
* DNA category: [ AA coding, RNA-coding[m-/t-/nc-/mi-/si-], regulatory [promoter,rbs,terminator,enhancer], unknown, ...]
 
==== Extrinsic Classification ====
 
Extrinsic classification refers to those aspects of Biobrick classification which are attributed to Biobricks from external sources or references.  The focus is defining the vocabularies for those aspects of the outside world which are related to biobricks.
 
* Functional Performance Parameters
* Function...
 
=== Characterization ===
 
Quantitative data about the part that is important for the design and implementation of devices containing it.

A) Independent of the part's category:

* genetic stability
* ?

B) Depending on the part's category:
 
* Static device behavior
* Dynamic device behavior
* Device compatibility (with other devices, environmental conditions etc.)
* Device interactions (including quantitative data)
* Device reliability (RNA half-life, protein half-life)
* Power requirements of the device
 
=== Further annotation ===
 
* Higher level descriptions for automated design & simulation?
* references to High-throughput data ?
* references to outside, non-standardized information about this part
 
== What is the best format / technology / architecture for exchange? ==
 
Committing to a format too early, before we have a clear model in mind, will force us to hack up the format later.  It is best to settle the model first, then the format.
 
Once the data model is firmly in place, the format should follow as the one that best implements that data model.  For example, if we settle on an RDF-like 'everything is a relationship triple' approach, then a format that can handle these triples would be most appropriate.  In addition, with a model like this, there are both XML-based and more human-readable formats that implement the model equally well.
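
The "model first, format second" point can be sketched in a few lines of Python: one list of triples is written out both as plain text and as XML, and read back without loss. The element and attribute names are invented for this example, and neither output is real N3 or RDF/XML; the sketch only shows that the format is interchangeable once the model is fixed.

```python
import xml.etree.ElementTree as ET

# (subject, predicate, object) triples; the names are made up for illustration.
triples = [
    ("BBa_0001", "type", "biobrick"),
    ("BBa_0001", "sequence", "AAACCCGGG"),
]


def to_text(triples):
    # One "subject predicate object ." statement per line, loosely N3-like.
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


def to_xml(triples):
    # Ad hoc XML layout (not RDF/XML): one <triple> element per statement.
    root = ET.Element("triples")
    for s, p, o in triples:
        ET.SubElement(root, "triple", subject=s, predicate=p, object=o)
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    return [
        (t.get("subject"), t.get("predicate"), t.get("object"))
        for t in ET.fromstring(text)
    ]


print(to_text(triples))
assert from_xml(to_xml(triples)) == triples  # the model survives the round trip
```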
 
A few ideas about possible architectures are below; I am sure there are others.
 
=== Context ===
 
For reference, I'm considering a piece of web-accessible software, like the MIT Registry or BrickIt, that has BB data in some sort of persistence layer (be it a relational DB, an object DB,  an XML store, a hash store like CouchDB/SimpleDB, or a triple store), offers a human-facing UI, and a programmatic interface for 3rd party software integration that allows *read/write access* with authentication and authorization rules.
 
=== XML/DB backend, REST API ===
If we end up storing BB descriptor documents in XML of a custom schema (like CellML/SBML), or a relational database (like BrickIt does), and want the tools that store such data to expose a programmatic API, I believe that a RESTful architecture might have some advantages.  In particular, REST is a simpler approach to data access than SOAP; REST is easy to work with since it's simply HTTP, and software support is plentiful.
 
* [http://en.wikipedia.org/wiki/REST Wikipedia article]
* [http://www.xfront.com/REST-Web-Services.html Introduction to REST Web Services]
* [http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm Ch 05 of Fielding's thesis (theory behind REST)]
 
Note that this approach involves a layer of abstraction over the persistence layer.  The disadvantage, compared to offering a straight-up SQL (or similar) interface, is the additional work needed to write that layer.  However, you'll have to design a layer of abstraction anyhow for the UI (such as a web application serving HTML), and frameworks such as Django and Rails can make it easy to expose alternative content types (XML, JSON) in parallel with your human-consumable HTML data views.
 
* [http://jamesgolick.com/resource_controller Rails resource_controller plugin]
* [http://code.google.com/p/django-rest-interface/ Django rest interface]
 
The advantage is that you get to decouple the internal representation from the public API.  This allows you to modify your underlying data store (database, schema, etc.) and not break the interface that your clients are using.  It also allows your application to perform data validation, and allows you to write that in the higher-level language of your application rather than in SQL triggers/keys.  Also, you do not have to repeat this validation logic across both your application and the database.  It also affords you more power in the authentication/authorization department than simple database logins.  This approach (doing validation/auth in the application layer) is that of an [http://martinfowler.com/bliki/ApplicationDatabase.html Application Database] and essentially precludes you from offering a raw SQL interface.
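
The decoupling described above might look roughly like the following Python sketch. It is not taken from the MIT Registry or BrickIt: the `/parts/<id>` path, the column names, and the JSON field names are all hypothetical, and the dispatcher is a plain function rather than a real HTTP server. Authentication and authorization are left out.

```python
import json

# Internal storage: column names are an assumption and may change over time.
_rows = {"BBa_X0001": {"seq_txt": "AAACCCGGG", "descr": "example part"}}


def to_public(part_id, row):
    # The public representation is independent of the internal column names.
    return {"id": part_id, "sequence": row["seq_txt"], "description": row["descr"]}


def handle(method, path, body=None):
    """Tiny dispatcher for GET/PUT on /parts/<id>; returns (status, JSON text)."""
    prefix = "/parts/"
    if not path.startswith(prefix) or not path[len(prefix):]:
        return 404, json.dumps({"error": "not found"})
    part_id = path[len(prefix):]
    if method == "GET":
        if part_id not in _rows:
            return 404, json.dumps({"error": "not found"})
        return 200, json.dumps(to_public(part_id, _rows[part_id]))
    if method == "PUT":
        data = json.loads(body or "{}")
        seq = str(data.get("sequence", "")).upper()
        # Validation lives in the application layer, not in SQL triggers.
        if not seq or set(seq) - set("ACGT"):
            return 400, json.dumps({"error": "invalid sequence"})
        _rows[part_id] = {"seq_txt": seq, "descr": str(data.get("description", ""))}
        return 200, json.dumps(to_public(part_id, _rows[part_id]))
    return 405, json.dumps({"error": "method not allowed"})


print(handle("GET", "/parts/BBa_X0001"))
```

Renaming `seq_txt` in the store would only require a change in `to_public` and the `PUT` branch; clients that read the `sequence` field would not notice.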
 
=== Triple backend, SPARQL/SPARUL API ===
If, on the other hand, we elect a triple-based storage format, query languages such as SPARQL and SPARQL/Update (aka SPARUL) offer great power.
 
* [http://www.slideshare.net/fabien_gandon/sparql-in-a-nutshell SPARQL in a nutshell] - presentation
* [http://jena.hpl.hp.com/~afs/SPARQL-Update.html SPARQL/Update]
 
Note that, with this approach, the tool could expose the underlying RDF as a SPARQL/SPARUL endpoint, and both the application's web interface and the API interface could work against that.  The point here is that triples are likely flexible enough to withstand a "schema change", and providing a SPARQL-adhering endpoint is a layer of abstraction that allows you to swap out the underlying triple store if necessary.  I am not sure how authentication/authorization and data validation happen in this scenario, as I am less familiar with it.
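
The flexibility claim can be illustrated with a toy in-memory triple store in Python. This is not SPARQL and not a real triple-store library; it only mimics the idea of matching triple patterns, with `None` standing in for a query variable. The `bbf:toxicity` predicate is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


class TripleStore:
    def __init__(self, triples: Iterable[Triple] = ()) -> None:
        self._triples = set(triples)

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def match(self, pattern: Pattern) -> List[Triple]:
        # None acts as a wildcard, similar in spirit to a SPARQL variable.
        return sorted(
            t for t in self._triples
            if all(p is None or p == v for p, v in zip(pattern, t))
        )


store = TripleStore([
    (":BBa_0001", "rdf:type", "bbf:biobrick"),
    (":BBa_0001", "bbf:sequence", "AAACCCGGG"),
    (":BBa_0003", "rdf:type", "bbf:biobrick"),
])
# "Schema change": a new predicate needs no migration, just another triple.
store.add((":BBa_0001", "bbf:toxicity", "low"))

print(store.match((None, "rdf:type", "bbf:biobrick")))
```

Existing queries keep working after the new predicate is added, which is the sense in which triples withstand a schema change. Callers only depend on `match`, so the set-based storage could be swapped for something else without touching them.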
 
For rolling up your sleeves and hacking around, you might like to check out object/RDF modeling libraries such as:
* [http://www.activerdf.org/ ActiveRDF] (Ruby)
* [http://oort.to/ Oort] (Python)
* [http://arc.semsol.org/ Arc] (PHP)
 
The following articles contain a good deal of discussion on the topic of building web applications for the semantic web:
* [http://thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_survey.html Using RDF on the Web: A Survey]
* [http://thefigtrees.net/lee/blog/2007/01/using_rdf_on_the_web_a_vision.html Using RDF on the Web: A Vision]
 
== Suggestions ==
Please fill in these sections with details
 
=== create a new XML format ===
 
=== adapt existing CellML, SBML XML formats ===
 
=== create a custom file format ===


=== use Turtle/N3 notation for semantic web documents ===
==== Example of N3 ====
I somewhat share the reservation about completely new file formats, but the
non-readability and general nastiness of XML is also an issue. A good solution,
IMO, would be to use the [http://en.wikipedia.org/wiki/Notation_3 Notation3 format] developed
by the semantic web folks. It is concise, human-readable and editable (I used it
myself some years ago) *AND* is equivalent to XML (more precisely, to RDF/XML). That means there is a well
defined translation back and forth, and many libraries and tools do the conversion.
Being semantic web, it also solves the linking problem (everything is a link).

Quick Example:

<pre>
# shortcut definitions for frequently used resources ...
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix owl: <http://www.w3.org/2002/07/owl#>.
@prefix bbf: <http://biobricks.org/ontology/1.1/>.
@prefix harvard: <http://harvard.edu/registry/parts#>.

# define a biobrick hosted at this address
:BBa_0001
      rdf:type        bbf:biobrick;
      bbf:sequence    "AAACCCGGG";
      bbf:similarTo   :BBa_0003, harvard:BBa_J1000, :BBa_00010.

# add information to a biobrick defined elsewhere
# (owl:sameAs is used here because the RDF vocabulary has no sameAs property)
harvard:HBB_J1000
      owl:sameAs      :BBa_0001.
</pre>

OK, one can argue about human-readability but it's at least possible to
understand and edit these documents (and much better than the equivalent XML).

=Possible New Technical Standards Projects=
Comprehensive list of standards topics to consider working on (feel free to organize or add to this list).  If you start working on a topic please move it to the list above, and start a new page, as appropriate:
*standard assembly for DNA, RNA, protein and parts (physical composition)
*standard formats (same as above?)
*DNA synthesis & construction standards
*standards for sharing information about parts via computer networks
*standards for defining parts / functional nomenclature
*standard ontology for parts (avoid reinventing the world, if appropriate)
*standards for defining function / behavior of parts (i.e., quantitative models)
*standards for defining qualitative functional descriptions (i.e., works, does not work)
*standards for support for searching (people-based, CAD-based)
*functional composition standards (e.g., signal carriers, signal levels, device timing)
*compatibility / conflict standards (quantitative, degree)
*standards for the cellular environment, context, chassis (e.g., genotype, culture conditions)
*characterization standards (for experimental measurements, and reporting of experiments, e.g., promoters, RBSs, and many more)
*standards for defining the source of parts, uses of parts
*coordination with other standards setting organizations
*safety standards
*graphical / visual depiction standards
*standards for noting the current ownership status of a part (e.g., free)
*abstraction standards
*release status standards
*identification / signature standards

=BBF4 lowhanging-fruit RFCs=
BBF4 technical standards meeting - "lowhanging fruit RFCs" blackboard image:

'''Potentially low-hanging "fruit" that could be turned into RFCs using [http://openwetware.org/images/b/ba/BBFRFC0.pdf BBF:RFC0]''':
* DNA Sewing - Kim de Mora
* Strain Descriptions - Tom Knight
* [[The BioBricks Foundation:Standards/Technical/Formats#Tom_Knight.27s_BBb_proposal| BBa-v2 Standard Assembly]] - Tom Knight
* [[The BioBricks Foundation:Standards/Technical/Formats#The_Berkeley_.28BBb.29_Format| BBb Standard Assembly]] - Chris Anderson
* What is the Nature of a Part - Kristian Müller
* Plasmid Naming - Reshma Shetty
* PoBoL - Michal Galdzicki (2009), now see SBOL http://www.sbolstandard.org
* Promoter Measurement - Jason Kelly
* [[The_BioBricks_Foundation:Standards/Technical/Synthetic Biology Graphical Notation | Synthetic Biology Graphical Notation]] - Mac Cowell

[[Image:BBF-4-lowhanging_fruit.jpg| 480px]]

=BBF Technical Standards Setting Process=
How are BBF technical standards defined?  Good question!  Here's the current version (v1) of the BBF Open Standards Setting Scheme.  Send comments to endy@mit.edu
#You develop some scheme for standardizing some aspect of synthetic biology work.
#You convince at least one other person, at a different location from you, that the scheme would help them with something that they care about.
#You each demonstrate that the proposed standard works for each of you (i.e., the standard must work and be good for something).
#You document your scheme in writing.
#You request a BBF RFC number by asking for one (email the list)
#The BBF technical standards group (i.e., the folks on this list) comment on the standard, try it out, propose revisions.
#You revise the standard if necessary.
#The standard is formally accepted as part of the definition for BioBrick&trade; parts.  Congratulations, you win (publishers are standing by), the BioBricks&trade; technical standards suite is updated.
#New, possible standards tremble before you! Goto 1.

Latest revision as of 09:39, 21 March 2012
