User:Ilya/Registry

From OpenWetWare
Revision as of 16:39, 3 May 2006 by Ilya (talk | contribs)

Data or Metadata

(from LSID best practices) Data is defined as a sequence of unchanging bytes. Examples of data are a microscope image, a protein sequence, or a text file. Metadata is usually information that either describes the data literally (date created, MD5 checksum, size) or describes the relationship between the data and other objects. If you cannot determine from your data model what should be data and what should be metadata, follow this rule of thumb: large byte sequences are easier to manipulate as data, while short byte sequences can be included as data, as metadata, or made available in both forms.
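As a minimal sketch of the distinction, the snippet below treats a byte string as the data and computes the literal metadata named above (size, MD5 checksum, creation date). The function name `describe`, the sample sequence, and the dictionary keys are assumptions made for illustration, not part of the LSID specification.

```python
import hashlib
from datetime import datetime, timezone


def describe(data: bytes, created: datetime) -> dict:
    """Return literal metadata for an unchanging byte sequence."""
    return {
        "size": len(data),                     # number of bytes
        "md5": hashlib.md5(data).hexdigest(),  # checksum of the bytes
        "created": created.isoformat(),        # supplied by the caller
    }


# Hypothetical data: a short DNA sequence stored as bytes.
sequence = b"ATGGCTAGC"
meta = describe(sequence, datetime(2006, 5, 3, tzinfo=timezone.utc))
print(meta)
```

The data itself never changes; only the metadata record grows if more descriptive or relational fields are added later.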

Abstraction Hierarchy

  • Part - simple biological function encoded in DNA
  • Device - simple logical function; collection of parts
  • System - collection of devices
  • Device is_a part in context of the system but also device has_a part.
  • Device is_a subclass of Part, System is_a subclass of Device
  • How to represent barriers and interfaces between levels of abstraction?
  • Genetic, protein and cell devices
  • :RBS :subclassOf :BasicPart OR :RBS :typeOf :BasicPart (instance)
  • Basic parts: detailed specs and sequence data
  • Composite parts: basic parts plus assembly (composite parts are the same if they have the same basic parts)
  • Device: not necessarily on a single piece of DNA
  • Separate spaces: set of hierarchies
    • Physical (DNA sequence assembly)
    • Design
    • Standards (Performance)
  • Class of Standards: assembly standards, performance standards?
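One possible reading of the is_a and has_a statements in the list above can be sketched as a small class hierarchy. This is only an illustration of "Device is_a Part, System is_a Device, Device has_a Part"; the class and field names are assumptions, and the open questions in the list (interfaces between levels, separate hierarchies) are not addressed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Part:
    """Simple biological function encoded in DNA."""
    name: str


@dataclass
class Device(Part):
    """A Device is_a Part and also has_a list of Parts."""
    parts: List[Part] = field(default_factory=list)


@dataclass
class System(Device):
    """A System is_a Device; its parts are typically Devices."""


rbs = Part("RBS")
inverter = Device("inverter", [rbs])
oscillator = System("oscillator", [inverter])

# In the context of the system, the inverter is used as a part.
print(isinstance(inverter, Part), isinstance(oscillator, Device))
```

Because `Device` inherits from `Part`, a system can hold devices in the same list that a device uses to hold parts, which is one way to express "Device is_a part in context of the system".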

Current Design

Biobricks come in three flavors:

  • Parts/basic parts/subparts encode basic biological functions (RBS, CDS)
  • Devices/composite parts are made from a collection of parts and encode human-defined functions, such as logic gates in electronic circuits (inverter)
  • Systems perform tasks, such as counting (oscillator)

No need to specify deep_components vs component_list. Right now, composite parts have only their components listed; deep components are produced from that list.
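Producing deep components from the component list amounts to a recursive flattening. The sketch below assumes a hypothetical in-memory `REGISTRY` dictionary in which basic parts map to `None` and composite parts map to their ordered component list; the two `HYPOTHETICAL_*` entries are invented for illustration, and a basic part is assumed to expand to itself.

```python
from typing import Dict, List, Optional

# Hypothetical registry: basic parts -> None, composite parts -> ordered components.
REGISTRY: Dict[str, Optional[List[str]]] = {
    "BBa_B0032": None,
    "BBa_C0051": None,
    "BBa_B0010": None,
    "BBa_B0012": None,
    "HYPOTHETICAL_TERMINATOR": ["BBa_B0010", "BBa_B0012"],
    "HYPOTHETICAL_DEVICE": ["BBa_B0032", "BBa_C0051", "HYPOTHETICAL_TERMINATOR"],
}


def deep_components(name: str) -> List[str]:
    """Expand a part into its ordered list of basic parts."""
    components = REGISTRY[name]
    if components is None:
        # Assumption: a basic part expands to itself.
        return [name]
    flat: List[str] = []
    for component in components:
        flat.extend(deep_components(component))
    return flat


print(deep_components("HYPOTHETICAL_DEVICE"))
```

The sketch does no cycle detection, so it relies on the registry never letting a part contain itself, directly or indirectly.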

Types: what type are Plasmid, Cell and T7?

Registry Parts Index

  • Basic parts - have their own sequence information
    • R Regulatory
    • B RBS
    • C CDS
    • B Terminator
    • RNA
    • F Signalling
    • E Reporter
    • M Tag
    • V Plasmid
    • V Cells
  • Composite parts - sequence is derived from their component parts
    • E Reporter
    • Q Inverter
    • Composite (other)
    • I Project
    • J iGEM
    • G Generator
    • Measurement
    • T Temporary
    • S Intermediate (belongs to multiple types/categories)
    • Other (not classified)
  • A part is not allowed to contain both its own sequence and other parts
  • Subparts - ordered set

Normally, the part name contains the letter associated with the part's type. Confusion is possible when a part fits into multiple categories.

Part name/number - unique ID
BB a _ X nnnnnn
BB: BioBricks
a: alpha stage of development
X: part type
nnnnnn: 4-6 digit part number
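The naming scheme above can be checked with a regular expression. This is a sketch under two assumptions: the stage is a single lowercase letter (only "a" is described above) and the type is a single uppercase letter. The function name `parse_part_name` is hypothetical.

```python
import re
from typing import Dict, Optional

# BB + stage letter + "_" + type letter + 4-6 digit number, e.g. BBa_B0032
PART_NAME = re.compile(r"^BB([a-z])_([A-Z])(\d{4,6})$")


def parse_part_name(name: str) -> Optional[Dict[str, str]]:
    """Split a part name into stage, type letter, and number, or return None."""
    match = PART_NAME.match(name)
    if match is None:
        return None
    stage, part_type, number = match.groups()
    return {"stage": stage, "type": part_type, "number": number}


print(parse_part_name("BBa_B0032"))
print(parse_part_name("B0032"))
```

The type letter alone does not identify the category unambiguously, since some letters are shared between categories in the index above.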

  • Availability:
    • Planning
    • B Building
    • A Available
    • A Length OK
    • U Unavailable
    • U Cancelled
    • U Deleted
  • Usefulness:
    • None
    • ? Issues
    • W Works
    • X Fails
  • Sequence features
    • type
    • start
    • end
    • label
    • part_id

Part example (* marks properties that belong to composite parts; possible values are in parentheses)

  • name
  • short_description (Promoter (lacI regulated, lambda pL hybrid))
  • description
  • type (Regulatory)
  • status/availability (Available)
  • results/usefulness (None|Fails|Works)
  • component_list (NULL | BBa_B0032 BBa_C0051 BBa_B0010 BBa_B0012 BBa_R0063 BBa_B0030)*
  • base_components (0 | 9)*
  • deep_components (NULL | 149 156 603 145 193 147 161 603 145)*
  • deep_components_2 (own part_id | _149_156_603_145_193_147_161_603_145_)* ?
  • deep_component_count (1 | 9)*
  • device_name (NULL | inverter)*
  • sequence (why is a sequence available for composite parts?)
  • feature(s)
    • type
    • start
    • stop
    • label
  • usage
    • lastmod_user
    • lastmod_date
  • biology (Very weak promoter)
  • functional parameters
    • efficiency 0.6
  • design
    • author (name(s) or id)
    • owner (number: owner_id)
    • creation_date
    • container_id
    • version
    • source (Bacteriophage 434 right operator)
    • notes
    • reference?
    • owning groups
  • physical DNA (instances?)
    • plasmid
    • plasmid_length
    • part_and_plasmid_length
    • VF2-VR
  • location(s) - This part may be found in these wells/tubes
    • library
    • well
    • plate
    • plasmid - is this the same plasmid as in the physical DNA section above?
    • cell
  • files
  • references
  • licenses

Miscellaneous

  • Semantics - the meaning that is implied by words and sentences.
  • A software agent could search distributed registries using an ontology. This is impossible right now because the storage schema is unknown.
  • Data is represented by a graph of triples (statements about resources)
  • Syntax doesn't matter: there are many ways to serialize the data (XML, N3, etc).
  • Ontology vs taxonomy vs thesaurus vs list
  • Ontology vs Taxonomy vs Folksonomy vs Collabulary
    • Taxonomy - concepts and relationships but no attributes.
    • Controlled vocabulary - only concepts.
  • Microformats
    • "lowercase semantic web"
    • humans first, machines second
  • HCLS task forces:
    • BIORDF (Structured data to RDF) - Susie Stephen, Joanne Luciano co-leads
    • T2S (Text to Structured RDF) - Robert Futrelle, Matthew Cockerill
  • Architecture of the World Wide Web @ W3C
  • Reification @ Wikipedia
  • Metadata
    • A semantic mapper is a tool or service that aids in the transformation of data elements from one namespace into another namespace.
    • A metadata registry is a central location in an organization where metadata definitions are stored and maintained in a controlled manner.
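The notes above that data is a graph of triples and that syntax doesn't matter can be illustrated without any RDF library. In this sketch the triples are plain Python tuples, the `ex:` names are made up for illustration, and the two serializers are simple stand-ins (a line-based format and JSON), not real N3 or RDF/XML writers.

```python
import json
from typing import Set, Tuple

Triple = Tuple[str, str, str]

# Illustrative statements about resources (subject, predicate, object).
triples: Set[Triple] = {
    ("ex:BBa_B0032", "rdf:type", "ex:RBS"),
    ("ex:RBS", "rdfs:subClassOf", "ex:BasicPart"),
}


def to_lines(graph: Set[Triple]) -> str:
    """Serialize one triple per line, terminated by ' .'."""
    return "\n".join(" ".join(t) + " ." for t in sorted(graph))


def from_lines(text: str) -> Set[Triple]:
    """Parse the line format back; assumes terms contain no spaces."""
    return {tuple(line[:-2].split(" ")) for line in text.splitlines()}


def to_json(graph: Set[Triple]) -> str:
    return json.dumps(sorted(graph))


def from_json(text: str) -> Set[Triple]:
    return {tuple(t) for t in json.loads(text)}


# Both serializations carry the same graph.
print(from_lines(to_lines(triples)) == from_json(to_json(triples)) == triples)
```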

To Do

  • Map parts database schema to RDF/OWL (D2R, other?)
  • Use LSID for parts identification
    • set up an LSID resolution service
  • How to represent sequence features (do they belong to sequence or part)?
    • Part has features and has a sequence (piece of DNA with molecular function combined by BB assembly)
    • Sequence has features but a part already has sequence
  • Tools to create and edit ontology and RDF instances?
    • Protege from Stanford?
    • IsaViz from W3C?
  • existing RDBMS <-> RDF <-> objects (e.g., Javascript)
  • Do we need "Device"?
  • I want to build a NOR gate vs. I have a NOR gate
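For the LSID item in the list above, an identifier has the general form `urn:lsid:authority:namespace:object`, with an optional revision as a final field. The sketch below builds and parses such identifiers for parts; the authority `example.org` and the namespace `parts` are placeholders, not an existing resolution service.

```python
from typing import Dict, Optional


def make_lsid(authority: str, namespace: str, object_id: str,
              revision: Optional[str] = None) -> str:
    """Build an LSID string; the revision field is optional."""
    fields = ["urn", "lsid", authority, namespace, object_id]
    if revision is not None:
        fields.append(revision)
    return ":".join(fields)


def parse_lsid(lsid: str) -> Dict[str, Optional[str]]:
    """Split an LSID into its fields, raising ValueError if malformed."""
    fields = lsid.split(":")
    if len(fields) not in (5, 6) or fields[0].lower() != "urn" or fields[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {lsid!r}")
    return {
        "authority": fields[2],
        "namespace": fields[3],
        "object": fields[4],
        "revision": fields[5] if len(fields) == 6 else None,
    }


# Placeholder authority and namespace, for illustration only.
lsid = make_lsid("example.org", "parts", "BBa_B0032")
print(lsid)
print(parse_lsid(lsid))
```

Using the existing part name as the object field keeps the LSID readable, but that is a design choice rather than a requirement of the scheme.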

From XML to RDF

(from [1])

  • ?

Knowledge Management

Links
