User:Vincent Rouilly/Distributed Annotation System (DAS) for DNA Part Registries
From OpenWetWare
Distributed Annotation System for DNA Part Registries
Vincent 05:55, 28 August 2009 (EDT): This is a work in progress. If you are interested in contributing, or would like more information, please feel free to contact me.
Overview
- ...
- ...
Objectives
- ...
- ...
DNA Part DAS Server
- Server address:
- Typical queries:
- retrieve all parts
- retrieve all supported annotation types
- retrieve DNA from a given part
- retrieve all annotations from a given part
- retrieve subparts from a given part
- retrieve superparts from a given part
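As a rough illustration of how the queries above could map onto DAS 1.53 requests, here is a minimal sketch that only builds request URLs of the form PREFIX/das/DSN/COMMAND. The server address and data source name are not given on this page, so `BASE_URL` and `DSN` are hypothetical placeholders, as is the part identifier. The DAS specification has no dedicated command for assembly relationships, so the `type="subpart"` filter is purely an assumption about how a server might expose subparts; superparts would need a similar server-specific convention.

```python
from urllib.parse import urlencode

# Hypothetical values: this page does not give the server address or source name.
BASE_URL = "http://example.org/das"
DSN = "parts"


def das_url(command, base_url=BASE_URL, dsn=DSN, **params):
    """Build a DAS 1.53 request URL of the form PREFIX/das/DSN/COMMAND?args."""
    url = f"{base_url}/{dsn}/{command}"
    if params:
        # Keep ':' and ',' readable, since DAS segments look like ID:start,stop.
        url += "?" + urlencode(params, safe=":,")
    return url


PART = "BBa_X0001"  # placeholder part identifier, for illustration only

queries = {
    "all parts": das_url("entry_points"),
    "annotation types": das_url("types"),
    "DNA of a part": das_url("sequence", segment=PART),
    "annotations of a part": das_url("features", segment=PART),
    # Assumption: subparts are exposed as features of a hypothetical type "subpart".
    "subparts of a part": das_url("features", segment=PART, type="subpart"),
}

for label, url in queries.items():
    print(f"{label}: {url}")
```

The `entry_points`, `types`, `sequence` and `features` commands and the `segment` and `type` arguments come from the DAS 1.53 specification; everything else in the sketch is illustrative.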
Software architecture
Implementation Steps
This section summarises the steps undertaken during this project.
Run Dazzle on the Google App Engine (GAE)
- Dazzle is a Java application that usually runs on a Tomcat server. However, GAE supports Java applications, and no tweaking is necessary to run Dazzle on GAE.
- Instructions@BioJava
Implement a BioSQL subset on top of the Google datastore
- BioSQL is a popular relational database model for storing DNA sequences and annotations.
- The Biopython, BioJava, and BioPerl projects provide easy connectivity to the schema.
- The Google datastore is not a relational database, so the BioSQL schema has to be reformatted into a more object-oriented data model.
- Only a subset of BioSQL was considered for this project. The implemented BioSQL tables are listed below:
- Ontology and Term
- Biodatabase, Bioentry, Biosequence, Bioentry_Qualifier_Value, Seqfeature, Location
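To make the idea of reformatting relational tables into an object-oriented model more concrete, here is a minimal sketch using plain Python dataclasses instead of the datastore API. It is not the data model used in this project: the class layout, the choice of which tables to fold into their parents, and the 1-based inclusive coordinates are all assumptions made for illustration. Field names such as `accession`, `start_pos` and `end_pos` follow BioSQL column names.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Term:
    """BioSQL `term`, with its parent `ontology` folded in by name."""
    name: str
    ontology: str


@dataclass
class Location:
    """BioSQL `location`; coordinates assumed 1-based and inclusive."""
    start_pos: int
    end_pos: int
    strand: int = 1


@dataclass
class Seqfeature:
    """BioSQL `seqfeature`, holding its locations directly instead of via a join."""
    type_term: Term
    display_name: str
    locations: List[Location] = field(default_factory=list)


@dataclass
class QualifierValue:
    """BioSQL `bioentry_qualifier_value`: a (term, value) pair attached to an entry."""
    term: Term
    value: str


@dataclass
class Bioentry:
    """BioSQL `bioentry` with `biodatabase` and `biosequence` denormalised into it."""
    biodatabase: str
    accession: str
    description: str
    seq: str
    alphabet: str = "dna"
    qualifiers: List[QualifierValue] = field(default_factory=list)
    features: List[Seqfeature] = field(default_factory=list)

    def feature_sequence(self, feature):
        """Return the sequence covered by the first location of `feature`."""
        loc = feature.locations[0]
        return self.seq[loc.start_pos - 1:loc.end_pos]


# Hypothetical example entry; identifiers and values are made up.
entry = Bioentry(
    biodatabase="registry",
    accession="BBa_X0001",
    description="example part",
    seq="AAAGAGGAGAAA",
)
entry.qualifiers.append(QualifierValue(Term("author", "annotation tags"), "Jane Doe"))
entry.features.append(Seqfeature(Term("rbs", "part types"), "core", [Location(4, 9)]))
print(entry.feature_sequence(entry.features[0]))
```

Nesting features and locations inside the entry replaces the foreign-key joins of the relational schema with a single object fetch, at the cost of duplicating ontology terms across entries.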
Implement a Dazzle plugin to support BioSQL/datastore queries
- Instructions on how to write a new Dazzle plugin can be found here.
- The new plugin implements the following methods:
- ...
- ...
Process and Upload data from MIT Part Registry to Google App Engine (GAE)
- The MIT Part Registry implements a limited API to access its data:
- limited FASTA description of parts (part dump in FASTA)
- limited DAS description of parts (no assembly information for example)
- A Biopython script was used to process the FASTA dump file and generate GAE upload files. The following BioBrick information was processed:
- BioBrick Sequence
- BioBrick Author
- BioBrick Category
- BioBrick DNA Status
- BioBrick Short Description
- BioBrick Assembly information (subparts and superparts, derived from BLAST queries within the Biopython script)
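The processing step above can be sketched roughly as follows. This is not the project's script: it uses only the Python standard library, whereas a Biopython script would typically read the dump with `Bio.SeqIO.parse(handle, "fasta")`. The header layout (an identifier followed by free text) and the part identifiers are assumptions, and exact substring matching is used here as a simplified stand-in for the BLAST queries mentioned above, so it would miss parts that match only approximately.

```python
def parse_fasta(lines):
    """Yield (part_id, description, sequence) tuples from FASTA-formatted lines."""
    part_id, description, chunks = None, "", []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if part_id is not None:
                yield part_id, description, "".join(chunks).lower()
            # Assumed header layout: ">ID free-text description".
            header = line[1:].split(None, 1) or [""]
            part_id = header[0]
            description = header[1] if len(header) > 1 else ""
            chunks = []
        else:
            chunks.append(line)
    if part_id is not None:
        yield part_id, description, "".join(chunks).lower()


def assembly_links(parts, min_length=10):
    """Map each part to its subparts and superparts by exact containment.

    Simplified stand-in for BLAST: a part is a subpart of another part
    if its whole sequence occurs verbatim inside the longer sequence.
    """
    subparts = {pid: [] for pid in parts}
    superparts = {pid: [] for pid in parts}
    for outer_id, outer_seq in parts.items():
        for inner_id, inner_seq in parts.items():
            if inner_id == outer_id or len(inner_seq) < min_length:
                continue
            if len(inner_seq) < len(outer_seq) and inner_seq in outer_seq:
                subparts[outer_id].append(inner_id)
                superparts[inner_id].append(outer_id)
    return subparts, superparts


# Made-up dump with hypothetical part identifiers.
dump = """\
>BBa_X0001 example promoter
ttgacagctagctcagtcctaggtataatgctagc
>BBa_X0002 example RBS
aaagaggagaaa
>BBa_X0003 example composite
ttgacagctagctcagtcctaggtataatgctagctactagagaaagaggagaaa
"""

parts = {pid: seq for pid, _desc, seq in parse_fasta(dump.splitlines())}
subparts, superparts = assembly_links(parts)
print(subparts["BBa_X0003"])
print(superparts["BBa_X0001"])
```

The `min_length` threshold is a hypothetical guard against very short parts matching almost everywhere; a BLAST-based approach would instead rely on alignment scores.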
Project resources
- DAS standard and its current specifications (v.1.53)
- Dazzle DAS server
- BioSQL schema
- Biopython and BioJava
- Google App Engine documentation
- BioSQL on GAE by Brad Chapman; see his blog post.