This document is intended to provide an outline of the technical and development operations for the Open Source Malaria (OSM) project. It also includes some related information about social media accounts.
The main landing page for the project can be found here. The project activity is pulled directly from other sites.
A guide to getting started as a contributor can be found here. The various platforms used are also summarised below.
Molecules already entered into ChEMBL may be browsed on ChEMBL's page for the project.
The source for the landing page is also available if needed; the pulling activity uses Ruby/Sinatra.
Contributors use the open source lab notebook Labtrove (previously Lablog), a PHP web application developed by the University of Southampton. Currently the primary malaria blogs run on malaria.ourexperiment.org, on a Debian server at the University of Southampton.
- Write a General Entry
- Write the Perfect Chemistry Entry
- Write the Perfect Biology Entry
- Use the ELN to the Maximum Effect
- The OSM Compound Registration System - Related: How to Give Compounds MMV Numbers
How best to visualise the molecules in OSM has been discussed several times (Github issues 128, 112 and 99). Current protocol is to develop a means to upload to ChEMBL automatically so the molecules appear on the open drug discovery page. The central repository of the molecules is the SD file, but the repository of details for each OSM compound may be found in the Experimental Procedures page. A possible solution is DataWarrior.
More than one cheminformatics string should be included for each molecule in OSM to ensure some redundancy in searches, e.g. to get around any issues arising from implicit vs explicit H and tautomers (GHI 230 and this post).
The larger question of how to manage the data (e.g. to construct the SDF) is discussed below.
What is the best way to collect OSM's data together into a single place where we can browse all the molecules and their properties?
The best way to do this is probably to construct an SD file (SDF). This would allow other software to interrogate/display the data (we've been talking with ChEMBL about automatic imports to their database, but have no solution as yet). Earlier discussion: GHIs 128, 127 and 99.
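For reference, an SD file is a run of records, each made of a molfile block, then named data items, then a "$$$$" terminator line. A minimal Python sketch of writing one record might look like the following. The function name, the field names and the single-carbon placeholder molblock are assumptions made for illustration, not OSM's actual file layout; a real molblock would be exported from a drawing package or cheminformatics toolkit.

```python
# Sketch only: PLACEHOLDER_MOLBLOCK is a single carbon atom standing in for a
# real structure, and the data field names below are hypothetical.
PLACEHOLDER_MOLBLOCK = "\n".join([
    "placeholder",                              # header line 1: molecule name
    "  sketch",                                 # header line 2: program line
    "",                                         # header line 3: comment
    "  1  0  0  0  0  0  0  0  0  0999 V2000",  # counts line: 1 atom, 0 bonds
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
])


def sdf_record(molblock, properties):
    """Return one SD file record: molblock, data items, '$$$$' terminator."""
    lines = [molblock.rstrip("\n")]
    for name, value in properties.items():
        lines.append(f">  <{name}>")  # data header line
        lines.append(str(value))      # data value (must not contain blank lines)
        lines.append("")              # a blank line closes each data item
    lines.append("$$$$")              # end of record
    return "\n".join(lines) + "\n"


example_record = sdf_record(
    PLACEHOLDER_MOLBLOCK,
    {"OSM_ID": "OSM-S-5", "SOURCE": "ELN"},  # hypothetical field names
)
print(example_record)
```

A full SDF is just such records written one after another, so keeping the file up to date amounts to regenerating it from wherever the per-molecule data actually live.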
So: How do we make the SDF? How can we ensure the SDF remains up to date? Here are the three approaches.
1) Manually Write the SDF
This is what we've been doing. It's not working.
At the moment people enter information into the ELN in human-readable form. That's nice for the experimentalists. But the data are not particularly machine-readable, and people do not reliably enter metadata. So a question put to OSM such as "Has anyone ever made molecule X, and if so how many attempts were made?" is impossible to answer well. Links between synthetic entries and biological entries are poor or non-existent.
To remedy this we started making an Improvised Compound Registration System - a collection of pages, each of which collates the information about each molecule made in the consortium. This involves manually linking every experiment with the relevant molecule page. The result is a fantastic resource (example for OSM-S-5). The problem is that it takes a huge amount of time to assemble such pages, to the extent that this system is probably unsustainable.
2) Automatically Write the SDF
We could write results/ELN/wiki more carefully, and then automatically scrape together the SDF. Probably won't work.
OSM contributors are pretty good about including cheminformatic strings (SMILES, InChI, InChIKey) in ELN entries to allow machines to understand which molecules are being discussed on a given page. But many people do not do this, partly because it's labour-intensive. We often forget. The strings themselves contain no other relational information (e.g. "this molecule has a potency of X"), meaning even with a way to build the SDF from the strings we'd still not necessarily build a good SDF, without some pretty serious addition of metadata. The SDF, when made, can contain all the data for a given molecule. We could manually collect the data on a wiki page (e.g. here) and then write a script that creates the SDF. But the effort involved, and the risk of people doing this wrongly or badly, is significant.
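To make the scraping idea concrete, here is a minimal Python sketch that pulls InChIKeys out of the text of an ELN entry. It relies only on the standard InChIKey layout (14 letters, 10 letters, 1 letter, separated by hyphens); the function name and the sample text are assumptions, and the two keys used are those of ethanol and water, standing in for real OSM compounds.

```python
import re

# Standard InChIKey layout: 14 + 10 + 1 uppercase letters, hyphen-separated.
INCHIKEY_RE = re.compile(r"\b[A-Z]{14}-[A-Z]{10}-[A-Z]\b")


def find_inchikeys(entry_text):
    """Return the distinct InChIKeys in an entry, in order of first appearance."""
    seen = []
    for key in INCHIKEY_RE.findall(entry_text):
        if key not in seen:
            seen.append(key)
    return seen


# Hypothetical entry text; ethanol and water are stand-ins, not OSM molecules.
sample_entry = (
    "Recrystallised from ethanol (InChIKey: LFQSCWFLJHTTHZ-UHFFFAOYSA-N), "
    "washed with water (XLYOFNOQVPJJNP-UHFFFAOYSA-N)."
)
print(find_inchikeys(sample_entry))
```

This shows the limitation described above: the script can tell which molecules a page mentions, but nothing in the string says which one was the product, or what was measured for it.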
3) Construct Something Else that Writes the SDF
We are not yet doing this, but perhaps we should.
Bill Mills from Mozilla Science Lab suggested an interesting solution. How about we (OSM experimental contributors) input data using a system like an online form? We input data on the molecules we're using, what we've done to make those molecules and any and all properties that pertain to those molecules. This creates a relational database that is perfectly machine-readable.
The system then both i) writes ELN entries on the fly, and ii) writes the SDF.
This is neat on multiple levels. We should build a prototype "form" and test it for the creation of a typical ELN entry, i.e. whether we can auto-generate a good ELN entry without writing it ourselves. This could be a very powerful approach.
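As a starting point for such a prototype, here is a rough Python sketch using SQLite: each form submission becomes rows in a small relational database, and an ELN-style entry is generated from those rows. The table layout, function names and the "EXAMPLE-1" compound are all assumptions for illustration; nothing like this exists in OSM yet.

```python
import sqlite3

# Hypothetical schema: one row per compound, per synthesis attempt, per property.
SCHEMA = """
CREATE TABLE compounds (
    id INTEGER PRIMARY KEY, osm_id TEXT UNIQUE NOT NULL, smiles TEXT NOT NULL);
CREATE TABLE experiments (
    id INTEGER PRIMARY KEY,
    compound_id INTEGER NOT NULL REFERENCES compounds(id),
    author TEXT NOT NULL, notes TEXT NOT NULL);
CREATE TABLE properties (
    id INTEGER PRIMARY KEY,
    compound_id INTEGER NOT NULL REFERENCES compounds(id),
    prop_name TEXT NOT NULL, prop_value TEXT NOT NULL);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db


def submit_form(db, osm_id, smiles, author, notes, properties=None):
    """Store one form submission: compound, one attempt, optional properties."""
    db.execute("INSERT OR IGNORE INTO compounds (osm_id, smiles) VALUES (?, ?)",
               (osm_id, smiles))
    (compound_id,) = db.execute(
        "SELECT id FROM compounds WHERE osm_id = ?", (osm_id,)).fetchone()
    db.execute(
        "INSERT INTO experiments (compound_id, author, notes) VALUES (?, ?, ?)",
        (compound_id, author, notes))
    for name, value in (properties or {}).items():
        db.execute(
            "INSERT INTO properties (compound_id, prop_name, prop_value) "
            "VALUES (?, ?, ?)", (compound_id, name, str(value)))
    db.commit()


def count_attempts(db, osm_id):
    """Answer 'has anyone made molecule X, and how many attempts were made?'"""
    (n,) = db.execute(
        "SELECT COUNT(*) FROM experiments e "
        "JOIN compounds c ON c.id = e.compound_id WHERE c.osm_id = ?",
        (osm_id,)).fetchone()
    return n


def render_eln_entry(db, osm_id):
    """Generate a plain-text ELN-style entry, or None if the compound is unknown."""
    row = db.execute(
        "SELECT id, smiles FROM compounds WHERE osm_id = ?", (osm_id,)).fetchone()
    if row is None:
        return None
    compound_id, smiles = row
    lines = [f"Compound: {osm_id}", f"SMILES: {smiles}",
             f"Attempts recorded: {count_attempts(db, osm_id)}"]
    for author, notes in db.execute(
            "SELECT author, notes FROM experiments "
            "WHERE compound_id = ? ORDER BY id", (compound_id,)):
        lines.append(f"- {author}: {notes}")
    for name, value in db.execute(
            "SELECT prop_name, prop_value FROM properties "
            "WHERE compound_id = ? ORDER BY id", (compound_id,)):
        lines.append(f"{name}: {value}")
    return "\n".join(lines)


# Placeholder submissions; "CCO" (ethanol) stands in for a real structure.
db = open_db()
submit_form(db, "EXAMPLE-1", "CCO", "Contributor A", "First attempt (placeholder).")
submit_form(db, "EXAMPLE-1", "CCO", "Contributor B", "Repeat (placeholder).",
            {"appearance": "placeholder value"})
print(count_attempts(db, "EXAMPLE-1"))  # prints 2
print(render_eln_entry(db, "EXAMPLE-1"))
```

Because compounds, attempts and properties sit in separate linked tables, the same rows that generate the entry could also supply the data fields of the SDF, so neither would need to be written by hand.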
For occasional jobs that need hosting or a little compute, we tend to use Nectar, a cloud-based provider for academic and research institutions in Australia. It provides two free instances to researchers, with specs reasonable enough for most jobs. Debian or Ubuntu is typically the flavour of choice, but Nectar provides a wide range of images and snapshots, including versions of Scientific Linux. For jobs that require significantly more processing we may rely instead upon EC2 instances.
There are several different means used for communication, with email being the least favoured (due to a lack of openness).
The primary means of communicating issues requiring action/input (administration, science or technical) is on Github.
Publicity is important for the project to attract new inputs. Google ranking was assessed and could be improved (GHI 231) and a meeting was held to address a number of website-related issues. Related: GHI 64.
Github is used for project management - a place to keep the To Do list. Tasks are called "Issues" and may be assigned a person responsible, a deadline and some tags to allow active items to be grouped. When a task is complete, it can be closed.
Almost all code and data for the OpenSourceMalaria organisation account (and landing page website) is resident on one of the Github repositories. The main .sd file of all compounds, for example, is kept there. All other experimental data will be on the electronic lab notebook, or summarised on the wiki.
If you are still unable to find something, post an issue on the Github Issues (to do) list and tag it with "Administration" and "question".
Online meetings use Adobe Connect, provided and hosted by the University of Sydney. As with everything else, these meetings are open to everyone; each meeting is recorded and subsequently uploaded to the OSM YouTube account.
- Mike Robins 20:00, 26 October 2013 (EDT):