Open Source Research
Open Source Research (OSR) adopts the following basic rules (first written down here):
- First Law: All data are open and all ideas are shared
- Second Law: Anyone can take part at any level
- Third Law: There will be no patents
- Fourth Law: Suggestions are the best form of criticism
- Fifth Law: Public discussion is much more valuable than private email
- Sixth Law: An open project is bigger than, and is not owned by, any given lab.
This wiki gathers resources for open research aimed at finding new medicines for diseases. The wiki is intended for project status and notes; the actual collaboration occurs on other pages. In the case of the malaria project, for example, these may be found here.
History and Context
The Start: Open Source Software Development
Open source as a term in software development implies that a project is open to anyone, and that the final product emerges from a distributed team of participants. There may be a funded kernel of work, but the subsequent development by the community is not explicitly funded. Many high-quality, robust and widely used applications have been developed using the open source model, such as the Firefox and Chromium (the basis of Chrome) web browsers, the Linux operating system and the Apache web server. It is important to appreciate the commercial significance of such products. There are thriving open source software development communities on the web at, for example, SourceForge and GitHub. Central to the operation of these sites and projects is the sharing of data and ideas in near-real time.
A Note on Open Access
"Open Access" refers to a scientific paper that is free to read, rather than behind a paywall. While this is an important issue, and is absolutely required of any publications arising from open source projects, open access needs to be distinguished from open research. The former describes a mechanism of publishing work that is complete. The latter describes a way for humans to work together.
Stage 2: Open Data
Many valuable initiatives advocating open data have emerged in which large datasets are deposited to assist groups of researchers (e.g., PubChem, ChEMBL and Sage Bionetworks); the release of malaria data in 2010 falls into this class. These very important ventures employ the internet as an information resource, rather than as a means for active collaboration. For people to work together on the web, data must be freely available. Yet the posting of open data is a necessary but not sufficient condition for open research. Open data may be used without any requirement to work with anyone. The GSK malaria data, for example, may be browsed and used by people engaged in closed, proprietary research projects; there is no obligation to engage in an open research project.
An important feature of open data is that it should be released in a way that maximises re-use. Essentially, the generator of the data should avoid making assumptions about what the data are good for. The data acquired by the Hubble Space Telescope have led to more publications by teams analysing the data than by the original teams that acquired them.
The Panton Principles describe important recommendations for releasing data into the open.
A Note on Open Innovation
As an effort to stimulate innovation, several companies have adopted an "open innovation" model. This is a somewhat nebulous term that generally means companies try to bring in the best external ideas to complement in-house research (NRDD article). The mechanisms for bringing in new ideas include:
- Prizes for solutions to problems (e.g., Innocentive). A competition means that teams work in isolation and do not pool ideas. Such a mechanism does not change the nature of the research, rather the motivation to participate. The pharmaceutical industry itself essentially already operates on this model.
- Licensing agreements with academic groups/start-ups (e.g., Eli Lilly’s PD2 program). In such arrangements, companies may purchase the rights to promising ideas. Vigilant protection of intellectual property may of course shut down any open collaboration at a promising stage. It has therefore been proposed to limit open innovation science to “pre-competitive areas” (e.g., toxicology), but to date the industry has been unable to define what the term “pre-competitive” means beyond the avoidance of duplication of effort and the requirement for public-domain information resources (NRDD article).
For more on this distinction see Will Spooner's article.
Open Source Research describes a way of working that is fundamentally different from open innovation, since when something is open source everything is shared. This is not the case in open innovation, where teams are free to operate in secret.
Using a widely distributed set of participants to accelerate a project is a strategy that has been employed in many areas. The writing of the Oxford English Dictionary made use of volunteers to identify the first uses, or best examples of the use, of words. Pioneering work on distributing the computing power required by science projects (where the science itself was not necessarily an open activity) was carried out by the SETI@home and Folding@home projects.
With the rise of the web, several highly successful crowdsourcing experiments have emerged in which tasks are distributed to thousands of human participants, such as the Foldit and Galaxy Zoo projects. What is notable about such cases is the speed with which the science progresses through the harnessing of what has been termed the “cognitive surplus”.
Open science is the application of open source methods to science. Thus data must be released as they are acquired, and it must be possible for any reader of the data to have an impact on the project. Groups working on parts of the project in isolation and only periodically releasing data should be kept to a minimum. Ideally, complete data release and collaboration happen in real time, to prevent duplication of effort and to maximise useful interaction between participants.
Though there is no formal line to distinguish crowdsourced projects from open science projects, it could be argued that open science projects are mutable at every level. For example, while anyone could participate in the original Galaxy Zoo project, the software and the basic project methodology were not open to change by those who participated. In the Polymath project, on the other hand, while there was a question to answer at the outset, the direction the project took could be influenced by anyone as it progressed. Similarly, in the Synaptic Leap project to find a chemical synthesis of a drug, the eventual solution was influenced by project participants as the work proceeded.
Open Source Drug Discovery
Drug discovery is a complex process involving many different stages. Compounds are discovered as having some biological activity, and these are then improved through iterative chemical synthesis and biological evaluation. Compounds that appear to be promising are assessed for their behaviour and toxicity in biological systems. The move to evaluation in humans is the clinical trial phase, and there are regulatory phases after that, as well as the need to create the relevant molecule on a large scale.
Since no drug has ever been discovered using an open source approach, it is difficult to be certain about how OSDD would work. However, it seems likely that the biggest impact of the open approach would be in the early phases, before clinical trials have commenced. Open methods could also have an impact on the process chemistry phase, in creating an efficient chemical synthesis on a large scale.
Open work cannot be patented, since there can be no delays to the release of data, and no partial buy-ins. If a group opts out of the project to pursue a "fork", they leave the project. Open source drug discovery must therefore operate without patents. The hypothesis is that working in an open mode reduces research and development costs and accelerates research, offsetting the lack of capital support for the project. Costs of clinical trials and product registration would have to be sourced from governments and NGOs. Whether this is possible is one of the central questions of OSDD.
Philosophy of Open Research
Why Take Part?
What of motivations? Why would people want to contribute to this project? Partly to solve a problem. Partly to be involved with quality science that is open, and hence subject to the most brutal form of ongoing peer-review. Partly for academic credentials since regular peer-reviewed papers will come from the project. Partly to demonstrate competence publicly - open science is meritocratic and status-blind. Perhaps a mixture of all these things.
A competition with a cash prize is possible in the future. Progress towards a very promising lead compound series has been rapid, but there is a long road to a compound that looks promising enough to move towards clinical trials. A lot of tweaking is needed, and perhaps even a move to another series. It is not obvious what will happen. It is certain that the project will need a lot more input than it has received to date. A prize may increase traffic and input. The competition would be teamless, however, awarded based on the performance of individuals within a group where everything is shared. Not sharing data or ideas would lead to disqualification. Such a competition is difficult to judge, difficult to award, and hence almost certainly worth doing. More about this is here.
A final point: the project is open. Nobody owns it. Those people most active in the project lead it while they are active. If you wish to contribute, in any capacity, please do so. There is no need to "clear" anything with existing project members by email first. To date it has been very common for current participants to receive questions and suggestions by email, a practice that is discouraged. In the development of Linux, the need for Linus Torvalds to approve everything caused a serious bottleneck, and led to the observation that "Linus doesn't scale". Nobody scales, but the team does. So it is more efficient if all project discussions are held publicly. Many people do not like this idea. In science the idea of "beta testing" something is alien. When data are released in science there is an expectation that the data are correct, and essentially finished. This project eschews this view. All data are released immediately, all discussions are public, and anyone can participate.
Logistics of Open Research
The way the project is run is one of its novelties, though as with everything in this project nothing is static and advice on improvements is always welcome. Raw experimental data are recorded in an online, openly readable electronic lab notebook. The Synaptic Leap is being used to discuss ideas and results, as well as to plan future work. The project's Google+ page is a lightweight way to keep up with developments and discuss them. The project's Twitter feed is a broadcast mechanism for updates. LinkedIn was used in the past on another project as a way of connecting with relevant experts, but has not been used much so far in this project. A wiki (which includes this page) is used to host the current overall project status. Updates on the project's progress can also be found on our Facebook page, which is also a place for interaction. If you wish to participate in this project, you can sign up to all these sites, and you will then be sent the Twitter/G+ passwords so that you can use the same accounts.
Licences are essential in open source projects, to avoid any misunderstanding. An appropriate default licence for open research is CC-BY-3.0: any results are both academically and commercially exploitable by whoever wishes to do so, provided the project is cited. This allows for full commercial benefit from open research while maintaining well-worn standards of giving credit where credit is due.