OpenWetWare:Steering committee/NSF BDI Grant/Draft: Difference between revisions

From OpenWetWare
'''The working version is now stored on writely. Ask [[Sri Kosuri]] for access.''' A copy as of 05:55, 4 July 2006 (EDT) is [[Media:BDI v4.doc|here]]:
'''The grant has been submitted. Thanks to all of you that helped with grant and letter writing.  [[Media:OWWv17.pdf|Here]] is the submitted version of the Project Description.  More to come including references and front matter.'''


Please help expand/rewrite highlighted sections first.  We also need references to be filled in.  Finally, any small edits would be helpful at this point.


[[/Old Version|Old Version of Grant]]
Below is an in-progress version of the grant from [[User:Rshetty|RS]].  Reshma has included it simply for the sake of adding references.  Please do not make edits to this version of the grant.  They will be ignored.
 
A. Overview and Structure of Proposal
B. Background and Significance
B.I Information Loss in Biology
The process of biological research generates and makes use of a wide variety of information, much of which is either inaccessible or unrecorded. For example, articles and conferences typically summarize completed projects and results, while biological databases store and share high quality experimental data. Much of the remaining information is either never recorded or never released from the laboratory. Furthermore, little to no information on unsuccessful research projects is disseminated (ref). Finally, existing mechanisms for storing and sharing information often do not share the information as rapidly as would be desired (ref.?).
Biological research could be advanced if all classes of biological information were stored and shared with the community in a timely manner. For example, detailed laboratory techniques, failed approaches, and reasons for experimental selection could all be shared during the course of a project and stored for later reuse.  In the absence of this information, the contexts, experiences, and methodologies of research are often lost.  We also lose the ability to learn from others' mistakes.  The lack of shared experiences with commonly used protocols hinders the avoidance of common pitfalls and the interpretation of failed experiments, and slows the development of new, more reliable methods. Finally, laboratories often struggle to repeat the work of others due to insufficient information (ref). The problem of incomplete information storage and sharing has been highlighted by a set of large-scale studies of sharing in the field of genetics and other biological disciplines [2, 3, 4, 5]. Many types of information are withheld, including sequence information (28% of researchers surveyed), pertinent findings (25%), phenotypic information (22%), and information regarding laboratory techniques not used in publication (16%). The studies also showed that most respondents thought such withholding of information and materials “slowed the rate of progress in their field of science” (73%) and had “adverse effects on their own research” (58%).
In the same studies, 80% of those researchers that withheld information responded that the effort to produce post-publication information or materials was too great.  First, much of the relevant information is not appropriate for the original publication because of space constraints, lack of direct importance to the results, and because too many details can detract from the publication's message.  The information subsequently becomes harder to share because there are neither strong incentives nor an easy way to digitize, organize, and improve such information.  The effort to retrieve and organize the requested information, sometimes long after the original researcher has left the lab, is often too large for the laboratory to comply.  Furthermore, researchers that are sharing information are doing so offline, or in an ad-hoc manner that neither provides an efficient means of reusing that effort nor makes the information easily available to a wider community.  However, if most laboratory information were kept in an organized, archived, and sharable digital form, then compliance with such information requests would become much easier.
B.II Current Information Infrastructures
There are many existing infrastructures to digitally store and organize biological and laboratory information; however, for a variety of reasons, none provides a means to access this lost biological information.  First, there are many large databases for high quality information such as genome sequences and annotations, protein structure, molecular network connectivities, and large-scale experimental data sets.  These resources do not allow much growth beyond the existing information resource because they are very fixed in structure.  For example, there is no current means to link primary data, such as a sequencing run, to the final processed data available in such databases, such as a GenBank sequence.  In addition, individual communities cannot tailor these resources to fit their needs; they are fixed resources for all users.
Second, communities of biological researchers have constructed more informative databases tailored to their area of research.  Perhaps the best example is WormBase, a repository for information related to the C. elegans community.  WormBase contains data, relevant publications, researchers involved, and other information from large scale data sets, genetic screens, developmental observations, phenotypes and genotypes of strains, genome sequences, molecular biology results, etc.  In addition, there are powerful search tools that allow one to find relationships between these informational resources.  However, such a resource requires a tight-knit community and a mature research field in order to allow the allocation of resources to centrally collect, curate, and expand the resource.  Such an approach would be difficult to extend to nascent communities and to new types of collaborations and data collection.
Third, consortia have recently developed standard formats for storing information.  Projects such as Gene Ontology and SBML provide standard ways to define relationships among biological information and systems level data sets, respectively.  These efforts allow diverse sets of groups to individually generate, share, and analyze data without the need for a central database to store and access such information.  However, these ontologies require large dedicated efforts to define and expand the standards.  This process, while important, does not allow quick and easy ways to define new relationships.  In addition, these standards require many dedicated tools in order to visualize, analyze, and share this information.  Thus, they share the same problem as the community-based information management schemes, in that they require significant resources for new types of information storage.
Finally, individual laboratories often form their own ad-hoc informatic infrastructures to manage their data.  These range from collections of word processing documents to describe chemicals, protocols, and biological information to custom-built databases or websites to store such information.  This allows individuals or laboratories to store a wide variety of information.  However, this type of information management has many drawbacks.  There are few tools to organize and search this information.  In addition, because every individual chooses their own formats, there is little versioning and authoring data, the information is easily lost or outdated, and finally the information is difficult to share and subsequently mine.  These infrastructures also lead to repeated efforts at building the same sets of tools for each individual group.
Recently, many new tools have been introduced in information technology for collaborative information generation and organization (ref Web2.0).  For example, tens of thousands of users have generated an encyclopedia, Wikipedia, that organizes and stores encyclopedic information.  They have made over one million articles, and have generally been shown to be a useful informational source (ref Nature article). Wikipedia runs on software called a wiki that allows many people to easily generate, edit, and link between content simultaneously.  The ability to construct a webpage using HTML has been available from the start of the web.  However, it is a wiki's simplified syntax for editing, linking and generating information that has been a major force for the widespread adoption of wikis.  In addition, the wiki software allows many people to collaborate on informational resources quickly and efficiently.  More recently, a set of standard languages promulgated by the World Wide Web Consortium and Tim Berners-Lee, inventor of the World Wide Web, collectively known as “semantic web,” transforms the simple links between web pages into a machine-comprehensible structure. Semantic Web (SW) is at root a set of common standards to describe and name the relationships we contemplate and describe in text - "this gene is active in this disease, is related to this protein".  Built on the success of the web’s hyperlinks, these technologies extend the capabilities of the existing infrastructure by giving individuals the capability to assign greater meaning to digital resources.  Using SW gives some of the power of structured databases to information resources while allowing limitless and decentralized extendability.
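As a rough illustration of the idea, the relationships quoted above can be written as (subject, predicate, object) statements, which is the basic shape of Semantic Web data. The sketch below uses plain Python tuples rather than any real RDF library, and the gene, disease, and protein names as well as the predicate labels are hypothetical placeholders, not identifiers from an actual ontology.

```python
from collections import namedtuple

# One statement: (subject, predicate, object). All names below are hypothetical.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

triples = [
    Triple("geneA", "is_active_in", "diseaseX"),
    Triple("geneA", "is_related_to", "proteinP"),
    Triple("geneB", "is_active_in", "diseaseX"),
]

def query(triples, subject=None, predicate=None, obj=None):
    """Return every triple matching the fields that are not None."""
    return [
        t for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.object == obj)
    ]

# Which genes are recorded as active in diseaseX?
genes_in_disease = sorted(
    t.subject for t in query(triples, predicate="is_active_in", obj="diseaseX")
)
```

Because each statement is self-describing, new kinds of relationships can be added by any contributor without changing a central schema, which is the decentralized extendability described above.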
 
B.III OpenWetWare
OpenWetWare is a wiki dedicated to capturing and curating the day-to-day knowledge of researchers at the bench that is otherwise lost in offline lab notebooks or shared only in small communities.  OpenWetWare started approximately one year ago in our lab as a tool to digitize, store, and share this lost information.  In order to evaluate the usefulness of this technology, we opened the site up to other labs at MIT, and then more broadly as an open invitation to the scientific community.  The rapid growth of the site (see section C) has clearly demonstrated the promise of utilizing wiki technology within scientific communities for organizing and sharing information.  We are now seeking funding to turn this successful experiment into a lasting resource for the scientific community.
We expect that the continued growth of OpenWetWare will significantly impact biological research.  First, as a result of easy and democratized contribution, the scope and pace of scientific communication will increase dramatically.  Already on OpenWetWare researchers have contributed many types of useful information that often cannot be found elsewhere, such as: up-to-date individual and laboratory research directions; protocols, notes and tricks, and expected results on hundreds of biological procedures; information on equipment operation, calibration, and control experiments; and community-generated information portals on particular fields.  Second, wikis allow detailed tracking of edits, providing new opportunities for evaluating scientific contribution, merit, and impact.  Third, OpenWetWare enables new opportunities for collaborations across institutional and geographical barriers.  The growth of several vibrant communities on OpenWetWare (see C.II.b) already suggests the potential in this area.  Fourth, the success of an MIT laboratory course on OpenWetWare demonstrates that biological course development, student involvement and feedback during courses, and reuse of educational materials will be greatly facilitated by wiki technology.  Finally, OpenWetWare will provide a new understanding for the process of scientific discovery and the current state of research for the funding agencies, ethics groups, and the general public.
C. Preliminary Results
C.I Overview
In a period of one year, OpenWetWare has grown from one laboratory at MIT to 60 and from a few users in our laboratory to over 1000.  This tremendous growth has been a function of OpenWetWare's usefulness to the biological community.  The ease with which users can generate, edit, and link between information, the accountability of all contributions made to the site, and the enhanced ability to communicate and collaborate between members have all made OpenWetWare a valuable tool for information storage and curation.  In addition, this sense of community has created a dedicated group of volunteers whose efforts to scale the site organization and technical infrastructure have allowed OpenWetWare to keep up with its growth.  In C.II, we will discuss how OpenWetWare has already been useful at generating and curating the types of information not commonly found elsewhere.  Section C.III will discuss the importance of communities on OpenWetWare and the technical infrastructure that has been built to support their growth.
C.II Information on OpenWetWare
As mentioned earlier, wikis do not provide any new technological infrastructure.  However, the wiki, and OpenWetWare, allow users to easily use these technologies for information management.  Thus far, users have generated many types of useful information that cannot be found elsewhere, such as: up-to-date individual and laboratory research directions; protocols, notes and tricks, and/or expected results on hundreds of biological procedures; laboratory notebooks detailing ongoing experiments; information on equipment operation, calibration, and control experiments; aggregated informational resources on strains and genotype information; collaborative project discussions and data; community generated information portals on particular fields; safety information and procedures; etc. In addition, professors at MIT and other institutions successfully developed and taught courses from sites developed on OpenWetWare, demonstrating the success of the wiki in an educational environment.  Here we will provide details on a few examples to illustrate the types of informational resources being developed.
C.II.a - Protocol, Equipment, and Biological Information Resources:
Users on the site have found it useful to post protocols and other information useful to research because it improves their ability to store, reuse, share, and improve this information. The protocol collection on OpenWetWare has hundreds of protocols in different areas of research.  Etiquette permits users to define protocols as editable by a restricted group.  In this case, these protocols provide a convenient storage and retrieval mechanism for the individual users, as well as providing the information to the general public (retrieved usually in the form of a web search).  These types of protocols can be quite detailed as there are no restrictions on space, and allow links to other informational resources.  For example, one popular protocol involves making better gels for electrophoretic protein separation.  Contributed by a user, the protocol contains information on background on how the protocol was developed (gleaning information from patent literature, which is linked), a summary of why these gels work better (better pH and buffering, better storage, ease of running), detailed protocols on how to run the gel, and pictures of gels run with the protocol for information on expected results.  Many other protocols exist, albeit at different levels of detail depending on the information contributed by the individual user.
However, information contribution is not limited to what are classically considered protocols.  Proper equipment usage, calibration and maintenance represent similar challenges to biological researchers, but information sources on these subjects are rare.  Some laboratories have begun posting such information on OpenWetWare, providing useful information that can be used by others, but otherwise would have been lost if not stored in an accessible format.  For example, we started a page to describe the location of our 96-well microplate reader and who was in charge of it.  This page became a convenient resource for posting information that otherwise would not have been digitized.  The page now contains: scheduled user times; information on creating basic protocols and programs to run experiments; data from control experiments on detection limits, linear range, lamp energy, plate to plate variation, etc.; tips such as countering evaporation and arrangement of samples; scripts in Matlab and Excel for data analysis; and a service history of the equipment along with major problems [8].  This has been repeated for equipment in other labs as well.  Such information is useful for the day to day work in modern biology.
As people continue to contribute, the combined information becomes a rich resource for information on method development, debugging, and collected knowledge.  A smaller group of users have begun to place a link or a copy of their protocols in the Shared Resources sections of OpenWetWare.  In these sections, different users will help contribute and collaborate on giving information on protocols and other information.  Three current trends of how users add such information are emerging.
First, users can aggregate data from the protocols of individual laboratories to provide a better general information resource for the procedure.  For example, several labs posted protocols for DNA ligation using different methods [6]. Some members of those labs began a more general page describing the general steps of DNA ligations, with links to individual laboratory protocols and a description of their differences. Finally, other individuals added tips, observations, and references to the general process of DNA ligation.
Second, users are beginning to give feedback on particular experiments in which they used an OpenWetWare protocol.  A researcher posted a particularly detailed protocol on a method for quantifying proteins using an improved β-galactosidase assay [7]. The protocol, when initially entered, contained much of the background information on how the protocol was developed, and additional tips and tricks to watch out for. Another researcher, who used the protocol from the wiki, then posted her general experiences with the protocol, along with sample data demonstrating the repeatability and the general levels of output to expect from a standard control experiment.
Third, users are starting to use OpenWetWare as an area to store information that they use on a day to day basis.  This provides them a convenient place to store that information, but also allows other users to use, and help contribute to the information resource dynamically.  For example, a researcher began populating a page with Escherichia coli genotypes [9]. Later another user contributed explanations of the cryptic phenotype nomenclature allowing those outside the field to more easily understand the information on the page.  The page now has over 60 explanations of the nomenclature, information on over 40 commonly used E. coli strains, other information dealing with methylation and other common issues, links to other information sources, and references to particular papers of interest.  Collaborative aggregation of information relevant to a particular topic (in this case E. coli genotypes) is a key feature that OpenWetWare offers in terms of biological databases and informatics. Oftentimes such information aggregates don't initially warrant a structured database but yet are useful. OpenWetWare provides a convenient forum for this data.
Currently, there are some big challenges in making OpenWetWare a better resource for information storage and curation.  First, current methods on aggregating the collected protocols and equipment from individuals and laboratories, and allowing areas where people can extend and organize this information into a more useful resource are cumbersome because they rely on individual users to collect that data from all pages on OpenWetWare.  Automated methods of categorization and display of information would help highlight the types of information available to encourage collaboration.  In addition, the ability for researchers to more easily add structure and meaning to the informational resources created will begin to provide much more power to the information being stored.  We will address these specific issues in the research plan below.
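One way to picture the automated categorization mentioned above is a small index that groups lab-specific pages under a shared topic. The sketch below is only an illustration: it assumes lab pages are titled "Lab:Topic" and community pages carry no prefix, the lab names and page titles are hypothetical, and a real wiki would also need to skip built-in prefixes such as "User:" or "Media:".

```python
from collections import defaultdict

def build_topic_index(page_titles):
    """Group lab-specific pages under their shared topic name."""
    index = defaultdict(list)
    for title in page_titles:
        lab, sep, topic = title.partition(":")
        if sep:
            # Lab-prefixed page such as "LabA:DNA ligation" (hypothetical title).
            index[topic.strip()].append(lab.strip())
        else:
            # Community page with no lab prefix; list the topic even if no lab has a version.
            index.setdefault(title.strip(), [])
    return {topic: sorted(labs) for topic, labs in index.items()}

# Hypothetical page titles used only for illustration.
pages = ["DNA ligation", "LabA:DNA ligation", "LabB:DNA ligation", "LabA:Miniprep"]
topic_index = build_topic_index(pages)
```

An index of this kind could be regenerated automatically whenever pages change, so that no individual user has to collect the links by hand.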
C.II.b - Community portals: Laboratories and Fields of study
Online collaborations overcome many institutional and geographic barriers, removing some of the barriers to collaboration found using traditional approaches. By providing an easy to use and flexible method for developing online communities, OpenWetWare has already seen the growth of a number of scientific communities, ranging from individual laboratories to multi-institutional groups working in the same field.  Individual laboratory communities tend to share practical information such as protocols, details about equipment operations, and results of control experiments. Larger groups share information about community standards, research goals, and shared materials. The examples that follow demonstrate the utility of OpenWetWare for the rapid development of online scientific communities. Furthermore, they help confirm our hypothesis that such communities will be integral in encouraging researchers to contribute novel content to the site.
The most common community on OpenWetWare is the individual research lab, as the usefulness of OpenWetWare as a laboratory information management system (LIMS) is often the initial driver for a laboratory to join. Pam Silver’s group at Harvard Medical School is an excellent example of a lab that has integrated OpenWetWare into their day-to-day research. They use OpenWetWare to share information ranging from lab meeting schedules to details about reagents. As a result of this tight integration, users from the Silver lab alone have contributed more than 30 protocols to OpenWetWare. The tools made available by OpenWetWare enabled the Silver lab to rapidly create an online community site for their lab and to populate it with novel information that otherwise would not have been disseminated to the broader scientific community.
This scenario has been repeated with increasing frequency as more new labs join the site. In particular, work by community members to create a series of tutorial pages geared towards creating user and lab homepages led to a dramatic improvement in the comfort of new users in starting new communities on the wiki. The success in this area has inspired us to create similar tutorial pages in support of the other specific aims, as well as to expand the ease of use of the community development tutorials (see Section XX). Finally, these tutorial pages are also an example of why having a vibrant user community is so essential to the success of OpenWetWare or any project like it – the idea for the tutorial pages was a non-technical solution to the problem of becoming familiar with the wiki. The problem was best addressed by community members who had gone through the process themselves, rather than by a centralized authority, and we will elaborate on how we are more formally harnessing this community leadership in the project organization and logistics section.
Synthetic biology is a relatively new field generating much excitement within biology and engineering. The members of this nascent field have used OpenWetWare as a community-organized information portal. The site, http://syntheticbiology.org, is dynamically generated from the community-edited wiki page. OpenWetWare is used by researchers in the field for a number of purposes such as: dissemination of recent news relevant to the entire community, discussion pages about experimental protocol standardization, discussions on new research projects and efforts, individual and group research results, links to community resources, conference and job announcements, links to individual labs on OpenWetWare in the field, and much more (ref). Five labs from four different institutions maintain this resource. OpenWetWare has provided a common online space for these labs to share information and collaborate more effectively, independent of institutional and geographical barriers. As a result of these labs choosing OpenWetWare to host this resource, all information exchanged between groups is freely available to the larger scientific community.
C.II.c - Education
The advantages of OpenWetWare that make it useful in research have also made it a powerful medium for information management in the classroom.  There have been several spontaneous initiatives for teaching classes through OpenWetWare.  Putting these course materials online has provided easy collaboration for course development, easy methods for student involvement and interaction, and open sources for curriculum and educational reuse.
During the spring semester of 2006, MIT's Biological Engineering department taught a new undergraduate introductory lab techniques class titled Laboratory Fundamentals of Biological Engineering (20.109) [16].  The course was taught by a team of faculty (4), instructors (2), and teaching assistants (4).  Together they developed the course content, which ranged from background materials on the particular modules and experiments, detailed protocols, day to day laboratory instruction, and general information on the laboratory, safety, policies, presentations of their results, etc.  OpenWetWare's built-in methods for easy collaboration and content management provided a powerful tool in this process.  At the start of the course, students were given accounts on OpenWetWare and were allowed to edit the entire site.  The students quickly began improving content on the site by catching and correcting errors in course content [19], and uploaded experimental results based on the protocols.  Instructors were able to give real-time feedback on those results and improve course material for future teachings [ref].  Providing course content in a reusable form on OpenWetWare promotes sharing of educational ideas and materials within the community in a way that static course websites cannot.  Moreover, having 20.109 protocols and accompanying explanations available on OpenWetWare provides a rich resource for novices entering biological research (see accompanying letter from Mark Tatar).
C.III Technical and Community Infrastructures Serving OpenWetWare
Our experience with OpenWetWare has demonstrated the need for active, vibrant communities both to support the generation of new scientific content and to identify and address the needs of scientists on the site.  In particular, OpenWetWare has relied upon community-driven leadership, and we will describe how the organization of the leadership has evolved into a functioning steering committee.  Next, we will discuss how the committee has handled and encouraged the growth of the overall community.  Finally, we will discuss the informational and technical infrastructures that have been put in place in response to the changing needs of a rapidly growing community.
C.III.a Establishing Community-driven leadership 
The original leadership of OpenWetWare began with students in our laboratory interested in using a wiki to manage information and provide information to our collaborators.  These students organized laboratory information for us and our close collaborators.  As OpenWetWare grew to include members and laboratories outside our community, the workload increased beyond what a couple of students could handle, and the ability to make group decisions was severely diminished.  Formal mechanisms were needed to determine how important decisions were made and how tasks were carried out, and to recruit more users willing to contribute to the overall maintenance of the site.  To this end, the OpenWetWare steering committee was formed in January 2006.
Membership in the steering committee is on a volunteer basis, and there is a very fluid organizational structure.  One member serves as an organizer, setting an agenda for a monthly meeting.  Other members volunteer to spearhead particular projects; for instance, members have led small teams working on advertising, community development, software development, information management, and other needs as they have arisen.  The steering committee currently consists of 28 members from 12 institutions, and is largely made up of graduate students.  When consensus cannot be reached on a topic, decisions are made by majority vote of members present at (or teleconferenced into) the meeting.  Since its inception, the steering committee has provided the overall vision, the division of labor, and the community building for the OpenWetWare project.
Outside of the steering committee there are also certain users who are very active editors and contributors to the site (herein termed power users).  Power users often suggest new initiatives to the steering committee and pioneer placing new types of information on OpenWetWare.  Furthermore, they often help new users to become familiar with the site, and so play a role in user retention.  As our user base scales in size, it is essential that the number of power users scales with it, and to enable that we have actively encouraged the development of new power users by creating the Community Portal (figure X).  The Community Portal contains topics for discussions on various areas important to the development of OpenWetWare itself, such as: software, community development, information management, and courses.  A user coming to the community portal will find active discussions and places to help in the development of OpenWetWare, thus the portal serves as a power user recruiting tool by showing users who would like to help out where they can plug into the project.
Identifying and addressing community needs as OpenWetWare grows has depended upon leadership derived from within the community.  Top-down leadership will likely be unable to scale as the community grows.  Encouraging community leadership can provide a better mechanism for identifying problems and solutions, and it allows the work of carrying out tasks to be distributed across a collection of volunteer community members.
C.III.b  Encouraging Community Growth
Collaborative web technologies provide technical solutions that lower the barriers to sharing information and collaborating online.  Alone, they are insufficient to establish the sort of information resource we are describing in this proposal – put simply there is more to OpenWetWare than the technical infrastructure underlying the site.  There is a dedicated community of users who are committed to using the site to share information and collaborate, and our experience suggests this community is the greatest asset in ensuring the success of the work described in this proposal.
The researchers on OpenWetWare, led by the steering committee, have actively encouraged the growth of communities on OpenWetWare in a variety of ways.  For example, several informational pages have been developed to help new users get acquainted with OpenWetWare.  Users have written detailed tutorials on building personal and laboratory information sites and guided tours of the various features of OpenWetWare.  In addition, the steering committee led the reorganization of the front page to organize the overall information infrastructure on OpenWetWare, attracting new groups to the site.  The OpenWetWare Highlights were started by a steering committee member to call attention to special news and outstanding contributions to OpenWetWare in order to provide others with examples for use.  Finally, individual communication between members (which is simple from within OpenWetWare) is integral in helping people get acquainted with the site. In combination, these efforts have provided users with rich resources for finding information on how to use the site, and what it is good for.
Often, when new members of laboratories join OpenWetWare, there are serious concerns over vandalism and accountability since every user on OpenWetWare is able to edit every page on the site.  We have addressed these issues on both technical and community levels.  First, each new member of OpenWetWare is screened by a small group of steering committee members (aided by tools developed by the community) to ensure that only researchers who wish to add content are given access and that each member has verified contact information.  This also provides an opportunity to provide information on community norms and help with getting started.  Second, each change on the site is tracked and stored in a history section at the top of each page.  Vandalism, and for that matter mistakes, can be quickly reverted and attributed to a particular user who has gone through the aforementioned screening process.  This not only provides strong accountability for each edit, but also provides a mechanism for giving credit for contributions by a researcher.  Third, the steering committee established community norms to refrain from making major changes to pages prefixed with a lab or group name.  For instance, the page “Silver:DNA ligation” represents the Silver lab’s DNA ligation protocol, and changes should not be made by non-Silver lab users, whereas the page “DNA ligation” represents a community protocol that could be improved and updated by any member.  The effectiveness of these measures is illustrated by the fact that, to date, OpenWetWare has not been subject to even a single identifiable case of vandalism.
<i>c.II.a also describes the "namespace convention", albeit in less detail.  It seems to belong better here rather than there if you want to remove the redundancy. - B.C 7/5
from Becky
"You should probably have a section somewhere about what you'll do in case of abuse, e.g. vandalism of someone's pages.  What are the backup systems to make sure data aren't lost?  What are the safeguards against someone setting up dummy accounts to allow them to annoy people without being caught?"
We should include above something about backups.</i>
C.III.c Growth of informational and technical infrastructure
There are a number of software tools that have been developed by members of the OpenWetWare community.  Although these tools have largely been developed by users to address problems they are personally facing in using the site, they serve to make the site more useful to all OpenWetWare users.  The steering committee has taken steps to encourage software developers within the OpenWetWare community to develop new tools.  For instance, we have provided access to a development version of OpenWetWare that runs independent of the actual site.  This provides a platform for software developers to test new extensions without compromising the main site.
The list of commonly used custom extensions developed by community volunteers is shown in Table 1.  As examples, some particularly critical extensions are the dewikify, citation manager, filtered changes, and wiki merging extensions.  Dewikify enables a wiki page to be shown without the wiki frame surrounding the content (figure X).  This enables labs to use OpenWetWare both as an information management tool and as their general, more aesthetic webpage.  The citation manager, Biblio, enables the easy creation of citations and bibliographies within a wiki document.  The user simply provides a PubMed ID or ISBN number, and a full citation with authors, publication title and reference, and links to PubMed is generated automatically.  Filtered changes allows users to filter recent changes to OpenWetWare by user, laboratory, and other useful criteria.  This filtering enables a laboratory to easily monitor changes within its own portion of the site.  Finally, wiki merging allows a separate MediaWiki-based wiki to be merged with OpenWetWare.  This enables existing wikis that would like to join OpenWetWare to easily move their content to the site.
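As an illustration of the idea behind Filtered Changes, the following minimal Python sketch filters a list of recent changes by user or by laboratory prefix. It is not the extension's actual code: the `Change` record, its field names, the user names, and the sample data are hypothetical, and only the "Lab:Page" title convention is taken from the site.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Change:
    """One entry in a recent-changes feed (hypothetical record layout)."""
    title: str  # page title, e.g. "Silver:DNA ligation"
    user: str   # account that made the edit


def lab_of(title: str) -> Optional[str]:
    """Return the lab prefix of a title such as "Silver:DNA ligation", or None."""
    prefix, sep, _ = title.partition(":")
    return prefix if sep else None


def filter_changes(changes: Iterable[Change],
                   user: Optional[str] = None,
                   lab: Optional[str] = None) -> List[Change]:
    """Keep only the changes matching every criterion that was supplied."""
    kept = []
    for change in changes:
        if user is not None and change.user != user:
            continue
        if lab is not None and lab_of(change.title) != lab:
            continue
        kept.append(change)
    return kept


# Hypothetical sample feed.
recent = [
    Change("Silver:DNA ligation", "alice"),
    Change("DNA ligation", "bob"),
    Change("Endy:Victor3 plate reader", "alice"),
]
silver_changes = filter_changes(recent, lab="Silver")
```

Here `silver_changes` contains only the edit to the Silver lab's page, which is the kind of view a laboratory would use to monitor its own portion of the site.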
While community software developers are a tremendous resource for OpenWetWare, it is unrealistic to expect that they will be able to deliver all of the technical solutions requested by the steering committee.  For instance, we currently have a backlog of software tools (some of which are outlined in Specific Aim 2) that are of immediate importance to the community, but are unlikely to be developed by community members due to the complexity and effort the projects require.  Some tools were so critical to OpenWetWare’s continued growth (such as a robust user management system for dealing with high volumes of new user requests) that we contracted a paid developer to complete the work.  Moving some of the technical development to a small team of paid workers would allow development of these larger, important tools, and would relieve some of the burden on the steering committee so that it can focus on other important tasks.
As OpenWetWare has grown, we have successfully scaled the computer hardware serving the site from a server in our laboratory to a faster server that is professionally managed off-site with regular backups.  MIT’s Computational and Systems Biology Initiative (CSBI) has graciously donated funds to allow us to remain on this faster machine (see Letters of Support); however, based on our current growth, we will soon require increased hardware support (see Proposed Research).
Table 1: Tools Developed and Incorporated by OpenWetWare Community
{| border="1"
! Tool Name
! Function and Usage
|-
| Citation Manager
| An extension that allows simple citations and creation of bibliographies. Users supply a PubMed ID or an ISBN number to specify a reference, and a full citation with authors, publication title and reference, and links to the external database is generated automatically.
|-
| Automatized Linking
| Automatic linking to published documents using DOI, or to other biological databases such as the Registry of Standard Biological Parts.
|-
| Show/Hide
| A simple extension allowing users to hide excess information, which viewers can expand if they choose to get more detail.
|-
| Filtered Changes
| Allows users to filter recent changes to OpenWetWare by user, laboratory, group edits, as well as other useful criteria.
|-
| SideBar Customization
| Allows users to customize the sidebars on OpenWetWare to allow easier navigation within the larger site of OpenWetWare.
|-
| Wiki Import
| Allows importing of separate wikis into OpenWetWare to facilitate merging of disparate wikis.
|-
| Dewikify
| A tool that allows automatic dewikification of wiki pages. This allows groups to use the site as a distributed content management system for websites.
|-
| User Management System
| A customized tool that allows simplified user request processing.
|}
D. Proposed Research
D.I Overview
The proposed work seeks to build upon the existing OpenWetWare community in order to achieve three specific aims: (1) establish an infrastructure for identifying and addressing community needs, (2) develop a series of critical tools of immediate use to existing communities on OpenWetWare, and (3) implement new approaches to growing and strengthening online scientific communities.
D.II Specific Aim 1: Community & Technical Infrastructure
As mentioned previously, the user community on OpenWetWare is in the best position to identify problems and solutions to information generation, curation, and sharing on the site.  As a result, it is important that the leadership in charge of allocating personnel and resources is positioned to leverage this community.
The future organizational structure for site leadership will consist of two components: the existing volunteer community-based steering committee and a new administrative/technical team tasked with supporting the committee. To date, the steering committee has shouldered the community leadership as well as the administrative and technical burdens. This has reduced the overall efficiency of site operations, limited the contributions of the steering committee towards information management problems, and made it more difficult to recruit volunteer committee members.  Technical management of the site requires more time and technical skill than can reasonably be asked of volunteers. In the new arrangement, the purpose of the steering committee will be to continue to identify and address community needs, and the administrative/technical team will support these efforts with specialized skills in software development and project management for some of the larger challenges facing OpenWetWare.
In years 1-2 the administrative/technical team will consist of two software developers and a half-time administrator.  In the first two years the administrator will support activities of the committee, coordinating equipment purchases, interaction with server support personnel, and controlling funds. The administrator will then transition to a full-time project lead. In the final three years, the project leader will take over the administrator’s responsibilities, but in addition galvanize communities in locations outside of MIT, be responsible for overseeing the developers’ work and progress, coordinate and scale the steering committee as appropriate, and become the central contact point for OpenWetWare. The purpose of the project lead is to transition administration of OpenWetWare from MIT to an independent organization that will be able to consider options for longer-term sustainability. The developers will provide technical support to the steering committee and are tasked individually to the tool development described in later sections (see appropriate sections).
This administrative and technical core will serve the needs of the general OpenWetWare community as represented by the steering committee. The express purpose of the steering committee will be to identify and address community needs.  This can range from requests for software tools from specific scientific communities to general community-level tools such as improved discussion boards.  It will be up to the steering committee, by majority vote, to prioritize the list of goals for the administrative and technical team.  The organization of the steering committee will be essentially flat, with the administrator/project lead serving the added responsibility of scheduling, running, and recording meetings.  This model has been successful in developing the site's many improvements (see Previous Research). Finally, the steering committee has consisted of a large fraction of graduate students up to this point, and we expect that to continue in the future.  As active researchers, students make up some of the most prolific contributors to OpenWetWare, and are often the most in tune with community needs. Membership on the steering committee is currently unrestricted; all volunteers are accepted.  As the site grows and matures we intend to restrict the overall size of the steering committee and to put in place a process of community voting for selecting steering committee members (e.g., other community-based sites such as Wikipedia have successfully used such a system).
OpenWetWare is currently growing at a rate of 20% per month (for the last 8 months), and to support this growth we will need investments in technical infrastructure. We will purchase new server equipment and support in year 1 to meet our immediate needs, as well as in year 3 to account for projected growth. We have an existing relationship with Tech Square, Inc, who have been providing servers and support through our association with the Computational and Systems Biology Initiative at MIT, whose servers we currently use. We will expand that relationship with explicit funding for new equipment, and a service contract to take care of hosting and management. OpenWetWare upgrades and improvements will all be coordinated and vetted by the technical team described previously.
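To give a sense of scale, the short Python sketch below compounds the stated 20% monthly growth rate. The assumption that the rate is constant from month to month, and the 12-month projection horizon, are illustrative only and are not figures from this proposal.

```python
def growth_factor(monthly_rate: float, months: int) -> float:
    """Cumulative growth multiple after compounding a fixed monthly rate."""
    return (1.0 + monthly_rate) ** months


# Growth over the period stated above: 20% per month for 8 months.
observed = growth_factor(0.20, 8)

# Hypothetical projection if the same rate continued for 12 more months.
projected = growth_factor(0.20, 12)

print(f"8 months at 20%/month:  x{observed:.1f}")
print(f"12 months at 20%/month: x{projected:.1f}")
```

Under these assumptions, eight months of 20% monthly growth corresponds to roughly a 4.3-fold increase, and a further twelve months at the same rate would correspond to roughly a 9-fold increase, which is why hardware purchases are planned at more than one point in the project.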
D.II Specific Aim 2: Critical Tool Development
The steering committee has previously spearheaded projects to construct tools, extensions, and tutorials to address community needs on OpenWetWare. However, the scope and technical detail of certain tools require paid, specialized personnel. The technical and administrative infrastructure specified previously will enable the pursuit of tools that are of immediate use to the OpenWetWare community, such as interfacing existing biological databases with OpenWetWare (D.II.a), automating information organization using categories (D.II.b), providing opportunities to associate greater context and meaning with data through semantic web technologies (D.II.c), and developing software tools for easing content generation (D.II.d).
D.II.a New types of information to organize and annotate on OpenWetWare
As mentioned earlier, researchers often develop new information resources on OpenWetWare by aggregating information from disparate resources.  The ability to incorporate information from large structured databases such as GenBank and the Protein Data Bank would give researchers more powerful tools to pull together information in a useful form [ref].  One of the developers from the technical/administrative team will initially develop tools to link OpenWetWare to GenBank, an almost universally used tool for biology.  Successful development of this tool (and subsequent evaluation of its usefulness to the community) will guide development of interfaces with other databases (PDB, WormBase, etc.) requested by the community in the future.
GenBank provides DNA sequences of organisms, vectors, genes, and other biological samples.  Researchers use GenBank to drive many aspects of research, such as forward genetics and hypothesis development on phenotypic screens, and as a substrate for sequence comparisons.  The first phase of the tool will pull sequence information from GenBank based on the GenBank Accession Number.  This will provide a means for users to pull basic information on a particular DNA sequence.  For example, entering <genbank:seq>NC_001604</genbank:seq> will provide information such as the source organism (Bacteriophage T7), sequence length (39,937 bp), and relevant references (12 sequencing references).  The next phase will incorporate methods of viewing sequence and features directly within OpenWetWare, as one currently can in GenBank.  This would allow users to focus in on regions of interest in order to highlight information specific to the research being conducted.  These features have been heavily requested by the user community; however, the technical nature of implementing these additions requires the attention of a skilled web developer.  The most important consideration while developing this tool will be maintaining the ease of use that allows OpenWetWare to be harnessed by non-technical users.
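The first phase described above could be approached along the following lines. This is a minimal Python sketch, not the planned implementation: it finds accession tags in page text and builds the corresponding request URL for the NCBI E-utilities `efetch` service. The tag syntax is taken from the example above, while the parsing rules, function names, and sample page text are assumptions; no network request is made.

```python
import re
from typing import List
from urllib.parse import urlencode

# Tag syntax taken from the example above; the parsing rules are an assumption.
GENBANK_TAG = re.compile(r"<genbank:seq>\s*([A-Za-z0-9_.]+)\s*</genbank:seq>")

# NCBI E-utilities endpoint for fetching database records.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def find_accessions(wikitext: str) -> List[str]:
    """Return every accession number wrapped in a <genbank:seq> tag."""
    return GENBANK_TAG.findall(wikitext)


def efetch_url(accession: str) -> str:
    """Build a URL requesting the GenBank flat-file record for an accession."""
    params = {"db": "nuccore", "id": accession, "rettype": "gb", "retmode": "text"}
    return f"{EFETCH_URL}?{urlencode(params)}"


# Hypothetical page text containing one tag.
page = "We use phage T7 (<genbank:seq>NC_001604</genbank:seq>) as a model."
urls = [efetch_url(acc) for acc in find_accessions(page)]
```

A real extension would then fetch each URL, parse fields such as organism, length, and references from the returned record, and render them in place of the tag. Keeping the user-facing syntax to a single tag is what preserves ease of use for non-technical users.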
Finally, we expect that in the future OpenWetWare will be able to feed back into these structured databases by providing context and experimental details.  For example, the semantic web technologies (described in D.II.c) will provide a means to define structured relationships between these larger databases and the experimental details and other information that relate to them.
D.II.b Automated Information Organization
One of the major hurdles facing OpenWetWare is efficiently organizing and storing the information generated on the site (see C.II.A).  Most of the information on OpenWetWare is generated by individuals and small groups to track, store, and improve their informational resources.  Few researchers spend the extra effort to place links to their protocols on the shared protocols page.  Hence, while the total information content has grown quickly, these aggregation pages have grown at a slower rate.  To address this problem, we intend to provide users with the ability to easily categorize their own information.  Users can label pages using defined categories according to subject.  Then, these categories can be used to dynamically make pages that organize information from pages marked with the particular category.
While each of the technological steps has already been proven, the new tools will only be useful once all of them are completed.  The developer will be tasked with implementing simpler schemes for categorizing particular pages.  A tab at the top will provide users the option to "categorize" the page.  The user can then pick from existing categories or create their own.  Finally, the ability to produce custom dynamic pages has already been demonstrated with other tools such as the Recent Changes filtering (see Table 1), and the developer will extend these tools to allow customized category pages.  The timescale of this project should be quite short (1 month), yet it should provide a large jump in the capacity to organize information on OpenWetWare.
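The categorize-then-aggregate idea can be sketched as follows. This minimal Python example assumes pages are labelled with standard MediaWiki category links of the form `[[Category:Name]]`; the function names, the page texts, and the layout of the generated listing are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches [[Category:Name]] and [[Category:Name|sort key]].
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def build_category_index(pages: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each category name to the sorted titles of the pages labelled with it."""
    index = defaultdict(list)
    for title, text in pages.items():
        for name in CATEGORY_LINK.findall(text):
            index[name.strip()].append(title)
    return {category: sorted(titles) for category, titles in index.items()}


def render_category_page(category: str, index: Dict[str, List[str]]) -> str:
    """Generate wikitext for a dynamic page listing every page in a category."""
    lines = [f"=={category}=="]
    lines += [f"* [[{title}]]" for title in index.get(category, [])]
    return "\n".join(lines)


# Hypothetical page contents.
pages = {
    "Silver:DNA ligation": "Protocol text... [[Category:Protocol]]",
    "DNA ligation": "Community protocol... [[Category:Protocol]]",
    "Endy:Victor3 plate reader": "Usage notes... [[Category:Equipment]]",
}
index = build_category_index(pages)
listing = render_category_page("Protocol", index)
```

Because the listing is regenerated from the labels, a lab-specific protocol appears on the shared aggregation page as soon as its author categorizes it, without anyone editing the aggregation page by hand.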
D.II.c Providing Context and Meaning to Information
While the organization of individual pages is important, providing context and meaning to the information researchers generate will provide a more powerful research tool. Scientists should be able to flexibly categorize, search, and discover all the digitized knowledge. For example, if a user at UC Berkeley uploads a copy of a new paper, she should be able to easily assign short text phrases (tags) like “p53” and “Huntington’s disease” as well as crosslink the supporting data, electronic descriptions of the protocols used, analysis methodology, and links to order the biological materials involved. However, the real advantage comes from the broad use of a community – when another researcher in another laboratory uses the same reagent and the same “tag”, the information automatically connects itself to the previous experiment. The impact of such automated connectivity is significant.  Just as the open source programming model allows the community to organically grow the software code base for OpenWetWare, the “tagging” model allows the community to organically organize and add value to the knowledge that is created day by day in the laboratory.
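The automatic connection between pages that share a tag can be illustrated with a small in-memory index. This is a Python sketch under stated assumptions: the `TagIndex` class, its methods, the case-insensitive matching of tags, and the page names are all hypothetical and do not describe an existing OpenWetWare feature.

```python
from collections import defaultdict
from typing import Dict, Set


class TagIndex:
    """Tracks which pages carry which tags (hypothetical helper)."""

    def __init__(self) -> None:
        self._pages_by_tag: Dict[str, Set[str]] = defaultdict(set)
        self._tags_by_page: Dict[str, Set[str]] = defaultdict(set)

    def tag(self, page: str, *tags: str) -> None:
        """Attach tags to a page; tags are compared case-insensitively."""
        for raw in tags:
            key = raw.strip().lower()
            self._pages_by_tag[key].add(page)
            self._tags_by_page[page].add(key)

    def related(self, page: str) -> Dict[str, Set[str]]:
        """Map every other page to the set of tags it shares with `page`."""
        shared: Dict[str, Set[str]] = defaultdict(set)
        for key in self._tags_by_page.get(page, ()):
            for other in self._pages_by_tag[key]:
                if other != page:
                    shared[other].add(key)
        return dict(shared)


index = TagIndex()
index.tag("Berkeley:New paper", "p53", "Huntington's disease")
index.tag("Other lab:Experiment 12", "P53")
links = index.related("Berkeley:New paper")
```

In this example the second researcher never references the first page, yet `links` connects the two through the shared "p53" tag. The value of the index grows with the number of people who tag consistently, which is the community effect described above.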
The implementation of these technologies in OpenWetWare can be accomplished using the Semantic Web (SW) tools developed at the World Wide Web Consortium (W3C; see B.II). Briefly, the technologies behind SW are conceptually simple and are based upon three standards developed by the W3C. First, the Resource Description Framework (RDF) allows individuals to make statements relating two objects in the form subject, predicate, object (e.g., ‘Paper A’ uses ‘DNA Ligation Protocol B’). The subject, predicate, and object can each be identified by a URI (Uniform Resource Identifier; a web URL is a URI). Second, RDF Schema provides hierarchies for concepts and relations (e.g., ‘DNA Ligation Protocol B’ is a ‘Protocol’). Finally, the Web Ontology Language (OWL) allows more complex forms of types and relations as well as the ability to merge different ontologies by defining equivalence (e.g., if a ‘Ligation’ is a ‘Protocol’, and a ‘DNA Ligation’ is a ‘Ligation’, then a ‘DNA Ligation’ is a ‘Protocol’).
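To make these statements concrete, the following Python sketch stores a few triples as plain tuples and applies the class-hierarchy rule to derive new type statements. It is a toy illustration rather than a use of any Semantic Web library: the `http://example.org/oww/` namespace and the resource names are hypothetical, and only the `rdf:type` and `rdfs:subClassOf` URIs are the standard W3C identifiers.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), each a URI

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
EX = "http://example.org/oww/"  # hypothetical namespace for this sketch

triples: Set[Triple] = {
    (EX + "PaperA", EX + "uses", EX + "DNALigationProtocolB"),
    (EX + "DNALigationProtocolB", RDF_TYPE, EX + "DNALigation"),
    (EX + "DNALigation", RDFS_SUBCLASS_OF, EX + "Ligation"),
    (EX + "Ligation", RDFS_SUBCLASS_OF, EX + "Protocol"),
}


def infer_types(graph: Set[Triple]) -> Set[Triple]:
    """Add the rdf:type statements implied by rdfs:subClassOf until none are new."""
    inferred = set(graph)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for cls, q, parent in list(inferred):
                if q == RDFS_SUBCLASS_OF and cls == o:
                    new = (s, RDF_TYPE, parent)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred


closed = infer_types(triples)
is_protocol = (EX + "DNALigationProtocolB", RDF_TYPE, EX + "Protocol") in closed
```

Nobody stated directly that the protocol is a ‘Protocol’; the statement follows from the hierarchy, so a query for all protocols would still find it. A production system would rely on an existing RDF store and reasoner instead of this hand-written loop.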
We will begin the project by making simple relations based on the categorization of pages developed in D.II.b; this is possible because each page on OpenWetWare is itself identified by a URI. This will allow us to test the ability to quickly and easily tag data, extract the semantic information, and view and query this information in a useful form. While tools exist for all of these steps, customizing them for the wiki and ensuring ease of use will be the most time-consuming tasks. Once these goals are accomplished, we will move to more powerful ways of tagging data with pertinent information. During this time frame we will also coordinate with the developers who have recently started developing prototypes for incorporating the Semantic Web into MediaWiki (ref), as well as with other projects that intend to use these tools (EcoliHUB and NeuroCommons, see Letter of Support).
OpenWetWare is uniquely positioned to take full advantage of technologies surrounding SW and apply them to biology. First, OpenWetWare is specifically designed to capture the typically undigitized knowledge in the laboratory. This avoids recapitulating existing database information that is already useful, and allows tighter integration with these existing resources. For example, a researcher would create a semantic link between a protocol on OpenWetWare and a specific protein in the PDB by simply specifying the PDB ID when inputting their protocol. The PDB could later collect this information automatically and include a link on the PDB page to the protocol.  Through this mechanism, end users on OpenWetWare provide context and experimental details for structured databases just by doing their day-to-day lab work on OpenWetWare.  Second, the large community of scientists on OpenWetWare could jumpstart the adoption of SW practices, provided they are powerful and easy to use. Third, the interconnections and categorizations on wiki sites already provide a strong initial substrate for material to be organized (ref). Fourth, biology is a constantly changing science, and therefore OpenWetWare’s flexibility as a wiki to develop and easily make use of definitions and relations is very attractive. As consensus forms around particular methods of tagging information, they can more quickly be incorporated into more standard ontologies, such as the Gene Ontology (ref). Fifth, the OpenWetWare community includes key members of the growing semantic web for life sciences, including John Wilbanks, the first staff member focused on life sciences at the World Wide Web Consortium (see accompanying letter). Sixth, one of the developers tasked with the project (Ilya Sytchev) has significant experience with both Semantic Web and wiki technology through his work on the MIT Registry of Standard Biological Parts (ref).
Finally, we will coordinate with other large wiki-based projects including EcoliHUB and NeuroCommons so that the tools we develop are generally useful to other communities through the OpenWetWare distribution (see Section D.III and letters of support).
D.II.d Increasing ease of use and encouraging new technologies
Previous efforts at improving ease of use have produced increases in contribution (see C.III).  We intend to focus here on "wizards" that will quickly generate templates for common information posted on OpenWetWare such as user pages, laboratory websites, course development, protocols, equipment, etc.  We expect these wizards will increase ease of contributing information, as well as provide increased use of some of the newer technologies of OpenWetWare.  For example, automatic incorporation of some of the tools for organization (D.II.b), and information tagging (D.II.c) will improve the use of these technologies and will accustom researchers to their benefits.  As a result, the wizards will also serve the secondary purpose of enabling better organization and standardization of site materials.
One of the developers will be tasked with developing a protocol submission wizard and providing a rough guide to the development process.  The guide will enable motivated users to build similar wizards for information relevant to their communities.  This wizard will ask users for information about a protocol such as the title, category, tags, list of labs or individuals that use it, and list of materials that are involved.  Using this information the wizard will generate a page for the protocol that follows a community standard (the 'standard' protocol look and feel are currently being developed by the steering committee), and automatically generate links from the protocols sections of the listed labs' and users' pages.  Additionally, it will place the protocol in the appropriate common protocols area based on its category and tags.  The user benefits from having a nicely formatted protocol page to fill in, and in the process the community benefits by having information that is better organized than if the user had undertaken the process without the wizard.  Finally, one important note is that we will continue to maintain existing approaches to generating new protocols.  We understand that it is the ability to create free-form content that has inspired the contribution of much of the unique information to OpenWetWare and we will ensure that researchers will always have this capability.  We would rather have content digitized and searchable, even if more unstructured, than not have it available online at all.
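The core of such a wizard is a function that turns the submitted form fields into wikitext. The Python sketch below illustrates this under stated assumptions: the field names follow the list above, but the page layout, the section headings, the choice to represent tags as category links, and the sample values are hypothetical, since the community standard for protocol pages is still being developed.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProtocolForm:
    """Answers collected by the wizard (field names follow the text above)."""
    title: str
    category: str
    tags: List[str] = field(default_factory=list)
    labs: List[str] = field(default_factory=list)
    materials: List[str] = field(default_factory=list)


def render_protocol_page(form: ProtocolForm) -> str:
    """Generate a wikitext skeleton for the protocol (hypothetical layout)."""
    lines = [f"=={form.title}==", "", "===Materials==="]
    lines += [f"* {material}" for material in form.materials]
    lines += ["", "===Procedure===", "# (to be filled in by the author)", ""]
    lines += ["===Used by==="]
    lines += [f"* [[{lab}]]" for lab in form.labs]
    lines += ["", f"[[Category:{form.category}]]"]
    # Assumption: tags are stored as additional category links.
    lines += [f"[[Category:{tag}]]" for tag in form.tags]
    return "\n".join(lines)


def backlink_for(form: ProtocolForm) -> str:
    """Line to append to the protocols section of each listed lab or user page."""
    return f"* [[{form.title}]]"


# Hypothetical submission.
form = ProtocolForm(
    title="DNA ligation",
    category="Protocol",
    tags=["Ligation"],
    labs=["Silver"],
    materials=["T4 DNA ligase", "10x ligase buffer"],
)
page = render_protocol_page(form)
```

The author still writes the procedure itself as free-form text; the wizard only supplies the consistent skeleton, the category links, and the backlinks that users rarely add by hand.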
D.III A standardized wiki distribution for biology
OpenWetWare is committed to remaining a resource for open publishing of information related to biological research. There are many individual efforts to use wiki software for either storing and organizing private laboratory information or facilitating larger collaborative projects aimed at community work with specific goals such as genomic annotation. In these cases, investigators are building their own infrastructures for these efforts. We will develop, distribute, and support a distribution that combines the customized MediaWiki software and tools described in the previous sections.
We expect that an OpenWetWare organized distribution will solve a number of problems. First, it will allow individual investigators to easily establish private wiki-based information management systems. This benefits individual investigators and small communities by getting access to the many tools that make OpenWetWare useful as well as reducing the effort at producing and maintaining such sites. In addition, open efforts, such as OpenWetWare, benefit because investigators are putting their information in an interoperable digital form that will make it much easier to make public when appropriate. The distribution will contain an easy method to move information from a private wiki form to other wikis such as OpenWetWare to openly distribute specific information. There are already several investigators, including our laboratory, eager to use the distribution to support their private work (see Letters of Support).
Second, the distribution will allow several efforts by larger communities to develop wikis to generate and curate specific types of information. These larger projects have their own specific needs for their communities, and thus have embarked on using and developing their own wiki infrastructures. However, coordination between these groups would help ensure interoperability, reduce repeated efforts at tool development and customization, and allow strategic division of labor for specific goals. To begin, three new projects, EcoliHub, Wikiomics, and the NeuroCommons have agreed to coordinate tool development and distributions to ensure interoperability between these large projects in the future (see Letters of Support).
The commitment to an open-source, standards-based approach (e.g., MediaWiki, W3C's Semantic Web) across these various projects is important because the ability to flexibly integrate new technologies is essential to any software distribution; imagine a web browser that cannot run media players, for example.  While it is impossible to predict or program for disruptive technological advances, OpenWetWare's standards-based approach represents the best technical methodology for reacting to change.  And the open source methodology, by which individual users can adapt the system to react to opportunity and recontribute code to the community, means that the OWW distribution can grow and adapt without a large, cost-heavy organization driving requirements.
OpenWetWare is in the best position to lead this effort because of the tools and extensions already developed specifically for biologists, the experience of hosting a large scientific community, and the insight from that community to lead development of new useful tools. Ilya Sytchev, the programmer who has thus far led the technical upkeep and maintenance of OpenWetWare, will spearhead this project as one of the two developers on OpenWetWare. To begin, Ilya will develop two distributions, one geared towards individual laboratories with privacy concerns, and another for open sites similar to OpenWetWare. These distributions will contain all the extensions, tutorials, help pages, et cetera that make OpenWetWare easier to use and more powerful than the standard MediaWiki distribution. In addition, he will work on extensions that make it easy for these distributions to share information with each other by means of a new “publish” tab. This will allow, for example, very simple publishing of a private protocol page on an individual’s wiki to OpenWetWare. Also, throughout the project, Ilya will act as a liaison with other projects to vet new tools developed elsewhere and incorporate them into OpenWetWare. Most importantly, Ilya will ensure future compatibility of these extensions by testing them as new versions of the core MediaWiki software are released.  Finally, depending on interest from the community, other specialized distributions will be developed, such as an education wiki distribution for developing and sharing course materials.


==References==

Latest revision as of 10:20, 8 September 2006

References

  1. Campbell EG, Clarridge BR, Gokhale M, Birenbaum L, Hilgartner S, Holtzman NA, and Blumenthal D. Data withholding in academic genetics: evidence from a national survey. JAMA. 2002 Jan 23-30;287(4):473-80. DOI:10.1001/jama.287.4.473 | PubMed ID:11798369 | HubMed [Campbell-JAMA-2002]
  2. Blumenthal D, Campbell EG, Gokhale M, Yucel R, Clarridge B, Hilgartner S, and Holtzman NA. Data withholding in genetics and the other life sciences: prevalences and predictors. Acad Med. 2006 Feb;81(2):137-45. DOI:10.1097/00001888-200602000-00006 | PubMed ID:16436574 | HubMed [Blumenthal-AcadMed-2006]
  3. Vogeli C, Yucel R, Bendavid E, Jones LM, Anderson MS, Louis KS, and Campbell EG. Data withholding and the next generation of scientists: results of a national survey. Acad Med. 2006 Feb;81(2):128-36. DOI:10.1097/00001888-200602000-00007 | PubMed ID:16436573 | HubMed [Vogeli-AcadMed-2006]
  4. Blumenthal D, Campbell EG, Anderson MS, Causino N, and Louis KS. Withholding research results in academic life science. Evidence from a national survey of faculty. JAMA. 1997 Apr 16;277(15):1224-8. PubMed ID:9103347 | HubMed [Blumenthal-JAMA-1997]
  5. ISBN:020171499X [Cunningham-2001]
  6. DNA ligation. (2006, June 24). OpenWetWare. Retrieved 18:29, June 25, 2006 from http://openwetware.org/index.php?title=DNA_ligation&oldid=44446. [DNAligation]
  7. Beta-Galactosidase Assay (A better Miller). (2005, December 28). OpenWetWare. Retrieved 18:30, June 25, 2006 from http://openwetware.org/index.php?title=Beta-Galactosidase_Assay_%28A_better_Miller%29&oldid=15730. [BetaGalAssay]
  8. Endy:Victor3 plate reader. (2006, June 20). OpenWetWare. Retrieved 18:32, June 25, 2006 from http://openwetware.org/index.php?title=Endy:Victor3_plate_reader&oldid=43362. [PlateReader]
  9. E. coli genotypes. (2006, June 17). OpenWetWare. Retrieved 18:33, June 25, 2006 from http://openwetware.org/index.php?title=E._coli_genotypes&oldid=42693. [EcoliGenotypes]
  10. BE.109. (2006, May 11). OpenWetWare. Retrieved 17:38, June 25, 2006 from http://openwetware.org/index.php?title=BE.109&oldid=36681. [BE109]
  11. Wheeler DL, Church DM, Federhen S, Lash AE, Madden TL, Pontius JU, Schuler GD, Schriml LM, Sequeira E, Tatusova TA, and Wagner L. Database resources of the National Center for Biotechnology. Nucleic Acids Res. 2003 Jan 1;31(1):28-33. DOI:10.1093/nar/gkg033 | PubMed ID:12519941 | HubMed [Wheeler-NAR-2003]
  12. Berman H, Henrick K, and Nakamura H. Announcing the worldwide Protein Data Bank. Nat Struct Biol. 2003 Dec;10(12):980. DOI:10.1038/nsb1203-980 | PubMed ID:14634627 | HubMed [Berman-NatStrucBiol-2003]
  13. Grumbling G and Strelets V. FlyBase: anatomical data, images and queries. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D484-8. DOI:10.1093/nar/gkj068 | PubMed ID:16381917 | HubMed [Grumbling-NAR-2006]
  14. Schwarz EM, Antoshechkin I, Bastiani C, Bieri T, Blasiar D, Canaran P, Chan J, Chen N, Chen WJ, Davis P, Fiedler TJ, Girard L, Harris TW, Kenny EE, Kishore R, Lawson D, Lee R, Müller HM, Nakamura C, Ozersky P, Petcherski A, Rogers A, Spooner W, Tuli MA, Van Auken K, Wang D, Durbin R, Spieth J, Stein LD, and Sternberg PW. WormBase: better software, richer content. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D475-8. DOI:10.1093/nar/gkj061 | PubMed ID:16381915 | HubMed [Schwarz-NAR-2006]
  15. ISBN:0-7695-2108-8 [Wang-2004]
  16. ISBN:1-58113-499-1 [Bergin-2002]
  17. [BE109gels]
  18. [BE109errors]
  19. [BE109notes]
  20. [BE109steeringcommittee]
  21. [BE109urop]
  22. [BE109collaborate]
  23. [BE109questions]
  24. [SBstarterkit]
  25. [Ecoliwiki]
  26. [SemanticMediaWiki]
