OpenWetWare:Information management/a model for novel publishing
novel publication group formed
- see nature idea for text mining http://blogs.nature.com/wp/nascent/2006/04/open_text_mining_interface_1.html
- Lucks 17:57, 2 August 2006 (EDT): This seems right up the alley of your idea, John, or at least a technical implementation to achieve some of your ideas. Perhaps OWW can work with publishers to figure out what kind of markup to use (Atom, RDF) and how to mark up a document (which tags)?
- John, Julius, Drew, Jason
- ACTIONS: to present proposal on novel publishing to next SC meeting
- August 10th,
- Can we use the money for other things, such as core 'care and feeding' costs?
- e.g. 65% direct costs
- collaborations with all publishing houses.
- they want exploratory channel and want to know what the inter
- OWW is considering a novel publication channel and is currently looking to partner with publishing houses
OWW Project Brainstorm - Option 1
This idea stems from the OWW retreat where we discussed ideas for a novel publishing model.
Fast publication of research and ideas
- A fast track for publishing of novel scientific findings
- A common format for publications, and for comments and reviews on publications, so that findings published in one location can be reported in another.
- Any comments made or reviews added would be stored centrally.
- Three years of work does not need to be saved up for one publication. Large publications are welcome, as are reviews, but so are small publications that concentrate on answering one question.
- Semantic web of publications about scientific knowledge
- XML (Extensible Markup Language) implementation for sharing publication data and comments, like SBML (Systems Biology Markup Language)
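As a rough illustration of the XML idea above, here is a minimal sketch of how one small publication and its comments might be serialized and read back. No schema has been agreed, so the element and attribute names (`publet`, `question`, `finding`, `comment`, `by`) and both function names are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET


def build_publet(publet_id, author, question, finding, comments):
    """Serialize one small publication unit and its comments as XML.

    All element and attribute names are hypothetical; no schema exists yet.
    """
    root = ET.Element("publet", attrib={"id": publet_id})
    ET.SubElement(root, "author").text = author
    ET.SubElement(root, "question").text = question
    ET.SubElement(root, "finding").text = finding
    comments_el = ET.SubElement(root, "comments")
    for commenter, text in comments:
        comment_el = ET.SubElement(comments_el, "comment", attrib={"by": commenter})
        comment_el.text = text
    return ET.tostring(root, encoding="unicode")


def read_comments(xml_text):
    """Return (commenter, text) pairs from a serialized publet."""
    root = ET.fromstring(xml_text)
    return [(c.get("by"), c.text) for c in root.iter("comment")]
```

Because the comments travel inside the same document as the finding, any site that republishes the publet could in principle show the same centrally stored discussion.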
Keeping abreast of current information
- people can subscribe to feeds of keywords/tags
- Web portals can spring up, supporting and acting as editor for one particular field
- Labs and PIs can post paper IDs for different papers of interest to their group
- Each paper has a link to the public contributions
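The keyword/tag subscription idea above could work roughly as follows. This is only a sketch: the `TagFeeds` class, its method names, and the case-insensitive matching rule are assumptions for illustration, not part of any existing OWW feature.

```python
from collections import defaultdict


class TagFeeds:
    """Hypothetical registry mapping tags to the people subscribed to them."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # tag -> set of subscriber names

    def subscribe(self, subscriber, tag):
        # Assumption: tags are matched case-insensitively.
        self._subscribers[tag.lower()].add(subscriber)

    def recipients(self, paper_tags):
        """Return the sorted subscribers whose tags match any tag on a paper."""
        matched = set()
        for tag in paper_tags:
            matched |= self._subscribers.get(tag.lower(), set())
        return sorted(matched)
```

A field-specific portal would then be one more subscriber, with an editor choosing what to feature from the matching papers.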
An analogy with the Media Industry
How the media industry runs:
I envisage that a future model of novel publication would work much as news stories are currently published. For example, companies and organisations issue press releases, or news agencies like Associated Press (AP) or Reuters report news stories onto 'the wire'. Newspapers and other media outlets then pick up the stories and publish them. Newspaper editors and TV producers choose which stories and press releases are relevant and publish them. They also commission articles or stories that they think will be of specific interest to their readers.
- Lucks 21:31, 1 August 2006 (EDT): The meaning of 'publish' for media is different for science. News over the wire is some set of facts picked up by wire reporters in the field. When a newspaper decides to run a story, they (usually) do further research, which means following up more leads and verifying facts. They then write the story and release it. The scientist is more like the newspaper in this case. The scientist might have gotten their wire-tip from a comment on a paper, a paper, an unfinished lead, etc. There is a key difference, though, in that the news is not peer reviewed. The news agency will have some in-house fact checker, but this is not nearly as rigorous as the scientific peer review process is (or is supposed to be), as shown by the existence of very biased news sources which I will not name. Basically, having no peer review lets there be tons of spin in media.
How the scientific publishing industry could run:
- (press release) Authors submit their publications to a central database (or just publish it on a folder on their webserver and it is picked up by a webcrawler?)
- (same as newspaper editors) Editors commission certain publications
- (News agencies publish information) same as collaborative publishing by a group of people, communities publishing together? conference proceedings?
Integration into existing publishing channels
- Submit papers the same way to journals, get support of publishing houses to adopt the common format.
- But, content of journal article will be open, so it can be republished in another format and the comments are in the public domain.
Integration into new publishing channels
- e.g. a 'publish to OWW' button on wikis, blogs, etc.
The role of editors
- Editors maintain their role as publication aggregators, receiving feeds from (like an editor of a news
- Editors could be assigned an ID, that packages their comments, ratings and keywords in the same format as the public contributions, but if the editor publishes a publet in a journal or on a website then their contributions appear at the top, or are given a higher priority than other comments.
- Editors could commission certain articles
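The priority rule for editor contributions described above might look something like this. It is a sketch under assumptions: the `Contribution` fields and the `order_contributions` function are hypothetical, and "higher priority" is read here as "the publishing editor's items first, then everything else by rating".

```python
from dataclasses import dataclass


@dataclass
class Contribution:
    """One comment or rating; the same shape for editors and the public."""
    contributor_id: str
    text: str
    rating: int = 0


def order_contributions(contributions, editor_id):
    """Put the publishing editor's contributions first, then sort by rating.

    The sort key is (is_not_editor, -rating): False sorts before True, so
    the editor's items lead, and higher ratings come first within each group.
    """
    return sorted(
        contributions,
        key=lambda c: (c.contributor_id != editor_id, -c.rating),
    )
```

Keeping editor and public contributions in one format means the ordering is purely a display decision, which each journal or website could make differently.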
The role of public contributions
- If a publet
- Lucks 21:46, 1 August 2006 (EDT): It seems to me that the main crux behind this model is to move away from publishing works in a paper format (abstract, introduction, results, discussion, conclusion) to a model where scientific 'stories' are pieced together from more scientifically meaningful bits (hypothesis, small test, small result). This would enable the small day-to-day science work to be published as it is, or aggregated into paper-sized scientific stories. The aggregation would be carried out by editors.
- I have heard that Nature is looking into an XML specification for their articles (from the recent talk by the Editor-in-Chief in Cambridge). This would be a finer-grained markup of the article to allow for better data mining. This seems related to the current proposal, but still requires one to publish a full paper instead of the individual bits one at a time.
- I don't see how the system accounts for peer review or citation. Perhaps each small bit will have a DOI that can be used for citation (and aggregation). The publishing of small bits almost necessitates a community review system so as not to overwhelm a committee of appointed reviewers.
- How to retract a publication?
- How to submit an addendum? (author contributions get higher priority)
- Lucks 21:37, 1 August 2006 (EDT): How is peer review carried out in this system?
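To make the 'small bits aggregated into stories' idea from the comments above more concrete, here is a minimal sketch of one possible data structure. Everything in it is an assumption for illustration: the `Bit` and `Story` classes, the three allowed kinds, and the identifier strings, which merely stand in for the DOI-like IDs suggested above.

```python
from dataclasses import dataclass, field

# The three kinds of bit named in the discussion above.
KINDS = ("hypothesis", "test", "result")


@dataclass(frozen=True)
class Bit:
    """One small, individually citeable unit of work."""
    identifier: str  # stand-in for a DOI-like ID; the format is hypothetical
    kind: str
    author: str
    text: str

    def __post_init__(self):
        if self.kind not in KINDS:
            raise ValueError(f"unknown kind: {self.kind!r}")


@dataclass
class Story:
    """A paper-sized aggregation of bits, assembled by an editor."""
    editor: str
    title: str
    bits: list = field(default_factory=list)

    def add(self, bit):
        self.bits.append(bit)

    def citations(self):
        """Identifiers of every aggregated bit, in story order."""
        return [b.identifier for b in self.bits]

    def authors(self):
        """Unique bit authors, in order of first appearance."""
        return list(dict.fromkeys(b.author for b in self.bits))
```

In this sketch a story only references bits, so each bit keeps its own author and identifier, and the same bit could appear in more than one story. How peer review, retraction, and authorship of the story itself would work remains open, as the questions above note.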
How to decide on authorship
- Set up a trial of this database with some example data to see how it might work
- Allow a "publish this abstract to my wiki/blog/etc." button
- Semantic web about publications (SWAP)
OWW Project Brainstorm - Option 2
Lucks 09:53, 27 July 2006 (EDT): This is not necessarily a completely different idea - I just couldn't figure out how to blend the text with the idea above. Perhaps we can pick and choose features ... Although this might be a long term goal, or an idea to research as one of many possibilities, I think there are some important issues to think about.
The Open Source Software movement has faced some of the same challenges that OWW faces in trying to establish an open-source science collaboration framework. Eric Raymond has written an illuminating text on several aspects of this culture called The Cathedral and The Bazaar. A must read.
The goal is to provide a publishing system that shifts the academic currency system away from large, highly-polished works to a system that treats the everyday scientific 'baby-step' as a meaningful, citeable, and reviewable contribution. To examine the need for such a system, let's highlight the features of the current publication system.
Current Publication System
The typical publication process usually consists of the following steps:
- A project idea is incarnated - either a novel idea, or one derived from other projects/literature searches
- Funding is secured
- Research is performed - length of time is variable, but in most cases can easily be years. Since research, by definition, is the exploration of a new phenomenon, the path of the research endeavor is more like a slightly-biased random walk than a ballistic motion at a specific target. During those years, there is a daily process of devising smaller hypotheses and testing them, which are bootstrapped into larger hypotheses, etc.
- After a certain period of time, the research is declared to be publishable by the group of researchers. More often than not this is NOT when the original goal was reached - more likely when the researchers think they have achieved significant progress.
- The writing of the paper starts - the length of this process can vary, but with multiple drafts this can take months. Because of journal page restrictions and the desire to publish in high-impact, 'trendy' journals, a significant amount of effort is often spent on making the research into a 'story' that will capture an audience of editors and scientists (the researchers need those citations down the road). This 'story' doesn't actually have to follow the path that the research took. Most times there is not a good way to publish all the knowledge that was gained in the investigation, only that which can fit within the story. Based on my own experiences, only about 50% of the information learned can be fit in a publication - the rest consists of unpublishable mini-investigations (referenced in the publication as 'data not shown') and other loose ends. This extra information is usually left to wallow in lab notebooks, never to be followed up.
- The paper is submitted and undergoes review (if lucky) - this is a variable length process that can last months.
- Authors address reviewer comments - variable length.
- If the paper is still alive, the author now has to play a cat and mouse game with the layout people of the journal because the document preparation system used by the author does not quite match that of the journal. I have had layout times take longer than the reviewing cycle.
What are the good qualities of this system that we want to preserve in a future system?
- Scientific results are communicated through publishing and distribution of journals.
- Scientific results are reviewed by 2 to 3 reviewers and an editor before they are published. (I would argue that even more reviewing eyes would be beneficial to reduce the chances of lazy or biased reviewers).
- The publication process allows works to be cited and archived.
What are the problems with this system that need to be addressed?
- Smaller scientific achievements don't count as currency. You don't get credit until you can publish a paper on it. This biases the system against researchers who have short attention spans, and also encourages researchers to force publications out when maybe they shouldn't be forced out. Since the currency is published work (and highly cited work - the currency is actually citations), the funding structure can impose deadlines that cause a considerable amount of rushing to publication. This is the most serious problem with the current literature system.
- The process takes way too long. Even if we stick with the current currency model of a polished paper, the reviewing and layout process can have a lag time of up to a year in some cases - way too long to wait when working on cutting-edge science. Even when we eliminate these 2 processes (as has been done by Paul Ginsparg's revolutionary arxiv.org), the researcher still has to hoard smaller, but useful, mini-investigations until the paper is written. I would argue that it is better to get complete investigations (little mini-hypotheses) out before a paper can be written - these are often useful to more people than the author.
- Too much information is lost - since there is no venue to publish the mini-investigations, most of those that don't fit in the paper wallow in lab notebooks never to be seen again. This also brings up the debate about publishing negative results, which often don't make it into papers, but have a clear utility to science.
- The peer review process can break down with lazy, biased, or incompetent reviewers. It would be better to have a system that lets a broad group of people comment on smaller parts of the larger work as well as the work as a whole. Make peer review change to community review.
- Literature searching is a very difficult task. This is in part due to search tools poorly designed for the needs of a scientific literature search, but also due to the paper presenting a 'story' that spins the content of the research. In my experience, I perform a query - those articles whose TITLES look promising then get their abstracts read. The point is that titles can often be chosen for spin-factor. If the model separated form from content (what CSS does for HTML), then you could search based on scientific content, find all papers or work that has that content, and then read the work - rather than searching based on the author's choice of words.
Literature System Design
I am going to discuss a proposed design of a technology that would allow the good features of the existing literature system to be held, while fixing some of its problems.
The system would consist of a central web site that has the following features
- Management of user accounts - one account per researcher
- Users can post material on the site - text, images, ...
- Users can cross-reference content in their posts
- Every post has the user's identity associated with it
- Every post has a score - users vote on the post (up or down), and the score is aggregated from all the votes
- Posts are ranked based on their score, and can be sorted and displayed based on the score
- A user reputation is stored - users gain in reputation when their posts have high scores. The user reputation can be seen by other users
- Posts are citeable
- Small Achievements Count!: The system is designed to focus on smaller publication quanta.
- Short Process Time: Once an idea is ready to post, the only time before others can see it is the time to post (minutes).
- Information Not Lost: Since the system promotes a smaller unit of publication, researchers are encouraged to post all progress, not just what can fit in a paper.
- Community Review: Since every comment is rateable, everyone in the community can 'review' a post. They can comment on content related to their expertise. This removes the problem of a reviewer not being technically suited to review a work. Also the process will be much faster. When Netscape released the source of their browser, the first changes to the software occurred within hours - this is because thousands of people were reading the content as it came out and began reviewing immediately.
- Separation of Content from Form: Since the units are smaller, the scientific content can be separated from the larger work. This is equivalent to adding advanced markup strategies (XML grammars) to the existing publishing model - this is done to let computers start to understand the smaller relevant pieces. In this system, the pieces are transparent from the beginning.
- Literature Review Mechanism: The system also allows for the establishment of a review system of current literature. Here posts are links to the article pages and comments are associated with the literature. This allows the collection of community-established literature clusters as users cross-reference relevant literature.
- Evolvable Documents: Rapid publication in small units that can be reviewed and commented on by the community leads to a system that supports evolvable scientific documents that reflect the true nature of the dynamic process that is science.
- Should user names be displayed by the posted material?
- Should posts be editable by other parties, or just referenced? If editable, how to best track and account for the authorship of those edits?
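The voting, ranking, and reputation features listed above can be sketched in a few lines. This is not a description of any existing site: the `Site` class is hypothetical, and three rules in it are assumptions, namely that a post's score is the sum of its up/down votes, that each user gets one vote per post (a later vote overwrites the earlier one), and that a user's reputation is the sum of the scores of their posts.

```python
from collections import defaultdict


class Site:
    """Hypothetical central site holding posts, votes, and reputations."""

    def __init__(self):
        self.authors = {}               # post_id -> author (every post has an identity)
        self.votes = defaultdict(dict)  # post_id -> {voter: +1 or -1}

    def post(self, post_id, author):
        if post_id in self.authors:
            raise ValueError(f"post {post_id!r} already exists")
        self.authors[post_id] = author

    def vote(self, voter, post_id, value):
        if value not in (1, -1):
            raise ValueError("a vote must be +1 (up) or -1 (down)")
        if post_id not in self.authors:
            raise KeyError(post_id)
        # Assumption: one vote per user per post; re-voting overwrites.
        self.votes[post_id][voter] = value

    def score(self, post_id):
        """Assumed aggregation: the sum of all votes on the post."""
        return sum(self.votes.get(post_id, {}).values())

    def ranked(self):
        """Post IDs sorted by score, highest first."""
        return sorted(self.authors, key=self.score, reverse=True)

    def reputation(self, user):
        """Assumed rule: the sum of the scores of the user's posts."""
        return sum(self.score(p) for p, a in self.authors.items() if a == user)
```

Other aggregation rules (weighting votes by the voter's own reputation, for example) would fit the same interface; the simple sums here are only the easiest starting point for a trial.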
I am involved in an implementation of these ideas that focuses on the literature review aspect. The background is that the physics preprint arxiv is a repository for fast publication of papers in the physics community. Authors submit to the arxiv at the same time as submitting to a journal. The arxiv allows for immediate distribution of work, solving some of the problems above, but has no peer review process, which is often cited as its biggest weakness. A particular strong point, though, is that it offers free access to anyone in the world.
arxiv.reddit.com is a collaboration with the makers of reddit to add a peer review process to the arxiv through literature comments. The features of the site are much like those above. The idea is that a post is created for a specific paper on the arxiv. This post is just a link to the arxiv site, but the post can be commented on. Both posts and comments can be rated. Comments can highlight typos, discuss controversial elements, follow up on ideas and propose new ideas. In principle comments can be cited and serve as the individual elements of scientific publication. User names are shown by each comment and the user reputation can be looked up by anyone. Comments can be sorted by their rating.
The site is an exploration and is just getting started. As an experiment, I have posted slides from a talk I gave to collect comments after the talk.
OWW provides many of the features outlined above. However, several things can be improved:
- Display of user names with authored content.
- Voting/ranking of content.