User:Andy Maloney/Open Science

From OpenWetWare

Purpose

This page describes my experience with open science.

This is the fourth chapter in my completely open notebook science dissertation. If you would like to post questions, comments, or concerns, please join the wiki and post them to the talk page. If you do not want to join the wiki and would still like to comment, feel free to email me using the link provided below.


Introduction

This chapter describes my experience with the growing movement that is pushing science into the open. I will not discuss the pros and cons of open science in this chapter, as everyone will have their own opinion on the subject. I will only briefly outline some success stories I have had using open science and some of the pitfalls I have encountered.

What is open science?

I cannot think of a better way to describe it than what David Gezelter, Associate Professor at the University of Notre Dame, said about it in a blog post. He defined it as four simple ideas:

  1. Transparency in experimental methodology, observation, and collection of data.
  2. Public availability and reusability of scientific data.
  3. Public accessibility and transparency of scientific communication.
  4. Using web-based tools to facilitate scientific collaboration.

Transparency in methodology, observation, and collection of data using web-based tools is easily accomplished with services such as OpenWetWare, which is a provider of open notebooks. Open notebooks are web-based notebooks that are completely open to the public for viewing. They are just like paper notebooks in that you write down what you have done in the lab. The only difference is that in a web-based open notebook, you can embed videos, pictures, and links very easily. You can embed images and links in a paper notebook by taping them to the page, but you would have to make a flip book in order to embed a movie. I will discuss my experience using an open notebook below.

Public availability and accessibility of scientific data is a bit more complicated because there is no standard for disseminating data, nor is there a repository for collecting it. I will discuss some of the advances made here at the University of New Mexico in an attempt to create a repository, as well as other web-based services that have taken the initiative to start disseminating scientific data.

Open science experience

I feel that open notebooks, or even electronic notebooks in general, are preferable to paper notebooks because they are accessible from anywhere there is internet access. If an open notebook is not an option, private wiki-based notebooks are available that maintain a level of security for projects. See the project by Galois for an example. The ability to access lab records from anywhere is very beneficial and has aided my research quite a bit. Since my notebook is open, a Google search with simple search terms reliably returns information that I put in my notebook.

The use of an open notebook does come at its own cost, however. Since this is web-based technology, servers can crash, someone can forget to pay the bill, or data can be inadvertently lost if it is not redundantly backed up. Thankfully I have not experienced a server crash, but I have experienced other infuriating issues with this technology. One example is a browser crash. If one does not continuously save pages written in the wiki, they can be lost due to a browser or system crash. This is not a problem when using a paper notebook.

I cannot discuss open notebooks without discussing the myriad other web-based technologies used in conjunction with them. These include BenchFly, YouTube, Google, Scribd, Instructables, and many others. The sheer number of services is actually problematic, as there is no standard for how scientific information is disseminated through them. Some services, such as Flavors, attempt to bring all of the online content that a user makes into a single space. Unfortunately, no such service exists for scientists doing open notebook science, which means one is left trying to coerce the available web-based applications into doing what the scientist needs. This is indicative of how young the area of web-based open science is, and I hope it changes in the future.

The storage and user readability of data from experiments can be a very complicated subject. Every experimenter uses shorthand and abbreviations in their experiments. Those notes may be easy for the experimenter to read, but they are basically gibberish to someone who is looking at the data for the first time. Dr. Koch and I have been collaborating with Rob Olendorf PhD, a library scientist at UNM. Our goal is to use the library system here to store scientific data at an institutional level. Rob is programming an automated XML tagging system that will allow the raw data I take (mostly images) to be tagged with user readability in mind. The tagging is similar to the metadata used for mp3s. We are basically pushing the limits of storage and usefulness of scientific data at UNM. Since one of my experiments can produce 1 TB of image data, storing and transferring the data become big obstacles.
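To make the idea of mp3-style tagging concrete, here is a minimal sketch in Python of how a small XML record could be attached to a single raw image. It is not Rob's system. The element names (experiment, exposure_ms, frame), the file name, and the helper function are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET


def build_image_metadata(filename, fields):
    """Return an XML element describing one raw image file.

    The tag names come from the keys of `fields`; they are
    illustrative only and do not follow any particular schema.
    """
    root = ET.Element("image", attrib={"file": filename})
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    return root


# Hypothetical record for one frame of an image series.
record = build_image_metadata(
    "frame_0001.tif",
    {
        "experiment": "gliding motility assay",
        "exposure_ms": 100,
        "frame": 1,
    },
)

# Serialize to text so it can be stored next to the image.
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A record like this could be saved as a sidecar file next to each image, so that someone opening the data for the first time can read what each file contains without knowing the experimenter's shorthand.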

Outside the institutional level, there are other web-based applications that can disseminate data. I have spoken to Alan B. Marnett PhD, founder of BenchFly, about the possibility of using his service to host my image data. Hosting image data on BenchFly is a very natural evolution of the site's purpose, and Alan was excited about the possibility. Unfortunately, the cost of uploading more than 1 TB of data has been an issue, and we are currently working with Alan to find a solution.

I will not discuss the advantages or disadvantages of using an institutional data hosting service compared to a cloud-based one such as BenchFly. I believe that both are essential to the dissemination of data because one is designed to be archival (the libraries), while the other is designed to be easily navigable by users.

Before speaking to Alan from BenchFly, I uploaded videos of data to YouTube. A simple Google search using the key terms "gliding motility assay" will bring up several movies showing data I took. That data led Dr. William Saxton and Dr. Josh Deutsch, both professors at UCSC, to ask Dr. Koch if they could obtain the data in the movies. Of course I was extremely happy to give the data to them. The gliding motility assay data I took was designed for a specific purpose: trackability. This was so that I could take the speed measurements discussed in Chapter 2, and microtubules that exhibited circular motion were not tracked for my purposes. The data did, however, contain some microtubules that exhibited this circular motion. It turns out that the circular motion is what Dr. Saxton and Dr. Deutsch were after. Stuck microtubules have been shown to mix the insides of fly eggs (Serbus 2005), and Dr. Deutsch, his student M. Brunner, and Dr. Saxton (Deutsch 2011) came up with a spectacular model describing this motion. The data I took served as an in vitro check of the model. This use of the data is well beyond anything Dr. Koch or I could have imagined, since before our interactions with Dr. Deutsch and Dr. Saxton we had no idea this area of research existed. Because the data was open, the group at UCSC was able to use it in their paper.

Not only did they use the data in their paper, but Chapter 1 of this thesis was also used to get Dr. Saxton's group started with gliding motility assays. Gliding motility assays are not easy, and I have seen them fail for inexperienced researchers almost every time they attempted one. Dr. Saxton's student, Corey Monteith, read my rough (and I mean really rough) draft of Chapter 1 and was able to get the assay working in their group. Having a common language helped as well, and I was very glad to speak to Corey using the same language outlined in Chapter 1. This is a major advancement for their group because what took me nearly a year to perfect, Corey was able to do in two weeks.

References