DataONE:GEO reuse study
This DataONE OpenWetWare site contains informal notes for several research projects funded through DataONE. DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.
Analysis of data reuse for NCBI's GEO database
Long-term aims
To understand the extent and value of reuse of data stored in NCBI's GEO database.
To fill in the blanks in these sentences:
- We have collected information on the GEO database using ???, which is possible because GEO citations are indexed in ????. We recorded all papers citing GEO data in each year, along with the number of data sets in GEO that year. We examined a subset of XXX of those citing papers to estimate the proportion of citations that (1) reused the original data in a significant way (rather than simply alluding to its existence), and (2) did not include an author of the original work (because these authors would have access to the data even without the archive). We also used this sample to record the nature of the reuse: verification, meta-analysis, or new questions.
- For every data set in GEO, there are XXX citations to the data. Moreover, GEO is growing rapidly, and there is a necessary time lag between deposition and reuse (on average XXX months after deposition); from this we can estimate that the typical paper is likely to generate YYY citations over the short term. This number should increase as citations continue to accumulate for each paper over time, and it is an underestimate because not all citations to the data use standard references that can be tracked by ????. Of these citations, an estimated XXX% result in novel scientific work that could not have been performed without the archive, for a total of XXX new pieces of work for each archived data set.
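The two template sentences above amount to a short calculation: citations per data set (discounting data sets too recent to have been reused yet), multiplied by the fraction of sampled citations that meet both criteria (1) and (2). The Python sketch below shows that arithmetic, assuming the yearly counts and the hand-coded sample are already in hand. The record layout (`SampledCitation`), the function names, the way the time lag is applied (leaving the most recent `lag_years` of deposits out of the denominator), and all numbers are hypothetical illustrations, not GEO measurements or the study's actual method.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SampledCitation:
    """One hand-examined citing paper (hypothetical record layout)."""
    substantive_reuse: bool  # criterion (1): data reused in a significant way
    shares_author: bool      # criterion (2) fails if an original author is on this paper
    purpose: str             # e.g. "verification", "meta-analysis", "new question"


def qualifies(c: SampledCitation) -> bool:
    """True if the citation meets both criteria (1) and (2)."""
    return c.substantive_reuse and not c.shares_author


def reuse_fraction(sample: list[SampledCitation]) -> float:
    """Share of the sample that reused the data with no original author involved."""
    if not sample:
        raise ValueError("sample must not be empty")
    return sum(qualifies(c) for c in sample) / len(sample)


def purpose_counts(sample: list[SampledCitation]) -> Counter:
    """Tally the nature of reuse among qualifying citations only."""
    return Counter(c.purpose for c in sample if qualifies(c))


def citations_per_dataset(citations_by_year: dict[int, int],
                          datasets_by_year: dict[int, int],
                          lag_years: int = 0) -> float:
    """Citing papers per data set, ignoring data sets too new to have been reused.

    Assumption: data sets deposited within `lag_years` of the last year in the
    data have not had time to be cited, so they are left out of the
    denominator. With lag_years=0 this is the plain ratio.
    """
    last_year = max(datasets_by_year)
    eligible = sum(n for year, n in datasets_by_year.items()
                   if year <= last_year - lag_years)
    if eligible == 0:
        raise ValueError("no data sets are old enough for this lag")
    return sum(citations_by_year.values()) / eligible


# Toy inputs for illustration only; these are NOT GEO measurements.
citations = {2005: 10, 2006: 30, 2007: 60}
datasets = {2005: 100, 2006: 150, 2007: 250}
sample = (
    [SampledCitation(True, False, "new question")] * 2
    + [SampledCitation(True, False, "meta-analysis"),
       SampledCitation(True, False, "verification")]
    + [SampledCitation(True, True, "new question")] * 3    # original author involved
    + [SampledCitation(False, False, "mention only")] * 3  # only alludes to the data
)

per_ds = citations_per_dataset(citations, datasets, lag_years=1)
frac = reuse_fraction(sample)
print(f"citations per data set: {per_ds:.2f}")               # 100 / 250 = 0.40
print(f"fraction that is independent reuse: {frac:.2f}")     # 4 / 10 = 0.40
print(f"new pieces of work per data set: {per_ds * frac:.2f}")  # 0.16
print(purpose_counts(sample))
```

With these toy inputs the lag adjustment doubles the per-data-set figure (0.20 with no lag, 0.40 with a one-year lag), which illustrates why rapid growth combined with a deposition-to-reuse delay makes a naive citations-per-data-set ratio an underestimate.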