DataONE:GEO reuse study: Difference between revisions

From OpenWetWare

Revision as of 08:47, 18 June 2010

This DataONE OpenWetWare site contains informal notes for several research projects funded through DataONE. DataONE is a collaboration among many partner organizations, and is funded by the US National Science Foundation (NSF) under a Cooperative Agreement.



Analysis of data reuse of NCBI's GEO dataset

Short-term Aims

To fill in the blanks in these sentences:

We have collected some information using ??? on the GEO database, which is made possible because GEO citations are indexed in ????. For each year, we recorded all papers that cite GEO data and the number of data sets in GEO. We examined a subset of XXX of those citing papers to estimate the proportion of citations which (1) reused the original data in a significant way (rather than simply alluding to its existence), and (2) did not include an author of the original work (because these authors would have access to the data in the absence of the archive). We also used this sample to record the nature of the reuse: verification, meta-analysis, or new questions.
For every data set in GEO, there are XXX citations to data. Moreover, GEO is rapidly growing and there is a necessary time lag between deposition and reuse (on average XXX months after deposition), from which we can estimate that the typical paper is likely to generate YYY citations over the short term. This number should increase as more time passes and citations continue to accumulate for each paper, and it is an underestimate because not all citations to the data use standard references that can be tracked by ????. Of these citations, XXX% are estimated to result in novel scientific work that could not have been performed without the archive, for a total of XXX new pieces of work for each archived data set.
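
The arithmetic behind these sentences can be sketched in a few lines of Python. This is only an illustration under assumptions: the `CitingPaper` record, the function names, and every number in the example are hypothetical stand-ins for the blanks above, not study results. The sketch also ignores the deposition-to-reuse time lag and the citations that cannot be tracked.

```python
from dataclasses import dataclass


@dataclass
class CitingPaper:
    """One manually reviewed citing paper (hypothetical record layout)."""
    reused_data: bool    # criterion (1): the data were reused in a significant way
    shares_author: bool  # True if an author of the original work is on the paper


def independent_reuse_fraction(sample):
    """Fraction of sampled citations that meet both criteria."""
    if not sample:
        raise ValueError("sample must not be empty")
    qualifying = sum(1 for p in sample if p.reused_data and not p.shares_author)
    return qualifying / len(sample)


def new_work_per_dataset(total_citations, total_datasets, reuse_fraction):
    """Citations per data set, scaled by the estimated reuse fraction."""
    if total_datasets <= 0:
        raise ValueError("total_datasets must be positive")
    return (total_citations / total_datasets) * reuse_fraction


# Made-up inputs, purely to show the calculation.
sample = [
    CitingPaper(reused_data=True, shares_author=False),
    CitingPaper(reused_data=True, shares_author=True),
    CitingPaper(reused_data=False, shares_author=False),
    CitingPaper(reused_data=True, shares_author=False),
]
fraction = independent_reuse_fraction(sample)
print(fraction)
print(new_work_per_dataset(total_citations=300, total_datasets=100,
                           reuse_fraction=fraction))
```

Keeping the two steps separate lets the reuse fraction be re-estimated from a larger sample without touching the per-data-set calculation.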

Long-term Aims

Pilot results