User:Heather A Piwowar/Notebook/PhD thesis: Difference between revisions

From OpenWetWare
Revision as of 07:31, 6 June 2010


Summary

My PhD dissertation, Foundational studies for measuring the impact, prevalence, and patterns of publicly sharing biomedical research data, was defended in the Department of Biomedical Informatics in the School of Medicine at the University of Pittsburgh on March 24, 2010.

Committee:

  • Dissertation Advisor: Wendy W. Chapman, PhD, Assistant Professor, Department of Biomedical Informatics, University of Pittsburgh
  • Brian B. Butler, PhD, Associate Professor, Katz Graduate School of Business, University of Pittsburgh
  • Ellen G. Detlefsen, PhD, Associate Professor, School of Information Sciences, University of Pittsburgh
  • Gunther Eysenbach, MD, MPH, Associate Professor, Department of Health Policy, Management and Evaluation, University of Toronto
  • Madhavi Ganapathiraju, PhD, Assistant Professor, Department of Biomedical Informatics, University of Pittsburgh

Abstract

Many initiatives encourage research investigators to share their raw research datasets in hopes of increasing research efficiency and quality. Despite these investments of time and money, we do not have a firm grasp on the prevalence or patterns of data sharing and reuse. Previous survey methods for understanding data sharing patterns provide insight into investigator attitudes, but do not facilitate direct measurement of data sharing behavior or its correlates. In this study, we evaluate and use bibliometric methods to understand the impact, prevalence, and patterns with which investigators publicly share their raw gene expression microarray datasets after study publication. To begin, we analyzed the citation history of 85 clinical trials published between 1999 and 2003. Almost half of the trials had shared their microarray data publicly on the internet. Publicly available data was significantly (p=0.006) associated with a 69% increase in citations, independently of journal impact factor, date of publication, and author country of origin.

Digging deeper into data sharing patterns required methods for automatically identifying data creation and data sharing. We derived a full-text query to identify studies that generated gene expression microarray data. Issuing the query in PubMed Central, Highwire Press, and Google Scholar found 56% of the data-creation studies in our gold standard, with 90% precision. Next, we established that searching ArrayExpress and the Gene Expression Omnibus databases for PubMed article identifiers retrieved 77% of associated publicly-accessible datasets.
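The two figures reported for the full-text query are the standard retrieval measures: the 56% is recall (the share of gold-standard data-creation studies that the query found) and the 90% is precision (the share of retrieved studies that really did create data). A minimal Python sketch of how such figures can be computed follows. The function name and the article identifiers are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) of a retrieved set against a gold-standard set."""
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    # Precision: fraction of retrieved items that are in the gold standard.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of gold-standard items that were retrieved.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up article identifiers, for illustration only.
gold_standard = {"A", "B", "C", "D"}   # studies known to have created data
query_hits = {"A", "B", "X"}           # studies returned by the query

precision, recall = precision_recall(query_hits, gold_standard)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

The same calculation applies to the database search in the last sentence: the 77% is the recall of publicly accessible datasets when looking them up by PubMed article identifier.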

We used these methods to identify 11,603 publications that created gene expression microarray data. Authors of at least 25% of these publications deposited their data in the predominant public databases. We collected a wide set of variables about these studies and derived 15 factors that describe their authorship, funding, institution, publication, and domain environments. In second-order analysis, authors with a history of sharing and reusing shared gene expression microarray data were most likely to share their data, and those studying human subjects and cancer were least likely to share.

We hope these methods and results will contribute to a deeper understanding of data sharing behavior and eventually more effective data sharing initiatives.

Full Text

PDF, Word docx

Associated Publications, Data, and Source Code

Proposal

Pilot Study

Aim 1

Aim 2a

Aim 2b

Aim 3

Defense

Ongoing work

Check out my personal page at OpenWetWare.


