2 Data Sharing and Data Preservation
Pages 7-19

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 7...
... Ferguson asserted that sharing data and making them interoperable is the best strategy to better understand these complex disorders. Ferguson described the bottleneck that is created when researchers perform data entry and curation on enormous amounts of heterogeneous raw data.
From page 8...
... This initiative has expanded with the development of the TBI Open Data Commons2 and the Veterans Affairs Gordon Mansfield SCI Consortium,3 the latter of which is focused on translational SCI stem cell therapies. Ferguson also described Transforming Research and Clinical Knowledge (TRACK)
From page 9...
... The SCI community was an early adopter of the FAIR principles, hosting a workshop in 2016 on FAIR data sharing,5 a workshop in 2017 on data sharing via an open data commons,6 ... a policy needed to execute FAIR ... Although clinical study reports are familiar to individuals primarily involved in industry-sponsored drug or device trials, we use the general term full study report here to encompass prespecified, unabridged final reports for all clinical and preclinical studies. The study protocol and full study report provide transparent, detailed information that is not included in the published ...
5 For more information, see https://www.ninds.nih.gov/News-Events/Events-Proceedings/Events/Spinal-Cord-Injury-Preclinical-Data-Workshop-Developing-FAIR, accessed September 12, 2019.
From page 10...
... Data citation standards are then applied via SciCrunch,8 Digital Object Identifiers are issued, and the data are released under a Creative Commons Attribution license. This framework provides a single place in which SCI researchers can organize and publish their data as well as receive primary credit for the data as a work product.
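The publication workflow described here (organize the data, issue a persistent identifier, attach an attribution license, and generate a citation so contributors receive credit) can be pictured as a minimal metadata record. The field names and the `publish_dataset` helper below are illustrative assumptions for this sketch, not the actual SciCrunch or open-data-commons interface.

```python
# Minimal sketch of a data-publication record: a data set is registered,
# assigned a persistent identifier, and released under a Creative Commons
# Attribution license. All field names here are illustrative only.

def publish_dataset(title, authors, year, doi):
    """Assemble a citable metadata record for a shared data set."""
    record = {
        "title": title,
        "authors": authors,
        "year": year,
        "identifier": f"https://doi.org/{doi}",
        "license": "CC-BY-4.0",  # Creative Commons Attribution
    }
    # A citation string is what lets researchers receive primary
    # credit for the data as a work product.
    record["citation"] = (
        f"{'; '.join(authors)} ({year}). {title} [Data set]. "
        f"{record['identifier']}"
    )
    return record

rec = publish_dataset(
    title="Example SCI recovery data",  # hypothetical data set
    authors=["Doe J", "Roe A"],
    year=2019,
    doi="10.0000/example",              # placeholder DOI
)
print(rec["citation"])
```

The design choice worth noting is that the license and identifier live in the same record as the citation, so crediting the data never requires a separate lookup.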
From page 11...
... Robert Williams, University of Tennessee Health Science Center, said that the creation of a journal series via Jupyter Lab Notebooks could help address long-tail data issues, although this may increase publication costs for researchers. Ferguson speculated that publications in the future might resemble the Internet -- one could click to source data from links within a web-based version of an article.
From page 12...
... In response to a question about curation costs and associated standards from Alexa McCray, Harvard Medical School, Ferguson said that data curation costs increase when many thousands of variables are present in siloed data systems. He explained that although common data elements may exist in two data sets, interoperability drifts over time as various people begin to recode variables in published papers.
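Ferguson's point about interoperability drift, where the same common data element ends up recoded differently in different data sets, can be sketched as a small harmonization step. The variable ("injury severity") and both coding schemes are invented for illustration, not actual SCI common data elements.

```python
# Hypothetical illustration of interoperability drift: two sites recode
# the same common data element differently over time, and a curation
# mapping restores one shared vocabulary. All codes are invented.

# Site A coded severity as integers; Site B later recoded it as letters.
TO_SHARED = {
    1: "mild", 2: "moderate", 3: "severe",       # Site A's scheme
    "A": "mild", "B": "moderate", "C": "severe", # Site B's drifted scheme
}

def harmonize(records):
    """Translate per-site codes into the shared vocabulary."""
    return [TO_SHARED[code] for code in records]

# Records coded independently at each site, then pooled after curation:
combined = harmonize([1, 3, 2]) + harmonize(["C", "A"])
print(combined)  # ['mild', 'severe', 'moderate', 'severe', 'mild']
```

The cost Ferguson describes comes from building and maintaining a mapping like `TO_SHARED` for each of many thousands of variables across siloed systems.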
From page 13...
... He added that it is unclear what motives prevent researchers from sharing their data; however, these motives will become clearer if funders begin to enforce data-sharing policies.

PANEL DISCUSSION: RESEARCHERS' PERSPECTIVES ON MANAGING RISKS AND FORECASTING COSTS FOR LONG-TERM DATA PRESERVATION

Margaret Levenstein, University of Michigan, Moderator
Nuno Bandeira, University of California, San Diego
Jessie Tenenbaum, Duke University and the North Carolina Department of Health and Human Services
Georgia (Gina) Tourassi
From page 14...
... The NIH-funded Center for Computational Mass Spectrometry at the University of California, San Diego, develops algorithms for large-scale analyses of mass spectrometry data and for two prominent service platforms for sharing mass spectrometry data. One platform is the Mass Spectrometry Interactive Virtual Environment,11 which contains more than 10,000 data sets that are assigned identifiers and shared in conjunction with publications.
From page 15...
... laboratories pride themselves on being stewards of data and tools across various domains. For example, in a partnership between DOE and the National Cancer Institute, Tourassi explores the use of high-performance computing and large-scale data analytics on cancer registry data -- 70 percent of the data collected across cancer registries are text data that need to be curated.
From page 16...
... Tourassi concluded by emphasizing the value of sustained infrastructure investment, which can advance innovation, support scientific discovery, and improve clinical practice. Robert Williams, University of Tennessee Health Science Center, began his career with a study of brain architecture, which demanded a global approach and the integration of data with primitive tools such as Excel, FileMaker, and FileVision.
From page 17...
... He added that if an investigator can use the same platform for both data analysis and data sharing, incentives increase and costs decrease. The Center for Computational Mass Spectrometry's systems offer continuous reanalysis of data; as new knowledge becomes available, it is automatically transferred to the data sets over the course of a project, thus reducing the data analysis burden for researchers and connecting them to other researchers with overlapping data sets.
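The continuous-reanalysis idea, in which newly acquired reference knowledge is automatically propagated to previously deposited data sets, might be sketched as follows. The "matching" step is reduced to a dictionary lookup and every name and value is a placeholder; this is a toy picture of the mechanism, not the center's actual system.

```python
# Toy sketch of continuous reanalysis: when the reference library gains
# a new identification, stored data sets are re-searched and their
# annotations updated automatically. All names and data are placeholders.

reference_library = {"sig-001": "peptide X"}  # known spectrum -> identity

datasets = {
    "dset-1": {"spectra": ["sig-001", "sig-002"], "annotations": {}},
    "dset-2": {"spectra": ["sig-002"], "annotations": {}},
}

def reanalyze_all():
    """Re-match every stored spectrum against the current library."""
    for ds in datasets.values():
        for sig in ds["spectra"]:
            if sig in reference_library:
                ds["annotations"][sig] = reference_library[sig]

reanalyze_all()
# Later, new knowledge arrives; rerunning the analysis transfers it to
# old data sets with no effort from the original depositors, and links
# data sets that now share an identified spectrum.
reference_library["sig-002"] = "peptide Y"
reanalyze_all()

print(datasets["dset-2"]["annotations"])  # {'sig-002': 'peptide Y'}
```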
From page 18...
... Levenstein noted that data are active resources that extend beyond FAIR principles, thus creating both challenges and opportunities for the future. Tenenbaum asserted that financial risks and privacy risks are not mutually exclusive -- when a breach occurs, both types of risk are realized -- and noted that data provenance is another area in which to consider risk.
From page 19...
... With the emergence of tools that allow for joint analysis, there is an influx of transcriptomics data surfacing alongside the proteomics data. Martone wondered how repositories could communicate and coordinate in the absence of a centralized entity, and Bandeira noted that replication would be unavoidable.

