
Currently Skimming:

1 Introduction
Pages 10-23

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 10...
... Archivists might also need to determine whether greater value should be placed on particular data types and to consider how risk management might inform data-preservation and archiving strategies. To support the scientific endeavor, all involved in managing the data throughout the data life cycle need to consider how choices regarding their data affect the costs of future preservation, management, and use, regardless of who bears those costs.
From page 11...
... COMMITTEE INFORMATION GATHERING AND APPROACH TO ITS TASK Given the complexity of the task, the committee employed a series of public meetings, a workshop, multiple site visits, and one-on-one communication to hear from the numerous stakeholders in the biomedical research community. These stakeholders included biomedical-science researchers at academic or nonprofit institutions; data scientists and institutional administrators within academic, private, and public sectors; data archivists; software engineers; data platform managers; and many others -- all individuals who make decisions about data throughout the entire data life cycle.
From page 12...
... ;
●  Cost consequences for various practices in accessioning and deaccessioning data sets;
●  Economic factors to be considered in designating data sets as high value;
●  Assumptions built into the data collection and/or modeling processes;
●  Anticipated technological disruptors and future developments in data science in a 5- to 10-year horizon; and
●  Critical factors for successful adoption of data forecasting approaches by research and program management staff.
The committee will provide two case studies illustrating application of the framework to different biomedical contexts relevant to the National Library of Medicine's data resources.
From page 13...
... The committee also met with NIH staff across institutes that run various data repositories. Ultimately, the committee members met with more than 100 individuals with different perspectives and expertise who engaged with research data at different stages within the data life cycle.
From page 14...
... . Examples of primary research data repositories include the Protein Data Bank,5 the National Institute of Mental Health Data Archive, and the National Archive of Computerized Data on Aging;6 examples of knowledge bases include UniProt7 and the Monarch Initiative.8 The distinction between a database and a knowledge base is not always clean -- many digital repositories serve dual purposes -- but NIH defines the primary function of a data repository as being to "ingest, archive, preserve, manage, distribute, and make accessible the data related to a particular system or systems."9 The primary function of a knowledge base, according to NIH, is to "extract, accumulate, organize, annotate, and link growing bodies of information related to core datasets."10 A third type of digital artifact that does not fit neatly into these categories comprises the many digital spatial atlases that cover structures such as the nervous system (e.g., Brainspan Atlas of the Developing Human,11 Cell Atlas of Mouse Brain-Spinal Cord Connectome12)
From page 15...
... The NIF project shows that, of the 200 data repositories listed, only 18 have gone out of service or have merged with other entities, and approximately 10 others
15  The website for NIH's Data Sharing Repositories is https://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html, accessed August 13, 2019.
16  The website for the Neuroscience Information Framework is https://neuinfo.org/, accessed December 2, 2019.
From page 16...
... . FIGURE 1.2  The number of repositories in the Neuroscience Information Framework Registry classified by data type.
From page 17...
... The variety of expertise and types of infrastructures and services required to work with diverse data make it unlikely that biomedicine will ever be served by a single large data resource; multiple archives and data repositories will continue to exist, even for the same type of data. The advantages of such an approach are specialized tools and services and a certain amount of robustness and innovation in the ecosystem.
From page 18...
... TABLE 1.2.1  Research Data Management and Limitations

                                    Data Collection   Data Analysis   Data Sharing
Limits
  The amount of time it takes             69.60           71.30          79.46
  Lack of best practices                  43.20           48.70          49.11
  Lack of incentives                      36.80           32.18          37.50
  Lack of knowledge/training              32.80           40.87          41.07
  The financial cost                      17.60            8.70          22.32
  Other                                    7.20            6.09           5.36
Motivations
  Prevent loss of data                   100.00           85.83          78.57
  Ensure access for collaborators         76.80           73.33          70.53
  Openness and reproducibility            63.20           64.17          66.96
  Institutional data policy               52.00           39.17          47.32
  Publisher/funder mandates               35.20           28.33          41.96
  Availability of tools                   12.00            9.17           8.93
  Other                                    3.20            3.30           0.00

NOTE: Limits and motivations for RDM during the data collection, analysis, and sharing/publishing phases of a research project. All values listed are percentages of total participants.
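The "Limits" portion of the survey data above lends itself to a simple programmatic comparison across phases. The following is an illustrative sketch (not part of the report): it transcribes the percentages into a Python dictionary and picks out the most frequently cited limitation in each phase. The shortened row labels and the helper function `top_limit` are assumptions introduced here for readability.

```python
# Illustrative only: "Limits" percentages transcribed from Table 1.2.1.
# Each value is the percentage of total participants citing that limit.
limits = {
    "Data Collection": {
        "Time it takes": 69.60, "Lack of best practices": 43.20,
        "Lack of incentives": 36.80, "Lack of knowledge/training": 32.80,
        "Financial cost": 17.60, "Other": 7.20,
    },
    "Data Analysis": {
        "Time it takes": 71.30, "Lack of best practices": 48.70,
        "Lack of incentives": 32.18, "Lack of knowledge/training": 40.87,
        "Financial cost": 8.70, "Other": 6.09,
    },
    "Data Sharing": {
        "Time it takes": 79.46, "Lack of best practices": 49.11,
        "Lack of incentives": 37.50, "Lack of knowledge/training": 41.07,
        "Financial cost": 22.32, "Other": 5.36,
    },
}

def top_limit(phase):
    """Return (name, percentage) of the most cited limit for a phase."""
    return max(limits[phase].items(), key=lambda kv: kv[1])

for phase in limits:
    name, pct = top_limit(phase)
    print(f"{phase}: {name} ({pct:.2f}%)")
```

Running this confirms what the table suggests at a glance: time cost is the dominant reported limitation in every phase, and it peaks during data sharing.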
From page 19...
... . Communities attempting to implement FAIR principles recognize associated costs accrued by both the data provider and those providing data access.
From page 20...
... On the other hand, although perhaps too early to tell, implementation of the FAIR principles also has the potential to contribute to long-term data sustainability, as agreement on and adherence to standards and best practices can conceivably lower the cost of porting data from one archive to another. REPORT ORGANIZATION The statement of task requests a general cost-forecasting framework that is applicable to all data resources and throughout the data life cycle.
From page 21...
... Chapter 5 applies the framework of the cost forecast for a new repository and platform for biomedical research data, and Chapter 6 applies the framework to forecasting costs for new research in the primary research environment. Chapter 7 discusses potential economic, technology, policy, and legal disruptors that could affect data costs in the future.
From page 22...
... While NLM commissioned this study, the cost-forecasting framework presented as part of this report is intended to be useful across multiple stakeholder groups, including the following: • Researchers who need to estimate the costs involved in acquiring data, managing them effectively in the laboratory, and preparing them for submission to an archive; • Graduate students and other shorter-term research staff who may not see or appreciate the long-term cost benefits of good decision making related to data collection and curation; • Institutional officials at the researchers' home institutions -- these institutions bear significant shared and shifted operating and capital costs to maintain data infrastructure and supporting staff; • Archive managers who need to estimate costs when determining the amount of funding required to fulfill their mission and who may need to transfer their archives to platforms receiving greater or lesser use; • Program officers or other funding agency staff who are launching new programs and need to anticipate costs across the different stages of the project, including long-term preservation and access; and • Data preservationists who will need to estimate the costs for long-term preservation ahead of procuring or accepting data. Expanding this conversation among these and other stakeholders will not only advance data preservation, archiving, and access, but it will also foster rich scientific discovery.
From page 23...
... 2019. Guides for Researchers: How to Select a Data Repository?

