National Academies Press: OpenBook

Life-Cycle Decisions for Biomedical Data: The Challenge of Forecasting Costs (2020)

Chapter: 6 Applying the Framework to a New Data Set

Suggested Citation:"6 Applying the Framework to a New Data Set." National Academies of Sciences, Engineering, and Medicine. 2020. Life-Cycle Decisions for Biomedical Data: The Challenge of Forecasting Costs. Washington, DC: The National Academies Press. doi: 10.17226/25639.

6

Applying the Framework to a New Data Set

Per the statement of task, the cost-forecasting framework was applied to a second scenario: the development of a new data set on a State 1 (primary research) platform.

USE CASE 2: ESTIMATING COSTS ASSOCIATED WITH A PRIMARY RESEARCH DATA SET

The cost-forecasting framework is applied to a proposed State 1 (primary research) data platform. The study committee applied the framework as might a young investigator (see Box 6.1). Box 6.2 demonstrates the logic introduced by the forecaster who, although enthusiastic, might be less experienced and unaware of available resources.

Applying the Framework to Use Case 2

Using the forecasting steps provided in Table 4.1, the forecaster (in this case, the researcher) begins to construct the cost forecast.

Step 1. Determine the type of data resource environment, its data state(s), and how data might transition between those states during the data life cycle.

The forecaster examines the request for applications (RFA) for requirements related to data management. Comparing the RFA requirements with the descriptions of the data states in Chapter 2, the forecaster determines that this will be a State 1 (primary research) platform for her laboratory's use. However, the forecaster also plans to transfer the data to a State 2 (active) repository, so funding for transfer activities between platforms will also be considered.

Step 2. Identify the characteristics of the data (Chapter 4), data contributors, and users.

In light of the project's needs, goals, and RFA requirements, the forecaster makes the following preliminary assumptions about the data, which will be refined as the cost forecast proceeds.


Data Characteristics (Section A, Appendix E)

  • The data are moderate in size: gigabytes (GB) per individual data set (several mature software packages currently support functional magnetic resonance imaging, or fMRI).
  • There is a moderate number of files, and individual files are of moderate size.
  • Sizes of data sets will be stable over the life of the project.
  • There are multiple neuroimaging modalities.
  • The data are complex.
  • There are significant metadata requirements.
  • Data will come from a single contributor.

Data acquisition costs can be estimated because the number of subjects will be known ahead of time through institutional approvals. If the researcher decides to use a newer technology (e.g., multiband imaging), data sizes will increase fourfold to fivefold, and the computational methods for processing and analyzing the data are less well established. In that case, the raw k-space data¹ will be kept available for reprocessing as new algorithms and approaches emerge.
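The fourfold-to-fivefold growth under multiband imaging can be turned into a rough storage range. The sketch below is a minimal illustration, not part of the committee's analysis: it assumes 40 subjects, a figure derived here from the approximate totals given in Table 6.2 (~10 GB per subject, ~400 GB overall), and it deliberately leaves out the space that retained raw k-space data would need, because no size is given for them.

```python
# Rough storage forecast for standard versus multiband fMRI acquisition.
# Assumption (not stated in the report): 40 subjects, derived from the
# approximate Table 6.2 figures of ~10 GB per subject and ~400 GB in total.

GB_PER_SUBJECT_STANDARD = 10.0
N_SUBJECTS = 40
MULTIBAND_FACTORS = (4.0, 5.0)  # "fourfold to fivefold" increase in data size


def total_storage_gb(n_subjects: int, gb_per_subject: float, factor: float = 1.0) -> float:
    """Total data volume in GB, scaled by an acquisition-dependent factor."""
    return n_subjects * gb_per_subject * factor


def multiband_range_gb(n_subjects: int, gb_per_subject: float) -> tuple[float, float]:
    """Low and high storage estimates if multiband imaging is chosen.

    Raw k-space data are NOT included; their size would have to be added.
    """
    low, high = MULTIBAND_FACTORS
    return (
        total_storage_gb(n_subjects, gb_per_subject, low),
        total_storage_gb(n_subjects, gb_per_subject, high),
    )


if __name__ == "__main__":
    standard = total_storage_gb(N_SUBJECTS, GB_PER_SUBJECT_STANDARD)
    low, high = multiband_range_gb(N_SUBJECTS, GB_PER_SUBJECT_STANDARD)
    print(f"standard: {standard:.0f} GB")
    print(f"multiband: {low:.0f}-{high:.0f} GB (plus raw k-space data)")
```

Under these assumptions the script prints 400 GB for the standard protocol and 1600-2000 GB for multiband, which suggests why the choice of acquisition technique would likely weigh on the size and processing-level cost drivers.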

Because the forecaster is, at this point, estimating costs only for her own use of the data, she skips the questions regarding the user community (Section F, Appendix E) but keeps in mind that the data may be of value to others in the future.

Step 3. Identify the current and potential value of the data and how the data value might be maintained or increased with time.

Perceived value is difficult to predict. However, all data sets underlying the results of a study will be made public so that the data can be inspected and reanalyzed. The availability of public data sets may also encourage technology development if she chooses to use more advanced techniques. As outlined in Box 6.2, if the data are well annotated and prepared according to community standards, they might be an important source of information and data for designing future studies.

Step 4. Identify the personnel and infrastructure likely necessary in the short and long terms.

Based on the State 1 (primary research) activities and the activities necessary to prepare data for State 2 (active), described in Tables 2.1 and 2.2, respectively, the forecaster identifies the relevant major activities. Table 6.1 lists the project objectives (informed by the RFA), the relevant activities, and the necessary personnel (based on Table 2.1).

Step 5. Identify the major cost drivers associated with each activity based on the steps above, including how decisions might affect future data use and its cost.

The forecaster consults Table 4.2 to identify the cost drivers likely to be important for a State 1 resource and fills in the cost-driver template in Appendix E (see Table 6.2, which follows the discussion of the use case). Not all of the guiding questions in Chapter 4 and in the template apply to this case, so the forecaster revises the template to delineate costs and decision points as completely as possible.

The relative costs related to data acquisition for this use case are straightforward to predict using the cost-forecasting framework. The relative costs associated with the cost drivers identified in Table 4.2 are listed below, based on the assessment made while filling in Table 6.2. In a real-world application of the cost-forecasting framework, these costs would be quantified with the help of State 2 (active) repository resources.

___________________

1 K-space data are arrays of numbers that represent different spatial frequencies of the image.

  • A: Content → Likely low-medium
  • B: Capabilities → Likely low
  • C: Control → Likely medium
  • D: External Context → Likely low
  • E: Data Life Cycle → Likely low-medium
  • F: Contributors and Users → Likely low-medium
  • G: Availability → Likely low-medium
  • H: Confidentiality, etc. → Likely medium
  • I: Maintenance and Operations → Likely low
  • J: Standards, etc. → Likely medium-high

TABLE 6.1 Map of the Use Case 2 Scenario to Data States, Activities, and Subactivities

Columns: Project Objectives and Tasks; States, Activities, and Subactivities (a); Personnel
  1. Review of the literature and publicly available resources leads to a proposal to assess the feasibility of fMRI measurement techniques for this purpose.
    States, activities, and subactivities: I.B.1
    Personnel: Researcher, data scientist, software engineer, research domain project manager, policy specialist, administrative staff
  2. Consider various funding sources and determine that potential funders expect collected data to be publicly shared.
    States, activities, and subactivities: I.A.1, I.B.2
    Personnel: Researcher, records management specialist, data scientist, data librarian, education specialist, policy specialist, software engineer, research domain project manager, administrative staff
  3. Assess the suitability of existing repositories for the ultimate data deposit. Outline in the data management plan the management and sharing approaches and the cost estimates while the data are under her stewardship. Describe consent methods for sharing data.
    States, activities, and subactivities: I.B.3, I.B.4, I.B.5
    Personnel: Researcher, data scientist, software engineer, research domain project manager, policy specialist, administrative staff
  4. Consider available tools for collecting, processing, and validating data using community-accepted standards. Consider the documentation and curation levels required.
    States, activities, and subactivities: I.A.2, I.A.3, I.C
    Personnel: Researcher, records management specialist, data scientist, data librarian, metadata librarian, education specialist, policy specialist, research domain project manager, research domain curator, software engineer
  5. Data management processes are in place to maintain primary and derived data (given evolving technologies). Derived data may include data in deidentified form.
    States, activities, and subactivities: I.C.3
    Personnel: Researcher, metadata librarian, data scientist, research domain project manager, research domain curator, software engineer
  6. Deposit data in the chosen repository on a regular schedule or when all data collection and analysis are complete.
    States, activities, and subactivities: I.D
    Personnel: Researcher, research domain project manager, IT project manager, software engineer, data wrangler

a The activity numerals correspond to the column labels in Table 2.1.
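Step 4 asks which personnel are needed overall. Because several roles recur across tasks, counting roles over the Personnel column of Table 6.1 gives both the distinct roles to budget for and how often each is involved. In this sketch the personnel lists are transcribed from the table ("administration staff" normalized to "administrative staff"); the short task labels are paraphrases added here.

```python
from collections import Counter

# Personnel column of Table 6.1, one entry per project objective/task.
# Task labels are short paraphrases added for readability.
TABLE_6_1 = {
    "1 literature review and feasibility": [
        "researcher", "data scientist", "software engineer",
        "research domain project manager", "policy specialist", "administrative staff",
    ],
    "2 funding sources and sharing expectations": [
        "researcher", "records management specialist", "data scientist",
        "data librarian", "education specialist", "policy specialist",
        "software engineer", "research domain project manager", "administrative staff",
    ],
    "3 repository choice and data management plan": [
        "researcher", "data scientist", "software engineer",
        "research domain project manager", "policy specialist", "administrative staff",
    ],
    "4 tools, standards, and curation levels": [
        "researcher", "records management specialist", "data scientist",
        "data librarian", "metadata librarian", "education specialist",
        "policy specialist", "research domain project manager",
        "research domain curator", "software engineer",
    ],
    "5 primary and derived data management": [
        "researcher", "metadata librarian", "data scientist",
        "research domain project manager", "research domain curator", "software engineer",
    ],
    "6 repository deposit": [
        "researcher", "research domain project manager", "IT project manager",
        "software engineer", "data wrangler",
    ],
}


def role_counts(table: dict[str, list[str]]) -> Counter:
    """Number of tasks in which each role appears (roles are unique within a task)."""
    return Counter(role for roles in table.values() for role in roles)


if __name__ == "__main__":
    counts = role_counts(TABLE_6_1)
    print(f"{len(counts)} distinct roles")
    for role, n in counts.most_common():
        print(f"{n}  {role}")
```

The tally yields 13 distinct roles. The researcher, software engineer, and research domain project manager appear in every task, while the IT project manager and data wrangler appear only in the final deposit task.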


TABLE 6.2 Decision Points for Use Case 2

Category Cost Driver Decision Points/Issues Relative Cost Potential (Low, Medium, High)
A. Content
A.1 Size (volume and number of items)

> size = higher costs
  1. What is the order of magnitude of data that will be produced?
    GB.
  2. How large is an average data set?
    Per subject ~ 10 GB (multiple scans over time).
  3. Are the data sizes likely to stay stable over the life of the project?
    Yes.
  4. What is the total amount of data expected?
    ~400 GB.
  5. How many individual files in a typical data set?
    Hundreds.
  6. If the data are to be transferred to a repository for long-term management, is there a cost depending on size?
    No. Data will be submitted to OpenNeuro, which currently does not have costs associated with these data.
  7. Are there publicly available data that can be used to augment these data or perform preliminary analyses?
    No relevant data were found.
L-M
A.2 Complexity and Diversity of Data Types

> complexity + diversity = higher cost
  1. How complex is the underlying structure of the data?
    Complex (image data).
  2. How complex is the experimental paradigm that produced the data?
    Standard fMRI block design.
  3. What sort of additional data are acquired along with the primary data?
    Cognitive assessments, statistical maps, demographic data.
  4. How many different data types are being produced?
    Multiple modalities.
  5. What are the relationships among these data types—for example, are the data correlated?
    Not applicable.
M
A.3 Metadata Requirements

> metadata amounts + type = higher cost
  1. How much metadata must be stored with the data to make them findable, accessible, interoperable, and reusable?
    Basic descriptive metadata, imaging parameters, experimental metadata, processing metadata, anatomical metadata.
  2. How are metadata recorded?
    In data file headers, in the Neuroimaging Data Model (NIDM), in laboratory notebooks, and in Brain Imaging Data Structure (BIDS) manifests.
M
A.4 Depth Versus Breadth

> breadth = higher cost
  1. Is this study part of a multicenter study?
    No.
  2. How many institutions/collaborators are involved?
    Not applicable.
L
A.5 Processing Level and Fidelity

> compression = lower cost
  1. Do the raw data need to be stored?
    K-space data not stored for standard fMRI. Will likely store k-space data if multiband imaging used.
  2. Do processed data need to be stored?
    Yes. Analyses are performed on the reconstructed data.
  3. Are there compression algorithms that can reduce the file size without compromising fidelity?
    Data files are not that large, so compression not typically used.
  4. What kind of data structure requirements will the resource have?
    No particular structure enforced by imaging center. Data submitted to OpenNeuro must be organized according to the BIDS standard.
  5. Is the data contributor or the repository responsible for any restructuring necessary?
    Researcher is responsible for restructuring data transferred to OpenNeuro.
  6. How is the data structure verified?
    We will likely implement the BIDS validator within our imaging pipeline.
H
A.6 Replaceability of Data

> replaceability = lower cost
  1. Are there existing data sets that might be used instead of gathering primary data?
    Not to our knowledge.
  2. Are the data managed by an institutional repository?
    Our imaging center provides primary storage.
  3. Are there copies of the data elsewhere?
    Local copy of data kept on a workstation in laboratory.
  4. Can the data be easily recreated?
    No. It would be expensive to retest subjects. Disease progression information would be lost.
L
B. Capabilities
B.1 User Annotation

> user annotation functions = higher cost
  1. How long does it take to annotate/segment a data set?
    Processing does not take very long.
  2. Is the process largely manual or automated?
    Analysis data annotation is fully automated; experimental and descriptive metadata are added manually.
  3. Are these annotations stored with the data?
    They are in a separate file.
  4. Is the relationship (provenance) between the data file and the annotations recorded in the metadata?
    No, the association is captured through file-naming conventions.
L
B.2 Persistent Identifiers

type of identifier = potential costs
  1. What persistent identifiers are used when annotating these data (e.g., Open Researcher and Contributor Identifiers, Ontology IDs)?
    None.
  2. How are these persistent identifiers accessed?
    Not applicable.
L
B.3 Citation

> citation functions = increased cost
  1. Are the contributors to the production of a data set recorded in the metadata?
    No.
  2. Is there a plan to submit the data to a repository that supports data citation?
    Yes.
L
B.4 Search Capabilities

> advanced search may lead to decreased cost
  1. Does the platform where the data are stored provide any search functions?
    Just the native functions of the storage system (search on file name, creation date, owner, etc.).
  2. Was a search performed to locate data sets that might be relevant to this study?
    Yes.
  3. What tools were used?
    OpenNeuro; PubMed.
L
B.7 Data Analysis and Visualization

> services = higher cost
  1. What type of data visualization tools are required?
    Interactive viewing of images and 3D volumes; visualization of statistical maps. Freely available open-source tools used.
  2. What types of other data operations need to be supported?
    Processing pipelines for the data; signal-extraction tools.
  3. Do these services require significant computational resources?
    Moderate.
  4. Is there an explicit cost associated with compute resources?
    Basic compute time is included with the fee paid to imaging center; many operations run locally on workstation.
L
C. Control
C.2 Quality Control

> quality control = increased cost
  1. What quality control processes are used?
    Some automated and manual inspection of the data for issues such as motion artifacts.
  2. Does the public data repository have any quality control requirements?
    OpenNeuro requires the data to be in BIDS format, so the BIDS validator is run.
L
C.3 Access Control

> controls = increased cost
  1. What types of access control are required for the data?
    Human-subjects data—institutional requirements for handling human-subjects data followed. Only qualified laboratory personnel can access the data.
  2. How is access to data managed, e.g., data access committees?
    The principal investigator is responsible for managing access to the data.
L
C.4 Platform Control

> platform restrictions = increased cost
Are there restrictions on the type of platform that must be used for storing or analyzing the data?
Yes. Data infrastructure must adhere to our institution’s security requirements for storing human-subjects data.
M
D. External Context
D.1 Resource Replication

> replication = increased cost
Is there a requirement to replicate the information resource at multiple sites (i.e., mirroring)?
The imaging center backs up primary data to a local private cloud. Costs associated with replication are included in our fee to the imaging center.
L
D.2 External Information Dependencies

> external dependencies may or may not increase cost
Will the resource be dependent on information maintained by an outside source?
No.
L
E. Data Life Cycle
E.1 Anticipated Growth

> growth = increased costs
  1. Is the total amount of data to be generated over the course of the project known?
    Yes.
  2. Are there any factors that might affect the amount of data?
    Not likely. The possibility that the chosen techniques could increase data sizes has been accounted for; approval has been obtained to collect data from a specified number of subjects, and the processing pipelines are well established.
L
E.2 Update and Versions

> updates + multiple versions = increased cost
  1. Are multiple versions of the data created?
    Yes, sometimes we have to reprocess individual subjects.
  2. If so, how are they managed locally?
    Through the file names.
M
E.3 Useful Lifetime

limited lifetime = decreased cost
  1. Are the data likely to have a limited period of usefulness?
    Hard to predict; it will depend on the rate at which imaging technology evolves and whether new processing approaches are developed to compare our data to data collected by new instruments.
  2. Are there specific data retention institutional or regulatory requirements for these data?
    Copies of all study data generally kept for at least 5 years after the study is completed.
L
E.4 Offline and Deep Storage

> offline/deep storage = decreased costs

> transfers = increased cost
  1. For long-term storage of laboratory data, are there offline/deep storage resources available?
    Yes, the institution runs a data archive for faculty research.
  2. Is there a plan for migrating laboratory data to a State 3 archive for long-term preservation?
    Yes, data will be placed in the institutional archive after the study is completed.
M
F. Contributors and Users
F.1 Contributor Base

> number and diversity of contributors = increased cost
  1. Is the number of contributors known? If not, can it be estimated?
    Just our laboratory members.
  2. Are all the data originating from the same source (e.g., a single instrument or a single organization)?
    Yes.
L
F.2 User Base and Usage Scenarios

> access and diversity of users = increased cost
  1. How many users will likely access the data?
    Laboratory members (currently six).
  2. What will be the frequency of access?
    Data accessed daily during the study and processing phase.
  3. How will users access the data?
    Necessary compute infrastructure is available—the data will be on local machines.
  4. Will the resource be building analysis tools?
    Yes, customized pipelines for processing our data, based on open-source toolkits, will be built.
  5. How many different types of users must be supported?
    Not applicable.
L
F.3 Training and Support Requirements

> training + services = increased cost
  1. Is special training required for data upload to the repository?
    Yes.
  2. What form will the training take?
    Online tutorials and workshops.
  3. How long will this training take?
    We will attend a training workshop on BIDS.
  4. What is the skill level required for data wrangling?
    Moderate knowledge of neuroimaging and computer skills.
M
G. Availability
G.1 Tolerance for Outages

> reliability = increased costs
What is the tolerance for outages of the resource?
Reliable access to the data is necessary. Adequate backups and system performance will be maintained; scheduled outages for system patches and upgrades are tolerable.
M
G.4 Local Versus Remote Access

> cloud could lead to increased costs
  1. Does the resource require that any data be shipped via physical media?
    No, that is not likely. We have adequate bandwidth to transmit our data where required.
  2. Will commercial clouds be used?
    No, not for primary storage.
L
H. Confidentiality, Ownership, and Security
H.1 Confidentiality

> confidentiality = increased cost
  1. Will any of the data require special protections?
    Yes, they are human-subjects data.
  2. Are there any audit requirements for those who have accessed or downloaded the data?
    No, we expect no users outside of laboratory staff.
M
H.2 Ownership

> ownership = increased costs
  1. Do rights to use the data have to be negotiated with collaborators, institutions, commercial entities, or funders?
    No.
  2. Will all data be released under the same license, or will different permissions be assigned to different data sets?
    Data will be released under the license used by OpenNeuro.
  3. Will data submission agreements be necessary?
    No.
L
H.3 Security

> security = increased cost
  1. What types of security measures must be taken to protect against loss or corruption of data?
    Standard practices will be used.
  2. Do these measures require using protected computing, storage, or networking platforms?
    Yes.
L
I. Maintenance and Operations
I.1 Periodic Integrity Checking

> integrity checking = increased cost
  1. What processes will be put in place for checking the integrity of the hardware, software, and data?
    We do not have any specific processes for this.
  2. How frequently will these checks be performed?
    Not applicable.
L
I.2 Data-Transfer Capacity

> data-transfer upgrades = increased cost
Will the bandwidth available be sufficient for the data sizes and rates required for transfer/access?
Yes. Campus connectivity recently upgraded. No problems anticipated.
L
I.3 Risk Management

> risk mitigation = increased cost
  1. Will the researcher be solely responsible for risk mitigation?
    Yes
  2. Is a response plan for unexpected termination required?
    No
H
I.4 System-Reporting Requirement

> system-reporting requirements = increased costs
What types of system reporting will the resource be required to do?
None.
L
I.5 Billing and Collections
Will there be charges for use of the resource?
No. All laboratory members have free access.
J. Standards, Regulatory, and Governance Concerns
J.1 Applicable Standards

> mature standards = decreased costs
  1. How many different standards will be needed for the data?
    Will use BIDS and NIDM along with standard registration tools to a common coordinate space.
  2. Do these standards exist?
    Yes.
  3. Has the researcher worked with the standards before?
    Yes.
  4. Are the standards mature?
    Yes.
  5. Are tools (e.g., data validators and converters) available for the standards, or do they have to be developed?
    Yes.
  6. How frequently will the standards update?
    BIDS is a fairly mature standard. It is currently on version 1.2.1.
  7. Do the standards require spatial transformations?
    Yes.
  8. How many file formats will be supported?
    Digital Imaging and Communications in Medicine (DICOM) is used.
  9. Is there an open file format available?
    Yes. Neuroimaging Informatics Technology Initiative (NIfTI).
H
J.2 Regulatory and Legislative Environment

> regulation = increased cost
  1. What laws and regulations cover the data and operation of the resource?
    Health Insurance Portability and Accountability Act (HIPAA).
  2. Is the resource covered by an open-records act?
    Not applicable.
L
J.3 Governance

> outside governance = increased costs
  1. How are decisions regarding data use managed?
    Not applicable, no use outside the laboratory (i.e., no collaborators).
  2. Is a formal data-sharing agreement in place among the collaborators?
    Not applicable.
L
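For budgeting, the driver-level ratings in Table 6.2 can be grouped to produce a short worklist of the decision points most likely to need funds. The ratings below are transcribed from the table (I.5 carries no rating and is recorded as None); grouping by rating is an illustrative step introduced here, not something the framework prescribes.

```python
from collections import defaultdict
from typing import Optional

# Driver-level "Relative Cost Potential" ratings transcribed from Table 6.2.
DRIVER_RATINGS: dict[str, Optional[str]] = {
    "A.1 Size": "L-M",
    "A.2 Complexity and Diversity of Data Types": "M",
    "A.3 Metadata Requirements": "M",
    "A.4 Depth Versus Breadth": "L",
    "A.5 Processing Level and Fidelity": "H",
    "A.6 Replaceability of Data": "L",
    "B.1 User Annotation": "L",
    "B.2 Persistent Identifiers": "L",
    "B.3 Citation": "L",
    "B.4 Search Capabilities": "L",
    "B.7 Data Analysis and Visualization": "L",
    "C.2 Quality Control": "L",
    "C.3 Access Control": "L",
    "C.4 Platform Control": "M",
    "D.1 Resource Replication": "L",
    "D.2 External Information Dependencies": "L",
    "E.1 Anticipated Growth": "L",
    "E.2 Update and Versions": "M",
    "E.3 Useful Lifetime": "L",
    "E.4 Offline and Deep Storage": "M",
    "F.1 Contributor Base": "L",
    "F.2 User Base and Usage Scenarios": "L",
    "F.3 Training and Support Requirements": "M",
    "G.1 Tolerance for Outages": "M",
    "G.4 Local Versus Remote Access": "L",
    "H.1 Confidentiality": "M",
    "H.2 Ownership": "L",
    "H.3 Security": "L",
    "I.1 Periodic Integrity Checking": "L",
    "I.2 Data-Transfer Capacity": "L",
    "I.3 Risk Management": "H",
    "I.4 System-Reporting Requirement": "L",
    "I.5 Billing and Collections": None,  # no rating given in the table
    "J.1 Applicable Standards": "H",
    "J.2 Regulatory and Legislative Environment": "L",
    "J.3 Governance": "L",
}


def group_by_rating(ratings: dict[str, Optional[str]]) -> dict[Optional[str], list[str]]:
    """Map each rating (or None) to the drivers carrying it, in table order."""
    groups: dict[Optional[str], list[str]] = defaultdict(list)
    for driver, rating in ratings.items():
        groups[rating].append(driver)
    return dict(groups)


if __name__ == "__main__":
    groups = group_by_rating(DRIVER_RATINGS)
    for rating in ("H", "M", "L-M", "L"):
        print(f"{rating}: {len(groups.get(rating, []))} drivers")
    print("High:", ", ".join(groups["H"]))
```

Three drivers are rated High (A.5, I.3, and J.1), eight Medium, one Low-Medium, and 23 Low, a profile consistent with a single-laboratory State 1 resource.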

Decisions made in the project planning stage, and the information resources available to the researcher during that planning, can influence overall project costs, study outcomes, and future data curation and preservation. For example, if the data are to be transferred to a repository with submission requirements, additional data preparation costs may be incurred. If the forecaster/researcher uses no formal data management software in the laboratory, she can decide to include additional costs in the budget to account for the effort. Funds could be requested for a data manager or wrangler to manage the data and set up the infrastructure necessary to adhere to data formatting standards. Automated pipelines could also support regular transfer to a State 2 (active) repository. The cost to implement those pipelines may be higher up front, but the pipelines could save many hours of staff time over the duration of the project.
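The pipeline trade-off described above is essentially a break-even calculation: automation pays off once the staff hours it saves, valued at a loaded hourly rate, exceed its up-front cost. The sketch below is a simplified illustration: every number in it is a hypothetical placeholder (the report gives no figures for this use case), and it ignores the ongoing cost of maintaining the pipeline.

```python
import math

# Hypothetical inputs, for illustration only.
UPFRONT_COST = 12_000.0      # e.g., data-wrangler effort to build and test a pipeline
HOURS_SAVED_PER_MONTH = 10.0  # manual preparation/transfer hours avoided
HOURLY_RATE = 60.0           # loaded staff cost per hour
PROJECT_MONTHS = 36          # assumed duration of the State 1 period


def break_even_months(upfront_cost: float, hours_saved_per_month: float, hourly_rate: float) -> float:
    """Months of operation after which cumulative savings cover the up-front cost.

    Returns math.inf when the pipeline saves nothing (it never pays off).
    Pipeline maintenance costs are ignored in this simplified model.
    """
    monthly_saving = hours_saved_per_month * hourly_rate
    if monthly_saving <= 0:
        return math.inf
    return upfront_cost / monthly_saving


if __name__ == "__main__":
    months = break_even_months(UPFRONT_COST, HOURS_SAVED_PER_MONTH, HOURLY_RATE)
    print(f"break-even after {months:.1f} months")
    if months <= PROJECT_MONTHS:
        print("cost recovered within the project period")
    else:
        print("cost not recovered within the project period")
```

With these placeholder values the pipeline breaks even after 20 months, inside the assumed 36-month period; a forecaster would substitute institutional rates and realistic effort estimates.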

Because an individual forecaster, in this case a researcher in a primary research environment, cannot be responsible for estimating all costs for data management in perpetuity, the goal in applying the forecasting framework should be to estimate the costs incurred during data acquisition and stewardship while the data are in the researcher's control (i.e., the costs incurred while data are in State 1). However, the forecaster needs to be aware of requirements for long-term stewardship and be ready with the resources required (e.g., time, money, personnel) to prepare the data for transfer to a State 2 (active) repository if they are to be shared or, if not, to a State 3 repository for long-term preservation.

Step 6. Estimate the costs for relevant cost components based on the characteristics of the data and information resource.

In a quantitative cost forecast, the costs for the activities in the previous section would be quantified for each of the major cost components (e.g., Box 3.2). As noted previously in the report, quantifying costs depends on numerous case-specific factors, such as the objectives for the information resource, the personnel and infrastructure resources available to the forecaster, and host institution requirements. In a real cost forecast, all of these would be considered to arrive at monetary values.
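A minimal sketch of how Step 6 might roll component estimates into a total for the State 1 period: each line item is a quantity times a unit cost, and the forecast is their sum. The line items, quantities, and unit costs below are hypothetical, loosely suggested by the use case (a data wrangler, the imaging center fee, BIDS training); Box 3.2 defines the report's actual cost components, and a real forecast would use host-institution and imaging-center rates.

```python
from dataclasses import dataclass


@dataclass
class CostLine:
    """One quantified cost item: quantity multiplied by unit cost (dollars)."""

    component: str
    quantity: float   # e.g., FTE-years, years of service, attendees
    unit_cost: float  # dollars per unit

    @property
    def total(self) -> float:
        return self.quantity * self.unit_cost


def forecast_total(lines: list[CostLine]) -> float:
    """Sum all line items into a single forecast figure."""
    return sum(line.total for line in lines)


# Hypothetical line items, for illustration only.
LINES = [
    CostLine("data manager/wrangler effort (FTE-years)", 0.5, 80_000.0),
    CostLine("imaging center storage and compute fee (years)", 3, 2_000.0),
    CostLine("BIDS training workshop (attendees)", 2, 500.0),
]

if __name__ == "__main__":
    for line in LINES:
        print(f"{line.component}: ${line.total:,.0f}")
    print(f"total: ${forecast_total(LINES):,.0f}")
```

Keeping each item as quantity times unit cost makes it easy to rerun the forecast when a decision point changes, for example if multiband imaging raises storage needs.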

REFERENCE

Maumet, C., T. Auer, A. Bowring, G. Chen, S. Das, G. Flandin, S. Ghosh, et al. 2016. Sharing brain mapping statistical results with the neuroimaging data model. Scientific Data 3:160102. https://doi.org/10.1038/sdata.2016.102.
