4 The Cost-Forecasting Framework: Identifying Cost Drivers in the Biomedical Data Life Cycle
Pages 44-77

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 44...
... It can be applied by anyone who generates, collects, or manages data at some point in the data life cycle, or it may be applied by a funding or institutional official. The framework walks the cost forecaster through the various characteristics of data and information resources to determine which of those are likely to represent major cost drivers in the short and long terms.
From page 45...
... 5. Identify the major cost drivers associated with each activity based on the steps above, including how decisions might affect future ...
• Identify the major cost drivers and associated uncertainties for each of the activities identified above by completing the cost-driver template (Appendix E)
From page 46...
... MAPPING COST DRIVERS TO ACTIVITIES IN EACH DATA STATE
The fifth step in the forecast is identifying the cost drivers and decision points associated with each anticipated activity and how those decisions might affect the ways data may be used, as well as the cost of those uses.
From page 47...
... When specifying and scoping a biomedical information resource, special attention should be given to these cost drivers because the ramifications of decisions related to them will strongly influence costs.
From page 48...
... , data management mandates, Institutional Review Board specifications, federal regulations, and journal requirements all influence costs across the data life cycle. Data management plans that incorporate costs and value across the data life cycle may reduce the cost and time required for later data deposit and sharing.
From page 49...
... Examples of how the template can be applied are provided in Chapter 5. Appendix F compares cost drivers for three hypothetical biomedical information resources (one for each data state)
From page 50...
... Capabilities
B.1 User annotation ✓ ✓ ✓ ✓
B.2 Persistent identifiers ✓ ✓ ✓ ✓ ✓ ✓ ✓
B.3 Citation ✓ ✓ ✓ ✓ ✓
B.4 Search capabilities ✓ ✓ ✓ ✓
B.5 Data linking and merging ✓ ✓ ✓
B.6 Use tracking ✓ ✓ ✓ ✓ ✓ ✓
B.7 Data analysis and visualization ✓ ✓ ✓
C Control
C.1 Content control ✓ ✓ ✓ ✓
C.2 Quality control ✓ ✓ ✓ ✓ ✓ ✓ ✓
C.3 Access control ✓ ✓ ✓ ✓
C.4 Platform control ✓ ✓ ✓
D
From page 51...
... Contributors and Users
F.1 Contributor base ✓ ✓ ✓ ✓ ✓ ✓ ✓
F.2 User base and usage scenarios ✓ ✓ ✓ ✓ ✓
F.3 Training and support requirements ✓ ✓ ✓ ✓ ✓ ✓
F.4 Outreach ✓ ✓ ✓ ✓
G Availability
G.1 Tolerance for outages ✓ ✓ ✓ ✓ ✓ ✓ ✓
G.2 Currency ✓ ✓ ✓ ✓
G.3 Response time ✓ ✓ ✓ ✓ ✓ ✓
G.4 Local versus remote access ✓ ✓ ✓
H
From page 52...
... Referring to the columns in Table 4.2, the forecaster can identify the major activities often associated with State 2 active repository and platform information resources (highlighted in green in the table; more detail about the activities is found in Table 2.2), and then the likely influential cost drivers (checked boxes)
From page 53...
... A.2 Complexity and Diversity of Data Types Data in some biomedical information resources, such as The Cancer Genome Atlas, were collected expressly for the resource. In such a situation, the resource managers have strong influence over the specific formats, standards, required fields, and other elements.
From page 54...
... It might indicate who collected or generated the data, where, and when. Having such information with the data in the repository can improve trust in the data, thereby increasing the value of a biomedical information resource.
From page 55...
... A.4 Depth Versus Breadth A biomedical information resource might be directed at a certain class of data (e.g., DNA sequences or cell images), regardless of the kind of study that generated them.
From page 56...
... This section covers aspects of a biomedical information resource that describe what information resource users are able to do with the data in the resource (i.e., without extracting the data into another environment)
From page 57...
... B.3 Citation A biomedical information resource might support citation of data items (or sets of items)
From page 58...
... B.6 Use Tracking A biomedical information resource might track uploads, access, and downloads of data items to inform contributors and resource operators about their use. Statistics of such operations may incentivize researchers to contribute data by providing evidence of data use.
From page 59...
... C.2 Quality Control A biomedical information resource may exercise more or less rigorous control on the quality of the information within it. At one extreme, it might leave all quality control to be the responsibility of the data contributors.
From page 60...
... C.3 Access Control A biomedical information resource might place restrictions on which users can see which data. For example, data may be embargoed from general release for a certain length of time, or the resource might provide private workspaces for individual users or groups. The data may also be consented for particular uses, in which case consent information will need to be linked to particular data items and consulted when deciding access permissions.
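The access rules described above (embargo periods, private workspaces, and consent linked to individual data items) can be sketched as a single permission check. This is only an illustrative sketch, not a method from the report: the `DataItem` fields, the `may_access` function, and the policy that members of the owning group may see embargoed data are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional, Set


@dataclass(frozen=True)
class DataItem:
    """Hypothetical data item carrying its own access metadata."""
    item_id: str
    owner_group: str                          # group whose private workspace holds the item
    embargo_until: Optional[date] = None      # None means no embargo
    consented_uses: FrozenSet[str] = frozenset()  # uses the data subjects consented to


def may_access(item: DataItem, user_groups: Set[str],
               intended_use: str, today: date) -> bool:
    """Return True if a user in `user_groups` may access `item` for `intended_use`."""
    # Consent is linked to the item and consulted first, for every user.
    if intended_use not in item.consented_uses:
        return False
    # Assumed policy: the owning group may work with its data during an embargo.
    if item.owner_group in user_groups:
        return True
    # Everyone else must wait until the embargo (if any) has ended.
    return item.embargo_until is None or today >= item.embargo_until


item = DataItem("sample-001", "lab-a", date(2030, 1, 1), frozenset({"research"}))
print(may_access(item, {"lab-b"}, "research", date(2025, 6, 1)))
```

Even in this small form, the check shows why access control is a cost driver: every item needs embargo and consent metadata that someone must capture, store, and keep current.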
From page 61...
... E Data Life Cycle This section deals with aspects of a biomedical information resource that concern how it is expected to evolve over time.
From page 62...
... E.2 Update and Versions The frequency of updates and the need to retain past versions for a biomedical information resource affect operating costs. Some resources provide periodic releases, which batch updates and apply them all at once, whereas other resources are revised incrementally as updates come in.
From page 63...
... F Contributors and Users This section covers aspects of a biomedical information resource associated with user characteristics and numbers that might influence costs.
From page 64...
... F.2 User Base and Usage Scenarios The number of people accessing an information resource and the frequency and kinds of access can all influence costs for a biomedical information resource. A resource that serves an entire research community will likely see much more use than, say, an internal project repository for a single research group.
From page 65...
... Data availability encompasses the reliability of the resource hosting the data, how quickly new data appear, how fast requests for data are serviced, and from where the data can be accessed. G.1 Tolerance for Outages Different biomedical information resources have different tolerances for system outages.
From page 66...
... G.4 Local Versus Remote Access While most biomedical information resources of which the committee is aware support remote access over the Internet, there are examples in other domains (e.g., film archives, defense-personnel information) where users must physically come to the resource to access it.
From page 67...
... These issues are complex subjects and warrant more attention than can be given in this report, but the questions provided here will allow the cost forecaster to identify the relevant cost drivers. H.1 Confidentiality A biomedical information resource may need to protect the confidentiality of the data it holds, because those data contain either personally identifiable information or sensitive intellectual property.
From page 68...
... I.1 Periodic Integrity Checking As part of ongoing maintenance, operators of a biomedical information resource will need to assess the integrity of its hardware, software, and data. The frequency and detail of such assessments will affect operating costs.
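One common way to check the integrity of stored data is to recompute a checksum for each file and compare it with a value recorded at deposit. The sketch below assumes that approach; the report does not prescribe a technique, and the manifest format (a mapping from file path to expected SHA-256 digest) and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_integrity_failures(manifest: Dict[str, str]) -> List[str]:
    """Return the paths that are missing or whose digest no longer matches.

    `manifest` is an assumed structure: file path -> digest recorded at deposit.
    """
    failures = []
    for path, expected in manifest.items():
        if not Path(path).is_file() or sha256_of(path) != expected:
            failures.append(path)
    return failures
```

Because every pass reads each file in full, the cost of a check of this kind grows with both the volume of data held and how often the check is run, which is the trade-off between frequency and detail noted above.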
From page 69...
... Example decision point related to system-reporting requirements: • What types of system reporting will the resource be required to do? I.5 Billing and Collections If the biomedical information resource charges for upload, access, and download of data, then there will need to be an operational function responsible for billing for and collection of those charges.
From page 70...
... J.2 Regulatory and Legislative Environment A biomedical information resource may be bound by laws and government regulations, particularly if it maintains information on individuals. Those requirements may entail additional record keeping or notification of ...
10 See the HDF Group at https://portal.hdfgroup.org/display/HDF5/Introduction+to+HDF5, accessed on May 12, 2020.
From page 71...
... J.3 Governance A biomedical information resource may have a policy-setting body for itself or as part of a larger organization. Policies may be set either initially or on an ongoing basis.
From page 72...
... Many of the activities and cost drivers in the template in Appendix E may not be directly applicable to a State 1 information resource, but the forecaster needs to remain aware of potential future cost drivers so that decisions might be made that could keep life-cycle costs low. In most circumstances, labor costs will be the largest single element of the forecast.
From page 73...
... Again, calculating the present discounted values of various options and courses of action will give the State 2 resource host a method to weigh the costs of one course of action against another. The present discounted value calculation will be particularly helpful given that a State 2 resource host must necessarily look a long way into the future, because it provides a meaningful way to sum up the long stream of operating costs that will be encountered, as well as required periodic reinvestments.
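As a rough illustration of the calculation, the present discounted value of a cost stream is the sum of each year's cost divided by (1 + r)^t, where r is the discount rate and t is the number of years from now. The sketch below uses that standard formula; the discount rate and the two cost streams are made-up figures for illustration only, not data from the report.

```python
from typing import Sequence


def present_discounted_value(costs: Sequence[float], rate: float) -> float:
    """Sum a stream of yearly costs, discounting the cost in year t by (1 + rate) ** t.

    costs[0] is incurred today (t = 0) and is therefore not discounted.
    """
    return sum(cost / (1.0 + rate) ** t for t, cost in enumerate(costs))


# Hypothetical 10-year comparison for a State 2 resource host.
# Option A: larger up-front investment, lower yearly operating cost.
option_a = [500_000.0] + [100_000.0] * 10
# Option B: smaller up-front investment, higher yearly operating cost,
# plus an assumed reinvestment of 150,000 in year 5.
option_b = [200_000.0] + [130_000.0] * 10
option_b[5] += 150_000.0

rate = 0.03  # assumed annual discount rate
print(f"Option A PDV: {present_discounted_value(option_a, rate):,.0f}")
print(f"Option B PDV: {present_discounted_value(option_b, rate):,.0f}")
```

Reducing each option to a single figure in today's terms lets the host compare courses of action whose costs fall in different years, including periodic reinvestments.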
From page 74...
... Once again, the characteristics of the data sets will probably be important predictors of storage costs and IT services; these will likely dominate the State 3 forecast. Labor costs may not be especially important once the data set is formatted for long-term retention, and facilities costs may be negligible.
From page 75...
... With the explosion of life science research and clinical data, and the hunger for good cost forecasts, establishing such a data-collection effort would be the first step to a better understanding of what will be needed, whether it is for the State 1 researcher, the State 2 active repository, or for State 3 long-term preservation.
INFRASTRUCTURAL ELEMENTS NOT CONSIDERED IN THE COST MODEL
There are many infrastructural or data environment systems, standards, services, and activities that are essential to data preservation and access broadly, and to biomedical data in particular, but where it does not make sense to try to allocate costs to specific sources or collections of data.
From page 76...
... 15 The website for ORCID is https://orcid.org/, accessed December 5, 2019.
16 The website for the Data Observation Network for Earth project is https://www.dataone.org/, accessed December 5, 2019.
From page 77...
... Knowledge Structures Standards and best practices for description of biomedical data objects rely not only on the use of identifiers as previously discussed but also on tools such as managed vocabularies and ontologies (i.e., knowledge structures)

