National Academies Press: OpenBook
Suggested Citation: "3 Ensuring Access to Research Data." National Academy of Sciences, National Academy of Engineering, and Institute of Medicine. 2009. Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age. Washington, DC: The National Academies Press. doi: 10.17226/12615.


3 Ensuring Access to Research Data

The advance of knowledge is based on the open flow of information. Only when a researcher shares data and results with other researchers can the accuracy of the data, analyses, and conclusions be verified. Different researchers apply their own perspectives to the same body of information, which reduces the bias inherent in individual perspectives. Unrestricted access to the data used to derive conclusions also builds public confidence in the processes and outcomes of research.

Furthermore, scientific, engineering, and medical research is a cumulative process. New ideas build on earlier knowledge, so that the frontiers of human understanding continually move outward. Researchers use each other’s data and conclusions to extend their own ideas, making the total effort much greater than the sum of the individual efforts. Openness speeds and strengthens the advance of human knowledge. As an example, Box 3-1 describes how the sharing of genomic data has advanced life sciences research.

Finally, only by sharing research data and the results of research can new knowledge be transformed into socially beneficial goods and services. When research information is readily accessible, researchers and other innovators can use that information to create products and services that meet human needs and expand human capabilities.

The Organisation for Economic Co-operation and Development (OECD) describes a new effort to enhance public access to research data (see Box 3-2). According to this approach, “Openness means access on equal terms for the international research community at the lowest possible cost, preferably at no more than the marginal cost of dissemination. Open access to research data from public funding should be easy, timely, user-friendly and preferably Internet-based.” As the National Research Council’s Committee on Issues in the Transborder Flow of Scientific Data stated in its report Bits of Power: Issues in Global Access to Scientific Data, “The value of data lies in their use.”

“OECD Principles for Access to Research Data from Public Funding.” Available at http://www.oecd.org/dataoecd/9/61/38500813.pdf.
National Research Council. 1997. Bits of Power: Issues in Global Access to Scientific Data. Washington, DC: National Academy Press.

BOX 3-1
Access to Genomic Data

In biology, the culture of research and the applications of digital technologies have traditionally been heterogeneous, independent, and dispersed. However, the growth of interdisciplinary research, the advent of projects that have generated large volumes of data, and the invention of data-intensive devices such as DNA microarrays and high-throughput sequencers have highlighted the increasing importance of digitization of the biomedical sciences.[a]

In the field of genomics, strong forces have pushed in the direction of unrestricted access to data, including directives from funding agencies, requirements from journals that researchers submit data to public repositories, community expectations, and the development of powerful data-sharing systems such as PubMed. In the case of the human genome, for example, the desire by funding agencies, researchers, and the general public for public access to research data led the genomics research community to develop an ethic of unrestricted access. This ethic was formally adopted as the “Bermuda statement” in February 1996:

All human genomic information produced at large-scale sequencing centres should be freely available and in the public domain, in order to encourage research and development and to maximize its benefit to society.[b]

At the same time, other forces have had the effect of restricting access to genomics data, including:
• The need to protect patient or individual privacy;
• The principal investigator’s desire to maintain research advantage;
• The danger of misuse (e.g., of virus sequences);
• A profit motive (for data with potential commercial value);
• The tendency to “publish and forget” used data, especially supplementary data.

The generation of complete genome sequences for a growing number of organisms has intensified the digitization of biomedical research. These data have many applications in both basic and applied research, with the lines between the two often being difficult to discern. For example, computational processing and reference to information and knowledge bases about organisms and disease processes allow researchers to reach faster conclusions about the likely results of a therapy.[c] The combination of cellular data, genomic profiling, and biological simulation may reduce the failure rate of drug candidates and the cost of testing. In the near future, it will even be possible, given sufficient computing and storage resources, to record the genotype of each person in a secure database. Variations in genes may indicate specific disease susceptibility or responses to known drug types. This information could enable physicians to prescribe a personal immunization and screening schedule or to recommend specific preventive measures for each patient.

Further integration of the biomedical sciences using digital technologies could allow independent investigators to remain the engine of innovative research by participating in “virtual team science.” Early examples of such “cyberinfrastructure,” including the Biomedical Informatics Research Network, myGrid, and the cancer Biomedical Informatics Grid, indicate that it is technically feasible, if not easy, to integrate the many threads of biomedicine. The challenge is to ensure that new “cybersilos” do not replace existing disciplinary and institutional silos.[d]

[a] “The race to computerize biology.” 2002. Economist, Dec. 12, 2002.
[b] David R. Bentley. 1996. “Genomic sequence information should be released immediately and freely in the public domain.” Science 274:533–534. This statement was written on behalf of the Sanger Institute at the Wellcome Trust Genome Campus and the Genome Sequencing Center at Washington University in St. Louis.
[c] Chris Sander. 2000. “Genomic medicine and the future of health care.” Science 287:1977–1978.
[d] Kenneth H. Buetow. 2005. “Cyberinfrastructure: Empowering a ‘third way’ in biomedical research.” Science 308:821–824.

The norms and traditions of research reflect the value of openness. Researchers receive intellectual credit for their work and recognition from their peers (and perhaps from the broader community of researchers and the public) when they publish their results and share the data on which those results are based. Some journals require the submission and public dissemination of the data supporting an accepted manuscript. Funding agencies and research institutions also have policies that require the open sharing of the data on which research conclusions are based. Codes of conduct in a research community, whether explicit or tacit, can exert a powerful influence on researchers to make data accessible.

BOX 3-2
OECD Principles and Guidelines for Access to Research Data from Public Funding

From 2004 to 2006 the 30-nation Organisation for Economic Co-operation and Development (OECD) developed a set of guidelines based on commonly agreed principles to facilitate cost-effective access to digital research data generated through public funding. Endorsed by the OECD Council on December 14, 2006, the “OECD Principles and Guidelines for Access to Research Data from Public Funding” serve as objectives for each member country to achieve given its own legal, cultural, economic, and social context. The Principles and Guidelines cover 13 broad areas:

Openness
Flexibility
Transparency
Legal conformity
Protection of intellectual property
Formal responsibility
Professionalism
Interoperability
Quality
Security
Efficiency
Accountability
Sustainability

The Principles and Guidelines call “for a flexible approach to data access” under a default principle of openness and recognize “that one size does not fit all.” They also state that “Whatever differences there may be between practices of, and policies on, data sharing, and whatever legitimate restrictions may be put on data access, practically all research could benefit from more systematic sharing.”

NOTE: For more information, see Organisation for Economic Co-operation and Development. 2007. OECD Principles and Guidelines for Access to Research Data from Public Funding. Available at http://www.oecd.org/dataoecd/9/61/38500813.pdf.

Advances in information technology (for instance, the advent of grid computing and cloud computing) will continue to transform the environment for research and lower the technical barriers to sharing data. As this transformation occurs, researchers are organizing their work in new ways to take advantage of new possibilities. An innovative example is the conduct of research in what can be called an open-knowledge environment. Building on the methodology pioneered by the open-source software movement, this approach begins with the identification of a problem that is to be examined in a public forum on the Internet. Researchers from different disciplines, organizations, and countries then can all contribute to solving the problem, with the open sharing of data and ideas that might bear on that problem. An open-knowledge environment allows people with many different backgrounds and viewpoints to interact in a relatively unstructured way while moving toward a common objective. The free flow of information speeds progress, while the global reach of the Internet greatly expands the number and breadth of researchers who can contribute to a project. Another approach to sharing is open-notebook science. Similarly, blogs, wikis, and other forms of electronic interaction are tools that enable collaborative work on common problems in a generally open research environment.

In grid computing, distributed computing resources link experimental apparatus, processing, analysis, and storage; cloud computing involves large-scale, data-intensive, Internet-hosted applications and related infrastructure.
Economist. 2004. “An Open-Source Shot in the Arm?” June 10.

In the context of this report, sharing research data enhances the data’s integrity by allowing other researchers to scrutinize and verify them (as described in Chapter 2). Sharing also increases the likelihood that data will be preserved for long-term uses, although the stewardship of data requires more than that the data be accessible (as described in Chapter 4). Thus, the three themes of this report (integrity, accessibility, and stewardship) are intertwined.

BARRIERS TO SHARING DATA

Despite the many benefits to be gained by the sharing of research data and results, even a cursory survey of research activity reveals many circumstances in which access to data is limited. Because researchers require time to verify data, analyze their data, and derive research conclusions, individual researchers generally are not expected to make all their data public immediately. Individual researchers need latitude to follow hunches, experiment with methods, explore conjectures, and make mistakes.
New tools for automatically assessing the quality of data and sharing them with others can facilitate the rapid sharing of digital data, although verifying the reliability of these tools presents its own set of challenges.

Once a research result is published, the norms of science (and often the terms of the research grant or contract) call for the supporting data to be accessible. Researchers may nevertheless try to keep the data private, perhaps to derive additional results without competition from others, to reserve the data for the exclusive use of a student or postdoctoral fellow whose career would be advanced by generating further papers, or simply to avoid the effort of putting the data in usable form for others. In the worst cases, they may retain data to hide acts of research misconduct or to conceal defects in the dataset.

The norms of a research community may allow keeping data private for a certain period. These norms can be formalized through the terms of a grant giving the investigator a defined period of exclusive use of the data, with the exclusivity ending upon the publication of results, after a particular length of time, or when data are deposited in a data center or archive.

Katherine Sanderson. 2008. “Data on display.” Nature 455:273.

There is great variation among research fields in their data-sharing norms, to such an extent that different fields can be said to have different data cultures. (Box 3-3 describes aspects of the data culture in economics.) A recent report commissioned by the Research Information Network of the United Kingdom examined data-sharing practices and expectations across a number of fields (Table 3-1). The report highlights the global importance and relevance of data accessibility in research, as well as the fact that differences between fields are often more important than national differences in determining data-sharing practices. The international aspects of data access and sharing are discussed in more detail below.

Observational astronomy offers a good example of the data-sharing norms that can characterize a field of research. Astronomical data often can be used for multiple purposes and are usually made public, but proprietary periods in which only the members of a research team have access to data are common. The European Southern Observatory (Europe’s large optical observatory) and the National Aeronautics and Space Administration have 12-month proprietary periods. The U.S. National Optical Astronomy Observatory has an 18-month proprietary period. These periods provide researchers with an opportunity to make discoveries as a reward for dedicating significant periods of their careers to creating new facilities and developing new techniques. They also provide an opportunity for critical evaluation of the data before they are released.
In the high-energy physics community, collaborations are so large and the experiments so complex, with hundreds of scientists involved with the operation of a single detector, that it could take years for an independent scientist to learn enough to reanalyze the data. The data of each collaboration are treated as proprietary. Other groups that want to undertake the same measurement must form their own large collaboration and repeat the experiment. As explained in Box 2-1, large collaborations in high-energy physics involve elaborate procedures for internal scrutiny and validation of data.

Cultural norms and expectations in research fields regarding data can change over time. For example, as data sharing has proven increasingly valuable to the advancement of research in many areas of the life sciences, researchers, sponsors, research institutions, and other stakeholders have built new infrastructure and established guidelines to facilitate data sharing. A 2003 National Research Council study (Box 3-4) recommended guidelines for the sharing of

Alma Swan and Sheridan Brown. 2008. To Share or not to Share: Publication and Quality Assurance of Research Data Outputs. Report Commissioned by the Research Information Network. June. Available at: http://www.rin.ac.uk/data-publication.

BOX 3-3
Data Sharing Within Economics

Economists rely on an enormous variety of research data: for instance, administrative data from government records, datasets provided by companies to the federal government, or data provided directly to researchers by companies. Some economists rely on methods similar to those used by anthropologists, in which large quantities of data are collected and analyzed. Often the datasets are subject to confidentiality agreements because individuals could be identified from the data. Use of the data may even be restricted to “enclaves,” where a researcher has to work on a nonnetworked computer in a secure room from which materials cannot be removed.

Analysis of economic data may depend critically on highly complex computer programs. These programs, rather than the actual data, can be the most valuable part of an economist’s research, because many datasets are available publicly, whereas a computer program could embody months or years of individual effort. Thus, to assess the original analysis, other researchers often need access to the computer programs as well as to the original data.

As in other sciences, the social sciences have an expectation of reproducibility: that if the data are available and analyzed with the same assumptions, the same results will emerge. But without considerable assistance from the original researchers, actual replication of published results in economics can be time-consuming, tedious, and subject to many errors. Furthermore, journals are reluctant to publish studies that are confirmatory rather than groundbreaking. Social scientists, like other scientists, are more interested in doing their own studies and getting credit for something new than in repeating work that has already been done.

Even if replication is not common, the data should be available to enable replication, but in economics this often is not the case.[a] Several years ago two economists wrote to the authors of every paper in the March 2004 issue of the American Economic Review, a leading journal in the field, and requested the data to replicate the research. Although the journal has a statement saying “Authors are required to maintain their data and supply it to other researchers upon request,” 14 of the 15 sets of authors to whom the economists wrote said that they did not have the data or would not share them. The authors summarized their findings in an article and submitted it to the American Economic Review, which published their paper.

As a result of this and other cases, the American Economic Review adopted a new policy. For published articles, the authors must provide both the data and the programs sufficient for the articles’ findings to be replicated. These data and programs are then posted on the journal’s Web site. If the use of the data is restricted, the authors must provide instructions on how to obtain permission to use the data. If some of the data are proprietary, the editors try to work out ways for other researchers to use the data. In addition, the journal is encouraging studies to reanalyze data and replicate results.

The American Economic Review is supported by dues from 20,000 members and has the resources to institute such a policy, whereas journals with fewer resources could have difficulty adopting and enforcing the same or similar policies. Also, the data and programs are not requested at the time of submission of an article (only upon acceptance), so that the 92 percent of the papers submitted to the journal that are rejected do not fall under the new guidelines. Some economists have decided not to submit a paper to the American Economic Review because they do not want to release their data or software. Nevertheless, because authors want to publish their papers in the journal, it has considerable influence over their actions.

[a] Robert A. Moffitt, American Economic Review, Presentation to the committee, April 17, 2007.

TABLE 3-1 Summary of the Data-sharing Environment in Various Fields in the United Kingdom

| Field | Culture of sharing data | Infrastructure-related barriers to publishing data | Effect of policy initiatives to encourage data publishing | Overall propensity to publish datasets (with appropriate metadata and contextual documentation) |
| --- | --- | --- | --- | --- |
| Astronomy | Strong culture of sharing | Low level of barriers | Policy has medium positive effect | Strong propensity to publish datasets |
| Chemical crystallography | Medium culture of sharing | Low level of barriers | Policy has little positive effect | Strong propensity to publish datasets |
| Genomics | Strong culture of sharing | Low level of barriers | Policy has strong positive effect | Strong propensity to publish datasets |
| Systems biology | Medium culture of sharing | Moderate level of barriers | Policy has strong positive effect | Medium propensity to publish datasets |
| Classics (Humanities) | Strong culture of sharing | High level of barriers[a] | Policy has medium positive effect | Medium propensity to publish datasets |
| Social and Public Health Sciences | Weak culture of sharing | Low level of barriers | Policy has little positive effect | Low propensity to publish datasets[b] |
| RELU[c] | Medium culture of sharing | Low level of barriers | Policy has medium positive effect | Medium propensity to publish datasets |
| Climate Science | Weak culture of sharing | Low level of barriers[d] | Policy has medium positive effect | Low to medium propensity to publish datasets |

[a] The Arts and Humanities Data Service was established in 1995 to provide a national service to collect, preserve, and promote electronic resources in the arts and humanities; its funding was eliminated in 2008.
[b] This descriptor covers researchers not directly connected with a national data collection.
[c] The Rural Economy and Land Use Program is a collaborative research program among several UK research councils.
[d] The Natural Environment Research Council provides data centers.

SOURCE: © Research Information Network. 2008. To Share or not to Share: Publication and Quality Assurance of Research Data Outputs. June. http://www.rin.ac.uk/data-publication.

BOX 3-4 Sharing Publication-Related Data and Materials

In 2003 the National Research Council Committee on Responsibilities of Authorship in the Biological Sciences released a report that focused directly on the issues discussed in this chapter. In that report, the committee established what it called “the uniform principle for sharing integral data and materials expeditiously” (UPSIDE). They described this principle as follows:

Community standards for sharing publication-related data and materials should flow from the general principle that the publication of scientific information is intended to move science forward. More specifically, the act of publishing is a quid pro quo in which authors receive credit and acknowledgment in exchange for disclosure of their scientific findings. An author’s obligation is not only to release data and materials to enable others to verify or replicate published findings (as journals already implicitly or explicitly require) but also to provide them in a form on which other scientists can build with further research. All members of the scientific community—whether working in academia, government, or a commercial enterprise—have equal responsibility for upholding community standards as participants in the publication system, and all should be equally able to derive benefits from it.

The committee also identified five corollary principles associated with sharing publication-related data, software, and materials. For example, the committee stated that “authors should include in their publications the data, algorithms, or other information that is central or integral to the publication—that is, whatever is necessary to support the major claims of the paper and would enable one skilled in the art to verify or replicate the claims.” The committee noted that its purview extended only to the biological sciences.
It also stated, however, that “in the committee’s view, there should be a single scientific community that operates under a single set of principles regarding the pursuit of knowledge. This includes a common ethic with regard to the integrity of the scientific process and a long-held commitment to the validation of concepts of experimentation and later verification or refutation of published observations.”

SOURCE: National Research Council. 2003. Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. Washington, DC: The National Academies Press.

data and other information supporting research results that emphasize openness and expanded access, including research performed by companies.

Although the charge to our committee excluded privacy and other issues related to human subjects from our study, it is important to note that these issues can act as barriers to data access. Some data are not released because of confidentiality or privacy considerations, such as data related to biomedical research or the social sciences. For example, the 1996 Health Insurance Portability and Accountability Act established rules for disclosure of individually identifiable health information (known as protected health information, or PHI). If PHI is used in research, the researcher must comply with regulations regarding its use and storage in the project. There are instances where PHI may be disclosed, but the need to support published research is not among them. For PHI to be made publicly available, a subject must agree to the disclosure of the information. For some medical research data, privacy and confidentiality obstacles can be overcome by removing identifiers prior to the private sharing of data or the public release of data. However, this remains an area of ongoing concern and investigation. Efforts are now under way to make medical research data available while ensuring that the data cannot be used to identify individuals.

National Research Council. 2003. Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. Washington, DC: The National Academies Press.

Research data also can be kept private because they pertain to intelligence, military, or terrorist activities. Examples include research related to nuclear, radiological, and biological threats; human and agricultural health systems; chemicals and explosives; and information technology infrastructure. National Security Decision Directive 189 (NSDD 189), which was issued by President Ronald Reagan in 1985, states that the policy of the U.S.
government is not to restrict, to the maximum extent possible, the products of unclassified fundamental research.10 The challenge to policy makers and researchers is where to draw the line between classified and unclassified information and how to balance restrictions on access to sensitive information with the potential costs of such restrictions.

Our committee was not asked to examine national security issues in depth. Other National Research Council committees, including the Committee on Scientific Communication and National Security (CSCANS), are directly focused on issues such as classified information, export controls, and nonimmigrant visa policies. A recent CSCANS report points out that many federal government policies and practices since the September 11 attacks have effectively reversed NSDD 189.11 The report calls for a standing entity to review policies in order to ensure that the small risks of basic research being misused are balanced with the enormous benefits that accrue from the free exchange of information. Another National Research Council Committee examined the national security implications of access to genomic databases and found that unrestricted access, combined with the development of education programs by professional societies, is the best approach to balancing the advancement of knowledge with protecting the public from misuse of genomic data for bioterrorism threats.12 The federal government’s creation in 2008 of a new category, “Controlled Unclassified Information,” illustrates that restrictions on the sharing of research based on national security concerns will continue to pose challenges to the research enterprise.13

Institute of Medicine. 2006. Effect of the HIPAA Privacy Rule on Health Research: Proceedings of a Workshop Presented to the National Cancer Policy Forum. Washington, DC: The National Academies Press.
National Research Council. 2007. Science and Security in a Post 9/11 World: A Report Based on Regional Discussions Between the Science and Security Communities. Washington, DC: The National Academies Press.
10 National Policy on the Transfer of Scientific, Technical and Engineering Information. September 21, 1985.
11 Ibid.

When research is carried out or sponsored by public agencies, the general presumption in the United States is that data generated as part of that research should be publicly available.14 Different considerations apply for research funded by a private company, whether that research occurs within a company or in the academic sector. Though some companies have been experimenting with the benefits of freely sharing results from proprietary research,15 many companies carefully guard this information as a trade secret and a potential source of commercial advantage. Similarly, an academic researcher may temporarily withhold data in order to file a patent or develop a commercial product, even when the research is publicly funded. These issues are discussed later in this chapter.

The cost of disseminating data can be a barrier to its use.
Circular A-130 from the Office of Management and Budget (OMB) stipulates that government-generated data should be available to users at a cost sufficient to recover the expense of dissemination but not higher.16 However, data from private sources, even when purchased by the federal government for research purposes, frequently have high distribution costs and restrictions on redistribution. These costs can be a significant problem for academic researchers who need access to large databases for modeling or data analysis.

Finally, research data may be kept private because the resources are lacking to make data collections available to the public. A project might generate data that could be valuable to researchers in the same or other fields, but the investigators who generated those data may not have the resources or capabilities needed to make them available. This is frequently the case in small-scale research that does not have funding set aside for such functions or does not have a robust data management component in place. Alternatively, the data may be available, but the essential metadata needed to understand and use those data may be missing, making the data useless for anyone outside the immediate research team.

12 National Research Council. 2004. Seeking Security: Pathogens, Open Access, and Genome Databases. Washington, DC: The National Academies Press.
13 George W. Bush. 2008. “Designation and Sharing of Controlled Unclassified Information (CUI).” Memorandum for the Heads of Executive Departments and Agencies. May 9.
14 Paul F. Uhlir and Peter Schröder. 2007. “Open data for global science.” Data Science Journal 6:OD36–OD53.
15 Bernard Munos. 2006. “Can open-source R&D reinvigorate drug research?” Nature Reviews Drug Discovery 5:723–729.
16 Office of Management and Budget. No date. Management of Federal Information Resources. Circular A-130. Memorandum for Heads of Executive Departments and Agencies. Available at http://www.whitehouse.gov/omb/circulars/a130/a130trans4.html.

In general, researchers have a strong incentive to release the results of research. Their own recognition and advancement in their field generally depend on public dissemination of those results. In contrast, researchers have traditionally had few incentives to make publicly available the data they generate in the course of research. However, those data may have great value for other researchers, and making data publicly accessible can speed the advance of knowledge.

THE COSTS OF LIMITING ACCESS TO DATA

Barriers that restrict access to data, such as withholding data or delaying their release, can result in substantial costs.17 Once data have been gathered from an instrument or compiled from other sources, it is obviously more cost-effective to share the data than to reconstruct or recompile them. Furthermore, resources spent accessing data then are not available for other research uses.

Limitations on research data also can be barriers to innovation, which incurs costs in the broader society.18 In today’s economy, the creation of new goods and services often depends on access to research data. When access is withheld, economic innovation slows, reducing the returns to investments in research.

Limiting access to research data also hinders the kinds of interdisciplinary and international cooperation that have proven so productive in recent research.
When data are restricted to a particular research team or field, other researchers not only cannot use the data but often cannot even ascertain the value of those data to their own research. Similarly, if students are unable to work with new research data, their education and training may be adversely affected.

Limitations on the accessibility of data invariably retard, and can even block, the process of verifying the accuracy of those data. As a result, the quality of the data could be lower than would be the case if they were freely available, again reducing the return on the investment in producing the data.

Finally, researchers who are deprived of access to data are disadvantaged in conducting research and possibly seeking support to do research. This problem can be particularly acute in developing countries, where lack of access to data from developed countries can stymie not only the development of research capacity but advances in economic productivity, public health, and well-being.

17 Uhlir and Schröder, op. cit., pp. OD42–OD43.
18 National Research Council. 1999. A Question of Balance: Private Rights and the Public Interest in Scientific and Technical Databases. Washington, DC: National Academy Press.

DATA ACCESS ISSUES IN RESEARCH AFFECTING PUBLIC POLICY OR PRIVATE INTERESTS

Restricting access to data can be costly and wasteful, but there also are circumstances in which providing access to data can entail substantial costs and waste. There are situations in which responding to requests for data could actually slow the progress of research, and there have been instances in which requests for data have been intended to inhibit research.

It is not uncommon for a small research group to lack the resources to make data readily accessible. Especially as data collections grow in size and complexity, small groups may have difficulty providing data to other researchers in the same field, much less making data readily accessible to researchers in fields less directly connected to the research, or to the public.

Access to data can also become an issue of contention in cases where research has important implications for public policy or has a potential for affecting private interests in such areas as the environment or health. An early example was the case of Paul Fischer, who was subpoenaed in the early 1990s by a tobacco company after publication of his research showing widespread recognition among young children of the “Joe Camel” character used in cigarette advertising.19 Fischer initially was subpoenaed in a lawsuit to which he was not subject. In addition to requesting details about the research that would be considered reasonable and necessary to replicate the results, the subpoena contained more problematic demands, such as personal details about the subjects. According to his own account, Fischer’s institution, the Medical College of Georgia, refused to provide legal support.
After Fischer, using his own attorney, had quashed the subpoena, the Medical College of Georgia’s counsel wrote an article that had the effect of alerting R.J. Reynolds to an alternative legal mechanism, the Georgia Open Records Act. Under this act, Fischer ultimately was compelled to turn over all the information except the children’s names.

19 Paul M. Fischer. 1996. “Science and subpoenas: When do the courts become instruments of manipulation?” Law & Contemporary Problems 29:159–167.

Perhaps the most famous recent example involved a research project to reconstruct global temperature trends over the last two millennia. A 1998 paper by Michael Mann of Pennsylvania State University and two co-authors made extensive use of proxy studies in which paleoclimatic conditions were inferred from measurements of tree rings, sediments, coral, glaciers, oxygen isotopes, and other phenomena, concluding that global surface temperatures were relatively stable for 900 years and then rose rapidly between 1900 and 2000, providing a fingerprint for human-caused climate change.20 After the release of the Third Assessment Report of the Intergovernmental Panel on Climate Change that cited this finding in 2001, it became a point of contention in debates over the reality and causes of global warming. Mann resisted when researchers skeptical of his work requested access to the underlying data and computer programs used in the reconstruction, and controversy ensued.21 Two Members of the U.S. House of Representatives issued a letter requesting a wide variety of information from each of the three co-authors of the paper, giving them 18 days to provide, among other things, a curriculum vitae with a list of all studies they authored on climate change and the specific sources of funding; a list of all financial support received from private, state and federal sources for climate-related work; the location of all underlying data archives related to such research and its specific availability; correspondence regarding requests for such data from other researchers, responses to such requests and the researchers’ reasons for their decisions, and in-depth responses to inquiries about their work on bristlecone pines and the Intergovernmental Panel on Climate Change.22 This request was viewed by some as intimidation.23 The National Research Council released a study in 2006 that examined the rapidly emerging field of multiproxy paleoclimate studies.24 The report ultimately affirmed some, but not all, of the key results of Mann’s work, while stating that “all research benefits from full and open access to published datasets and . . .
a clear explanation of analytical methods is mandatory.” The report also points to the need for researchers, professional societies, journals, and research sponsors involved in paleoclimate research to improve access to data and methods.25

This is not an isolated example of a research field with highly charged policy implications. Research data and findings have a substantial influence on a growing number of issues, ranging from arms control to air quality, endangered species, environmental toxins, and school vouchers.26 In many of these cases, researchers are being asked to contribute information in areas where government has responsibility for public health and well-being, such as environmental quality regulations or the legal responsibility of manufacturers for product harms. In these areas, the role of research is increasingly being challenged by those who oppose particular regulations, laws, or legal rulings.27

20 Mann, Michael E., Raymond S. Bradley, and Malcolm K. Hughes. 1998. “Global-scale temperature patterns and climate forcing over the past six centuries.” Nature 392:779–787.
21 Geoff Brumfiel. 2006. “Academy affirms hockey-stick graph.” Nature 441:1032. The researchers requesting the data and other information were Stephen McIntyre and Ross McKitrick.
22 Letter from Representatives Joe Barton and Ed Whitfield to Dr. Raymond S. Bradley, June 23, 2005. Available at http://republicans.energycommerce.house.gov/108/Letters/062305_Bradley.pdf.
23 Letter from Dr. Alan I. Leshner to Representative Joe Barton, July 13, 2005. Available at http://www.aaas.org/spp/cstc/docs/05-7–13climatebarton.pdf.
24 National Research Council. 2006. Surface Temperature Reconstructions for the Last 2,000 Years. Washington, DC: The National Academies Press.
25 The wording in this paragraph has been changed to correct some factual errors.
26 See the list of “Examples of Political Interference in Science” maintained by the Union of Concerned Scientists at http://www.ucsusa.org/scientific_integrity/interference/a-to-z-alphabetical.html.

These cases raise important and difficult questions: When are researchers justified in withholding underlying data and methods? What recourse do colleagues, policy makers, and the public have when data or methods underlying research on important policy issues are withheld? What is the line between harassment that unreasonably slows the pace of research and justified requests for information?

These trends point to the need for clearer standards and understandings between researchers, their employers, and the public about the overarching value of openness, as well as the circumstances under which requests or demands for data are reasonable and when they cross the line into the realm of harassment that can slow the advance of knowledge. There are important and complex questions about how to balance the need for important data to be widely accessible with fundamental issues of academic freedom, confidentiality, and the need for researchers to carry out their studies free of harassment, intimidation, or outside pressure.

OWNERSHIP OF RESEARCH DATA AND RELATED PRODUCTS

Addressing the question of “who owns research data” is a key element of the authoring committee’s charge. There is a range of possible answers, including the researcher, the institution, the sponsor, or nobody, depending on the particular meaning of “ownership” and the context. This section will review the laws and policies relevant to the ownership of research data and related rights to control its dissemination and use. The next section will cover other laws and policies related to research data, focusing on obligations to keep or share data.
To begin with, general principles of property law apply to the media on which data are stored and may also apply to the bits themselves in the case of digital data. One analogy is to the master recording in the music business when analog technology dominated. The owner of the master tape has a property right in the object but does not necessarily own the copyright that controls the copying and distribution of the data stored in the recording. Similarly, the researcher, his or her institution, or the sponsor (depending on the terms of the research grant or contract) may own the medium on which the data are stored.

More important, for the purposes of this discussion, than ownership of the physical storage media are intellectual property rights in a database (some specific arrangement or organization of the data), in a publication whose central ideas are based on the data, or in an invention that is based on the data. We will consider each of these related issues in turn.

27 Wendy Wagner and Rena Steinzor, eds. 2006. Rescuing Science from Politics: Regulation and the Distortion of Scientific Research. New York: Cambridge University Press.

Copyright, Database Protections, and Licensing

In the United States, copyright protection is extended to “original works of authorship fixed in any tangible medium of expression. . . .”28 Copyright holders enjoy the exclusive right to disseminate their creations and to earn a profit by selling or licensing them. Raw data and other facts, however, are not protected as copyrightable subject matter. Databases are copyrightable if the selections or arrangement are original; the mere compilation of facts or data into a collection does not entitle them to protection. These provisions were reinforced by the 1991 Supreme Court ruling in Feist Publications, Inc. v. Rural Telephone Service Co., which limited copyright protection for databases to those arranged and selected in an original manner.29 In addition, the federal government is prohibited from exerting copyright protection over its own publications, including data generated by government entities. Finally, copyright law includes provisions for “fair use” exceptions in which portions of a copyrighted work may be used without permission in teaching, research, and other specified pursuits.

This basic framework has served to support the open flow of research data. Federal agencies have been central in sustaining a strong public domain in data.30 With regard to research data, private companies and nonprofit entities play an important role in creating databases and information services that are utilized by researchers. The existence of copyright protection for creative and original data collections provides an incentive for investments in valuable products and services in the private sector.
Digital technologies have introduced new considerations into copyright laws and enforcement.31 Technological barriers to violating copyrights have fallen, posing challenges to copyright-based industries such as music, newspapers, and motion pictures. Before the digital age, the trigger for a copyright violation of a printed document was the act of copying. A photocopy of a document for personal use falls under the fair-use provisions, but copying now can be done almost effortlessly. If a document is made into a PDF file that can be circulated on the Internet, the distinction between private use and publication vanishes.

28 U.S. Code, Title 17, Chapter 1, Section 102. Available at http://www4.law.cornell.edu/uscode/html/uscode17/usc_sec_17_00000102----000-.html.
29 499 U.S. 340 (1991). Available at http://laws.findlaw.com/us/499/340.html.
30 National Research Council. 2003. The Role of Scientific and Technical Data and Information in the Public Domain: Proceedings of a Symposium. Washington, DC: The National Academies Press.
31 National Research Council. 2000. The Digital Dilemma: Intellectual Property in the Information Age. Washington, DC: National Academy Press.

Digital technologies have also made possible new approaches to commercializing the provision of data and data services.32 Several legal and policy changes of recent years have strengthened the position of copyright holders. These include lengthening of the term of copyright protection and the passage of the Digital Millennium Copyright Act of 1998 (DMCA). The DMCA implemented the World Intellectual Property Organization treaty on copyrights, and criminalized the circumvention of technical measures to prevent copying of digital materials, even in the absence of actual copying. These technical measures include hardware and software-based access controls, increasingly effective forms of encryption, and other forms of digital rights management that limit access to or copying of data.

In addition, in 1996 the European Community enacted a Directive on the Legal Protection of Databases that established a framework for new proprietary rights specific to databases.33 Experts have warned that a combination of expanded copyright protections, advances in technological means of restricting access to digital content, and database protections of the type that Europe has adopted could enable the assertion and enforcement of proprietary claims to factual matter that previously entered the public domain as soon as it was disclosed.34 The United States and many other countries have not followed the European Union in establishing a new intellectual property regime for databases.

An area where advancing technology and the increased use of contracts and licensing have changed the environment for access is remote sensing and geographic data and services.35 Federal agencies have traditionally acquired full ownership rights to geographic data (such as maps and books) from private entities and have allowed that information to enter the public domain so as to be accessible without restrictions to other uses.
However, as digital media have become more prevalent, private data providers have moved to business models focused on selling multiple licenses and access subscriptions to databases. A 2004 National Research Council report recommended approaches agencies should take to licensing geographic data and services in order to maximize their utility, including a recommendation that the federal government should foster the creation of a National Commons and Marketplace in Geographic Information.36 Such an approach might be relevant to other fields where commercial entities play a major role in data collection and dissemination.

32 National Research Council. 2004. Licensing Geographic Data and Services. Washington, DC: The National Academies Press.
33 Commission of the European Communities. 2005. First Evaluation of Directive 96/9/EC on the Legal Protection of Databases. DG Internal Market and Services Working Paper. December 12. Available at http://ec.europa.eu/internal_market/copyright/docs/databases/evaluation_report_en.pdf.
34 J. H. Reichman and Paul F. Uhlir. 2003. “A contractually reconstructed research commons for scientific data in a highly protectionist intellectual property environment.” Law and Contemporary Problems 66:315–462.
35 See National Research Council. 2002. Toward New Partnerships in Remote Sensing: Government, the Private Sector, and Earth Science Research. Washington, DC: The National Academies Press.

Certainly, copyright protection, licensing, and an active commercial database market can coexist with a strong public domain in digital data. In recent years, efforts have been undertaken to utilize licensing to actively foster an expanded public domain. Although, as noted above, data are not subject to copyright protection, uncertainties about what data users are legally allowed to do with them can inhibit sharing and reuse. For example, it may not be clear whether a particular data collection is copyrightable or whether the creator intends to assert copyright. The fact that copyright persists for many years, whether it is asserted or not, means that a database may need to be actively placed into the public domain in order for users to be certain that it is free from copyright restrictions and any type of reuse is permitted. Creative Commons and its offshoot, Science Commons, have developed a number of innovations in the area of licensing aimed at facilitating open dissemination, sharing, and use of a wide variety of information, including data. For example, Creative Commons recently launched its CC0 (“CCZero”) protocol that allows creators of copyrightable work, including database generators, to waive all rights they may have to a given work, to the extent possible in the applicable jurisdiction.37

Patents

Patents give researchers, nonprofit organizations, companies, and other entities the right to profit from an innovation. In return, the property owner must make the innovation public, which enables others to build on it. Once intellectual property is patented, it can be freely disseminated while still maintaining its commercial value to a company or research institution.
The Bayh-Dole Act of 1980 has had a major influence on the development of products from publicly funded research. The act vested the rights to inventions in the university, small business, or nonprofit institution that accepted the research grant supporting the work. To accept this ownership, the university, small business, or nonprofit institution must:

• Report each disclosed invention to the funding agency;
• Elect to retain title in writing within a statutorily prescribed time frame;
• File for patent protection;
• Grant the federal government a nonexclusive, nontransferable, irrevocable, paid-up license to the invention;
• Actively promote and attempt to commercialize the invention;
• Not assign the rights to the technology, with a few exceptions;
• Share royalties with the inventor;
• Use any remaining income for education and research;
• Give preference to U.S. industry and small business.

For research that is supported exclusively by nonfederal money, the title to any inventions resulting from those data is owned according to the conditions established by the funder. For instance, corporate employees must assign their intellectual property rights to their employer, even sometimes for work done outside the scope of their employment. When research in an academic institution is supported by corporate money, the conditions of ownership must be clearly specified. The conditions often include proprietary control over the outputs of that research. In the case of academic research that is supported by nonprofit organizations, control is established by the granting organization. One example is the Howard Hughes Medical Institute, which retains title to all inventions arising from its support but frequently assigns its rights to the associated university or nonprofit institution.

Trade secrecy may be used as an alternative to patenting. In some cases, inventions and underlying data have been held as proprietary trade secrets by companies and even universities and thus are treated as protected information as long as reasonable efforts are made to maintain secrecy. Researchers and their employers have this option, particularly if they do not plan to seek credit for the findings by reporting or publishing the results. Also, in cases where research at a university is supported by a private company, a research contract may provide for a short delay in publication or sharing data until the patentability of the research findings can be evaluated and, if appropriate, patent applications are filed.

36 National Research Council, Licensing Geographic Data and Services.
37 http://wiki.creativecommons.org/CCZero.
As noted above, academic researchers may have incentives to transfer their research findings to the private sector. These include the desire to see their discoveries translated into useful products or to profit themselves from commercial opportunities made possible by research. If these incentives cause researchers to withhold data, the net effect can be for research data to become less available.

In 2006, a National Research Council committee examined whether changes in patenting and licensing practices by companies and research institutions pose a threat to continued progress in the rapidly advancing areas of genomics and proteomics research.38 The committee found that although difficulties in accessing proprietary research materials are clearly burdening research efforts, limited access to data is currently not a serious problem. The committee recommended that the National Institutes of Health (NIH) continue efforts to encourage the free exchange of data and materials through mechanisms such as requiring grantees to develop and adhere to data-sharing plans. The committee also called for efforts on the part of the U.S. Patent and Trademark Office to improve understanding of rapidly emerging technologies in order to avoid the extension of patent protection to inventions that do not meet the patentability standards of novelty, utility, and nonobviousness.

Journals and Access to Data

The interest of scientific, technical, and medical (STM) journals in the integrity of research data, and their role in ensuring it, was discussed in Chapter 2. Because journal articles are the primary means of communicating the results of research, and rely on data to support their findings, journals also play an important role in facilitating access to data. Although research data are not copyrightable, papers incorporating those data are. The conventional arrangement in traditional STM publishing has been for authors to transfer their copyright in the article they have written to the publisher, generally with some retention of rights to use the article.39

The environment for STM journal publishing has changed considerably in recent years, as it has for nearly all publishing and media businesses.40 Traditional subscription-access STM journals are published by both commercial and nonprofit entities. Commercial STM publishing has seen significant consolidation, with fewer companies publishing larger numbers of journals.

38 National Research Council. 2006. Reaping the Benefits of Genomic and Proteomic Research: Intellectual Property Rights, Innovation, and Public Health. Washington, DC: The National Academies Press.
Subscription prices for traditional STM journals have seen steep increases, putting severe pressure on research library budgets.41 Concurrently, open access STM journals have emerged as a significant part of the scholarly publishing world.42 One prominent example of an open access publisher is Public Library of Science (PLoS), which publishes several high-impact journals in the life sciences.43

39 Some universities assert copyright in selected categories of work by faculty, but often grant rights back to faculty for the purpose of traditional academic scholarship. See National Academy of Sciences, National Academy of Engineering, and Institute of Medicine. 2004. Electronic Scientific, Technical, and Medical Journal Publishing and Its Implications. Washington, DC: The National Academies Press.
40 Ibid.
41 Judith M. Panitch and Sarah Michalak. 2005. The Serials Crisis: A White Paper for the UNC-Chapel Hill Scholarly Communications Convocation. January. Available at http://www.unc.edu/scholcomdig/whitepapers/panitch-michalak.html.
42 “Open access” refers to publications, data collections, and other digital resources that are available to anyone without charge, and to the scholarly movement that advocates for policies and practices supporting such digital resources. The advocacy movement is referred to in the report as “Open Access,” and the publications, data collections, and other digital resources as “open access.”
43 See the PLoS homepage at www.plos.org.

Another relevant trend is the growth in open access mandates for published research that have been initiated by research sponsors and research institutions. The most significant of these was adopted in early 2008 by NIH, having been mandated by Congress in the Consolidated Appropriations Act of 2008 and made permanent in the Omnibus Appropriations Act of 2009.44 The NIH policy provides that:

“The Director of the National Institutes of Health (“NIH”) shall require in the current fiscal year and thereafter that all investigators funded by the NIH submit or have submitted for them to the National Library of Medicine’s PubMed Central an electronic version of their final, peer-reviewed manuscripts upon acceptance for publication, to be made publicly available no later than 12 months after the official date of publication: Provided, that the NIH shall implement the public access policy in a manner consistent with copyright law.”45

Research institutions are also adopting open access recommendations for faculty research, encouraging faculty to provide electronic copies of their articles for submission to an institutional or other open access repository, generally with an embargo period of 6 to 12 months. This is an international trend, with research institutions or sponsors (both public and private) adopting open access publication recommendations in Europe, Canada, Australia, and India.46

The issues raised by the changing environment for scholarly publishing are the subject of continued, vigorous debate. Although they are not within the task statement of this study, it is necessary to review them in this context because access to scholarly publications is related to access to research data at several levels.
For example, institutional and governmental repositories that support access to, and stewardship of, faculty articles may serve the same function for data (the data stewardship function of repositories is discussed in Chapter 4).

It is also important to note the distinctions between open access to data and open access to publications. Traditional access STM publishers that might look unfavorably on open access publication mandates might support practices and guidelines encouraging open access to data.47 Open access mandates for data, to be discussed in the next section, are distinct from open access mandates for publications.

44 National Institutes of Health. 2009. The Omnibus Appropriations Act of 2009 Makes the NIH Public Access Policy Permanent: NOT-OD-09-071. March 19. Available at http://grants.nih.gov/grants/guide/notice-files/NOT-OD-09-071.html.
45 Ibid.
46 A continuously updated list of open access publication mandates is available at http://www.eprints.org/openaccess/policysignup/.
47 See International Association of Scientific, Technical & Medical Publishers. 2009. Briefing Document (for Publishing Executives) on Institutional Repositories and Mandated Deposit Policies. January; International Association of Scientific, Technical & Medical Publishers. 2006. Databases, Datasets, and Data Accessibility—Views and Practices of Scholarly Publishers. June. Available at http://www.stm-assoc.org/documents-statements-public-co/.

LEGAL AND POLICY REQUIREMENTS FOR ACCESS TO DATA

The Data Access Act and the Information Quality Act

Various government laws, regulations, and policies influence the accessibility of research data. Among these are the Data Access Act (DAA) of 1999 and the Information Quality Act (IQA) of 2001, also known as the Data Quality Act.48

The DAA is also known as the “Shelby Amendment,” after its sponsor, Senator Richard Shelby of Alabama. It was passed as a rider to an appropriations bill in 1999. The DAA requires that data from federally funded research be made available to requesting parties under Freedom of Information Act procedures if the research is: (1) used to support an agency action, and (2) performed by a university or other nonprofit institution.49 In response, OMB modified its Circular A-110 to read as follows:

[I]n response to a FOIA [Freedom of Information Act] request for research data relating to published research findings under an award that were used by the federal government in developing an agency action that has the force and effect of law, the federal awarding agency shall request, and the recipient will provide within a reasonable amount of time, the research data so that they can be made available to the public under FOIA.

The provision established which types of research data are subject to disclosure and the procedures, standards, and exemptions that apply in requesting and disclosing those data. Before the provision was published, persons could only obtain raw data that were in possession of a federal agency, whereas the revised provision provided access to data that are in possession of a grantee institution. If even a small amount of public money was used to produce data, those data may be subject to DAA requests.
However, studies conducted by industry or by others without the use of public funds are not covered by the data-sharing requirements, even if the studies are employed in the formulation of public policy or regulations. Also, as interpreted by OMB, the provision applies only to data supporting regulations with a “major” impact on the economy and is prospective, covering studies launched after the OMB guidelines were put into effect.

The DAA was controversial at the time the legislation passed and when OMB was developing the specific changes to Circular A-110. Participants in a 2001 National Research Council workshop pointed out future problems that might be encountered in implementing the DAA, suggesting that this approach might not be an ideal way to ensure public access to data underlying federal policies and regulations.50 At the same time, the DAA does not appear to have led to any contentious cases during the decade since it went into effect. For example, a 2003 General Accounting Office report found that two agencies had received a total of 42 requests under the DAA up to that time, and that none of the requests had actually met the Circular A-110 criteria.51

The IQA was passed as a two-sentence rider to the 2001 Consolidated Appropriations Act. The IQA called on OMB to issue regulations for “ensuring and maximizing the quality, objectivity, utility, and integrity of information (including statistical information) disseminated by Federal agencies.” In response, OMB issued guidelines that all agencies “must embrace a basic standard of quality as a performance goal, and agencies must incorporate quality into their information dissemination practices.”52

The guidelines state that “if an agency is responsible for disseminating influential scientific, financial, or statistical information, agency guidelines shall include a high degree of transparency about data and methods to facilitate the reproducibility of such information by qualified third parties.”53 For “original and supporting data,” agencies are to consult with “relevant scientific and technical communities” and determine which data are subject to the reproducibility requirement.54 “Reproducibility” here means a high level of transparency about research design and methods, which is meant to negate any need to replicate work before dissemination.

48 National Research Council. 2002. Access to Research Data in the 21st Century: An Ongoing Dialogue Among Interested Parties: Report of a Workshop. Washington, DC: The National Academies Press.
49 Wendy Wagner and David Michaels. 2004. “Equal treatment for regulatory science: Extending the controls governing the quality of public research to private research.” American Journal of Law & Medicine 30:119–154.
For “analytic results” there must be “sufficient transparency about data and methods that an independent reanalysis could be undertaken.”55 This means that “independent analysis of the original or supporting data using identical methods would generate similar analytic results, subject to an acceptable degree of imprecision or error.”56 In cases where the public does not have access to data and methods (privacy, security, trade secrets), “agencies shall apply especially rigorous robustness checks to analytic results and document what checks were undertaken.”57

A committee organized by the National Research Council’s Committee on Science, Technology, and Law held several workshops in 2002 to discuss OMB’s IQA guidance and the agency responses that were being developed. The summary of those workshops reviews a number of the issues agencies faced in developing their own implementing guidelines.58

Federal and Journal Policies Affecting the Availability of Data

Table 2-2 shows federal agency policies toward availability of data generated directly by agencies as well as data generated by external grantees. In 2008 the federal government released its Principles for the Release of Scientific Research Results in response to the America COMPETES Act of 2007.59 These principles promote sharing of data from research undertaken by federal civilian agency employees.

For federally sponsored research performed by external organizations, the grants guides of agencies vary in how strongly data sharing is encouraged or required. A 2007 Government Accountability Office (GAO) assessment of agency policies toward grantees in climate science found that although agencies encouraged data sharing, the specific requirements varied from program to program.60 For example, the National Science Foundation (NSF) grants guide states the expectation that grantees make their data “widely available and useful” within a “reasonable time.” Specific NSF programs might require that data be deposited in a specific repository within a set time period following data collection. The GAO report also found that agencies generally do not monitor whether data-sharing requirements are being met and have not overcome barriers to sharing, such as the lack of appropriate data archives in some subfields of climate science.

50 See National Research Council, Access to Research Data in the 21st Century. In particular, Chapter 6, which reports on workshop chair Richard Merrill’s summary remarks, is a concise statement of the longer-term shortcomings of DAA.
51 General Accounting Office. 2003. University Research: Most Federal Agencies Need to Better Protect against Financial Conflicts of Interest. GAO-04-31. November. Washington, DC: General Accounting Office.
52 Office of Management and Budget. 2002. “Guidelines for ensuring and maximizing the quality, objectivity, utility, and integrity of information disseminated by federal agencies; Notice; Republication.” Federal Register 67(36):8451–8460. Available at http://www.noaanews.noaa.gov/stories/feb22.pdf. This Federal Register entry includes the final guidelines as well as a discussion of the comments received.
53 Ibid., p. 8455.
54 Ibid.
55 Ibid., p. 8456.
56 Ibid.
Although specific federally sponsored research programs include a range of data-sharing mandates, no federal agency has yet adopted an agencywide open access data mandate, analogous to NIH’s open access publication mandate. NIH does require that grant proposals above a certain size include a data management plan consistent with NIH’s Data Sharing Policy, which is discussed further below in the section on “Responsibilities of Research Institutions, Research Sponsors, Professional Societies, and Journals.” Federal agencies are creating new open access data resources, such as the Department of Energy’s Data Explorer program, an open access repository of data from DOE-sponsored research, and National Library of Medicine efforts such as GenBank, which is discussed elsewhere in the report.61 Some private research sponsors such as the Wellcome Trust have adopted open access data mandates for their grantees.62 As shown in Table 2-1, an increasing number of STM journals have adopted open access data mandates for authors.63

THE INTERNATIONAL DIMENSIONS OF ACCESS TO RESEARCH DATA

The advent of digital networks has enabled and stimulated global access to all types of digital information, including research data. Access to research data online means that researchers can use the data on a global basis, enhancing the universal progress of science to solve common problems and develop new knowledge. Both the benefits and the costs of unrestricted and restricted access are thus amplified in the international context.

The United States has been a leader in promoting openness to public sector information, as well as to publicly funded research data. Despite the trends in fields with commercial potential toward more proprietary treatment of academic research and the post-September 11 increase in national security restrictions on some sensitive data sources and types, the overall policy trend may be seen as moving toward greater access to both governmental and academic research data sources.

The international dimensions of access to research data are being shaped both from the bottom up and the top down. At the informal working level of the individual investigator, data are now shared across geographic boundaries as easily as they once were with the colleague next door.

57 Ibid., p. 8457.
58 National Research Council. 2003. Ensuring the Quality of Data Disseminated by the Federal Government: Workshop Report. Washington, DC: The National Academies Press.
59 John H. Marburger, III. 2008. Principles for the Release of Scientific Research Results. Memorandum. May 28. Available at www.arl.org/bm~doc/ostp-scientific-research-28may08.pdf.
60 Government Accountability Office. 2007. Climate Change Research: Agencies Have Data-Sharing Policies but Could Do More to Enhance the Availability of Data from Federally Funded Research. September. Available at http://www.gao.gov/new.items/d071172.pdf.
Countless international data exchanges are made among scientists on a daily basis, or through the posting of datasets on individual researchers’ Web sites.

At a more formal level, international research projects establish data-sharing protocols that reflect the norms of the fields in which they are operating. Some of the larger research or infrastructure programs are establishing data centers or federated networks for sharing of data resources. The first international network of such data centers, the World Data Center system, was formed following the 1957 International Geophysical Year to help bridge the gap in cooperation and data exchanges during the cold war.

61 See http://www.osti.gov/dataexplorer/.
62 See http://www.wellcome.ac.uk/About-us/Policy/Policy-and-position-statements/WTX035043.htm.
63 See, for example, http://www.nature.com/authors/editorial_policies/availability.html.

With the advent of global digital networks over the past two decades, both international cooperation in research and the formation of networked data resources on regional and global levels have become commonplace. Examples include the Global Biodiversity Information Facility, the International Federation of Digital Seismograph Networks, the International Nucleotide Sequence Database Collaboration, the International Virtual Observatory Alliance, and the Global Earth Observation System of Systems, to name but a few. Almost all fields of inquiry have some data centers or networks designed to provide access to data. In most cases, the U.S. research community has been the organizing force for the collaborative data-sharing networks.

Greater access to research data from public funding also is receiving more attention at the national policy levels of many countries, in part because such data resources are now seen as being major research infrastructure components. For example, the Research Councils of the United Kingdom adopted a more open policy for their data holdings in 2006. The Ministry of Science and Technology in China initiated the Scientific Data Sharing Project in 2002, in recognition of the fact that “[t]he insufficient use of China’s massive data holdings has been an urgent problem.”64 Many other countries are similarly reviewing or revising their national policies and myriad institutional ones to make better use of their data resources.

Finally, some international scientific, engineering, and medical organizations at both the intergovernmental and nongovernmental levels, such as the International Council of Scientific Unions, the Committee on Data for Science and Technology, and the OECD, are developing data-sharing policies and guidelines for adoption by members and the international research community.
For example, the OECD in 2007 published its Principles and Guidelines for Access to Research Data from Public Funding, which are summarized in Box 3-2. The InterAcademy Panel, an organization of national science academies, supports a program to expand access to digital scientific information to researchers in developing countries.65

GENERAL PRINCIPLE FOR ENHANCING ACCESS TO RESEARCH DATA

Because of the huge increase in the quantity of research data being generated, it is possible to say both that more data are being publicly disseminated than have ever been before and that more data are being withheld from public access today than have ever been before. Many fields of research have moved toward more open data-sharing policies as the value of data has increased and as digital technologies have enabled information to be disseminated more broadly. At the same time, heightened interest in the commercial applications of research data has caused some forms of data to be more restricted.

As described earlier in this chapter, there are legitimate reasons why some research data are not made publicly available, ranging from privacy concerns to technical barriers. Yet the basic principle that should guide decisions involving research data supporting publicly reported research results is clear:

Data Access and Sharing Principle: Research data, methods, and other information integral to publicly reported results should be publicly accessible.

This principle applies throughout research, but in some cases the open dissemination of research data may not be possible or advisable when viewed from the perspective of enhancing research in science, engineering, or medicine. Access to research data prior to reporting results based on those data might undermine the incentives to pursue the research. There might also be technical barriers, such as the sheer size of datasets, that make sharing problematic, or legal restrictions on sharing as discussed in the section on “Legal and Policy Requirements for Access to Data.” Also, “accessible” does not necessarily imply that data should be disseminated for free, though free or marginally priced distribution is the ideal. Nor are researchers responsible for providing data users with instruction or training in the use of their data, though they do have a responsibility to provide metadata, analysis software, models (including code and input data) and other information necessary for practitioners to validate and build on the results.

64 Jinpei Cheng. 2006. The development of China’s scientific data sharing policy. In National Research Council. Strategies for Preservation of and Open Access to Scientific Data in China: Summary of a Workshop. Washington, DC: The National Academies Press. Available at: http://www.nap.edu/catalog.php?record_id=11710.
65 See the program’s Web site: http://www.interacademies.net/CMS/Programmes/4704.aspx.
Where researchers have proprietary interests in such tools, they have the option of protecting those interests through applying for patents and/or asserting copyright, as appropriate, in advance of publicly reporting results.

This principle is a standard that is not currently being met in some areas of research. Yet it provides a yardstick against which to measure current initiatives and future plans. Researchers know that the information they generate should be available to others to advance the frontiers of knowledge. The objective therefore must be to implement policies and promote practices that allow this principle to be realized as fully as possible.

This principle may seem to apply only to publicly funded research, but a strong case can be made that much data from privately funded research should be made publicly available as well. In many cases, making such data available can produce societal benefits while not threatening the commercial opportunities that led to the data’s generation. Note that this principle covers data underlying publicly reported results. When a researcher working at a corporate lab seeks to publish results, patent applications can be filed in advance of publication, so that making data accessible at the time of publication will not compromise commercialization of the invention in question. If a company decides to protect an invention as a trade secret, it might be assumed that researchers will not publish papers about the invention and the question of providing access to data will not arise.

In the past few years we have also seen private companies announce plans to make significant data resources available on an open access basis. For example, Merck has spun off a nonprofit, open access platform known as Sage.66 Sage is intended to help researchers build new databases for modeling disease more effectively. Where possible, public policies should encourage the release of such data, and privately funded researchers and their managers should explore possible means of making data available.

The Access and Sharing Principle is consistent with recommendations from National Academies committees that have previously addressed data access. A 2003 report, Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences, puts forward the “uniform principle for sharing integral data and materials expeditiously (UPSIDE).”67 The UPSIDE principle calls on researchers employed in the academic, government, and commercial sectors to provide data and materials needed to support published findings, and to “provide them in a form on which other scientists can build with further research.” The 1997 report Bits of Power: Issues in Global Access to Scientific Data states that “full and open access to scientific data should be adopted as the international norm for the exchange of scientific data derived from publicly funded research.”68

RESPONSIBILITIES OF RESEARCHERS

As with the integrity of research data, the primary responsibility for sharing data lies with the researchers who produced them.
(In addition, other parts of the research enterprise have responsibilities for sharing data, as described later in this chapter and in the next chapter.) Only researchers know their data well enough to ascertain what information must be publicly available to allow others to verify their results and build on their work. Only researchers are in a position to work with research institutions, research sponsors, and journals to make data available in a way that they can be understood and used effectively by others. Thus, our committee recommends that:

66 Bryn Nelson. 2009. “Something wiki this way comes.” Nature 458(13, March 4). doi:10.1038/458013a.
67 National Research Council. 2003. Sharing Publication-Related Data and Materials: Responsibilities of Authorship in the Life Sciences. Washington, DC: The National Academies Press.
68 National Research Council. 1997. Bits of Power: Issues in Global Access to Scientific Data. Washington, DC: National Academy Press.

Recommendation 5: All researchers should make research data, methods, and other information integral to their publicly reported results publicly accessible in a timely manner to allow verification of published findings and to enable other researchers to build on published results, except in unusual cases in which there are compelling reasons for not releasing data. In these cases, researchers should explain in a publicly accessible manner why the data are being withheld from release.

Making data available does not necessarily mean providing them at no cost. The next chapter discusses the need for research projects to develop plans for the management and sharing of data from the initial stages of a research program. Chapter 4 also describes the evolving infrastructure for providing data access and stewardship, whose components include institutional and disciplinary repositories.

Fulfilling this recommendation also requires that researchers be familiar with any possible constraints on the release of data. Although this information is usually known to researchers and their managers from the outset of a research project, agreements may be informal, may be understood differently by different parties (such as principal investigators and graduate students), or may change during the course of a research project. Requiring that researchers clarify and agree to these arrangements places the responsibility on researchers to oversee the accessibility of research data and to decide whether to participate in research where data accessibility is limited. Researchers who are considering becoming involved in a project where data accessibility is restricted need to ask themselves whether the benefits of participating in that project outweigh the benefits of transparency in generating and disseminating data.

Research thrives under conditions where data are available to others.
If data are not available, there should be a clear and public reason why those data are being withheld from dissemination. Indeed, justifications for not making data available should be understood by the researcher, sponsor, and institution. The reasons why data are being withheld could be published with journal articles, posted on Web sites, stated in the publicly accessible award statements of research sponsors or research institutions, or made available by some other means. The important point is that the reasons should be publicly available so that others can review and comment on the grounds for withholding data.

As discussed in the following section, the committee believes that research fields, research sponsors, research institutions, and journals have considerable ability to set appropriate standards and expectations regarding data access and sharing, and to develop the necessary incentives. Some are taking leadership roles in setting standards and instituting incentives. The committee believes that continued efforts taken by these stakeholders can create an environment in which the Data Access and Sharing Principle is widely followed in the research enterprise, and in which a bureaucratic framework of regulations and enforcement will not need to be imposed.

RESPONSIBILITIES OF RESEARCH FIELDS

As emphasized earlier, there are major differences between research fields in the handling of data, including technological infrastructure, publication practices, and data-sharing expectations. In some fields, aspects of their data culture act as barriers to access and sharing of data. Because of the growing importance of research data and the rate at which practices are changing in research, it is important for various fields and disciplines to examine their standards and practices regarding data and to make these explicit. The development of plans for data management and sharing is greatly facilitated when a field of research has standards and institutions in place designed to promote the accessibility of data.

Recommendation 6: In research fields that currently lack standards for the sharing of research data, such standards should be developed through a process that involves researchers, research institutions, research sponsors, professional societies, journals, representatives of other research fields, and representatives of public interest organizations, as appropriate for each particular field.

The development of standards and institutions can occur in different ways depending partly on the field of research in which it occurs. The process can be led by journal editors, professional societies, ad hoc bodies of researchers established to solve particular problems, or permanent institutions charged with overseeing data management issues. National Academies committees and advisory committees to federal agencies can play constructive roles. In large, complex fields, multiple initiatives may be undertaken to address various aspects of standard setting. Input and participation from international stakeholders will often be needed.

The life sciences provide useful examples of the standards-setting process.
As described in Box 3-4, a National Academies committee developed broad standards for the sharing of research data in the life sciences. Similarly, as described in Box 3-5, a journal-led effort incorporating community input devel- oped the Paris Guidelines for the management of protein data. Both examples demonstrate how standards can be put in place to deal with existing or new issues affecting the management of research data. The Principles for the Release of Scientific Research Results, released in 2008 and discussed in the earlier section on “Federal and Journal Policies Affecting the Availability of Data,” establish data-sharing standards for research conducted by employees of federal civilian agencies.69 One section of the prin- ciples states: 69 John H. Marburger, III. 2008. Principles for the Release of Scientific Research Results. Memo- randum. May 28. Available at www.arl.org/bm~doc/ostp-scientific-research-28may08.pdf.

ENSURING ACCESS TO RESEARCH DATA 89 BOX 3-5 The Paris Guidelines In some fields, journals have played a major role in developing standards for data collection, sharing, and preservation. In 2004, for example, the journal Molecular and Cellular Proteomics (MCP) developed standards for the management of protein data.a These standards were revised 1 year later based on community input, resulting in the “Paris Guidelines.”b These guidelines were made available in a checklist format, in a tutorial, and in MCP-hosted workshops to educate researchers about the details of the requirements for publication and data submission.c MCP’s standard requires all relevant quantitative data to be made available at a level in which it is possible to reproduce the reported results. Methods can reference previously published standards but any deviations must be explained. In p ­ articular, authors must submit along with the manuscript the data that have the greatest potential for misinterpretation—for instance, mass spectrographic spectra for post-­translationally modified proteins—for the journal to publish. Data considered less important but worthy of access are recommended for submission to the journal as supplementary material to be deposited in a nonjournal repository, which therefore may not be archival.d In addition, an institutionally based government-funded data depository was recommended (“Tranche”) that has a dis- tributed storage system similar to Bit Torrent, thereby lessening costly bandwidth problems caused by downloading large amounts of data over the Internet. In this way the Paris guidelines ensure that the most important data are depos- ited for perpetual and accessible storage while second-tier data also are accessible without placing too large a burden on the journal as the sole repository for data. a Steven Carr, Ruedi Aebersold, Michael Baldwin, Al Burlingame, Karl Clauser, and Alexey N ­ esvizhskii. 2004. 
“The need for guidelines in publication of peptide and protein identification data: Working Group on Publication Guidelines for Peptide and Protein Identification Data.” Molecular and Cellular Proteomics 3:531–533. b Ralph A. Bradshaw, Alma L. Burlingame, Steven Carr, and Ruedi Aebersold. 2006. “Reporting protein identification data: The next generation of guidelines.” Molecular and Cellular Proteomics 5:787–788. c See http://www.mcponline.org/misc/Tutorial_MCP_final.pdf. d For an example of supplementary data, see http://www.mcponline.org/cgi/content/abstract/6/7/1123. Research data produced by scientists working within Federal agencies should, to the maximum extent possible and consistent with existing Federal law, regulations, and Presidential directives and orders, be made publicly available consistent with established practices in the relevant fields of research. This principle is consistent with the Data Sharing and Access Principle stated above. This report advocates that the principle apply not just to federal scientists but to all research where results are publicly reported.

A wide range of issues must be considered in setting data standards, including dissemination, usage restrictions, periods of exclusive use, documentation requirements, financial provisions, ownership, licensing terms, infrastructure needs, technological compatibility, and sustainable preservation. These issues vary greatly from field to field, depending on particular traditions and requirements. Although it is not impossible to prescribe a standard set of practices to which all researchers should adhere (indeed, the general principles stated in this report apply to all researchers), every field collectively and every researcher individually must address issues of data accessibility.

RESPONSIBILITIES OF RESEARCH INSTITUTIONS, RESEARCH SPONSORS, PROFESSIONAL SOCIETIES, AND JOURNALS

For researchers to make their data accessible, they need to work in an environment that promotes data sharing and openness.

Recommendation 7: Research institutions, research sponsors, professional societies, and journals should promote the sharing of research data through such means as publication policies, public recognition of outstanding data-sharing efforts, and funding.

As noted earlier in this chapter, research institutions, research sponsors, professional societies, and journals are undertaking a range of initiatives to promote the sharing of research data. In taking the next steps, research institutions and research sponsors need to create incentives for researchers to share data, just as researchers have incentives to maintain the integrity of research data and to publish their findings. Researchers need both formal and informal ways of being acknowledged and rewarded for making research data accessible and usable. For example, in some cases tenure and promotion decisions could take into account efforts to promote the accessibility of data, the creation of publication-based metrics, or service to a community or institution.

Data professionals also have an important role to play in ensuring the accessibility of research data. In close cooperation with researchers in a field, data professionals can anticipate the needs of data users and establish data management systems that meet those needs. Their contributions to making data accessible, as well as to ensuring the integrity of data, need to be recognized.

One way for research sponsors and journals to promote data accessibility is to establish the terms of access and sharing expected of institutions and investigators. For example, NIH explicitly requires that all grant applications for more than $500,000 in direct costs in a single year include a data management plan that embodies the principles of the NIH Data Sharing Policy. This policy says that “data should be made as widely and freely available as possible while safeguarding the privacy of participants, and protecting confidential and proprietary data.” The data management plan becomes part of the proposal, and “NIH expects that plan to be enacted. . . . In the case of noncompliance (depending on its severity and duration) NIH can take various actions to protect the Federal Government’s interests.”70 These actions are not specified but may affect the review of future proposals.

70 National Institutes of Health Office of Extramural Research. 2003. NIH Data Sharing Policy and Implementation Guidance.

As discussed above, research institutions, research sponsors, and journals have considerable leverage in encouraging data access and sharing on the part of researchers. Several leading research institutions have announced open access publication recommendations, which encourage faculty to deposit their publications in their institutional repository. Such recommendations could be extended to data. Some federal research programs and journals have adopted open access data policies that require or encourage researchers to deposit underlying data in a disciplinary or institutional repository (see Tables 2-1 and 2-2). Depending on the program or discipline, adopting and effectively enforcing such open access data policies may be an appropriate way for research institutions, research sponsors, and journals to implement this recommendation.

The Council on Governmental Relations points out that “few institutions have formal policies and procedures for access to and retention of research data.”71 As described above, the terms of research contracts and grants and other regulations often specify that research institutions are responsible for retaining data and providing access. Given the current lack of formal policies and procedures, we make the following recommendation.

71 Council on Governmental Relations. 2006. Access to and Retention of Research Data: Rights and Responsibilities. March. Washington, DC: Council on Governmental Relations.

Recommendation 8: Research institutions should establish clear policies regarding the management of and access to research data and ensure that these policies are communicated to researchers. Institutional policies should cover the mutual responsibilities of researchers and the institution in cases in which access to data is requested or demanded by outside organizations or individuals.

The knowledge needed to develop data access policies is not widespread or fully developed. Research institutions and sponsors may need to come together to identify best practices and policy models. Organizations such as the Association of American Universities, the Association of Public and Land-Grant Universities, the Association of Research Libraries, and the Council on Governmental Relations can contribute to this process.

Disputes between researchers and their institutions regarding control of data are not unusual. For example, faculty members may be denied tenure and seek to take their research data with them, while the institution may seek to keep the data. Or researchers and institutions may have different perspectives on how to respond to outside requests for access to data, including requests made under the auspices of the DAA or in connection with litigation. As described earlier in this chapter, requests for information can go beyond research data to information about a researcher’s personal life.

Procedures for handling requests for data that either intentionally or inadvertently hamper the progress of research need special attention. Although the data from publicly funded research should be accessible in general, exploiting the norms of science to slow or stop the progress of research harms society. For example, institutional policies might stipulate that an institution will come to the aid of researchers in disputes with third parties, but researchers also must comply with institutional policies.

Many journals play a critical role in ensuring access to the data that support the publications appearing in those journals (see Box 3-6 for an example). Access to those data may be lost as journals evolve under the pressures of dramatic changes being catalyzed by digital technologies. The following chapter covers the responsibilities of journals to make data accessible in the context of the long-term preservation of research data.

BOX 3-6
Promoting Reproducibility in Medical Research

As of April 1, 2007, the Annals of Internal Medicine instituted a new policy designed to help the research community evaluate and build on published results. Authors of original research articles in the Annals are required to include a statement indicating whether the study protocol, data, and statistical code are available to readers and under what terms the authors will share this information. Sharing is not mandatory, but authors are required to state whether they are willing to share the protocol, data, and statistical code. Authors are not asked whether they are willing to make this information available until after a manuscript is accepted for publication.

According to an article announcing the new policy, the goal of the new requirement is to promote “reproducible research” in which independent researchers can reproduce results using the same procedures and data as the original investigators. Reproducible research does not require unlimited access to data and methods, but it requires access to as much of the dataset and statistical procedures as is necessary to reproduce the published results. As the article states:

“Major cultural shifts in research must occur before a world of completely reproducible research can exist. These shifts include increasing the technical capacity of many research teams, further developing acceptable data-sharing mechanisms, and supporting, both professionally and financially, the publishing of reproducible research. . . . We hope that shining a spotlight on the availability of the study protocol, data, and statistical code for every Annals research report will be seen as a small but important step toward biomedical research that the public can really trust. At the same time, it will enhance what is perhaps the main function of a journal: to provide a transparent medium for a conversation about science.”a

a For more information, see Christine Laine, Steven N. Goodman, Michael E. Griswold, and Harold C. Sox. 2007. “Reproducible research: Moving toward research the public can really trust.” Annals of Internal Medicine 146:450–453.

As digital technologies are expanding the power and reach of research, they are also raising complex issues. These include complications in ensuring the validity of research data; standards that do not keep pace with the high rate of innovation; restrictions on data sharing that reduce the ability of researchers to verify results and build on previous research; and huge increases in the amount of data being generated, creating severe challenges in preserving that data for long-term use.

Ensuring the Integrity, Accessibility, and Stewardship of Research Data in the Digital Age examines the consequences of the changes affecting research data with respect to three issues (integrity, accessibility, and stewardship) and finds a need for a new approach to the design and the management of research projects. The report recommends that all researchers receive appropriate training in the management of research data, and calls on researchers to make all research data, methods, and other information underlying results publicly accessible in a timely manner. The book also sees the stewardship of research data as a critical long-term task for the research enterprise and its stakeholders. Individual researchers, research institutions, research sponsors, professional societies, and journals involved in scientific, engineering, and medical research will find this book an essential guide to the principles affecting research data in the digital age.
