4 Reproducibility
Pages 55-70

The Chapter Skim interface presents the single most significant chunk of text, identified algorithmically, from each page of the chapter.


From page 55...
... For example, public health researchers data-mine large databases looking for patterns, earth scientists run massive simulations of complex systems to learn about geological changes in our planet, and psychologists use advanced statistical analyses to uncover subtle effects from randomized controlled experiments.
From page 56...
... 8) explicitly defined reproducible computational research as that in which "all details of the computation -- code and data -- are made conveniently available to others." The Yale Law School Roundtable on Data and Code Sharing (2010)
From page 57...
... For studies and longstanding collaborations that have not ...
Footnote 1: Journals that require data to be shared generally allow some exceptions to the data-sharing rule. For example, PLOS publications allow researchers to exclude data that would violate participant privacy, but they will not publish research that is based solely on proprietary data that are not made available, or if data are withheld for personal reasons (e.g., future publication or patents)
From page 58...
... FINDING 4-2: When results are produced by complex computational processes using large volumes of data, the methods section of a traditional scientific paper is insufficient to convey the necessary information for others to reproduce the results.
RECOMMENDATION 4-1: To help ensure the reproducibility of computational results, researchers should convey clear, specific, and complete information about any computational methods and data products that support their published results in order to enable other researchers to repeat the analysis, unless such information is restricted by nonpublic data policies.
From page 59...
... Researchers applying high-performance algorithms thus recognize (Diethelm, 2012) that when different runs with the same input data produce slightly different numeric outputs, each of these results is equally credible, and the output must be understood as an approximation to the correct value within a certain accepted uncertainty.
From page 60...
... Thus, there is a tension between computational performance and strict numerical reproducibility of the results in parallel computing.
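This behavior is easy to demonstrate outside of a supercomputer, because floating-point addition is not associative: the order in which partial sums are combined, which can vary from run to run in a parallel reduction, changes the low-order bits of the total. The sketch below is a minimal Python illustration (not drawn from the report); it sums the same values in two different orders and checks that the totals, while typically not bitwise identical, agree within a stated tolerance.

    import random

    # Sum a fixed set of values in two different orders, mimicking the
    # nondeterministic combination of partial sums in a parallel reduction.
    random.seed(0)
    values = [random.uniform(-1.0, 1.0) for _ in range(1_000_000)]

    forward_total = sum(values)        # one accumulation order
    shuffled = list(values)
    random.shuffle(shuffled)
    shuffled_total = sum(shuffled)     # another accumulation order

    print(f"forward  total: {forward_total:.17g}")
    print(f"shuffled total: {shuffled_total:.17g}")
    print("bitwise identical:", forward_total == shuffled_total)

    # Both totals are equally credible approximations of the exact sum;
    # reproducibility is judged against an accepted tolerance instead.
    print("agree within 1e-6:", abs(forward_total - shuffled_total) < 1e-6)

Either run gives an equally valid answer; only the accumulated rounding differs, which is the sense in which such outputs are approximations within an accepted uncertainty.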
From page 61...
... Such a result is achievable for short time spans and individual locations and is essential for model testing and software debugging, but the dominance of this definition as a paradigm in the field is giving way to a more statistical way of understanding model output. Historically, climate modelers believed that they needed the more rigid definition of bitwise reproduction because the nonlinear equations governing Earth systems are chaotic and sensitive to initial conditions.
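The sensitivity that motivated the bitwise requirement can be seen with a toy system. The snippet below is a generic Python illustration (not taken from any climate code): it iterates the chaotic logistic map from two initial conditions that differ by one part in 10^12, and the gap between the trajectories grows exponentially until they are effectively unrelated, which is why even tiny numerical differences were once treated as unacceptable when comparing model runs.

    # Iterate the chaotic logistic map from two nearly identical starting points.
    def logistic(x, r=3.9):
        return r * x * (1.0 - x)

    a, b = 0.2, 0.2 + 1e-12
    for step in range(1, 61):
        a, b = logistic(a), logistic(b)
        if step % 15 == 0:
            # The separation grows by orders of magnitude and eventually
            # becomes as large as the state itself.
            print(f"step {step:2d}: |a - b| = {abs(a - b):.3e}")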
From page 62...
... The computational results were presented in the form of model parameter estimates, computed using statistical software and custom scripts. A consistent computational result, in this case, means obtaining the same model parameter estimates and measures of statistical significance within some degree of sampling variation.
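As a concrete, hypothetical illustration of that criterion, the check below refits a simple least-squares model and compares the reproduced parameter estimates against previously reported values using a tolerance rather than bit-for-bit equality. The data-generating script, the "published" values, and the tolerance are all invented for this sketch.

    import numpy as np

    # Hypothetical estimates (intercept, slope) reported in the original article.
    published = np.array([2.0, 0.5])

    # Re-run the analysis from the shared script and the recorded random seed.
    rng = np.random.default_rng(42)
    x = rng.uniform(0.0, 10.0, size=500)
    y = 2.0 + 0.5 * x + rng.normal(0.0, 0.5, size=500)

    design = np.column_stack([np.ones_like(x), x])
    estimates, *_ = np.linalg.lstsq(design, y, rcond=None)

    # "Consistent" here means agreement within a stated tolerance that reflects
    # sampling variation, not exact numerical identity.
    print("reproduced estimates:", np.round(estimates, 3))
    print("consistent with reported values:",
          np.allclose(estimates, published, atol=0.2))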
From page 63...
... Jacoby described the standing contract of the American Journal of Political Science with a university to computationally reproduce every article prior to publication; he reported to the committee that each article requires approximately 8 hours to reproduce. In Moraila's effort, software could be built for fewer than one-half of the 231 studies, highlighting the challenges of reproducing computational environments.
From page 64...
... (2018b): cross-disciplinary, computation-based research. A randomly selected sample of 204 computation-based articles published in Science, with a data-sharing policy. Fewer than one-half of the articles provided data: 24 articles had data, and an additional 65 provided some data when requested.
From page 65...
... of a standing contract between the American Journal of Political Science and universities to reproduce all articles submitted to the journal; 8 were reproduced on the first attempt. Gunderson et al.
From page 66...
... Rather, the committee's collection of reproducibility attempts across a variety of fields allows us to note that a number of systematic efforts to reproduce computational results have failed in more than one-half of the attempts made, mainly due to insufficient detail on digital artifacts, such as data, code, and computational workflow. Expecting computational reproducibility is considered by some to be too low a bar for scientific research, yet our data in Table 4-1 show that many attempts to reproduce results initially fail.
From page 67...
... In addition to lack of access to nonpublic data and code, mentioned previously, the contributors include the following:
• Inadequate recordkeeping: The original researchers did not properly record the relevant digital artifacts, such as protocols or steps followed to obtain the results, the details of the computational environment and software dependencies, and/or information on the archiving of all necessary data.
• Nontransparent reporting: The original researchers did not transparently report, provide open access to, or archive the relevant digital artifacts necessary for reproducibility.
From page 68...
... Meticulous and complete recordkeeping is increasingly challenging and potentially time-consuming as scientific workflows involve ever more intricate combinations of digital and physical artifacts and entail complex computational processes that combine a multitude of tools and libraries.5 Satisfying all of these challenging conditions for transparent computation ...
Footnote 4: Final datasets used in analysis are the result of data collection and data culling (or cleaning)
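One low-cost way to ease this recordkeeping burden is to capture the computational environment automatically alongside each set of results. The sketch below uses only the Python standard library and is illustrative rather than a procedure prescribed by the report (the output file name, in particular, is an arbitrary choice): it writes the interpreter version, operating system, timestamp, and installed package versions to a small provenance file.

    import json
    import platform
    import sys
    from datetime import datetime, timezone
    from importlib import metadata

    # Collect basic provenance for the current run: when it ran, on what
    # platform, and with which installed package versions.
    provenance = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,
        "platform": platform.platform(),
        "packages": {
            dist.metadata["Name"]: dist.version
            for dist in metadata.distributions()
        },
    }

    # Archive the record next to the analysis outputs.
    with open("environment_record.json", "w", encoding="utf-8") as fh:
        json.dump(provenance, fh, indent=2, sort_keys=True)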
From page 69...
...
Nontransparent Reporting
A second barrier to computational reproducibility is the lack of sharing or insufficient sharing of the full compendium of artifacts necessary to rerun the analysis, including the data used,6 source code, information about the computational environment, and other digital artifacts.
From page 70...
... Even when the original study qualifies as reproducible research (because all the relevant protocols were automated and the digital artifacts are available, so that the work is capable of being checked), another researcher without proper training and capabilities may be unable to use those artifacts.
Barriers in the Culture of Research
While interest in open science practices is growing, and many stakeholders have adopted policies or created tools to facilitate transparent sharing, the research enterprise as a whole has not adopted sharing and transparency as near-universal norms and expectations for reproducibility (National Academies of Sciences, Engineering, and Medicine, 2018)

