

6 Improving Reproducibility and Replicability
Pages 105-142

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 105...
... The chapter presents a number of the committee's key recommendations.

STRENGTHENING RESEARCH PRACTICES: BROAD EFFORTS AND RESPONSIBILITIES

Improving substandard research practices -- including poor study design, failure to report details, and inadequate data analysis -- has the potential to improve reproducibility and replicability by ensuring that research is more rigorous, thoughtful, and dependable.
From page 106...
... For example, in hypothesis-testing inquiries, good research practices include conducting studies that are designed with adequate statistical power to increase the likelihood of finding an effect when the effect exists. This practice involves collecting more and better observations (i.e., reducing sampling error by increasing sample size and reducing measurement error by improving measurement precision and reliability)
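As a minimal illustration of such a prospective power analysis, the sketch below computes the per-group sample size needed for a simple two-group comparison. It assumes the statsmodels package; the effect size, alpha, and power targets are illustrative placeholders, not values taken from the report.

    # Prospective power analysis: how many observations per group are needed
    # to detect a medium effect with conventional error rates?
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    # Cohen's d = 0.5, two-sided alpha = 0.05, 80 percent power (all illustrative).
    n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
    print(f"Required sample size per group: {n_per_group:.1f}")  # roughly 64

Raising the target power or lowering the expected effect size increases the required sample size, which is the trade-off the text describes.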
From page 107...
... The association also offers workshops and presentations about research practices at its annual convention.
• The Society for the Improvement of Psychological Science was formed in 2016 with the explicit aim of improving research methods and practices in psychological science.
From page 108...
... For example:
• A new course at the University of California, Berkeley, "Reproducible and Collaborative Data Science," introduces students to "practical techniques and tools for producing statistically sound and appropriate, reproducible, and verifiable computational answers to scientific questions."3 The university has also created the Berkeley Initiative for Transparency in the Social Sciences.4
3 For the course description, see https://berkeley-stat159-f17.github.io/stat159-f1.
4 See https://www.bitss.org.
From page 109...
... • Librarians at NYU hold office hours for questions about reproducibility and offer tutorials, such as "Citing and Being Cited: Code and Data Edition," which teaches students how and why to share data and code.
• Also at NYU, the Moore-Sloan Data Science Environment5 has created the Reproducible Science website6 to serve as an open directory of reproducibility resources for issues beyond computational reproducibility.
From page 110...
... Improving computational reproducibility involves better capturing and sharing information about the computational environment and steps required to collect, process, and analyze data. All of the sources of non-reproducibility also impair replicability efforts, since they deprive researchers of information that is useful in designing or undertaking replication studies.
From page 111...
... After the results are published, a detailed provenance of the process needs to be included to enable others to reproduce and extend them. This information includes the description of the data, the computational steps followed, and information about the computational environment.
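As a rough illustration of what such a provenance record might contain, the sketch below assembles a small machine-readable metadata file using only the Python standard library. The field names, steps, and file paths are hypothetical placeholders, not a schema the report prescribes.

    # Assemble a minimal provenance record: data description, computational
    # steps, and computational environment (all entries are placeholders).
    import json
    import platform
    import sys
    from datetime import datetime, timezone

    provenance = {
        "data": {
            "description": "Raw observations used in the analysis",
            "path": "data/raw_observations.csv",
        },
        "computational_steps": [
            "clean_data.py: remove incomplete records",
            "fit_model.py: estimate the primary model",
            "make_figures.py: generate all figures in the paper",
        ],
        "environment": {
            "python_version": sys.version,
            "operating_system": platform.platform(),
        },
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

    with open("provenance.json", "w") as f:
        json.dump(provenance, f, indent=2)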
From page 112...
... A physicist may perform further data reduction and selection procedures, which are followed by a statistical analysis on the data. The analysis assets being used by the individual researcher include the information about the collision and simulated datasets, the detector conditions, the analysis code, the computational environments, and the computational workflow steps used by the researcher to derive the histograms and the final plots as they appear in publications.
From page 113...
... FIGURE 6-2 The components of computational research (data, computational steps, and computational environment) for a paper on Galois conjugates of quantum double models. NOTE: A full description of each, i.e., the study's provenance, is required for reproducible research.
From page 114...
... Source Code and Data Version Control

In computational environments, several researchers may be working on shared code or data files.
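As a minimal illustration of putting analysis code and small data files under version control, the sketch below drives Git from Python. It assumes Git is installed and uses only standard Git commands; the file names are placeholders, and very large data files are usually better handled by a dedicated data-versioning or archiving service.

    # Initialize a repository, record the code and data used for a result,
    # and tag the exact state that a paper is based on.
    import subprocess

    def run(*args):
        """Run a git command in the current directory, failing loudly on error."""
        subprocess.run(["git", *args], check=True)

    run("init")
    run("add", "analysis.py", "data/input.csv")   # placeholder file names
    run("commit", "-m", "Analysis code and input data for the submitted paper")
    run("tag", "v1.0-submitted")                  # the state readers should check out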
From page 115...
... , which arose in ecology and environmental communities, leverages this formal approach for creating, analyzing, and sharing complex scientific workflows.15 It can access and connect disparate data sources (e.g., streaming sensor data, satellite images, simulation output) and integrate software components for analysis or visualization.
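The sketch below is a generic illustration of that workflow idea, not of the specific system cited above: each step is a named component with explicit inputs and outputs, so the whole pipeline can be inspected, shared, and re-run from end to end.

    # A toy workflow declared as an explicit, ordered list of named steps.
    from typing import Callable

    def load_sensor_data(path: str) -> list[float]:
        # Stand-in for a component that reads sensor or simulation output.
        return [1.0, 2.0, 3.0]

    def summarize(values: list[float]) -> float:
        # Stand-in for an analysis or visualization component.
        return sum(values) / len(values)

    workflow: list[tuple[str, Callable]] = [
        ("load", load_sensor_data),
        ("summarize", summarize),
    ]

    result = "data/sensors.csv"  # hypothetical input location
    for name, step in workflow:
        result = step(result)
        print(f"step '{name}' -> {result}")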
From page 116...
... Virtual machines encapsulate an entire computational environment, from the operating system up through all the layers of software, including the whole file system.
From page 117...
... The researchers published Jupyter notebooks that reproduced the analysis of the data displaying the signature of a binary black-hole merger.19 Recent technological advances in version control, virtualization, computational notebooks, and automatic provenance tracking have the potential to simplify reproducibility, and tools have been developed that leverage these technologies. However, given that computational reproducibility requirements vary widely even within a discipline, there are still
17 Docker is free and open source; see https://www.docker.com/resources/what-container.
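One small, concrete step toward the environment capture that virtual machines and containers automate more completely is pinning the exact version of every installed package. The sketch below does this with only the Python standard library; the output file name is an arbitrary choice.

    # Record the exact versions of all installed packages so the environment
    # can be recreated later (for example, inside a container image).
    from importlib import metadata

    with open("requirements-frozen.txt", "w") as f:
        for dist in metadata.distributions():
            name = dist.metadata["Name"]
            if name:  # skip metadata entries without a project name
                f.write(f"{name}=={dist.version}\n")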
From page 118...
... send the data and code to be checked for reproducibility may make authors more careful to document their data and analyses clearly and to make sure the reported results are free of errors. Another important benefit of this approach is that verifying all manuscripts for reproducibility before publication should lead to a high rate of reproducibility of published results.
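As a rough sketch of what an automated prepublication check might look like, the script below re-runs a stand-in analysis function and compares the recomputed headline numbers with the values reported in a manuscript. The function, file path, reported values, and tolerance are all hypothetical.

    # Re-execute the analysis and verify that the manuscript's reported
    # numbers can be reproduced within a small tolerance.
    import math

    def compute_main_result(path: str) -> dict:
        # Placeholder standing in for the authors' actual analysis pipeline.
        return {"effect_estimate": 0.42, "p_value": 0.031}

    REPORTED = {"effect_estimate": 0.42, "p_value": 0.031}  # as stated in the manuscript
    TOLERANCE = 1e-3

    recomputed = compute_main_result("data/input.csv")
    for key, reported_value in REPORTED.items():
        if not math.isclose(recomputed[key], reported_value, abs_tol=TOLERANCE):
            raise SystemExit(f"Mismatch in {key}: reported {reported_value}, "
                             f"recomputed {recomputed[key]}")
    print("All reported values reproduced within tolerance.")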
From page 119...
... OVERCOMING TECHNOLOGICAL AND INFRASTRUCTURE BARRIERS TO REPRODUCIBILITY

Even if complete information about the computational environment and workflow of a study is accurately recorded, computational reproducibility is only practically possible if this information is available to other researchers. The open and persistent availability of digital artifacts, such as data and code, is also essential to replicability efforts, as explained in Chapter 5.
From page 120...
... Most commonly, to meet these requirements, a Digital Object Identifier (DOI, see below) is used as a unique global identifier, long-term preservation guarantees are at least 10 years, and FAIR principles are used.
From page 121...
... and deposit large files; the default is 50GB, but one can request larger capacity. Researchers use it to deposit larger research datasets, such as discretization meshes, as well as to archive a full code base from its GitHub repository and to get a Digital Object Identifier (DOI)
From page 122...
... assign a DOI to all artifact deposits, whether they are data, figures, or a snapshot of the complete archive of research software. Journals with a data sharing policy often require that the data a paper relied on or produced be deposited in an archival-quality repository with a DOI.
From page 123...
... Research institutions, research funders, and disciplinary groups all recognize that they have responsibilities for the long-term stewardship of digital artifacts, and they are developing strategies for meeting those needs. For example, university libraries are developing strategies for covering the associated costs (Erway and Rinehart, 2016)
From page 124...
... designate a Chief Data Officer who shall be responsible for lifecycle data management and other specified functions." The act also establishes in the U.S. Office of Management and Budget a Chief Data Officer Council for establishing government-wide best practices for the use, protection, dissemination, and generation of data and for promoting data sharing agreements among agencies.
From page 125...
... Implementation Challenges Efforts to support sharing and persistent access to data, code, and other digital artifacts of research in order to facilitate reproducibility and replicability will need to navigate around several persistent obstacles. For example, to the extent that federal agencies and other research sponsors can harmonize repository requirements and data management plans, it will simplify the tasks associated with operating repositories and perhaps even help to avoid an undue proliferation of repositories.
From page 126...
... These archives could be based at the institutional level or be part of, and harmonized with, the NSF-funded Public Access Repository;
• consider extending NSF's current data management plan to include other digital artifacts, such as software; and
• work with communities reliant on nonpublic data or code to develop alternative mechanisms for demonstrating reproducibility.
Through these repository criteria, NSF would enable discoverability and standards for digital scholarly objects and discourage an undue proliferation of repositories, perhaps through endorsing or providing one go-to website that could access NSF-approved repositories.
From page 127...
... Second, for researchers who may want to replicate a study, transparency means that sufficient details are provided so that the researcher can adhere closely to the original protocol and have the best opportunity to replicate the results. Finally, transparency can serve as an antidote to questionable research practices, such as hypothesizing after results are known or p-hacking, by encouraging researchers to thoroughly track and report the details of the decisions they made and when they made them.
From page 128...
... preregistration of studies
7. preregistration of analysis plans
8.
From page 129...
... ACM has introduced a set of badges for journal articles that certify whether the results have been replicated or reproduced, and whether digital artifacts have been made available or been verified.
From page 130...
... Introducing Prepublication Checks for Errors and Anomalous Results

Several methods and tools can be used by researchers, peer reviewers, and journals to identify errors in a paper prior to publication. These methods and tools support reproducibility by strengthening the reliability and rigor of results and also deter detrimental research practices such as inappropriate use of statistical analysis as well as data fabrication and falsification.
From page 131...
... For example, in psychology and the social sciences, several mathematical tools have been developed to check the statistical data and analyses that use null hypothesis tests, including statcheck and p-checker:
• statcheck independently computes the p-values based on reported statistics, such as the test statistic and degrees of freedom. This tool has also been used to assess the percentage of reported p-values that are inconsistent with their reported data across psychology journals from 1985 to 2013 (Epskamp and Nuijten, 2016; Nuijten et al., 2016)
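As a rough illustration of the statcheck idea, the sketch below recomputes a two-sided p-value from a reported t statistic and degrees of freedom, assuming scipy is available. All numbers are hypothetical, and the reported p-value is deliberately inconsistent to show what such a check flags.

    # Recompute a two-sided p-value from a reported t statistic and degrees of
    # freedom, then compare it with the p-value reported in the paper.
    from scipy import stats

    t_reported, df_reported, p_reported = 2.31, 28, 0.014  # hypothetical values

    p_recomputed = 2 * stats.t.sf(abs(t_reported), df_reported)
    print(f"Recomputed p = {p_recomputed:.3f}; reported p = {p_reported:.3f}")
    if abs(p_recomputed - p_reported) > 0.0005:  # larger than plausible rounding
        print("Reported p-value is inconsistent with the reported t and df.")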
From page 132...
... One proposed solution is registering the analysis plan before the research is conducted, or at least before the data are examined. This practice goes by different names in different fields, including "pre-analysis plan," "preregistration," and "trial registration." These plans include a precise description of the research design, the statistical analysis that will test the key hypothesis or research question, and the team's plan for specific analytic decisions (e.g., how sample size will be determined, how outliers will be identified and treated, when and how to transform variables)
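As a rough illustration, the sketch below writes those elements into a structured, time-stamped file of the kind that could be deposited with a registry before data collection begins. Every field name and entry is a hypothetical placeholder, not a template from the report.

    # Record the pre-analysis plan before any data are examined.
    import json
    from datetime import datetime, timezone

    pre_analysis_plan = {
        "research_question": "Does intervention X improve outcome Y?",
        "design": "Two-arm randomized experiment",
        "primary_analysis": "Two-sided independent-samples t test on outcome Y",
        "sample_size_rule": "N = 128 (64 per arm), from a power analysis at d = 0.5",
        "outlier_rule": "Exclude observations more than 3 SD from the group mean",
        "variable_transformations": "Log-transform reaction times before analysis",
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }

    with open("pre_analysis_plan.json", "w") as f:
        json.dump(pre_analysis_plan, f, indent=2)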
From page 133...
... This lack of acceptance of preregistration may be due, at least in part, to the fact that its effectiveness in changing research practices and improving replication rates is unknown. It could also be due to the fact that tools for preregistering studies have not been around as long as tools for calculating effect sizes or statistical power, so norms surrounding preregistration have had less time to develop.
From page 134...
... In addition to these new specialized journals, some traditional journals have advertised that they welcome submissions of replication studies and studies reporting negative results.
From page 135...
... A growing number of journals offer the option for submissions to be reviewed on the basis of the proposed research question, method, and proposed analyses, without a reviewer knowing the results.42 The principal argument for this type of approach is to separate the outcome of the study from the evaluation of the importance of the research question and the quality of the research design. The idea is that if the study is testing an important question, and testing it well, then the results should be worthy of publication regardless of the outcome of the study.
From page 136...
... One such example is the Psychology and Cognitive Neuroscience section of Royal Society Open Science, which is committed to publishing replications of empirical research particularly if published within its journal.44 There are new journals that print only negative results, such as New Negatives in Plant Science, as well as efforts by other journals to highlight negative results and failures to replicate, such as PLOS's "missing pieces" collection of negative, null, and inconclusive results. IEEE Access is an open access journal that publishes negative results, in addition to other articles on topics that do not fit into its traditional journals.45 Some journals have created specific protocols for conducting and publishing replication studies.
From page 137...
... By making funding contingent on researchers' following specific requirements, funders can make certain practices mandatory for large numbers of researchers. For example, a funding agency or philanthropic organization could require that grantees follow certain guidelines for design, conduct, and reporting or could require grantees to preregister hypotheses or publish null results.
From page 138...
... has not yet introduced standards and guidance directly aimed at enhancing reproducibility and replicability in its application process, as NIH has. NSF does have a data sharing policy and requires that proposals include a data management plan.
From page 139...
... announced in 2016 that it would commit $3.3 million over 3 years for

TABLE 6-1 Assessment of the Desirability of Replication Studies

Criteria: the desirability of a replication study . . .

Knowledge
• is higher when results from a previous study seem more implausible
• is higher when there are more doubts about the validity of the methods or the proper execution of a previous study
• is higher when its results may have a major impact on scientific knowledge
• is higher when it may help improve research methods

Impact
• is higher when its results may have a major societal impact
• is higher when it may help avoid wasting research resources on a scientific dead end
• is higher when it may improve the functioning of a whole discipline (replication series)

Cost
• is lower when it requires more resources and time investment by researchers
• is lower when it places a heavier burden on human and animal test subjects

Alternatives
• must be weighed against performing innovative studies
• must be weighed against taking other measures to improve reproducibility

SOURCE: Royal Netherlands Academy of Arts and Sciences (2018, Table 4)
From page 140...
... This development would counter the current problem of widespread noncompliance with some open science mandates. Finally, the nature of data, code, and other digital objects used or generated by different disciplines varies widely.
From page 141...
... If private or public funders choose to invest in initiatives on reproducibility and replication, two areas may benefit from additional funding:
• education and training initiatives to ensure that researchers have the knowledge, skills, and tools needed to conduct research in ways that adhere to the highest scientific standards; describe methods clearly, specifically, and completely; and express accurately and appropriately the uncertainty involved in the research; and
• reviews of published work, such as testing the reproducibility of published research, conducting rigorous replication studies, and publishing sound critical commentaries.

RECOMMENDATION 6-9: Funders should require a thoughtful discussion in grant applications of how uncertainties will be evaluated, along with any relevant issues regarding replicability and computational reproducibility.

