4 The Way Forward: Using Statistics to Improve Reproducibility
Pages 68-96

The Chapter Skim interface presents what has been algorithmically identified as the most significant single chunk of text from each page of the chapter.


From page 68...
... . The final panel discussion on research as the way forward from the data sciences perspective was moderated by Constantine Gatsonis (Brown University, planning committee co-chair, and chair of the Committee on Applied and Theoretical Statistics)
From page 69...
... Open Problems, Needs, and Opportunities for Methodologic Research

Giovanni Parmigiani began the first panel discussion by noting that there should be a more integrated approach to several issues, beginning with terminology.
From page 70...
... According to Anestidou, four key themes arose at that workshop:
• Transformation of the research enterprise, specifically systemic issues, scientific training and culture, public perceptions, and incentives for research integrity;
• Interactive assessment of published research;
• Improvement in the reliability of published results; and
• Enhanced understanding of animals and animal models, specifically from clinical research, and proactive planning in preclinical research. This includes reproducibility and the "3Rs" (reduce the number of animals used, refine the methodology, and replace animal models with in vitro and in silico approaches)
From page 71...
... research community and involve the laboratory animal veterinary community in the reproducibility conversation.

Tim Errington, Center for Open Science

Tim Errington began by describing some reproducibility issues that arise from researchers' degrees of freedom and explaining how they can essentially short-circuit the scientific process, including a lack of replication (Makel et al., 2012)
From page 72...
... 1    The Open Science Framework and the Many Labs and Science Exchange website is https://osf.io/8mpji/, accessed January 12, 2016.
From page 73...
... Huo noted that the effort's focus on computation and data has a strong connection with reproducibility. The NSF solicitation for Critical Techniques and Technologies for Advancing Foundations and Applications of Big Data Science and Engineering (BIGDATA)
From page 74...
... Peng said that
5    The IPython Notebook website is http://ipython.org/notebook.html, accessed January 12, 2016.
6    The Galaxy website is https://galaxyproject.org, accessed January 12, 2016.
From page 75...
... Panel Discussion

A participant noted that in the life sciences, Institutional Review Boards (IRBs) in many academic institutions are currently reviewing research proposals before or after funding is received, and these IRBs typically have statistical committees.
From page 76...
... How can incentives truly be changed? How can
7    The International Initiative for Impact Evaluation website is http://www.3ieimpact.org, accessed January 12, 2016.
From page 77...
... A participant stated that pre-review of research plans, as would occur with Errington's proposal to revise the publication process, does not allow science to innovate freely. However, the existing IRB and IACUC systems are places where improvements to analysis could be identified.
From page 78...
... She also commented that it is important for the community to consider the public perception of reproducibility.

Keith Baggerly, MD Anderson Cancer Center

Keith Baggerly explained that he has been associated with reproducibility efforts for a few years, motivated by a number of cases where he encountered process failures.
From page 79...
... Ronald Boisvert, Association for Computing Machinery and National Institute of Standards and Technology

Ronald Boisvert discussed some of the efforts he has been involved with in the course of his position as a member and former co-chair of the Association for Computing Machinery (ACM) Publications Board, where issues related to reproducibility and data sharing are currently being considered.
From page 80...
... Over the years, the standards and procedures for doing the review and the terminologies have changed as the com
9    The SIGMOD Reproducibility website is http://db-reproducibility.seas.harvard.edu, accessed January 12, 2016.
From page 81...
... 11    ACM Digital Library, "Editorial: ACM TOMS Replicated Computational Results Initiative," http://dl.acm.org/citation.cfm?doid=2786970.2743015, accessed January 12, 2016.
From page 82...
... 13    The Image Processing On Line website is http://www.ipol.im, accessed January 12, 2016.
14    Other journals focused on publishing software include the ACM Transactions on Mathematical Software, SIAM Journal on Scientific Computing (Software Section)
From page 83...
... 16    The Figshare website is http://figshare.com, accessed January 12, 2016.
17    The GitHub website is https://github.com, accessed January 12, 2016.
From page 84...
... He hopes that 300 years from now, people will look back and see this as a transition time when science moved into doing things very differently.

Marcia McNutt, Science Magazine

Marcia McNutt began by noting that the scientific community is quickly embracing the concept of reproducibility.
From page 85...
... A goal was that improved transparency of these four experimental protocols would allow reviewers and readers to gain a level of confidence in the results. McNutt noted that authors are not required to follow these protocols; they are only required to state whether or not they did so.
From page 86...
... In conclusion, McNutt noted that Science added several statisticians to its board of reviewing editors to help screen and identify papers that may need extra scrutiny for the use of statistics or numerical analysis. She said this addition has raised the journal's standards.
From page 87...
... McNutt noted that eLife, for biomedical sciences, and the Center for Open Science, for the social sciences, are already making such efforts. A participant noted that many of the journal-sponsored workshops on reproducibility focus on operational issues, such as transparency, making data available, cataloging, and developing computing infrastructure that allows the data to become available.
From page 88...
... He noted that assessing variation in larger databases is a form of sensitivity analysis and may be about as good as can be done in those cases. McNutt noted that there is new laboratory software entering beta testing that can track laboratory results to reveal systemic issues, such as equipment degradation, and help identify sources of bias and error in results (Gardner, 2014)
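
One way to read the point above about assessing variation in larger databases as a sensitivity analysis is to recompute an estimate on many random subsamples and inspect how much it moves. The following minimal Python sketch illustrates that idea; the simulated data, the mean as the estimator, and the subsample settings are illustrative assumptions, not anything specified at the workshop.

```python
# Hypothetical sketch: treat variation across random subsamples of a large
# table as a simple sensitivity analysis. All data here are simulated.
import numpy as np

rng = np.random.default_rng(42)
measurements = rng.normal(loc=10.0, scale=2.0, size=100_000)  # stand-in "database"

def subsample_estimates(data, n_subsamples=200, frac=0.1):
    """Recompute the estimator (here, the mean) on random subsamples."""
    size = int(len(data) * frac)
    return np.array([
        data[rng.choice(len(data), size=size, replace=False)].mean()
        for _ in range(n_subsamples)
    ])

estimates = subsample_estimates(measurements)
print(f"Full-data estimate: {measurements.mean():.3f}")
print(f"Spread across subsamples (SD): {estimates.std():.3f}")
```

If the estimate barely moves across subsamples, the conclusion is insensitive to which portion of the database is used; large swings flag the kind of systemic issues discussed above.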
From page 89...
... The Way Forward from the Data Sciences Perspective: Research

Constantine Gatsonis opened the final workshop panel by highlighting some of the previously identified themes. The first relates to statistical thinking and determining evidence of reproducibility.
From page 90...
... An important theme that emerged from that workshop
18    The NSF's Big Data Strategic Initiative Workshop website is http://workshops.cs.georgetown.edu/BDSI-2015/, accessed January 12, 2016.
From page 91...
... He commented that while he is not an expert in reproducibility, he has been working in biomedical data science for 20 years, helping to manage data and make discoveries. During this time, biomedical sciences have become data intensive and many researchers must now be proficient in data management, data wrangling, computer algorithm optimization, and software development to implement methods.
From page 92...
... Efforts such as these are important in the biomedical sciences and also in other fields that are moving from data-poor to data-driven. His final point was that statisticians should not shy away from teaching students how to do applied statistics.
From page 93...
... 27    The Johns Hopkins University Data Science Program website is https://www.coursera.org/specialization/jhudatascience/1, accessed January 12, 2016.
From page 94...
... A participant noted the existence of a generational problem where new data scientists are being trained but there is not a mechanism for current researchers to improve their existing training. Irizarry and Bourne both agreed that ongoing professional development is needed and wanted across fields, and NIH is funding initiatives to support this development.
From page 95...
... A participant noted that the default in large genomic data sets is to resort to multiple hypothesis testing to correct for really small p-values, while keeping the same p-value thresholds, but wondered whether that is a reasonable thing to do.
28    The NIH's Gene Expression Omnibus website is http://www.ncbi.nlm.nih.gov/geo/, accessed January 12, 2016.
From page 96...
... Leek responded that correcting for multiple testing is a good idea, particularly using measures such as the false discovery rate or other error rates, but there are tricky issues when going to higher dimensions in terms of dependence, when to do multiple hypothesis tests, p-value hacking, and selective inference. There are many ways to get things wrong, even if one corrects for multiple testing, but this testing is generally recommended.
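
As a concrete illustration of the multiple-testing correction Leek describes, the following minimal Python sketch applies the Benjamini-Hochberg false discovery rate procedure to simulated p-values; the null/signal mixture and the 0.05 FDR level are assumptions for illustration, not values from the workshop.

```python
# Benjamini-Hochberg FDR control on simulated p-values (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# 10,000 tests: mostly nulls (uniform p-values) plus a few signals whose
# p-values are skewed toward zero.
p_null = rng.uniform(size=9_800)
p_signal = rng.beta(0.5, 20.0, size=200)
pvals = np.concatenate([p_null, p_signal])

def benjamini_hochberg(p, alpha=0.05):
    """Return a boolean mask of rejections controlling the FDR at level alpha."""
    m = len(p)
    order = np.argsort(p)
    ranked = p[order]
    # Largest k with p_(k) <= (k / m) * alpha; reject the k smallest p-values.
    thresholds = (np.arange(1, m + 1) / m) * alpha
    passing = np.nonzero(ranked <= thresholds)[0]
    k = passing[-1] + 1 if passing.size else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    return reject

discoveries = benjamini_hochberg(pvals, alpha=0.05)
print(f"BH discoveries at FDR 0.05: {discoveries.sum()}")
print(f"Discoveries at a fixed 0.05 cutoff: {(pvals <= 0.05).sum()}")
```

Unlike a fixed per-test cutoff, the Benjamini-Hochberg threshold adapts to the number of tests performed, which is one standard way to implement the false discovery rate control Leek mentions.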

