
1 Introduction
Pages 1-7

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 1...
... , environmental exposure, activities, and genetic and proteomic information is expected to help guide the development of personalized medicine. However, producing actionable scientific knowledge from such large, complex data sets requires statistical models that produce reliable inferences (NRC, 2013)
From page 2...
... BOX 1.1 Statement of Task

An ad hoc committee appointed by the National Academies of Sciences, Engineering, and Medicine will plan and organize a workshop to examine challenges in applying scientific inference to big data in biomedical applications. To this end, the workshop will explore four key issues of scientific inference:

• Inference about causal discoveries driven by large observational data,
• Inference about discoveries from data on large networks,
• Inference about discoveries based on integration of diverse data sets, and
• Inference when regularization is used to simplify fitting of high-dimensional models.
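The last item in the task list, inference when regularization is used to fit high-dimensional models, can be illustrated with a minimal sketch. This example is not from the workshop; all names and values are illustrative. It shows a closed-form ridge fit, one common regularization method, which stabilizes estimation when features outnumber observations (p > n) and ordinary least squares is not identifiable:

```python
import numpy as np

# Illustrative sketch only: ridge regularization in a p > n setting.
rng = np.random.default_rng(0)
n, p = 50, 200                       # more features than observations
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]   # only 5 features matter
y = X @ beta_true + 0.1 * rng.standard_normal(n)

def ridge_fit(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# With p > n, X'X is singular; the lam*I term makes the system solvable
# and shrinks the estimate toward zero.
beta_hat = ridge_fit(X, y, lam=1.0)
err = np.linalg.norm(beta_hat - beta_true)
print(round(float(err), 2))
```

The shrinkage that makes the fit computable also biases the estimates, which is exactly why inference (confidence statements about the coefficients) is harder under regularization than in classical low-dimensional settings, the tension the workshop task highlights.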
From page 3...
... Outside of the identified themes, many other important questions were raised with varying levels of detail, as described in the summaries of individual speaker presentations.

Big Data Holds Both Great Promise and Perils

Many presenters called attention to the tremendous amount of information available through large, complex data sets and described their potential to lead to new scientific discoveries that improve health care research and practice.
From page 4...
... More broadly, when analyses of big data are used for scientific discovery, to help form scientific conclusions, or to inform decision making, statistical reasoning and inferential formalism are required.

Inference Requires Evaluating Uncertainty

Many workshop presenters described significant advances made in developing algorithms and methods for analyzing large, complex data sets.
From page 5...
... Examples included connecting subcellular descriptions of gene and protein expression with longitudinal EHRs and combining neuroscience technologies and methods spanning the individual neuron scale to whole brain regions. Alfred Hero said that the challenges associated with creating integrative statistical models informed by known biology are substantial because
From page 6...
... Another suggestion was to organize undergraduate curricula around fundamental principles rather than introducing students to a series of statistical tests to match with data. Many pitfalls faced in the analysis of large, heterogeneous data sets result from inappropriate application of simplifying assumptions that are used in introductory statistics courses, suggested Shalizi.
From page 7...
... ORGANIZATION OF THIS WORKSHOP PROCEEDINGS

Subsequent chapters of this publication summarize the workshop presentations and discussions largely in chronological order. Chapter 2 provides an overview of the workshop and its underlying goals, Chapter 3 focuses on inference about discoveries based on integration of diverse data sets, Chapter 4 discusses inference about causal discoveries from large observational data, and Chapter 5 describes inference when regularization methods are used to simplify fitting of high-dimensional models.

