6 Panel Discussion
Pages 62-68

From page 62...
... For example, Hero described the challenge and potential value of integrating data that track phenomena at the subcellular and cellular levels with observational data from individual patients or cohorts. Creating models that combine these disparate types of data across different scales is a critical challenge that has stalled many researchers and receives insufficient attention from funding agencies.
From page 63...
... Yu noted that simulation and computational approaches could be valuable for studying dependent model structures; disciplines such as chemistry and physics have established strong computational subfields, whereas statistics has not, and she concluded that targeted investment from the National Science Foundation in computational statistics could efficiently advance understanding. Complementing rigorous studies of model structure, Daniels said there is a critical need to develop methods for identifying and addressing messiness in large, heterogeneous data sets, a step that occurs before, and therefore underlies, model selection and inference.
From page 64...
... She explained that her research group embeds graduate students and postdoctoral scholars in domain science labs, which helps statisticians understand what questions collaborators are pursuing, how the data being evaluated are generated, and what conclusions the available data can statistically support. She stated that collaborators do not always need just a p-value or confidence interval, and that there is a broader opportunity to engage them in creating an evolving, systematic approach to defining and pursuing statistical problems.
From page 65...
... Shalizi commented that big data does not seem to reveal any problems with the concept of statistical inference; rather, it exposes the limitations of the simplifying assumptions used in introductory statistics classes. For example, the statistics community has always known that the linear model with Gaussian noise is too simple; that p-values combine information about the size of a coefficient, how well it can be measured, and how large the sample is, but do not indicate variable importance; and that no amount of additional data will help if the quantity of interest cannot be identified from the collected variables.
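A minimal simulation makes the point about p-values concrete (this sketch is illustrative, not from the workshop; the effect size and sample sizes are arbitrary choices): as the sample grows, even a practically negligible coefficient earns a vanishing p-value, so statistical significance alone says nothing about importance.

    # Illustrative sketch (assumed values, not from the workshop): p-values
    # shrink with sample size even when the effect is practically negligible.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    slope = 0.01  # a tiny, practically unimportant coefficient

    for n in (100, 10_000, 1_000_000):
        x = rng.normal(size=n)
        y = slope * x + rng.normal(size=n)  # linear model with Gaussian noise
        fit = stats.linregress(x, y)
        print(f"n={n:>9,}  slope_hat={fit.slope:+.4f}  p={fit.pvalue:.3g}")

    # Typical output: p is large at n = 100 but essentially zero at
    # n = 1,000,000, while the estimated coefficient never stops being tiny.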
From page 66...
... He encouraged funding agencies to develop graduate and postdoctoral training programs that specifically identify statistics as a necessary component of data science and to call out statistics explicitly in large program announcements. Shifting to future research needs, Hogan said it is critically important to distinguish between intentionally collected data and "found data," such as electronic health records (EHRs) ...
From page 67...
... Based on criteria such as these, the participant believed it could be possible to identify those questions for which big data will help and those that hold little promise.

FACILITATION OF DATA SHARING AND LINKAGE

Yu urged funding agencies to help improve and incentivize data sharing, particularly of EHRs, across multiple institutions, saying that this ...
From page 68...
... Hero commented that developing and widely disseminating statistical software packages could reduce the barriers to identifying and applying the appropriate tools and would advance both statistics and the domain sciences. Lin raised the challenges of data sharing, saying that efforts need to go beyond simply sharing data to promoting linkages across different data sets.
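As a minimal sketch of what linking data sets can mean in practice (the tables, fields, and values below are hypothetical, not from the workshop):

    # Hypothetical illustration: joining two independently collected tables
    # on a shared patient identifier.
    import pandas as pd

    ehr = pd.DataFrame({"patient_id": [1, 2, 3],
                        "diagnosis": ["A", "B", "A"]})
    registry = pd.DataFrame({"patient_id": [2, 3, 4],
                             "outcome": [0.7, 0.2, 0.9]})

    # An inner join keeps only patients present in both sources. Real EHR
    # linkage rarely has clean shared keys and typically requires
    # probabilistic record matching on names, dates of birth, and so on.
    linked = ehr.merge(registry, on="patient_id", how="inner")
    print(linked)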

