4 Inference About Causal Discoveries Driven by Large Observational Data
Pages 30-43

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 30...
... remain relatively open in the context of large, complex data sets containing many treatment variables with possible interactions. Comparative effectiveness research using EHRs faces challenges related to potentially large amounts of missing data ("missingness")
From page 31...
... More recently there is growing interest in modeling the entire HIV care cascade, typically using microsimulation techniques based on complex, nonlinear state-space mathematical models. These approaches typically assume an underlying parametric model, aggregate data from numerous different sources to quantify relevant parameters, calibrate the model against known target outcomes, and then explore the effects of alternative interventions through iterative simulation.
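The simulate-calibrate-perturb loop described above can be sketched in a few lines. The cascade states, annual transition probabilities, cohort size, and "intervention" below are illustrative assumptions, not parameters from any calibrated model Hogan discussed.

```python
import random

# Illustrative cascade states and annual advancement probabilities; real
# microsimulation models of this kind are far more detailed and calibrated.
STATES = ["undiagnosed", "diagnosed", "in_care", "on_ART", "suppressed"]
ADVANCE = {"undiagnosed": 0.3, "diagnosed": 0.6, "in_care": 0.7, "on_ART": 0.8}

def simulate(n_people=10_000, years=10, seed=0):
    """Return the fraction of a simulated cohort that reaches viral suppression."""
    rng = random.Random(seed)
    cohort = ["undiagnosed"] * n_people
    for _ in range(years):
        for i, state in enumerate(cohort):
            # Each person advances at most one cascade step per simulated year.
            if state != "suppressed" and rng.random() < ADVANCE[state]:
                cohort[i] = STATES[STATES.index(state) + 1]
    return sum(s == "suppressed" for s in cohort) / n_people

baseline = simulate()
ADVANCE["diagnosed"] = 0.9   # a hypothetical linkage-to-care intervention
improved = simulate()
```

A realistic model would add mortality, disengagement back to earlier states, and calibration of the transition parameters against observed targets; the sketch shows only the iterative simulate-then-perturb pattern.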
From page 32...
... With the growing availability of EHRs containing longitudinal data for thousands of patients, Hogan explained that it is possible to develop statistical models of the HIV care cascade that are representative of a well-defined population in actual care settings. However, he cautioned that using observational EHR data can be challenging compared to using data from a cohort study, due to irregular observation times and an abundance of confounding factors.
From page 33...
... Building a statistical model allows for formal quantification of uncertainty and application of standard methods, such as confidence intervals and hypothesis tests, to support inferences about effects of interest. In comparing statistical and mathematical approaches to modeling the HIV care cascade, Hogan described the former as beginning with as much data as are available and building a simple mathematical model to describe the data, whereas mathematical models typically focus on building a more complex process model and then using select historical data for calibration.
From page 34...
... Hogan concluded that as complex, messy, information-rich EHR data become increasingly available and are potentially used to inform treatment decisions, practice patterns, and health care policy, "Statistical principles could hardly be more important."

DISCUSSION OF CAUSAL INFERENCES ON THE HUMAN IMMUNODEFICIENCY VIRUS CARE CASCADE FROM ELECTRONIC HEALTH RECORDS DATA

Elizabeth Stuart, Johns Hopkins University

In mental health research there is increasing interest in comprehensive systems modeling, said Elizabeth Stuart, which is an area ripe for combining mathematical and statistical modeling. However, large-scale mathematical models typically require more assumptions than statistical models and may contain unnecessary complexity that is irrelevant to the specific decision context.
From page 35...
... , she said, and when estimating population effects it is possible that a small, nonrepresentative randomized trial actually has more bias than a large nonexperimental study in a representative sample. Stuart was optimistic that with increasing access to big data such as population-wide EHRs, there would be the opportunity to conduct well-designed nonexperimental studies that provide better evidence about treatment effectiveness in the real world.
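Stuart's caution can be made concrete with a toy simulation; the strata, effect sizes, and enrollment pattern below are invented for illustration. A small trial that enrolls only a nonrepresentative subgroup is internally valid yet biased for the population-average effect, while a large observational study, once the confounder is adjusted for by standardization, lands near the truth.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000

# Population with a binary covariate X and a heterogeneous treatment effect
# (all numbers here are illustrative assumptions, not from the talk).
x = rng.binomial(1, 0.2, N)            # 20% of the population has X = 1
effect = np.where(x == 1, 10.0, 2.0)   # effect is 10 if X = 1, else 2
true_ate = effect.mean()               # population-average effect, about 3.6

# Small RCT that happens to enroll only X = 1 volunteers: randomization makes
# it internally valid, but it estimates the X = 1 subgroup effect, about 10.
trial = rng.choice(np.where(x == 1)[0], size=200, replace=False)
t_rct = rng.binomial(1, 0.5, 200)
y_rct = 1.0 + effect[trial] * t_rct + rng.normal(0, 1, 200)
rct_est = y_rct[t_rct == 1].mean() - y_rct[t_rct == 0].mean()

# Large observational study: treatment assignment depends on X (confounding),
# but a stratum-weighted (standardized) comparison recovers the population effect.
p_treat = np.where(x == 1, 0.7, 0.3)
t = rng.binomial(1, p_treat)
y = 1.0 + 0.5 * x + effect * t + rng.normal(0, 1, N)
strata = [y[(x == v) & (t == 1)].mean() - y[(x == v) & (t == 0)].mean()
          for v in (0, 1)]
w1 = x.mean()
obs_est = (1 - w1) * strata[0] + w1 * strata[1]
```

Here `rct_est` sits near 10 (a bias of roughly 6 for the population effect), while `obs_est` sits near `true_ate`; the example assumes, crucially, that X is the only confounder, which is exactly what is unverifiable in practice.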
From page 36...
... Regarding the broader question, empirical selection of confounders is another example where there is tension between statistical theory and practice, said Hogan, because in theory it is impossible to know whether a sufficient set of confounders has been selected. He said he is hesitant to apply data-driven approaches to confounder selection using data such as those from an EHR, in part because the data structure is so irregular, but perhaps more importantly because it might not be a representative sample from

1 Bayesian melding combines mathematical modeling with data (e.g., Poole and Raftery, 2000)
From page 37...
... He encouraged participants to think of causal inference in terms of factoring a joint distribution of observed and unobserved potential outcomes, and he noted that clearly separating these two components in a model makes untestable assumptions clear and leads to more coherent and transparent inferences.

A GENERAL FRAMEWORK FOR SELECTION BIAS DUE TO MISSING DATA IN ELECTRONIC HEALTH RECORDS-BASED RESEARCH

Sebastien Haneuse, Harvard University

Sebastien Haneuse began by reiterating a fundamental difference between the scientific goals of comparative effectiveness research -- for example, Hogan's presentation comparing two HIV treatment strategies -- and those of exploratory analyses, as discussed in the workshop's first session.
From page 38...
... Although some of these challenges are not new and are encountered in traditional observational studies, existing methods for addressing them are ill suited to the scale, complexity, and heterogeneity of EHR data, said Haneuse. An emerging literature on statistical methods for comparative effectiveness research with EHRs has focused largely on resolving confounding bias, whereas the problems of selection bias and missing data remain underappreciated.
From page 39...
... Conceiving of the ideal study and resulting data allows for a concrete definition of complete and missing data, which analysts can use to characterize why any given patient has complete or missing data. Specifically, Haneuse presented the general strategy of decomposing the single-mechanism model of why a patient has missing data into a series of more manageable submechanisms, with each submechanism representing a single decision.
From page 40...
... Showing odds ratios from logistic regression models for the single-mechanism and three-mechanism models, Haneuse emphasized that different covariates can have different effects on each submechanism, which makes it difficult to interpret the significance of coefficients from the single-mechanism model. While the preceding example focused on three specific mechanisms, Haneuse said many alternative submechanisms causing missing data could be considered.
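The decomposition strategy can be illustrated with simulated data. The three submechanisms below (a clinic visit occurred, a lab was ordered, the result was recorded) and the covariate effects are hypothetical stand-ins, not Haneuse's actual example. Two structural points carry over: overall completeness factors exactly into a chain of conditional submechanism probabilities, and a covariate can push different submechanisms in opposite directions, which a single-mechanism model averages away.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
x = rng.normal(size=n)   # a patient covariate (e.g., disease severity)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical submechanisms behind "complete data"; note that x *raises*
# the chance of a visit but *lowers* the chance of a lab order given a visit.
c1 = rng.binomial(1, sigmoid(0.5 + 1.0 * x))        # 1: clinic visit occurred
c2 = rng.binomial(1, sigmoid(0.5 - 1.0 * x)) * c1   # 2: lab ordered, given visit
c3 = rng.binomial(1, 0.8, n) * c2                   # 3: result recorded, given order

complete = c3   # data are complete only when all three submechanisms succeed

# Chain-rule decomposition, which holds exactly within any subgroup:
#   P(complete) = P(c1 = 1) * P(c2 = 1 | c1 = 1) * P(c3 = 1 | c2 = 1)
mask = x > 0
lhs = complete[mask].mean()
rhs = (c1[mask].mean()
       * c2[mask & (c1 == 1)].mean()
       * c3[mask & (c2 == 1)].mean())
```

Fitting one logistic model to `complete` versus `x` would blend the positive effect on submechanism 1 with the negative effect on submechanism 2, making the single-mechanism coefficient hard to interpret, whereas three submechanism-specific models keep the opposing effects visible.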
From page 41...
DISCUSSION OF COMPARATIVE EFFECTIVENESS RESEARCH USING ELECTRONIC HEALTH RECORDS

Dylan Small, University of Pennsylvania

Dylan Small reminded participants that using EHRs for comparative effectiveness research can be cheaper, faster, more representative of real-world effectiveness, and more statistically powerful (because of large sample sizes) than randomized trials.
From page 42...
... Such confounding can be quite subtle and complex in the clinical health care context, and comparative effectiveness studies using EHR data need to consider critically why two patients with similar observed covariates received different treatments.
From page 43...
PANEL DISCUSSION

In a follow-on panel discussion with Sebastien Haneuse and Dylan Small, a participant described Haneuse's strategy of comparing EHR data to the data that would result from the ideal randomized trial as the appropriate way for statisticians and other researchers to reason through causal discoveries made from big data, as opposed to simply extracting as much as possible using advanced methods such as machine learning. The participant asked whether Haneuse had considered a way to quantify the extent of missingness, similar to a missing information ratio, relative to the ideal study design.

