An Evidence Review and Evaluation Process to Inform Public Health Emergency Preparedness and Response Decision Making
The committee was charged with developing a methodology for and subsequently conducting a systematic review and evaluation of the evidence base for public health emergency preparedness and response (PHEPR) practices.1 Specifically, the committee was asked to establish a tiered grading scheme to be applied in assessing the strength or certainty of the evidence (COE)2 for specific PHEPR practices and in developing recommendations for evidence-based practices. This chapter describes the committee’s approach to developing a transparent process for making judgments about the evidence for cause-and-effect relationships and understanding the balance of benefits and harms of PHEPR practices.
The chapter begins with a discussion of the evolving philosophies regarding the identification of evidence-based practices, the challenges of evaluating interventions that are complex or implemented in complex systems, and the developing methodologies to address those complexity issues. It then describes the established evidence evaluation frameworks that informed the committee’s methodology. Next, the chapter details the key elements and approaches of the methodology developed and applied by the committee for reviewing and evaluating PHEPR evidence to inform decision making. Finally, the chapter concludes with lessons learned from the development and application of the committee’s methodology and recommendations for supporting ongoing efforts to build a cumulative evidence base for PHEPR.
1 The committee defined PHEPR practices as a type of process, structure, or intervention whose implementation is intended to mitigate the adverse effects (e.g., morbidity and mortality, economic impacts) of a public health emergency.
2 “Strength of evidence” and “certainty of the evidence” are often used interchangeably. While the committee’s charge used “strength of evidence,” the committee uses the phrase “certainty of the evidence” throughout this report (except when referring to the grading of qualitative evidence, for which the field-accepted term “confidence” is used). “Certainty of the evidence” can be defined in different ways, depending on the context in which the term will be used. For the purposes of making recommendations, it represents the extent of confidence that the estimates of an effect are adequate to support a particular recommendation or decision. When it is not possible or helpful to generate an estimate of effect size, the certainty of the evidence may reflect the confidence that there is a non-null effect (i.e., the intervention is effective) (Hultcrantz et al., 2017).
EVOLVING PHILOSOPHIES FOR EVALUATING EVIDENCE TO INFORM EVIDENCE-BASED PRACTICE: IMPLICATIONS FOR PHEPR
Systems for evaluating the evidence supporting given practices and interventions are a valued resource for practitioners, policy makers, and others who seek to use the best available evidence for decision making, but who lack the time, resources, or expertise needed to review and interpret a large and potentially inconsistent body of evidence. Moreover, the conduct of such reviews by reputable expert groups can increase the efficiency and consistency of the process. As discussed in Chapter 1, knowledge regarding evidence-based practice is critically needed in PHEPR given the mandate of the PHEPR system to mitigate the health, financial, and other impacts of public health emergencies. To date, however, there has been little effort to develop a rigorous and transparent process for identifying evidence-based PHEPR practices. The development of such a process requires an understanding of the methodological foundation for evidence-based practice, which continues to evolve to meet the evidentiary needs of more complex problems. The following sections describe this evolution and the implications for PHEPR given the complex nature of the PHEPR system, the kinds of questions that are of interest to PHEPR practitioners, and the volume and types of evidence that exist to answer those questions.
Issues concerning how to reach conclusions about cause-and-effect relationships have long been deliberated in the fields of science and health. The foundation for the primacy of the experimental clinical trial in a hierarchy that ranks sources of evidence of effect, generally based on experimental design, and underpins the rise of evidence-based medicine (EBM) dates back nearly 100 years (Fisher, 1925). This foundation has served the stakeholders in clinical care very well, as a plethora of advances in medicine have been shown to be beneficial in clinical trials (e.g., use of beta blockers following myocardial infarction, colorectal cancer screening in older adults), while other, once-popular interventions have been shown to be ineffective and hence discarded (e.g., extracranial–intracranial bypass to prevent stroke, Lorenzo’s oil to treat adrenoleukodystrophy). Following on the successes of the EBM model, the evidence hierarchy was subsequently applied in other fields (e.g., public health, education) to support evidence-based practice and policy (Boruch and Rui, 2008; Briss et al., 2000, 2004). In its broader application, however—and increasingly within clinical medicine as well—limitations of the traditional evidence hierarchy were recognized (Durrheim and Reingold, 2010). What works well when the intervention is an immunization or a medication may not work as well when evaluating a multicomponent quality improvement intervention or a systemic organizational change (Walshe, 2007). Notably, the application of EBM methods to research and reviews of public health practice has been challenging because, in addition to variation in effects across population groups and settings, the context in which an intervention is implemented can alter the intervention itself (as may be the case, for example, in organizational interventions) (Booth et al., 2019).
This context sensitivity makes it difficult to draw conclusions about the relevance of findings from an intervention studied in one set of circumstances to the use of the same intervention under different circumstances. The early application of evaluation methods in medicine focused primarily on achieving impact estimates with high internal validity—a goal that is better suited to well-controlled clinical settings and relatively homogenous physiological systems than to public health (Green et al., 2017). Demonstrating effectiveness in a controlled setting is important, but so, too, is knowing the likelihood that the findings from a study or set of studies conducted in particular contexts would apply to other settings (Leviton, 2017).
Moreover, not all interventions and practices can be studied in the context of a randomized controlled trial (RCT), for practical and/or ethical reasons (WHO, 2015). For example, communities cannot be randomized and assigned to experience a public health emergency, and in many instances, best practices for emergency response have been developed over time and cannot ethically be replaced with a placebo or no response. Nor is it always necessary to conduct an RCT to evaluate whether a cause-and-effect relationship exists. Thus, it is useful to consider any evidence that provides credible estimates of a causal impact (or lack thereof) between an intervention and the outcome of interest.
In 1965, Sir Austin Bradford Hill proposed a set of factors to apply when assessing whether an observed epidemiologic association is likely to be causal in nature. These factors draw on evidence from multiple sources and include (1) the strength of an association; (2) the consistency of the association (i.e., replicability across different studies, settings, and populations); (3) the specificity of the association; (4) the temporality of the association (i.e., whether the hypothesized cause precedes the effect); (5) the existence of a biological gradient (i.e., observation of a dose–response relationship); (6) the plausibility of the causal mechanism; (7) the coherence of the data with other evidence; (8) the availability of supporting evidence from experiments; and (9) the analogy or similarity of the observed associations with any other associations (Hill, 1965). Together these factors make up one of the earliest frameworks for evaluating evidence to reach conclusions about causal effects, and it is still widely applied for the purposes of causal inference. Hill’s criteria, however, were proposed in the context of simple exposure–disease relationships and may be less directly applicable to the evaluation of cause-and-effect relationships for complex or system-level interventions.
Since Hill’s time, the concept of frameworks for evaluating evidence has received increasing attention, and numerous such frameworks have been developed. Importantly, however, some authorities, including Hill himself, have argued that a rigid application of evidence criteria cannot and should not replace a global assessment of the evidence by someone with skills and training in the subject matter and methods used (Hill, 1965; Phillips and Goodman, 2004).
As policy makers and practitioners have increasingly recognized the importance of having an evidence base to tackle complex challenges, there has been a growing movement among those who conduct systematic reviews and develop guidelines to embrace methods that take a complexity perspective and use multiple sources and types of evidence. Early efforts to overcome methodologic challenges related to evaluating evidence for complex, multicomponent, and community-level public health interventions were undertaken during the development of The Guide to Community Preventive Services (The Community Guide) (Truman et al., 2000). More recently, three seminal report series were published that address these complexity issues and informed the committee’s work: the Cochrane series on Considering Complexity in Systematic Reviews of Interventions, the Agency for Healthcare Research and Quality’s (AHRQ’s) series on Complex Intervention Systematic Reviews, and the World Health Organization’s (WHO’s) series on Complex Health Interventions in Complex Systems: Concepts and Methods for Evidence-Informed Health Decisions.3 It should be noted, however, that methods for evaluating complex interventions and systems represent an active area of ongoing development.
The complexity perspective reflects a shift away from a focus on simple, linear cause- and-effect models and has been used increasingly in the health sector, particularly in public health, to “explore the ways in which interactions between components of an intervention or system give rise to dynamic and emergent behaviors” (Petticrew et al., 2019, p. 1). Multiple dimensions of intervention complexity may be considered in the evaluation of evidence, including
- intervention complexity—for interventions with multiple, often interacting, components;
- pathway complexity—for interventions characterized by complicated and nonlinear causal pathways that may feature feedback loops, synergistic effects and multiple mediators, and/or moderators of effect;
- population complexity—for interventions that target multiple participants, groups, or organizational levels;
- contextual complexity—for interventions that are context-dependent and need to be tailored to local environments; and
- implementation complexity—for interventions that require multifaceted adoption, uptake, or integration strategies (Guise et al., 2017).
A complex intervention perspective is different from a complex system perspective, and the choice of which to adopt when conducting a review is appropriately determined by the needs of the policy makers and practitioners. A complex system perspective is appropriate when the focus is on the system and how it changes over time and interacts with and adapts in response to an intervention (Petticrew et al., 2019). In such cases, the objective of the review may shift from determining “what works” to understanding “what happens” and to formulating theories on how those effects are produced (Petticrew, 2015).
Addressing the issues of the complexity of an intervention, the details of the implementation process, and the context in which the intervention is implemented requires the adaptation of existing or the development of new frameworks for assessing evidence. Reviewers and guideline developers have been developing and testing novel quantitative, qualitative, and mixed methods for systematic reviews and evidence synthesis and grading to better capture complexity (Briss et al., 2000; Guise et al., 2017; Noyes et al., 2019; Petticrew et al., 2013a; Waters et al., 2011). The starting point for complex reviews is commonly to develop a logic model as the analytic framework that represents an intervention and how it works in the complex system in which it is implemented as the theoretical basis for subsequent reviews (Anderson et al., 2011; Rohwer et al., 2017). In addition to quantitative reviews of intervention effects using novel methods (Higgins et al., 2019), standalone qualitative evidence syntheses are particularly useful for gaining an understanding of intervention complexity, and of how various aspects of complexity affect the acceptability, feasibility, and implementation of interventions and the way they work in specific contexts with specific populations (Flemming et al., 2019). There exist approximately 20 different qualitative synthesis methods, some of which enable theory development. Given this wide choice of methods, the European Union recently published guidance on criteria to consider when choosing a qualitative evidence synthesis method for use in health technology assessments of complex interventions (Booth et al., 2016).
3 In 2013, the series Considering Complexity in Systematic Reviews of Interventions was published by the Cochrane Review in the Journal of Clinical Epidemiology. The series Complex Intervention Systematic Reviews, which was published in 2017 in the Journal of Clinical Epidemiology, resulted from an expert meeting convened by AHRQ. In 2019, WHO released the series Complex Health Interventions in Complex Systems: Concepts and Methods for Evidence-Informed Health Decisions, which was published in BMJ Global Health.
Additionally, review methods for complex interventions and systems have focused on the integration of diverse and heterogeneous types of evidence. Qualitative4 and quantitative evidence may both contribute to understanding an intervention or practice and ultimately what works, necessitating synthesis approaches that combine these different types of evidence (Noyes et al., 2019; Thomas and Harden, 2008). In some instances, guideline groups have synthesized across diverse evidence streams by mapping qualitative to quantitative findings or vice versa, so as to better understand the phenomenon of interest (Glenton et al., 2013; Harden et al., 2018; WHO, 2018). For example, to better understand how lay health worker programs work, and particularly how context affects implementation, Glenton and colleagues (2013) mapped findings on barriers and facilitators (mediators and moderators) from a qualitative evidence synthesis onto a causal model derived from a previously conducted quantitative effectiveness review. The authors suggest that this integrative synthesis approach may help decision makers better understand the elements that may promote program success. Realist review methods (a mixed-method approach) are also gaining traction as an alternative to the traditional positivist approach5 (Gordon, 2016), focused on explaining the interactions among context, mechanisms, and outcomes (Wong et al., 2013). Realist review methods yield an evidence-informed theory of how an intervention works. By helping to understand the intervention mechanisms and the contexts in which those mechanisms function, realist reviews can assist decision makers in judging whether an intervention is likely to be useful in their own context(s), considering context-specific tailoring, and determining whether an intervention is likely to scale (Berg and Nanavati, 2016; Greenhalgh et al., 2011; Pawson et al., 2005).
The evolving methods described above for the review and evaluation of interventions that are complex or implemented in complex systems are of particular relevance to the PHEPR context. As discussed in Chapter 2, the PHEPR system, with its multifaceted mission to prevent, protect against, quickly respond to, and recover from public health emergencies (Nelson et al., 2007b), is inherently complex and encompasses policies, organizations, and programs. This complexity also stems in part from the nature of public health emergencies, which are often unpredictable, may evolve rapidly, and are highly heterogeneous with respect to setting and type (e.g., weather events, disease outbreaks, terrorist events) (Hunter et al., 2013). Setting is not limited to geographic location, but also encompasses the sociocultural and demographic environment, as well as the characteristics of the communities and the responding entities (e.g., organizational structure, managerial experience, staff capabilities, social trust, and other resources). PHEPR practices themselves may also be complex, featuring multiple interacting components that target multiple levels (e.g., individual, population, system), and with implementation that is often tailored to local conditions (Carbone and Thomas, 2018).
4 While there is general understanding of quantitative evidence as numerical data derived from quantitative measurements, misconceptions regarding what constitutes qualitative research and qualitative evidence are common. Qualitative research uses “qualitative methods of data collection and analysis to produce a narrative understanding of the phenomena of interest. Qualitative methods of data collection may include, for example, interviews, focus groups, observations and analysis of documents” (Noyes et al., 2019, p. 2). Qualitative evidence can also be extracted, for example, from free-text boxes in questionnaires, but this type of qualitative data tends to be less useful as it is thin and lacks context. A questionnaire survey would not, however, be considered a qualitative research study.
The questions prioritized by PHEPR stakeholders are not limited to the effectiveness of policies and practices as measured by their effects on health and system outcomes. PHEPR practitioners have identified important knowledge gaps related to implementation, such as understanding the barriers to using information-sharing systems to share data between and among states and localities (Siegfried et al., 2017) and knowing when an emergency operations center (EOC) should be activated. Addressing this wide range of operations-related questions requires assessing evidence beyond that generated through RCTs and other quantitative impact studies: evidence from qualitative studies and other sources is needed to supplement that from quantitative studies to illuminate the “hows” and “whys” in complex systems (Bate et al., 2008; Greenhalgh et al., 2004; Hohmann and Shear, 2002; Petticrew, 2015).
A considerable challenge when reviewing evidence to determine the effectiveness of PHEPR practices and implementation strategies relates to the often indirect links between the practices and primary health outcomes (e.g., morbidity and mortality) (Nelson et al., 2007a). Simple one-to-one linear cause-and-effect relationships between PHEPR practices and outcomes are the exception rather than the rule. In most circumstances, multiple pathways link practices to outcomes. Intermediate outcomes that reflect the array of potential harms and benefits may be organizational or operational, and the balance of benefits and harms is influenced by the various stakeholders’ values and perceptions regarding feasibility and acceptability. Moreover, multiple interacting interventions are often implemented simultaneously, making it difficult to assess the effect of each in isolation and their additive effects, and to distinguish those that are necessary from those that are sufficient, or at least contributory, for any given event (Nelson et al., 2007a). For example, a suite of non-pharmaceutical interventions, including isolation of sick patients, quarantine of contacts, and school closures, may be implemented simultaneously during an epidemic to reduce transmission and morbidity, making the effect of any one intervention difficult to measure. Moreover, for some PHEPR practices, it may be that there is no true effect that is replicable, as effects may be inextricable from the contexts in which a practice is implemented (Walshe, 2007). This way of thinking is a departure from most EBM, which assumes there is an underlying true effect of measurable size. In such cases, traditional evidence evaluation frameworks based on a positivist approach may not be well suited to addressing the review question(s) at hand. Questions about when and in what circumstances such practices as activating public health emergency operations are effective, for example, may be better assessed using the realist approach described above.
The PHEPR system draws on a wide range of evidence types, from RCTs to after action reports (AARs),6 and the approach to evaluating the evidence needs to reflect that diversity. In addition to research-based evidence, both quantitative and qualitative, it is important for the approach to make use of experiential evidence from past response scenarios, which offers the potential for validation of research findings in practice settings, as well as improved understanding of context effects, trade-offs, and the range of implementation approaches or components for a given practice.
6 AARs are documents created by public health authorities and other response organizations following an emergency or exercise, primarily for the purposes of quality improvement (Savoia et al., 2012). They contain narrative descriptions of what was done, but may also contain “lessons learned” (i.e., what was perceived to work well and not well) and recommendations for future responses.
Finally, public health interventions often lie at the intersection of science, policy, and politics, which means that decision-making processes around implementation need to reflect not only scientific evidence but also information related to social and legal norms, ethical values, and variable individual and community preferences. Accordingly, any systematic review of the evidence necessary to make informed decisions related to PHEPR needs also to include an explicit assessment of underlying ethical, legal, and social considerations.
To inform its methodology, the committee began by reviewing existing frameworks for evaluating different sources and types of evidence, both in health care and in other areas in which experimental clinical trials may be impossible or impractical (such as aviation safety), to determine their potential to accommodate the diverse PHEPR evidence base and questions of interest to PHEPR stakeholders. These existing frameworks are described below.
The charge to the committee specified that in developing its methodology, the committee should draw on accepted scientific approaches and existing models for synthesizing and assessing the strength of evidence. Thus, the committee reviewed the published literature and held a 1-day public workshop on evidence evaluation frameworks used in health and nonhealth fields. (This workshop is reported separately in a Proceedings of a Workshop—in Brief [see Appendix E].7) During the public workshop, the committee also heard from experts on how evidence is assessed in other areas of policy, such as transportation safety and aerospace medicine, where making decisions about cause and effect is crucial for safety but conducting randomized trials, or even concurrently controlled experimental studies, is in most cases impractical. The models for evidence evaluation reviewed and considered by the committee are summarized in Table 3-1.8
For each approach, the committee identified some aspects relevant to a framework for PHEPR evidence evaluation. For example, the framework used by the What Works Clearinghouse (WWC) practice guides includes a mechanism for drawing on the real-world experience of experts to inform recommendations while making clear the limitations of such evidence (WWC, 2020), which the committee thought would be applicable to evaluating evidence from AARs and integrating PHEPR practitioner input. Additionally, the user-oriented presentation of information in the WWC practice guides and the inclusion of implementation guidance was of interest given practitioners’ emphasis on the importance of translation and implementation issues in PHEPR. For these reasons, the committee also carefully considered the Clearinghouse for Labor Evaluation and Research (CLEAR) approach to evaluating implementation studies.
The committee considered the causal chain of evidence approach, which employs analytic frameworks and is used by several groups, including the U.S. Preventive Services Task Force (USPSTF), the Community Preventive Services Task Force (CPSTF), and the Evaluation of Genomic Applications in Practice and Prevention (EGAPP), to be particularly relevant to PHEPR, as it was expected that there would be few, if any, studies that would provide direct evidence demonstrating the effect of a PHEPR practice on morbidity or mortality following a public health emergency. Instead, in most cases, evidence from across a chain of intermediate outcomes would need to be linked together to reach health and other downstream outcomes. Analytic frameworks (examples of which can be found in Chapters 4–7) depict the hypothesized links between an intervention/practice and intermediate and health or other final outcomes. They also provide a conceptual approach for evaluating interventions, guiding the search and analysis of evidence (Briss et al., 2000).

TABLE 3-1 Models for Evidence Evaluation Reviewed and Considered by the Committee

| Field | Evaluation Framework or Approach | Brief Description |
|---|---|---|
| Education | What Works Clearinghouse (WWC) | The Institute of Education Sciences founded the WWC to provide consistent methods for evaluating interventions, policies, and programs in education. The WWC has published standards, which vary by experimental design, for studies used to determine the strength of evidence.* The WWC publishes two kinds of products: intervention reports, which evaluate the effectiveness of an intervention based on studies that meet the WWC standards, and practice guides. The latter, which draw on expert input in addition to published evidence, are designed to serve as user-friendly guides for educators and provide recommendations on effective education practices, as well as implementation guidance (WWC, 2017a,b). |
| Labor | Clearinghouse for Labor Evaluation and Research (CLEAR) | The U.S. Department of Labor’s clearinghouse adopted and adapted the WWC’s methods to summarize research on topics relevant to labor, such as apprenticeships, workplace discrimination prevention, and employment strategies for low-income adults. Findings from the evidence reviews are made accessible through the agency’s clearinghouse to inform decision making. Individual studies are reviewed and assigned a rating for the strength of causal evidence. Synthesis reports evaluate the body of evidence from only those studies within a given topic area that achieved high or moderate causal evidence ratings and do not make recommendations (CLEAR, 2014, 2015). |
| Transportation | Countermeasures That Work | The National Highway Traffic Safety Administration publishes Countermeasures That Work periodically to inform state highway safety officials and help them select evidence-based countermeasures for traffic safety problems, such as interventions to reduce alcohol-impaired driving. The guide, which does not use a transparent evidence evaluation framework, reports on effectiveness, cost, how widely a countermeasure has been adopted, and how long it takes to implement (Richard et al., 2018). |
| | National Transportation Safety Board (NTSB) Accident Reports | NTSB investigates aviation and other transportation accidents, reaching conclusions about causes and making safety recommendations, which are detailed in its accident reports (NTSB, 2020). Investigators identify probable causes and make recommendations based on mechanistic reasoning (e.g., theories of action based on knowledge regarding physics or chemical properties of materials), modeling, logic, expert opinion, and after action reporting from those involved in an incident. |
| Aerospace Medicine | National Aeronautics and Space Administration (NASA) Integrated Medical Model | NASA needs to predict and prepare for health issues that arise in space, but conducting experimental studies in this area is often infeasible for a number of logistical and ethical reasons. To overcome that barrier, empirical evidence from past experiences with space travel is integrated with a variety of other evidence sources, including longitudinal studies of astronaut health, evidence from analogous contexts (e.g., submarines), and expert opinion, in a complex simulation model that informs decision making. Each parameter in the model may be adjusted, which allows for analysis of a wide range of decisions (Minard et al., 2011). |
| Health Care and Public Health | U.S. Preventive Services Task Force (USPSTF) | The Agency for Healthcare Research and Quality convenes USPSTF to review evidence and make recommendations on evidence-based practices for clinical preventive services (e.g., screening tests, preventive medications). USPSTF draws on evidence summaries from systematic reviews, which are conducted by evidence-based practice centers, to determine the effectiveness of a service based on the balance of potential benefits and harms. Graded recommendations are made based on the certainty of net benefit (USPSTF, 2015). |
| | Community Preventive Services Task Force (CPSTF) | CPSTF is a Centers for Disease Control and Prevention (CDC)-supported task force that reviews the evidence base for community preventive services and programs aimed at improving population health. Its findings and recommendations are published in The Guide to Community Preventive Services (The Community Guide). CPSTF developed its own methodology for evaluating and assessing the quality of individual studies and bodies of evidence. Because randomized controlled trials (RCTs) are often difficult to conduct for public health interventions, The Community Guide does not automatically downgrade the strength of evidence from non-RCT designs, but considers the suitability of the study design and the quality of execution for each study included in the body of evidence. CPSTF also considers the applicability of the evidence (e.g., to different populations and settings) in developing its recommendations (Briss et al., 2000; Zaza et al., 2000b). |
| | Grading of Recommendations Assessment, Development and Evaluation (GRADE) and GRADE-Confidence in the Evidence from Reviews of Qualitative Research (GRADE-CERQual) | GRADE is a method used to evaluate bodies of evidence to assess the certainty of the evidence (COE), up- and/or downgrading COE based on eight defined domains. In contrast to most other frameworks, GRADE does not set explicit quality standards for study inclusion, but instead adjusts the COE based on the quality and risk of bias of studies included in the analysis. GRADE also utilizes an Evidence to Decision framework for making transparent, evidence-based recommendations in the form of guidelines, considering evidence beyond that related to effect (e.g., feasibility, acceptability). Many international review and guideline groups use GRADE, and the methods are continually updated. Recently, GRADE was adapted for the assessment of qualitative evidence (GRADE-CERQual) (Guyatt et al., 2011a; Lewin et al., 2015). |
| | Evaluation of Genomic Applications in Practice and Prevention (EGAPP) | EGAPP, a CDC initiative, published guidelines on evidence-based processes for genetic testing and implementation in clinical practice. To generate an overall strength-of-evidence rating for a body of evidence, the EGAPP methods use different hierarchies of data sources and study designs for three distinct components of the evaluation (analytic validity, clinical validity, and clinical utility), thereby explicitly linking different evidence types to questions they are well suited to answering. EGAPP methods also consider the ethical, legal, and social implications of the genetic tests (Teutsch et al., 2009). |

* While the committee uses the term “certainty of the evidence” throughout the report, some frameworks report on “strength of evidence.” The summaries in this table reflect the specific terminology used in each framework.

8 This table is not intended to serve as an exhaustive list of all existing evidence evaluation frameworks. The committee’s objective was not to review every published framework but to understand the breadth of approaches in use across diverse fields and their potential application to PHEPR.
The committee considered the framework developed and continually updated by the Grading of Recommendations Assessment, Development and Evaluation (GRADE) group to be most applicable to those kinds of PHEPR practices for which a biomedical focus is most relevant. Examples of such practices include quarantine or the use of potassium iodide for radiological incidents. The committee’s approach was also informed by a 2018 WHO report containing guidelines for emergency risk communication, which provided a timely example of how GRADE might be adapted and used in conjunction with GRADE-Confidence in the Evidence from Reviews of Qualitative Research (GRADE-CERQual) to evaluate evidence and develop recommendations on a wider range of PHEPR practices (WHO, 2018). The 2018 WHO report presents a model for synthesizing and grading evidence from quantitative and qualitative research studies, and includes guidance on inclusion of such other evidence streams as case reports and gray literature reports with similarity to AARs (e.g., governmental and nongovernmental reports containing lessons learned and improvement plans).
Although the National Transportation Safety Board (NTSB) does not rely on explicit evidence evaluation frameworks, the committee believed that organization’s use of mechanistic evidence to determine the cause of an aviation disaster was relevant to the evaluation of evidence to support decision making in PHEPR. NTSB’s investigation into the cause of the midair explosion of TWA Flight 800 illustrates that process. For example, an examination of the direction in which metal from the fuselage was bent and knowledge of the physics of explosions contributed to a conclusion that the explosion happened within the plane (rather than originating outside the plane, as in the case of a missile) (Marcus, 2018). This conclusion did not depend on a hypothesis-testing study with statistical tests for differences between what was observed and an alternative. This same kind of reasoning has been used to explain why one can have confidence that parachutes are better than uninhibited free fall when jumping out of a plane: it is known from physics that the rate of descent of an object dropped from the sky is slowed by the drag resistance of air, and that a parachute increases that drag such that with a big enough parachute, the descent of a 200-pound man can be slowed sufficiently for him to survive the fall.
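The parachute example rests on a mechanistic calculation that can be made concrete. The sketch below is an illustrative back-of-the-envelope computation (not part of the committee's methodology); the function name and the area and drag-coefficient values are assumptions chosen only to show the reasoning.

```python
import math

def terminal_velocity(mass_kg, drag_area_m2, drag_coeff, rho=1.2, g=9.81):
    """Terminal velocity occurs when drag force balances weight:
    (1/2) * rho * Cd * A * v**2 = m * g, so v = sqrt(2*m*g / (rho*Cd*A))."""
    return math.sqrt(2 * mass_kg * g / (rho * drag_coeff * drag_area_m2))

# ~90 kg (200-pound) jumper; illustrative frontal areas:
# ~0.7 m^2 in free fall versus ~25 m^2 under a deployed canopy.
free_fall = terminal_velocity(90, 0.7, 1.0)    # tens of meters per second
with_chute = terminal_velocity(90, 25, 1.5)    # a few meters per second
```

Under these assumed values, the descent rate with the canopy is roughly an order of magnitude lower than in free fall, which is the mechanistic basis for confidence in parachutes without a hypothesis-testing study.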
For the purposes of this report, the committee defined mechanistic evidence as evidence that denotes relationships for which causality has been established—generally within other scientific fields, such as chemistry, biology, economics, and physics (e.g., the accelerating effect of the gravitational attraction of Earth and the slowing effect of air resistance)—and that can reasonably be applied to the PHEPR context through mechanistic reasoning, defined in turn as “the inference from mechanisms to claims that an intervention produced” an outcome (Howick et al., 2010, p. 434). For some interventions, such as the placement of auxiliary power units in hospitals at heights above expected water levels in the event of flooding, mechanistic evidence may be a significant contributor to decision making. Such evidence has not traditionally been incorporated into evidence evaluation frameworks, although processes for integrating biological mechanisms with more traditional evidence sources (e.g., data from clinical trials or epidemiological studies) have been developed (Goodman and Gerson, 2013; Rooney et al., 2014) and applied, for example, in systematic reviews of the toxicological effects of exposures (NASEM, 2017). The use of mechanistic evidence, however, can be seen as incorporating principles of a realist approach to evidence synthesis (discussed earlier in this chapter), for which it is established practice to develop theories of how an
intervention works and to use diverse types of evidence to explore interactions among context, mechanisms, and outcomes to better understand causal pathways.
Also of interest to the committee was the National Aeronautics and Space Administration’s (NASA’s) use of modeling to understand the trade-offs among different decisions constrained by the weight and volume limitations of a space capsule. Decision models such as the Integrated Medical Model used by NASA may have utility for considering practices, such as quarantine, for which the consequences of trade-offs can be modeled in advance of having to make decisions during emergencies. Although the development of such models was beyond the scope of this study, methods for integrating evidence from existing model-based analyses with empirical evidence were examined. It should be noted, however, that this is a nascent area of methodological development (CDC, 2018a; USPSTF, 2016).
From its review of the literature and discussions with experts, the committee concluded that none of the evidence evaluation frameworks it reviewed were sufficiently flexible, by themselves, to be universally applicable to all the questions of interest to PHEPR practitioners and researchers without adaptation. Furthermore, no one framework was ideally suited to the context-sensitive nature of PHEPR practices and the diversity of evidence types and outcomes of interest, many of which are at the organizational or systems level and thus often difficult to measure. Therefore, the committee developed a mixed-method synthesis methodology9 that draws on (and in some cases adapts) those elements of existing frameworks and approaches that the committee concluded were most applicable to PHEPR. As a starting point, the committee adopted the analytic frameworks from CPSTF and USPSTF and the GRADE evidence evaluation and Evidence to Decision (EtD) frameworks (see Box 3-1), while allowing sufficient flexibility to bring in other evidence types (e.g., mechanistic, experience-based, and qualitative) that are not accommodated by the traditional GRADE approach to the assessment of certainty in quantitative evidence. This approach allowed the committee to use the appropriate methodology to answer different types of questions of interest to PHEPR stakeholders. The development of this methodology and its application to the evaluation of evidence for four exemplar PHEPR review topics were undertaken in parallel using a highly iterative process, the steps of which are described in the sections below.
This section outlines the key elements and approaches of the methodology developed and applied by the committee for reviewing and evaluating PHEPR evidence to inform decision making (summarized in Box 3-2). This description is intended to inform future PHEPR evidence reviews and to serve as a foundation for future improvements and modifications to the PHEPR review methodology needed to promote its long-term sustainability.
The sections below briefly describe the committee’s approach to
- formulating the scope of the review and searching the literature,
- synthesizing and assessing the certainty of the evidence, and
- formulating the practice recommendations and implementation guidance.
To allow for a more comprehensive description of the committee’s processes for synthesizing the evidence, grading the evidence, and developing recommendations in this chapter,
the relatively standard steps of the systematic review process (formulating the scope of the reviews, searching the literature, inclusion and exclusion, and quality assessment) are only briefly mentioned herein but are described in more detail in Appendix A.
The committee was charged with developing and applying criteria for the selection of PHEPR practices on which it could apply its systematic review methodology to assess the evidence of effectiveness. Rather than a sequential approach that would involve developing the evidence review and evaluation methodology in the abstract and then applying it to the PHEPR practices selected for review, the committee judged that it would be more fruitful to develop the methodology and test it on the selected PHEPR topics simultaneously. This approach was intended to result in a methodology that would be applicable across a range of different practices for which the evidence base would be expected to differ in nature. As a first step, the committee needed to select a set of review topics that would be illustrative of the diversity of PHEPR practices.
Consistent with its charge, the committee started its topic selection process with a list of the Centers for Disease Control and Prevention’s (CDC’s) 15 PHEPR Capabilities (CDC, 2018b) and developed criteria for prioritizing the Capabilities to select specific PHEPR practices. In considering its selection criteria, the committee sought to select test cases that would capture the expected diversity of the evidence base for various PHEPR practices resulting from different research and evaluation methodologies, as well as variability in practice characteristics. Such characteristics were defined as classification dimensions and included, for example, the type and scope of event in which a practice is implemented, the practice setting, whether the practice is complex or simple, whether it is under the direct purview of public health agencies, and whether it is preparedness or response oriented. The committee applied the classification dimensions to each PHEPR Capability to identify a set of Capabilities that were diverse with respect to those variables (see Figure 3-1). In addition to such diversity, the committee considered as criteria for selection of review topics the current needs for evidence-based guidance among key stakeholders, the potential of the review to change practice, and the relevance of a topic to national health security.10 The committee engaged with stakeholders (PHEPR practitioners and policy makers) to inform topic selection and referred to published literature that identifies practitioners’ research needs. Applying this approach, the committee, in consultation with PHEPR practitioners, selected the following four practices as topics for review:
- engaging with and training community-based partners to improve the outcomes of at-risk populations after public health emergencies (falls under Capability 1, Community Preparedness);
- activating a public health emergency operations center (Capability 3, Emergency Operations Coordination [EOC]);
- communicating public health alerts and guidance with technical audiences during a public health emergency (Capability 6, Information Sharing); and
- implementing quarantine to reduce or stop the spread of a contagious disease (Capability 11, Non-Pharmaceutical Interventions).
This chapter describes the application of the committee’s evidence review and evaluation methodology to these four review topics; the details of the review findings for each topic are presented in Chapters 4–7.
The next steps, standard practice for most systematic reviews and described in more detail in Appendix A, included the development of analytic frameworks and the identification of key questions11 for each topic area to further define the scope of the reviews; the development and execution of a comprehensive search of the peer-reviewed and gray literature; and the screening of titles, abstracts, and full-text articles by two reviewers to identify articles meeting the committee’s inclusion criteria. Of note, determining the eligibility of studies required iterative discussions as the review methods, the scope of the four topics, and the outcomes used to assess effectiveness were refined over time. The analytic frameworks and the key questions were reviewed and informed by a panel of PHEPR practitioners serving as consultants to the committee (the processes for appointing the panel of PHEPR practitioner consultants and for developing the analytic frameworks and key questions are described in Appendix A).
10 As noted earlier in this report, the review topics were selected prior to the COVID-19 pandemic.
11 Key questions define the objective of an evidence review. In some guideline development processes (e.g., that of USPSTF), each linkage depicted on the analytic framework (between intervention and outcome or between two outcomes) is represented with a separate key question. The committee did not develop separate key questions for each linkage in the analytic frameworks, but instead defined an overarching review question that guided the review process and sub-questions of interest generally related to benefits and harms, as well as barriers and facilitators.
To maximize the efficiency of the evidence review and evaluation process for each review topic, different component steps, described in the sections that follow, were commissioned to outside groups and individuals with the appropriate expertise. The initial classification of studies and the data abstraction and quality assessment for quantitative studies (except modeling studies) were performed by the Center for Evidence Synthesis in Health, an AHRQ-funded evidence-based practice center (EPC) at Brown University. The quality assessment and synthesis of qualitative studies were conducted by a commissioned team at Wayne State University. The evaluation and synthesis of selected modeling studies were performed by a modeling expert at Stanford University, and the evaluation and synthesis of AARs and case reports were conducted by a PHEPR expert in evaluation at Columbia University.
Classification of Studies into Methodological Streams
An overview of the evidence classification process is presented in Figure 3-2, from the point where the studies for inclusion had been identified. The evidence for the four PHEPR test cases was classified into the following categories: quantitative studies, qualitative studies, mixed-method studies, and case reports and AARs. These categories were defined by the methods employed rather than the subject of investigation, and thus encompassed the full range of evaluative studies (e.g., systems research and quality improvement studies in addition to more traditional impact studies). Mixed-method studies could be used in both quantitative and qualitative evidence syntheses, as depicted in Figure 3-2. The committee determined that no single method could be applied across these different types of evidence, and therefore describes later in this chapter separate processes for evaluating the quality and strength of each type.
Quantitative studies included articles and reports with quantitative results from the evaluation of a PHEPR practice. This included quantitative comparative studies, for which there was an explicit comparison of two or more groups (or one group at two or more time points) to assess whether they were similar or different, usually with a statistical test, as well as quantitative noncomparative studies (i.e., studies that provided only postintervention results, such as posttraining knowledge scores). Modeling studies were treated as a subset of quantitative studies, as were surveys, which were further classified on the basis of the method and questions asked. Surveys that did not include an evaluation of a practice during or following a public health emergency were categorized as descriptive surveys and were not used in the evaluation of effectiveness (but could be used, for example, to populate the EtD framework or to inform implementation considerations). Modeling studies were identified only for the evidence review on quarantine. Given the diversity of purposes of the modeling studies captured in the review, the committee opted to perform an in-depth assessment of a selected group of models judged to be highly relevant. Twelve modeling studies were selected for detailed analysis based on an assessment of their modeling techniques, data sources, relevance to key review questions, potential implications for public health practice, and disease condition studied. Following a review and assessment of the selected models (described below), a commissioned modeling expert conducted a narrative synthesis of the findings of the models, with attention to common results and themes related to the circumstances in which quarantine was effective. Many other modeling studies have been conducted and may have important findings relevant to the use of quarantine; however, a detailed analysis of a representative subset was pursued based on the resources available for the study.
Studies were classified as qualitative if they explicitly described the use of qualitative research methods, such as interviews, focus groups, or ethnographic research, and used an accepted method for qualitative analysis (Miles et al., 2014). If studies did not report the application of qualitative research methods but nonetheless collected some qualitative data, they were generally classified as case reports or AARs, depending on the context in which the data were collected (described below). In the classification process, studies were identified that contained a qualitative analysis of free-text responses to a survey. Such studies were not classified as qualitative research studies, but their findings were extracted and considered separately in the qualitative evidence synthesis to affirm or question the findings of the more complete qualitative studies.
AARs and case reports
The committee sought to include a synthesis of AARs for two of its reviews (the EOC and Information Sharing Capability test cases) as an exercise in gauging the potential value of this evidence source to reviews of PHEPR practices. Case reports,12 which included program evaluations and other narrative reports describing the design and/or implementation of a practice or program (generally in practice settings), usually with lessons learned, were grouped with AARs because of similarity of methods and intent. A synthesis of case reports was conducted and included in the evidence reviews for all four test cases. Of note, commentaries and editorials were not included as case reports; such articles were excluded in the committee’s bibliographic database search and during the screening process (see Appendix A). Some case reports and AARs reported quantitative (e.g., from surveys) and/or qualitative (e.g., from interviews or focus groups) data, but such data were not collected in the context of research, and there generally was little to no description of the methods by which the data were collected. While the distinction from quantitative and qualitative studies was considered necessary for the committee’s reviews given the current limitations of these two sources, should their methods and reporting be strengthened, they could conceivably be combined with quantitative or qualitative studies in the future.
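The classification logic described above can be summarized as a decision procedure. The sketch below is a hypothetical illustration only; the committee's actual classification relied on reviewer judgment, and the field names here are invented for the example.

```python
def classify_study(study):
    """Sketch of the methodological-stream classification described above.

    `study` is a dict of illustrative boolean fields (hypothetical names);
    the real process was judgment based, not algorithmic.
    """
    if study.get("uses_qualitative_methods") and study.get("accepted_qualitative_analysis"):
        if study.get("reports_quantitative_results"):
            return "mixed-method"  # may feed both quantitative and qualitative syntheses
        return "qualitative"
    if study.get("reports_quantitative_results"):
        if study.get("is_model_based"):
            return "quantitative (modeling)"
        if study.get("is_survey") and not study.get("evaluates_practice_in_emergency"):
            return "descriptive survey"  # not used in the evaluation of effectiveness
        return "quantitative"
    # Reports with some qualitative data but no described research methods
    # fall to the experiential streams, depending on context of collection.
    return "AAR" if study.get("is_after_action_report") else "case report"
```

For example, a report describing an exercise with lessons learned but no stated research methods would fall through to the AAR/case report branch, mirroring the sorting rule in the text.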
Data Extraction and Quality Assessment for Individual Studies
After the included studies13 had been sorted into one of the categories described above, individual studies were extracted and assessed for their risk of bias and/or other aspects of study quality, as described below. The full list of data extraction elements is included in Appendix A. For most studies of PHEPR practices, details about the practice itself, the context, and the implementation are necessary, and thus the committee selected for extraction some elements from the Template for Intervention Description and Replication checklist (Hoffmann et al., 2014).
The quality assessment approach was determined based on study design. Many standardized tools for assessing quality or risk of bias are available, each with its own merits and shortcomings, and new tools continue to be developed. Described here is the approach taken by the committee and the groups commissioned to assess study quality and risk of bias; however, different tools and methods could reasonably be applied in future PHEPR evidence reviews. Studies were not excluded based on an assessment of the risk of bias or
12 The synthesis of case reports, as described later in this chapter, is distinct from case study research, which is an established qualitative form of inquiry by which an issue or phenomenon is analyzed within its context so as to gain a better understanding of the issue from the perspective of participants (Harrison et al., 2017).
13 The term “study” is used broadly here to include research studies and reports that may be descriptive in nature (e.g., AARs, case reports, program evaluations).
of methodological limitations, but this information instead was considered in the assessment of certainty for the body of evidence.
For quantitative impact studies, the Brown University EPC developed an assessment tool by drawing selected risk-of-bias domains from existing tools, including the Cochrane Risk of Bias version 2.0 tool (Sterne et al., 2019), Cochrane’s suggested risk-of-bias criteria for Effective Practice and Organisation of Care reviews (Cochrane, 2017), and the Cochrane Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I) tool (Sterne et al., 2016). The Brown University EPC developed and applied a separate tool for the assessment of descriptive surveys, drawing on published methods (Bennett et al., 2010; Davids and Roman, 2014). For qualitative studies, methodological limitations were assessed using the Critical Appraisal Skills Programme qualitative tool (CASP, 2018). Additional detail on these tools and their use in quality assessment of individual quantitative and qualitative research studies is provided in Appendix A.
An expert in modeling methodology assessed the selected group of quarantine modeling studies in detail, including the specific model structures/equations and how the interventions were instantiated within these structures/equations. This assessment was intended to determine whether assumptions encoded in such structures/equations could plausibly have had a strong impact on the results reported in the studies. Likewise, a careful reading of the methods section of each paper was focused on extracting explicitly documented assumptions, as well as other implicit assumptions based on methodological decisions (e.g., no change in mixing rates as the epidemic grows because of such processes as social distancing, perfect versus imperfect case finding to be eligible for quarantine, asymptomatic transmission).
Descriptive case reports do not fit any specific analytic study design and generally report few details concerning methods, and thus are not amenable to quality assessment using tools designed for research studies. Case reports and AARs were categorized as “high” or “low” priority using the significance criterion of the AACODS (authority, accuracy, coverage, objectivity, data, significance) checklist (Tyndall, 2010), an evaluation tool used in the critical appraisal of gray literature sources. This process mirrored the general principles of the approach outlined by Cochrane for selecting qualitative studies for syntheses when a large pool of sources needs to be reduced to a manageable sample amenable to synthesis that is most likely to address the review questions (Noyes et al., 2018). Rigor was not required as a sorting criterion because the primary purpose was to synthesize experiential data to add weight to findings from research studies, provide a different perspective from that of research studies, or provide the only available perspective concerning the specific phenomena of interest. An appraisal tool for evaluating the methodological rigor of AARs published in 2019 (ECDC, 2018) was applied by the commissioned PHEPR expert to the AARs included in the committee’s analyses (the tool’s criteria are described in Appendix A). While the results of this analysis informed the committee’s recommendations on improving the future evidentiary value of AARs (see Chapter 8), the appraisal tool was not useful in selecting reports to include in the synthesis of AARs and case reports because of the generally low scores for the majority of reports captured in the search. With improvements in the methodological rigor of AARs, however, such tools could be helpful in selecting high-quality AARs for inclusion in future evidence reviews.
Assessment of the Certainty and Confidence in Synthesized Quantitative and Qualitative Findings
After individual studies had been assessed for their quality and risk of bias, the next step was synthesizing and assessing the COE (or confidence in the case of qualitative evidence)
across the body of evidence, specific to each key question, outcome, or phenomenon of interest identified in the analytic framework.
Initially, certainty of the evidence (for synthesized quantitative impact findings) and confidence in the synthesized findings from qualitative bodies of evidence were assessed separately using the GRADE and GRADE-CERQual frameworks, respectively, as discussed below. Subsequently, the coherence of evidence from across methodological streams (including evidence from cross-sectional surveys that evaluated practices,14 modeling studies, mechanistic evidence,15 qualitative studies, case reports, and AARs) was considered in developing summary findings for each key question. The committee employed two similar but distinct processes to integrate evidence from across methodological streams—one for assessing evidence of effectiveness, and the other for populating the EtD framework and developing implementation guidance. For evaluating evidence of effectiveness, the coherence of evidence from other streams was considered in rating the COE for each outcome.
Quantitative evidence synthesis and grading
For each of the test cases, the committee first assessed the body of quantitative impact studies using the GRADE approach (see Box 3-2 earlier in this chapter for a description of the GRADE assessment domains). The committee determined that a quantitative meta-analysis was neither feasible nor warranted based on the expected context sensitivity of the PHEPR practices. Thus, the committee undertook a synthesis without meta-analysis (Campbell et al., 2020) to draw conclusions regarding effect direction. These conclusions represented global judgments based on the number, size, and methodologic strengths of the individual studies, as well as the consistency of the results. If study authors performed statistical testing of a hypothesis, the committee considered the results of such testing when drawing its conclusions about the directionality of effect. However, statistical testing was neither a necessary nor a sufficient condition for drawing these conclusions. The committee did not prespecify a minimum meaningful effect size, as it was generally unclear what would be considered meaningful for the diverse set of outcomes examined by the committee. This poses a challenge for interpreting the importance of an intervention and represents an area for future development.
Existing guidance on the application of GRADE to a narrative synthesis (Murad et al., 2017) was followed to evaluate the certainty that a practice was effective for a given outcome. Consistent with the GRADE methodology, bodies of evidence that included RCTs started at high COE, which was downgraded as appropriate based on the committee’s judgment regarding risk of bias, indirectness, inconsistency, imprecision, and publication bias. Bodies of evidence that comprised only nonrandomized studies started at low COE and could be further downgraded or upgraded.16 Modeling studies were not included in the bodies of evidence assessed with the GRADE domains, but were considered in the COE determination as discussed later in this chapter.
14 There was no synthesis of noncomparative, descriptive surveys.
15 As discussed earlier in this chapter, the committee defined mechanistic evidence as evidence that denotes relationships for which causality has been established—generally within other scientific fields, such as chemistry, biology, economics, and physics—and that can reasonably be applied to the PHEPR context through mechanistic reasoning, defined in turn as “the inference from mechanisms to claims that an intervention produced” an outcome (Howick et al., 2010, p. 434).
16 According to GRADE, bodies of evidence comprising nonrandomized studies that were assessed with ROBINS-I (Sterne et al., 2016) could start as high COE, but would then generally be rated down by default by two levels because of risk of bias unless there was a clear reason for not downgrading (Schünemann et al., 2018). However, because ROBINS-I was not used for the quality assessment of nonrandomized studies per se (although domains from the ROBINS-I tool were considered by the Brown EPC in developing its quality assessment tool), the committee started bodies of evidence that comprised only nonrandomized studies at low COE.
|High||We are very confident that, in some contexts, there are important effects (benefits or harms). Further research is very unlikely to change our conclusion.|
|Moderate||We are moderately confident that, in some contexts, there are important effects, but there is a possibility that there is no effect. Further research is likely to have an important impact on our confidence and could alter the conclusion.|
|Low||Our confidence that there are important effects is limited. Further research is very likely to have an important impact on our confidence and is likely to change the conclusion.|
|Very Low||We do not know whether the intervention has an important effect.|
Table 3-2 defines the four levels of the COE used in the committee’s evidence reviews. Of note, the differences among the levels are not quantitative, and there is no algorithm or set of rules for determining the COE (e.g., based on the number and quality of included studies). As with other systematic review and guideline development processes, the assessment of the COE is based on the judgment of the evaluators. In some cases, a single high-quality study may provide a high COE, while in others, having multiple RCTs with consistent effects could yield a lower COE (e.g., because of indirectness). Transparency is key so that the rationale for up- and/or downgrading decisions and the ultimate COE rating are clear. While this judgment-based approach allows the evaluators flexibility in the COE determination process, a potential limitation is poor interrater reliability (i.e., others could arrive at different judgments given the same set of evidence).
Two operational decisions made by the committee regarding the GRADE process warrant additional explanation. First, for those key questions and outcomes for which the only serious limitation was in the imprecision domain and the evidence came from a single, nonrandomized study of modest size, the committee considered the upgrading domains, in particular the domain for large effect size. The second decision relates to upgrading for nonrandomized studies based on large effect size.17 The quantitative evidence was rated for risk of bias by the Brown University EPC, and based on these ratings, an overall assessment of quality was made using a “good/moderate/poor” set of categories. Studies rated as good quality were considered to have no serious limitations for the risk-of-bias domain in GRADE, whereas those rated as poor quality were considered to have serious or very serious limitations for this domain. For studies that were rated by the EPC as having “moderate” risk of bias and had a large effect size, the committee asked the EPC to assess whether the factors contributing to the “moderate” risk-of-bias rating were likely or unlikely to be responsible for the large effect size. For those cases in which the EPC judged this to be likely, the committee did not upgrade the COE based on the large effect size. For those cases in which the EPC judged this to be unlikely, the committee considered whether to upgrade the COE based on the large effect size.
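The starting-level and adjustment logic described above can be sketched schematically. This is a didactic simplification only: GRADE ratings are expert judgments across defined domains, not arithmetic, and the function name and parameters here are invented for illustration.

```python
LEVELS = ["very low", "low", "moderate", "high"]

def grade_coe(starts_with_rcts, downgrades=0, upgrades=0):
    """Schematic of the committee's adapted GRADE starting levels.

    Bodies of evidence including RCTs start at 'high'; bodies comprising
    only nonrandomized studies start at 'low'. Each downgrade (e.g., for
    risk of bias, indirectness, inconsistency, imprecision, publication
    bias) moves one level down; each upgrade (e.g., a large effect, such
    as a two-fold change in relative risk in nonrandomized studies)
    moves one level up, clamped to the defined range.
    """
    start = LEVELS.index("high") if starts_with_rcts else LEVELS.index("low")
    final = max(0, min(len(LEVELS) - 1, start - downgrades + upgrades))
    return LEVELS[final]
```

For instance, a body of only nonrandomized studies with a large effect size and no serious limitations would move from low to moderate COE, consistent with the upgrading decision described above.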
Qualitative evidence synthesis and grading
For the qualitative evidence synthesis, the primary studies were uploaded into Atlas.ti (Version 8.1, Atlas.ti Scientific Software Development GmbH, Berlin, Germany), and the key findings and supporting information from each study were extracted in the form of key phrases, sentences, and direct quotations. This approach allowed researchers to identify and note evidence that mapped onto the phenomena of interest. The specific phenomena of interest were prespecified as questions around what happened when the practice was implemented, what was perceived to work, and what was perceived not to work. The EtD domains (e.g., acceptability, feasibility, equity) were also phenomena of interest for the qualitative evidence synthesis.

17 Consistent with GRADE guidelines on rating the COE (Guyatt et al., 2011b), the committee upgraded for large effect when nonrandomized studies showed at least a two-fold increase or decrease in relative risk (or other measure of effect size) associated with implementation of a PHEPR practice.
The Wayne State University team conducted the extraction and used the pragmatic framework synthesis method (Barnett-Page and Thomas, 2009; Pope et al., 2000), which employs an iterative deductive and inductive process to analyze and synthesize the findings. Framework synthesis is a matrix-based method that involves the a priori construction of index codes and thematic categories into which data can be coded. The method allows
- themes identified a priori to be specified as coding categories from the start,
- application of an a priori theoretical framework or logic model to inform the development of index codes and themes,
- incorporation of researcher experience, background literature, and expert opinion, and
- combination of a priori themes with themes emerging de novo through inductive analysis of the data.
A five-step process was used for the synthesis: (1) familiarization to create a priori descriptive codes and codebook development, (2) first-level in vivo coding18 using descriptive codes, (3) second-level coding into descriptive themes (families of descriptive codes), (4) analytic theming (interpretive grouping of descriptive themes), and (5) charting/mapping and interpretation (the authors’ more detailed description of each of these steps is provided in Box 3-3). A lead author from the two-person Wayne State University team was assigned for each review topic and was responsible for the synthesis of findings, which were developed through ongoing discussions with the other Wayne State team member and the committee.
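Steps 2 through 4 of this process move coded excerpts up a hierarchy: descriptive codes are grouped into descriptive themes, which are in turn grouped into analytic themes. The toy sketch below illustrates only that hierarchical grouping; all codes, themes, and excerpts are hypothetical examples, not content from the committee's reviews, and real framework synthesis is an interpretive activity, not a mechanical lookup.

```python
from collections import defaultdict

# Step 2 (illustrative): first-level coding — excerpts tagged with descriptive codes.
coded_excerpts = [
    ("staff unsure whom to call", "role-clarity"),
    ("no shared radio channel", "communication-gaps"),
    ("partners had met before the event", "pre-event-relationships"),
]

# Step 3 (illustrative): descriptive themes as families of descriptive codes.
descriptive_themes = {
    "role-clarity": "coordination challenges",
    "communication-gaps": "coordination challenges",
    "pre-event-relationships": "relationship building",
}

# Step 4 (illustrative): analytic themes as interpretive groupings of descriptive themes.
analytic_themes = {
    "coordination challenges": "trust and coordination shape response",
    "relationship building": "trust and coordination shape response",
}

def synthesize(excerpts):
    """Group excerpts up the coding hierarchy:
    descriptive code -> descriptive theme -> analytic theme."""
    grouped = defaultdict(list)
    for text, code in excerpts:
        theme = analytic_themes[descriptive_themes[code]]
        grouped[theme].append(text)
    return dict(grouped)
```

Here all three hypothetical excerpts roll up to a single analytic theme, mirroring how step 4 interprets families of descriptive themes.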
GRADE-CERQual was used to assess the confidence in synthesized qualitative findings (analytic and some descriptive themes). CERQual provides a systematic and transparent framework for assessing confidence in individual review findings, based on consideration of four components:
- methodological limitations—the extent to which there are concerns about the design or conduct of the primary studies that contributed evidence to an individual review finding;
- coherence—an assessment of how clear and compelling the fit is between the data from the primary studies and a review finding that synthesizes those data;
- adequacy of data—an overall determination of the degree of richness and quantity of data supporting a review finding; and
- relevance—the extent to which the body of evidence from the primary studies supporting a review finding is applicable to the context (perspective or population, phenomenon of interest, setting) specified in the review question (Lewin et al., 2018).
Based on these ratings, each synthesized finding was then assigned an overall assessment as follows:
- High confidence—It is highly likely that the finding is a representation of the phenomenon.
- Moderate confidence—It is likely that the finding is a representation of the phenomenon.
- Low confidence—It is possible that the finding is a representation of the phenomenon.
- Very low confidence—It is not clear whether the finding is a representation of the phenomenon.
Confidence in the synthesized findings was assessed by the lead author for that review topic. The second author reviewed the assessments, queried the lead author for additional information, and offered suggestions. The discussion culminated in the final assessment of confidence.
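A CERQual assessment can be thought of as a record attached to each synthesized finding: concerns (if any) for each of the four components, plus an overall confidence level that is a reviewer judgment informed by those component ratings rather than a computed score. The sketch below is a hypothetical record structure for bookkeeping only; the field and class names are the author's illustration, not part of the CERQual tooling.

```python
from dataclasses import dataclass, field

CERQUAL_COMPONENTS = (
    "methodological limitations",
    "coherence",
    "adequacy of data",
    "relevance",
)
CONFIDENCE_LEVELS = ("high", "moderate", "low", "very low")

@dataclass
class SynthesizedFinding:
    """Illustrative record of a CERQual assessment. The overall
    confidence is recorded, not computed: it reflects the assessors'
    judgment across the four components."""
    finding: str
    concerns: dict = field(default_factory=dict)  # component -> noted concern
    overall_confidence: str = "high"

    def components_with_concerns(self):
        """List the components for which a concern was noted,
        in the standard CERQual component order."""
        return [c for c in CERQUAL_COMPONENTS if c in self.concerns]
```

For instance, a finding with thin supporting data might record a concern under "adequacy of data" and an overall assessment of moderate confidence.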
Synthesis of evidence from case reports and AARs
For the framework synthesis of findings from case reports and AARs, report characteristics (e.g., type of event, type of report, location) were extracted, and the reports were then coded using a codebook developed from the key areas of interest. This codebook was adapted from the one used for the qualitative evidence synthesis (see Box 3-3) to facilitate alignment between the two evidence streams where feasible. Although case reports and AARs were analyzed jointly, findings were examined by report type to assess for any differences. No assessment of confidence in the findings from the synthesis of case reports and AARs was conducted.
Integration of Effectiveness Evidence from Across Methodological Streams
As noted earlier in this chapter and depicted in Figure 3-3, the committee took a pragmatic, layering approach to synthesizing and grading the full body of evidence to determine the COE for the effectiveness of a given PHEPR practice and to inform practice recommendations. After evaluating the body of evidence from quantitative impact studies using the GRADE domains to determine the initial COE for each outcome of interest, the committee reviewed and considered the coherence of evidence from other methodological streams, including findings from the qualitative evidence syntheses (generally related to harms) with associated CERQual confidence assessments; findings from the modeling study analysis; quantitative data from individual cross-sectional surveys, case reports, and AARs regarding practice effectiveness in a real public health emergency; mechanistic evidence (defined earlier in this chapter); and parallel evidence. Although the committee did not undertake to do so, findings from a Delphi-type activity or other systematically collected expert evidence (Schünemann et al., 2019) could be brought to bear in grading the overall body of evidence.
“Parallel evidence,” as the committee uses the term in this report, is evidence on the effectiveness of similar practices from outside the PHEPR context. The consideration of supporting evidence from analogy (e.g., similar interventions or analogous contexts) was proposed by Sir Austin Bradford Hill (Hill, 1965) and has been resurrected in more recent discussions on evidence grading (Howick et al., 2009). As PHEPR is a transdisciplinary field, foundational research may have been undertaken by other disciplines (within and outside of public health) for many PHEPR practices. Consequently, it is important to consider for all PHEPR evidence reviews (at the start of the process) whether there is likely an existing body of parallel evidence that should be captured in the review process. For the committee’s review on engaging community-based partners to improve outcomes for at-risk populations, for example, the committee recognized that there would be a much broader but relevant evidence base related to community engagement from outside the PHEPR context. Also important to consider are factors that might contribute to different outcomes when an intervention is applied in the PHEPR context as compared with the context from which the parallel evidence was derived. For example, educational programs aimed at reducing a known health risk (e.g., cardiovascular disease) may be more effective at motivating behavior change relative to similar programs addressing the lower-probability risk of a disaster. Such factors should influence the weight given to parallel evidence in the evidence grading process. Rather than searching for and synthesizing primary studies, it may be more expedient to conduct targeted searches of the literature for existing systematic reviews on the effectiveness of similar interventions from other contexts that could be considered in determining the COE.
Of note, the consideration of parallel evidence is also consistent with the GRADE concept of applying indirect evidence when there is a paucity of direct evidence to inform a recommendation, and including parallel evidence in the body of evidence assessed with GRADE is an alternative approach that could be taken by future PHEPR review groups. However, the COE is downgraded for indirectness in GRADE, which conflicts conceptually with the committee’s view—and that of others who have undertaken similar reviews (Bruce et al., 2014; Movsisyan et al., 2016)—of parallel evidence as a construct that may in some circumstances increase certainty in the effectiveness of an intervention.
Each additional source of evidence was judged to be supportive, very supportive, inconclusive (no conclusion can be drawn regarding coherence because either results are mixed or the data are insufficient), or unsupportive (discordant with the findings from quantitative impact research studies). The distinction between supportive and very supportive evidence was based on the magnitude of the reported effect and the directness of its application to the question and outcome of interest. Mechanistic evidence, which does not lend itself to an assessment of magnitude of effect, was determined to be supportive or very supportive based on the counterfactual (i.e., how likely it is that an alternative explanation accounts for the observed effect that has been attributed to a specified mechanism of action). For example, mechanistic reasoning is applied in the quarantine evidence review discussed in Chapter 7 and Appendix B4. While an observed reduction in disease transmission may reasonably be attributed to quarantine based on its mechanism (i.e., separating individuals at risk of becoming infectious from susceptible populations), other factors (e.g., seasonal effects related to temperature and humidity) may actually be responsible for the reduced spread. In contrast, mechanistic evidence regarding the impact of congregate quarantine was deemed very supportive as there is no good alternative explanation for why infections would increase among those quarantined in the congregate setting. Following discussion by the committee, a global judgment was made as to whether there was sufficient supportive or unsupportive evidence to warrant up- or downgrading the initial COE. Table 3-3 presents the committee’s decision criteria for its four evidence reviews (discussed further in Appendixes B1–B4), but these are not intended as standards. 
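The coherence judgments and the bounded adjustment of the COE described above can be sketched as follows. This is an illustrative encoding only: the category and function names are hypothetical, and the number of levels to move was the committee's global judgment, not the output of a formula.

```python
from enum import Enum

# The four-level COE scale used in the committee's evidence reviews,
# ordered from lowest to highest certainty.
COE_LEVELS = ["very low", "low", "moderate", "high"]

class Coherence(Enum):
    """Judgments applied to each additional evidence source."""
    VERY_SUPPORTIVE = "very supportive"
    SUPPORTIVE = "supportive"
    INCONCLUSIVE = "inconclusive"   # mixed results or insufficient data
    UNSUPPORTIVE = "unsupportive"   # discordant with quantitative findings

def adjust_coe(initial: str, levels: int) -> str:
    """Move the COE up (positive) or down (negative) a number of
    levels, clamped to the four-level scale. The number of levels
    reflects a global judgment, not an algorithm."""
    i = COE_LEVELS.index(initial) + levels
    return COE_LEVELS[max(0, min(i, len(COE_LEVELS) - 1))]
```

For example, upgrading a low initial COE by two levels yields high certainty, while an already high COE cannot be upgraded further.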
As with the up- and downgrading process using the GRADE domains, the adjustment of the COE up or down one or more levels was not based on an algorithm but on the committee’s judgment. The committee judged its approach to be reasonable because it was not estimating an effect size but drawing conclusions regarding whether an intervention had an important (beneficial or harmful) effect. The results of the evidence grading process, including the committee’s ratings for each of the GRADE domains, the corresponding assessment of COE, and the rationale for further up- or downgrading the COE, were captured in evidence profile tables (see Appendixes B1–B4).
TABLE 3-3 Committee Decision Criteria for Adjusting the Initial COE

| Certainty of the Evidence (COE) Decision | Committee Criteria |
| --- | --- |
| No change in COE | Did not upgrade based solely on evidence from case reports, surveys, supportive modeling evidence, or low-confidence findings from the qualitative evidence synthesis. Did not upgrade for supportive parallel evidence when direct evidence (from the PHEPR context) was available that resulted in a low or moderate initial COE (see, for example, Table B1-2 in Appendix B1). Did not upgrade if evidence raised concerns about potential harmful or undesirable effects. |
| Upgraded COE one level | Required very supportive mechanistic or modeling evidence or high-confidence findings from the qualitative evidence synthesis. |
| Upgraded COE two levels | Required a combination of supportive (or very supportive) findings from mechanistic, modeling, or qualitative evidence (see, for example, Table B4-2 in Appendix B4). |
| Downgraded COE | Although the committee did not encounter this scenario, evidence of harmful or undesirable effects could warrant downgrading the initial COE. |

An Evidence to Decision Framework for Formulation of PHEPR Practice Recommendations

The committee reviewed and adapted as necessary the criteria from the Developing and Evaluating Communication Strategies to Support Informed Decisions and Practice Based on Evidence (DECIDE) project (Alonso-Coello et al., 2016) and the recently published WHO INTEGRATE (Rehfuess et al., 2019) EtD frameworks to develop a novel EtD framework for the committee’s use in formulating recommendations on evidence-based PHEPR practices. The PHEPR EtD framework comprised the following criteria:
- balance of benefits and harms,
- acceptability and preferences,
- feasibility and PHEPR system considerations,
- resource and economic considerations,
- equity, and
- ethical considerations.
The PHEPR EtD framework enabled diverse types of evidence (e.g., from quantitative and qualitative studies, surveys, case reports, and AARs) concerning the same phenomenon of interest to be brought together in a single place. To populate the framework, the committee adapted the methods described in the WHO guideline on emergency risk communication (WHO, 2018). For each EtD criterion, findings from within a methodological stream were compared and contrasted with findings from the other streams (findings were generally synthesized from a body of evidence, with the exception of survey evidence, which was not synthesized and was incorporated as findings from individual studies). Wherever findings supported one another, they were combined into higher-order findings representing syntheses across the methodological streams, and these points of alignment were noted in EtD evidence summaries. Evidence from research studies was given greater weight than evidence from other sources; the latter was used to add weight to findings from research studies, to provide a different perspective, or to provide the only perspective on specific phenomena of interest in the absence of research-derived evidence. COE and confidence ratings for the within-stream findings were kept in mind during the evidence integration process, but no attempt was made to generate an overarching (across-stream) COE for the findings related to the EtD elements.
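The cross-stream alignment step can be sketched as a grouping operation: findings are collected by EtD criterion, and a criterion supported by two or more methodological streams yields a higher-order finding. This toy sketch, with hypothetical findings and function names, illustrates only the bookkeeping; the committee's actual integration involved qualitative comparison of the findings' content, not just counting streams.

```python
from collections import defaultdict

def align_findings(findings):
    """Group (criterion, stream, text) tuples by EtD criterion and
    keep the criteria supported by at least two distinct
    methodological streams, i.e., the points of alignment that
    could be combined into higher-order findings."""
    by_criterion = defaultdict(list)
    for criterion, stream, text in findings:
        by_criterion[criterion].append((stream, text))
    return {
        crit: items
        for crit, items in by_criterion.items()
        if len({stream for stream, _ in items}) >= 2
    }
```

In this sketch, a criterion reported by both a qualitative synthesis and a survey would surface as aligned, whereas a criterion seen only in case reports would remain a single-stream finding (which the committee still captured, with the lack of corroboration noted).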
Finally, each of the above EtD elements can be viewed through an ethics lens: harms and benefits generally map to the ethical principle of harm reduction/benefit promotion, equity to the principle of justice, and feasibility and resource considerations to the principle of stewardship, while acceptability and preferences will often include consideration of other ethical values. The committee’s description of ethical considerations, however, was not developed from the same body of studies as the other EtD criteria. Though some of these studies include reflections on ethical, legal, and social factors related to implementation, information about ethical considerations also often comes from essays and reflection pieces that were not explicitly encompassed by the committee’s review process. Rather, the inclusion of ethical considerations as a separate category reflects the committee’s discussion of general ethical principles that can help guide PHEPR practice (see Box 3-4), along with the pragmatic recognition that decision making in PHEPR is often necessarily a political matter as well as a matter of science. As such, explicit consideration of ethical, legal, and social factors has an important role in any discussion of implementation considerations and guideline development processes for PHEPR. Although the committee did not undertake to do so, analyses of regulatory and policy texts, as well as findings from a Delphi activity or other systematically collected expert evidence regarding ethical, legal, and social considerations, could be incorporated into future review processes.
In addition to the EtD criteria, the committee considered the contexts in which the practices had been evaluated and whether any serious evidence gaps with regard to context (i.e., applicability concerns) should influence a recommendation. Contextual factors that were considered and may be relevant to future PHEPR reviews include
- settings (e.g., location, population density, emergency type and scale, public health system governance structure);
- populations (e.g., target study population or affected population, demographics such as race/ethnicity, age, and socioeconomic status); and
- PHEPR practice features (e.g., organization implementing the practice).
Key contextual information reported in the studies themselves should be extracted with other important data elements, but it may be necessary to supplement the literature search to identify other contextual details (Booth et al., 2019). For example, media reports may yield valuable information regarding potential political environment influences on the effectiveness and acceptability of a practice.
For those practices for which there was sufficient evidence demonstrating a beneficial effect, the committee developed a practice recommendation. Although the committee did not encounter circumstances that warranted doing so, a recommendation against a practice could be made if there was sufficient evidence of harm or absence of effect. Other evidence review and guideline groups use multiple recommendation levels (e.g., strong and conditional recommendations), and future groups conducting reviews of PHEPR practices may determine that modifications of the committee’s methodology are needed to accommodate recommendations of different strengths.
Importantly, the committee did not believe there should be a minimum required COE to make a recommendation (although recommendations when the COE for important outcomes is very low are expected to be rare). In fact, recommendations may be most useful to guide practitioners’ decision making when there is a paucity of evidence but some action must be taken, such as in the response to a public health emergency. This is a notable difference from the field of preventive medicine, in which interventions are being applied to otherwise healthy (or at least asymptomatic) individuals who are unlikely to be harmed by inaction. The certainty about the benefits of a practice necessary to make a recommendation is expected to differ depending on the nature of the practice and the expected severity of any potential
undesirable effects. For example, the threshold for organizational practices, such as EOC activation, may be lower than that for practices that pose greater risks to an individual’s health and well-being, such as quarantine. For interventions in the preparedness phase, a higher evidentiary threshold may be warranted given the absence of pressure to act.
When there is sufficient evidence to make a recommendation but there are notable gaps in the evidence base, a practice recommendation may also include statements regarding the need for monitoring and evaluation or rigorous research. In other cases, the body of evidence may not provide enough information about the effectiveness of a practice to serve as the basis for a recommendation. In such cases, a finding of insufficient evidence may be appropriate. Insufficient evidence does not mean that the intervention does not work, but that a determination of whether or not it works cannot be made, indicating a need for additional research into the effectiveness of the intervention or the circumstances under which it might be more or less effective or even harmful. Similarly, the absence of a practice recommendation does not indicate that the practice should not be implemented, particularly in cases in which a practice is a common or standard practice.
The committee developed implementation guidance to accompany its practice recommendations. If a practice is in widespread use, available evidence may not address whether the practice should be conducted, but rather how it should be implemented. In such cases, implementation guidance may still be of value to practitioners in the absence of a practice recommendation. In future reviews, it is conceivable that the review question itself may focus solely on implementation, in which case this guidance may be the only product of the evidence review process. As depicted in the two panels in Figure 3-3, the committee intended its evidence review and grading methodology to allow for a focus on both effectiveness and implementation questions.
The committee included in its key review questions for each of the evidence reviews a question related to the factors that created barriers to and facilitators of implementation of the PHEPR practice, which informed its implementation guidance. To address those review questions, evidence was synthesized using much the same process as that described above for the EtD framework. Findings on barriers or facilitators that aligned across evidence streams were combined into higher-order findings that were featured most prominently in the guidance on facilitating implementation and/or other operational considerations (noting any associated confidence assessments from the qualitative evidence synthesis). Where information useful to practitioners was identified only in a single evidence stream, it was still captured in the guidance, but the absence of support from other evidence sources was noted for transparency. The lack of coherence across evidence streams may result because some types of evidence are more likely to be generated by certain study types or because a finding is novel and has not yet been explored using a multitude of methods. In some cases, implementation guidance may also be drawn from the evidence synthesis used to populate the EtD framework or other evidence sources. In the case of the committee’s quarantine review, for example, implementation guidance was also informed by the modeling study analysis.
The committee undertook to develop an evidence review and evaluation methodology with sufficient flexibility such that it not only could accommodate the diversity of evidence
for the four exemplar PHEPR practices but also could be applied and adapted as needed to support future PHEPR evidence reviews. In this section, the committee reflects on the limitations of its methodology, its experience in developing and applying the methodology, and the implications for future reviews of PHEPR practices. The chapter concludes with the committee’s recommendations for the development of a sustainable process for conducting reviews and generating evidence-based PHEPR guidelines on an ongoing basis and the infrastructure necessary to support that process.
The evidence review and evaluation methodology described in this chapter and applied to the four PHEPR practices discussed in the following chapters represents the culmination of 2 years of methodological development and consensus building through committee discussion. The strengths of that development process include the diverse expertise of the committee members, who represent methodologists (both U.S. and international), PHEPR practitioners, and PHEPR researchers; consultation with outside experts in systematic review methods and guideline development (during both public and closed session discussions); and iterative testing of the methodology on a small but diverse selection of PHEPR practices. A combination of existing and adapted methods and tools was used to synthesize and assess the quality of method-specific streams of evidence that were subsequently, in a process original to this committee, brought together in a single integrated mixed-method synthesis using a logic model as the analytical framework for integration.
At the same time, there are limitations to the process by which the methods and tools were used, adapted, or developed. Although the four practices to which the committee’s methodology was applied were deliberately chosen to represent a spectrum of the types of questions and evidence relevant to PHEPR, new situations are likely to be identified in which additions to or adaptations of these methods are warranted when the methodology is applied to additional practices. Ongoing methodological refinement through iterative testing on additional PHEPR practices is therefore an important next step. Another critical component of the refinement process that is employed by other groups developing methods and tools involves the use of strategies, such as surveys, Delphi processes, and small-group feedback sessions, to gather input from a much broader group of experts (e.g., methodologists, researchers, end users) and organizations in the field (Lewin et al., 2018). The committee’s time constraints and the National Academies’ confidentiality requirements limited the opportunities for extensive solicitation of feedback from the field. However, such solicitation is often conducted in phases when new methods and tools are developed, and the committee’s work should therefore be viewed as the first phase of the development process. The committee anticipates that additional phases of expert review and feedback on its methodology will follow the release of this report. As with any first-time application, the committee expects that with the methodology’s increased use, opportunities to improve it will be identified.
Another limitation relates to the current state of methodological science with regard to the integration of different types of evidence. Mixed-method syntheses like the process adopted by the committee are relatively uncommon in guideline development, and methods for synthesizing evidence from highly diverse study designs (e.g., quantitative, qualitative, case reports, modeling) are still being developed and tested (Noyes et al., 2019). There are accepted methods for assessing the quality of individual quantitative and qualitative studies and for grading the respective bodies of evidence, but methods and tools have not yet been developed for grading findings developed from the integration of quantitative and qualitative evidence. An added challenge for the committee was the lack of existing quality assessment and grading methods for bodies of descriptive surveys and case reports and AARs. Consequently, some of the evidence streams used by the committee were synthesized and graded, while others were not. Other groups have adapted the GRADE and GRADE-CERQual methods for these evidence types (WHO, 2018), but in the absence of methods for integrating the assessments to generate a composite rating, the committee chose not to grade bodies of descriptive surveys and case reports and AARs. Given these gaps in evidence review methods, the committee took a pragmatic approach to integrating the diverse evidence types that were captured in its reviews, as described above. However, as the methodological science behind mixed-method synthesis continues to evolve, it will be important to update the methods. Thus, the methods presented here should not be viewed as the final word in how PHEPR topics should be systematically assessed, but rather the starting point to be built on in future efforts.
The committee recognized from the inception of this study that its evidence review and evaluation methodology needed to be aligned with the questions of interest to PHEPR stakeholders and the nature of the PHEPR evidence base. At the outset, however, the committee had an incomplete picture of what that evidence base looked like. Thus, it was unclear how well existing evidence evaluation frameworks would work, even allowing for adaptation. Accordingly, the committee undertook its work with the mindset of a learning process, allowing flexibility to adapt methods and tools as they were being applied, but also capturing the strengths and limitations of the approach and acknowledging alternatives that may be considered in the future. The committee found it was important when adapting and applying the methods and tools to its specific review questions to have input from those familiar with the subject area and the types of studies and other information available.
The committee’s methodology accommodated a wide range of evidence types, including evidence from RCTs, nonrandomized experimental studies, case reports, modeling studies, and descriptive surveys, as well as mechanistic evidence and parallel evidence from other fields. Although it is common for evidence review groups to exclude studies based on study design or methodological limitations in execution, the committee chose not to set such criteria for inclusion of studies in its review. Instead, it considered the appropriateness of the study design and the quality of execution as they related to the ability to address a specific review question. For example, qualitative research methods were considered superior to quantitative methods for certain tasks, such as describing the lived experiences of people placed under quarantine, or exploring the ways in which multiple factors coalesce or conflict in the minds of decision makers choosing whether to implement an emergency operations center. Because much learning about what works and considerations for implementation accumulates through experience, it was important for the committee’s mixed-method synthesis approach to accommodate experiential evidence, such as case reports and AARs, so as to corroborate research findings in the COE determination and help to explain differences in outcomes in practice settings (e.g., by illustrating differences in feasibility or acceptability across settings). However, integrating evidence from AARs and case reports presented its own challenges as these types of reports rarely include clear outcome measures or clearly elucidated cause-and-effect relationships. Moreover, such evidence, even when derived in accordance with high methodological standards, is subject to higher risk of bias compared with evidence from RCTs. 
The committee attempted to mitigate these risks by ensuring that the methods used to assess the quality of evidence were suited not just to the type of evidence being reviewed but also to the purpose to which that evidence was to be put, rather than holding every study to the same set of evaluative criteria. For example, the quality threshold for applying evidence to an assessment of acceptability differed from that for assessing effectiveness.
The four review topics selected by the committee as test cases represented complex practices for which diverse types of PHEPR evidence were captured. Each raised different methodological challenges, thus providing an opportunity to test and iteratively expand the range of the committee’s methodology.
The EOC test case, for example, yielded a situation in which no quantitative evidence of effect was found, but other types of evidence provided information that would be useful to practitioners in considering when and in what circumstances to activate public health emergency operations. Recognizing that PHEPR practitioners must make decisions in the face of a public health emergency with the best information available, the committee sought to develop a process that, even in the absence of quantitative evidence of effectiveness, could present such useful information without an accompanying practice recommendation.
The community preparedness test case provided an opportunity to consider parallel evidence in the review process. As discussed earlier, the applicability of evidence from other fields will be important in reviews of PHEPR practices given that foundational knowledge for some practices may have been generated outside the PHEPR context. Other fields that may be relevant to PHEPR include behavioral economics, psychology, and sociology.
The information sharing test case highlighted the challenges of conducting systematic reviews on technology-based interventions that are evolving rapidly, thus raising concerns regarding the relevance of the findings and recommendations to contemporary practice. These challenges are not specific to PHEPR; the suitability of slow and often infrequently updated evidence reviews for research areas that are changing rapidly has been noted more broadly, giving rise to the concept of living systematic reviews (Elliott et al., 2017) and guidelines (Akl et al., 2017) that are continuously updated as new evidence is published.
The non-pharmaceutical intervention (quarantine) test case raised the issue of how effectiveness is defined (which outcomes matter). It also necessitated the incorporation of mechanistic evidence and evidence from modeling studies. The scoping review discussed in Chapter 2 found that modeling studies make up a substantial proportion of the evidence base for the Non-Pharmaceutical Interventions Capability (as well as the Medical Countermeasure Dispensing and Administration and Medical Materiel Management and Distribution Capabilities), emphasizing the importance of integrating modeling evidence into PHEPR evidence reviews. Although models have been incorporated into past evidence reviews, such as The Community Guide review of school closure to reduce transmission of pandemic influenza (The Community Guide, 2012), this remains an active area of methodological development and is also an intensive process. Consequently, the committee undertook only a limited analysis. As methods for review and integration of modeling evidence are refined, the methodology applied by the committee will need to be updated. The use of mechanistic evidence in evidence syntheses is uncommon, although evidence of biological mechanisms of action is increasingly being incorporated into reviews, for example, on pharmacological and toxicological topics. This is another area requiring further methodological development, one that would benefit from the efforts of a future guidelines development group (see Recommendation 1) to further develop and refine the definition and test the mechanistic upgrading assumptions.
Despite these challenges, the committee was able to use its evidence review and evaluation methodology to answer not only traditional questions about the effectiveness of a practice but also more operational questions of interest to PHEPR practitioners regarding implementation. The committee hopes this may encourage future reviewers to embrace unconventional questions to address important knowledge gaps in the PHEPR field.
In assessing the COE for the PHEPR practices, the committee experienced challenges applying some of the GRADE domains. GRADE is best suited to discrete interventions of the kind typically evaluated in clinical trials, and less so to more complex areas in which context and the interplay of multiple interventions are prominent features of the studies. As discussed earlier in this chapter, for most PHEPR practices, the committee judged that it would not be conceptually appropriate to assume that an effect size existed independent of context and implementation fidelity. As others have done (Movsisyan et al., 2016; Rehfuess and Akl, 2013), the committee also considered whether all bodies of evidence comprising largely nonrandomized studies should start the GRADE process at low COE, but ultimately determined that there was value in adhering to the GRADE approach to the extent possible while acknowledging that this is an ongoing point of discussion in the field (Montgomery et al., 2019). Further consideration of potential modifications to GRADE or of alternative rating schemes that place more emphasis on non-RCT methods is warranted.19
The committee refined its methodology as a clearer picture of the evidence base emerged. The ultimate result was a process that could be used to develop practice recommendations for three of the four exemplar review topics. It is important to note, however, that other review approaches could have been employed (e.g., the realist approach [Greenhalgh et al., 2011] described earlier in this chapter) for questions about circumstances in which a particular intervention should be implemented. Furthermore, the committee’s focus on methods for systematic literature review and evidence grading reflects its charge, but does not imply that this is the best approach to inform decision making for all PHEPR practices, particularly given the substantial investment of time (see Figure 3-4) and resources required for such reviews. It may be that for some questions, other methods, such as Delphi studies to elicit expert opinion, decision analysis, or simulation modeling, would yield sufficient and perhaps even more useful information to guide decision making, in some cases in real time. It is worth noting, especially in the current context (i.e., the coronavirus outbreak that was declared a public health emergency in January 2020), that while the time required to conduct these evidence reviews is not an issue unique to PHEPR, scenarios are more likely to arise in PHEPR that would necessitate the rapid development of guidelines (Garritty et al., 2017; Schünemann et al., 2007). Standard methods for rapid guideline development and revision are actively being pursued (Garritty et al., 2016; Kowalski et al., 2018) and may inform adaptations of the committee’s methodology to facilitate rapid review. Relatedly, it is conceivable that a public health emergency would warrant the expedited completion and/or early release of information from an in-progress review, and processes need to be established for such a contingency.
There have been repeated calls for measures and approaches for evaluation in the PHEPR field (Acosta et al., 2009; IOM, 2008). However, there has been no concerted effort to change the methodologies used in evaluating PHEPR practice or evidence. While the committee acknowledges that methods other than systematic review may be useful in addressing the evidentiary needs of PHEPR practitioners and policy makers, there remains a clear need for an evidence review process to generate evidence-based PHEPR recommendations and guidelines. Given the time and resources required to conduct systematic reviews, the committee was limited to reviewing only a small selection of PHEPR practices as proof of concept. Hundreds of such reviews could be conducted on PHEPR practices to guide practitioners in operationalizing the 15 PHEPR Capabilities. Scoping reviews such as those discussed in Chapter 2 can help guide the selection of review topics for which a systematic review is likely to be worthwhile, as can structured priority-setting activities (Zaza et al., 2000a), including such Delphi-type processes as the practitioner engagement activity the committee undertook to inform future potential PHEPR review topics (described in Box 3-5).20 Moreover, the evidence base for PHEPR practices is continually evolving with the field. As new studies and reports are published, a sustained mechanism will be needed for capturing and analyzing new evidence over time and for updating prior reviews as needed.
19 As noted earlier in this chapter, had individual nonrandomized studies been assessed for risk of bias using ROBINS-I, the starting COE for bodies of such studies could have been set at high according to GRADE guidance (Schünemann et al., 2018), but would immediately have been downgraded two levels because of risk of bias from lack of randomization, ending up at low.
In addition to guiding PHEPR practice and decision making, systems that support ongoing evidence reviews have the potential to drive improvements in the evidence base over time and guide the research agenda through the identification of evidence gaps (as discussed further in Chapter 8). As limitations in study design and execution are systematically catalogued, standards and guidance to researchers can be developed to improve the evidentiary value of future studies.
Evidence review and evaluation methods are continuing to evolve, and particularly relevant to PHEPR are the emerging methods for complex interventions and complex systems. In the interest of sustainability, the committee adopted as the foundation for its layered grading approach the widely used GRADE framework for evaluating quantitative evidence of effectiveness, its EtD criteria, and the GRADE-CERQual method for assessing synthesized qualitative findings, although the integrated COE assessment described in this chapter went beyond the GRADE approach. The GRADE methodology is continually refined through the work of the GRADE working groups, one of which is actively developing methods for assessing the certainty of the body of evidence for complex health and social interventions (Norris et al., 2019; Rehfuess and Akl, 2013). Moreover, training courses and workshops are provided to assist users with applying the GRADE methods, and such events are noted on the GRADE Working Group’s website.21 Consequently, the use of GRADE and GRADE-CERQual gives reviewers access to widely used evidence evaluation tools that are regularly updated. Over time, and as reviewers gain more experience with PHEPR evidence reviews, it will be important to assess and refine the review methods to ensure that they are consistent with current review and guideline development practice and are meeting the needs of PHEPR stakeholders.
There are two distinct approaches for ongoing guideline development—centralized and decentralized or “franchised.” In the centralized approach, guidelines are developed by a single organization, whereas in the decentralized model, any group (often professional organizations and academic groups) can apply a standard methodology to produce a guideline (Grol, 1993; IOM, 1992). Although a decentralized approach allows for greater capacity to conduct reviews and may stimulate wider interest in evidence-based methods, the resulting products exhibit significant variability, and concerns have been raised regarding the potential to use the process to promote special interests. Centralized approaches, in contrast, are characterized by lower throughput and may be threatened by budget cuts when dependent on federal support, but ensure greater consistency in the application of the methods and the quality of the reports (although it should be noted that a centralized approach can mitigate but does not entirely obviate the possibility of conflicts of interest or other biases on the part of the agency or individuals charged with carrying out the reviews).
20 The Delphi-like practitioner engagement activity described in Box 3-5 and in Appendix A was conducted after the committee’s four evidence review topics had been selected and therefore did not inform the selection process. The activity was intended to inform priorities for future PHEPR evidence reviews.
The complexity of the review methods developed by the committee has clear implications for the expertise required in the multidisciplinary group that will need to be involved in future reviews of PHEPR practices. The group’s composition will need to include PHEPR practitioners; PHEPR researchers; and experts with deep knowledge of review methodologies, including methods for synthesizing and grading both quantitative and qualitative evidence. A review group will also benefit from including legal, ethical, and social science expertise, especially when addressing issues affecting implementation. Considering the requisite diversity of expertise and the need for guidelines to issue from an authoritative source with the trust of the PHEPR community and the ability to disseminate the guidelines widely, the committee concluded that a centralized approach supported by CDC is the best model for a system for ongoing evidence reviews of PHEPR practices. Importantly, CDC has extensive experience in overseeing evidence-based review processes, with notable examples including the CPSTF; the Advisory Committee on Immunization Practices (Lee et al., 2018); the Healthcare Infection Control Practices Advisory Committee, which notably incorporates GRADE into its guideline process (CDC, 2019; Umscheid et al., 2010); and, formerly, EGAPP.
There are a number of ways to operationalize a centralized evidence review model, each with its advantages and limitations. The evidence review system can be internal to a federal agency, or centralized reviews can be conducted by independent, external groups that are convened by and work closely with a federal sponsor. Durable exemplars of these two models are CPSTF and USPSTF (USPSTF, 2015; Zaza et al., 2000b). The 35-year-old USPSTF is an independent task force that is convened and funded by AHRQ and makes recommendations on clinical preventive services. It selects topics and oversees the evidence reviews, which are conducted by the separately funded EPCs. The task force reviews the evidence and makes recommendation statements, which are published separately from the EPC reports. CPSTF is convened and overseen by CDC, and in contrast to the USPSTF process, CDC staff have a central role in topic selection and conduct of the evidence reviews, although the recommendations that result are those of the task force.
For a complex task such as PHEPR evidence reviews, an independent, external review group could help avoid conflict of interest while ensuring the broad range of inputs and skills necessary to produce credible, rigorous recommendations. An additional advantage is that its recommendations would not need to be vetted through a government approval process. CDC could still play different roles in these processes, including funding the evidence review system, convening a task force, suggesting topics, overseeing contractors that perform the reviews, soliciting public input, and disseminating recommendations.
A sustainable evidence-based review process for PHEPR will require organizational support and leadership; multifaceted capabilities; adequate funding; and a functional, coordinated system. An initial investment in infrastructure will need to include a curated catalog/guide for the evidence reviews that is made widely available and supported by outreach, education, and implementation resources. In addition to the initial costs for standup of the evidence review group and process, annual funding will be needed to support the group’s ongoing activities. This funding can be roughly estimated, based on the USPSTF annual budget,22 at approximately $10 million annually. This annual cost is not insignificant, but pales in the context of annual spending by the National Institutes of Health (NIH) on research project grants (approximately $6 billion in 2019).23 Because of the vagaries of year-to-year priorities and changing personnel, there are significant advantages to establishing the funding and structure in legislation, although care would be necessary to ensure that the language of such legislation is not so prescriptive that those responsible for implementation lack the necessary flexibility. In addition to providing some measure of stability (i.e., protection against agency budget cuts), legislation can facilitate needed oversight. For example, both CPSTF and USPSTF send annual reports to Congress highlighting high-priority research gaps, which could also help secure funding for critical PHEPR research.
22 In 2019, the AHRQ budget for USPSTF was $11.6 million according to AHRQ’s 2020 operating plan, available at https://www.ahrq.gov/sites/default/files/wysiwyg/cpi/about/mission/operating-plan/operating-plan-2020.pdf (accessed February 21, 2020).
Acosta, J. D., C. Nelson, E. B. Beckjord, S. R. Shelton, E. Murphy, K. L. Leuschner, and J. Wasserman. 2009. A national agenda for public health systems research on emergency preparedness. Santa Monica, CA: RAND Health.
Akl, E. A., J. J. Meerpohl, J. Elliott, L. A. Kahale, H. J. Schünemann, T. Agoritsas, J. Hilton, C. Perron, E. Akl, R. Hodder, et al. 2017. Living systematic reviews: Living guideline recommendations. Journal of Clinical Epidemiology 91:47–53.
Alonso-Coello, P., H. J. Schünemann, J. Moberg, R. Brignardello-Petersen, E. A. Akl, M. Davoli, S. Treweek, R. A. Mustafa, G. Rada, S. Rosenbaum, A. Morelli, G. H. Guyatt, and A. D. Oxman. 2016. GRADE evidence to decision (ETD) frameworks: A systematic and transparent approach to making well informed healthcare choices: Introduction. BMJ 353:i2016. https://doi.org/10.1136/bmj.i2016.
Anderson, L. M., M. Petticrew, E. Rehfuess, R. Armstrong, E. Ueffing, P. Baker, D. Francis, and P. Tugwell. 2011. Using logic models to capture complexity in systematic reviews. Research Synthesis Methods 2(1):33–42.
Barnett-Page, E., and J. Thomas. 2009. Methods for the synthesis of qualitative research: A critical review. BMC Medical Research Methodology 9(1):59.
Bate, P., P. Mendel, and G. Robert. 2008. Organizing for quality: The improvement journeys of leading hospitals in Europe and the United States. New York: Radcliffe Publishing.
Bennett, C., S. Khangura, J. C. Brehaut, I. D. Graham, D. Moher, B. K. Potter, and J. M. Grimshaw. 2010. Reporting guidelines for survey research: An analysis of published guidance and reporting practices. PLOS Medicine 8(8):e1001069.
Berg, R. C., and J. Nanavati. 2016. Realist review: Current practice and future prospects. Journal of Research Practice 12(1).
Booth, A., J. Noyes, K. Flemming, A. Gerhardus, P. Wahlster, G. J. Wilt, K. Mozygemba, P. Refolo, D. Sacchini, M. Tummers, and E. Rehfuess. 2016. Guidance on choosing qualitative evidence synthesis methods for use in health technology assessments of complex interventions. https://www.integrate-hta.eu/wp-content/uploads/2016/02/Guidance-on-choosing-qualitative-evidence-synthesis-methods-for-use-in-HTA-of-complexinterventions.pdf (accessed March 4, 2020).
Booth, A., G. Moore, K. Flemming, R. Garside, N. Rollins, Ö. Tunçalp, and J. Noyes. 2019. Taking account of context in systematic reviews and guidelines considering a complexity perspective. BMJ Global Health 4(Suppl 1).
Boruch, R., and N. Rui. 2008. From randomized controlled trials to evidence grading schemes: Current state of evidence-based practice in social sciences. Journal of Evidence-Based Medicine 1(1):41–49.
Briss, P. A., S. Zaza, M. Pappaioanou, J. Fielding, L. K. Wright-De Aguero, B. I. Truman, D. P. Hopkins, P. Dolan Mullen, R. S. Thompson, S. H. Woolf, V. G. Carande-Kulis, L. Anderson, A. R. Hinman, D. V. McQueen, S. M. Teutsch, J. R. Harris, and The Task Force on Community Preventive Services. 2000. Developing an evidence-based guide to community preventive services—methods. American Journal of Preventive Medicine 18(1S):35–43.
Briss, P. A., R. C. Brownson, J. E. Fielding, and S. Zaza. 2004. Developing and using the guide to community preventive services: Lessons learned about evidence-based public health. Annual Review of Public Health 25(1):281–302.
Bruce, N., A. Pruss-Ustun, D. Pope, H. Adair-Rohani, and E. Rehfuess. 2014. Methods used for evidence assessment. WHO indoor air quality guidelines: Household fuel combustion. Geneva, Switzerland: World Health Organization.
Brunetti, M., I. Shemilt, S. Pregno, L. Vale, A. D. Oxman, J. Lord, J. Sisk, F. Ruiz, S. Hill, G. H. Guyatt, R. Jaeschke, M. Helfand, R. Harbour, M. Davoli, L. Amato, A. Liberati, and H. J. Schünemann. 2013. GRADE guidelines: 10. Considering resource use and rating the quality of economic evidence. Journal of Clinical Epidemiology 66(2):140–150.
Campbell, M., J. E. McKenzie, A. Sowden, S. V. Katikireddi, S. E. Brennan, S. Ellis, J. Hartmann-Boyce, R. Ryan, S. Shepperd, J. Thomas, V. Welch, and H. Thomson. 2020. Synthesis without meta-analysis (SWiM) in systematic reviews: Reporting guideline. BMJ 368:l6890.
Carbone, E. G., and E. V. Thomas. 2018. Science as the basis of public health emergency preparedness and response practice: The slow but crucial evolution. American Journal of Public Health 108(S5):S383–S386.
CASP (Critical Appraisal Skills Programme). 2018. CASP appraisal checklists. http://casp-uk.net/casp-tools-checklists (accessed June 23, 2020).
CDC (Centers for Disease Control and Prevention). 2018a. Introduction to prevention effectiveness. https://www.cdc.gov/publichealth101/prevention-effectiveness.html (accessed February 15, 2020).
CDC. 2018b. Public health emergency preparedness and response capabilities: National standards for state, local, tribal, and territorial public health. Atlanta, GA: Centers for Disease Control and Prevention. https://www.cdc.gov/cpr/readiness/00_docs/CDC_PreparednesResponseCapabilities_October2012_Final_508.pdf (accessed March 4, 2020).
CDC. 2019. Update to the Centers for Disease Control and Prevention and the Healthcare Infection Control Practices Advisory Committee recommendation categorization scheme for infection control and prevention guideline recommendations. https://www.cdc.gov/hicpac/pdf/recommendation-scheme-update-H.pdf (accessed March 4, 2020).
CLEAR (Clearinghouse for Labor Evaluation and Research). 2014. Operational guidelines for reviewing implementation studies. https://clear.dol.gov/sites/default/files/CLEAR_Operational%20Implementation%20Study%20GuGuidelin.pdf (accessed March 4, 2020).
CLEAR. 2015. Clear causal evidence guidelines, version 2.1. https://clear.dol.gov/sites/default/files/CLEAR_Evidence-Guidelines_V2.1.pdf (accessed March 4, 2020).
Cochrane. 2017. Suggested risk of bias criteria for EPOC reviews. https://epoc.cochrane.org/sites/epoc.cochrane.org/files/public/uploads/Resources-for-authors2017/suggested_risk_of_bias_criteria_for_epoc_reviews.pdf (accessed February 15, 2020).
Davids, E. L., and N. V. Roman. 2014. A systematic review of the relationship between parenting styles and children’s physical activity. African Journal for Physical, Health Education, Recreation and Dance 228–246.
Durrheim, D. N., and A. Reingold. 2010. Modifying the GRADE framework could benefit public health. Journal of Epidemiology and Community Health 64(5):387.
ECDC (European Centre for Disease Prevention and Control). 2018. Best practice recommendations for conducting after-action reviews to enhance public health preparedness. Stockholm, Sweden: European Centre for Disease Prevention and Control. https://www.ecdc.europa.eu/sites/default/files/documents/public-health-preparedness-best-practice-recommendations.pdf (accessed March 4, 2020).
Elliott, J., A. Synnot, T. Turner, M. Simmonds, E. Akl, S. McDonald, G. Salanti, J. Meerpohl, H. MacLehose, J. Hilton, I. Shemilt, J. Thomas, T. Agoritsas, R. Hodder, and J. Yepes-Nuñez. 2017. Living systematic review: Introduction: The why, what, when and how. Journal of Clinical Epidemiology 91.
Fisher, R. A. 1925. Statistical methods for research workers. Edinburgh, Scotland: Oliver and Boyd.
Flemming, K., A. Booth, R. Garside, Ö. Tunçalp, and J. Noyes. 2019. Qualitative evidence synthesis for complex interventions and guideline development: Clarification of the purpose, designs and relevant methods. BMJ Global Health 4(Suppl 1):e000882.
Garritty, C., A. Stevens, G. Gartlehner, V. King, C. Kamel, and on behalf of the Cochrane Rapid Reviews Methods Group. 2016. Cochrane Rapid Reviews Methods Group to play a leading role in guiding the production of informed high-quality, timely research evidence syntheses. Systematic Reviews 5(1):184.
Garritty, C. M., S. L. Norris, and D. Moher. 2017. Developing who rapid advice guidelines in the setting of a public health emergency. Journal of Clinical Epidemiology 82:47–60.
Glenton, C., C. J. Colvin, B. Carlsen, A. Swartz, S. Lewin, J. Noyes, and A. Rashidian. 2013. Barriers and facilitators to the implementation of lay health worker programmes to improve access to maternal and child health: Qualitative evidence synthesis. Cochrane Database of Systematic Reviews (10).
Goodman, S. N., and J. Gerson. 2013. Mechanistic evidence in evidence-based medicine: A conceptual framework. AHRQ. https://effectivehealthcare.ahrq.gov/products/mechanistic-evidence-framework/white-paper (accessed May 23, 2020).
Gordon, M. 2016. Are we talking the same paradigm? Considering methodological choices in health education systematic review. Medical Teacher 38(7):746–750.
Green, L. W., R. C. Brownson, and J. E. Fielding. 2017. Introduction: How is the growing concern for relevance and implementation of evidence-based interventions shaping the public health research agenda? Annual Review of Public Health 38:i–iii.
Greenhalgh, T., G. Robert, F. MacFarlane, P. Bate, and O. Kyriakidou. 2004. Diffusion of innovations in service organizations: Systematic review and recommendations. The Milbank Quarterly 82(4):581–629.
Greenhalgh, T., G. Wong, G. Westhorp, and R. Pawson. 2011. Protocol—realist and meta-narrative evidence synthesis: Evolving standards (RAMESES). BMC Medical Research Methodology 11(1):115.
Grol, R. 1993. Development of guidelines for general practice care. British Journal of General Practice 43:146–151.
Guise, J.-M., C. Chang, M. Butler, M. Viswanathan, and P. Tugwell. 2017. AHRQ series on complex intervention systematic reviews—paper 1: An introduction to a series of articles that provide guidance and tools for reviews of complex interventions. Journal of Clinical Epidemiology 90:6–10.
Guyatt, G., A. D. Oxman, E. A. Akl, R. Kunz, G. Vist, J. Brozek, S. Norris, Y. Falck-Ytter, P. Glasziou, H. Debeer, R. Jaeschke, D. Rind, J. Meerpohl, P. Dahm, and H. J. Schünemann. 2011a. GRADE guidelines: 1. Introduction—GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology 64(4):383–394.
Guyatt, G. H., A. D. Oxman, S. Sultan, P. Glasziou, E. A. Akl, P. Alonso-Coello, D. Atkins, R. Kunz, J. Brozek, V. Montori, R. Jaeschke, D. Rind, P. Dahm, J. Meerpohl, G. Vist, E. Berliner, S. Norris, Y. Falck-Ytter, M. H. Murad, H. J. Schünemann, and GRADE Working Group. 2011b. GRADE guidelines: 9. Rating up the quality of evidence. Journal of Clinical Epidemiology 64(12):1311–1316.
Harden, A., J. Thomas, M. Cargo, J. Harris, T. Pantoja, K. Flemming, A. Booth, R. Garside, K. Hannes, and J. Noyes. 2018. Cochrane Qualitative and Implementation Methods Group guidance series-paper 5: Methods for integrating qualitative and implementation evidence within intervention effectiveness reviews. Journal of Clinical Epidemiology 97:70–78.
Harrison, H., M. Birks, R. Franklin, and J. Mills. 2017. Case study research: Foundations and methodological orientations. Forum: Qualitative Social Research 18(1). http://www.qualitative-research.net/index.php/fqs/article/view/2655/4079 (accessed May 7, 2020).
Hick, J. L., D. Hanfling, M. K. Wynia, and A. T. Pavia. 2020. Duty to plan: Health care, crisis standards of care, and novel coronavirus SARS-CoV-2. National Academy of Medicine. https://nam.edu/duty-to-plan-health-care-crisis-standards-of-care-and-novel-coronavirus-sars-cov-2 (accessed May 23, 2020).
Higgins, J. P. T., J. A. Lopez-Lopez, B. J. Becker, S. R. Davies, S. Dawson, J. M. Grimshaw, L. A. McGuinness, T. H. M. Moore, E. A. Rehfuess, J. Thomas, and D. M. Caldwell. 2019. Synthesising quantitative evidence in systematic reviews of complex health interventions. BMJ Global Health 4(Suppl 1):e000858.
Hill, A. B. 1965. The environment and disease: Association or causation? Proceedings of the Royal Society of Medicine 58(5):295–300.
Hoffmann, T. C., P. P. Glasziou, I. Boutron, R. Milne, R. Perera, D. Moher, D. G. Altman, V. Barbour, H. Macdonald, M. Johnston, S. E. Lamb, M. Dixon-Woods, P. McCulloch, J. C. Wyatt, A.-W. Chan, and S. Michie. 2014. Better reporting of interventions: Template for intervention description and replication (TIDieR) checklist and guide. BMJ 348:g1687.
Hohmann, A. A., and M. K. Shear. 2002. Community-based intervention research: Coping with the “noise” of real life in study design. The American Journal of Psychiatry 159(2):201–207.
Howick, J., P. Glasziou, and J. K. Aronson. 2009. The evolution of evidence hierarchies: What can Bradford Hill’s “Guidelines for Causation” contribute? Journal of the Royal Society of Medicine 102(5):186–194.
Howick, J., P. Glasziou, and J. K. Aronson. 2010. Evidence-based mechanistic reasoning. Journal of the Royal Society of Medicine 103(11):433–441.
Hultcrantz, M., D. Rind, E. A. Akl, S. Treweek, R. A. Mustafa, A. Iorio, B. S. Alper, J. J. Meerpohl, M. H. Murad, M. T. Ansari, S. V. Katikireddi, P. Ostlund, S. Tranaeus, R. Christensen, G. Gartlehner, J. Brozek, A. Izcovich, H. Schünemann, and G. Guyatt. 2017. The GRADE Working Group clarifies the construct of certainty of evidence. Journal of Clinical Epidemiology 87:4–13.
Hunter, J. C., J. E. Yang, A. W. Crawley, L. Biesiadecki, and T. J. Aragón. 2013. Public health response systems in-action: Learning from local health departments’ experiences with acute and emergency incidents. PLOS ONE 8(11):e79457. https://doi.org/10.1371/journal.pone.0079457.
IOM (Institute of Medicine). 1992. Developing clinical practice guidelines. In Guidelines for clinical practice: From development to use. Washington, DC: National Academy Press.
IOM. 2008. Research priorities in emergency preparedness and response for public health systems: A letter report. Washington, DC: The National Academies Press.
IOM. 2009. Guidance for establishing crisis standards of care for use in disaster situations: A letter report. Washington, DC: The National Academies Press.
Jennings, B., and J. Arras. 2008. Ethical guidance for public health emergency preparedness and response: Highlighting ethics and values in a vital public health service. Ethics Subcommittee, Advisory Committee to the Director, Centers for Disease Control and Prevention. https://www.cdc.gov/od/science/integrity/phethics/docs/white_paper_final_for_website_2012_4_6_12_final_for_web_508_compliant.pdf (accessed March 4, 2020).
Jennings, B., J. D. Arras, D. H. Barrett, and B. A. Ellis. 2016. Emergency ethics: Public health preparedness and response. New York: Oxford University Press.
Kowalski, S. C., R. L. Morgan, M. Falavigna, I. D. Florez, I. Etxeandia-Ikobaltzeta, W. Wiercioch, Y. Zhang, F. Sakhia, L. Ivanova, N. Santesso, and H. J. Schünemann. 2018. Development of rapid guidelines: Systematic survey of current practices and methods. Health Research Policy and Systems 16(1):61.
Lee, G., W. Carr, and ACIP Evidence-Based Recommendations Work Group. 2018. Updated framework for development of evidence-based recommendations by the Advisory Committee on Immunization Practices. Morbidity and Mortality Weekly Report 67:1271–1272. https://www.cdc.gov/mmwr/volumes/67/wr/mm6745a4.htm?s_cid=mm6745a4_w (accessed March 4, 2020).
Leviton, L. C. 2017. Generalizing about public health interventions: A mixed-methods approach to external validity. Annual Review of Public Health 38:371–391.
Lewin, S., C. Glenton, H. Munthe-Kaas, B. Carlsen, C. J. Colvin, M. Gulmezoglu, J. Noyes, A. Booth, R. Garside, and A. Rashidian. 2015. Using qualitative evidence in decision making for health and social interventions: An approach to assess confidence in findings from qualitative evidence syntheses (GRADE-CERQual). PLOS Medicine 12(10):e1001895.
Lewin, S., A. Booth, C. Glenton, H. Munthe-Kaas, A. Rashidian, M. Wainwright, M. A. Bohren, O. Tuncalp, C. J. Colvin, R. Garside, B. Carlsen, E. V. Langlois, and J. Noyes. 2018. Applying GRADE-CERQual to qualitative evidence synthesis findings: Introduction to the series. Implementation Science 13(Suppl 1):2.
Marcus, J. 2018. Additional evidence evaluation methods for assessing the effectiveness of interventions and practices: National Transportation Safety Board. Paper presented to the Committee on Evidence-Based Practices for Public Health Emergency Preparedness and Response, July 26, Washington, DC.
Mastroianni, A. C., J. P. Kahn, and N. E. Kass, eds. 2019. The Oxford handbook of public health ethics. Oxford University Press. https://www.oxfordhandbooks.com/view/10.1093/oxfordhb/9780190245191.001.0001/oxfordhb-9780190245191 (accessed May 23, 2020).
Miles, M. B., A. M. Huberman, and J. Saldana. 2014. Qualitative data analysis: A methods sourcebook. Thousand Oaks, CA: SAGE Publications.
Minard, C. G., M. F. de Carvalho, and M. S. Iyengar. 2011. Optimizing medical resources for spaceflight using the integrated medical model. Aviation Space and Environmental Medicine 82(9):890–894.
Moberg, J., A. D. Oxman, S. Rosenbaum, H. J. Schünemann, G. Guyatt, S. Flottorp, C. Glenton, S. Lewin, A. Morelli, G. Rada, P. Alonso-Coello, and GRADE Working Group. 2018. The GRADE Evidence to Decision (ETD) framework for health system and public health decisions. Health Research Policy and Systems 16(1):45.
Montgomery, P., A. Movsisyan, S. P. Grant, G. Macdonald, and E. A. Rehfuess. 2019. Considerations of complexity in rating certainty of evidence in systematic reviews: A primer on using the GRADE approach in global health. BMJ Global Health 4(Suppl 1).
Movsisyan, A., G. J. Melendez-Torres, and P. Montgomery. 2016. Users identified challenges in applying GRADE to complex interventions and suggested an extension to GRADE. Journal of Clinical Epidemiology 70:191–199.
Murad, M. H., R. A. Mustafa, H. J. Schünemann, S. Sultan, and N. Santesso. 2017. Rating the certainty in evidence in the absence of a single estimate of effect. Evidence-Based Medicine 22(3):85–87.
NASEM (National Academies of Sciences, Engineering, and Medicine). 2017. Application of systematic review methods in an overall strategy for evaluating low-dose toxicity from endocrine active chemicals. Washington, DC: The National Academies Press.
Nelson, C., N. Lurie, and J. Wasserman. 2007a. Assessing public health emergency preparedness: Concepts, tools, and challenges. Annual Review of Public Health 28(1):1–18.
Nelson, C., N. Lurie, J. Wasserman, and S. Zakowski. 2007b. Conceptualizing and defining public health emergency preparedness. American Journal of Public Health 97:S9–S11.
Norris, S. L., E. A. Rehfuess, H. Smith, Ö. Tunçalp, J. M. Grimshaw, N. P. Ford, and A. Portela. 2019. Complex health interventions in complex systems: Improving the process and methods for evidence-informed health decisions. BMJ Global Health 4(Suppl 1).
Noyes, J., A. Booth, K. Flemming, R. Garside, A. Harden, S. Lewin, T. Pantoja, K. Hannes, M. Cargo, and J. Thomas. 2018. Cochrane Qualitative and Implementation Methods Group guidance series-paper 3: Methods for assessing methodological limitations, data extraction and synthesis, and confidence in synthesized qualitative findings. Journal of Clinical Epidemiology 97:49–58.
Noyes, J., A. Booth, G. Moore, K. Flemming, Ö. Tunçalp, and E. Shakibazadeh. 2019. Synthesising quantitative and qualitative evidence to inform guidelines on complex interventions: Clarifying the purposes, designs and outlining some methods. BMJ Global Health 4(Suppl 1).
NTSB (National Transportation Safety Board). 2020. About the National Transportation Safety Board. https://www.ntsb.gov/about/Pages/default.aspx (accessed May 23, 2020).
Pawson, R., T. Greenhalgh, G. Harvey, and K. Walshe. 2005. Realist review: A new method of systematic review designed for complex policy interventions. Journal of Health Services Research and Policy 10(Suppl 1):21–34.
Petticrew, M. 2015. Time to rethink the systematic review catechism? Moving from “what works” to “what happens.” Systematic Reviews 4:36.
Petticrew, M., L. Anderson, R. Elder, J. Grimshaw, D. Hopkins, R. Hahn, L. Krause, E. Kristjansson, S. Mercer, T. Sipe, P. Tugwell, E. Ueffing, E. Waters, and V. Welch. 2013a. Complex interventions and their implications for systematic reviews: A pragmatic approach. Journal of Clinical Epidemiology 66(11):1209–1214.
Petticrew, M., E. Rehfuess, J. Noyes, J. P. T. Higgins, A. Mayhew, T. Pantoja, I. Shemilt, and A. Sowden. 2013b. Synthesizing evidence on complex interventions: How meta-analytical, qualitative, and mixed-method approaches can contribute. Journal of Clinical Epidemiology 66(11):1230–1243.
Petticrew, M., C. Knai, J. Thomas, E. A. Rehfuess, J. Noyes, A. Gerhardus, J. M. Grimshaw, H. Rutter, and E. McGill. 2019. Implications of a complexity perspective for systematic reviews and guideline development in health decision making. BMJ Global Health 4(Suppl 1).
Phillips, C. V., and K. J. Goodman. 2004. The missed lessons of Sir Austin Bradford Hill. Epidemiologic Perspectives and Innovations 1(1):3.
Pope, C., S. Ziebland, and N. Mays. 2000. Qualitative research in health care: Analysing qualitative data. BMJ 320(7227):114–116.
Rehfuess, E. A., and E. A. Akl. 2013. Current experience with applying the GRADE approach to public health interventions: An empirical study. BMC Public Health 13:9.
Rehfuess, E. A., J. M. Stratil, I. B. Scheel, A. Portela, S. L. Norris, and R. Baltussen. 2019. The WHO-INTEGRATE evidence to decision framework version 1.0: Integrating WHO norms and values and a complexity perspective. BMJ Global Health 4(Suppl 1).
Richard, C., K. Magee, P. Bacon-Abdelmoteleb, and J. Brown. 2018. Countermeasures that work: A highway safety countermeasure guide for state highway safety offices. 9th ed. Washington, DC: National Highway Traffic Safety Administration.
Rohwer, A., L. Pfadenhauer, J. Burns, L. Brereton, A. Gerhardus, A. Booth, W. Oortwijn, and E. Rehfuess. 2017. Clinical epidemiology in South Africa, paper 3: Logic models help make sense of complexity in systematic reviews and health technology assessments. Journal of Clinical Epidemiology 83:37–47.
Rooney, A. A., A. L. Boyles, M. S. Wolfe, J. R. Bucher, and K. A. Thayer. 2014. Systematic review and evidence integration for literature-based environmental health science assessments. Environmental Health Perspectives 122(7):711–718.
Savoia, E., F. Agboola, and P. D. Biddinger. 2012. Use of after action reports (AARs) to promote organizational and systems learning in emergency preparedness. International Journal of Environmental Research and Public Health 9(8):2949–2963.
Schünemann, H. J., S. R. Hill, M. Kakad, G. E. Vist, R. Bellamy, L. Stockman, T. F. Wisloff, C. Del Mar, F. Hayden, T. M. Uyeki, J. Farrar, Y. Yazdanpanah, H. Zucker, J. Beigel, T. Chotpitayasunondh, T. T. Hien, B. Ozbay, N. Sugaya, and A. D. Oxman. 2007. Transparent development of the WHO rapid advice guidelines. PLOS Medicine 4(5):e119.
Schünemann, H., J. Brožek, G. Guyatt, and A. Oxman. 2013. GRADE handbook. GRADE Working Group. https://gdt.gradepro.org/app/handbook/handbook.html (accessed March 4, 2020).
Schünemann, H. J., C. Cuello, E. A. Akl, R. A. Mustafa, J. J. Meerpohl, K. Thayer, R. L. Morgan, G. Gartlehner, R. Kunz, S. V. Katikireddi, J. Sterne, J. P. Higgins, G. Guyatt, and GRADE Working Group. 2018. GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence. Journal of Clinical Epidemiology 111:105–114.
Schünemann, H. J., Y. Zhang, A. D. Oxman, and Expert Evidence in Guidelines Group. 2019. Distinguishing opinion from evidence in guidelines. BMJ 366:l4606.
Siegfried, A. L., E. G. Carbone, M. B. Meit, M. J. Kennedy, H. Yusuf, and E. B. Kahn. 2017. Identifying and prioritizing information needs and research priorities of public health emergency preparedness and response practitioners. Disaster Medicine and Public Health Preparedness 11(5):552–561.
Sterne, J. A., M. A. Hernan, B. C. Reeves, J. Savovic, N. D. Berkman, M. Viswanathan, D. Henry, D. G. Altman, M. T. Ansari, I. Boutron, J. R. Carpenter, A. W. Chan, R. Churchill, J. J. Deeks, A. Hrobjartsson, J. Kirkham, P. Juni, Y. K. Loke, T. D. Pigott, C. R. Ramsay, D. Regidor, H. R. Rothstein, L. Sandhu, P. L. Santaguida, H. J. Schünemann, B. Shea, I. Shrier, P. Tugwell, L. Turner, J. C. Valentine, H. Waddington, E. Waters, G. A. Wells, P. F. Whiting, and J. P. Higgins. 2016. ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355:i4919.
Sterne, J. A., J. Savović, M. J. Page, R. G. Elbers, N. S. Blencowe, I. Boutron, C. J. Cates, H. Y. Cheng, M. S. Corbett, S. M. Eldridge, J. R. Emberson, M. A. Hernán, S. Hopewell, A. Hróbjartsson, D. R. Junqueira, P. Jüni, J. J. Kirkham, T. Lasserson, T. Li, A. McAleenan, B. C. Reeves, S. Shepperd, I. Shrier, L. A. Stewart, K. Tilling, I. R. White, P. F. Whiting, and J. P. Higgins. 2019. RoB 2: A revised tool for assessing risk of bias in randomised trials. BMJ 366:l4898.
Teutsch, S. M., L. A. Bradley, G. E. Palomaki, J. E. Haddow, M. Piper, N. Calonge, W. D. Dotson, M. P. Douglas, A. O. Berg, and EGAPP Working Group. 2009. The evaluation of genomic applications in practice and prevention (EGAPP) initiative: Methods of the EGAPP working group. Genetics in Medicine 11(1):3–14.
The Community Guide. 2012. Emergency preparedness and response: School dismissals to reduce transmission of pandemic influenza: Summary evidence tables—economic review. https://www.thecommunityguide.org/sites/default/files/assets/SET-schooldismissals-econ.pdf (accessed March 4, 2020).
Thomas, J., and A. Harden. 2008. Methods for the thematic synthesis of qualitative research in systematic reviews. BMC Medical Research Methodology 8:45.
Tracy, S. J. 2018. A phronetic iterative approach to data analysis in qualitative research. Journal of Qualitative Research 19(2):61–76. https://pdfs.semanticscholar.org/e6b9/48c979223e21c695003636d2b73ed8a692dc.pdf?_ga=2.166166476.1259835723.1583341124-540380720.1583341124 (accessed March 4, 2020).
Truman, B. I., C. K. Smith-Akin, A. R. Hinman, K. M. Gebbie, R. C. Brownson, L. F. Novick, R. Lawrence, M. Pappaioanou, J. E. Fielding, C. Evans, F. A. Guerra, M. Vogel-Taylor, C. Mahan, M. Fullilove, S. Zaza, and The Task Force on Community Preventive Services. 2000. Developing the guide to community preventive services: Overview and rationale. American Journal of Preventive Medicine 18:18–26.
Tyndall, J. 2010. AACODS checklist. https://dspace.flinders.edu.au/xmlui/bitstream/handle/2328/3326/AACODS_Checklist.pdf (accessed March 4, 2020).
Umscheid, C. A., R. K. Agarwal, and P. J. Brennan. 2010. Updating the guideline development methodology of the healthcare infection control practices advisory committee (HICPAC). American Journal of Infection Control 38(4):264–273.
USPSTF (U.S. Preventive Services Task Force). 2015. U.S. Preventive Services Task Force procedure manual. https://www.uspreventiveservicestaskforce.org/Page/Name/procedure-manual (accessed March 4, 2020).
USPSTF. 2016. Use of decision models in the development of evidence-based clinical preventive services recommendations. https://www.uspreventiveservicestaskforce.org/Page/Name/use-of-decision-models-in-the-development-of-evidence-based-clinical-preventive-services-recommendations (accessed February 15, 2020).
Walshe, K. 2007. Understanding what works—and why—in quality improvement: The need for theory-driven evaluation. International Journal for Quality in Health Care 19(2):57–59.
Ward, K., K. J. Hoare, and M. Gott. 2015. Evolving from a positivist to constructionist epistemology while using grounded theory: Reflections of a novice researcher. Journal of Research in Nursing 20(6):449–462.
Waters, E., B. J. Hall, R. Armstrong, J. Doyle, T. L. Pettman, and A. de Silva-Sanigorski. 2011. Essential components of public health evidence reviews: Capturing intervention complexity, implementation, economics and equity. Journal of Public Health 33(3):462–465.
Welch, V. A., E. A. Akl, K. Pottie, M. T. Ansari, M. Briel, R. Christensen, A. Dans, L. Dans, J. Eslava-Schmalbach, G. Guyatt, M. Hultcrantz, J. Jull, S. V. Katikireddi, E. Lang, E. Matovinovic, J. J. Meerpohl, R. L. Morton, A. Mosdol, M. H. Murad, J. Petkovic, H. Schünemann, R. Sharaf, B. Shea, J. A. Singh, I. Solà, R. Stanev, A. Stein, L. Thabane, T. Tonia, M. Tristan, S. Vitols, J. Watine, and P. Tugwell. 2017. GRADE equity guidelines 3: Considering health equity in GRADE guideline development: Rating the certainty of synthesized evidence. Journal of Clinical Epidemiology 90:76–83.
WHO (World Health Organization). 2015. Ethics in epidemics, emergencies and disasters: Research, surveillance and patient care. Geneva, Switzerland: World Health Organization.
WHO. 2018. Communicating risk in public health emergencies: A WHO guideline for emergency risk communication (ERC) policy and practice. Geneva, Switzerland: World Health Organization.
Wong, G., T. Greenhalgh, G. Westhorp, J. Buckingham, and R. Pawson. 2013. RAMESES publication standards: Realist syntheses. BMC Medicine 11:21.
WWC (What Works Clearinghouse). 2017a. Procedures handbook: Version 4.0. https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_procedures_handbook_v4.pdf (accessed March 4, 2020).
WWC. 2017b. Standards handbook: Version 4.0. https://ies.ed.gov/ncee/wwc/Docs/referenceresources/wwc_standards_handbook_v4.pdf (accessed March 4, 2020).
WWC. 2020. Standards handbook: Version 4.1. https://ies.ed.gov/ncee/wwc/Docs/referenceresources/WWC-Standards-Handbook-v4-1-508.pdf (accessed March 4, 2020).
Zaza, S., R. S. Lawrence, C. S. Mahan, M. Fullilove, D. Fleming, G. J. Isham, and M. Pappaioanou. 2000a. Scope and organization of the Guide to Community Preventive Services: The task force on community preventive services. American Journal of Preventive Medicine 18(1 Suppl):27–34.
Zaza, S., L. K. Wright-De Aguero, P. A. Briss, B. I. Truman, D. P. Hopkins, M. H. Hennessy, D. M. Sosin, L. Anderson, V. G. Carande-Kulis, S. M. Teutsch, M. Pappaioanou, and Task Force on Community Preventive Services. 2000b. Data collection instrument and procedure for systematic reviews in the guide to community preventive services. American Journal of Preventive Medicine 18(1 Suppl):44–74.