Suggested Citation:"Appendix B: Research Methods." National Academies of Sciences, Engineering, and Medicine. 2022. Closing Evidence Gaps in Clinical Prevention. Washington, DC: The National Academies Press. doi: 10.17226/26351.

B

Research Methods

Evaluating the quality of evidence for key questions on the benefits and harms of preventive services forms a cornerstone of the U.S. Preventive Services Task Force (USPSTF) and the Community Preventive Services Task Force processes. Researchers who produce this evidence often face tradeoffs among an array of possible study designs that can be used for research on whether an action (e.g., screening, behavioral counseling, or preventive medication) causes an outcome. Randomized controlled trials (RCTs) are widely considered to produce the highest-quality evidence because they are less prone to confounding than observational study designs. However, evidence from RCTs often has key limitations in generalizability to real-life populations and settings.

New designs for RCTs and innovative methods for observational studies have gained increasing traction in health care delivery research in the past two decades. Modern study designs could enable researchers to address evidence gaps in the USPSTF analytic framework where traditional individual-level RCTs are not feasible for a variety of reasons. In addition, innovative designs can be useful for studies to address gaps in the research foundation, and in real-world dissemination and implementation of recommended preventive services. In this chapter, the committee discusses the types of studies needed to fill different types of evidence gaps, considering both existing USPSTF methods and innovative modern methods. Our purpose is to highlight specific newer study designs particularly germane to clinical prevention research, and to encourage researchers, sponsors, and guideline committees to increase their use and acceptance where appropriate.

TYPES OF STUDIES NEEDED TO FILL EVIDENCE GAPS

To identify the types of studies needed to fill evidence gaps for recent USPSTF-reviewed topics, the committee evaluated the “Research Needs and Gaps” sections of all 14 USPSTF recommendations that contained any “insufficient evidence” (I) statement and were published over 3 years, from July 2018 to June 2021. Five of these 14 recommendations contained I statements along with letter grades (A, B, C, or D) for part of the population or intervention, with the I statement applying to a subgroup or to a specific type of intervention. The committee also reviewed all nine recommendations containing only A, B, C, or D grades published from January 2020 to June 2021.

Among the 14 USPSTF recommendations reviewed containing an I statement, four specifically called for RCTs (see Table B-1). Others called for evidence about the effectiveness of screening or intervention without specifying a required study design. Ten of the 14 called for studies of risk prediction approaches (i.e., ways to identify high-risk groups). Of these, several statements requested research on how high-risk subgroups could be identified using clinical information such as data from questionnaires. Others requested research to establish the best specifications for a physical test, such as determining the preferred laboratory test for vitamin D deficiency, or developing consistent definitions of hearing loss to improve certainty about the accuracy of screening tests. Five of the 14 recommendations described evidence gaps that would require longitudinal follow-up of patients, either in RCTs or cohort studies, such as understanding the long-term history of hypertension in children and how often it resolves spontaneously. Half of the statements called for more research on population subgroups defined by age, sex, race or ethnicity, or other factors. Three requested research on real-world dissemination or implementation. One statement, on abdominal aortic aneurysm screening, suggested that high-quality modeling studies for women could be informative if new trials were not available.

Of the nine USPSTF recommendations reviewed containing letter grades and no I statements, six specifically requested additional RCTs. Seven called for studies of risk prediction approaches, and six requested studies to refine the specifications of the intervention or to test related new interventions. Three described research needs that could best be met by longitudinal cohort studies or trials with long-term follow-up. Two mentioned research needs related to dissemination and implementation.

TABLE B-1 Types of Studies Needed to Address Research Needs and Gaps Described in Recent U.S. Preventive Services Task Force Statements

	Study Type Specified or Implied in the Research Needs Section of the USPSTF Statement
Topic	Randomized controlled trial (RCT)*	Intervention effectiveness study, unspecified*	Risk prediction study, including test performance	Longitudinal follow-up (in RCT or cohort study)	Cross-sectional study	Research in subgroups	Dissemination or Implementation study	Notes
Part 1. Recommendations containing an “insufficient evidence” (I) statement, July 2018–June 2021
Vitamin D Deficiency in Adults: Screening		1	1
Tobacco Smoking Cessation in Adults, Including Pregnant Persons: Interventions	1	1		1	1			Calls for studies of additional outcomes
Hearing Loss in Older Adults: Screening		1	1			1
High Blood Pressure in Children and Adolescents: Screening		1	1	1		1		RCTs could be difficult because this screening is routine in usual care
Prevention and Cessation of Tobacco Use in Children and Adolescents: Primary Care Interventions		1				1
Unhealthy Drug Use: Screening		1	1				1
Illicit Drug Use in Children, Adolescents, and Young Adults: Primary Care–Based Interventions		1				1	1
Bacterial Vaginosis in Pregnant Persons to Prevent Preterm Delivery: Screening		1				1
Cognitive Impairment in Older Adults: Screening		1	1					Calls for more consistent definitions of outcomes
Abdominal Aortic Aneurysm: Screening	1		1	1	1	1	1	Suggested high-quality modeling studies could be useful
Elevated Blood Lead Levels in Children and Pregnant Women: Screening		1	1	1				RCTs could be difficult because this screening is routine in usual care
Atrial Fibrillation: Screening with Electrocardiography	1		1					Mentions ongoing RCTs
Peripheral Artery Disease and Cardiovascular Disease: Screening and Risk Assessment with the Ankle-Brachial Index	1		1			1		Mentions ongoing RCTs
Cardiovascular Disease: Risk Assessment with Nontraditional Risk Factors		1	1	1			1	Requests studies of incremental benefit in real-world practice
Total	4	11	10	5	2	7	3
Part 2. Recommendations containing only A, B, C, or D grades
Healthy Weight and Weight Gain in Pregnancy: Behavioral Counseling Interventions		1		1		1
Colorectal Cancer: Screening	1	1	1			1	1
Hypertension in Adults: Screening			1	1
Lung Cancer: Screening			1				1	Requests studies of risk prediction models to select patients to screen
Asymptomatic Carotid Artery Stenosis: Screening	1		1
Hepatitis B Virus Infection in Adolescents and Adults: Screening	1	1	1	1			1	Requests studies of decision support tools
Healthy Diet and Physical Activity for Cardiovascular Disease Prevention in Adults with Cardiovascular Risk Factors: Behavioral Counseling	1	1
Sexually Transmitted Infections: Behavioral Counseling	1	1	1	1
Hepatitis C Virus Infection in Adolescents and Adults: Screening	1	1	1			1
Total	6	6	7	3	0	3	2

* The Randomized Controlled Trial column was marked only when a Research Needs statement specified a randomized controlled trial. Other statements calling for research on the benefits of a screening, treatment, or other intervention were classified as “Intervention Effectiveness Study, unspecified.”

PRODUCING HIGH-QUALITY EVIDENCE

The USPSTF rates the body of evidence for each question in the analytic framework for a given topic as convincing, adequate, or inadequate based on several factors (USPSTF, 2021). First among these is, “Do the studies have the appropriate research design to answer the key question(s)?” Other key factors considered in evaluating the adequacy of evidence include internal validity and external generalizability to the U.S. primary care population, aggregated across all studies for each of the key questions.

The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) system, introduced in 2011 based on the work of an international consensus group, is a widely used body of standards for rating the quality of a body of evidence in systematic reviews and guidelines for clinical, public health, and health systems questions. In the GRADE approach, RCTs are initially classified as high-quality evidence and observational studies as low-quality evidence for estimates of intervention effects (Guyatt et al., 2011). From these starting points, a body of evidence about whether an action (e.g., screening or treatment) causes an outcome may be downgraded or upgraded. Reasons for downgrading include risk of bias, inconsistency, indirectness, imprecision, and publication bias. Reasons for upgrading include evidence of a large effect, a dose-response effect, and a situation in which all plausible confounding would bias the estimate in the direction opposite to the observed effect (or lack of effect).
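
The starting points and the downgrading and upgrading steps described above can be illustrated with a short sketch. It assumes the four GRADE certainty labels (high, moderate, low, very low), which are not listed in the text above, and it treats each downgrade or upgrade as a move of one level. The function name and its inputs are hypothetical, and real ratings rest on reviewer judgment rather than arithmetic.

```python
# GRADE certainty labels, ordered from lowest to highest (assumed scale).
LEVELS = ["very low", "low", "moderate", "high"]


def grade_certainty(study_design, downgrades=0, upgrades=0):
    """Return an illustrative GRADE-style certainty level.

    study_design: "rct" starts at high; "observational" starts at low.
    downgrades / upgrades: number of levels to move down or up.
    """
    start = {"rct": 3, "observational": 1}[study_design]
    index = start - downgrades + upgrades
    # Keep the result within the range of valid levels.
    index = max(0, min(len(LEVELS) - 1, index))
    return LEVELS[index]


# A body of RCT evidence downgraded once (e.g., for imprecision).
print(grade_certainty("rct", downgrades=1))          # moderate
# Observational evidence upgraded once (e.g., for a large effect).
print(grade_certainty("observational", upgrades=1))  # moderate
```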

In recent years, GRADE guidance has been updated to suggest that the evaluation of non-randomized studies of interventions could begin with an assumption of high certainty with the use of a new tool called the Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I) (Schünemann et al., 2019; Sterne et al., 2016). The ROBINS-I tool enables reviewers to evaluate the risk of bias in estimates of comparative effectiveness (harm or benefit) in non-randomized studies of interventions and to assess the magnitude of bias in different domains, including confounding, selection, misclassification of intervention status, deviations from intended interventions, missing data, outcomes measurement, and reporting.

SELECTION OF STUDY DESIGNS

For many of the USPSTF recommendations reviewed in Table B-1, it would be possible to address a research need using either an RCT or an observational study design. Many factors influence the selection of a study design by a sponsor or a researcher. These include the anticipated internal and external validity of the study design, and extrinsic factors including time urgency and logistical barriers (Armstrong, 2012). In addition, studies designed for diverse populations may need to consider the possibility that the intervention has heterogeneous effects on different subpopulations. Study designs that have high per-patient costs may lead to limited sample sizes and inadequate power to identify heterogeneity of treatment effect, making study designs with lower per-patient costs more attractive if they are of adequate quality.

Limitations of Individual-Level Randomized Controlled Trials

RCTs are not practically feasible or are not the optimal approach in some situations. Examples include preventive services where:

The service is already widely accepted as standard of practice, such as lead screening for children. It would be difficult on ethical grounds to withhold standard care for a trial.
The service would likely be viewed as desirable by the average patient, which could hinder enrolling a representative group in an individual-level trial.
The service already has high-quality evidence for effectiveness in one patient group, and this evidence could be extrapolated to other patient groups. For example, abdominal aortic aneurysm screening is recommended for men aged 65–75 years who have ever smoked. Given this, it could be argued that a similar trial in women of similar age and smoking history is not required.
The cost and/or burden of enrolling patients individually precludes sponsors or clinicians from supporting a trial.

Individual-level RCTs have other important limitations. The results are often not externally generalizable because inclusion and exclusion criteria tend to select patients who do not resemble patients in actual practice based on age, comorbidities, sex, race/ethnicity, or socioeconomic status. In addition, because sample sizes are usually limited by cost, individual-level trials can rarely address questions about how intervention effects may vary in different subgroups (Armstrong, 2012). Failure to identify this effect, termed treatment heterogeneity, may mean that high-risk subgroups miss the potential benefits of a preventive service if it is found ineffective in the general population. These limitations are not inherent to RCTs; they may also hinder non-randomized studies of interventions. Conversely, many pragmatic or effectiveness trials do enroll patients representative of those in practice.

Modern Trial Designs

Newer designs for RCTs may be useful to overcome specific limitations of individual-level trials. Many excellent books on study design and reviews of modern study methods are available and provide more comprehensive descriptions of the options. Here, the committee discusses selected designs that may be useful in clinical prevention. Examples of modern trial designs include

Cluster-randomized trials, in which the unit of intervention is a group of patients clustered based on an organizing factor such as a clinic, a medical center, or a geographic area. Such designs make it feasible to include large numbers of patients and sometimes do not require individual-level consent.
Cluster randomization may be especially useful when it would be difficult to provide different interventions to individuals within the same clinical setting. For example, a trial of mailed and telephone intervention to reduce postpartum weight retention among women with gestational diabetes mellitus used a cluster randomized design in order to leverage a centralized system for case management across the 44 medical facilities participating, and to enable consistent workflow within each medical facility.

Stepped wedge cluster randomized trials are increasingly used to evaluate interventions that involve service delivery in discrete units such as geographical regions, hospitals or clinics. In this design, each unit crosses over from control to intervention in a randomized and sequential fashion (Hemming et al., 2015). This design enables researchers to conduct rigorous evaluation within the constraints sometimes imposed by policy makers when they believe it important that every unit in a study should eventually receive the intervention.
Pragmatic RCTs are designed to test the effectiveness of interventions in real-life clinical practice settings, to maximize applicability and generalizability. They usually have less strict eligibility requirements than individual-level trials and often use cluster randomization. Many focus on outcomes that are measured routinely in the course of usual clinical care. Advantages of pragmatic trials include greater generalizability to diverse populations, ability to enroll larger numbers of patients, and lower cost. For example, a pragmatic randomized trial of one-step versus two-step screening for gestational diabetes was able to study 23,792 women and to study multiple outcomes for both the mothers and infants (Hillier et al., 2021). The trial was pragmatic in that there were few restrictions on interventions subsequent to screening. It was supported by a National Institutes of Health R01 grant, which is a modest scope for a study of this size.
Adaptive trial designs may be useful for situations where an intervention is warranted but where there are multiple possible approaches and the best choice of intervention is initially unclear. In an adaptive trial, the trial protocol includes a planned schedule of assessments and the trial parameters may be modified after each assessment. The parameters that may be modified include the intervention itself, the dosage of a medication, the patient selection criteria, and the sample size. For example, the USPSTF recommends colorectal cancer screening and calls for more research to understand uptake and adherence to individual screening tests, including repeated colonoscopy after 10 years and repeated stool tests annually, as well as research on the accuracy and effectiveness of serum- and urine-based tests (“liquid biopsy” screening) and capsule endoscopy tests, which could enhance adherence. An adaptive trial design might be useful to compare several of these approaches, eliminate less-promising options early in the process, and focus research effort on those with highest potential for benefit.
Platform trials are designed to simultaneously investigate multiple interventions using specialized statistical methods (Berry et al., 2015). This type of trial has been used to study treatments in cancer, infectious diseases, and neurology and is particularly suited to study multiple interventions in heterogeneous populations that need long-term follow-up. Randomization may use adaptive approaches in response to emerging information about the effectiveness of an intervention. Platform trials could be applied to study preventive interventions where simultaneous comparison of three or more approaches in diverse populations is warranted.
Effectiveness-implementation hybrid designs were created to speed the development of evidence useful to implementation while the effectiveness of a service is still being evaluated. “Design” refers to the procedures used to select units or participants for study, assign these to intervention or usual care, and make evaluations before, during, and after study assignment (Landes et al., 2020). Hybrid design denotes that these procedures may have varying degrees of focus on both the effectiveness and implementation of a clinical intervention. The effectiveness-implementation hybrid design is not limited to RCTs. These designs have been used in many studies of behavioral interventions and are discussed in a review by researchers with the Veterans Affairs Quality Enhancement Research Initiative (Landes et al., 2020).

Clinical trials may sometimes be adapted in ways that specifically suit the evaluation of preventive interventions. Examples include trials that help elucidate mechanisms of why interventions have an impact, enable study of the indirect effects of a program or policy, or facilitate collaborations with implementers of preventive services (Alsan and Finkelstein, 2021). In addition, a novel preference-informed complementary trial design may help trials include patients with strong preferences who might decline to enroll in a traditionally designed trial (Ali et al., 2021).
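
To make the stepped wedge design described earlier in this section more concrete, the following sketch generates a crossover schedule in which every unit starts in the control condition and moves to the intervention in a randomized, sequential order. The function name, the clinic labels, the number of steps, and the even spacing of crossovers are illustrative assumptions, not features of any particular trial.

```python
import random


def stepped_wedge_schedule(clusters, n_steps, seed=0):
    """Return {cluster: [0 or 1 for each period]} for a stepped wedge design.

    Period 0 is an all-control baseline. Afterwards, clusters cross over
    to the intervention (1) in a randomly chosen order, spread evenly
    across n_steps steps, and never cross back to control (0).
    """
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)  # randomize the order in which clusters cross over
    schedule = {}
    for position, cluster in enumerate(order):
        # Step (1..n_steps) at which this cluster starts the intervention.
        step = 1 + position * n_steps // len(order)
        schedule[cluster] = [1 if period >= step else 0
                             for period in range(n_steps + 1)]
    return schedule


clinics = ["clinic_A", "clinic_B", "clinic_C", "clinic_D"]  # hypothetical units
for clinic, row in sorted(stepped_wedge_schedule(clinics, n_steps=4).items()):
    print(clinic, row)
```

Each row is one unit and each column one measurement period. By the final period every unit is receiving the intervention, which is the property that can make this design acceptable when policy makers want all units to be served eventually.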

OBSERVATIONAL STUDY DESIGNS

Observational study designs for comparative effectiveness compare a group that has been exposed to a condition or intervention (e.g., screening or treatment) with a comparison group that has not. Historically, observational studies have been plagued with biases, also referred to as confounding. These include selection bias (when the intervention and comparison groups differ in characteristics associated with the outcome of interest) and performance bias (when delivery of an intervention is associated with generally higher levels of quality of care by the health care unit) (Armstrong, 2012).

Cohort study designs are prone to problems that threaten their internal validity, including secular trend and regression to the mean. Multivariable analysis is perhaps the oldest and best-established method of adjusting for confounders that have been measured. However, specific well-known cases have underscored the limitations of cohort study designs, even when they include untreated comparison groups and multivariable analysis is applied. For example, multiple observational studies prior to the late 1990s indicated that the use of postmenopausal hormone replacement therapy (HRT) was associated with substantial reduction in coronary heart disease (CHD) risk (Humphrey et al., 2002). Subsequently, a large RCT of postmenopausal HRT found no benefit in CHD risk (Rossouw et al., 2002). This apparent contradiction stemmed in part from many of the observational studies not having collected data on socioeconomic status, which was a key confounder of the relationship between HRT use and CHD risk.

It should be noted that in research intended to reduce health disparities for underrepresented individuals, the control group (both in RCTs and observational designs) should be selected to be as similar as possible to the intervention group. Recent articles recommend standards for reporting on race and ethnicity (Flanagin et al., 2021) and for research and publication on racial health inequities (Boyd et al., 2020).

In recent years, newer study designs and analytic methods have offered more robust approaches for causal inference, the process of inferring that an action or event affects an outcome of interest (Rothman and Greenland, 2005). Modern analytic methods include

Propensity score methods, which are analytic approaches used in observational studies to balance the intervention and comparison groups on observed pre-intervention characteristics. For each study subject, the researcher generates a propensity score that reflects the individual’s likelihood of receiving the intervention based on their characteristics. Propensity scores are discussed in many review articles (Glynn et al., 2006; Luo et al., 2010; Stürmer et al., 2014) and can be used in several different ways, including adjusting the multivariable regression analysis or matching subjects who received the intervention with appropriate comparison subjects. The use of propensity scores may be especially useful in studies evaluating real-world effectiveness after a preventive service has been recommended.
Instrumental variable analysis, which uses a variable to separate study subjects into two groups, one of which is more likely to have received an intervention than another (Martens et al., 2006). It is typically used retrospectively. The use of the variable, termed the instrumental variable, attempts to mimic random assignment to the intervention vs. the control group. The requirement is that the instrumental variable be associated with the exposure or intervention of interest, but not with the outcome except through that exposure or intervention. One advantage of instrumental variable design is that, in theory, it controls for unmeasured confounders. One disadvantage of this approach is that there are few situations where an ideal instrumental variable exists, because many candidate instrumental variables are associated with the outcome to some extent.
For example, if a behavioral intervention to prevent illicit drug use in adolescents were made available only to patients making clinic visits on Mondays, Wednesdays, and Fridays, the day of the week each clinic visit was made might be usable as an instrumental variable in an observational study, provided it were not associated with the outcomes of interest.
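
The day-of-week example above can be sketched with the simplest instrumental variable calculation, often called the Wald estimator: the difference in mean outcome between the two instrument groups divided by the difference in intervention uptake between them. The records below are invented solely for illustration, and the function name is hypothetical. The estimate is only interpretable if the instrument meets the requirements described above.

```python
def mean(values):
    return sum(values) / len(values)


def wald_iv_estimate(records):
    """Wald instrumental variable estimate from (z, d, y) records.

    z: 1 if the visit fell on a day the intervention was offered, else 0
    d: 1 if the patient received the intervention, else 0
    y: outcome (here 1 = drug use at follow-up, 0 = none)
    """
    z1 = [(d, y) for z, d, y in records if z == 1]
    z0 = [(d, y) for z, d, y in records if z == 0]
    outcome_diff = mean([y for _, y in z1]) - mean([y for _, y in z0])
    uptake_diff = mean([d for d, _ in z1]) - mean([d for d, _ in z0])
    if uptake_diff == 0:
        raise ValueError("instrument is unrelated to intervention uptake")
    return outcome_diff / uptake_diff


# Invented data: 10 visits on offered days, 10 visits on other days.
records = (
    [(1, 1, 1)] * 1 + [(1, 1, 0)] * 5    # offered days, received intervention
    + [(1, 0, 1)] * 2 + [(1, 0, 0)] * 2  # offered days, did not receive it
    + [(0, 0, 1)] * 5 + [(0, 0, 0)] * 5  # other days, intervention unavailable
)
print(round(wald_iv_estimate(records), 3))  # -0.333
```

In this invented data set, the outcome rate is 0.3 on offered days and 0.5 on other days, while uptake differs by 0.6, so the estimated effect of receiving the intervention is about a 33 percentage point reduction.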

Quasi-experimental designs can be used to enhance the robustness of non-randomized studies of interventions (Harris et al., 2006). Examples include

Difference-in-difference design (nonrandomized trial with comparison group). In this design, outcomes are initially measured at a baseline time point in two similar groups. Then, one group receives the intervention while the comparison group does not. Subsequently, outcomes are measured again, usually at the same time point for each group. This design can be used prospectively (in advance of data collection) or retrospectively (on existing data). It mitigates secular trend and regression to the mean, both of which are inherent flaws in study designs that involve observing only the group that receives an intervention.
This design can be useful when it is impossible to assign multiple groups at random to intervention or control. For example, if one health care system initiated a program that focused on enhancing colorectal cancer screening through annual stool testing, its outcomes before and after the program began could be compared with a similar system without such a program.
Interrupted time-series design is often used to study changes in health care policy, such as changes in guidelines, laws, or financing mechanisms. In these situations, it is generally impossible to randomize, and the policy change usually occurs at a single point in time. If the policy change has an effect on an outcome, researchers observing the rate of the outcome over time will see a sudden change in the level or slope of the curve at the time of the policy change. This design is best used to study policy changes that take effect all at once rather than gradually, and can only be used when data are available from enough time points before and after the policy change. For example, if a health care financing policy changed to provide access to a preventive service that was previously unavailable to a group of patients, and if uptake of the preventive service were brisk, this design could be used to study the resulting changes in outcomes.
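
A minimal sketch of an interrupted time-series analysis follows. It fits one straight line to the periods before the policy change and another to the periods after, then compares the two lines at the change point. The function names and the example rates are assumptions made for illustration. A real analysis would also need to address autocorrelation, seasonality, and statistical uncertainty, none of which this sketch handles.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def interrupted_time_series(rates, change_point):
    """Estimate changes in level and slope at a policy change.

    rates: outcome rate for each time period 0, 1, 2, ...
    change_point: index of the first period after the policy change.
    """
    times = list(range(len(rates)))
    pre_intercept, pre_slope = fit_line(times[:change_point],
                                        rates[:change_point])
    post_intercept, post_slope = fit_line(times[change_point:],
                                          rates[change_point:])
    # Compare the two fitted lines at the first post-policy period.
    expected = pre_intercept + pre_slope * change_point
    observed = post_intercept + post_slope * change_point
    return {"level_change": observed - expected,
            "slope_change": post_slope - pre_slope}


# Invented uptake rates (%): four periods before and four after the change.
rates = [10, 11, 12, 13, 20, 22, 24, 26]
print(interrupted_time_series(rates, change_point=4))
```

With these invented rates, the pre-policy trend projects 14 at the change point, while the post-policy line starts at 20, giving a level change of 6 and a slope change of 1 per period.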

PREDICTIVE ANALYTICS

Most of the recent USPSTF statements the committee reviewed called for predictive analytics—risk prediction studies using tests, questionnaires, or clinical characteristics to identify patients at high risk for a condition. Research needs involving predictive analytics can be categorized as studying physical screening tests (e.g., laboratory, imaging, colonoscopy), clinical examination findings (e.g., visual examination for skin cancer), patient-reported factors on questionnaires, other clinical characteristics, or combinations of these.

A rich body of literature exists on methods for evaluating test performance and setting test cutoffs. Some common methods and issues to consider include the need for prediction models to have adequate discrimination and calibration (Alba et al., 2017), and the use of receiver operating characteristic curves to identify potential test cutoff points (Poldrack et al., 2020).
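
As an illustration of how a receiver operating characteristic analysis can suggest a test cutoff, the sketch below computes sensitivity and specificity at each candidate cutoff, the area under the curve as a measure of discrimination, and the cutoff that maximizes Youden's J (sensitivity + specificity − 1). Youden's J is only one of several possible selection rules and is an assumption here, as are the function names and the example data. Calibration is not addressed in this sketch.

```python
def roc_points(scores, labels):
    """Return (cutoff, sensitivity, specificity) for each distinct score.

    A test is called positive when score >= cutoff. Labels are 1 for
    patients with the condition and 0 for patients without it.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < cutoff and y == 0)
        points.append((cutoff, tp / positives, tn / negatives))
    return points


def auc(scores, labels):
    """Probability that a random case scores higher than a random non-case."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def youden_cutoff(scores, labels):
    """Cutoff that maximizes sensitivity + specificity - 1."""
    return max(roc_points(scores, labels), key=lambda p: p[1] + p[2] - 1)[0]


# Invented test scores for eight patients, five of whom have the condition.
scores = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 1, 0, 1, 1, 1, 1]
print(round(auc(scores, labels), 3))   # 0.933
print(youden_cutoff(scores, labels))   # 5
```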

The importance of predictive analytics in clinical prevention research is likely to increase in the foreseeable future due to scientific advances that create the opportunity for precision medicine and precision public health. Genomics, proteomics, and metabolomics are among the many biological fields that could provide new tools for clinical prevention by developing tests to identify patient groups at high risk of future disease. Advances in artificial intelligence have made it possible to analyze images and text and to generate computer-based predictive algorithms using myriad variables available in electronic health records. Patient monitoring technology that gathers highly detailed data, such as wearable patches for heart rhythm monitoring in atrial fibrillation, is also increasing. These trends are likely to increase the need for predictive analytics, and for modeling studies that combine predictive analytic methods with other methods to give a comprehensive overview of the projected benefits and harms of a preventive service.

One issue that seems likely to warrant increasing attention by researchers and guideline committees is the need to conduct studies that evaluate the net incremental value of a test or prediction model (i.e., its additional value relative to existing methods of identifying patients as high risk). For example, the USPSTF statement on risk assessment of cardiovascular disease using nontraditional risk factors (including the ankle-brachial index, high-sensitivity C-reactive protein, and coronary artery calcification score) noted that studies assessing these factors in isolation are of limited value. Instead, the statement called for studies comparing traditional risk assessment with traditional risk assessment plus one or more of the newer factors, so that the incremental benefits and harms of the newer factors can be more clearly delineated. Likewise, in studies of computer-based risk prediction models, Shah et al. (2019) have called for considering the net incremental value of taking plausible actions when selecting the best model, rather than simply relying on statistical measures.

MODELING

Modeling refers to the use of tables or computer-based simulations to make projections about outcomes under varying scenarios. Models range from simple calculations presented in outcomes tables to more formal decision models. The USPSTF uses modeling to inform recommendations when there is direct evidence of the benefit of a preventive service on health outcomes, or when there is evidence for each of the linkages in an analytic framework. The USPSTF Procedure Manual notes that candidates for modeling are generally A, B, and some C recommendations, and notes that decision modeling is primarily warranted when there are outstanding clinical questions about how best to target a clinical preventive service at the individual and program level, and it is unlikely that the systematic review can confidently determine the magnitude of net benefit, particularly for subpopulations of interest (USPSTF, 2021). Thus, the USPSTF currently uses modeling in a highly focused way, to further inform the applications of letter grade recommendations to subpopulations. It would not use modeling to overturn an “insufficient evidence” statement.

Modeling has potentially important uses in research on preventive services beyond its current role in the USPSTF process. For example, a research sponsor with finite funding may wish to compare the value of investing in research on one preventive services topic compared with another, or within a given topic, compare the value of investing in studies to address one key question compared with another. Decision models could be used to project the potential findings of the proposed studies, their effects on preventive services recommendations, and the downstream effects on health outcomes. Multi-criteria decision analysis, a type of decision modeling that takes into account multiple factors of interest, could be used along with the prioritization criteria in the taxonomy presented in this report. Such models could help sponsors compare the projected costs, benefits, and perceived value of alternative investments in research.
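A minimal sketch of how a weighted-sum form of multi-criteria decision analysis might rank candidate research investments follows. The criteria names, weights, and scores are hypothetical placeholders, not the prioritization criteria from the taxonomy in this report, and a weighted sum is only one of several possible aggregation rules.

```python
def mcda_scores(options, weights):
    """Weighted-sum score for each option, with weights normalized to sum to 1."""
    total_weight = sum(weights.values())
    normalized = {criterion: w / total_weight for criterion, w in weights.items()}
    return {
        name: sum(normalized[c] * scores[c] for c in normalized)
        for name, scores in options.items()
    }


# Hypothetical criteria weights chosen by a sponsor (any positive scale).
weights = {"burden_of_disease": 5, "feasibility": 3, "equity_impact": 2}

# Hypothetical 0-10 scores for two candidate studies on each criterion.
options = {
    "study_A": {"burden_of_disease": 8, "feasibility": 4, "equity_impact": 6},
    "study_B": {"burden_of_disease": 5, "feasibility": 9, "equity_impact": 7},
}

scores = mcda_scores(options, weights)
ranking = sorted(scores, key=scores.get, reverse=True)

for name in ranking:
    print(f"{name}: {scores[name]:.2f}")
```

Because the ranking depends on the weights, a sponsor using this kind of model would typically check whether the preferred option changes when the weights are varied.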

Modeling could also be used to inform some of the challenging decisions among competing priorities in preventive care that clinicians and policy makers make in real-world practice. For example, the preventive services recommended for an average visit may require more time than a clinician has available to address them. In addition, many patients have comorbidities that compete for attention with the delivery of preventive care. To present realistic projections of the impact of preventive services in actual practice, it would be helpful for models not to assume 100 percent adherence to an intervention and to account for the fact that harms and benefits data from randomized trials may need to be adjusted to reflect real-world outcomes.
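The arithmetic behind such an adjustment can be sketched as follows. This is a deliberately simple illustration under stated assumptions: the population size, baseline risk, trial relative risk reduction, adherence, and the `real_world_ratio` discount applied to the trial effect are all hypothetical values, and the function name is invented for this example.

```python
def projected_events_prevented(population, baseline_risk, relative_risk_reduction,
                               adherence=1.0, real_world_ratio=1.0):
    """Events prevented = people who adhere x absolute risk reduction.

    real_world_ratio scales the trial effect to reflect practice conditions
    (1.0 means the trial effect is assumed to carry over unchanged).
    """
    absolute_risk_reduction = baseline_risk * relative_risk_reduction * real_world_ratio
    return population * adherence * absolute_risk_reduction


# Idealized projection: everyone adheres and the trial effect carries over fully.
ideal = projected_events_prevented(10_000, baseline_risk=0.05,
                                   relative_risk_reduction=0.20)

# More realistic projection: 60 percent adherence, 80 percent of the trial effect.
realistic = projected_events_prevented(10_000, baseline_risk=0.05,
                                       relative_risk_reduction=0.20,
                                       adherence=0.60, real_world_ratio=0.80)

print(f"ideal:     {ideal:.0f} events prevented")
print(f"realistic: {realistic:.0f} events prevented")
```

In this toy example, the realistic projection is about half of the idealized one, which shows how strongly adherence assumptions can drive projected benefit. Harms could be adjusted in the same way.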

Decision models are inherently imperfect and usually rely on base case assumptions about which uncertainty exists. Sensitivity analysis is important in all modeling, to understand how results may change when assumptions are varied over plausible ranges. With the appropriate caveats in mind, modeling can be helpful to elucidate the tradeoffs among costs, benefits, and harms of alternative options, to support decision making by policy makers, operational leaders of programs that deliver preventive care, and individual clinicians and patients.
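A one-way sensitivity analysis of the kind described above can be sketched in a few lines. The screening model, its parameter names, the base case values, and the plausible ranges below are all hypothetical, and the model simplifies by weighting one serious outcome prevented equally with one serious harm.

```python
def net_outcomes_per_1000(p):
    """Serious outcomes prevented minus serious harms per 1,000 people screened."""
    prevented = 1000 * p["prevalence"] * p["sensitivity"] * p["treatment_benefit"]
    harmed = (1000 * (1 - p["prevalence"]) * (1 - p["specificity"])
              * p["harm_per_false_positive"])
    return prevented - harmed


# Hypothetical base case assumptions.
base_case = {
    "prevalence": 0.05,
    "sensitivity": 0.90,
    "specificity": 0.90,
    "treatment_benefit": 0.30,        # chance a detected case avoids the outcome
    "harm_per_false_positive": 0.02,  # chance a false positive leads to harm
}

# Hypothetical plausible ranges (low, high) for the uncertain parameters.
plausible_ranges = {
    "prevalence": (0.02, 0.10),
    "treatment_benefit": (0.10, 0.50),
    "harm_per_false_positive": (0.01, 0.05),
}


def one_way_sensitivity(model, base, ranges):
    """Vary one parameter at a time; return rows sorted by size of the swing."""
    rows = []
    for name, (low, high) in ranges.items():
        results = [model({**base, name: value}) for value in (low, high)]
        rows.append((name, min(results), max(results)))
    return sorted(rows, key=lambda row: row[2] - row[1], reverse=True)


base_result = net_outcomes_per_1000(base_case)
table = one_way_sensitivity(net_outcomes_per_1000, base_case, plausible_ranges)

print(f"base case: {base_result:.2f}")
for name, lowest, highest in table:
    print(f"{name}: {lowest:.2f} to {highest:.2f}")
```

Sorting by the width of each range gives the ordering used in a tornado diagram, so the assumptions that most influence the result appear first.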

RESEARCH ON GROUP DECISION PROCESSES

The USPSTF is one of many groups that develop recommendations on preventive services, and its methods are highly evidence-based and structured compared with most others. Even with its highly evolved approaches, the task force likely still has room for additional refinement in the processes used to arrive at decisions as a group. These processes could be applied, for example, to help the task force determine priorities among evidence gaps identified via the use of the taxonomy.

One example of a structured group communication and decision-making process is the Delphi method, which uses iterative questionnaires and statistical feedback to the group to develop a consensus. Other quantitative methods of group decision-making include voting, weighting, ranking, scoring, and grading (Madhavan et al., 2017). Some methods, including multi-criteria decision analysis and the analytic hierarchy process, enable the user to assign weights to different criteria that describe the options; these weights can be used for individual or group decision-making (Phelps and Madhavan, 2017). These methods are often poorly understood and deserve more practical research.
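The statistical feedback step of a Delphi round might be sketched as follows. This covers only the summary returned to panelists between rounds, not the full iterative process. The 1 to 9 rating scale, the rule that consensus means at least 70 percent of ratings fall in the 7 to 9 range, and the item names and ratings are all assumptions made for illustration.

```python
from statistics import median


def summarize_round(ratings, agree_low=7, agree_high=9, consensus_share=0.70):
    """Summarize one Delphi round: median rating and share of high ratings per item."""
    summary = {}
    for item, scores in ratings.items():
        share_high = sum(agree_low <= s <= agree_high for s in scores) / len(scores)
        summary[item] = {
            "median": median(scores),
            "share_high": share_high,
            "consensus": share_high >= consensus_share,
        }
    return summary


# Hypothetical priority ratings (1 = low, 9 = high) from ten panelists.
round_one = {
    "gap_A": [8, 9, 7, 8, 6, 9, 7, 8, 9, 5],
    "gap_B": [3, 8, 5, 9, 4, 7, 6, 2, 8, 5],
}

summary = summarize_round(round_one)
for item, stats in summary.items():
    print(item, stats)

# Items without consensus would be re-rated in the next round.
to_rerate = [item for item, stats in summary.items() if not stats["consensus"]]
print("re-rate next round:", to_rerate)
```

In practice, each panelist would see the group summary alongside his or her own earlier rating before rating the unresolved items again.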

The USPSTF uses several types of processes where methods for group decision-making could be further tested and refined, including the selection of topics to review, the decision about a recommendation on a given topic once the evidence is reviewed, and in the future, the prioritization among evidence gaps that should be filled for a given topic. Learning more about how to best apply such methods in developing recommendations on preventive services and associated research would enhance the robustness of decisions and optimize their downstream impact.

REFERENCES

Alba, A. C., T. Agoritsas, M. Walsh, S. Hanna, A. Iorio, P. J. Devereaux, T. McGinn, and G. Guyatt. 2017. Discrimination and calibration of clinical prediction models: Users’ guides to the medical literature. JAMA 318(14):1377–1384.

Ali, S., G. Hopkin, N. Poonai, L. Richer, M. Yaskina, A. Heath, T. P. Klassen, C. McCabe, KidsCAN PERC Innovative Pediatric Clinical Trials No OUCH Study Group, and KidsCAN PERC Innovative Pediatric Clinical Trials Methods Core. 2021. Correction to: A novel preference-informed complementary trial (PICT) design for clinical trial research influenced by strong patient preferences. Trials 22(1):353.

Alsan, M., and A. N. Finkelstein. 2021. Beyond causality: Additional benefits of randomized controlled trials for improving health care delivery. The Milbank Quarterly.

Armstrong, K. 2012. Methods in comparative effectiveness research. Journal of Clinical Oncology 30(34):4208–4214.

Berry, S. M., J. T. Connor, and R. J. Lewis. 2015. The platform trial: An efficient strategy for evaluating multiple treatments. JAMA 313(16):1619–1620.

Boyd, R. W., E. G. Lindo, L. D. Weeks, and M. R. McLemore. 2020. On racism: A new standard for publishing on racial health inequities. Health Affairs Blog, July 2, 2020. https://www.healthaffairs.org/do/10.1377/hblog20200630.939347/full (accessed November 1, 2021).

Flanagin, A., T. Frey, S. L. Christiansen, and AMA Manual of Style Committee. 2021. Updated guidance on the reporting of race and ethnicity in medical and science journals. JAMA 326(7):621–627.

Glynn, R. J., S. Schneeweiss, and T. Stürmer. 2006. Indications for propensity scores and review of their use in pharmacoepidemiology. Basic & Clinical Pharmacology & Toxicology 98(3):253–259.

Guyatt, G., A. D. Oxman, E. A. Akl, R. Kunz, G. Vist, J. Brozek, S. Norris, Y. Falck-Ytter, P. Glasziou, H. DeBeer, R. Jaeschke, D. Rind, J. Meerpohl, P. Dahm, and H. J. Schünemann. 2011. GRADE guidelines: 1. Introduction—GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology 64(4):383–394.

Harris, A. D., J. C. McGregor, E. N. Perencevich, J. P. Furuno, J. Zhu, D. E. Peterson, and J. Finkelstein. 2006. The use and interpretation of quasi-experimental studies in medical informatics. Journal of the American Medical Informatics Association 13(1):16–23.

Hemming, K., T. P. Haines, P. J. Chilton, A. J. Girling, and R. J. Lilford. 2015. The stepped wedge cluster randomised trial: Rationale, design, analysis, and reporting. BMJ 350:h391.

Hillier, T. A., K. L. Pedula, K. K. Ogasawara, K. K. Vesco, C. E. S. Oshiro, S. L. Lubarsky, and J. Van Marter. 2021. A pragmatic, randomized clinical trial of gestational diabetes screening. New England Journal of Medicine 384(10):895–904.

Humphrey, L. L., B. K. S. Chan, and H. C. Sox, Jr. 2002. Postmenopausal hormone replacement therapy and the primary prevention of cardiovascular disease. Annals of Internal Medicine 137(4):273–284.

Landes, S. J., S. A. McBain, and G. M. Curran. 2020. Reprint of: An introduction to effectiveness-implementation hybrid designs. Psychiatry Research 283(16):112630.

Luo, Z., J. C. Gardiner, and C. J. Bradley. 2010. Applying propensity score methods in medical research: Pitfalls and prospects. Medical Care Research and Review 67(5):528–554.

Madhavan, G., C. Phelps, and R. Rappuoli. 2017. Compare voting systems to improve them. Nature 541(7636):151–153.

Martens, E. P., W. R. Pestman, A. de Boer, S. V. Belitser, and O. H. Klungel. 2006. Instrumental variables: Application and limitations. Epidemiology 17(3):260–267.

Phelps, C. E., and G. Madhavan. 2017. Using multicriteria approaches to assess the value of health care. Value in Health 20(2):251–255.

Poldrack, R. A., G. Huckins, and G. Varoquaux. 2020. Establishment of best practices for evidence for prediction: A review. JAMA Psychiatry 77(5):534–540.

Rossouw, J. E., G. L. Anderson, R. L. Prentice, A. Z. LaCroix, C. Kooperberg, M. L. Stefanick, R. D. Jackson, S. A. A. Beresford, B. V. Howard, K. C. Johnson, J. M. Kotchen, J. Ockene, and Writing Group for the Women’s Health Initiative Investigators. 2002. Risks and benefits of estrogen plus progestin in healthy postmenopausal women: Principal results from the Women’s Health Initiative randomized controlled trial. JAMA 288(3):321–333.

Rothman, K. J., and S. Greenland. 2005. Causation and causal inference in epidemiology. American Journal of Public Health 95:S144–S150.

Schünemann, H. J., C. Cuello, E. A. Akl, R. A. Mustafa, J. J. Meerpohl, K. Thayer, R. L. Morgan, G. Gartlehner, R. Kunz, S. V. Katikireddi, J. Sterne, J. P. Higgins, G. Guyatt, and GRADE Working Group. 2019. GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence. Journal of Clinical Epidemiology 111:105–114.

Shah, N. H., A. Milstein, and S. C. Bagley. 2019. Making machine learning models clinically useful. JAMA 322(14):1351–1352.

Sterne, J. A., M. A. Hernán, B. C. Reeves, J. Savović, N. D. Berkman, M. Viswanathan, D. Henry, D. G. Altman, M. T. Ansari, I. Boutron, J. R. Carpenter, A. W. Chan, R. Churchill, J. J. Deeks, A. Hróbjartsson, J. Kirkham, P. Jüni, Y. K. Loke, T. D. Pigott, C. R. Ramsay, D. Regidor, H. R. Rothstein, L. Sandhu, P. L. Santaguida, H. J. Schünemann, B. Shea, I. Shrier, P. Tugwell, L. Turner, J. C. Valentine, H. Waddington, E. Waters, G. A. Wells, P. F. Whiting, and J. P. Higgins. 2016. ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. BMJ 355:i4919.

Stürmer, T., R. Wyss, R. J. Glynn, and M. A. Brookhart. 2014. Propensity scores for confounder adjustment when assessing the effects of medical interventions using nonexperimental study designs. Journal of Internal Medicine 275(6):570–580.

USPSTF (U.S. Preventive Services Task Force). 2021. U.S. Preventive Services Task Force procedure manual. https://www.uspreventiveservicestaskforce.org/uspstf/about-uspstf/methods-and-processes/procedure-manual (accessed August 30, 2021).
