Obtaining information useful for policy making through an effective evaluation requires substantial coordination, support, and data sharing among the stakeholders (e.g., the funding agency, grantees, and the evaluator).
The committee viewed evaluation as comprising three broad categories: formative, process, and summative (also referred to as outcome evaluation). These three types are described along a continuum, as shown in Figure C-1.
Formative evaluations help assess the feasibility and acceptability of a program and provide preliminary information on its potential effectiveness. They are most often conducted in the early years of pilot and demonstration projects. Programs in the early stages of their development may not yet be “delivering all intended services, conducting all activities, or reaching all targeted populations” as planned (Ryan et al., 2014). Formative evaluations tend to address the following types of questions:
- How is the program being implemented?
- What data are currently being collected and what needs to be collected to assess the program’s progress and success?
- To what degree are the targeted populations being reached?
- How are program services matched with client needs?
- What factors are impeding (or facilitating) the program’s implementation?
- What modifications might improve the program’s operation?
Formative evaluations can provide early insights into what is working, what is not, and where modifications to the original program design are needed. Formative evaluations are also used early in program development to identify relevant process and outcome measures and the appropriate mechanisms for generating them. Ideally, such measurements should be collected prior to any subsequent process or summative evaluation.
Process evaluations are used to identify the strengths and weaknesses of an ongoing program, with the primary objective of determining how the program could be improved. Process evaluations are typically used to assess programs that have been operating for several years and are in the early to middle stages of development. After clearly articulating how the program is intended to work and the goals and objectives for each of its targeted populations, a process evaluation addresses the following types of questions:
- How well are the core components of the program working?
- How well is the program meeting its targeted goals for each of its targeted populations?
- To what degree are the targeted populations benefiting equitably?
- Where are the opportunities for improvement in how the program operates or how it reaches its targeted populations?
Process evaluations can be exploratory in nature and often rely on a mix of qualitative and quantitative data. They can also test specific hypotheses about program processes and implementation.
Summative or Outcomes Evaluations
Summative (or outcomes) evaluations are used to systematically assess a program’s effectiveness and, when requested, cost-effectiveness. Summative evaluations are best applied to more mature1 and fully developed programs. They typically address one or more of the following kinds of questions:
- Does the program work in terms of achieving the desired outcomes for each targeted population?
- Is the program (and not some other mechanism) responsible for the desired outcomes?
- How efficient is the program in terms of the benefits it delivers relative to its costs?
To answer such specific questions with high levels of confidence, summative evaluations typically use quasi-experimental or experimental research designs and rely on detailed quantitative measurements that have been systematically collected over time. Cost-effectiveness evaluations are one type of summative evaluation and require the additional systematic collection of appropriate cost information.
The complexity of research design and the substantial data collection and analysis requirements often make summative evaluations more expensive and time consuming than their formative and process counterparts. It is not uncommon for summative evaluations to take 5 years or more to design, implement, and conclude. Such efforts also require a great deal of coordination between the evaluation and the implementing agencies to assure that the appropriate measurements and sample are selected and consistently collected. Summative evaluations also require precise definitions of targeted goals and outcomes; grantees must be clear on how to interpret their outcomes and what data to provide to demonstrate success.
1 In response to the passage of the Government Performance and Results Act Modernization Act of 2010, the U.S. Government Accountability Office (GAO) released “Designing Evaluations,” which it described as a “guide to successfully completing evaluation design tasks” for federal programs and policies. Many of the committee’s recommendations in Chapter 8 are consonant with that report, which particularly emphasizes the importance of program maturity to choosing appropriate methodologies (GAO, 2012).
INFORMATION AND DATA REQUIRED TO CONDUCT COST-BENEFIT OR COST-EFFECTIVENESS ANALYSES
Because the Consolidated Appropriations Act of 2018 specifically mentioned cost-effectiveness, the committee provides a detailed discussion of cost-effectiveness analysis and a related type of evaluation, cost-benefit analysis. Cost-benefit analysis (CBA) and cost-effectiveness analysis (CEA) are two forms of economic evaluation frequently conducted by public agencies and private organizations interested in determining the most efficient way to use their resources to achieve a stated objective. Both involve the identification, measurement, valuation, and comparison of the true economic costs and consequences (good or bad) of two or more interventions, programs, funding streams, or policies (henceforth referred to as “programs”) seeking to achieve the same clearly stated objective(s) (Drummond et al., 1998). The data requirements for conducting either a CBA or a CEA go beyond those required for a summative evaluation, as measures of “effectiveness” are but one component of these economic evaluations; the other is measuring costs. A key difference when conducting economic evaluations of government programs (whether delivered or funded by the government) is the focus on the rate of return to the community as a whole (or “population wide”), not to the agency or targeted recipient alone. This population-wide focus fundamentally changes how one is expected to measure both the costs incurred and the outcomes or benefits experienced with the implementation of any individual or group of programs.
There are unique challenges to conducting these types of economic evaluations, which have led many public agencies (NASEM, 2016; NICE, 2014) and even classic textbooks (Drummond et al., 2015; Levin et al., 2017) to develop new standards for conducting evaluations of “social interventions” at the population level. Those standards consistently involve a multistep process, summarized below so that the committee understands the requirements it would need to specify in order to obtain the quality of data and analysis necessary to conduct such work in the future. A deeper discussion detailing how to do this type of work can be found in Drummond et al. (2015), NASEM (2016), or Levin et al. (2017).
Step 1: Carefully define what policy or program is being evaluated and all the relevant alternatives. While many public programs are described with the same terms, such as “naloxone distribution” or “treatment access,” the programs actually implemented on the ground can differ dramatically. It is important to clearly articulate the goals or objectives of the program being implemented, including the conceptual model of how the program is expected to work, its intended targets, the mechanisms and pathways through which it will affect those targets, and the expected institutional and behavioral changes resulting from it. In addition, it is important to thoroughly describe the full set of alternative program options, or at a minimum the “status quo” or basis of comparison, so that varied approaches to the programs all have the same comparative metric.
Step 2: Articulate the question being addressed by the CBA or CEA. The specifics of the question will determine what information is required to answer it. An economic evaluation focused on the effects of providing recovery support, if nonspecific, can include a range of ways in which recovery might be supported including employment support, peer support aids, clean living assistance, and job training, while an economic evaluation focused on the impact of training and using peer support workers would have a narrower set of outcomes and costs to consider.
Step 3: Clearly identify the perspective to be taken. Government-supported programs can be evaluated from (1) the program or agency perspective, (2) the program beneficiaries’ perspective, (3) the payer’s perspective (ultimately, the taxpayer), and (4) the societal perspective. Depending on which perspective is taken, not all stakeholders’ costs, benefits, and outcomes are considered. The broadest perspective, and the gold standard when conducting economic evaluations, is the societal perspective, which encompasses all of the perspectives named above in addition to spillover impacts on children, the elderly, and other groups that may not be included in the first three categories, such as future generations. The specification of perspective also determines how these costs, consequences, and benefits should be measured, a point discussed in greater detail in Step 6 below.
Step 4: Identify the type of economic evaluation to conduct. Various methods have emerged in the literature for conducting an economic evaluation, but CBA and CEA are the two most frequently applied (Drummond et al., 1998, 2015; Yates and Marra, 2017). While similar in many ways, these two methods have important differences; the choice of which to use in a given situation ultimately depends on the question being asked and the information available to answer it. CEA evaluates the desirability of a specific intervention or program over a set of alternative options by assessing and comparing each option’s cost and effectiveness from the same stakeholder perspective, using a single outcome to measure effectiveness, such as reduction in fatal or nonfatal overdoses, years of life gained, or quality-adjusted life years. CEA therefore focuses on calculating a cost-effectiveness ratio, where different programs are compared in terms of their cost per unit of effect (e.g., lives saved, life years gained, and so on). CBA, on the other hand, allows the evaluator to aggregate across multiple outcomes considered important for effectiveness, even if each program does not have an impact on all of them. It does so by monetizing the value of each outcome (e.g., expressing nonfatal and fatal outcomes as the dollar value of lives saved or lives extended). The monetization occurs through a set of established techniques that include both revealed preference approaches (which derive value from observed market behavior) and stated preference approaches (which elicit values through survey responses to hypothetical situations). The focus in CBA is on calculating the net benefit of one program over another program or the status quo, where net benefit is calculated as the difference between the present discounted value of benefits and that of costs for each.
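The two summary measures described above can be illustrated with a short sketch. All program names, costs, effects, and the dollar value per life saved below are invented for illustration; they are not drawn from any actual evaluation.

```python
# Illustrative CEA and CBA summary measures for two hypothetical programs.
# All figures are assumptions for illustration only.

def ce_ratio(cost, effect):
    """Cost-effectiveness ratio: cost per unit of effect (e.g., per life saved)."""
    return cost / effect

def net_benefit(monetized_benefits, cost):
    """CBA net benefit: present value of monetized benefits minus costs."""
    return monetized_benefits - cost

# Hypothetical monetized value per life saved (e.g., from a revealed- or
# stated-preference study); purely an assumption for this sketch.
VALUE_PER_LIFE = 10_000_000

programs = {
    "Program A": {"cost": 4_000_000, "lives_saved": 1.0},
    "Program B": {"cost": 9_000_000, "lives_saved": 3.0},
}

for name, p in programs.items():
    ratio = ce_ratio(p["cost"], p["lives_saved"])
    nb = net_benefit(p["lives_saved"] * VALUE_PER_LIFE, p["cost"])
    print(f"{name}: cost per life saved = ${ratio:,.0f}; net benefit = ${nb:,.0f}")
```

Note how the two measures can rank programs differently: the CEA ratio favors the program with the lowest cost per unit of effect, while the CBA net benefit favors the program with the largest monetized surplus.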
Step 5: Determine the time horizon and discount rate. Some government programs stay in effect for decades, while others are short lived. Thus, a critical component of any CEA or CBA is the identification of the proper time horizon for comparing policy options in terms of their costs, benefits, and outcomes. This is important because the timing of when costs and benefits accrue can differ substantially across programs, making one program look more advantageous when evaluated over a shorter time horizon (those with immediate benefits and low initial costs) and another more advantageous when evaluated over a longer time horizon (those with higher startup costs but large later benefits). The choice of discount rate is also important. Discounting adjusts for the fact that costs and benefits can occur at different times; it reflects both the personal preference most people have for consuming goods and services today and the financial reality that goods and services received today are of greater value than those received in the future (Claxton et al., 2019; Drummond et al., 2015). There is much debate about the proper discount rate for societal goods, but the current recommendation of the U.S. Office of Management and Budget (OMB) is that U.S. agencies apply both a 3 percent and a 7 percent discount rate for a public good that benefits society, and then assess the sensitivity of findings to these rates (OMB, 2003).
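The effect of the discount rate can be sketched with a simple present-value calculation at OMB's two recommended rates. The benefit stream below (a flat $1 million per year for 20 years, with the first payment accruing immediately) is an invented example, not data from any actual program.

```python
# Present discounted value of a benefit stream at the 3 and 7 percent
# rates recommended in OMB Circular A-4 (OMB, 2003). The benefit
# stream itself is a hypothetical example.

def present_value(flows, rate):
    """Present discounted value of flows, where flows[t] accrues in year t
    (year 0 is undiscounted)."""
    return sum(f / (1 + rate) ** t for t, f in enumerate(flows))

# Hypothetical program: $1 million in benefits each year for 20 years.
benefits = [1_000_000] * 20

for rate in (0.03, 0.07):
    pv = present_value(benefits, rate)
    print(f"Discount rate {rate:.0%}: present value = ${pv:,.0f}")
```

The same nominal benefit stream is worth noticeably less at 7 percent than at 3 percent, which is why a program with delayed benefits can look attractive under one rate and unattractive under the other.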
Step 6: Identify and quantify outcomes, costs, and benefits. Identification of the relevant outcomes to consider in an evaluation is a function of the program being implemented, its targets, and the perspective being taken, as discussed in the steps above. Doing so is not always easy when looking at programs that affect broad populations, owing to the potential for intended and unintended spillover effects on individuals who are not necessarily the target of the program being implemented. Thus, identifying where to draw the line on which spillover effects to include or exclude can be tricky. The quantification of those outcomes, whether in natural units or monetary units, can be even harder. Tangible resources, such as personnel, supplies, technology, and services, are the easiest to identify but can still create challenges for analysts assigning unit costs when market prices do not reflect the true opportunity cost of those resources (e.g., donated volunteer time). Intangible resources are particularly difficult to measure; the cost of pain and suffering from struggling with addiction or losing a loved one to an opioid overdose, for example, is hard to quantify. The value of intangible resources (lost life, lost feelings of safety and security) usually represents the largest share of total costs or benefits in an economic evaluation, so their inclusion and method of calculation are important. A variety of methods have emerged to generate proxy prices for them, each with its own strengths and weaknesses (Boardman et al., 2018; Drummond et al., 2015).
Step 7: Assess the effectiveness of all program alternatives being considered. Effectiveness estimates are typically obtained through original research, which can be conducted on small pilots, through a randomized controlled trial (RCT), or through a quasi-experiment conducted in conjunction with the change in policy or the implementation of the programs. Several technical issues can make the valid assessment of effectiveness difficult, including (1) obtaining the baseline data (or counterfactual) needed to understand what outcomes would be without the program in the first place; (2) obtaining clearly defined and otherwise comparable treated and untreated participants, so that the effects of a program can be evaluated in a manner unaffected by other secular trends that might also have influenced the outcome; (3) determining the time period over which additional data need to be collected to identify the program’s effects, which could take months or years (analysts often collect data over the relatively short term, such as 6 or 12 months, and then must extrapolate to longer-term effects 5 or 10 years later, yet an analyst cannot simply assume that effects observed over short periods will persist); and (4) collecting high-quality data on all relevant outcomes, not just those easily available. Factors influencing data quality include the data-generating process, the suppression of certain cases, participants, or jurisdictions due to lack of reporting, and the extent to which the available data truly reflect the outcome of interest. Models are often used to assist analysts in projecting findings from measured outcomes to unmeasured, but relevant, ones.
Step 8: Conduct sensitivity analyses, examine primary drivers of uncertainty. Uncertainty is expected in any evaluation and can arise from many sources, including uncertainty regarding the degree of implementation, enforcement or compliance with an adopted program, uncertainty regarding estimates of effectiveness on outcomes, uncertainty with respect to market valuations of benefits and costs (both now and in the future), uncertainty associated with assumptions regarding the persistence or decay of program effects over time, and uncertainty associated with models (data and parameters) used to determine presumed effectiveness of a program on outcomes beyond those measured by the study. A careful CBA or CEA does not eliminate uncertainty, but succinctly and clearly articulates the extent to which this uncertainty matters for decision-making by end users of the information.
Analysts applying current best practices include sensitivity analyses, which involve recalculating the CBA or CEA while varying specific assumptions made in the “base case” (Briggs et al., 1994; Crowley et al., 2018; Drummond et al., 2015). Findings from repeated iterations of these sensitivity analyses can then be summarized using various techniques. In CBA, analysts typically summarize uncertainty as the proportion of Monte Carlo trials that yield a positive net benefit, because only programs with a positive net benefit would be recommended (NASEM, 2016; Vining and Weimer, 2010). In CEA, cost-effectiveness acceptability curves, which summarize in a single graph the uncertainty associated with a CEA calculation as well as the threshold value that any particular CEA may be trying to achieve, are commonly used (Boardman et al., 2018; Polsky et al., 1997).
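The Monte Carlo summary described above can be sketched in a few lines: the net-benefit calculation is repeated many times while the uncertain inputs are drawn from assumed distributions, and the share of trials with a positive net benefit is reported. The distributions, the monetized value per outcome, and every parameter below are invented assumptions for illustration only.

```python
# Minimal Monte Carlo sensitivity analysis for a CBA net-benefit
# calculation. All distributions and parameters are hypothetical.

import random

random.seed(0)  # fixed seed so the sketch is reproducible

VALUE_PER_OUTCOME = 100_000  # assumed monetized value per outcome averted
n_trials = 10_000
positive = 0

for _ in range(n_trials):
    # Uncertain effectiveness: number of outcomes averted.
    effect = random.gauss(mu=50, sigma=15)
    # Uncertain total program cost.
    cost = random.gauss(mu=4_000_000, sigma=500_000)
    # Count trials in which monetized benefits exceed costs.
    if effect * VALUE_PER_OUTCOME - cost > 0:
        positive += 1

print(f"Share of trials with positive net benefit: {positive / n_trials:.1%}")
```

A cost-effectiveness acceptability curve extends the same idea for CEA: the simulation is repeated across a range of threshold values, plotting the share of trials deemed cost-effective at each threshold.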
Step 9: Apply decision rule criteria under all plausible uncertainty from Step 8 and clearly report results. Depending on the question being asked and the type of economic evaluation undertaken, different summary measures can be generated, and whether a policy or program is a good investment will depend on the summary measure used. For CBA, when using net benefit as the primary decision rule, the reference decision point may be a net benefit > 0, if comparing a program to the status quo, or it may be a minimal positive value if comparing across programs. For CEA, summary measures include cost-effectiveness ratios identifying the cost per outcome gained, which may or may not fall beneath a specified threshold (willingness to pay to obtain that outcome). Including measures that summarize the overall uncertainty regarding the metrics used is also recommended.
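The decision rules in Step 9 reduce to simple comparisons, sketched below. The willingness-to-pay threshold and all dollar figures are hypothetical assumptions, not recommended values.

```python
# Illustrative decision-rule checks for CBA and CEA summary measures.
# The threshold and all figures are assumptions for illustration only.

def cba_recommends(net_benefit, comparator_net_benefit=0.0):
    """CBA rule: recommend if net benefit exceeds the comparator's
    (zero when the comparator is the status quo)."""
    return net_benefit > comparator_net_benefit

def cea_recommends(cost, effect, wtp_threshold):
    """CEA rule: recommend if the cost per outcome gained falls below
    the willingness-to-pay threshold for that outcome."""
    return cost / effect < wtp_threshold

# Hypothetical program: net benefit of $2.5M versus the status quo.
print(cba_recommends(2_500_000))  # True

# Hypothetical program: $4M cost, 50 outcomes gained, $100k/outcome threshold.
print(cea_recommends(cost=4_000_000, effect=50, wtp_threshold=100_000))  # True
```

In practice these comparisons would be applied across the full range of uncertainty from Step 8, not to single point estimates.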
The purpose of CBA and CEA is to assess efficiency: identifying which programs or alternatives generate the highest societal value given the resources involved in implementing them. The ultimate metric of a CEA or CBA is based on the net overall outcome, not an even or fair distribution of benefits and costs. Equity is not a consideration in these analyses, yet it may be an important consideration for policy makers. Thus, some recent guidelines on how to conduct CEA and CBA recommend subpopulation analyses that focus on particularly vulnerable populations so that policy makers can see what disproportionate impacts these groups might experience (WHO, 2006; Wilkinson et al., 2016). Such considerations can be made explicit in requests for such studies going forward.
Boardman, A. E., D. H. Greenberg, A. R. Vining, and D. L. Weimer. 2018. Cost-Benefit Analysis: Concepts and Practice (5th ed.). Cambridge, UK: Cambridge University Press.
Briggs, A., M. Sculpher, and M. Buxton. 1994. Uncertainty in the economic evaluation of health care technologies: The role of sensitivity analysis. Health Economics 3:95–104. https://doi.org/10.1002/hec.4730030206.
Claxton, K., M. Asaria, C. Chansa, J. Jamison, J. Lomas, J. Ochalek, and M. Paulden. 2019. Accounting for timing when assessing health-related policies. Journal of Benefit-Cost Analysis 10(S1):73–105. https://doi.org/10.1017/bca.2018.29.
Crowley, D. M., K. A. Dodge, W. S. Barnett, P. Corso, S. Duffy, P. Graham, M. Greenberg, R. Haskins, L. Hill, D. E. Jones, L. A. Karoly, M. R. Kuklinski, and R. Plotnick. 2018. Standards of evidence for conducting and reporting economic evaluations in prevention science. Prevention Science 19(3):366–390.
Drummond, M. F., B. O’Brien, G. L. Stoddart, and G. W. Torrance. 1998. Methods for the economic evaluation of health care programmes. American Journal of Preventive Medicine 14(3):243.
Drummond, M. F., M. J. Sculpher, K. Claxton, G. L. Stoddart, and G. W. Torrance. 2015. Methods for the Economic Evaluation of Health Care Programmes (4th ed.). Oxford, UK: Oxford University Press.
GAO (U.S. Government Accountability Office). 2012. Designing evaluations: 2012 revision (Supersedes PEMD-10.1.4). https://www.gao.gov/products/gao-12-208g (accessed December 8, 2022).
Levin, H. M., P. J. McEwan, C. R. Belfield, A. B. Bowden, and R. D. Shand. 2017. Economic Evaluation in Education: Cost-Effectiveness and Benefit-Cost Analysis (3rd ed.). Los Angeles: Sage.
NASEM (National Academies of Sciences, Engineering, and Medicine). 2016. Advancing the power of economic evidence to inform investments in children, youth, and families. Washington, DC: The National Academies Press. https://doi.org/10.17226/23481.
NICE (National Institute for Health and Care Excellence). 2014. Developing NICE guidelines: The manual. Process and methods (PMG20). London: National Institute for Health and Care Excellence. Last updated: 18 January 2022. https://www.nice.org.uk/process/pmg20/chapter/incorporating-economic-evaluation (accessed June 20, 2022).
OMB (Office of Management and Budget). 2003. Circular A-4: Regulatory Analysis. https://obamawhitehouse.archives.gov/sites/default/files/omb/assets/omb/circulars/a004/a-4.pdf (accessed January 26, 2020).
Polsky, D., H. A. Glick, R. Willke, and K. Schulman. 1997. Confidence intervals for cost–effectiveness ratios: A comparison of four methods. Health Economics 6(3):243–252.
Ryan, G. W., C. M. Farmer, D. M. Adamson, and R. M. Weinick. 2014. Assessing whether a program is working well. In A Program Manager’s Guide for Program Improvement in Ongoing Psychological Health and Traumatic Brain Injury Programs: The RAND Toolkit, Volume 4. RAND Corporation. https://www.jstor.org/stable/10.7249/j.ctt5vjw69.9 (accessed May 18, 2023).
Vining, A., and D. L. Weimer. 2010. An assessment of important issues concerning the application of benefit-cost analysis to social policy. Journal of Benefit-Cost Analysis 1(1): Article 6. https://doi.org/10.2202/2152-2812.101
WHO (World Health Organization). 2006. Guidelines for Conducting Cost-Benefit Analysis of Household Energy and Health Interventions. G. Hutton and E. Rehfuess (Eds.). Geneva, Switzerland: World Health Organization. https://apps.who.int/iris/handle/10665/43570 (accessed December 18, 2021).
Wilkinson, T., M. J. Sculpher, K. Claxton, P. Revill, A. Briggs, J. A. Cairns, Y. Teerawattananon, E. Asfaw, R. Lopert, A. J. Culyer, and D. G. Walker. 2016. The international decision support initiative reference case for economic evaluation: An aid to thought. Value in Health 19(8):921–928.
Yates, B. T., and M. Marra. 2017. Introduction: Social return on investment (SROI). Evaluation and Program Planning 64:95–97.