
Currently Skimming:

5 Methodologies of Impact Evaluation
Pages 119-150

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text on each page of the chapter.


From page 119...
... Thus this chapter also examines credible inference designs for cases where randomization is not possible and for projects that involve only a small number of units -- or even a single case. Some of the material is somewhat technical, but this is necessary if the chapter is to serve, as the committee hopes it will, as a guide to the design of useful and credible impact evaluations for DG missions and implementers.
From page 120...
... There are three fundamental elements of sound and credible impact evaluations. First, such evaluations require measures relevant to desired project outcomes, not merely of project activity or outputs.
From page 121...
... With rare exceptions, USAID evaluations and missions generally do not allocate resources to baseline and follow-up measurements on nonintervention groups. Virtually all of the USAID evaluations of which the committee is aware focus on studies of groups that received USAID DG assistance, and estimates of what would have happened in the absence of such interventions are based on assumptions and subjective judgments, rather than explicit comparisons with groups that did not receive DG assistance.
From page 122...
... The only way to resolve these various possibilities would be to have taken measures of legislative activity before and after the training program for both the legislators in the program and those not in it. While it would be most desirable to have randomly assigned legislators to take the training or not, randomization is not necessary for the before-and-after comparison to yield valuable and credible information.
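
The logic of such a before-and-after comparison with an untrained group can be sketched in a few lines of code. All of the numbers and variable names below are invented for illustration; the calculation simply shows how the comparison group's change is subtracted out (a difference-in-differences).

```python
# Hypothetical illustration of a before/after comparison with an
# untrained comparison group (difference-in-differences logic).
# All numbers are invented for the example.

# Mean number of bills introduced per legislator, before and after training.
trained_before, trained_after = 2.1, 3.4        # legislators who took the training
untrained_before, untrained_after = 2.0, 2.6    # legislators who did not

# Change within each group.
change_trained = trained_after - trained_before          # 1.3
change_untrained = untrained_after - untrained_before    # 0.6

# Attribute to the training only the portion of the trained group's
# improvement that exceeds the comparison group's improvement.
estimated_effect = change_trained - change_untrained
print(f"Estimated training effect: {estimated_effect:.2f} bills per legislator")
```
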
From page 123...
... Or it might have led to cuts in funding for anticorruption programs that were, in fact, highly valuable in preventing substantial increases in corruption. This chapter discusses how best to obtain comparisons for evaluating USAID democracy assistance projects.
From page 124...
... Evaluation strategies are compared and contrasted based on their methodological strengths and weaknesses, not their feasibility in the field. The end of the chapter takes a first step toward exploring whether the most rigorous evaluation design -- large N randomized evaluation -- is feasible for many DG projects; a more extensive treatment of this key question is reserved for the chapters that follow, where the committee presents the findings of its field studies, in which the feasibility of various impact evaluation designs for current USAID DG programs was explored with mission directors and DG staff.
From page 125...
... This report focuses on how to develop impact evaluations because the committee believes that at present this is the most underutilized approach in DG program evaluations and that therefore USAID has the most to gain if it is feasible to add sound and credible impact evaluations to its portfolio of M&E activities. Second, the committee recognizes that not all projects need be, or should be, chosen for the most rigorous forms of impact evaluation.
From page 126...
... Sometimes it is possible to test both aggregated and disaggregated components of a project in a single research design. This requires a sufficient number of cases to allow for multiple treatment groups.
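
For illustration only, the sketch below shows how outcomes from a multi-arm design of this kind might be compared against a single control group; the arm labels and scores are hypothetical.

```python
import statistics

# Hypothetical outcome scores for a design with two treatment arms
# (the full program and a single component) plus a control group.
outcomes = {
    "control":        [50, 52, 49, 51, 48, 53],
    "component_only": [54, 55, 52, 56, 53, 57],   # disaggregated component
    "full_program":   [58, 60, 57, 61, 59, 62],   # aggregated package
}

control_mean = statistics.mean(outcomes["control"])
for arm in ("component_only", "full_program"):
    effect = statistics.mean(outcomes[arm]) - control_mean
    print(f"{arm}: estimated effect = {effect:.1f} points vs. control")
```
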
From page 127...
... This is usually understood as a question of internal validity. In a given instance, what causal effect did a specific policy intervention, X, have on a specific outcome, Y?
From page 128...
... : measurable, precise, determinate, and multiple. The best research designs feature outcomes that are easily observed and readily measured, and for which the predictions of the hypotheses guiding the intervention are precise and determinate (rather than ambiguous)
From page 129...
... with other changes that might obscure causal attribution? External Validity External validity is the generalizability of the project beyond a single case.
From page 130...
... With a number of sound impact evaluations of a specific type of project in several different settings, USAID would be able to learn more from its interventions, rather than rely solely on the experiences of individuals. To maximize the utility of such impact evaluations, each aspect of the research design must be carefully considered.
From page 131...
... The first three issues may be understood as various approaches to "replication." If USAID is concerned about the internal validity of an impact evaluation, subsequent evaluations should replicate the original research design as closely as possible. If USAID is concerned about the external validity of an evaluation, then replications should take place in different sites.
From page 132...
... However, an appropriate mix of evaluations offers better information about projects on which DG staff can create new, more effective policy. A Typology of Impact Evaluation Designs A major goal of this chapter is to identify a reasonably comprehensive, yet also concise, typology of research designs that might be used to test the causal impact of projects supported by USAID's DG office.
From page 133...
... Together, these provide pre- and posttests of the policy intervention. In the large N randomized assignment design -- but only in that case -- it is possible to evaluate project outcomes even in the absence of baseline data, as shown, for example, in Hyde (2006)
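
The reason baseline data can be dispensed with in that one case is that randomization balances the groups in expectation, so a post-only difference in means is an unbiased estimate of impact. The simulation below is a generic illustration of that logic with invented parameters; it is not a re-analysis of Hyde (2006).

```python
import random
import statistics

random.seed(0)

# Simulate a large N randomized design with no baseline measurement.
# Each unit has an unobserved baseline level; the treatment adds a true
# effect of 2.0 to treated units' outcomes.
TRUE_EFFECT = 2.0
units = [{"baseline": random.gauss(10, 3)} for _ in range(1000)]

for u in units:
    u["treated"] = random.random() < 0.5    # randomized assignment
    noise = random.gauss(0, 1)
    u["outcome"] = u["baseline"] + noise + (TRUE_EFFECT if u["treated"] else 0)

treated = [u["outcome"] for u in units if u["treated"]]
control = [u["outcome"] for u in units if not u["treated"]]

# With random assignment, the post-only difference in means recovers the
# treatment effect even though baselines were never measured.
estimate = statistics.mean(treated) - statistics.mean(control)
print(f"Estimated effect: {estimate:.2f} (true effect = {TRUE_EFFECT})")
```
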
From page 134...
... Citations in the text to existing work on these subjects should provide further guidance for project officers and implementers, although the literature on large N randomized treatment research designs is much more developed than the literature on other subjects.
From page 135...
... As discussed above, in the context of many projects in a country, it would still be valuable to gather baseline data in order to evaluate the intervention in different ways and to measure other efforts, including activities and outputs. Another advantage of randomized assignment in large N studies is that it is often perceived as the fairest method of distributing assistance in cases where the ability of USAID to provide DG assistance is limited and cannot cover all available units.
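
A transparent lottery of this kind is straightforward to implement. The sketch below, with hypothetical municipality names and an arbitrary number of available grants, randomly selects which eligible units receive assistance; the units left out form a natural comparison group.

```python
import random

random.seed(42)

# Hypothetical list of eligible municipalities; only 4 grants are available.
eligible = ["Alfa", "Bravo", "Cedro", "Delta", "Este", "Faro", "Gala", "Hoya"]
n_slots = 4

# A public lottery: shuffle the eligible units and fund the first n_slots.
# The unfunded units become the comparison group for later evaluation.
shuffled = random.sample(eligible, k=len(eligible))
treatment_group = shuffled[:n_slots]
control_group = shuffled[n_slots:]

print("Receive assistance:", treatment_group)
print("Comparison group:  ", control_group)
```
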
From page 136...
... The present chapter focuses mainly on the methodological reasons why the efforts needed to carry out randomized assignments for project evaluations can be worthwhile in terms of the increased confidence they provide that genuine causal relationships are being discovered and hence real project impact. Unfortunately, from the standpoint of making the most credible impact evaluations, the units chosen to receive interventions from USAID are seldom selected at random.
From page 137...
... Randomized evaluations are useful for determining not only whether or not a given project/activity has had an effect but also where it appears to be most effective. To see this, consider Figure 5-1, which displays hypothetical data collected on outcomes among treatment and control groups for a particular USAID project.
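
Estimating the effect separately within subgroups is one way such a figure is typically read. The sketch below mimics that logic with invented data and hypothetical subgroup labels; it is not the data behind Figure 5-1.

```python
import statistics

# Hypothetical outcomes from a randomized evaluation, recorded separately
# for units in urban and rural areas (all numbers invented).
data = {
    "urban": {"treatment": [72, 75, 70, 74, 73], "control": [65, 63, 66, 64, 62]},
    "rural": {"treatment": [58, 60, 57, 59, 61], "control": [56, 58, 55, 57, 59]},
}

for subgroup, groups in data.items():
    effect = statistics.mean(groups["treatment"]) - statistics.mean(groups["control"])
    print(f"{subgroup}: estimated effect = {effect:.1f} points")
# A larger urban effect would suggest the project works best in urban areas,
# information a single pooled comparison would obscure.
```
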
From page 138...
... for correcting the potential selection bias that complicates the analysis of causal effects. The "matching" research design seeks to identify untreated units that are similar to those receiving treatment and then to compare outcomes. For example, Heckman et al. (1997)
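
In its simplest form, matching pairs each treated unit with the most similar untreated unit on observed characteristics and compares their outcomes. The toy example below matches on a single covariate with invented data; it illustrates the basic logic rather than the estimators used by Heckman et al. (1997).

```python
# Toy nearest-neighbor matching on one covariate (e.g., a pre-program
# poverty score); all data are invented for illustration.
treated = [
    {"score": 0.80, "outcome": 12.0},
    {"score": 0.55, "outcome": 10.5},
]
untreated = [
    {"score": 0.78, "outcome": 10.0},
    {"score": 0.60, "outcome": 9.5},
    {"score": 0.20, "outcome": 6.0},
]

effects = []
for t in treated:
    # Match each treated unit to the untreated unit with the closest score.
    match = min(untreated, key=lambda u: abs(u["score"] - t["score"]))
    effects.append(t["outcome"] - match["outcome"])

average_effect = sum(effects) / len(effects)
print(f"Matched estimate of the effect on the treated: {average_effect:.2f}")
```
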
From page 139...
... As always, baseline data were critical to discovering any effect from the program. This design is useful when there is only one or a few treated units and is better than just a before-and-after analysis of a single unit since it offers a controlled comparison.
From page 140...
... As long as contamination effects are not severe, the results from this sort of design may be more easily interpreted than the results from a simple split-sample research design (i.e., treating three regions and retaining the others as a control group)
From page 141...
... In an example of a small N randomized evaluation, Glewwe et al. (2007) used a very modest sample of 25 randomly chosen schools to evaluate the effect of the provision of textbooks on student test scores.
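
With so few randomized units, inference often relies on the randomization itself (a permutation or randomization test) rather than large-sample approximations. The sketch below applies that idea to invented test-score data for ten schools; it illustrates the general approach, not the analysis in Glewwe et al. (2007).

```python
import statistics
from itertools import combinations

# Invented average test scores for 10 schools, 5 randomly given textbooks.
textbook_schools = [61.0, 64.5, 59.0, 66.0, 62.5]
comparison_schools = [58.0, 60.5, 57.5, 61.0, 59.5]

observed = statistics.mean(textbook_schools) - statistics.mean(comparison_schools)

# Randomization inference: re-assign "treatment" labels in every possible way
# and ask how often a difference at least as large arises by chance alone.
all_scores = textbook_schools + comparison_schools
n_treated = len(textbook_schools)
count_extreme, count_total = 0, 0
for treated_idx in combinations(range(len(all_scores)), n_treated):
    t = [all_scores[i] for i in treated_idx]
    c = [all_scores[i] for i in range(len(all_scores)) if i not in treated_idx]
    diff = statistics.mean(t) - statistics.mean(c)
    count_total += 1
    if diff >= observed:
        count_extreme += 1

print(f"Observed difference: {observed:.2f}")
print(f"Randomization p-value: {count_extreme / count_total:.3f}")
```
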
From page 142...
... Nonetheless, the delicacy of this research design -- its extreme sensitivity to any violation of ceteris paribus assumptions -- requires the researcher to anticipate what may occur, at least through the duration of the experiment. Second, with respect to detrending the data, it is helpful if the researcher can gather information on the outcome(s)
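
Detrending can be as simple as fitting a trend line to the pre-intervention observations and asking whether the post-intervention observations depart from where that trend would have gone. A minimal sketch with invented monthly data:

```python
# Invented monthly outcome measures: 6 pre-intervention and 3 post-intervention.
pre = [10.0, 10.4, 10.9, 11.3, 11.8, 12.2]     # months 0-5, before the project
post = [14.0, 14.6, 15.1]                      # months 6-8, after the project

# Fit a straight-line trend to the pre-intervention data (least squares).
xs = list(range(len(pre)))
x_mean = sum(xs) / len(xs)
y_mean = sum(pre) / len(pre)
slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, pre)) / \
        sum((x - x_mean) ** 2 for x in xs)
intercept = y_mean - slope * x_mean

# Compare each post-intervention observation with the projected pre-trend.
for offset, actual in enumerate(post):
    month = len(pre) + offset
    projected = intercept + slope * month
    print(f"Month {month}: actual {actual:.1f}, trend {projected:.1f}, "
          f"departure {actual - projected:+.1f}")
```
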
From page 143...
... Observed differences in outcome measures pre- and posttreatment can be interpreted as causal effects only if the evaluator can make the case that other factors were not important. Some of the strategies described above are applicable in an N = 1 comparison if the treatment can be interpreted "as if" it were manipulated (e.g., Miron 1994)
From page 144...
... In most settings, worthwhile insights into project impacts can be derived from designs that include small N comparisons, as long as good baseline, outcome, and comparison group data are collected. Examples of the Use of Randomized Evaluations in Impact Evaluations of Development Assistance (Including DG Projects)
From page 145...
... . In education, randomized evaluations have been used to explore the efficacy of conditional cash transfers (Schultz 2004)
From page 146...
... This chapter closes with two examples of impact evaluations using randomized designs applied to DG subjects that tested commonly held programming assumptions. The first addresses the issue of corruption.
From page 147...
... . Knowing whether outside assistance helps or harms CSOs is a question of vital importance, and randomized evaluations have begun to offer some preliminary evidence.
From page 148...
... . These two examples serve to support a broader point: It is both possible and important to conduct randomized impact evaluations of projects designed to support DG.
From page 149...
... 1996. The Economics of Eligibility Rules for a Social Program: A Study of the Job Training Partnership Act (JTPA)
From page 150...
... 2003. Randomized Evaluations of Educational Programs in Developing Countries: Some Lessons.

