4 Evidence on the Use of Test-Based Incentives
Pages 53-90

From page 53...
... In our descriptions of the structure of the test-based incentive programs, we provide information about the key elements that should be considered in designing incentive systems (see Chapter 2): who receives incentives (the targets of the incentive)
From page 54...
... ;
• cross-sectional studies that compare results with and without incentive programs but with no controls for selection into the
1 For literature reviews that cover a broader range of related studies, see Figlio and Loeb (2010) on school accountability, Podgursky and Springer (2006)
From page 55...
... ; and
• studies that focus on contrasting results for students, teachers, or schools that are immediately above or below the threshold for receiving the consequences of the incentive programs,2 including well-known studies of exit exams (e.g., Martorell, 2004; Papay et al., 2010; Reardon et al., 2010) and school incentives (e.g., Ladd and Lauen, 2009; Reback, 2008; Rouse et al., 2007)
From page 56...
... Performance Measures
We used the limited information about the performance measures to code two different features related to the coverage of the measures across subjects and within subjects. For most of the incentive programs we reviewed, the performance measures included only tests, but we noted other measures if they were used.
From page 57...
... In the experimental work discussed in Chapter 2, the contrast between different conditions sometimes involved subtle differences in wording. It is plausible that most of the incentive programs we discuss could have been presented in ways that were either more positive or more negative, depending on whether those in leadership positions characterized them as supporting a shared commitment to learning or as posing an additional burden in already difficult circumstances.
From page 58...
... Example 3 is Chicago, for both the initial district-level incentives in the 1990s and the implementation of the succeeding NCLB incentives.
Examples 1 and 2: Nationwide School Incentives
A number of states instituted test-based incentives during the 1990s, with consequences for schools that anticipated the consequences that were implemented for all states in 2001 under NCLB (Dee and Jacob, 2007; Hanushek and Raymond, 2005)
From page 59...
... For example, North Carolina's school incentives, which were implemented in 1996 and continued alongside NCLB after 2001, are based on test score gains rather than proficiency levels and so are targeted to a broad range of performance rather than a narrow range near the proficiency cut point. Under the two different performance criteria, there were different outcomes: schools facing sanctions under NCLB improved the test scores of lower performing students, while schools facing sanctions under the state program improved the test scores of both lower and higher performing students (Ladd and Lauen, 2009)
From page 60...
... The effects on eighth grade mathematics and fourth grade reading were positive, and the effect on eighth grade reading was negative; none of these other effects was statistically significant.9 The paper did not provide a formal test of the statistical significance of the subject or grade differences in the effect sizes. Over four combinations of
7 Given this generalization, the multiple studies in Lee (2008)
From page 61...
... Across three combinations of subject and grade, the average effect size associated with incentives was 0.12 (Wong, Cook, and Steiner, 2009, Table 14).12 The effect size was statistically significant only for fourth grade mathematics (Table 13)
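The results throughout this section are reported as effect sizes in standard deviation units, i.e., standardized mean differences between a treatment and a comparison group. As a rough illustration of the metric only (the function name and the data below are invented, not taken from any of the studies), one common pooled-standard-deviation formulation is:

```python
import statistics

def effect_size(treatment, control):
    """Standardized mean difference using a pooled standard deviation
    (Cohen's d). Positive values favor the treatment group."""
    n_t, n_c = len(treatment), len(control)
    var_t = statistics.variance(treatment)  # sample variance (n - 1 denominator)
    var_c = statistics.variance(control)
    pooled_sd = (((n_t - 1) * var_t + (n_c - 1) * var_c) / (n_t + n_c - 2)) ** 0.5
    # Difference in means, expressed in pooled-SD units
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Illustrative scores: the treatment mean is one pooled SD above the control mean
d = effect_size([3.0, 4.0, 5.0], [2.0, 3.0, 4.0])  # → 1.0
```

On this scale, the 0.12 average reported above means the treated group scored about an eighth of a standard deviation higher than the comparison group.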
From page 62...
... Over six combinations of subject, grade, and private school type, there was an average effect size of 0.22 standard deviations associated with the change in public school NAEP scores by 2007 or 2009.13 Although all of the effect sizes were positive, the only one that was marginally significant was for fourth grade mathematics for Catholic private schools (Wong, Cook, and Steiner, 2009, Table 6)
From page 63...
... . Looking across students, there were generally positive effects for both lower and higher performing students in mathematics; for reading, the effects occurred primarily for lower performing students (Table 3)
From page 64...
... also looked at changes in low-stakes tests in science and social studies for students in the fourth and eighth grades, finding that scores in these subjects increased after incentives were introduced. Although the increase in test scores for science and social studies was smaller than for reading and mathematics and occurred primarily with higher performing students, it was positive and so does not suggest a tradeoff between the high-stakes and low-stakes subjects.
From page 65...
... . Over four combinations of subject and grade, the average effect size was 0.00 standard deviations, evenly divided between small positive and negative effects, and none was statistically significant.16
Examples 4B and 4C: Effects on Graduation Rates
Two studies looked at effects on graduation rates.
From page 66...
... The performance pay in the two incentive conditions was based on average gains in student test scores in mathematics and language, measured either for the school as a whole in the schoolwide incentives condition or for the teacher's own students in the individual teacher incentives condition. The experiment used specially designed tests that explicitly included both basic and higher order skills,20 and also included tests on science and social studies that did not receive incentives.
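The schoolwide and individual teacher conditions differ only in the unit over which student gains are averaged before pay is assigned. A minimal sketch of that distinction, using invented teacher names and gain scores (not data from the experiment):

```python
# Hypothetical student gain scores keyed by teacher (illustrative only)
gains = {
    "teacher_a": [4.0, 6.0, 5.0],
    "teacher_b": [1.0, 2.0, 3.0],
}

# Individual teacher condition: each teacher's pay tracks the mean gain
# of that teacher's own students
individual = {t: sum(g) / len(g) for t, g in gains.items()}

# Schoolwide condition: every teacher's pay tracks the same school-level
# mean gain, pooled across all students in the school
all_gains = [g for student_gains in gains.values() for g in student_gains]
schoolwide = sum(all_gains) / len(all_gains)
```

Under the individual condition the two hypothetical teachers face different targets (5.0 vs. 2.0), while under the schoolwide condition both face the pooled mean (3.5), which dilutes each teacher's personal stake.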
From page 67...
... . These reported differences were large and statistically significant in all cases, and in three cases were significantly correlated with student test scores.
From page 68...
... The spending in the resource conditions was chosen to roughly equal the spending in the incentive conditions, so the higher increases in the incentive conditions suggest that they might have been more cost-effective. However, it is likely that the test scores in the incentive conditions were inflated by the attachment of the incentives while the test scores in the resource conditions were not; as a result, a valid comparison of the incentive and resource conditions cannot be made.
From page 69...
... , the proportion of stu-
24 Lavy (2002) contrasted the effect of the school incentives program with the results of a program implemented in 22 high schools in which extra teachers were used to help improve performance on the bagrut tests.
From page 70...
... There were indications of increases in the proportion of students taking exams, the proportion achieving passing scores, and the number of credits earned, though in the first year these increases appeared only for religious schools. Over 8 combinations of year, school type, and comparison group, the average effect of the incentives program on test scores was 0.11 standard deviations.26 Six of the effect sizes were positive and statistically significant; two were not statistically significant, of which one was positive and one was negative.
From page 71...
... , there was a weaker response, probably because the program included a wide range of schools: some were far below the 40-50 percent level, where few students have a realistic chance of earning a certificate; and others were far above the 40-50 percent level, where most students would be expected to earn a certificate.29 There was evidence that the incentive programs produced changes in the behavior of teachers and students, with more
28 Angrist and Lavy (2009) referenced a number of studies on financial incentives in education that show stronger responses of females than males.
From page 72...
... . The first incentives program -- Example 9 -- provided schoolwide incentives to teachers on the basis of the students' average performance on district tests in grades 4-8 in seven subjects (Glewwe et al., 2010)
From page 73...
... Over 6 combinations of district, baseline control, and sample, the average effect size on the high-stakes tests was 0.20 standard deviations, with 4 of the effects statistically significant.31 The test score effects occurred for both lower and higher performing girls within the
30 As with the schools in the incentive programs in India, the teachers in the programs in Kenya had a high rate of absenteeism, averaging roughly 20 percent (Glewwe et al., 2010, p.
From page 74...
... Teachers in the treatment group were eligible to receive annual bonuses of $5,000-$15,000 on the basis of a value-added measure of change in the test scores of their students on the Tennessee state mathematics test. Although the performance indicator used changes in test scores rather than a single proficiency target, we coded the performance measure within subjects as "narrow" because the Tennessee state tests used only multiple-choice questions for mathematics.32 The performance levels for receiving a bonus were set between the 85th and 95th percentiles of the districtwide distribution for the value-added measure.
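Setting bonus thresholds at the 85th and 95th percentiles of a districtwide value-added distribution can be illustrated with a small sketch. The data and tier names below are invented, not the POINT experiment's actual measures:

```python
import random
import statistics

random.seed(0)
# Hypothetical districtwide value-added scores for 500 teachers (invented data)
scores = [random.gauss(0.0, 1.0) for _ in range(500)]

cuts = statistics.quantiles(scores, n=100)  # 99 percentile cut points
bonus_floor = cuts[84]  # 85th percentile: minimum score for any bonus
top_tier = cuts[94]     # 95th percentile: threshold for the largest bonus

eligible = sum(1 for s in scores if s >= bonus_floor)
# By construction, roughly 15 percent of teachers clear the 85th percentile
```

Because the thresholds are percentiles of the district distribution rather than fixed scores, the share of bonus-eligible teachers is capped by design at about 15 percent regardless of how much overall performance improves.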
From page 75...
... Over 3 years and four grades, the average effect of the incentive program was 0.04 standard deviations on the high-stakes test, which was not statistically significant (Springer et al., 2010, p.
From page 76...
... Considering the effects separately by grade, the average effect size was 0.03 for fourth grade and 0.00 for seventh grade, with each grade having two positive and two negative effects. A separate assessment of student interest and enjoyment in schoolwork did not find a statistically significant change in motivation from the program, but the measured change was negative (Table 7)
From page 77...
... There were eight TAP elementary (K-8) schools in each cohort.39 The studies analyzed changes in the test scores of the tests attached to the
38 In the Chicago implementation of TAP, performance pay was phased in so that it was smaller during the first year of the program than it was in the second year.
From page 78...
... . The second-year study found effect sizes of 0.00 for reading and 0.02 for mathematics, neither of which was statistically significant.
From page 79...
... compared changes in outcomes in schools that adopted the AP incentive program to the changes in outcomes in schools that had chosen to adopt the program but had not yet done so because no donor had been found. The analysis measured student achievement with SAT and ACT test results, using a criterion of 1,100 on the SAT and 24 on the ACT.
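A composite criterion like "1,100 on the SAT or 24 on the ACT" can be checked per student and aggregated into an achievement rate. A minimal sketch under invented student records (the helper name and data are hypothetical, not from the study):

```python
# Hypothetical student records as (sat, act); None means that test was not taken
students = [(1150, None), (980, 25), (1050, 22), (None, 24), (900, None)]

def meets_criterion(sat, act, sat_cut=1100, act_cut=24):
    """True if either available score clears its cutoff."""
    return (sat is not None and sat >= sat_cut) or (act is not None and act >= act_cut)

# Fraction of students clearing either cutoff
share_meeting = sum(meets_criterion(s, a) for s, a in students) / len(students)
```

Treating the two cutoffs as interchangeable assumes they mark comparable achievement levels on the two tests, which is roughly how concordance tables between the SAT and ACT are used.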
From page 80...
... Types of Incentive Programs Investigated in the Literature
As summarized in Tables 4-1A and 4-1B, researchers and policy makers have explored incentive programs with a relatively wide range of variation in key structural features. Across the 15 examples we analyzed, there are substantial differences in who receives incentives, the breadth of the performance measures across and within subjects that are attached to the incentives, the nature of the consequences that the program attaches to the performance level, and whether extra support is provided by the program.
From page 81...
... Effects on Student Achievement and High School Graduation and Certification
We summarize the effects of the incentive programs on student achievement and high school graduation and certification in Tables 4-2 and 4-3. We discuss these effects in terms of four groupings of programs: NCLB and its predecessors, high school exit exams, programs using rewards in other countries, and programs using rewards in the United States.
From page 82...
... First, the statistically significant effects were concentrated in fourth grade math; in contrast, the results for eighth grade math and for reading for both grades were often not statistically significant and sometimes negative. Second, the highest two estimates -- 0.22 and 0.12 standard deviations -- were problematic.
From page 83...
... As with the studies on NCLB and its predecessors, the studies on foreign reward programs suggest substantial benefits of incentive programs that must be considered in light of important caveats. First, the programs in India and Israel measured achievement using the high-stakes tests attached to the incentives.
From page 84...
... incentive programs that use rewards showed average effects on achievement that ranged from −0.02 to 0.06 standard deviations (see Table 4-2)
From page 85...
... Some proposals for new models of incentive programs involve combinations of features that have not yet been tried to a significant degree, such as school-based incentives using broader performance measures and teacher incentives using sanctions related to tenure. Other proposals involve more sophisticated versions of the basic features we have described, such as the "trigger" systems discussed in Chapter 3 that use the narrower information from tests to start an intensive school evaluation that considers a much broader range of information and then provides more focused supports to aid in school improvement.
From page 86...
... We have not considered those trade-offs in our examination of test-based incentives, but those trade-offs are the most important costs that need to be considered by the policy makers who will decide which new incentive programs to support.
From page 87...
... a The features related to the structure of incentive programs that should be considered when designing the programs are (1) the target for the incentives (schools, teachers, or students in these examples)
From page 88...
... NCLB 0/+
3. Chicago pre-NCLB + 0/+/− + + +/0
Studies of High School Exit Exams
4.
From page 89...
... a Effect size is presented in standard deviation units.
b Omits eighth grade reading.
From page 90...
... U.S. HS Exit −0.6% 0% 0% 33% 67%
Studies of Incentive Experiments Using Rewards
6.

