3 Tests as Performance Measures
Pages 37-52


From page 37...
... We begin by looking at an essential characteristic of tests themselves and then turn to review the ways that test results can be turned into performance measures that can be used with incentives. Finally, we look at the use of multiple measures in incentive systems in which there is an attempt to overcome the limitations of any single measure by using a set of complementary measures.
From page 38...
... Test scores will typically differ from one occasion to another even when there has been no change in a test taker's proficiency because of chance differences in the interaction of the test questions, the test taker, and the testing context. Researchers think of these fluctuations as measurement error and so treat test results as estimates of test takers' "true scores" and not as "the truth" in an absolute sense.
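The idea of an observed score as an estimate of a "true score" can be illustrated with a small simulation. This is only a sketch under stated assumptions: it treats each observed score as the true score plus an independent, normally distributed error, and the score scale (250) and error spread (10) are hypothetical values chosen for illustration, not figures from the report.

```python
import random
import statistics


def simulate_observed_scores(true_score, error_sd, n_occasions, seed=0):
    """Return observed scores for one test taker whose proficiency never changes.

    Assumption: observed score = true score + random error, with the error
    drawn independently from a normal distribution on each occasion.
    """
    rng = random.Random(seed)
    return [true_score + rng.gauss(0.0, error_sd) for _ in range(n_occasions)]


# Hypothetical values: a true score of 250 and an error spread of 10 points.
scores = simulate_observed_scores(true_score=250.0, error_sd=10.0, n_occasions=1000)

# Individual scores fluctuate from occasion to occasion, but their average
# settles near the unchanging true score.
mean_observed = statistics.mean(scores)
spread = statistics.stdev(scores)
```

Under these assumptions any single score may land well above or below 250, which is why a single result is better read as an estimate than as "the truth."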
From page 39...
... Score Inflation
Although the example of a test covering only three-quarters of a domain is hypothetical, it provides a useful way to think about what can happen if instruction shifts to focus on test preparation in response to test-based incentives. If teachers move from covering the full range of material in eighth grade mathematics to focusing specifically on the portion of the content standards included on the test, it is possible for test scores to increase while learning in the untested portions of the subject stays the same or even declines.
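The arithmetic behind this hypothetical can be worked through in a few lines. The sketch below assumes that the test score equals mastery of the tested three-quarters of the domain and that overall mastery is the share-weighted average of the tested and untested portions; the mastery levels (0.60, 0.72, 0.48) are invented for illustration.

```python
TESTED_SHARE = 0.75  # the test covers three-quarters of the domain


def observed_test_score(tested_mastery):
    """Assumption: the test reflects only mastery of the tested portion."""
    return tested_mastery


def domain_mastery(tested_mastery, untested_mastery, tested_share=TESTED_SHARE):
    """Assumption: overall mastery is the share-weighted average of both portions."""
    return tested_share * tested_mastery + (1 - tested_share) * untested_mastery


# Hypothetical mastery levels before and after instruction narrows to tested content.
before = {"tested": 0.60, "untested": 0.60}
after = {"tested": 0.72, "untested": 0.48}

score_gain = observed_test_score(after["tested"]) - observed_test_score(before["tested"])
domain_gain = (
    domain_mastery(after["tested"], after["untested"])
    - domain_mastery(before["tested"], before["untested"])
)

# The part of the score gain that is not matched by a gain in the full domain.
inflation = score_gain - domain_gain
```

With these made-up numbers the test score rises by 12 percentage points while mastery of the full domain rises by only 6, so half of the apparent gain is inflation.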
From page 40...
... It is an essential goal of education reform that instruction be tied to the full set of intended learning goals, not just the tested sample of knowledge, skills, and question formats. Bad or inappropriate test preparation is instruction that leads to test score gains without increasing students' mastery of the broader, intended domain, which can result from engaging in the types of inappropriate strategies discussed above.
From page 41...
... found that trends in reading and mathematics achievement on NAEP generally moved in the same positive direction as trends on state tests, although gains on NAEP tended to be smaller than those on state tests. The exception to the broad trend of rising scores on both assessments occurred in eighth grade reading, in which fewer states showed gains on NAEP than on state tests.
From page 42...
... Fundamentally, the score inflation that results from teaching to the test is a problem with attaching incentives to performance measures that do not fully reflect desired outcomes in a domain that is broader than the test.
From page 43...
... CONSTRUCTING INDICATORS FROM TEST RESULTS
Incentives are rarely attached directly to individual test scores; rather, they are usually attached to an indicator that summarizes those scores in some way. The indicators that are constructed from test scores have a crucial role in determining how the incentives operate.
From page 44...
... Research has demonstrated the effect that incentives based on performance standards can have in focusing attention on students who are near the standard. In a study that analyzed test scores before and after the introduction of Chicago's own accountability program in 1996, and before and after the introduction of the No Child Left Behind (NCLB)
From page 45...
... Growth Models
Some indicators of change look at the growth paths of individual students using longitudinal data that has multiple test scores for each student over time (see, e.g., Raudenbush, 2004)
From page 46...
... Growth indicators look at changes for individual students and provide a way of isolating the learning that occurs in a given year. Because one always expects students to be learning -- whether there is education reform or not -- growth models need to provide some sort of target to indicate what level of annual change is appropriate.
From page 47...
... Professional standards for educational testing and guidelines for using tests emphasize that important decisions should not be made on the basis of a single test score and that other relevant information should be taken into account (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999; National Research Council, 1999)
From page 48...
... Adopting appropriate multiple measures is a design choice that satisfies professional standards and can offer a better representation of the full range of educational goals. Given the context of our focus on incentives, we are particularly interested in the possibility that a set of multiple measures may better reflect education goals and so can provide better incentives when consequences are attached to those measures.
From page 49...
... Attaching consequences to a system of multiple measures using a compensatory model provides incentives to improve overall performance; the consequences in this system focus attention on the areas where there are the most opportunities for improvement, not areas that are most in danger of failing to meet their individual targets, because there are no individual targets. Compensatory incentives are appropriate in cases where policy makers want to ensure overall performance levels across a number of areas but not where they have individual targets for each of those areas that they view as critical.
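The contrast between a compensatory rule and a rule with individual targets can be sketched as follows. The measure names, the equal weighting, and the target values are all hypothetical; the report does not prescribe a particular formula, so the simple average used here is only one possible way to combine measures.

```python
def meets_compensatory_target(measures, overall_target):
    """Compensatory rule: only the combined result matters.

    Assumption: measures are on a common 0-1 scale and are averaged with
    equal weights, so strength in one area can offset weakness in another.
    """
    average = sum(measures.values()) / len(measures)
    return average >= overall_target


def meets_individual_targets(measures, targets):
    """Individual-target rule: every measure must reach its own target."""
    return all(measures[name] >= targets[name] for name in targets)


# Hypothetical school results on three measures.
school = {"reading": 0.80, "mathematics": 0.55, "graduation": 0.75}

passes_compensatory = meets_compensatory_target(school, overall_target=0.65)
passes_individual = meets_individual_targets(
    school, {"reading": 0.60, "mathematics": 0.60, "graduation": 0.60}
)
```

In this illustration the school passes the compensatory rule because its average of 0.70 exceeds 0.65, but it fails the individual-target rule because mathematics falls below 0.60. The two rules therefore direct attention differently: the first toward wherever improvement is easiest, the second toward the area at risk of missing its target.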
From page 50...
... Although people commonly think of high school exit exam requirements as requiring students to pass a single test, the actual requirement in many states involves additional routes to meeting the target. These multiple routes effectively create a compensatory system of multiple measures.
From page 51...
... One way of thinking about the trigger approach is that it effectively institutes a system of multiple measures in stages, incorporating additional measures of school performance only when the test score measures indicate a likelihood that there is a problem. The approach trades off greater reliability and validity of a system of multiple measures applied to all schools for a more detailed inspection carried out for those schools identified as possibly in trouble.
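The two-stage logic of a trigger approach might be sketched like this. Everything named here is an assumption made for illustration: the school names, the indicator values, the 0.60 threshold, and the `hypothetical_inspection` function standing in for whatever additional measures a real system would collect.

```python
def schools_to_inspect(test_indicator_by_school, trigger_threshold):
    """Stage 1: flag schools whose test-based indicator falls below the threshold."""
    return sorted(
        school
        for school, indicator in test_indicator_by_school.items()
        if indicator < trigger_threshold
    )


def staged_review(test_indicator_by_school, trigger_threshold, inspect):
    """Stage 2: gather additional measures only for the flagged schools."""
    flagged = schools_to_inspect(test_indicator_by_school, trigger_threshold)
    return {school: inspect(school) for school in flagged}


def hypothetical_inspection(school):
    """Placeholder for a more detailed review using additional measures."""
    return f"additional measures collected for {school}"


# Hypothetical test-based indicators for three schools.
indicators = {"School A": 0.72, "School B": 0.48, "School C": 0.55}

review = staged_review(indicators, trigger_threshold=0.60, inspect=hypothetical_inspection)
```

Only School B and School C are inspected here. School A is never examined on the additional measures, which is the trade-off the passage describes: fewer inspections overall, at the cost of relying on the test score alone for schools that are not flagged.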

