3 Measurement Issues
Pages 27-40

Each excerpt below is the most significant single passage identified on the corresponding page of the chapter.


From page 27...
... , the most sophisticated statistical analysis in the world still will not yield good estimates of value-added unless it is appropriate to attach zero weight to learning that is not covered by the test." As Mark Reckase, an educational testing expert, noted, even the educational measurement literature on value-added models "makes little mention of the measurement requirements for using the models. For example, a summary of value-added research published by the American Educational Research Association (Zurawsky, 2004)
From page 28...
... More collaborative, cross-disciplinary work between VAM researchers from the disciplines of economics, statistics, and educational measurement will also be needed to resolve some of the difficult technical challenges. The papers on value-added measurement issues that were prepared for the workshop consistently raised issues related to what tests measure, error associated with test scores, complexities of measuring growth, and the score scales that are used to report the results from tests.
From page 29...
... It is not yet clear how important these concerns are in practice when using value-added modeling. If two schools have similar students initially, but one produces students with better test scores, it will have a higher measured value-added regardless of the scale chosen.
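To make the comparison concrete, a minimal value-added calculation might regress students' current scores on their prior scores and average the residuals within each school. The sketch below is a hypothetical illustration, not any model discussed at the workshop; all numbers and the two-step residual approach are assumptions chosen for simplicity.

    # Minimal value-added sketch (illustrative only; not a workshop model).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 1000
    school = rng.integers(0, 10, size=n)          # students assigned to 10 schools
    prior = rng.normal(500, 50, size=n)           # prior-year scale scores
    true_effect = np.linspace(-5, 5, 10)[school]  # assumed true school effects
    current = 120 + 0.8 * prior + true_effect + rng.normal(0, 20, size=n)

    # Regress current scores on prior scores, then average the residuals
    # within each school to get an estimated school effect.
    slope, intercept = np.polyfit(prior, current, 1)
    residual = current - (intercept + slope * prior)
    value_added = [residual[school == s].mean() for s in range(10)]

In this setup, a school whose students outscore the prediction from their prior scores gets a higher estimate, regardless of how the score scale is constructed.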
From page 30...
... If this narrowing is severe, and if the test does not cover the most important educational goals from state content standards in sufficient breadth or depth, then the value-added results will offer limited or even misleading information about the effectiveness of schools, teachers, or programs. For example, if a state's science standards emphasize scientific inquiry as an important goal, but the state test primarily assesses recall of science facts, then the test results are not an appropriate basis for using value-added models to estimate the effectiveness of science teachers with respect to the most valued educational goals.
From page 31...
... .

Measurement Error

Despite all the efforts that test developers devote to creating tests that accurately measure a student's knowledge and skills, all test scores are susceptible to measurement error.
From page 32...
... Thus, gain scores can be less reliable than either of the scores that were used to compute them. However, some researchers have argued that this simple logic does not necessarily mean that one should abandon gain scores altogether (Rogosa and Willett, 1983)
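The attenuation is easy to see with the classical test theory formula for the reliability of a difference score. The function below implements that textbook formula; the input values are assumptions chosen only to illustrate the effect.

    # Reliability of a gain score D = Y - X under classical test theory.
    def gain_reliability(rel_x, rel_y, sd_x, sd_y, corr_xy):
        num = sd_x**2 * rel_x + sd_y**2 * rel_y - 2 * sd_x * sd_y * corr_xy
        den = sd_x**2 + sd_y**2 - 2 * sd_x * sd_y * corr_xy
        return num / den

    # Two tests, each with reliability 0.90, equal SDs, scores correlated 0.80:
    print(gain_reliability(0.90, 0.90, 10.0, 10.0, 0.80))  # 0.5

Even though each test is quite reliable on its own, the gain score's reliability drops to 0.5 in this example; Rogosa and Willett's counterargument is that the picture changes when there are substantial individual differences in true growth.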
From page 33...
... It is clear that a number of scales that are used to report test scores, such as percentile ranks or grade-equivalent scores, are not equal-interval scales. Floor and ceiling effects also militate against the equal-interval property. Scales developed using item response theory (IRT, a psychometric theory currently used to score most standardized tests)
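For reference, the core of an IRT model is a function linking a student's ability to the probability of answering an item correctly; the one-parameter (Rasch) version is sketched below as a simple illustration.

    # One-parameter (Rasch) IRT model: probability that a student with
    # ability theta answers an item of difficulty b correctly.
    import math

    def p_correct(theta, b):
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    print(p_correct(0.0, 0.0))   # 0.50: ability equals item difficulty
    print(p_correct(1.0, 0.0))   # ~0.73: one logit above item difficulty

On this scale, equal differences in theta correspond to equal differences in the log-odds of success, which is the sense in which IRT scales are argued to approximate the equal-interval property.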
From page 34...
... in the context of value-added analysis, which typically demands score scales that extend over several grades. Such scales are constructed through a procedure called "vertical linking."

Vertical Scales

Reckase explained that when the left side of the model (the criterion)
From page 35...
... Data from the responses to the questions that are common from one grade to the next are then used to construct the vertical scale. However, as noted above, the validity of the inferences based on the analysis of test data represented on a vertical scale depends in part on how closely the vertical scale satisfies the equal-interval scale criterion.
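One common way the common-item data are used is a linear (mean/sigma) transformation that expresses the lower grade's item difficulties on the upper grade's scale. The sketch below assumes mean/sigma linking, which is only one of several linking methods, and uses invented difficulty values.

    # Mean/sigma vertical linking: place grade-4 item difficulties on the
    # grade-5 scale using items administered in both grades (invented values).
    import numpy as np

    b_grade4 = np.array([-0.4, 0.1, 0.6, 1.0])   # common items, grade-4 calibration
    b_grade5 = np.array([-0.9, -0.3, 0.2, 0.5])  # same items, grade-5 calibration

    slope = b_grade5.std() / b_grade4.std()
    intercept = b_grade5.mean() - slope * b_grade4.mean()

    def to_grade5_scale(x):
        # Express any grade-4 parameter or score on the grade-5 scale.
        return slope * x + intercept

Whether the resulting cross-grade scale has equal intervals is exactly the property at issue: the linking makes the two calibrations numerically comparable, but it cannot guarantee that a one-point gain means the same amount of learning at every grade.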
From page 36...
... The approaches differed with respect to the IRT model used, the method used to estimate student scale scores, and the IRT calibration method used to place items from the different grades on the vertical scale. Although the estimated school effects from the value-added analyses were highly correlated across the eight vertical scales, they were not identical: the estimates differed from one scale to another.
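The distinction between "highly correlated" and "identical" matters for accountability uses, because rankings can still change. The toy comparison below uses invented numbers to mirror the pattern, not the study's data.

    # School effects estimated under two different vertical scales (invented).
    import numpy as np

    effects_scale_a = np.array([2.1, -0.5, 1.3, -1.8, 0.4])
    effects_scale_b = np.array([1.6, -0.9, 1.8, -1.2, 0.2])

    print(np.corrcoef(effects_scale_a, effects_scale_b)[0, 1])  # high, but below 1
    # Note that the top two schools swap order across the two scales.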
From page 37...
... Kolen made a similar point regarding the development of vertically scaled tests. He argued that if vertical scales are to become more widely used in the future, content standards will need to be better articulated within and across grades so that they lend themselves to measuring growth and vertical scaling.
From page 38...
... This discussion suggests that in order to make value-added models more useful, improved content standards are needed that lay out developmental pathways of learning and highlight critical transitions; tests could then be aligned to such developmental standards. This would improve all models that use prior test scores to predict current performance and would be particularly helpful for those that measure growth using gain scores.
From page 39...
... However, the vertical scale issues and the equal-interval assumption are more specific to VAM applications. As for measurement error, Linn said, "I guess one difference is that the VAM has this kind of scientific aura about it, and so it's taken to be more precise." According to Kolen, there are several critical questions: Are estimated teacher and school effects largely due to idiosyncrasies of statistical methods, measurement error, the particular test examined, or the scales used?

