
Currently Skimming:

4 Tests as Measurements
Pages 71-88

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 71...
... The key issue of reliability, then, is to establish that something is being measured with a certain degree of consistency. The key issue of validity is to determine the nature of that something: specifically, whether the test measures what it purports to measure.
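The excerpt defines reliability only verbally. As a hedged illustration (classical test theory is the usual framework behind statements like this, though it is not spelled out in this excerpt), "consistency" can be made precise by splitting an observed score into a stable component and noise:
\[
X = T + E, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
\]
where X is the observed score, T the hypothetical true score, E random error assumed uncorrelated with T, and \rho_{XX'} the reliability coefficient: a value of 1 means perfectly consistent measurement, a value of 0 means the scores are pure noise.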
From page 72...
... For most purposes, a more useful index than reliability is the standard error of measurement, which is related to the unreliability of a test. This index defines a range of likely variation, or uncertainty, around the test score, much as public opinion polls report a margin of error of plus or minus x points.
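The excerpt describes the standard error of measurement only by analogy to a poll's margin of error. As a sketch under the same classical test theory assumptions, and with illustrative numbers that are not taken from the chapter, the index is commonly computed as
\[
\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}
\]
so a test with a score standard deviation of 15 and a reliability of 0.91 has an SEM of 15 x sqrt(0.09) = 4.5, and an observed score of 100 would carry an uncertainty band of roughly 100 ± 9 (about two SEMs, the analogue of a poll's plus-or-minus margin).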
From page 73...
... Hence, what is to be validated is not the test per se but rather the inferences derived from the test scores and the actions that follow (Cronbach, 1971). On one hand, for example, the validity of a proficiency test can be subverted by inappropriate test preparation, such as having students practice on the actual test items or teaching students testwise strategies that might increase test scores without actually improving the skills the test is intended to measure.
From page 74...
... Validity is now widely viewed as an integral or unified concept (American Educational Research Association et al., 1985).
From page 75...
... Note these two important points: the need for tests to assess processes in addition to the traditional coverage of content and the need to move beyond traditional professional judgment of content to accrue empirical evidence that the assumed processes are actually at work (Embretson, 1983; Loevinger, 1957; Messick, 1989). For instance, it would be desirable to have evidence that a test item intended to measure problem solving does in fact tap those skills and not just elicit a memorized solution.
From page 76...
... The consequential aspect, which corresponds most directly to the third of the three standards for test use named in Chapter 1 (appropriate treatment), includes evidence and rationales for evaluating the intended and unintended consequences of score interpretation and use in both the short and long terms. Ideally, there should be no adverse consequences associated with bias in scoring and interpretation, with unfairness in test use, or with negative effects on teaching and learning (Messick, 1980, 1989).
From page 77...
... What is required is a compelling argument that the available evidence justifies the test interpretation and use, even though some pertinent evidence may be lacking.
Validity as Integrative Summary
The six aspects of construct validity explained above apply to all educational and psychological measurement, including performance and other alternative assessments.
From page 78...
... If these words or expressions are not removed from the test, then the unfair advantage could result in a lack of comparable score meaning across groups of test takers. Fairness as equitable treatment of all examinees in the testing process requires that examinees be given a comparable opportunity to demonstrate
From page 79...
... But the idea that fairness requires overall passing rates to be equal across groups is not generally accepted in the professional literature. This is because unequal test outcomes among groups do not in themselves signify test unfairness: tests may validly document group differences that are real and may be reflective in part of unequal opportunity to learn (as discussed above)
From page 80...
... Thus, comparable validity and test fairness do not require identical task conditions, but rather common construct-relevant processes, with ignorable construct-irrelevant or ancillary processes that may be different across individuals and groups. Such accommodations, of course, have to be justified with evidence that score meaning and properties have not been unduly eroded in the process.
From page 81...
... This external pressure may lead some teachers to provide inappropriate assistance to their students before and during the test administration or to mis-score exams. Fairness issues related to test use include relying unduly on a single score and basing decisions on an underrepresented view of the relevant
From page 82...
... The most recent efforts bearing on the educational uses of tests include the Standards for Educational and Psychological Testing of the American Educational Research Association, the American Psychological Association, and the National Council on Measurement in Education (1985), currently under revision; the Code of Fair Testing Practices in Education (Joint Committee on Testing Practices, 1988); Responsibilities of Users of Standardized Tests (Association for Mea
1. Considerable attention has been given to developing fair selection models in the context of college admissions and job entry.
From page 83...
... REFERENCES
American Educational Research Association, American Psychological Association, and National Council on Measurement in Education 1985 Standards for Educational and Psychological Testing. Washington, DC: American Psychological Association.
From page 84...
... Embretson, S. 1983 Construct validity: Construct representation versus nomothetic span.
From page 85...
... National Council on Measurement in Education, Ad Hoc Committee on the Development of a Code of Ethics 1995 Code of Professional Responsibilities in Educational Measurement. Washington, DC: National Council on Measurement in Education.
From page 87...
... Part II Uses of Tests to Make High-Stakes Decisions About Individuals

