Item Quality and Readiness
Pages 21-43



From page 21...
... Our review of items in relatively early stages of development suggested that considerable improvement was possible, and the contractor's plans called for procedures that made further improvements likely. This review of VNT items initially addressed two general questions related to item quality: 1.
From page 22...
... In addition, committee and staff members examined item folders at the contractor's facility and reviewed information on item status provided by AIR in April. During our April meeting, committee members and a panel of additional reading and mathematics assessment experts reviewed and rated samples of 120 mathematics items and 90 reading items.
From page 23...
... Item Status as of April 1999
The VNT Phase 1 evaluation report suggested a need for better item tracking information. At our February 1999 workshop, the contractor presented plans for an improved item status tracking system (American Institutes for Research, 1999f)
From page 24...
... EVALUATION OF THE VOLUNTARY NATIONAL TESTS, YEAR 2

Item Format (a) / Content Strand | Needed for Pilot | Fully Ready | Awaiting NAGB Review | Awaiting Ach. Level Matching | In 1999 Cog Labs | Awaiting Scoring Edits | Total Items Written | Items Needed
ECR
Algebra and functions | 18 | 1 | 0 | 0 | 0 | 6 | 7 | 11
Geometry and spatial sense | 18 | 0 | 1 | 0 | 3 | 4 | 8 | 10
Other | None | 1 | 1 | 0 | 5 | 13 | 20 | 0
Subtotal | 36 | 2 | 2 | 0 | 8 | 23 | 35 | 21
SCR/3 points
Algebra and functions | 18 | 6 | 1 | 0 | 4 | 15 | 26 | 0
Data analysis, statistics, and probability | 18 | 1 | 5 | 0 | 11 | 8 | 25 | 0
Geometry and spatial sense | 18 | 0 | 2 | 0 | 8 | 16 | 26 | 0
Measurement | 18 | 8 | 10 | 1 | 13 | 9 | 41 | 0
Number | 36 | 7 | 10 | 1 | 11 | 14 | 43 | 0
Subtotal | 108 | 22 | 28 | 2 | 47 | 62 | 161 | 0
SCR/2 points
Algebra and functions | 18 | 1 | 1 | 0 | 1 | 1 | 4 | 14
Data analysis, statistics, and probability | 18 | 0 | 6 | 0 | 2 | 1 | 9 | 9
Geometry and spatial sense | 18 | 2 | 4 | 0 | 4 | 7 | 17 | 1
Measurement | None | 2 | 4 | 0 | 4 | 1 | 11 | 0
Number | 18 | 1 | 2 | 0 | 3 | 1 | 7 | 11
Subtotal | 72 | 6 | 17 | 0 | 14 | 11 | 48 | 35
GR
Algebra and functions | None | 1 | 7 | 1 | 1 | 0 | 10 | 0
Data analysis, statistics, and probability | 18 | 6 | 21 | 2 | 4 | 0 | 33 | 0
Geometry and spatial sense | 18 | 5 | 21 | 1 | 2 | 0 | 29 | 0
Measurement | 36 | 0 | 14 | 5 | 7 | 0 | 26 | 10
Number | 36 | 5 | 25 | 1 | 3 | 0 | 34 | 2
Subtotal | 108 | 17 | 88 | 10 | 17 | 0 | 132 | 12
MC
Algebra and functions | 198 | 26 | 99 | 15 | 4 | 0 | 144 | 54
Data analysis, statistics, and probability | 108 | 11 | 71 | 1 | 4 | 0 | 87 | 21
Geometry and spatial sense | 126 | 38 | 64 | 1 | 5 | 0 | 108 | 18
Measurement | 126 | 11 | 137 | 1 | 8 | 0 | 157 | 0
Number | 198 | 46 | 222 | 1 | 13 | 0 | 282 | 0
Subtotal | 756 | 132 | 593 | 19 | 34 | 0 | 778 | 93
Total | 1,080 | 179 | 728 | 31 | 120 | 96 | 1,154 | 161

(a) ECR = extended constructed response; SCR = short constructed response; GR = gridded; MC = multiple choice.
From page 25...
... Two additional short information passages are classed as medium information due to length, but they have no pairing or intertextual items.
(c) Medium information entries are passage pairs plus intertextual questions.
From page 26...
... Other issues, most notably passage length limits, had not been fully resolved as this report is being completed, but further changes in the item and test specifications appear unlikely.
Mathematics
New information on the status of the mathematics items was received in July.
From page 27...
... These items are still on the current version of the file, but it is unclear whether they have been reviewed again. Of the 1,355 active mathematics items in the July file, 179 were fully complete and 1,176 items required further review. At the August 1999 NAGB meeting, the contractor indicated that 1,100 mathematics items would be reviewed by NAGB's appropriate subject-area committee between September and November of 1999.
From page 28...
...
Algebra and Function | 18 | 26 | 41 | 0
Data, Statistics, Probability | 18 | 25 | 33 | 0
Geometry and Spatial | 18 | 26 | 33 | 0
Measurement | 18 | 41 | 46 | 0
Number | 36 | 43 | 44 | 0
Subtotal | 108 | 161 | 197 | 0
SCR (b) (2 points)
Algebra and Function | 18 | 4 | 19 | 0
Data, Statistics, Probability | 36 | 9 | 46 | 0
Geometry and Spatial | 18 | 17 | 46 | 0
Measurement | 18 | 11 | 39 | 0
Number | 36 | 7 | 45 | 0
Subtotal | 126 | 48 | 195 (b) | 0
GR (b)
Algebra and Function | None | 10 | 0 | 0
Data, Statistics, Probability | None | 33 | 0 | 0
Geometry and Spatial | None | 29 | 0 | 0
Measurement | None | 26 | 0 | 0
Number | None | 34 | 0 | 0
Subtotal | None | 132 | 0 | 0
MC
Algebra and Function | 180 | 144 | 199 | 0
Data, Statistics, Probability | 108 | 87 | 113 | 0
Geometry and Spatial | 162 | 108 | 119 | 43
Measurement | 162 | 157 | 185 | 0
Number | 198 | 282 | 307 | 0
Subtotal | 810 | 778 | 923 | 43
Total | 1,080 | 1,154 | 1,355 | 60

(a) ECR = extended constructed response; SCR = short constructed response; GR = gridded; MC = multiple choice.
(b) All of the gridded items were combined with the 2-point SCR items.
From page 29...
... | 12 | 17 | 15 | 2 (b)
Second Medium Information Pairs (2nd of 2) | 12 | 17 | 15 | 2
Long Information | 0 | 6 | 0 | 6 (c)
Total Passages | 72 | 108 | 85 | 23

(a) Word count exceeds the limit.
(b) Too few extended constructed response or multiple choice items.
(c) Items previously classified as "long literary" passages.

TABLE 3-6 Reading Items by Stance and Format (as of July 1999)
From page 30...
... Given that the number of mathematics items in some categories is already less than 18 times the number specified for each form, it is unlikely that each of the pilot test forms will exactly match the specifications for operational VNT forms, unless some items are included in multiple pilot test forms. In Chapter 4, we raise a question of whether additional extended constructed-response items should be included in the pilot test.
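The comparison described above (items written versus 18 times the per-form count) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-form counts are derived here by dividing the April 1999 "Needed for Pilot" figures by 18, and the shortfall rule (target minus items written, floored at zero) is an assumption that happens to reproduce the "Items Needed" column for the rows shown. The function and variable names are hypothetical.

```python
# Pilot target = 18 x the per-form count, as stated in the text.
# Assumption: shortfall = target minus items written, never below zero.
FORMS_MULTIPLIER = 18


def pilot_shortfall(per_form: int, written: int) -> int:
    """Return how many more items are needed to reach the pilot target."""
    return max(0, FORMS_MULTIPLIER * per_form - written)


# (format, strand): (items per form, total items written as of April 1999).
# Per-form counts are derived (Needed for Pilot / 18), not quoted from the report.
april_counts = {
    ("ECR", "Algebra and functions"): (1, 7),
    ("ECR", "Geometry and spatial sense"): (1, 8),
    ("SCR/2", "Geometry and spatial sense"): (1, 17),
    ("GR", "Measurement"): (2, 26),
    ("MC", "Algebra and functions"): (11, 144),
    ("MC", "Number"): (11, 282),
}

shortfalls = {
    key: pilot_shortfall(per_form, written)
    for key, (per_form, written) in april_counts.items()
}

for (fmt, strand), gap in shortfalls.items():
    print(f"{fmt:6} {strand:28} short by {gap}")
```

Categories with a nonzero shortfall are the ones where pilot forms could match the operational specifications only if new items are written or existing items are reused across forms.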
From page 31...
... The items selected for review are a large and representative sample of VNT items that were then ready or nearly ready for pilot testing, but they do not represent the balance of the current VNT items, which are still under development.
Expert Panel
Our overall conclusions about item quality are based primarily on ratings provided by panels of five mathematics experts and six reading experts with a variety of backgrounds and perspectives, including classroom teachers, test developers, and disciplinary experts from academic institutions:
TABLE 3-7 Items for Quality Evaluation by Completion Status Current Item Status (Completeness)
From page 32...
... Comparison Sample of NAEP Items
In addition to the sampled VNT items, we identified a supplemental sample of released NAEP 4th-grade reading and 8th-grade mathematics items for inclusion in the rating process, for two reasons. First, content experts will nearly always have suggestions for ways items might be improved.
From page 33...
... For mathematics, each booklet contained three sets of common VNT items, targeted
From page 34...
... | 0 | 17
TQ | Text quality | 14 | 1
VOC | Vocabulary: difficulty | 0 | 3

(a) Used for VNT items only.
(b) Used only on two NAEP items.
From page 35...
... For both reading and mathematics items, about 10 percent of the VNT items had average ratings that indicated serious problems. The proportion of NAEP items judged to have similarly serious problems was higher for mathematics (23 percent)
From page 36...
... Specific Comments
The expert raters used specific comment codes to indicate the nature of the minor or major edits that were needed for items rated as less than fully ready (see Hoffman and Thacker, 1999). For both reading and math items, and for both NAEP and VNT items, the most frequent comment overall was "distracter quality," particularly for items judged to require minor edits.
From page 37...
... For reading items, the next most frequent code was "too literal," meaning that the item did not really test whether the student understood the material, only whether he or she could find a specific text string within the passage.
Conclusions and Recommendations
With the data from item quality rating panels and other information provided to the committee by NAGB and AIR, the committee reached a number of conclusions about current item quality and about the item development and review process.
From page 38...
... The recommendation was repeated in the final Phase 1 report (National Research Council, 1999b:34): "NAGB and the development contractor should monitor summary information on available items by content and format categories and by match to NAEP achievement-level descriptions to assure the availability of sufficient quantities of items in each category." Although the initial recommendation was linked to concerns about accuracy at different score levels, the Phase 1 report was also concerned about the content validity of achievement-level reporting for the VNT. All operational VNT items would be released after each administration, and if some items appeared to measure knowledge and skills not covered by the achievement-level descriptions, the credibility of the test would suffer.
From page 39...
... In an effort to begin to address the content validity concerns about congruence of item content and the achievement-level descriptions, we had our reading and math experts conduct an additional item rating exercise. After the item quality ratings, they matched the content of a sample of VNT items to descriptions of the skills and knowledge required for basic, proficient, or advanced performance.
From page 40...
... Rather, the committee focused on whether the content of the VNT items appeared to match the descriptions developed by NAGB for reporting results by achievement levels.
Conclusions and Recommendations
In reviewing efforts by NAGB and its contractor to match VNT items to NAEP achievement-level descriptions, the committee's overall conclusion is that these efforts have been helpful in ensuring a reasonable distribution of item difficulty for the pilot test item pool, but they have not yet begun to
From page 41...
... Both the personalization of the results and the availability of the test items suggest very high levels of scrutiny and the consequent need to ensure that the achievement-level descriptions are clear and that the individual items are closely tied to them.
RECOMMENDATION 3.4 The contractor should continue to refine the achievement level matching process to include the alignment of item content to achievement level descriptions, as well as the alignment of item difficulty to the achievement level
From page 42...
... The descriptions of achievement levels should reflect the widely accepted interactive model of reading.
DOMAIN COVERAGE
A key question about the quality of the VNT items, in addition to their individual fit to the test frameworks, is whether in the aggregate they cover completely the intended frameworks.
From page 43...
... Although the ratings of the VNT items and NAEP were generally similar, 14.5 percent of the panelists' comments were coded as "too literal," while none of the NAEP items were coded this way. The majority of items for the stances labeled "reader/text connection" and "critical stance" were rated as involving at least some difficulty (67% and 59%, respectively; see Hoffman and Thacker, 1999:Table 14).


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.