Technical Issues in Test Development
Pages 44-57


From page 44...
... This final report includes an expanded discussion of the first three topics as well as the committee's findings and recommendations on issues associated with linking VNT scores to the NAEP scale. The committee reviewed the following documents: Linking the Voluntary National Tests with NAEP and TIMSS: Design and Analysis Plans (American Institutes for Research, 1998g)
From page 45...
... . PILOT TEST PLANS Forms Design Key features of the pilot test forms design are the use of school clusters, the use of hybrid forms, NAEP anchor forms, and item calibration procedures.
From page 46...
... However, as the number of school clusters increases, creating equivalent school clusters becomes more difficult, and the analyses become more complex. We conclude that the choice of four school clusters is a good compromise between the need to minimize item exposure and the need to produce accurate item parameters.
From page 47...
... To facilitate linking the two assessments, the most recent version of the pilot test design calls for the inclusion of NAEP item blocks in two of the four school clusters. The proposed item calibration plan calls for the estimation of NAEP item parameters along with VNT item parameters and thus implicitly assumes that NAEP and VNT measure the same content constructs.
From page 48...
... An advantage of the Stocking-Lord TCC procedure over the two-stage calibration and linking procedure is that the multiple sets of parameter estimates for the anchor forms can be used to provide a check on model fit. Consequently, we suggest that the contractor select the calibration procedure that is best suited to the final data collection design, is compatible with software limitations, and permits item-fit analyses.
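As a rough illustration of the Stocking-Lord test characteristic curve (TCC) idea mentioned above, the following sketch assumes a two-parameter logistic (2PL) model and anchor items that have been calibrated on two scales. It looks for a slope A and intercept B such that the rescaled TCC matches the target TCC. The item parameters, the theta grid, and the coarse grid search are all hypothetical choices made for this example; they are not taken from the report or from the contractor's plan, and an operational analysis would use a proper numerical optimizer.

```python
import math

def p_2pl(theta, a, b):
    """Probability of a correct response under the 2PL model."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def tcc(theta, items):
    """Test characteristic curve: expected number-correct score at theta."""
    return sum(p_2pl(theta, a, b) for a, b in items)

def stocking_lord_loss(A, B, target_items, new_items, thetas):
    """Sum of squared TCC differences after rescaling the new-scale items.

    Rescaling convention (an assumption of this sketch):
    a_target = a_new / A and b_target = A * b_new + B.
    """
    rescaled = [(a / A, A * b + B) for a, b in new_items]
    return sum((tcc(t, target_items) - tcc(t, rescaled)) ** 2 for t in thetas)

def grid_search_stocking_lord(target_items, new_items, thetas):
    """Coarse grid search over (A, B); illustrative only."""
    a_grid = [round(0.8 + 0.1 * i, 1) for i in range(8)]    # 0.8 .. 1.5
    b_grid = [round(-1.0 + 0.1 * i, 1) for i in range(21)]  # -1.0 .. 1.0
    best = None
    for A in a_grid:
        for B in b_grid:
            loss = stocking_lord_loss(A, B, target_items, new_items, thetas)
            if best is None or loss < best[0]:
                best = (loss, A, B)
    return best[1], best[2]

# Hypothetical anchor items (a, b) on the target scale.
TARGET_ITEMS = [(1.0, -0.5), (1.4, 0.3), (0.7, 1.1)]
# The same items expressed on a second scale, built from a known A and B
# so that the search has a known answer to recover.
A_TRUE, B_TRUE = 1.2, 0.5
NEW_ITEMS = [(a * A_TRUE, (b - B_TRUE) / A_TRUE) for a, b in TARGET_ITEMS]
THETAS = [x / 2 for x in range(-8, 9)]  # -4.0 .. 4.0 in steps of 0.5

A_HAT, B_HAT = grid_search_stocking_lord(TARGET_ITEMS, NEW_ITEMS, THETAS)
print("estimated A, B:", A_HAT, B_HAT)
```

Because the loss compares whole curves instead of individual item parameters, each anchor item's rescaled curve can afterwards be compared with its target curve, which is one way to read the model-fit check that the excerpt mentions.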
From page 49...
... We concur with the developer's judgment that the overall number of items to be pilot tested appears reasonable and repeat our hope that further edits of distractor quality will reduce the number of items dropped after pilot testing. RECOMMENDATION 4.2 Information regarding expected item survival rates from pilot to field test should be stated explicitly, and NAGB should consider pilot testing additional constructed response items, given the likelihood of greater rates of problems with these types of items than with multiple choice items.
From page 50...
... The contractor has proposed to use the Mantel-Haenszel method for the pilot test data and methods based on item response theory for the field test data. The sampling plan will allow for comparisons based on race/ethnicity (African Americans and whites, Hispanics and whites)
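For readers unfamiliar with the Mantel-Haenszel approach to differential item functioning, the sketch below computes the common odds ratio for one item across strata of examinees matched on total score, and converts it to the delta-scale index often written MH D-DIF = -2.35 ln(alpha). The function names and the counts are hypothetical and invented for this example; nothing here is drawn from VNT data or from the contractor's software.

```python
import math

def mantel_haenszel_odds_ratio(strata):
    """Common odds ratio across matched score strata.

    Each stratum is a tuple
    (ref_right, ref_wrong, focal_right, focal_wrong)
    of examinee counts for a single item.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_right, ref_wrong, focal_right, focal_wrong in strata:
        n = ref_right + ref_wrong + focal_right + focal_wrong
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += ref_right * focal_wrong / n
        denominator += ref_wrong * focal_right / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: denominator is zero")
    return numerator / denominator

def mh_d_dif(strata):
    """Delta-scale DIF index; negative values mean the item is relatively
    harder for the focal group after matching on total score."""
    return -2.35 * math.log(mantel_haenszel_odds_ratio(strata))

# Hypothetical counts for one item in two score strata.
EXAMPLE_STRATA = [(20, 10, 10, 10), (40, 10, 20, 10)]
print(mantel_haenszel_odds_ratio(EXAMPLE_STRATA), mh_d_dif(EXAMPLE_STRATA))
```

An odds ratio near 1 (index near 0) suggests that matched examinees in the two groups perform similarly on the item.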
From page 51...
... Item difficulty targets speak to the expected difficulty of the test forms. The test information function provides estimates of the expected accuracy of a test form at different score levels.
From page 53...
... As measured by NAEP, in 1998, 38 percent of children in public schools were below the basic level in reading at grade 4; similarly, in 1996, 39 percent were below the basic level in mathematics at grade 8. Among specific populations within the United States, the numbers are much larger: 64 percent of African American students, 60 percent of Hispanic students, and 53 percent of Native American students were below the basic achievement level at grade 4 in reading; also, 45 percent of students in central-city schools and 58 percent of students eligible for free or reduced-price lunches were below the basic level in reading at grade 4 (Donahue et al., 1999).
From page 54...
... (2) Reporting individual student scores from the full array of state and commercial achievement tests on the NAEP scale and transforming individual scores on these various tests and assessments into the NAEP achievement levels are not feasible.
From page 55...
... the VNT and NAEP measure the NAEP constructs as established empirically; and 3) the content of the VNT can support NAEP achievement levels descriptions." The report states that if these requirements are met, VNT scores could be interpreted directly as estimates of NAEP scores, and NAEP achievement-level descriptions could be used to help interpret VNT achievement-level estimates.
From page 56...
... could then be accomplished through a simultaneous IRT scaling of the VNT and short-form NAEP items, through separate IRT scalings followed by a Stocking-Lord test characteristic function transformation, or through an equipercentile matching of the raw score distributions. Achievement-level cutpoints on the short-form NAEP scale could be obtained by using the judgmental proportions correct from the achievement-level setting used for the main NAEP.
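Of the three linking options listed, equipercentile matching is the simplest to sketch. The example below is a simplified version that assumes integer raw scores starting at 0, uses mid-percentile ranks, and interpolates linearly between adjacent scores on the second form. The frequency tables are hypothetical, and operational equating would normally add presmoothing and more careful handling of the score extremes.

```python
def percentile_ranks(freqs):
    """Mid-percentile rank of each raw score 0..K, given score frequencies."""
    n = sum(freqs)
    ranks = []
    below = 0
    for f in freqs:
        ranks.append(100.0 * (below + 0.5 * f) / n)
        below += f
    return ranks

def equipercentile_equate(x, freqs_x, freqs_y):
    """Form-Y score with the same percentile rank as raw score x on form X."""
    pr_x = percentile_ranks(freqs_x)[x]
    pr_y = percentile_ranks(freqs_y)
    if pr_x <= pr_y[0]:
        return 0.0
    if pr_x >= pr_y[-1]:
        return float(len(pr_y) - 1)
    for y in range(len(pr_y) - 1):
        lo, hi = pr_y[y], pr_y[y + 1]
        if lo <= pr_x <= hi:
            if hi == lo:
                return float(y)
            # Linear interpolation between adjacent form-Y scores.
            return y + (pr_x - lo) / (hi - lo)
    return float(len(pr_y) - 1)  # not reached when ranks are nondecreasing

# Hypothetical score frequencies; form Y is form X shifted up by one point.
FREQS_X = [1, 3, 5, 3, 1, 0]
FREQS_Y = [0, 1, 3, 5, 3, 1]
print(equipercentile_equate(2, FREQS_X, FREQS_Y))
```

With these made-up tables, a raw score on form X maps to the form-Y score at the same relative standing, which is the sense in which the two raw score distributions are matched.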
From page 57...
... It would also expedite the actual score reporting process following the field test. RECOMMENDATION 4.6 Plans for the VNT pilot test should include efforts to gather empirical data on the effects of content, administration, and use differences between the VNT and NAEP on the feasibility of linking VNT scores to the NAEP score scale.