4 Item Development
Pages 37-44

From page 37...
... CURRENT COSTS

Test item development for NAEP is expensive. The costs for item creation and review range from $1,000 to $2,500 for selected-response items, from $1,500 to $3,500 for constructed-response items, and from $6,000 to $20,000 for scenario-based task items.[1] With a typical distribution across these three types and taking the midpoints of the ranges, the average per-item cost for creation and review is about $3,700.[2]

[1] NCES response to Q68a.
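A minimal worked check of that figure, under an illustrative item mix (the excerpt does not state the actual distribution): the midpoints of the three ranges are $1,750 for selected-response items, $2,500 for constructed-response items, and $13,000 for scenario-based tasks. A hypothetical mix of 50% selected-response, 35% constructed-response, and 15% scenario-based items then gives 0.50 × $1,750 + 0.35 × $2,500 + 0.15 × $13,000 = $875 + $875 + $1,950 = $3,700, consistent with the reported weighted average.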
From page 38...
... In addition, extra items sometimes need to be developed; this can be required, for example, when a new framework calls for a new type of item or an area of content that was not previously covered.[6] Over the next few years, a somewhat higher proportion of new items may be required if the items in long-term trend NAEP are updated in its transition to digital administration and if the scheduled framework updates ... A typical distribution across the three types of items (roughly the midpoints of the three ranges) produces the weighted average item creation and review cost of $3,700.
From page 39...
... NCES reports that there are other activities in the item development contract, including "preparation work prior to, during and after operational administration (e.g., Block Assembly), translating assessment content for the Bilingual accommodations and the mathematics Puerto Rico assessment, survey questionnaire development, Alliance-wide collaboration and planning, NAEP Integrated Management Systems (IMS) ...
From page 40...
... Although NAEP includes some traditional selected-response items to which automatic item generation might be applied, such items are more prevalent in long-term trend NAEP, where new items are generally not created. Main NAEP, where new items are needed, often uses more complex item types, which are less amenable to automatic item generation.
From page 41...
... Among them are the ideas of drawing from the detailed achievement-level descriptions to specify intended inferences and claims; better integrating the work of the experts who create NAEP frameworks with the experts who write items (as noted in Chapter 3); and applying many of the quality control processes to standardized item models instead of individual items to reduce review and pilot testing costs.
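To illustrate why quality control effort can shift from individual items to item models, here is a minimal sketch in Python, assuming a hypothetical template format; the names and the example template are illustrative and do not come from NAEP materials. The model is reviewed once, and many individual items are instantiated from it.

import random

# Hypothetical item model: a reviewed, parameterized template from which
# many concrete items can be instantiated without re-reviewing each one.
ITEM_MODEL = {
    "stem": ("A class collects {n} cans in {d} days. At the same rate, "
             "how many cans will it collect in {d2} days?"),
    # Constraint keeps the arithmetic clean and the question an extrapolation.
    "constraints": lambda n, d, d2: n % d == 0 and d2 > d,
}

def generate_item(rng):
    """Draw parameter values until they satisfy the model's constraints,
    then instantiate one concrete item together with its scoring key."""
    while True:
        n = rng.randrange(10, 60)
        d = rng.randrange(2, 6)
        d2 = rng.randrange(3, 12)
        if ITEM_MODEL["constraints"](n, d, d2):
            break
    return {
        "stem": ITEM_MODEL["stem"].format(n=n, d=d, d2=d2),
        "key": n // d * d2,  # correct answer, derived from the parameters
    }

rng = random.Random(42)
for _ in range(3):
    item = generate_item(rng)
    print(item["stem"], "->", item["key"])

Because every generated item satisfies the model's constraints by construction, review and pilot-testing effort concentrates on the single template and its constraint logic rather than on each instantiated item.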
From page 42...
... NAEP's use of automated processes of item generation could then evolve as the state of the art in automatic item generation evolves. RECOMMENDATION 4-2: The National Assessment Governing Board and the National Center for Education Statistics should move toward using more structured processes for item development to both ...

[16] For examples of the detailed achievement-level descriptions, see, for mathematics, https://nces.ed.gov/nationsreportcard/mathematics/achieve.aspx; for science, https://nces.ed.gov/nationsreportcard/science/achieve.aspx; and for reading, https://nces.ed.gov/nationsreportcard/reading/achieve.aspx.
From page 43...
... This alignment can be seen in the 4th-grade science item map, where seven of the eight items listed as above the NAEP Advanced cut scores are constructed-response items.[18] Despite this association between item types and the cognitive level and content of the items, the relation is not exact. As is often pointed out, selected-response or simpler constructed-response items can be used to assess cognitively complex material, even though there are many examples in which this is not the case.[19] It is important to consider the full range of item types that can potentially be used to assess the different cognitive and content areas specified in the frameworks, rather than focusing on particular item types in the abstract.
From page 44...
... The costs considered should include item development (both item creation and pilot administration), administration time, and scoring.

