3 Developing Performance Levels for the National Adult Literacy Survey
Pages 50-86



From page 50...
... We then provide a brief overview of the test development process used for NALS, as it relates to the procedures for determining performance levels, and describe how the performance levels were determined and the cut scores set. The chapter also includes a discussion of the role of response probabilities in setting cut scores and in identifying assessment tasks to exemplify performance levels; the technical note at the end of the chapter provides additional details about this topic.
From page 51...
... While determination of the performance-level descriptions is usually completed early in the test development process, determination of the cut scores between the performance levels is usually made after the test has been administered and examinees' answers are available. Typically, the process of setting cut scores involves convening a group of panelists with expertise in areas relevant to the subject matter covered on the test and familiarity with the test-taking population, who are instructed to make judgments about what test takers need to know and be able to do (e.g., which test items individuals should be expected to answer correctly)
From page 52...
... Some of the tasks had been used on the earlier adult literacy assessments (the Young Adult Literacy Survey in 1985 and the survey of job seekers in 1990), to allow comparison with the earlier results, and some were newly developed for NALS.
From page 53...
... The process for test development and determining performance levels for state K-12 achievement tests is similar. Under ideal circumstances, the performance-level categories and their descriptions are determined in advance of or concurrent with item development, and items are developed to measure skills described by the performance levels.
From page 54...
... The first step in the process that ultimately led to the formulation of NALS performance levels was an in-depth examination of the items included on the Young Adult Literacy Survey and the Survey of Workplace Literacy, to identify the features judged to contribute to their complexity.2 For the prose literacy items, four features were judged to contribute to their complexity:
· Type of match: whether finding the information needed to answer
1 The analyses were conducted on the Young Adult Literacy Survey, but performance levels were not used in reporting its results. The analyses were partly replicated and extended to yield performance levels for the Survey of Workplace Literacy.
From page 55...
... For the quantitative literacy items, the identified features included type of match and plausibility of the distractors, as with the prose items, and structural complexity, as with the document items, along with two other features:
· Operation specificity: the process required for identifying the operation to perform and the numbers to manipulate.
· Type of calculation: the type and number of arithmetic operations.
From page 56...
... Spock column: Alterntv to phys punish  251
AB21201  Swimmer: Age Ms. Chanin began to swim competitively  250
A131001  Shadows Columbus saw  280
AB80801  Illegal questions  265
AB41001  Declaration: Describe what poem is about  263
AB81101  New methods for capital gains  277
AB71001  Instruction to return appliance: Indicate best note  275
AB90501  Questions for new jurors  281
AB90701  Financial security tips  262
A130901  Shadows Columbus saw  282
Level 3
AB60201  Make out check: Write letter explaining bill error  280
AB90601  Financial security tips  299
A121201  Dr.
From page 57...
... DEVELOPING PERFORMANCE LEVELS 57
Columns, left to right: a, b, c (IRT parameters); Readability; Type of Match; Distractor Plausibility; Information Type
0.868  -2.488  0.000   8  1  1  1
1.125  -1.901  0.000   8  1  1  1
0.945  -1.896  0.000   7  1  1  2
1.213  -1.295  0.000   7  3  2  2
0.956  -1.322  0.000   7  1  2  3
1.005  -1.195  0.000  10  3  1  3
1.144  -1.088  0.000   8  3  2  4
1.035  -1.146  0.000   8  2  2  3
1.070  -1.125  0.000   8  3  4  2
1.578  -0.312  0.000   9  3  1  2
1.141  -0.788  0.000   6  3  2  2
0.622  -1.433  0.000   4  3  1  3
1.025  -0.638  0.000   7  4  1  3
1.378  -0.306  0.266   5  3  2  3
1.118  -0.493  0.000   6  4  2  1
1.563  -0.667  0.000   8  3  2  4
1.633  -0.255  0.000   9  3  4  1
1.241  -0.440  0.000   7  3  2  4
1.295  -0.050  0.000   8  2  2  4
1.167  -0.390  0.000   8  3  2  4
0.706  -0.765  0.000   7  3  4  1
0.853  -0.479  0.000  10  4  3  2
1.070  -0.203  0.000   9  3  2  3
0.515  -0.929  0.000   9  3  2  2
0.809  -0.320  0.000  10  3  2  4
0.836  -0.139  0.000   8  3  3  4
1.230  -0.072  0.000   6  4  2  3
0.905  -0.003  0.000   6  4  3  3
0.772  -0.084  0.000   8  4  3  2
continued
From page 58...
... 58 MEASURING LITERACY: PERFORMANCE LEVELS FOR ADULTS
TABLE 3-1 Continued
Identifier  Task Description  Scaled RP80
Level 4
AB40901  Korean Jet: Give argument made in article  329
A131101  Shadows Columbus saw  332
AB90801  Financial security tips  331
AB30601  Technology: Orally explain info from article  333
AB50201  Panel: Determine surprising future headline  343
A101101  AmerExp: 2 similarities in handling receipts  346
AB71101  Explain difference between 2 types of benefits  348
AB81301  New methods for capital gains  355
A120401  Blood donor pamphlet  358
AB31201  Dickinson: Describe what is expressed in poem  363
AB30501  Technology: Underline sentence explaining action  371
Level 5
AB81201  New methods for capital gains  384
A111201  Toyota, Acura, Nissan  404
A101201  AmExp: 2 diffs in handling receipts  441
AB50101  Panel: Find information from article  469

TABLE 3-2 List of Document Literacy Tasks, Along with RP80 Task Difficulty Score, IRT Item Parameters, and Values of Variables Associated with Task Difficulty (structural complexity, type of match, plausibility of distractor, type of information): 1990 Survey of the Literacy of Job-Seekers
Identifier  Task Description  RP80
Level 1
SCOR100  Social Security card: Sign name on line  70
SCOR300  Driver's license: Locate expiration date  152
SCOR200  Traffic signs  176
AB60803  Nurses' convention: What is time of program?
From page 59...
... Columns, left to right: a, b, c (IRT parameters); Readability; Type of Match; Distractor Plausibility; Information Type
0.826   0.166  0.000  10  4  4  4
0.849   0.258  0.000   9  5  4  1
0.851   0.236  0.000   8  5  5  2
0.915   0.347  0.000   8  4  4  4
1.161   0.861  0.196  13  4  4  4
0.763   0.416  0.000   8  4  2  4
0.783   0.482  0.000   9  6  2  5
0.803   0.652  0.000   7  5  5  3
0.458  -0.056  0.000   7  4  5  2
0.725   0.691  0.000   6  6  2  4
0.591   0.593  0.000   8  6  4  4
0.295  -0.546  0.000   7  2  4  2
0.578   1.192  0.000   8  8  4  5
0.630   2.034  0.000   8  7  5  5
0.466   2.112  0.000  13  6  5  4

Columns, left to right: a, b, c (IRT parameters); Complexity; Type of Match; Distractor Plausibility; Information Type
0.505  -4.804  0.000   1  1  1  1
0.918  -2.525  0.000   2  1  2  1
0.566  -2.567  0.000   1  1  1  1
1.439  -1.650  0.000   1  1  1  1
1.232  -1.620  0.000   1  1  1  1
0.442  -2.779  0.000   2  1  2  2
0.940  -1.802  0.000   8  2  2  1
0.763  -1.960  0.000   3  1  2  2
0.543  -2.337  0.000   1  2  1  2
1.017  -1.539  0.000   1  1  2  1
0.671  -1.952  0.000   2  1  2  2
continued
From page 60...
... 255
A120901  MasterCard/Visa statement  257
A130101  El Paso Gas & Electric bill  257
AB91101  Minimum wage power  260
AB81001  Consumer Reports books  261
AB90101  Pest control warning  261
AB21501  With graph, predict sales for spring 1985  261
AB20601  Yellow pages: Find place open Saturday  266
A130401  El Paso Gas & Electric bill  270
AB70902  Checking deposit: Enter correct cash amount  271
From page 61...
... Columns, left to right: a, b, c (IRT parameters); Complexity; Type of Match; Distractor Plausibility; Information Type
1.454  -1.283  0.000   1  1  2  1
1.069  -1.434  0.000   1  1  1  1
1.292  -1.250  0.000   7  2  2  2
0.633  -1.898  0.000   3  2  2  1
1.179  -1.296  0.000   1  2  2  1
0.997  -1.296  0.000   6  1  2  2
0.766  -1.454  0.000   1  1  2  2
1.029  -1.173  0.000   5  3  2  1
1.266  -0.922  0.000   3  2  2  1
0.990  -1.089  0.000   3  1  1  1
0.734  -1.366  0.000   5  2  2  2
1.317  -0.868  0.000   8  1  2  2
1.143  -0.881  0.000   8  2  3  1
0.954  -0.956  0.000   4  2  2  2
0.615  -1.408  0.000   2  3  2  1
0.821  -1.063  0.000   6  2  2  3
1.005  -0.872  0.000   5  2  3  3
0.721  -1.170  0.000   1  2  3  2
1.014  -0.815  0.000   7  3  2  2
0.948  -0.868  0.000   1  2  3  1
1.538  -0.525  0.000   6  3  2  1
0.593  -1.345  0.000   2  2  3  2
0.821  -0.947  0.000   5  3  2  1
0.904  -0.845  0.000   2  2  2  3
0.961  -0.703  0.000   4  2  2  2
0.993  -0.674  0.000   6  3  2  1
1.254  -0.497  0.000   6  3  2  1
1.408  -0.425  0.000   6  3  2  1
0.773  -0.883  0.000   8  3  2  1
0.904  -0.680  0.000   1  2  2  2
1.182  -0.373  0.000   6  2  2  3
1.154  -0.193  0.228   4  3  2  1
0.610  -0.974  0.000   6  1  2  2
0.953  -0.483  0.000   8  2  2  2
0.921  -0.447  0.000   4  3  3  2
1.093  -0.304  0.000   4  3  2  1
0.889  -0.471  0.000   2  3  3  2
0.799  -0.572  0.000   5  3  2  2
1.078  -0.143  0.106   7  3  2  1
0.635  -0.663  0.000   8  3  3  2
0.858  -0.303  0.000   3  3  3  2
continued
From page 62...
... 315
AB60502  Petroleum graph: Complete graph including axes  318
A120701  MasterCard/Visa statement  320
AB20701  Bus schd: Take correct bus for given condition (1)  324
Level 4
A131301  Tempra dosage chart  326
AB50501  Telephone bill: Mark information on bill  330
AB91401  Consumer Reports index  330
AB30801  Almanac: Find page containing chart for given info  347
AB20901  Bus schd: After 2:35, how long til Flint&Acad bus  348
A130301  El Paso Gas & Electric bill  362
A120801  MasterCard/Visa statement  363
AB91301  Consumer Reports index  367
Level 5
AB60501  Petroleum graph: Label axes of graph  378
AB30901  Almanac: Determine pattern in exports across years  380
A100701  Spotlight economy  381
A100501  Spotlight economy  386
A100401  Spotlight economy  406
AB51001  Income tax table  421
A100601  Spotlight economy  465
From page 63...
... Columns, left to right: a, b, c (IRT parameters); Complexity; Type of Match; Distractor Plausibility; Information Type
1.001  -0.083  0.000   5  3  2  2
0.820  -0.246  0.000   3  2  5  2
0.936  -0.023  0.097   4  4  2  1
0.762  -0.257  0.000  10  5  2  3
0.550  -0.656  0.000   2  3  2  2
0.799  -0.126  0.000   4  4  2  2
0.491  -0.766  0.000   9  2  4  2
0.754  -0.134  0.000   5  3  4  2
0.479  -0.468  0.144   7  2  5  1
0.415  -0.772  0.088   7  2  4  2
0.640  -0.221  0.000   1  5  2  1
0.666  -0.089  0.000   2  2  1  4
0.831   0.285  0.000  10  4  2  2
1.090   0.684  0.142   4  4  2  1
0.932   0.479  0.000   6  4  4  2
0.895   0.462  0.000   1  5  2  3
0.975   0.570  0.000   4  3  5  2
1.282   0.902  0.144  10  3  5  2
1.108   0.717  0.000   8  4  4  3
0.771   0.397  0.000   5  4  3  2
0.730   0.521  0.144  10  3  4  2
1.082   0.783  0.000  10  6  2  2
0.513  -0.015  0.000   6  2  4  2
0.522   0.293  0.131  10  3  4  2
0.624   0.386  0.000   5  4  4  2
0.360  -0.512  0.000   7  4  4  2
0.852   0.801  0.000   7  3  5  3
0.704   0.929  0.000   5  4  5  2
1.169   1.521  0.163  10  5  4  2
0.980   1.539  0.000   8  5  4  5
0.727   1.266  0.000   6  5  4  2
0.620   1.158  0.000   7  4  5  3
1.103   1.938  0.000  11  7  2  5
0.299   0.000  0.000   7  5  5  3
0.746   1.636  0.000  10  5  5  2
0.982   1.993  0.000  10  5  5  5
0.489   1.545  0.000  10  5  5  2
0.257   0.328  0.000   9  4  5  2
0.510   2.737  0.000  10  7  5  2
From page 64...
... 283
AB80201  Burning out of control  286
A110101  Dessert recipes  289
AB90201  LPGA money leaders  294
A120101  Businessland printer stand  300
AB81003  Consumer Reports books  301
AB80601  Valet airport parking discount  307
AB40301  Unit price: Mark economical brand  311
A131701  Money rates: Compare S&L w/mutual funds  312
AB80701  Valet airport parking discount  315
A100101  Pizza coupons  316
AB90301  LPGA money leaders  320
A110401  Dessert recipes  323
A131401  Tempra dosage chart  322
Level 4
AB40501  Airline schedule: Plan travel arrangements (1)  326
AB70501  Lunch: Determine correct change using info in menu  331
A120201  Businessland printer stand  340
A110901  Washington/Boston train schedule  340
AB60901  Nurses' convention: Write number of seats needed  346
AB70601  Lunch: Determine 10% tip using given info  349
A111001  Washington/Boston train schedule  355
A130501  El Paso Gas & Electric bill  352
A100801  Spotlight economy  356
From page 65...
... Columns, left to right: a, b, c (IRT parameters); Complexity; Type of Match; Distractor Plausibility; Calculation Type; Operation Specificity
0.869  -1.970  0.000   2  1  1  1  1
0.968  -0.952  0.000   6  3  2  1  3
0.947  -0.977  0.000   1  2  1  5  4
1.597  -0.501  0.000   3  2  2  1  4
0.936  -0.898  0.000   2  3  2  3  2
0.883  -0.881  0.000   2  3  3  1  4
1.936  -0.345  0.000   3  2  2  2  4
1.874  -0.332  0.000   3  1  2  2  4
1.073  -0.679  0.000   4  3  2  2  4
1.970  -0.295  0.000   3  2  2  2  4
0.848  -0.790  0.000   2  3  2  2  4
0.813  -0.775  0.000   5  3  2  2  4
0.896  -0.588  0.000   5  2  2  2  4
1.022  -0.369  0.000   2  3  3  2  4
0.769  -0.609  0.000   7  2  3  1  4
0.567  -0.886  0.000   2  3  3  2  4
0.816   0.217  0.448   2  2  3  4  6
1.001  -0.169  0.000   4  3  3  2  2
0.705  -0.450  0.000   2  2  3  3  4
0.690  -0.472  0.000   2  3  3  1  4
1.044   0.017  0.000   5  1  2  4  3
1.180   0.157  0.000   5  3  2  3  6
1.038   0.046  0.000   5  3  3  2  4
0.910   0.006  0.000   3  3  3  5  3
0.894   0.091  0.000   2  2  2  5  4
0.871   0.232  0.000   2  3  4  3  5
1.038   0.371  0.000   7  4  4  2  5
0.504  -0.355  0.000   3  4  4  1  5
0.873   0.384  0.000   2  1  2  5  7
0.815   0.434  0.000   7  4  4  2  5
0.772   0.323  0.000   8  3  4  2  2
0.874   0.520  0.000   8  5  4  2  2
continued
From page 66...
... would imply a level of precision of measurement that the test designers believed was inappropriate for the methodology adopted. Thus, identical score intervals were adopted for each of the three literacy scales, as shown below:
· Level 1: 0-225
· Level 2: 226-275
· Level 3: 276-325
· Level 4: 326-375
· Level 5: 376-500
Performance-level descriptions were developed by summarizing the features of the items that had difficulty values that fell within each of the score ranges.
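The score intervals listed above can be sketched as a simple lookup. This is a minimal illustration, not part of the NALS methodology; the function name `nals_level` is hypothetical, and the handling of fractional scores between two whole-number bounds (for example 225.5) is an assumption, since the intervals are stated only for whole numbers.

```python
from bisect import bisect_left

# Upper bounds of Levels 1-4, taken from the score intervals in the text;
# anything above 375 (up to 500) is Level 5.
LEVEL_UPPER_BOUNDS = [225, 275, 325, 375]


def nals_level(scaled_score: float) -> int:
    """Return the performance level (1-5) for a scaled score on the 0-500 scale."""
    if not 0 <= scaled_score <= 500:
        raise ValueError("scaled score must lie between 0 and 500")
    # bisect_left counts the upper bounds strictly below the score, so a
    # score of exactly 225 stays in Level 1 and 226 moves to Level 2.
    # Assumption: fractional scores above a bound fall in the higher level.
    return bisect_left(LEVEL_UPPER_BOUNDS, scaled_score) + 1
```

Because the same intervals were adopted for all three literacy scales, one lookup of this kind would serve the prose, document, and quantitative scales alike.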
From page 67...
... The performance levels produced by this approach were score ranges based on the cognitive processes required to respond to the items. While the 1992 score levels were used to inform a variety of programmatic decisions, there is a benefit to developing performance levels through open discussions with stakeholders.
From page 68...
... Department of Education, National Center for Education Statistics, National Adult Literacy Survey, 1992.
From page 69...
... Thus, the scaled scores used in determining the score ranges associated with the five performance levels were the scaled scores associated with an 80 percent probability of responding correctly. The choice of the specific response probability value (e.g., 50, 65, or 80 percent)
From page 70...
... , would have lowered the cut scores associated with the performance levels in such a way that a much smaller percentage of adults would have been classified at the lowest level. For example, the cut score based on a response probability of 80 placed slightly more than 20 percent of respondents in the lowest performance level; the cut score based on a response probability of 50 classified only 9 percent at this level.
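The link between the response probability criterion and an item's mapped difficulty can be sketched with a three-parameter logistic trace line. This is a minimal sketch under stated assumptions: the 3PL functional form and the scaling constant D = 1.7 are common conventions that the excerpt does not confirm, the function names are hypothetical, and the result is on the underlying ability (theta) metric rather than the 0-500 reporting scale, because the scale transformation is not given here. The parameters a, b, and c correspond to the IRT parameter columns in the chapter's tables.

```python
import math

D = 1.7  # assumed logistic scaling constant; not stated in the text


def p_correct(theta: float, a: float, b: float, c: float) -> float:
    """3PL trace line: probability of a correct response at ability theta."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def rp_theta(a: float, b: float, c: float, rp: float) -> float:
    """Ability at which the trace line reaches the response probability rp."""
    if not c < rp < 1.0:
        raise ValueError("rp must lie strictly between c and 1")
    return b + math.log((rp - c) / (1.0 - rp)) / (D * a)


# First row of the prose IRT parameter table: a = 0.868, b = -2.488, c = 0.
a, b, c = 0.868, -2.488, 0.0
rp80 = rp_theta(a, b, c, 0.80)
rp50 = rp_theta(a, b, c, 0.50)
# With c = 0 the RP50 point equals b, and RP80 sits ln(4) / (D * a) above it.
shift = rp80 - rp50
```

Under these assumptions a lower criterion maps every item to a lower difficulty value, which is the mechanism behind the lower cut scores and the smaller share of adults in the lowest level described above.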
From page 71...
... Additional discussion of response probabilities appears in the technical note to this chapter, and we revisit the topic in Chapter 5. MAPPING ITEMS TO PERFORMANCE LEVELS Response probabilities are calculated for purposes other than determining cut scores.
From page 72...
... Recommendation 3-1: If the Department of Education decides to use an item mapping procedure to exemplify performance on the National Assessment of Adult Literacy (NAAL), displays should demonstrate that individuals who score at all of the performance levels have some likelihood of responding correctly to the items.
From page 73...
... . We relied on this guidance offered by the Standards in designing our approach to developing performance levels and setting cut scores, which is the subject of the remainder of this report.
From page 74...
... TABLE 3-5 Difficulty Values of Selected Tasks Along the Prose Literacy Scale, Mapped at Four Response Probability Criteria: The 1992 National Adult Literacy Survey
Columns: RP 80, RP 65, RP 50, RP 35
Scale score 75
<81> Identify country in short article(a)
<102> Identify country in short article(a)
<123> Identify country in short article(a)
Scale score 125
<145> Underline sentence explaining action stated in short article
<149> Identify country in short article(a)
<169> Underline sentence explaining action stated in short article
Scale score 175
<194> Underline sentence explaining action stated in short article
<224> Underline sentence explaining action stated in short article
Scale score 225
<255> State in writing an argument made in a long newspaper story
From page 75...
... At a scale score of 102 and 81, individuals have, respectively, a 50 percent and a 35 percent chance of responding correctly to the item.
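The four mapped values for one task can be checked against each other. The sketch below reads the "identify country" values in Table 3-5 as 81, 102, 123, and 149 at RP35, RP50, RP65, and RP80; that column assignment is an inference from the flattened table, not something the excerpt states. It also assumes a logistic trace line with no guessing parameter, and the function names are hypothetical.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def score_at_rp(location: float, slope: float, rp: float) -> float:
    """Scale score at which a logistic trace line (no guessing) reaches rp."""
    return location + logit(rp) / slope


# Anchor the trace line on two tabled values: 102 at RP50 and 123 at RP65.
location = 102.0
slope = logit(0.65) / (123.0 - 102.0)

# Predict the mapped score at all four criteria from those two values alone.
predicted = {rp: round(score_at_rp(location, slope, rp / 100))
             for rp in (35, 50, 65, 80)}
```

Anchored on the RP50 and RP65 values, the sketch returns 81 at RP35 and 149 at RP80, which agrees with the tabled values under the assumed reading. The same item thus appears at four different points on the scale depending only on the criterion chosen.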
From page 76...
... For the illustration in Figure 3-1, the variable being measured is prose literacy as defined by the 1992 NALS. A hypothetical population distribution is shown in the upper panel of Figure 3-1, simulated as a normal distribution.4
3 Item discrimination is denoted by a_i; item location (difficulty)
From page 77...
... . For this item, the trace line in Figure 3-1 shows that people with prose literacy scale scores higher than 300 are nearly certain to respond correctly, while those with scores lower than 200 are nearly certain to fail.
From page 78...
... That is, items with
[Figure: y-axis, Probability of a Correct Response, 0.0 to 1.0, with reference values at 0.50, 0.67, and 0.80; x-axis, Prose Scale Score, 100 to 450]
FIGURE 3-2 Trace lines for the 39 open-ended items on the prose scale for the 1992 NALS.
From page 79...
... For reporting purposes, the prose literacy scale for the 1992 NALS was divided into five levels using cut scores that are shown embedded in the population distribution in Figure 3-3. Under these reporting levels, the proportion of the population scoring 225 or lower was said to be in Level 1; Levels 2, 3, and 4 each covered a score range of 50 points; and Level 5 included scores exceeding 375.
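How cut scores carve a population distribution into level proportions can be sketched as follows. The cut scores come from the text; the normal shape follows the chapter's own hypothetical illustration, but the mean and standard deviation used here are placeholder values chosen for the example, not NALS estimates, and the function name is hypothetical. The sketch also ignores the 0 and 500 bounds of the reporting scale.

```python
from statistics import NormalDist

CUT_SCORES = [225, 275, 325, 375]  # upper bounds of Levels 1-4, from the text


def level_proportions(mean: float, sd: float) -> list[float]:
    """Share of a normal population falling in each of the five levels."""
    dist = NormalDist(mu=mean, sigma=sd)
    # Cumulative probability at each cut score, padded with 0 and 1 so that
    # consecutive differences give the proportion inside each level.
    cumulative = [0.0] + [dist.cdf(cut) for cut in CUT_SCORES] + [1.0]
    return [upper - lower for lower, upper in zip(cumulative, cumulative[1:])]


# Hypothetical population centered on the middle of Level 3.
proportions = level_proportions(mean=300.0, sd=50.0)
```

Moving any cut score changes only the two adjacent proportions, which is why the choice of response probability criterion has such a direct effect on the reported share of adults at each level.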
From page 80...
... For example, for the "write letter" item, it was said "this task is at 280 on the prose scale" (Kirsch et al., 1993, p.
From page 81...
... Department of Education, National Center for Education Statistics, National Adult Literacy Survey, 1992.
From page 82...
... . That appendix includes a representation of the trace line for each item at seven equally spaced scale scores between 150 and 450 (along with the rp80 value)
From page 83...
... To illustrate, Figure 3-6 shows the trace line for the "write letter" item as it passes through the middle of the prose score scale. The trace line is enclosed in dashed lines that represent the boundaries of a 95 percent confidence envelope for the curve.
From page 84...
... That is, the confidence envelope translates statistical uncertainty (due to random sampling) in the estimation of the item parameters into a graphical display of the consequent uncertainty in the location of the trace line itself.6 A striking feature of the confidence envelope in Figure 3-6 is that it is relatively narrow.
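The excerpt does not say how the envelope in Figure 3-6 was computed. One common way to turn parameter uncertainty into a band around a trace line is simulation, sketched below. Everything numeric here is hypothetical: the item's slope and location, their standard errors, and the simplifying assumption that the two estimates are independent and normally distributed.

```python
import math
import random


def trace_line(score: float, slope: float, location: float) -> float:
    """Two-parameter logistic trace line on the reporting scale."""
    return 1.0 / (1.0 + math.exp(-slope * (score - location)))


def confidence_envelope(scores, slope, location, slope_se, location_se,
                        draws=2000, seed=0):
    """Pointwise 95 percent band from simulated parameter estimates."""
    rng = random.Random(seed)
    # Independent normal draws; a fuller analysis would use the covariance
    # between the slope and location estimates as well.
    sampled = [(rng.gauss(slope, slope_se), rng.gauss(location, location_se))
               for _ in range(draws)]
    lo_idx = int(0.025 * (draws - 1))
    hi_idx = int(0.975 * (draws - 1))
    band = []
    for score in scores:
        probs = sorted(trace_line(score, a, b) for a, b in sampled)
        band.append((probs[lo_idx], probs[hi_idx]))
    return band


# Hypothetical item: location 250, slope 0.03, with made-up standard errors.
scores = list(range(150, 451, 50))
band = confidence_envelope(scores, slope=0.03, location=250.0,
                           slope_se=0.002, location_se=3.0)
```

Small standard errors, as would be expected from a large sample, produce a narrow band of the kind the text describes.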
From page 85...
... Figure 3-4 illustrates another statistical property of the trace lines used for NALS and NAAL that provides motivation for choosing an rp value closer to 50 percent. Note in Figure 3-2 that not only are the trace lines in a different (horizontal)
From page 86...
... We suggest a compromise value of rp67, combined with a reminder that rp values are arbitrary values used in the standard-setting process and that reporting of the results can describe the likelihood of correct responses for any level or scale score.

