4  Determining Performance Levels for the National Assessment of Adult Literacy

The committee began its work by reviewing the processes for developing the National Adult Literacy Survey (NALS) and determining the 1992 performance levels in order to gain a comprehensive understanding of the assessment and to evaluate whether new performance levels were needed. Our review revealed that the 1992 levels were essentially groupings based on the cognitive processes required to respond to the items. The committee decided that a more open and transparent procedure could be used to develop performance levels that would be defensible and informative with regard to the policy and programmatic decisions likely to be based on them. It was clear from the press coverage of the release of 1992 results that people wanted information about the extent of literacy problems in the country as well as an indication of the portion of adults whose literacy skills were adequate to function in society. Although the test development process was not designed to support inferences like this, we decided that new performance levels could be developed that would be more informative to stakeholders and the public about adults' literacy skills.

On the basis of our review, the committee decided to embark on a process for defining a new set of performance levels for the National Assessment of Adult Literacy (NAAL). This decision meant that we needed to address five main questions:

1. How many performance levels should be used?
2. Should performance levels be developed for each of the literacy scales (prose, document, and quantitative) or should one set of levels be developed?
3. What should the levels be called?
4. How should the levels be described?
5. What score ranges should be associated with each of the levels?

In this chapter, we take up the first three questions and describe our process for determining the number of performance levels and their purposes. In Chapter 5, we discuss our procedures for determining the descriptions of the levels and the associated cut scores.

Our process for determining the performance levels combined feedback from stakeholders regarding the ways they used the 1992 results and anticipated using the 2003 results with information from analyses of the relationships between NALS literacy scores and background data. We began our work by using two documents prepared prior to our first meeting: Developing the National Assessment of Adult Literacy: Recommendations from Stakeholders (U.S. Department of Education, 1998), which reported on a series of discussion sessions conducted by the National Center for Education Statistics (NCES) in 1998, and The National Adult Literacy Survey: A Review of Primary and Secondary Analyses of the NALS (Smith, 2002), a literature review prepared for the committee that summarizes the empirical research on the relationships between NALS literacy scores and background characteristics.

These documents served as the starting place for the committee's work. The report on the discussion sessions indicated ways in which NALS results were used, the problems users had in interpreting and using the results, and the information stakeholders would like to obtain from reports of NAAL results. The literature review assisted us in designing analyses to explore whether alternative versions of performance levels would permit such uses and interpretations.

This chapter begins with a summary of key points stakeholders made during the NCES discussion sessions and the public forum convened by the committee.
This is followed by a description of the analyses we carried out and the performance levels we recommend.

STAKEHOLDER VIEWS

Discussion Sessions Sponsored by NCES

When NCES began planning for the NAAL, it commissioned the American Institutes for Research (AIR) to convene a series of discussion sessions to gather feedback from stakeholders that would guide development efforts. Three discussion groups were held in January and February 1998 at AIR's offices in Washington, DC, and included federal and state policy
makers and program directors as well as representatives from the general and educational media. AIR facilitators posed questions to stakeholders to hear their comments about the ways the 1992 data had been used and interpreted, the problems associated with using and interpreting the data, and issues to consider in designing the new assessment. A summary of these discussion sessions and a listing of the participants has been published (see U.S. Department of Education, 1998).

Stakeholders indicated that the 1992 survey results were used to describe the status of literacy to policy makers at the federal and state levels; to argue for increased funding and support for adult literacy programs; to support requests for studies of special populations such as non-English speakers, welfare recipients, incarcerated individuals, and elderly populations; to document needed reforms in education and welfare policy; and to enable international comparisons.

Participants described a number of problems associated with interpreting the results, including the following:

• Stakeholders need data that can be used to make programmatic decisions. They had trouble using the 1992 NALS results for such purposes because the levels were difficult to understand, there was no definition of "how much literacy was enough," and the results were not connected to workplace requirements or measures of employability.
• The lowest level was so broad that it was difficult to identify the truly nonliterate population.
• Having three literacy scales crossed with five performance levels produced so much information that it was difficult to present the results to policy makers and others. It was hard to distill the information into easily interpreted messages.
• Some participants said they used only one scale when reporting information to the public and suggested that a composite literacy score be developed.
They believed this was justified because the three scales were so highly correlated with each other.
• The five performance levels were difficult to understand, in part because there were no concrete meanings associated with them. That is, there was no definition of which people "needed help" and which people had "enough" literacy. Some tried to equate the levels to high school graduation and beyond.
• There was a lack of congruence between self-perception of literacy skills and NALS literacy scores. Some cited the work of Berliner (1996), whose research showed that individuals who scored in the bottom two NALS levels reported reading the newspaper on a daily basis.
• Some suggested that the NALS levels be cross-walked with those used by the National Reporting System (NRS)¹ and by other adult literacy assessments.
• The scores for non-English speakers were not reported separately, making it difficult to distinguish between literacy problems and lack of fluency in English.

Stakeholders also indicated that there was interest in conducting studies of special populations (e.g., those living in rural areas, immigrants, non-English speakers), but the sampling design used for NALS did not support such studies. They encouraged oversampling participants in NAAL to allow such studies and adding background questions to clearly identify those in the categories of interest.

Discussion Sessions Sponsored by the Committee

The committee arranged for several opportunities to obtain feedback from various stakeholders regarding the ways NALS results were used, the ways stakeholders anticipate using NAAL results, and the types of information that stakeholders would like to see included in reports of NAAL results. We collected information about the types of inferences that might be based on NAAL results, the policy and program decisions that might be made, the number of performance levels needed to support these inferences and uses, and the types of performance-level descriptions that would communicate appropriately to the various audiences for NAAL results.

The committee's fourth meeting, on February 27, 2004, included a public forum to hear feedback from stakeholders. Participating stakeholders included representatives from organizations likely to be involved in policy and programmatic decisions based on NAAL results, some of whom were individuals who had participated in the earlier discussion sessions sponsored by NCES.
The committee also solicited feedback from directors of adult education in states that subsidized additional sampling during NAAL in order to obtain state-level NAAL results (see Appendix A for a list of individuals who provided feedback, their affiliations, and the materials they were asked to react to).

The stakeholders were helpful in delineating the types of uses that would be made of the results. Overall, their comments tended to concur with those made by participants in the NCES-sponsored discussion sessions. In general, it appeared that NAAL results would be used to advocate for needed policy and to shape and design programs. Forum participants indicated that they expected to use NAAL results to evaluate preparedness for work and the need for job training programs, adults' ability to understand health- and safety-related reading matter and physicians' instructions, parents' readiness to help their children with their schooling, and the need for adult education and literacy services. Most also emphasized that having scores for the three literacy scales was useful, and that the scores were used for different purposes (e.g., the tasks used to evaluate document literacy were most relevant to work skills).

The feedback from stakeholders indicated considerable diversity of opinions about the number of performance levels needed for reporting assessment results, what the levels should be called, and the type of description needed to best communicate with the various audiences. For example, reporters and journalists present at the forum argued for straightforward approaches that could be easily communicated to the public (e.g., two performance levels described with nontechnical terminology). They maintained that the public is most interested in simply knowing how many people in the country are literate and how many are not. Others argued for more than two levels: some thought there should be three levels, while others thought there should be six or seven, with finer distinctions made at the lower levels.

Some stakeholders, particularly those from the health literacy field, preferred that the 1992 levels be used for NAAL, commenting that consistency was needed so as not to disrupt ongoing longitudinal research or interfere with programs and interventions already in place.

¹See Chapter 2 for an explanation of the levels used by the NRS.
NCES staff members present at the forum pointed out that the data files would be made available and score data provided so that users could group the scores based on the score ranges used for the 1992 performance levels or any other grouping that fit their particular needs.

With regard to qualitative names for the levels, some favored labels for the levels, noting that this can provide a means for succinctly and accurately communicating the meaning of the levels (e.g., satisfactory literacy skills, deficient literacy skills). Reporters present at the forum suggested two labels, literate and not literate. They warned that if the labels did not clearly indicate which adults were not literate, they would derive a means to figure this out on their own. Some participants recommended the labels used by state achievement testing programs and by the National Assessment of Educational Progress (i.e., below basic, basic, proficient, and advanced) since the public is well acquainted with these terms.

Others argued against labeling the levels (e.g., proficient, fully functional), especially labels that place a stigma on the lowest levels of literacy (e.g., minimally literate). They urged that if labels had to be used, the labels should be descriptive, not imply normative standards, and not be connected with
educational level. The question of the type of information to include in the performance-level descriptions elicited equally diverse perspectives. Some thought more detailed descriptions were better, while others argued for more succinct descriptions. It seemed clear that different levels of detail would be important for different uses of the performance levels.

Stakeholders representing adult education were most interested in having more detailed information about adults at the lowest levels. Several participants commented that adults who receive adult education services tend to have skills described by the bottom two levels used for reporting NALS 1992. Knowing more about the skills of those who fall in these two levels would be useful in identifying target clientele and making planning decisions.

RELATIONSHIPS BETWEEN LITERACY SCORES AND BACKGROUND CHARACTERISTICS

Although the NALS literacy assessments were not designed to distinguish between adequate and inadequate literacy skills, the committee thought that analyses of the background questionnaire might provide insight into ways to identify adults who were experiencing significant difficulties in life. We hypothesized that such analyses might reveal break points in the distribution of literacy scores at which individuals were at an unacceptably high risk for encountering social and economic hardships. This type of information might lead to obvious choices in performance levels, standards, or both.

The committee therefore focused its analytic work on the relationships between NALS literacy scores and relevant information collected on the background questionnaire. Our literature review (Smith, 2002) gave us a sense of the type of research that had been conducted with NALS over the past decade and the sorts of relationships found.
The research findings generally indicated that individuals with lower literacy scores were more likely to experience difficulties, such as being in poverty, on welfare, or unemployed; working in a low-paying job; not having a high school diploma; or being unable to pass the General Educational Development (GED) exam. Low literacy skills were also associated with being less likely to participate in such important aspects of life as voting and reading the newspaper.

With this literature review in mind, we tried to identify the levels of literacy at which the risk of encountering such difficulties differed, focusing specifically on the points where the risk would be unacceptably high. We thought that examining relationships with important socioeconomic factors might suggest categories of performance that would be useful in determining new policy-relevant performance levels. For these analyses, we used the following information from the NALS background questionnaire:² employment status; income; occupation; voting history over the past five years; receiving food stamps or other public assistance; receiving income from stocks, mutual funds, or other sources of interest income; and level of formal education. We also considered responses to questions about participation in reading-related activities, such as how often the participant reads a newspaper, reads or uses written information for personal use or for work, uses math for personal use or for work, and receives reading assistance from family and friends.

We had hoped to explore the relationships between literacy scores and the variables described above to develop performance levels that were not just descriptions of skills but descriptions of the functional consequences of adults' literacy skills, such as education level, family income, and job status. In the end, however, we came to realize that the background questions did not provide the information needed for the analyses we had hoped to conduct.

Overall, the relationships between literacy scores and the background variables did not suggest obvious break points that could be used in defining performance levels. In part, this may have been because the background questions did not probe deeply enough into a particular area, or the answer choices did not allow for fine enough distinctions. For example, participants were asked to characterize their newspaper reading habits by indicating the frequency with which they read the newspaper and the sections of the newspaper that they read (news section, editorial, comics, etc.); however, they were not asked questions that could help evaluate the difficulty or complexity of the newspapers they read, such as which newspapers they read.
Clearly, the news section of the Wall Street Journal requires different literacy skills than the news section of a local newspaper. Similar questions inquire about magazine and book reading but do not collect information that could be used to make inferences about the complexity of the books or magazines read. Thus, the information may be useful for making rough distinctions between those who do not read magazines, books, or newspapers at all and those who do, but not useful for making the finer distinctions required to sort people into incremental levels of literacy.

Similar observations can be made about the information collected about voting behavior. The 1992 questionnaire included only a single question on this topic, asking participants if they had voted in a national or state election in the past five years. This provides only a rough glimpse into voting behavior. A more in-depth query might have asked about voting in national and state elections separately and added questions about voting in local elections and other forms of civic participation (e.g., running for office, communicating with elected officials). Again, the information available from the background questionnaire can be used to make rough distinctions, such as between those who do and do not vote, but it is not useful for making more nuanced distinctions.

In the end, we concluded that the background information could not be used by itself to identify categories of literacy skills but could be used to evaluate the reasonableness of cut scores resulting from a more typical standard-setting procedure. In Chapter 5, we therefore use the results of our analyses as a complement to a standard-setting procedure using the test items themselves, rather than as an alternative to such a standard-setting procedure.

The analyses discussed in this report are all univariate analyses. The committee also explored the use of multivariate regression techniques to look at the relationship between literacy scores and the various background questions. These multivariate analyses built on work by Sum (1999) related to employment and earnings and by Venezky (Venezky and Kaplan, 1998; Venezky, Kaplan, and Yu, 1998) on voting behavior. Not surprisingly, the independent explanatory power of literacy scores in such analyses is severely reduced by the inclusion of other variables, such as education, because these variables themselves have a complex bidirectional relationship with literacy.

²See Appendix G of the technical manual for NALS (Kirsch et al., 2001) for the exact wording of the questions.
Upon reflection, the committee decided that it was beyond the scope of its charge, and beyond the quality of the data, to attempt to resolve the causal relationships of literacy with the various characteristics indicated in the background questions that might be entered as additional controls in multivariate regressions. Therefore, only the univariate analyses were used to suggest possible conclusions about performance levels.

To demonstrate the committee's exploratory analyses, the next sections provide information on the relationships between literacy scores and two variables: income and occupational status. These examples are illustrative of a larger set of findings that demonstrate that there are strong gradients across an array of literacy outcomes but no clear break points that would, prima facie, lead one to set cut points in the literacy distribution. For each variable, we show several approaches that highlight the continuous nature of the relationship and contrast those with an approach that can be used to suggest a contrast in functional level. The latter approach is then further developed in Chapter 5, when we present the procedures we used for setting the cut scores and discuss the complementary role played by the statistical analyses. Because the 2003 data were not available to us until the final
months of our work, the analyses discussed in this chapter are based on the 1992 data alone. The complementary statistical analyses presented in the next chapter are carried out with both the 1992 and 2003 data sets.

Literacy Scores and Annual Income

It seems intuitively sensible that literacy should be related to how one functions in other critical aspects of life, and that income and occupation should serve as indicators of how well one is functioning. Furthermore, one would expect that low literacy skills should be associated with restricted opportunities, such as not pursuing postsecondary education or specialized training and working in lower paying jobs with no formal training requirements. With these assumptions in mind, we examined the relationships between literacy skills and income and occupation.

Three figures present information on the relationships between literacy skills and income. Figure 4-1, adapted from Kirsch et al. (1993), shows the percentages of adults who, according to federal guidelines, were poor or near poor or who had received food stamps in the year prior to the assessment at each of the five 1992 performance levels for prose literacy. This graph shows that the risk of being in poverty or being on food stamps increases as literacy scores decrease.

FIGURE 4-1 Percentage of NALS respondents who are poor or near poor or who received food stamps in the past year by prose literacy level.

Because some stakeholders have reported that Level 1 was "too broad" to be informative about individuals with the lowest level of literacy skills, we adapted this figure and examined the relationships between poverty and prose literacy scores for specific groupings within the 0 to 225 score range
encompassed by Level 1. Figure 4-2 presents a comparison of the percentages of adults who were poor or near poor at six groupings of Level 1 scores and at the Level 2 score range (226-275). Clearly, the risk of being poor is not even across the Level 1 groupings; risk of being poor increases steadily as scores decrease, with what appears to be substantial risk at scores of 175 or lower.

FIGURE 4-2 Percentage of NALS respondents who are poor or near poor for six groupings of NALS Level 1 prose scores and NALS Level 2 prose scores.

To see if this relationship between literacy and income suggested clear break points, we plotted the distribution of literacy scores for the 12 groupings of income levels used on the background questionnaire. Figure 4-3 presents this information in boxplots: each box shows the range of scores from the 25th percentile (bottom of the box) to the 75th percentile (top of the box). The 50th percentile (median) score is marked within the box. Also displayed is the full range of scores for each income group, denoted by horizontal lines below the box (minimum score) and above the box (maximum score). Comparison of these boxplots shows that prose literacy scores tend to increase as annual income increases. But there is considerable overlap in the range of literacy scores across adjacent income levels, and no break points between adjacent income levels that would clearly signify a heightened risk of encountering difficulties in life associated with having a low income level.

Whereas there were no break points between adjacent income groups, there did appear to be differences between the more extreme groups, those with incomes of $15,000 or less and those with incomes of $75,000 or more.
That is, 75 percent of those earning $75,000 or more achieved a prose score of 312 or higher, while 75 percent of those earning $15,000 or less scored 308 or lower on prose.
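The quartile comparison above can be sketched as a small computation. The following Python sketch is an illustrative reconstruction only: the income-bracket labels, sample sizes, and score distributions are hypothetical stand-ins, not the NALS data file, and serve just to show the pattern of heavy overlap between adjacent groups but separation between the extremes.

```python
import numpy as np

def income_group_quartiles(scores_by_group):
    """Return (25th, 75th percentile) of literacy scores per income group.

    scores_by_group: dict mapping an income-bracket label to an array of
    respondent scale scores. The committee's analysis used the 12 income
    brackets on the NALS background questionnaire; three are mocked here.
    """
    return {group: (float(np.percentile(s, 25)), float(np.percentile(s, 75)))
            for group, s in scores_by_group.items()}

def extremes_separable(quartiles, low_group, high_group):
    """Check the pattern noted in the text: the 75th percentile of the
    lowest income group falls below the 25th percentile of the highest,
    even though adjacent groups overlap heavily."""
    return quartiles[low_group][1] < quartiles[high_group][0]

# Hypothetical score samples illustrating the reported pattern
rng = np.random.default_rng(0)
groups = {
    "<=15k": rng.normal(265, 45, 500),
    "15k-30k": rng.normal(280, 45, 500),
    ">=75k": rng.normal(345, 40, 500),
}
q = income_group_quartiles(groups)
print(extremes_separable(q, "<=15k", ">=75k"))
```

With distributions like these, adjacent brackets (e.g., "<=15k" vs. "15k-30k") share most of their interquartile range, while the extreme brackets separate, mirroring the 308-versus-312 contrast reported in the text.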
FIGURE 4-3 Boxplots illustrating the distribution of prose literacy scores for groupings of income levels, as indicated on the background questionnaire.

To follow up on this observation, we compared the responses to two background questions indirectly related to income. The first set of responses indicated whether or not the individual or household received Aid to Families with Dependent Children (AFDC, replaced in 1996 by Temporary Assistance for Needy Families) or food stamps; the second set identified whether or not the individual or household received interest or dividend income. These questions identify respondents who are experiencing difficulty or success at a functional level associated with income that is not indicated by the income figures alone.

Figure 4-4 shows the boxplots of the prose literacy scores for respondents who answered "yes" to one or the other of the two questions. The boxplots indicate that approximately three-quarters of the people receiving AFDC or food stamps scored below 380, and three-quarters of the people receiving interest or dividend income scored above 275. To the extent that it is appropriate to link literacy level in a causal way to the set of behaviors that ultimately influence an individual's financial success, this figure suggests that a cut score somewhere in the 275 to 380 range might be a rough dividing line between individuals who are experiencing functional difficulties and individuals who are experiencing financial success.
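The "rough dividing line" reasoning in the paragraph above amounts to taking the band between two quartiles: the 25th percentile of the group with the success indicator (interest or dividend income) and the 75th percentile of the group with the hardship indicator (AFDC or food stamps). A minimal sketch follows, again using hypothetical synthetic scores rather than the actual survey records; the group means and spreads are invented for illustration.

```python
import numpy as np

def dividing_band(hardship_scores, success_scores):
    """Band between two quartiles that brackets a rough dividing line.

    The lower edge is the 25th percentile of the 'success' group (three-
    quarters of that group score at or above it); the upper edge is the
    75th percentile of the 'hardship' group (three-quarters of that group
    score at or below it).
    """
    low = float(np.percentile(success_scores, 25))
    high = float(np.percentile(hardship_scores, 75))
    return low, high

# Hypothetical prose-score samples for the two indicator groups
rng = np.random.default_rng(1)
afdc_or_food_stamps = rng.normal(300, 60, 400)
interest_or_dividends = rng.normal(330, 55, 400)

low, high = dividing_band(afdc_or_food_stamps, interest_or_dividends)
print(round(low), round(high))
```

Applied to the actual NALS responses, this construction yields the 275-to-380 band described in the text; any single cut score must then be chosen within the band by other means, which is why Chapter 5 pairs these analyses with a formal standard-setting procedure.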
98 MEASURING LITERACY: PERFORMANCE LEVELS FOR ADULTS Prose Scale Score AFDC or Food Stamps Interest of Dividends FIGURE 4-4 Boxplots that illustrate the distribution of prose literacy scores for respondents who indicated that they/their household received either (a) Aid to Families with Dependent Children or food stamps or (b) interest or dividend income. Literacy Scores and Occupational Information To examine the relationships between literacy scores and occupation, we drew on analyses conducted by Sum (1999), Rock, Latham, and Jeanneret (1996), and Barton (1999).3 First, using information derived by Rock et al. (1996), we examined the mean quantitative score for occupa- 3The method of identifying the occupations of NAAL respondents, obviously crucial to examining the relationships between literacy scores and occupational classification, depends on accurate classification of a respondentâs narrative description of their occupation into a
DETERMINING PERFORMANCE LEVELS 99 tional categories that contained at least 30 respondents in 1992. Table 4-1 displays this information with occupations rank-ordered according to their mean quantitative score and grouped by performance level. Second, we examined mean prose scores for a sample of occupations selected to be representative of Bartonâs (1999) nine broad categories identified through an analyses of job requirements. These categories are: (1) executive, admin- istrative, and managerial; (2) professional specialty; (3) technicians and related support occupations; (4) marketing and sales; (5) administrative support; (6) service occupations; (7) agriculture, forestry, and fishing; (8) precision production, craft, and repair; and (9) operators, fabricators, and laborers. Figure 4-5 displays, for the selected occupations, the means (noted with a shaded diamond) as well as the range of scores bounded by the mean plus or minus one standard deviation (noted by the horizontal lines above and below the shaded diamond). The data in these figures seem reasonable in that the general trend of the mean literacy scores required for the different occupations seems intu- itively sensibleâoccupations that one would expect to require more lit- eracy do indeed have higher mean scores. None of the occupations had mean scores that fell in the score range for Level 1 or Level 5, however; the preponderance of occupations had mean scores that fell in Level 3 (see Table 4-1). Only the mean for those who had never worked fell into Level 1 (see Figure 4-5). Most importantly, the variability of literacy scores within occupations showed considerable overlap between occupational groups (see Figure 4-5). Clearly, there are no obvious break points in the distribution of literacy scores; that is, there are no points on the scale at which there is a distinctly higher risk of being unemployed or working in a low-paying job. 
Nonetheless, while the information does not seem to be useful in deter- mining specific performance levels or identifying cut scores, it does demon- strate how opportunities to enter into white-collar, higher paying occupa- tions increase as literacy skills increase. That is, for those at higher literacy levels, opportunities are readily accessible; for those at lower levels of lit- eracy, the opportunities to obtain higher paying jobs are more limited. As an alternate approach that could indicate a possible contrast in performance levels, the committee formed three clusters based on Bartonâs classifications. We included the occupations in Bartonâs groups 7, 8, and standard occupational classification system. Currently the U.S. Department of Laborâs Stan- dard Occupational Classification is used. That is, two respondents who describe their occu- pations in the same or similar words during the collection of the NALS/NAAL data actually are in the same occupation and are classified in the same way by those making the classifica- tions into occupational categories with that narrative description.
100 MEASURING LITERACY: PERFORMANCE LEVELS FOR ADULTS TABLE 4-1 Occupation with at Least 30 Respondents in NALS, 1992 JOB Quantity Level 1 None Level 2 Janitor 234 Sewing-machine operator, semiautomatic 243 Orderly 251 Construction worker II 253 Bus driver 257 Cook 261 Physical therapy aide 264 Cashier II 273 Level 3 Teacher aide II 276 Farmworker, livestock 277 Truck driver, heavy 278 Clerk, general 278 Mail-distribution-scheme examiner 285 Sales clerk 285 Waiter/waitress, formal 285 Nurse, licensed practical 286 Carpenter 289 Chef 289 Correction officer 291 Automobile mechanic 292 Manager, retail store 296 Assistant construction superintendent 297 Manager, property 297 Manger, food service 298 Teller 299 Secretary 303 Legal secretary 306 Nurse, office 306 Poultry farmer 307 Disbursement clerk 307 Superintendent, construction 311 Police officer I 311 Manager, department 315 Sales agent, insurance 316 Caseworker 319 Sales agent, real estate 322 Director, educational program 323 Teacher, physically impaired 324
DETERMINING PERFORMANCE LEVELS

TABLE 4-1 Continued (Job | Quantity)

Level 4
  Teacher, elementary school | 329
  Operations officer | 332
  Public health physician | 348
  Manager, financial institution | 349
  Lawyer | 350
  Accountant | 351
  Systems analyst | 352

Level 5
  None

9 in a low category, the occupations in Barton's groups 1 and 2 in a high category, and the remainder of the occupations in a medium category. We then contrasted the literacy score distribution in the low and high categories. Clustered in this way, the categories may be considered to indicate a contrast between occupations that have minimal formal education and training requirements and those that require formal education and training.

Figure 4-6 shows the boxplots of the prose literacy scores for the employed respondents whose stated occupation fell into either the low or the high category. The boxplots indicate that these two groups of people can be roughly separated by drawing a cut score somewhere in the range 291-301. Three-quarters of the people who are in the low category are below this literacy range, and three-quarters of the people who are in the high category are above this literacy level. To the extent that it is appropriate to link literacy level in a causal way to the set of behaviors that ultimately influences an individual's occupational choice, Figure 4-6 suggests that a cut score somewhere in the range 291-301 might be a rough dividing line between individuals who work in occupations that require minimal formal education and training (and hence are lower paying) and individuals who work in occupations that require formal education and training (and hence are higher paying).

Assessing the Dimensionality of NALS

In Adult Literacy in America, Kirsch et al. (1993) presented a number of graphs that portrayed the relationships between background information and literacy scores, with separate graphs for each of the literacy scales (prose, document, and quantitative).
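The quartile logic behind the Figure 4-6 contrast can be sketched numerically. The scores below are simulated for illustration only; the group means and spreads are assumptions, not NALS estimates, and the real analysis used the actual survey responses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical simulated prose scores for the two occupation clusters
# (means and standard deviations are invented for this sketch).
low_group = rng.normal(265, 45, 2000)   # minimal education/training requirements
high_group = rng.normal(330, 45, 2000)  # formal education/training requirements

# Third quartile of the low group and first quartile of the high group:
# the interval between them is a rough dividing range, analogous to the
# 291-301 range read from the boxplots in Figure 4-6.
low_q3 = np.percentile(low_group, 75)
high_q1 = np.percentile(high_group, 25)
lo, hi = sorted([low_q3, high_q1])
print(f"Rough dividing range: {lo:.0f}-{hi:.0f}")
```

With these assumed distributions, roughly three-quarters of the low group falls below the resulting range and three-quarters of the high group falls above it, mirroring the pattern the committee read from the boxplots.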
[FIGURE 4-5 Distribution of the mean prose literacy scaled scores, within one standard deviation, for various occupational groups.]

[FIGURE 4-6 Boxplots that illustrate the distribution of prose literacy scores for respondents who were in either the low or high occupation category.]

One observation that can be made about such graphs is that, regardless of the background variable, the relationships are always similar for the three literacy scales (e.g., see Kirsch et al., 1993, Figures 1.4, 1.5, and 1.6, pp. 29, 31, 33). This observation led the committee to question the need for reporting three separate literacy scores. Questions about the extent to which the items included on an assessment support the number of scores reported are addressed through a statistical procedure called factor analysis, which is commonly used to examine the cognitive dimensions that underlie a set of test data.

Several investigations of the factor structure of NALS have been conducted (see, for example, Reder, 1998a, 1998b; Rock and Yamamoto, 1994). These analyses have repeatedly shown high intercorrelations among prose, document, and quantitative scores, suggesting that NALS tasks measure a single dimension of literacy rather than three. The committee chose to conduct its own dimensionality analyses using two different procedures. The first was exploratory in nature and generally replicated procedures
used by Rock and Yamamoto (1994) but based the analyses on different blocks of items. The results revealed that a three-factor model (reporting scores for prose, document, and quantitative literacy) provided an acceptable fit to the data, although the intercorrelations among the three literacy scales tended to be quite high (mostly above .85). Additional details about this dimensionality analysis appear in Appendix B.

The second analysis addressed questions about the relationships between performance in the prose, document, and quantitative areas and an array of literacy outcomes (e.g., years of formal education, being in the labor force, occupation type, self-report about reading activities). Here, using a statistical procedure called structural equation modeling, we investigated the extent to which performance in the three literacy areas was differentially associated with the outcome measures (e.g., that one literacy area was more strongly associated with certain outcomes than another). If differential associations were found, there would be empirical support for using the separate dimensions to guide decision making about adult literacy policy and programs. If the associations were found to be similar across the three literacy areas, one would conclude that either the assessment does not measure the dimensions independently or there is little practical significance to the distinctions among them. In addition, we sought to determine if a single weighted combination of the prose, document, and quantitative scores adequately described the relationship of measured literacy to the outcome measures. The results indicated that all three types of literacy had statistically significant associations with the outcome measures. The relationship of document literacy to the outcomes was much weaker than that observed for prose or quantitative literacy.
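The way high intercorrelations (mostly above .85) point toward a single dominant dimension can be illustrated with simulated data. This sketch is not the committee's analysis (which used exploratory and confirmatory factor models on the actual item blocks); it simply shows how the eigenvalues of a correlation matrix reveal one dominant factor when three scores share a strong common component.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Hypothetical simulation: three "literacy" scores that share one strong
# common factor, so their pairwise correlations come out around .83,
# mimicking the pattern observed for prose, document, and quantitative.
common = rng.normal(size=n)
scores = np.column_stack([common + 0.45 * rng.normal(size=n) for _ in range(3)])

corr = np.corrcoef(scores, rowvar=False)
print(corr.round(2))

# Eigenvalues of the correlation matrix, largest first: one dominant
# eigenvalue suggests a single dimension accounts for most shared variance.
eigvals = np.linalg.eigvalsh(corr)[::-1]
print(eigvals.round(2))
```

In this toy setup the first eigenvalue is far larger than the other two, which is the classic signature of essentially unidimensional data even when three separate scores are reported.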
In addition, the relationship between prose literacy and the outcomes was slightly stronger than that observed for quantitative literacy. Statistical tests, however, indicated that the relationships decreased if either document or quantitative literacy was excluded from the analysis (in statistical terminology, model fit deteriorated if document or quantitative literacy was ignored). These results highlight the apparently prime importance of prose literacy but also point out that the other dimensions should not be ignored. For the most part, the three types of literacy have similar relationships with each of the outcome measures. That is, if an outcome was strongly related to prose literacy, its relationships with document and quantitative literacy were also relatively strong, and vice versa. There were a few notable exceptions. For example, quantitative literacy was more highly correlated with earnings and the use of mathematics on the job than one would expect from the relationships of the three types of literacy with the other outcomes. The findings suggest that for some purposes it may be useful to construct a composite of the three literacy scores. Additional details about these analyses are presented in Appendix B, and we revisit these findings in Chapter 6.

DEVELOPING POLICY-RELEVANT PERFORMANCE LEVELS

Although the factor analyses and the analyses of relationships with background data described above did not lead us to specific performance-level categories, we used the results to guide our decision making about performance levels and their descriptions. We designed a process for determining the performance levels that was iterative and that integrated information obtained from several sources: our analyses of NALS literacy and background data, feedback from stakeholders, and a review of the test items. This process is described below.

The feedback from stakeholders suggested the importance of performance levels that could be linked to meaningful policy choices and levels of proficiency understood by the public. Our information gathering suggested that stakeholders seek answers to four policy-related questions from NAAL results. They want to know what percentage of adults in the United States:

• Have very low literacy skills and are in need of basic adult literacy services, including services for adult English language learners.
• Are ready for GED preparation services.
• Qualify for a GED certificate or a high school diploma.
• Have attained a sufficient level of English literacy that they can be successful in postsecondary education and gain entry into professional, managerial, or technical occupations.

Based on the information obtained from data analyses, stakeholder feedback, and review of test items, we initially developed a basic framework for the performance-level descriptions that conformed to the policy-related contrasts suggested by the above questions. These contrasts indicate points at which public policy effectively draws a line delineating the literacy level adults need or should have by making available extra educational services to those adults below that level.
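Answering the four policy questions amounts to estimating weighted population percentages from the survey data. The following is a minimal sketch of that computation; the categories and weights are invented for illustration, and NAAL's actual estimates use its own sampling weights and variance procedures.

```python
from collections import defaultdict

# Hypothetical records: (policy-relevant category, sampling weight).
# Both the category labels and the weights are invented for this sketch.
records = [
    ("needs basic literacy services", 1.2),
    ("ready for GED preparation", 0.9),
    ("holds GED/high school level skills", 1.1),
    ("ready for postsecondary education", 0.8),
    ("holds GED/high school level skills", 1.0),
]

# Sum the weights within each category, then convert to percentages
# of the total weighted population.
totals = defaultdict(float)
for category, weight in records:
    totals[category] += weight
grand_total = sum(totals.values())
percentages = {c: 100 * w / grand_total for c, w in totals.items()}
print(percentages)
```

The same weighted-percentage logic, applied per performance level, is what turns assessment results into the "what percentage of adults..." answers stakeholders asked for.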
We then developed draft performance-level descriptions corresponding to these groupings to reflect the types of literacy skills generally needed at each level and that were evaluated on the assessment, as determined by a review of the assessment frameworks and test items. The descriptions were revised and finalized by obtaining feedback on various versions of the performance-level descriptions from standard-setting panelists on three occasions.

The factor analyses revealed high intercorrelations among the three literacy scales, which suggested that a single literacy score would be adequate for reporting the assessment results (e.g., an average of prose, document, and quantitative scores). It is quite likely that these dimensions are more independent than they appear to be in NALS but are confounded by the nature of the NALS tasks. That is, multiple questions are often based on a single stimulus (e.g., a bus schedule) presented to the test taker, and the tasks may include questions from all three literacy areas. In addition, stakeholder feedback indicated that the three literacy scores are used for different purposes. We therefore developed a set of performance-level descriptions that includes both an overall description of each performance level and subject-specific descriptions for the prose, document, and quantitative literacy scales.

Based on this process, we recommend the use of five performance levels. We remind the reader that these performance levels are not intended to represent standards for what is required to perform adequately in society, since the assessment was not designed to support such inferences. To reinforce this, we have intentionally avoided the use of the term "proficient" in the labels for the performance levels.

RECOMMENDATION 4-1: The 2003 NAAL results should be reported using five performance levels for each of the three types of English literacy: nonliterate in English, below basic literacy, basic literacy, intermediate literacy, and advanced literacy.

These levels are described in Box 4-1. The recommended levels roughly correspond to the four policy questions posed earlier, with the exception that two levels describe the skills of individuals likely to be in need of basic adult literacy services. That is, the nonliterate in English group includes those whose literacy levels were too low to take NAAL and who were administered the Adult Literacy Supplemental Assessment, and the below basic literacy group includes those who scored low on NAAL.
The basic category is intended to represent the skills of individuals likely to be ready for GED preparation services. Likewise, the intermediate category generally describes the skills of individuals likely to have a GED certificate or a high school diploma. The advanced category is meant to portray the literacy skills of individuals who would generally be likely to succeed in college or postsecondary education. (We caution the reader that, in the end, we had some reservations about the adequacy of NALS and NAAL to measure skills at the advanced level and refer the reader to the discussion in Chapter 5.) The various versions of these descriptions and the process for revising them are described in the next chapter and in Appendix C.

BOX 4-1 Performance-Level Descriptions Developed for 2003 NAAL

Nonliterate in English: May recognize some letters, numbers, and/or common sight words in frequently encountered contexts.

Below Basic: May sometimes be able to locate and make use of simple words, phrases, numbers, and quantities in short texts drawn from commonplace contexts and situations; may sometimes be able to perform simple one-step arithmetic operations.

Basic: Is able to read and understand simple words, phrases, numbers, and quantities in English when the information is easily identifiable; able to locate information in short texts drawn from commonplace contexts and situations; able to solve simple one-step problems in which the operation is stated or easily inferred.

Intermediate: Is able to read, understand, and use written material sufficiently well to locate information in denser, less commonplace texts, construct straightforward summaries, and draw simple inferences; able to make use of quantitative information when the arithmetic operation or mathematical relationship is not specified or easily inferred.

Advanced: Is able to read, understand, and use more complex written material sufficiently well to locate and integrate multiple pieces of information, perform more sophisticated analytical tasks such as making systematic comparisons, draw more sophisticated inferences, and make use of quantitative information when multiple operations or more complex relationships are involved.

In identifying these levels, we were conscious of the fact that one of the chief audiences for NAAL results is adult education programs, which are guided legislatively by the Workforce Investment Act of 1998. Title II of this act mandates an accountability system for adult education programs, known as the National Reporting System (NRS), that specifies a set of education functioning levels used in tracking the progress of enrollees. Feedback from stakeholders emphasized the usefulness of creating levels for NAAL aligned with the NRS levels.
Although it was not possible to establish a clear one-to-one correspondence between NAAL performance levels and the NRS levels, there appears to be a rough parallel between nonliterate in English and the NRS beginning literacy level; between below basic and the NRS beginning basic and low intermediate levels; and between basic and the NRS high intermediate level.

In the next chapter, we detail the process we used for developing descriptions for the performance levels, obtaining feedback on them, and revising them to arrive at the final version.
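The rough parallels described above can be summarized as a simple lookup. This is only a restatement of the approximate alignment in the text, not an official crosswalk; the intermediate and advanced NAAL levels have no NRS counterpart listed here because the text names none.

```python
# Approximate NAAL-to-NRS correspondence as described in this chapter.
# One NAAL level can span multiple NRS education functioning levels,
# so each entry maps to a list rather than a single level.
naal_to_nrs = {
    "nonliterate in English": ["beginning literacy"],
    "below basic": ["beginning basic", "low intermediate"],
    "basic": ["high intermediate"],
}

for naal_level, nrs_levels in naal_to_nrs.items():
    print(f"{naal_level} -> {', '.join(nrs_levels)}")
```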