MEASURING LITERACY: PERFORMANCE LEVELS FOR ADULTS

Executive Summary

In today's society, literacy is an essential skill, one that helps people thrive individually, socially, and economically. Literacy is important for all aspects of life, from handling personal affairs, to engaging in the workforce, to participating in a democratic society. Literacy skills are critical both for individuals' functioning and for a well-functioning society. Literacy has an impact on a nation's economic status, the well-being of its citizens, the capabilities of its workforce, and its ability to compete in a global society. Deficiencies in literacy and mismatches between the skills of citizens and the needs of an economy can have serious repercussions. Policy makers rely on assessments of literacy to evaluate both the extent of such mismatches and the need for services that provide basic literacy skills to adults. Such assessments can provide the foundation and impetus for policy interventions.

The National Adult Literacy Survey (NALS) was designed to provide such information. NALS was a household survey of a nationally representative sample of 26,000 adults age 16 and older conducted by the U.S. Department of Education in 1992. It built on two prior literacy assessments that were more limited in scope, the 1985 Young Adult Literacy Survey of 21- to 28-year-olds and a national survey of job seekers in 1990. The 1992 assessment was designed to assess adults' ability to apply their literacy skills to everyday materials and tasks.

NALS measured three dimensions of functional literacy using a wide array of tasks and materials encountered in daily life. Prose literacy measured skill in understanding information presented in continuous texts (e.g., a newspaper article). Document literacy reflected skill in using information presented in graphs, figures, and tables (e.g., a bus schedule). Quantitative literacy assessed skill in using arithmetic operations on numbers presented in scenarios, texts, or documents (e.g., a product advertisement). Performance on NALS reflected both the difficulty of the tasks and the complexity of the materials.

To provide information that could more easily be understood and used by policy makers and the public, the test designers grouped scores on NALS into five performance levels. Brief descriptions of the levels were provided, and the percentage of adults whose scores fell into each performance level was reported along with summary measures of the scores.

A decade later, the Department of Education implemented plans for a successor to NALS, called the National Assessment of Adult Literacy (NAAL), conducted in 2003. NAAL was designed to produce some new information while retaining enough consistency with the 1992 assessment to evaluate trends over the ensuing decade. NAAL includes additional health-related materials intended to yield a measure of health literacy in addition to scores in prose, document, and quantitative literacy. Two other components were added to increase the information gathered about adults with low-level English literacy skills: the Fluency Addition and the Adult Literacy Supplemental Assessment (ALSA). In preparation for release of NAAL results, the Department sought advice from the National Research Council's Board on Testing and Assessment about developing performance levels for the assessment.

PROBLEM STATEMENT

NALS was intended to describe the range of English literacy skills of adults in the United States. The performance levels used to report the 1992 results were designed as a means for communicating about adults' literacy skills, but they were not meant to reflect policy-based judgments about expectations for adult literacy.
That is, the procedures used to develop the assessment did not involve identifying the level of skills adults need in order to function adequately in society. When findings from the 1992 survey were released, however, the performance levels were interpreted and discussed as if they represented standards for the level of literacy adults should have. The lowest two levels were referred to as inadequate, implying that adults with these skills would be unable to hold a well-paying job. The results of the assessment, and these sorts of unsupported inferences about the results, provoked widespread controversy in the media and among experts in adult literacy about the extent of literacy problems in the country.

In response to the department's request for advice, the Committee on Performance Levels for Adult Literacy was established and charged to:
• Review and evaluate the procedures for determining the performance levels for the 1992 National Adult Literacy Survey and
• Recommend a set of performance levels for the 2003 National Assessment of Adult Literacy that are valid, appropriate, and permit comparisons between the 1992 and the 2003 results.

Through a process detailed below, the committee has determined that five performance levels should be used to characterize the status of English language literacy in the United States: nonliterate in English, below basic literacy, basic literacy, intermediate literacy, and advanced literacy.

DETERMINING THE 1992 PERFORMANCE LEVELS AND CUT SCORES

The process for determining the 1992 performance levels is described in the technical manual for NALS. The test designers developed a process for determining the levels that drew on analyses conducted with the earlier literacy assessments. The process involved making judgments about features of the test questions that contributed to their complexity (e.g., the amount of distracting information) and rating the items according to these features. The questions were rank-ordered from least to most difficult according to a statistical estimate of each question's difficulty. The listing of questions was visually inspected for natural break points in the complexity ratings. Four break points were identified and converted to scale scores that became the cut scores used to separate the five performance levels. Narrative descriptions characterized the cognitive complexity of the items constituting each level.

The statistical estimate of each question's difficulty used to rank-order the questions was calculated to represent a certain chance of responding correctly.
In the language of test design, this chance is called a "response probability." The choice of a specific response probability value is an important decision because it affects the value of the cut scores used to separate the performance levels: the cut scores could be higher or lower simply as a consequence of the response probability selected. In 1992, the test designers chose to use a response probability of 80 percent for NALS. This decision has been the subject of debate, largely centering on whether it led to overly high cut scores, thus underestimating the literacy of adults in the United States.

Like many decisions made in connection with developing a test, the choice of a response probability value requires both technical and nontechnical considerations. The decision should be based on the level of confidence one wants to have that examinees have truly mastered the content and skills assessed, but it should also reflect the objectives for the test, the
ways the test results are used, and the consequences associated with these uses. Choice of a response probability value requires making a judgment, and reasonable people may disagree about which of several options is most appropriate.

Committee's Evaluation

Some of the more important details about the process for determining the 1992 performance levels were not specified in the NALS technical manual, such as who participated in producing the complexity ratings and exactly how decisions were made about the break points. Although the test designers appear to have selected a response probability of 80 percent to represent the concept of mastery, as is sometimes used in the field of education, the reasons for this choice were not fully documented in the technical manual. It is therefore difficult to fully understand the process and how it was carried out.

It is our opinion that a more open and public process combined with more explicit documentation would lead to better understanding of how the performance levels were determined and what inferences could be based on them. An open process would be in line with currently accepted guidelines for educational and psychological testing.

There is a broad literature on procedures for developing performance levels and setting cut scores. This literature documents the methods and ways to systematize the process of setting cut scores. Use of established procedures for setting cut scores allows one to draw from the existing research and experiential base and facilitates communication with others about the general process. We therefore decided to pursue use of these methods in our process for determining performance levels and cut scores.

DEVELOPING NEW PERFORMANCE LEVELS

Based on our review of the procedures used for determining the 1992 performance levels, we decided to embark on a systematic process to determine a new set of performance levels.
We established as overriding principles that the process should model exemplary practices, be conducted in an open and public way, and be explained in a manner that permits replication and invites constructive criticism. Our range of options for new performance levels was substantially narrowed, however, by prior decisions about test development, the scope of content and skills to be covered, and the background information gathered from assessment participants.

Typically, when the objective of a test is to report results according to performance levels, the desired performance categories are articulated early
in the development phase and serve as the foundation for test development. With the number of levels and their descriptions laid out in advance, development efforts can focus on constructing items that measure the skills described by the levels and in sufficient number to provide reliable results, particularly at the boundaries between performance levels. Determining performance levels after the test development process is complete does not represent exemplary practice. Furthermore, because the assessment was not designed to provide information about what adults need to function adequately in society, there was no way for us to develop performance levels that would support such inferences. Nevertheless, we agreed to assist with the challenging problems of communicating about adults' literacy skills and improving understanding of the findings. We sought to determine performance levels that would describe adults' literacy skills and be relevant to public policy on adult literacy.

The decision to design a new set of performance levels meant that the committee needed to address questions related to the number of levels, the cut scores for the levels, and whether separate performance-level descriptions should be developed for each of the three literacy scales. Feedback from stakeholders indicated that they seek answers to four policy-related questions. They want to know what percentage of adults in the United States:

• Have very low literacy skills and are in need of basic adult literacy services, including services for adult English language learners.
• Are ready for GED (general educational development) preparation services.
• Qualify for a high school diploma.
• Have attained a sufficient level of English literacy that they can be successful in postsecondary education and gain entry into professional, managerial, or technical occupations.
The committee's process for determining the performance-level descriptions involved a combination of data analyses, stakeholder feedback, and review of the test specifications and actual test questions. Our analytic work revealed very high correlations among the three literacy scales, which suggested that a single literacy score combining the three scales would be sufficient for reporting the assessment results. Stakeholder feedback indicated, however, that the three literacy scores provide information that is useful for other purposes. We therefore developed performance-level descriptions that include both an overall description for each level as well as descriptions specific to the prose, document, and quantitative literacy scales.

Based on our information-gathering activities and analytic work, we recommend the use of five performance levels that correspond to the policy-related questions identified by stakeholders.1 We remind the reader that these performance levels are not intended to represent standards for what is required to perform adequately in society because the assessments were not developed to support such inferences.

RECOMMENDATION 4-1: The 2003 NAAL results should be reported using five performance levels for each of the three types of English literacy: nonliterate in English, below basic literacy, basic literacy, intermediate literacy, and advanced literacy.

A brief description of each level appears below:

Nonliterate in English: may recognize some letters, numbers, or common sight words in everyday contexts.

Below Basic Literacy: may sometimes be able to locate and use simple words, phrases, and numbers in everyday contexts and perform simple one-step arithmetic operations.

Basic Literacy: is able to read simple words, phrases, and numbers in everyday contexts when the information is easily located and able to solve one-step problems.

Intermediate Literacy: is able to read and use written materials to locate information in denser, less commonplace texts, summarize information, draw simple inferences, and make use of quantitative information when the arithmetic operation is not easily inferred.

Advanced Literacy: is able to read and use more complex written material to integrate multiple pieces of information, perform analytical tasks, draw more sophisticated inferences, and make use of quantitative information when more complex relationships are involved.

Each performance level was intended to correspond to one of the policy-related questions suggested by stakeholders, with the exception that the two lowest levels both address the first question. This exception is attributable to differences between the 1992 and 2003 assessments.
Because a significant number of 1992 participants were unable to complete any of the NALS questions, the supplemental ALSA was added in 2003 as a separate low-level component. A set of screening questions was used to determine which component, ALSA or NAAL, participants should take; the nonliterate in English category encompasses those who were assigned to take ALSA. This screening procedure was not used in 1992, however, so no one from the earlier assessment can be classified into the nonliterate in English category. Thus, the nonliterate in English and below basic categories will need to be combined to permit comparisons between NAAL and NALS.

1 Recommendation numbers refer to the report chapter in which they are made and the sequence in which they appear in the chapter.

In identifying these levels, we were conscious of the fact that one of the audiences for NAAL results will be adult education programs, which are for the most part guided legislatively by the Workforce Investment Act of 1998, Title II, Adult Education and Family Literacy Act. This act mandates the National Reporting System (NRS), which specifies a set of educational functioning levels used in tracking the progress of adult education program enrollees. Although it was not possible to establish a one-to-one correspondence between the NAAL and NRS levels, there appears to be a rough parallel between nonliterate in English and the NRS beginning literacy level; between below basic and the NRS beginning basic and low intermediate levels; and between basic and the NRS high intermediate level.

Setting Cut Scores

The literature on setting achievement levels documents the strengths and weaknesses of various methods of setting cut scores. A review of these critiques quickly reveals that there are no perfect methods. Like the cut-score-setting process itself, choice of a procedure requires making an informed judgment about the most appropriate method for a given assessment situation. Based on our review, we decided to use the bookmark standard-setting method and to evaluate the reasonableness of the resulting cut scores by comparing them with data from the assessment's background questionnaire.

We held two bookmark standard-setting sessions, in July 2004 to examine the NALS data and in September 2004 using the NAAL data. Given the public debate about the response probability value chosen for NALS, we decided to examine the impact of three commonly used response probability values (50, 67, and 80 percent) on the July bookmark standard-setting process.
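The dependence of cut scores on the response probability criterion can be sketched with a simple item response model. The sketch below is illustrative only: it assumes a two-parameter logistic item characteristic curve and invented item parameters, not the actual scaling model or item statistics used for NALS or NAAL.

```python
import math

def rp_location(difficulty_b, slope_a, rp):
    """Scale score at which an examinee answers a given item correctly
    with probability `rp`, under a two-parameter logistic model
    P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))."""
    return difficulty_b + math.log(rp / (1.0 - rp)) / slope_a

# Hypothetical item: difficulty 250 on a 0-500 scale, slope 0.05.
# (Illustrative values only; not actual NALS/NAAL item parameters.)
b, a = 250.0, 0.05

for rp in (0.50, 0.67, 0.80):
    print(f"RP{int(rp * 100)}: item maps to scale score "
          f"{rp_location(b, a, rp):.0f}")
```

Under these assumed parameters, the same item maps to a higher scale score as the criterion tightens (here, roughly 250 at RP50, 264 at RP67, and 278 at RP80), which is why cut scores derived with RP80 sit higher on the scale than those derived with RP67 or RP50.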
Analyses of the results from the July standard setting revealed that use of different response probability values produced different cut scores. The committee considered this finding, along with feedback from panelists, as well as other factors (e.g., the uses of the assessment results), to inform its choice of a response probability value for the September standard setting. Panelist feedback about applying a probability level of 50 percent tended to be negative, which contributed to our view that it was not a viable option. The committee judged that a probability level of 80 percent was overly stringent given the uses of the assessment results. We therefore decided that the September bookmark panelists should use a moderate response probability level of 67 percent, the value generally recommended in the literature by the developers of the bookmark procedure. This is not to suggest that a response probability of 67 percent would
be appropriate for all situations in which cut scores must be set. We acknowledge that some stakeholders for the present assessment, such as those in the health field, would argue for a response probability of 80 percent to reflect the critical importance of correctly using health-related materials to accomplish health tasks.

We examined the cut scores that emerged from the bookmark procedures in relation to relevant background information and made slight adjustments. We make the following recommendation with regard to the cut scores for the performance levels:

RECOMMENDATION 5-1: The scale-score intervals associated with each of the levels should be as shown below for the prose, document, and quantitative dimensions of literacy.

               Nonliterate
               in English   Below Basic  Basic     Intermediate  Advanced
Prose:         Took ALSA    0-209        210-264   265-339       340-500
Document:      Took ALSA    0-204        205-249   250-334       335-500
Quantitative:  Took ALSA    0-234        235-289   290-349       350-500

We note that although these scale-score intervals reflect extensive data collection, statistical analysis, and informed judgment, their precision should not be overemphasized. If another standard setting were held with different panelists, it is likely that the cut scores would vary to some extent.

Initially, the committee hoped to set cut scores for an overall score that combined the three literacy areas. This was not possible, however, because the statistical procedures used to estimate each question's difficulty level were not run in a way that would allow the combination of questions from the different literacy areas. Thus, although we provide a set of overall performance levels that combine the descriptions for each literacy area, cut scores could not be set on an overall scale.

We note that there are significant problems at both the lower and upper ends of the literacy scale.
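The intervals in Recommendation 5-1 amount to a simple lookup from scale score to performance level within each literacy scale. A minimal sketch (the function and dictionary names are our own; participants routed to ALSA by the screening questions are handled with a flag, since they are categorized by assignment rather than by scale score):

```python
# Lower bounds of basic, intermediate, and advanced, per Recommendation 5-1.
CUT_SCORES = {
    "prose":        (210, 265, 340),
    "document":     (205, 250, 335),
    "quantitative": (235, 290, 350),
}

def performance_level(scale, score, took_alsa=False):
    """Map a 0-500 scale score to one of the five recommended
    performance levels for a single literacy scale."""
    if took_alsa:
        return "nonliterate in English"
    basic, intermediate, advanced = CUT_SCORES[scale]
    if score >= advanced:
        return "advanced"
    if score >= intermediate:
        return "intermediate"
    if score >= basic:
        return "basic"
    return "below basic"

print(performance_level("prose", 264))         # top of the 210-264 interval
print(performance_level("quantitative", 290))  # bottom of the 290-349 interval
```

As the report cautions, such classifications carry the imprecision of the standard setting itself; scores near a cut point should not be over-interpreted.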
At the lower end of the scale, the problems relate to the test designers' decision to develop ALSA as a separate component and not to place ALSA and NAAL scores on the same scale. With regard to the upper end of the scale, feedback from the bookmark panelists, combined with our review of the items, suggests that the assessment does not adequately cover the upper end of the distribution of literacy skills. We note that there is growing public concern about readiness for college-level work and preparedness for entry into technical and professional occupations, but
NAAL, as currently designed, will not allow for detection of problems at that level. It is therefore with some reservations that we include the advanced category in our recommendation for performance levels, and we leave it to the Department of Education to ultimately decide on the utility and meaning of this category. With regard to these issues, we recommend:

RECOMMENDATION 5-2: Future development of NAAL should include more comprehensive coverage at the lower and upper ends of the continuum of literacy skills. At the lower end, the assessment should include evaluation of the extent to which individuals are able to recognize letters and numbers and read words and simple sentences, to allow determination of which individuals have the basic foundation skills in literacy and which individuals do not. This assessment should be part of NAAL and should yield information used in calculating scores for each of the literacy areas. At the upper end of the continuum of literacy skills, future development should include assessment items necessary to identify the extent to which policy interventions are needed at the postsecondary level and above.

OTHER ISSUES

Communicating Survey Results

Experience with the initial release and subsequent media coverage of the 1992 NALS results highlighted the critical importance of clearly communicating assessment results so that they are interpreted correctly and are useful to the various audiences concerned about adult literacy in the United States. The substantive challenge will be to convey the message that literacy is not a unidimensional concept or an all-or-nothing state. That message will be most understandable and useful to the public and policy makers if it is anchored in the competencies and life circumstances associated with each performance level and each of the three literacy areas.
We therefore encourage the Department of Education to present the NAAL results along with their implications for the different contexts in which adults function, such as employment and the workplace, health and safety, home and family, community and citizenship, consumer economics, and leisure and recreation, as well as for the different aspects of life affected by literacy. In addition, the department should prepare different versions of the performance-level descriptions that are tailored to meet the needs of various audiences. Simple descriptions of the performance levels should be prepared for general audiences to enhance public understanding. More detailed descriptions should be developed to be responsive to the needs of
other users. The report includes several versions of performance-level descriptions that could serve as a starting place for such efforts.

Policy Interventions for Low-Literate Adults

With the development of ALSA in 2003, specific attention was focused on the skills of low-literate adults, and one would expect that many services will be directed at the needs of this group. The nonliterate in English and below basic categories are likely to be heterogeneous, encompassing English speakers who have weak literacy skills, non-English speakers who are highly literate in their native languages but not in English, and non-English speakers who are not literate in any language. Distinctly different services and strategies will be needed for these groups.

Reports of the percentages of adults in the nonliterate in English and below basic categories should distinguish between native English speakers and non-English speakers. This will allow more appropriate conclusions to be drawn about the extent of literacy problems among native English-speaking adults and the share of adults in the United States who are still learning English and therefore cannot handle literacy tasks in English.

Exemplifying the Performance Levels

Presentations of the 1992 results included samples of released NALS items that illustrated the skills represented by each of the performance levels. Items were used to exemplify (or were "mapped" to) the performance level at which there was an 80 percent probability of an examinee's responding correctly. Mapping procedures are useful for communicating about test performance, but we suggest that the department carefully consider the ways in which released items are used to illustrate the skills represented by the performance levels. The simplest displays should avoid the use of response probabilities and simply indicate the proportion of people in a given level who can do the item.
If the department decides to use an item-mapping procedure, we suggest that presentations include more than one response probability for each item (e.g., 80 and 60 percent) and encourage the use of displays that emphasize that individuals at every score point and in every performance level have some probability of responding correctly to each item. This will promote understanding of the strengths and weaknesses of those scoring at each level.

Developing a Dissemination Strategy

To ensure that an accurate, nuanced message is effectively conveyed, the department should consider a variety of dissemination strategies beyond publication of the results, press releases, and news conferences. These should include information on the type of literacy that is assessed and recognition that many of the individuals who score in the lowest levels are English learners. We encourage the department to enlist the services of communication professionals to develop materials that present a clear and accurate message; to pilot test the interpretation of those materials with focus groups; and to revise them as appropriate before release. A briefing strategy should be developed that includes prebriefings for department policy makers and congressional staff. These groups should be briefed in detail on the supportable inferences from the findings before the official release of NAAL results.

Future Literacy Assessments

The committee understands that there are currently no plans to conduct a follow-up to NAAL. In our judgment, ongoing assessment of the literacy skills of this nation's adults is important, and planning for a follow-up to NAAL should begin now. In an effort to be forward looking, we offer several suggestions for ways to improve the assessment instrument and expand on the literacy skills assessed.

Demand-Side Analysis of Critical Skills

It is clear from the conclusions drawn about the 1992 results that stakeholders expected the findings to inform them about the percentages of adults whose literacy skills were adequate to function well in society. Although NALS was not designed for this purpose, an assessment could be designed to support interpretations about the skills adults need and should have. Many testing programs, such as those used to make credentialing decisions, begin the development process by gathering information from experts in the specific fields about the skills and capabilities essential to successful performance. NALS and NAAL currently draw items from six contexts of daily life.
An alternative approach to test development would analyze the literacy demands in each context and identify the essential proficiencies. The standard-setting process could then articulate the level of skills required to function adequately in the six areas. Each new version of NAAL should update the items to reflect current literacy requirements and expectations in each context but also retain some time-invariant items to allow for trend analysis.

We therefore suggest that the department work with relevant domain-specific experts, stakeholders, and practitioners to identify the critical literacy demands in at least six contexts: work, health and safety, community and citizenship, home and family, consumer economics, and leisure and
recreation. Future generations of NAAL should be designed to measure these critical skills and should be developed from the outset to support standards-based inferences about the extent to which adults are able to perform these critical skills.

Feedback from experts in each of the contexts could also be used to expand the information collected on the background questionnaire, following the procedures used to design additional questions about individuals' health and safety habits on the 2003 background questionnaire. Similar procedures could be used to link demand-side analyses with the construction of the background questionnaire items for each context. This approach would also facilitate the validation of performance standards.

Broadening the Scope of Coverage

Several decisions made during the design of NAAL served to narrow its focus and the type of information obtained. In our view, future generations of NAAL could be broadened in terms of content coverage and sampling procedures. Quantitative literacy, as conceived for NALS and NAAL, evaluates relatively basic arithmetic skills but commingles the evaluation of mathematics skills with reading and locating information in texts. Other adult literacy assessments (e.g., the Adult Literacy and Lifeskills Survey) have moved to include a more mathematically based numeracy component. Neither NALS nor NAAL was meant to be a formal test of mathematical proficiency in higher-level domains, and we are not suggesting that this should be the case. We think, however, that the mathematical demands of a technological society require more than the basic grasp of whole numbers and money reflected in NAAL.
The department should consider revising the quantitative literacy component on future assessments of adult literacy to include a numeracy component assessed as a separate construct, less tied to prose or document literacy but still reflective of the types of tasks encountered by adults in everyday life. The numeracy skills to include on the assessment should be identified as part of an analysis of critical literacy demands in the six content areas. The types of numeracy skills assessed on the Adult Literacy and Lifeskills Survey could serve as a starting place for identifying critical skills.

Currently, NAAL collects background information only from those who speak sufficient English or Spanish to understand and respond to the screening and background questions. No information is collected about those who do not speak English or Spanish, unless an interpreter happens to be present at the time of the assessment, and even then the information collected covers only age, ethnicity, and gender. We recognize that NAAL is
intended to be an assessment of English literacy skills and that assessing competence in other languages is not the goal. Nevertheless, it is important to paint a nuanced picture of the skills and backgrounds of the entire population. If background questions were asked in a language newcomers could understand, the range of information obtained would be much broader, and policy makers would gain a more accurate picture of the literacy needs in this country.

The department should seek to expand the information obtained about non-English speakers in future assessments of adult literacy, including background information about formal education, training, and work experience here and abroad, and self-reports about the use of print materials in languages other than English. Efforts should also be made to be more structured in collecting background information about individuals who speak languages other than English and Spanish and to better address the challenges of translation.

For NALS and NAAL, literacy has been construed in a specific way. The concept of literacy changes over time, however, as expectations for knowledge and skill levels increase, and it changes with the advent of new mediating technologies. We suggest that the definition of literacy be reconsidered and possibly broadened for future assessments of adults' literacy skills. Issues that should be considered in developing the definition include assessment of writing and composition skills; assessment of technology-mediated literacy skills; and the role of computers, the Internet, and technology in evaluating literacy skills.

CONCLUSION

The committee has suggested some far-reaching recommendations for future literacy assessments. Most notably, we recommend an alternative approach to test development, one that considers the tasks of daily living to identify the critical literacy demands that will guide development of the item pool.
This approach could change the nature of the assessment, the test administration processes, and the meaning of the results. We recognize that such extensive modifications of the assessment would make it difficult to measure trends in adult literacy, which is also an important goal. These competing goals must be carefully weighed in the design of future assessments. Regardless of whether any of the proposed changes are implemented, the committee recommends that, in the future, the process of developing the performance levels be carried out concurrently with the process of designing the assessment and constructing the items.