Suggested Citation:"6 Reporting." National Research Council. 1999. Evaluation of the Voluntary National Tests: Phase 1. Washington, DC: The National Academies Press. doi: 10.17226/6324.

6
Reporting

The National Assessment Governing Board (NAGB) has made very clear its intention that Voluntary National Tests (VNT) results should be reported using National Assessment of Educational Progress (NAEP) achievement levels. Presumably, this means that each student and his or her parents and teachers would be told whether performance on the test reflects below basic, basic, proficient, or advanced mastery of the reading or mathematics skills outlined in the test frameworks.

More specific discussion of reporting has been largely postponed. NAGB reviewed a “Revised Test Result Reporting Work Plan” (American Institutes for Research [AIR], April 23, 1998) at its May 1998 meeting. This plan outlined a number of research steps, from literature review through focus groups, that might be undertaken to identify and resolve reporting issues and problems. The plan did not propose any specific policies or even attempt to enunciate key reporting issues. The schedule called for NAGB approval of field test reporting plans in August 1999, with decisions on reporting for the operational test in August 2000. In this section, we discuss four key issues in reporting VNT results and describe the implications of decisions about these issues for other test development activities.

Key Issues

The charge for phase 1 of our evaluation does not emphasize evaluation of reporting plans, and, as noted above, final decisions on many reporting issues are not yet available for review. Nonetheless, we discuss several reporting issues here that we hope will be addressed in the final plans for reporting. These include:

  • the validity of the achievement-level descriptions,
  • communicating uncertainty in VNT results,
  • computing aggregate results for schools, districts, and states, and
  • providing more complete information on student achievement.

The Validity of the Achievement-Level Descriptions

NAEP's procedures for setting achievement levels and their results have been the focus of considerable review (see Linn et al., 1991; Stufflebeam et al., 1991; U.S. General Accounting Office, 1993; National Academy of Education, 1992, 1993a, 1993b, 1996; National Research Council, 1999a). Collectively, these reviews agree that achievement-level results do not appear to be reasonable relative to numerous other external comparisons, such as course-taking patterns and data from other assessments, on which larger proportions of students perform at high levels. Furthermore, neither the descriptions of expected student competencies nor the exemplar items appear appropriate for describing actual student performance at the achievement levels defined by the cutscores. Evaluators have repeatedly concluded that the knowledge and skills assessed by exemplar items do not match up well with the knowledge and skill expectations put forth in the achievement-level descriptions, nor do the exemplars provide a reasonable view of the range of types of performance expected at a given achievement level.

The design of the VNT will expose the achievement-level descriptions to a much higher level of scrutiny than has previously occurred. They will be applied to individual students—not just to plausible values. The classification of students into the achievement levels will be based on a smaller set of items than is used in a NAEP assessment, and all of these items will be released and available for public review. Judgments about the validity of the achievement level descriptions will be based in large part on the degree to which the items used to classify students into achievement levels appropriately match the knowledge and skills covered in the achievement-level descriptions.

In Chapter 2 we recommend greater integration of the achievement-level descriptions with the test specifications, and in Chapter 3 we recommend matching the VNT items to the knowledge and skills in these descriptions. Consideration should also be given to ways in which the link between items and the achievement-level descriptions could be made evident in reporting. For example, the description of proficient performance in 4th-grade reading includes “recognizing an author's intent or purpose,” while the description of advanced performance includes “explaining an author's intent, using supporting material from the story/informational text.” Given these descriptions, it would be helpful to provide information to students classified at the proficient level as to how they failed to meet the higher standard of advanced performance.

Communicating Uncertainty in VNT Results

Test results are based on responses to a sample of items provided by the student on a particular day. The statistical concept of reliability focuses on how much results would vary over different samples of items or at different times. In reporting aggregate results for schools, states, or the nation, measurement errors are averaged across a large number of students and are not a critical factor in the accuracy of the results. When results are reported for individual students, however, as they will be for the VNT, measurement error is a much more significant issue.

The report of the Committee on Equivalency and Linkage (National Research Council, 1999c) describes how the same student could take several parallel versions of the VNT and end up with different, perhaps even quite different, achievement-level classifications. Such possibilities raise two key issues for reporting:

  • How can uncertainty about test scores best be communicated to parents, teachers, and other recipients of test results?
  • How much uncertainty will users be willing and able to tolerate?
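The scale of this problem can be illustrated with a small simulation. The sketch below assumes a normal measurement-error model and invented cutscores on an arbitrary 0-500 scale (none of these numbers come from NAEP or the VNT); it shows how often a student whose true proficiency sits just below a cutscore would be classified at different achievement levels on parallel forms.

```python
import random

# Hypothetical cutscores on an arbitrary 0-500 scale (illustrative
# only; these are NOT actual NAEP or VNT cutscores).
CUTSCORES = {"basic": 208, "proficient": 238, "advanced": 268}

def classify(score):
    """Map a scale score to an achievement level."""
    if score >= CUTSCORES["advanced"]:
        return "advanced"
    if score >= CUTSCORES["proficient"]:
        return "proficient"
    if score >= CUTSCORES["basic"]:
        return "basic"
    return "below basic"

def simulate_retests(true_score, sem, n_forms, seed=0):
    """Classify a student on n_forms parallel forms, treating each
    observed score as true score plus normal measurement error."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_forms):
        observed = rng.gauss(true_score, sem)
        level = classify(observed)
        counts[level] = counts.get(level, 0) + 1
    return counts

# A student whose true score sits just below the "proficient"
# cutscore, with an assumed standard error of measurement of 12.
print(simulate_retests(true_score=235, sem=12, n_forms=1000))
```

With these assumed numbers, a substantial fraction of the 1,000 simulated retests lands on each side of the cutscore: the same student is classified basic on some forms and proficient on others, which is precisely the uncertainty a reporting scheme must communicate.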

Computing Aggregate Results for Schools, Districts, and States

Another issue identified by the Committee on Equivalency and Linkage concerns differences in reporting individual and aggregate results. NAEP uses sophisticated methodology to provide accurate estimates of the proportion of students at each achievement level. These methods involve conditioning on background variables and creating multiple “plausible values” for each student on the basis of their responses to test questions and their background information. (For a more complete explanation of this methodology, see Allen et al., 1998.)

We believe that student-level reporting will drive the need for accuracy in VNT results, but tolerance for different levels of accuracy in aggregate results should be explored before final decisions about test accuracy requirements are reached. The VNT contractors have begun to discuss alternatives for reporting aggregate results, ranging from somewhat complex procedures for aggregating each student's probabilities of being at each level to ways of distancing the results of the two programs so that conflicts will not be alarming and, perhaps, not even visible.

One way of resolving the aggregation issue that has not been extensively discussed would be to generate two scores for each student. The first, called a reporting score, would be the best estimate of each student's performance, calculated either from a tally of correct responses or using an IRT scoring model. The second, called an aggregation score, would be appropriate for estimating aggregate distributions and would be based on the plausible values methodology used for NAEP (see Allen et al., 1998, for a discussion of the plausible values method).
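The need for a distinct aggregation score can be illustrated with a simulation. The sketch below uses invented numbers (a hypothetical cutscore and a normal measurement-error model, not the NAEP plausible-values machinery): tallying individual point classifications overstates the spread of the proficiency distribution, because measurement error pushes extra students across the cutscore.

```python
import random

# Illustrative sketch of why tallying individual point estimates can
# misstate an aggregate distribution. All numbers are assumed, not
# taken from the VNT or NAEP.
CUT = 238  # hypothetical "proficient" cutscore on an arbitrary scale
SEM = 12   # assumed standard error of measurement
rng = random.Random(1)

# A cohort whose true proficiencies cluster just below the cutscore.
true_scores = [rng.gauss(230, 10) for _ in range(20000)]

# Observed scores add measurement error to the true proficiencies.
observed = [rng.gauss(t, SEM) for t in true_scores]

# The quantity an aggregate report should target: the share of the
# cohort truly at or above the cutscore.
true_share = sum(t >= CUT for t in true_scores) / len(true_scores)

# A tally of individual point classifications: measurement error
# widens the observed distribution and pushes extra students over
# the cutscore, inflating the estimate.
tallied_share = sum(s >= CUT for s in observed) / len(observed)

print(f"share truly at/above cut:      {true_share:.3f}")
print(f"share by tallying point calls: {tallied_share:.3f}")
```

In this sketch the tallied share noticeably exceeds the true share, which is the kind of disagreement with NAEP's conditioned estimates that an aggregation score based on plausible values is designed to avoid.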

Providing More Complete Information on Student Achievement

A key question that parents and teachers are likely to have is how close a student is to the next higher (or lower) achievement-level boundary. This question is particularly important for the presumably large proportion of students whose performance will be classified as below the basic level of achievement.

Diagnostic information, indicating areas within the test frameworks for which students had or had not achieved targeted levels of proficiency, could serve very useful instructional purposes, pointing to specific areas of knowledge and skill in which students are deficient. The amount of testing time required for providing more detailed information accurately is likely to be prohibitive, however. In addition, the fact that the NAEP and VNT frameworks are designed to be independent of any specific curriculum further limits the instructional value of results from the VNT.

Using subcategories or a more continuous scale (such as the NAEP scale) for reporting nearness to an achievement boundary may be much more feasible given current test plans for length and accuracy levels. It might be possible, for example, to report whether students are at the high or low end (or in the middle) of the achievement level in which they are classified. Using such a scale, however, would require acceptance of an even greater level of uncertainty than would be needed for the achievement-level reporting.
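One way such within-level reporting might work is sketched below; the cutscores, the 0-500 scale, and the division of each level into thirds are all hypothetical choices for illustration, not proposals from NAGB or the contractors.

```python
# Minimal sketch of "nearness to boundary" reporting: place a score
# in its achievement level and in the low/middle/high third of that
# level's score band. Cutscores and scale are hypothetical.
CUTS = [("below basic", 0), ("basic", 208),
        ("proficient", 238), ("advanced", 268)]
SCALE_MAX = 500

def level_and_band(score):
    """Return (achievement level, position within that level)."""
    levels = [name for name, _ in CUTS]
    bounds = [lo for _, lo in CUTS] + [SCALE_MAX]
    for i in range(len(levels) - 1, -1, -1):
        lo, hi = bounds[i], bounds[i + 1]
        if score >= lo:
            frac = (score - lo) / (hi - lo)
            band = ("low" if frac < 1/3
                    else "middle" if frac < 2/3
                    else "high")
            return levels[i], band
    return levels[0], "low"

# A student just under the hypothetical "proficient" cutscore would
# be reported as basic, high end of the band.
print(level_and_band(235))
```

A report built this way could tell a student at the high end of the basic band that a small score gain would cross the proficient boundary, but, as noted above, the within-level position carries even more uncertainty than the level classification itself.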

Conclusions

Our key conclusion with regard to reporting is that a clear vision of how results will be reported should drive, rather than follow, other test development activities. If NAEP achievement-level descriptions are used in reporting, the map of test items to specific elements of these descriptions should be made evident. Decisions about factors that influence the accuracy of VNT results will also have to be made well in advance of the dates proposed for NAGB approval of reporting plans. As described above, decisions about test length, a key determinant of test score accuracy, have already been made without careful consideration of the level of accuracy that can be obtained with the specified test length. Other factors, such as item calibration and equating and linking errors, also influence the accuracy of VNT test results.

Methods used by NAEP, including conditioning and plausible values, are not appropriate for reporting individual student results and are not needed for the VNT. Without some adjustments, however, VNT results for individual students, when aggregated up to the state level, will disagree, in some cases markedly, with NAEP estimates of student proficiency, jeopardizing the credibility of both programs. This will occur even if there are no differences in student motivation between the VNT and NAEP.

No decision has been made about whether and how results will be reported in addition to the achievement levels. It seems likely that students, particularly those in the below basic category, will benefit from additional information, as will their parents and teachers.

Recommendations

6-1. NAGB should accelerate its discussion of reporting issues, with specific consideration of the relationship between test items and achievement-level descriptions.

Rather than waiting until August 1999, it would be prudent for NAGB and its contractors to determine how achievement-level information will be reported and examine whether items are sufficiently linked to the descriptions of the achievement levels. In addition, attention is needed to the level of accuracy likely to be achieved by the VNT as currently designed and to ways of communicating the corresponding degree of certainty to potential test users.

6-2. NAGB should develop ways of communicating information to users about measurement error and other sources of variation in test results.

6-3. NAGB should develop and review procedures for aggregating student test results prior to approving the field test reporting plan.

6-4. NAGB and AIR should develop and try out alternative ways of providing supplemental test result information. Policies on reporting beyond achievement-level categories should be set prior to the field test in 2000, with a particular focus on students who are below the basic level of achievement.


