Prepublication copy, uncorrected proofs A Pragmatic Future for NAEP: Containing Costs and Updating Technologies Panel on Opportunities for the National Assessment of Educational Progress in an Age of AI and Pervasive Computation: A Pragmatic Vision Committee on National Statistics Division of Behavioral and Social Sciences and Education A Consensus Study Report of
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001

This activity was supported by a contract between the National Academy of Sciences and the U.S. Department of Education, under Sponsor Award No. 9199-00-21-C-0002. Support for the work of the Committee on National Statistics is provided by a consortium of federal agencies through a grant from the National Science Foundation, a National Agricultural Statistics Service cooperative agreement, and several individual contracts. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.

International Standard Book Number-13:
International Standard Book Number-10:
Digital Object Identifier: https://doi.org/10.17226/26427

Additional copies of this publication are available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.

Copyright 2022 by the National Academy of Sciences. All rights reserved.

Printed in the United States of America

Suggested citation: National Academies of Sciences, Engineering, and Medicine. (2022). A Pragmatic Future for NAEP: Containing Costs and Updating Technologies. Washington, DC: The National Academies Press. https://doi.org/10.17226/26427.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.

The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.

The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.

The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.

Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study's statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee's deliberations. Each report has been subjected to a rigorous and independent peer-review process and represents the position of the National Academies on the statement of task.

Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.

For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
PANEL ON OPPORTUNITIES FOR THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS IN AN AGE OF AI AND PERVASIVE COMPUTATION: A PRAGMATIC VISION

KAREN J. MITCHELL (Chair), Association of American Medical Colleges (retired)
ISAAC I. BEJAR, Educational Testing Service (retired)
SEAN PATRICK (JACK) BUCKLEY, Roblox, New York, NY
BRIAN GONG, Center for Assessment, Dover, NH
ANDREW D. HO, Harvard Graduate School of Education
STEPHEN LAZER, Questar Assessment Incorporated, Cape May, NJ
SUSAN M. LOTTRIDGE, Cambium Assessment, Inc., Harrisonburg, VA
RICHARD M. LUECHT, School of Education, University of North Carolina at Greensboro
ROCHELLE S. MICHEL, Curriculum Associates, Lawrenceville, NJ
SCOTT NORTON, Council of Chief State School Officers, Baton Rouge, LA
JOHN WHITMER, Federation of American Scientists, Davis, CA

STUART W. ELLIOTT, Study Director
JUDITH KOENIG, Senior Program Officer
ANTHONY MANN, Program Associate

Note: See Appendix B, Disclosure of Unavoidable Conflict of Interest.
COMMITTEE ON NATIONAL STATISTICS

ROBERT M. GROVES (Chair), Office of the Provost, Georgetown University
LAWRENCE D. BOBO, Department of Sociology, Harvard University
ANNE C. CASE, Woodrow Wilson School of Public and International Affairs, Princeton University
MICK P. COUPER, Institute for Social Research, University of Michigan
JANET M. CURRIE, Woodrow Wilson School of Public and International Affairs, Princeton University
DIANA FARRELL, JPMorgan Chase Institute, Washington, DC
ROBERT GOERGE, Chapin Hall at the University of Chicago
ERICA L. GROSHEN, School of Industrial and Labor Relations, Cornell University
HILARY HOYNES, Goldman School of Public Policy, University of California, Berkeley
DANIEL KIFER, Department of Computer Science and Engineering, The Pennsylvania State University
SHARON LOHR, School of Mathematical and Statistical Sciences, Arizona State University, Emerita
JEROME P. REITER, Department of Statistical Science, Duke University
JUDITH A. SELTZER, Department of Sociology, University of California, Los Angeles
C. MATTHEW SNIPP, School of the Humanities and Sciences, Stanford University
ELIZABETH A. STUART, Department of Mental Health, Johns Hopkins Bloomberg School of Public Health
JEANNETTE WING, Data Science Institute and Computer Science Department, Columbia University

BRIAN HARRIS-KOJETIN, Director
MELISSA CHUI, Deputy Director
CONSTANCE F. CITRO, Senior Scholar
Preface

The National Assessment of Educational Progress (NAEP) has long served an important role in helping educators, policy makers, and the public understand what students in the United States know and can do. It regularly reports on achievement in three grades, doing so with sophisticated sampling and estimation procedures that minimize the amount of testing time and maximize the quality and reliability of the scores. It is known for the integrity of the trend information it provides and for illuminating achievement differences among subgroups.

The NAEP program recognizes the value of staying current with measurement practices. When the measurement field began relying on new item types, NAEP adapted, figuring out ways to incorporate new approaches into its practices: constructed-response items, performance tasks, hands-on science experiments, and multiformat tasks to measure complex problem-solving skills. However, NAEP has not kept pace with the measurement field's pursuit of innovative ways to evaluate what students know and can do using artificial intelligence methods. Computer-adaptive testing, automated item generation, and automated scoring are all rapidly making inroads into K-12 assessment with the promise of increased efficiency and lower costs.

At the same time, cost containment has increasingly become an issue for NAEP. While NAEP is a highly respected program and a source of valuable information about America's school children, it is also very expensive. Artificial intelligence and other contemporary methods offer the potential to control costs and increase efficiency, enabling NAEP to continue well into the future. In this context, the Institute of Education Sciences (IES) of the U.S.
Department of Education asked the National Academies of Sciences, Engineering, and Medicine (the National Academies) for advice about ways to maintain NAEP's role as a leader in educational testing without making it cost prohibitive. This report is the response to that request.

The report would not have been possible without the contributions of many people. On behalf of the panel, I extend our deepest appreciation to the sponsor of this work: without support from IES and staff with the National Center for Education Statistics (NCES), this study would not have come to fruition. In particular, we thank Mark Schneider, director of IES; Peggy Carr, commissioner, and William Tirre, senior technical advisor, at NCES; and the staff in the Assessment Division of NCES, including Gina Broxterman, Jing Chen, Allison Deigan, Enis Dogan, Pat Etienne, Eunice Greer, Shawn Kline, Dan McGrath, Nadia McLaughlin, Eddie Rivers, Holly Spurlock, and Bill Ward. Our colleagues at NCES spent countless hours responding to the panel's questions about different aspects of the NAEP program.

We are grateful to Chair Haley Barbour of the National Assessment Governing Board and the members of NAGB's Executive Committee, who met with members of the panel in August of 2021. In addition, we would like to thank the Governing Board staff, particularly Lesley Muldoon and Matt Stern, who provided the panel with insights about NAGB's role and perspective on a number of issues.

As part of the panel's desire to place NAEP in context, we benefited from information about other testing programs. Andreas Schleicher, at the Organization for Economic Cooperation and Development, provided information about the Program for International Student Assessment
(PISA). Joyce Zurkowski, of the Colorado Department of Education, provided us with an understanding of Colorado's state assessment program.

In finalizing the draft report, the panel asked for help in fact-checking the sections of the report that described aspects of the NAEP program, as well as other assessments (PISA and the Colorado state assessment program). The individuals noted above who originally provided this information (from IES, NCES, NAGB, OECD, and the Colorado Department of Education) reviewed portions of the text that reflected their input to the panel's work and corrected any inaccuracies. The panel is grateful for this additional assistance.

This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.

We thank the following individuals for their review of this report: Sybilla Beckmann, Department of Mathematics, Emeritus, University of Georgia; Matthew Chingos, Education and Data Policy, The Urban Institute; Steven A. Culpepper, Department of Statistics and Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign; Kristen Huff, Assessment and Research, Curriculum Associates, MA; Neal Kingston, Achievement and Assessment Institute and Department of Educational Psychology, University of Kansas; Kenneth R. Koedinger, Pittsburgh Science of Learning Center and School of Computer Science, Carnegie Mellon University; P.
David Pearson, Graduate School of Education, University of California, Berkeley; Shelley Loving-Ryder, Virginia Department of Education; Mark D. Shermis, Principal, Performance Assessment Analytics, TX; Martha L. Thurlow, National Center for Educational Outcomes, University of Minnesota; David Williamson, Psychometrics, The College Board; Phoebe C. Winter, Independent Consultant, VA; Marcelo Aaron Bonilla Worsley, School of Education and Social Policy, Northwestern University; and Rebecca J. Zwick, Distinguished Presidential Appointee, Educational Testing Service.

Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by Diana C. Pullin, Lynch School of Education and School of Law, Boston College, and Catherine L. Kling, Atkinson Center for Sustainability, Cornell University. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring panel and the National Academies.

The panel also extends its gratitude to members of the staff of the National Academies for their significant contributions to this report. Anthony Mann organized our virtual meetings and guided us through the many administrative procedures. Kirsten Sampson Snyder and Yvonne Wise shepherded the report through the review and production process, and consultant Eugenia Grohman provided her always-sage editorial advice. Stuart Elliott, study director, and Judy Koenig, senior program officer, masterfully oversaw the design of the study, interviewed experts, recruited the panel, gathered resources and data, and guided the study with intelligence and care.
They helped the panel get its bearings, become familiar with parts of the program they did not know, work their way through difficult topics, and focus on the most pressing issues. The panel's work rests on their diligent efforts.

To my colleagues on the panel, it would be an understatement to say that I was inspired by your wisdom and dedication to improving this important marker of the progress of U.S. students. Your deep knowledge, careful thought, and intelligent analysis form the foundation of this report. You gave generously of your expertise and time to ensure that the report represents the panel's consensus findings and recommendations and that it suggests a viable path for NAEP's future. Thank you.

Karen J. Mitchell, Chair
Panel on Opportunities for the National Assessment of Educational Progress in an Age of AI and Pervasive Computation: A Pragmatic Vision
Contents

Executive Summary

1 Introduction
    Charge to the Panel
    The Panel's Approach

2 NAEP Overview: Structure, Goals, and Costs
    Structure
    Distinctive Goals and Processes
    Costs

3 Possible Structural Changes
    Changing the Way Trends Are Monitored and Reported
    Integrating Assessments for Subjects with Overlapping Content

4 Item Development
    Current Costs
    Automated and Structured Item Development
    Changing the Mix of Item Types

5 Test Administration: Moving to a Local Model
    Current Costs
    Vision for a Device-Agnostic, Contactless NAEP
    Local Administration in the Paper-Based Era
    Challenges and Flexibility with Local Administration with Computer-Based Delivery
    Rethinking Standardization with Local Administration
    Anticipated Cost Savings from Local Administration

6 Test Administration: Other Possible Innovations
    Testing Two Unrelated Subjects for Each Student
    Reconsidering the Sample Sizes Needed to Achieve NAEP's Purposes
    Adaptive Testing
    Coordinating Resources with NCES's International Assessments

7 Item Scoring
    Current Costs
    Automated Scoring of Constructed-Response Items
    Anticipated Cost Reductions from Automated Scoring

8 Analysis and Reporting
    Current Costs
    Innovative Analysis and Reporting

9 Technological Infrastructure
    Current Costs
    Vision of a Technological Infrastructure for NAEP
    Development of the Next-Gen eNAEP Platform

10 Program Management, Planning, Support, and Oversight
    Current Costs
    Taking a Systemic Approach to Designing Assessment Programs

11 Summary: A New Path for NAEP
    Clarifying and Detailing NAEP's Costs
    Changing the Way Trends Are Monitored and Reported
    Integrating Assessments for Subjects with Overlapping Content
    Updating the Item Development Process
    Modernizing NAEP Administration
    Using Automated Item Scoring
    Adopting Innovative Analysis and Reporting
    Developing a Next-Generation Technology Platform
    Taking a Systematic Approach to Designing Assessment Programs
    A Vision for the Future

References

Appendix A: Biographical Sketches of Panel Members and Staff
Appendix B: Disclosure of Unavoidable Conflict of Interest