A Pragmatic Future for NAEP
CONTAINING COSTS AND UPDATING TECHNOLOGIES
Panel on Opportunities for the National Assessment of Educational
Progress in an Age of AI and Pervasive Computation: A Pragmatic Vision
Committee on National Statistics
Division of Behavioral and Social Sciences and Education
A Consensus Study Report of
THE NATIONAL ACADEMIES PRESS
Washington, DC
www.nap.edu
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
This activity was supported by a contract between the National Academy of Sciences and the U.S. Department of Education, under Sponsor Award No. 9199-00-21-C-0002. Support for the work of the Committee on National Statistics is provided by a consortium of federal agencies through a grant from the National Science Foundation, a National Agricultural Statistics Service cooperative agreement, and several individual contracts. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-27532-3
International Standard Book Number-10: 0-309-27532-6
Digital Object Identifier: https://doi.org/10.17226/26427
Additional copies of this publication are available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.
Copyright 2022 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Academies of Sciences, Engineering, and Medicine. (2022). A Pragmatic Future for NAEP: Containing Costs and Updating Technologies. Washington, DC: The National Academies Press. https://doi.org/10.17226/26427.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
PANEL ON OPPORTUNITIES FOR THE NATIONAL ASSESSMENT OF EDUCATIONAL PROGRESS IN AN AGE OF AI AND PERVASIVE COMPUTATION: A PRAGMATIC VISION
KAREN J. MITCHELL (Chair), Association of American Medical Colleges (retired)
ISAAC I. BEJAR, Educational Testing Service (retired)
SEAN PATRICK (JACK) BUCKLEY, Roblox, New York, NY
BRIAN GONG, Center for Assessment, Dover, NH
ANDREW D. HO, Harvard Graduate School of Education
STEPHEN LAZER, Questar Assessment Incorporated, Cape May, NJ
SUSAN M. LOTTRIDGE, Cambium Assessment, Inc., Harrisonburg, VA
RICHARD M. LUECHT, School of Education, University of North Carolina at Greensboro
ROCHELLE S. MICHEL, Curriculum Associates, Lawrenceville, NJ
SCOTT NORTON, Council of Chief State School Officers, Baton Rouge, LA
JOHN WHITMER, Federation of American Scientists, Davis, CA
STUART W. ELLIOTT, Study Director
JUDITH KOENIG, Senior Program Officer
ANTHONY MANN, Program Associate
COMMITTEE ON NATIONAL STATISTICS
ROBERT M. GROVES (Chair), Office of the Provost, Georgetown University
LAWRENCE D. BOBO, Department of Sociology, Harvard University
ANNE C. CASE, Woodrow Wilson School of Public and International Affairs, Princeton University
MICK P. COUPER, Institute for Social Research, University of Michigan
JANET M. CURRIE, Woodrow Wilson School of Public and International Affairs, Princeton University
DIANA FARRELL, JPMorgan Chase Institute, Washington, DC
ROBERT GOERGE, Chapin Hall at the University of Chicago
ERICA L. GROSHEN, School of Industrial and Labor Relations, Cornell University
HILARY HOYNES, Goldman School of Public Policy, University of California, Berkeley
DANIEL KIFER, Department of Computer Science and Engineering, The Pennsylvania State University
SHARON LOHR, School of Mathematical and Statistical Sciences, Arizona State University, Emerita
JEROME P. REITER, Department of Statistical Science, Duke University
JUDITH A. SELTZER, Department of Sociology, University of California, Los Angeles
C. MATTHEW SNIPP, School of the Humanities and Sciences, Stanford University
ELIZABETH A. STUART, Department of Mental Health, Johns Hopkins Bloomberg School of Public Health
JEANNETTE WING, Data Science Institute and Computer Science Department, Columbia University
BRIAN HARRIS-KOJETIN, Director
MELISSA CHIU, Deputy Director
CONSTANCE F. CITRO, Senior Scholar
Preface
The National Assessment of Educational Progress (NAEP) has long served an important role in helping educators, policy makers, and the public understand what students in the United States know and can do. It regularly reports on achievement in three grades, doing so with sophisticated sampling and estimation procedures that minimize the amount of testing time and maximize the quality and reliability of the scores. It is known for the integrity of the trend information it provides and for illuminating achievement differences among groups.
The NAEP program recognizes the value of staying current with measurement practices. When the measurement field began relying on new item types, NAEP adapted, figuring out ways to incorporate new approaches into its practices: constructed-response items, performance tasks, hands-on science experiments, and multiformat tasks to measure complex problem-solving skills.
However, NAEP has not kept pace with the measurement field’s pursuit of innovative ways to evaluate what students know and can do using artificial intelligence methods. Computer-adaptive testing, automated item generation, and automated scoring are all rapidly making inroads into K–12 assessment with the promise of increased efficiency and lower costs. At the same time, cost containment has increasingly become an issue for NAEP. While NAEP is a highly respected program and a source of valuable information about America’s schoolchildren, it is also very expensive. Artificial intelligence and other contemporary methods offer the potential to control costs and increase efficiency, enabling NAEP to continue well into the future.
In this context, the Institute of Education Sciences (IES) of the U.S. Department of Education asked the National Academies of Sciences, Engineering, and Medicine (the National Academies) for advice about ways to maintain NAEP’s role as a leader in educational testing without making it cost prohibitive. This report is the response to that request.
The report would not have been possible without the contributions of many people.
On behalf of the panel, I extend our deepest appreciation to the sponsor of this work: without support from IES and staff with the National Center for Education Statistics (NCES), this study would not have come to fruition. In particular, we thank Mark Schneider, director of IES; Peggy Carr, commissioner, and William Tirre, senior technical advisor, at NCES; and the staff in the Assessment Division of NCES, including Gina Broxterman, Jing Chen, Allison Deigan, Enis Dogan, Pat Etienne, Eunice Greer, Shawn Kline, Dan McGrath, Nadia McLaughlin, Eddie Rivers, Holly Spurlock, and Bill Ward. Our colleagues at NCES spent countless hours responding to the panel’s questions about different aspects of the NAEP program.
We are grateful to Chair Haley Barbour of the National Assessment Governing Board (NAGB) and the members of NAGB’s Executive Committee, who met with members of the panel in August of 2021. In addition, we would like to thank the Governing Board staff, particularly Lesley Muldoon and Matt Stern, who provided the panel with insights about NAGB’s role and perspective on a number of issues.
As part of the panel’s desire to place NAEP in context, we benefited from information about other testing programs. Andreas Schleicher, at the Organisation for Economic Co-operation and Development (OECD), provided information about the Program for International Student Assessment (PISA). Joyce Zurkowski, of the Colorado Department of Education, provided us with an understanding of Colorado’s state assessment program.
In finalizing the draft report, the panel asked for help in fact-checking the sections of the report that described aspects of the NAEP program, as well as other assessments (PISA and the Colorado state assessment program). The individuals noted above who originally provided this information—from IES, NCES, NAGB, OECD, and the Colorado Department of Education—reviewed portions of the text that reflected their input to the panel’s work and corrected any inaccuracies. The panel is grateful for this additional assistance.
This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their review of this report: Sybilla Beckmann, Department of Mathematics, Emeritus, University of Georgia; Matthew Chingos, Education and Data Policy, The Urban Institute; Steven A. Culpepper, Department of Statistics and Beckman Institute for Advanced Science and Technology, University of Illinois at Urbana-Champaign; Kristen Huff, Assessment and Research, Curriculum Associates, MA; Neal Kingston, Achievement and Assessment Institute and Department of Educational Psychology, University of Kansas; Kenneth R. Koedinger, Pittsburgh Science of Learning Center and School of Computer Science, Carnegie Mellon University; P. David Pearson, Graduate School of Education, University of California, Berkeley; Shelley Loving-Ryder, Virginia Department of Education; Mark D. Shermis, Principal, Performance Assessment Analytics, TX; Martha L. Thurlow, National Center for Educational Outcomes, University of Minnesota; David Williamson, Psychometrics, The College Board; Phoebe C. Winter, Independent Consultant, VA; Marcelo Aaron Bonilla Worsley, School of Education and Social Policy, Northwestern University; and Rebecca J. Zwick, Distinguished Presidential Appointee, Educational Testing Service.
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by Diana C. Pullin, Lynch School of Education and School of Law, Boston College, and Catherine L. Kling, Atkinson Center for Sustainability, Cornell University. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring panel and the National Academies.
The panel also extends its gratitude to members of the staff of the National Academies for their significant contributions to this report. Anthony Mann organized our virtual meetings and guided us through the many administrative procedures. Kirsten Sampson Snyder shepherded the report through the review and production process, and consultant Eugenia Grohman provided her always-sage editorial advice.
Stuart Elliott, study director, and Judy Koenig, senior program officer, masterfully oversaw the design of the study, interviewed experts, recruited the panel, gathered resources and data, and guided the study with intelligence and care. They helped the panel get its bearings, become familiar with parts of the program they did not know, work their way through difficult topics, and focus on the most pressing issues. The panel’s work rests on their diligent efforts.
To my colleagues on the panel, it would be an understatement to say that I was inspired by your wisdom and dedication to improving this important marker of the progress of U.S. students. Your deep knowledge, careful thought, and intelligent analysis form the foundation of this report. You gave generously of your expertise and time to ensure that the report represents the panel’s consensus findings and recommendations and that it suggests a viable path for NAEP’s future. Thank you.
Karen J. Mitchell, Chair
Panel on Opportunities for the National Assessment of Educational Progress in an Age of AI and Pervasive Computation: A Pragmatic Vision
Contents
2 NAEP Overview: Structure, Goals, and Costs
Distinctive Goals and Processes
Changing the Way Trends Are Monitored and Reported
Integrating Assessments for Subjects with Overlapping Content
Automated and Structured Item Development
Changing the Mix of Item Types
5 Test Administration: Moving to a Local Model
Challenges and Flexibility with Local Administration with Computer-Based Delivery
Rethinking Standardization with Local Administration
Anticipated Cost Savings from Local Administration
6 Test Administration: Other Possible Innovations
Testing Two Unrelated Subjects for Each Student
Reconsidering the Sample Sizes Needed to Achieve NAEP’s Purposes
Coordinating Resources with NCES’s International Assessments
Automated Scoring of Constructed-Response Items
Anticipated Cost Reductions from Automated Scoring
Innovative Analysis and Reporting
9 Technological Infrastructure
Vision for a Technological Infrastructure for NAEP
Development of the Next-Gen eNAEP Platform
10 Program Management, Planning, Support, and Oversight
Taking a Systemic Approach to Designing Assessment Programs
11 Summary: A New Path for NAEP
Clarifying and Detailing NAEP’s Costs
Changing the Way Trends Are Monitored and Reported
Integrating Assessments for Subjects with Overlapping Content
Updating the Item Development Process
Modernizing NAEP Administration