Division on Engineering and Physical Sciences
Consensus Study Report
NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
This activity was supported by a contract between the National Academy of Sciences and the Department of the Air Force under award number FA955016D00001 FA865121-F-9323. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-70439-7
International Standard Book Number-10: 0-309-70439-1
Digital Object Identifier: https://doi.org/10.17226/27092
This publication is available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.
Copyright 2023 by the National Academy of Sciences. National Academies of Sciences, Engineering, and Medicine and National Academies Press and the graphical logos for each are all trademarks of the National Academy of Sciences. All rights reserved.
Printed in the United States of America.
Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2023. Test and Evaluation Challenges in Artificial Intelligence–Enabled Systems for the Department of the Air Force. Washington, DC: The National Academies Press. https://doi.org/10.17226/27092.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process, and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
Rapid Expert Consultations published by the National Academies of Sciences, Engineering, and Medicine are authored by subject-matter experts on narrowly focused topics that can be supported by a body of evidence. The discussions contained in rapid expert consultations are considered those of the authors and do not contain policy recommendations. Rapid expert consultations are reviewed by the institution before release.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
COMMITTEE ON TESTING, EVALUATING, AND ASSESSING ARTIFICIAL INTELLIGENCE-ENABLED SYSTEMS UNDER OPERATIONAL CONDITIONS FOR THE DEPARTMENT OF THE AIR FORCE
MAY CASTERLINE, NVIDIA, Co-Chair
THOMAS A. LONGSTAFF, Carnegie Mellon University, Co-Chair
CRAIG R. BAKER, Baker Development Group, LLC
ROBERT A. BOND, Massachusetts Institute of Technology
RAMA CHELLAPPA (NAE), Johns Hopkins University
TREVOR DARRELL, University of California, Berkeley (until December 2022)
MELVIN GREER, Intel Corporation
TAMARA G. KOLDA (NAE), Independent Consultant, MathSci.ai
NANDI O. LESLIE, Raytheon Technologies (until December 2022)
ROBIN R. MURPHY, Texas A&M University
DAVID S. ROSENBLUM, George Mason University
JOHN (JACK) N.T. SHANAHAN, United States Air Force (retired)
HUMBERTO SILVA III, Sandia National Laboratories (until December 2022)
REBECCA WILLETT, University of Chicago
Staff
RYAN MURPHY, Program Officer
GEORGE COYLE, Senior Program Officer
EVAN ELWELL, Research Associate
CHARLES YI, Research Assistant
MARTA HERNANDEZ, Program Coordinator
AMELIA A. GREEN, Senior Program Assistant (until July 2022)
AIR FORCE STUDIES BOARD
ELLEN M. PAWLIKOWSKI (NAE), Independent Consultant, Chair
CHRISTOPHER P. AZZANO, Booz Allen Hamilton
KEVIN G. BOWCUTT (NAE), Boeing Company
RAMA CHELLAPPA (NAE), Johns Hopkins University
MARK F. COSTELLO, Georgia Institute of Technology
DANIEL A. DELAURENTIS, Purdue University
BONNIE J. DUNBAR (NAE), Texas A&M University
JAMES M. HOLMES, Red 6
DEBORAH L. JAMES, Independent Consultant
CHRISTOPHER T. JONES (NAE), Leadership Compass
EDWARD M. LAWS (NAM), Harvard University
LESTER L. LYLES (NAE), Independent Consultant
VALERIE M. MANNING, Overair
WENDY MASIELLO, Independent Consultant
LAURA J. MCGILL (NAE), Sandia National Laboratories
HENDRICK W. RUCK, Edaptive Computing, Inc.
JULIE J.C.H. RYAN, Wyndrose Technical Group
MICHAEL SCHNEIDER, Lawrence Livermore National Laboratory
Staff
ELLEN CHOU, Board Director
GEORGE COYLE, Senior Program Officer
RYAN MURPHY, Program Officer
ALEX TEMPLE, Program Officer
MARTA HERNANDEZ, Program Coordinator
EVAN ELWELL, Research Associate
AMELIA A. GREEN, Senior Program Assistant (until July 2022)
CHARLES YI, Research Assistant
DONOVAN THOMAS, Financial Business Partner
Reviewers
This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their review of this report:
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by STEVE BELLOVIN, Columbia University, and BOB SPROULL, University of Massachusetts Amherst. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring committee and the National Academies.
Contents
1.1 A Central Question: How to Achieve Sufficient Confidence in AI-Enabled Systems?
1.2 Study Questions to Be Addressed
1.3 What Do We Mean by “Artificial Intelligence”?
1.4 Current State of the Art of AI
1.5 Current State of the Practice of AI in the DAF
1.6 Algorithmic Warfare Cross-Functional Team (Project Maven) Case Study
2 DEFINITIONS AND PERSPECTIVES
2.2 Role of Data in AI-Enabled Systems
2.3 History of T&E in AI-Enabled Systems
3 TEST AND EVALUATION OF DAF AI-ENABLED SYSTEMS
3.3 OSD and DAF T&E Policies for AI-Enabled Systems
3.4 AI T&E in the Commercial Sector
3.5 Contrast of Commercial and DoD Approaches to AI T&E
3.6 Trust, Justified Confidence, AI Assurance, Trustworthiness, and Buy-In
3.7 Risk-Based Approach to AI T&E
4 EVOLUTION OF TEST AND EVALUATION IN FUTURE AI-BASED DAF SYSTEMS
4.2 Appointing a DAF AI T&E Champion
4.3 Establishing AI T&E Requirements
4.4 Culture Change and Workforce Development
4.5 Summary of Implications of Future AI for DAF T&E
5 AI TECHNICAL RISKS UNDER OPERATIONAL CONDITIONS
5.2 General Risks of AI-Enabled Systems
5.3 AI Corruption Under Operational Conditions
5.4 Attack Surfaces for AI-Enabled Systems
5.5 Risk of Adversarial Attacks
5.6 Network Security and Zero Trust Implications
5.7 Robust and Secure AI Models
5.8 Research in T&E to Address Adversarial AI
6 EMERGING AI TECHNOLOGIES AND FUTURE T&E IMPLICATIONS
6.3 Informed Machine Learning Models
Preface
At the request of the 96th Test Wing of the U.S. Air Force and Air Force Materiel Command, the National Academies of Sciences, Engineering, and Medicine were asked to convene a committee to conduct a consensus study to examine the Air Force Test Center’s technical capabilities and capacity to conduct rigorous and objective tests, evaluations, and assessments of artificial intelligence (AI)-enabled systems under operational conditions and against realistic threats.
The National Academies of Sciences, Engineering, and Medicine appointed the Committee on Testing, Evaluating, and Assessing Artificial Intelligence-Enabled Systems Under Operational Conditions for the Department of the Air Force to conduct this study, per the Statement of Task found in Appendix A and Box P-1. The committee held its initial kickoff meeting in April 2022, conducted a data-gathering workshop in June 2022 (summarized in a Proceedings of a Workshop—in Brief, which can be found in Appendix E), and held further data-gathering sessions throughout 2022 and early 2023, including a site visit to Eglin Air Force Base. Agendas for the data-gathering meetings can be found in Appendix B. Biographies of the committee members can be found in Appendix C. Appendix D contains a list of acronyms and abbreviations used in the report.