The National Academies Press

Currently Skimming:

Testing, Evaluating, and Assessing Artificial Intelligence–Enabled Systems Under Operational Conditions for the Department of the Air Force: Proceedings of a Workshop - in Brief
Pages 1-17

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.

From page 1... ... The goals of the study are as follows: ... statement of task will be done in future data-gathering meetings by the workshop planning committee.
From page 2... ... Longstaff (Software Engineering Institute; workshop planning committee co-chair) was curious ... subordinate to the AFTC's 96th Test Wing, and the importance of DT.
From page 3... ... Shanahan (USAF; workshop planning committee member) ... data points. Pinelis also talked about a new method ...
From page 4... ... smaller but more frequent tests in multiple contexts and environments. There is also a push to evaluate the quality of decision making as performance. ... DAY 1: WORKSHOP PLANNING COMMITTEE DISCUSSION
From page 5... ... There was discussion regarding unknowns, such as the lack of ownership regarding liability and requirements. One workshop planning committee member commented on a ... difficult for a fee-for-service organization to solve a problem when they need a contract before hiring, tasking, building, and testing are available to address the problem.
From page 6... ... CDAO requires model cards. In closing, Kolda and Bieber engaged in discussion regarding data sequestration. ... Casterline and Chellappa discussed the applicability ...
From page 7... ... She responded that MIT LL was exploring ways to use simulators and to figure out how to integrate those into the training process. ... training data are collected and validated. The model then tests on a test data set similar to the one on which ...
From page 8... ... Overall, some key challenges concern trade-offs between the evaluation metrics that someone chooses and the quality of ground truth making a big difference. ... particularly given how basic their capabilities are right now. Wellman said he framed his stock trading example as a cautionary tale to show that it may not be possible to ...
From page 9... ... the less reason there is to doubt its efficacy for training or evaluating AI algorithms. Longstaff asked about ... Cities are not uniform and have many factors that could affect the algorithm.
From page 10... ... He reviewed some current societal drivers regarding future conflict, such as the lack of guidelines for managing escalation and the changing geography for future conflict (land, sea, air, space, cyber, etc.) ... Industry is profit-driven, has access to massive amounts of data, has a low cost of errors, and faces threats from commercial adversaries. DoD is purpose-driven, has access to limited amounts of data, has a high cost of ...
From page 11... ... She defined a team as two or more teammates with heterogeneous roles and responsibilities who work independently toward a common goal. She then ... Longstaff asked, regarding the Synthetic Teammate Project, how she would write a requirement for someone else to develop that pilot program?
From page 12... ... if there were any takeaways regarding their work with HSI. Cooke responded that they had not done any three-agent teaming. ... The workshop planning committee ended the day by discussing the break between what happens in the ...
From page 13... ... Longstaff asked where in the requirements process they would know that a certain ... possible when developing models. Kolda and Yu talked about embedding T&E with operations.
From page 14... ... They may have back doors, either unintentionally or from poisoning, and if an adversary has access to the model, it enables white-box attacks. Draper also stated that when training your models, you should avoid using untrusted training data, avoid boot-strapping from untrusted models, and keep information about training data private. ... Zelnio introduced "some things that would be nice to measure in terms of evaluation." He spoke about measuring the reliability and confidence in a system, measuring understandability and trust of a system, measuring the robustness of a system, measuring the effectiveness of out-of-library confusers, measuring the ...
From page 15... ... Bjorkman spoke about AFTC's current objective of looking at the unique infrastructure needs within the test center and across different organizations to set itself up to test autonomous systems. ... The workshop planning committee began its wrap-up by discussing its final thoughts from the workshop. Casterline commented that she does not think that any of ...
From page 16... ... advances are incorporated. Rosenblum commented that, regarding the third question, he is worried that anything the workshop planning committee says will be outdated ... Longstaff discussed the major questions from the statement of task.
From page 17... ... 2023. Testing, Evaluating, and Assessing Artificial Intelligence–Enabled Systems Under Operational Conditions for the Department of the Air Force: Proceedings of a Workshop -- in Brief.

This material may be derived from roughly machine-read images, and so is provided only to facilitate research.