Assessing and Improving AI Trustworthiness: Current Contexts and Concerns Proceedings of a Workshop in Brief
Pages 1-12

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 1...
... , which recently launched an AI trustworthiness initiative. In a series of five panel discussions and one keynote address, the workshop planning committee led participants through an overview of current practices in AI trustworthiness, attributes of trustworthy systems, and tools and assessments to better understand and communicate a system's trustworthiness.
From page 2...
... David Palmer of the Federal Reserve Board opened by discussing risk in financial modeling in general, and how it relates to AI. Palmer noted that oversight ...

1. National Security Commission on Artificial Intelligence, 2021, Final Report, March 1, https://www.nscai.
From page 3...
... , discussed quality assurance protocols for aviation software and how similar protocols might be used to ensure the quality of AI used in transportation. As Hart pointed out, existing protocols ensure the engagement of end users in software design, training for affected users and parties, and operational feedback to improve future design and ...

2. Food and Drug Administration, 2019, "Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)
From page 4...
... Palmer observed that the Federal Reserve Board develops principles-based guidance to ensure that banks understand the operations and risks of their models, but that prescriptive guidance specific to AI is difficult to produce and likely less useful given the rapid evolution of AI and its applications.

ATTRIBUTES OF AI TRUSTWORTHINESS: ROBUST, EXPLAINABLE, AND GENERALIZABLE

Katherine Heller of Duke University and Google Medical Brain opened the session by discussing her work on underspecification in AI.3 This work addresses the significant gap observed between AI systems' ability to generalize well within their training and testing domains and their ability to encode a credible claim about the world.
From page 5...
... Ensuring trustworthiness in this context, then, requires rethinking AI assessment. As an alternative to technical specifications that may be insufficient to capture the dimensions of trustworthiness, Madry suggested borrowing safety practices from medicine, wherein patients trust doctors in part because there is an infrastructure for reporting errors and malpractice, helping to align physician performance with understood principles of good medical care.
From page 6...
... First, she discussed a 2018 Microsoft Research investigation4 of industry challenges and needs for developing fair ML systems. Structured interviews with 35 practitioners across 10 technology companies uncovered differences between academic and industry approaches to ML fairness.
From page 7...
... The session organizers and moderators, Deirdre Mulligan and Aleksander Madry, then led a moderated discussion. The discussion began by asking the panelists what would be an acceptable baseline for assessing fairness, privacy, and other desiderata of trustworthy AI systems.
From page 8...
... Alexandra Chouldechova of Carnegie Mellon University presented her work with risk assessment instruments, statistical models that output the probability of an undesirable outcome based on a set of features. In many cases, these models suffer from omitted payoff bias, wherein high predictive accuracy does not always translate into optimal decision making, as decisions rely on factors beyond predictive accuracy alone.
From page 9...
... Last, addressing hate speech on Facebook may require an outcome-focused notion of fairness that incorporates equity considerations, as hate speech against certain groups may have much more pronounced consequences than hate speech against others. In the moderated discussion, led by session organizers Jeannette Wing and Susan Dumais, the participants posed questions to one another and offered suggestions to NIST for future work.
From page 10...
... The participants also offered other suggestions for NIST, including developing annual benchmarks for defined, desirable objectives on the use of AI, identifying measurable properties, presenting findings from NIST's studies on these topics in both a scientific format and a format more accessible to ordinary citizens, and producing real-world case studies.

WORKSHOP SYNTHESIS AND OUTCOMES

In the final session of the workshop, members of the workshop planning committee served as panelists, identifying key themes that emerged from the discussions and describing potential future work to advance the development of trustworthy AI systems.
From page 11...
... to promote more prospective explanatory models and to encourage the development of audit trail mechanisms that allow for the interrogation of a model's operations. As longer-term and larger-scale work, Wing discussed the possibility of NIST fostering the development of a third-party repository for sensitive data, or using its convening power to develop a series of sector-specific external review boards to complement the internal review processes that currently exist at many companies.
From page 12...
... BOARD STAFF: Jon Eisenberg and Brendan Roach, Computer Science and Telecommunications Board, Division on Engineering and Physical Sciences, National Academies of Sciences, Engineering, and Medicine.
REVIEWERS: To ensure that it meets institutional standards for quality and objectivity, this Proceedings of a Workshop in Brief was reviewed by Deborah Prince, Underwriters Laboratories; Ben Shneiderman, University of Maryland, College Park (NAE)

