Suggested Citation:"2 Human-AI Teaming Methods and Models." National Academies of Sciences, Engineering, and Medicine. 2022. Human-AI Teaming: State-of-the-Art and Research Needs. Washington, DC: The National Academies Press. doi: 10.17226/26355.

2

Human-AI Teaming Methods and Models

There is a rich history of research on human-human teams and human-automation teams across military and civilian domains, including healthcare, manufacturing, process control, emergency response, engineering, and design. Across these and other domains, teams are recognized for the ability to coordinate and perform multiple roles beyond the skills or capabilities of a single individual (Salas, Cooke, and Rosen, 2008; Tsifetakis and Kontogiannis, 2019). This chapter reviews the findings of team research, suggests implications for human-AI teaming, and addresses several associated challenges. Although there are also important ethical considerations around the development of AI systems (Defense Innovation Board, 2019; Flathmann et al., 2021; Hagendorff, 2020; Montréal Responsible AI Declaration Steering Committee, 2018), this report will primarily focus on the development of effective AI systems for supporting human control and interaction to achieve mission goals.

TEAMS

Field-based studies of the long-term dynamics of military teams and operations, as well as team development and training, have been performed since the 1950s and 1960s (Goodwin, Blacksmith, and Coats, 2018; McGrath, 1984; Morgan, Salas, and Glickman, 1993). Nuclear submarine crews, Antarctic research deployments, and undersea research facilities have also been studied (Driskell, Salas, and Driskell, 2018; Gunderson, 1973; Radloff and Helmreich, 1968). Broader examinations of team processes and dynamics, such as McGrath (1984) and Sundstrom, DeMeuse, and Futrell (1990), describe military teams as a particular application of action/performance teams, with the distinguishing characteristics of skilled specialist roles, focused performance events, and improvisation due to the dynamics and unpredictable nature of tasks. In a review of team-studies literature, Salas, Bowers, and Cannon-Bowers (1995) define teams as including two or more individuals with common goals, role assignments, and interdependence. Additional team characteristics include decision making within a task context, specialized task-related knowledge and skills, and performance within the task-context constraints of time pressure, workload, and other conditions. The concept of mental models is an important element of task-related knowledge. Mental models refer to a team member’s organized information and perception of current states, situational dynamics, and contextual cues. Individual mental models allow for anticipation and prediction of future task conditions. Mental models that are shared among team members allow team members to anticipate and predict the needs and processes of other team members that are important for supporting mutual coordination (Goodwin, Blacksmith, and Coats, 2018). The shared mental models of team members often strongly affect performance in terms of understanding each other’s roles and predicted behaviors. The way team members are trained and aided to become effectively performing teams is of particular importance, and it is an area where AI has great potential (see Chapter 9).

Teams are created to perform a variety of tasks that require the coordination of multiple interdependent individuals (Cooke et al., 2007), and this definition does not require all team members to be human (see Chapter 3). Further, the performance of a team is not decomposable to, or an aggregation of, individual performances. This description emphasizes the interdependence of team members (Salas, Bowers, and Cannon-Bowers, 1995; Tsifetakis and Kontogiannis, 2019).

Task demands and team composition often vary over time. Current and future human-AI teaming military tasks will similarly be characterized primarily by their dynamic nature. The specific types of tasks and activities that a team performs must be accompanied by the component elements of a team, such as interdependent roles and expectations, support from other team members, common understandings, effective interactions, and mutual trust in others’ capabilities and performance (Cooke, 2018).

Relevant team characteristics include dimensions of team membership and team configurations (e.g., human-human, human-non-human, human-AI, or combinations thereof), sources of information and instruction, superordinate goals and priorities, and interdependence of teammate goals, as well as factors such as team cohesion, communication, and coordination (see Chapter 3). Further, teammates also perform their roles with a certain amount of operational independence or autonomy. When used in this sense, autonomy is not synonymous with AI; it refers to a dynamic functional state. The degree to which a human or AI system is autonomous is an operational question of independent function, addressing two performance-related queries: autonomy from whom, and autonomy to do what (Caldwell and Onken, 2011). In other words, a human who performs an action only upon receiving an order to do so is acting with low autonomy for that action. A system that automatically goes into shutdown mode when detecting specific onboard conditions, such as a piece of space hardware going into “safe mode,” is demonstrating high autonomy to protect critical performance capability (see Chapter 5).

HUMAN-AI TEAMING MODELS AND PERSPECTIVES

The use of AI in future military systems requires that humans can effectively control any systems that could potentially have lethal outcomes. The Department of Defense (DOD) stipulates that lethal autonomous weapons systems be designed to “allow commanders and operators to exercise appropriate levels of human judgment over the use of force” (DOD, 2012, p. 2). This does not require real-time control, “but rather broader human involvement in decisions about how, when, where, and why the weapon will be employed” (CRS, 2020, p. 2). This stipulation also requires the human-AI interface be readily understandable to trained operators and that adequate training be provided. In the committee’s opinion, because the employment of force may follow from a wide variety of AI actions and recommendations in the multi-domain operations (MDO) context, this places considerable focus on the need for effective human understanding of and control over AI systems.

A recent review of AI in military systems found that “failure to advance reliable, trustworthy, and resilient AI systems could adversely affect deterrence, military effectiveness, and interoperability” (Konaev et al., 2020, p. 6). In addition, the National Security Commission on Artificial Intelligence recently stated:

to establish justified confidence, the government should focus on ensuring that its AI systems are robust and reliable, including through research and development (R&D) investments in AI security and advancing human-AI teaming through a sustained initiative led by the national research labs. It should also enhance DOD’s testing and evaluation capabilities as AI-enabled systems grow in number, scope, and complexity. Senior-level responsible AI leads should be appointed across the government to improve executive leadership and policy oversight (Schmidt et al., 2021, p. 11).

These reviews have increased the focus on the importance of human-AI teaming for military operations. A human-AI team is defined as “one or more people and one or more AI systems requiring collaboration and coordination to achieve successful task completion” (Cuevas et al., 2007, p. 64). Similarly, McNeese et al. (2018) define a human-autonomy team as a team in which humans and autonomous agents function as coordinated units; this is also applicable to human-AI teams. The consideration of AI as a teammate to human operators goes back several decades (Taylor and Reising, 1995). Recent work by the National Aeronautics and Space Administration posits three major tenets for human-autonomy teams: (1) bi-directional communication about mission goals and rationale; (2) transparency regarding what the automation is doing and why; and (3) operator-directed interfaces for dynamic function allocation (Brandt et al., 2017; Shively et al., 2017). In Forbus’s (2016) discussion of the need for AI to develop as a social organism to work effectively with humans, he states that AI must (1) have autonomy, including needs and drives to improve, and have good relationships with humans; (2) be capable of having a “shared focus” with humans; (3) be capable of natural language understanding to build shared situation awareness and formulate joint plans with humans; (4) learn to build models of the intentions of others; and (5) interact with others, including by helping and teaching. Other researchers stress the importance of both team cognition and collective intelligence for human teaming with autonomous systems (Canonico, Flathmann, and McNeese, 2019). Johnson and Vera (2019) also highlight the importance of team intelligence, which they define as “knowledge, skills, and strategies with respect to managing interdependence” in teams (p. 18). (See O’Neill et al., 2020 for a literature review on human-autonomy teaming.)

A North Atlantic Treaty Organization (NATO) working group focused on the importance of meaningful control over AI systems. “Meaningful human control can be described as the ability to make timely, informed choices to influence AI-based systems that enable the best possible operational outcomes” (Boardman and Butcher, 2019, p. 7-1). Meaningful human control includes both freedom of choice for the human and sufficient human understanding of the situation and system. Boardman and Butcher (2019) concluded that, to have meaningful control, the human must have (1) freedom of choice; (2) the ability to impact the behavior of the system; (3) time to engage with the system and alter its behavior; (4) sufficient situation understanding; and (5) the ability to predict the behavior of the system and the effects of the environment.

Wynne and Lyons (2018) noted the importance of understanding how humans perceive autonomous partners. They employ the term “autonomous agent teammate-likeness,” which they define as “the extent to which a human operator perceives and identifies an autonomous, intelligent agent partner as a highly altruistic, benevolent, interdependent, emotive, communicative and synchronized agentic teammate, rather than simply an instrumental tool” (p. 355). Factors such as perceived agency, the ability to communicate, the presence of shared mental models to direct information sharing, and shared intent contribute to the willingness of humans to consider an AI system as a teammate (Lyons et al., 2021). “Effective team processes can: (1) signal shared intent toward collective goals, (2) promote team cognition in support of the development and maintenance of shared mental models, and (3) promote aiding and performance monitoring via communication” (Lyons et al., 2021, p. 5). Lyons et al. (2021) conclude that “the challenges of human-autonomy teaming rest in developing (1) team-based affordances for fostering shared awareness and collective motivation, (2) an understanding of the types of tasks and interactions that stand to benefit from social cueing, and (3) developing techniques for using these cues to enhance [human-autonomy team] performance” (p. 5).

There are multiple ways of combining humans and AI into teams, including humans supervising an AI system that is serving as an aide or helper, humans collaborating with an AI system as equal teammates, and an AI system acting as a limiter of human performance (Endsley, 2017). It should also be recognized that AI systems may play a variety of roles, ranging from decision-support tool to assistant, collaborator, coach, trainer, or mediator. Within the human-AI teaming literature, it is generally accepted that the human should be in charge of the team, for reasons that are both ethical and practical (Boardman and Butcher, 2019; Bryson and Theodorou, 2019; Shneiderman, 2020; Taylor and Reising, 1995). Not only are humans legally and morally responsible and accountable for their actions, they also function more effectively when their level of engagement is high (Endsley and Jones, 2012). While it is assumed that human-AI teams will be more effective than either humans or AI systems operating alone, in the committee’s judgment this will not be the case unless humans can (1) understand and predict the behaviors of the AI system (see Chapters 4 and 5); (2) develop appropriate trust relationships with the AI system (see Chapter 7); (3) make accurate decisions based on input from the AI system (see Chapter 8); and (4) exert control over the AI system in a timely and appropriate manner (see Chapter 6).

SHOULD HUMANS TEAM WITH AI?

As an alternative perspective, Shneiderman (2021) argues against using the “teaming” metaphor in the design of AI systems, stating, “A perfect teammate, buddy, assistant, or sidekick sounds appealing, but can designers deliver on this image or will users be misled, deceived, and disappointed?” He argues that alternative metaphors, such as supertools, tele-bots, or active appliances, are preferable because they more effectively communicate that the AI system is in the service of the goals of the human(s), with the human(s) remaining in control. By leveraging such alternative metaphors, it is possible to convey the benefits of the teaming metaphor, such as helpfulness, while broadening the options for how that help could be provided and avoiding unrealistic expectations that may arise when an AI agent is referred to as a teammate. Similar points have been made by others (Groom and Nass, 2007; Klein, Feltovich, and Woods, 2005).

While these arguments have merit, this committee strongly feels that there are important benefits to adopting the teaming metaphor for research and design, especially as AI systems grow in capability and autonomy. Although current AI systems fall substantially short of the criteria for an effective teammate, there is value in highlighting what those criteria are and striving to build AI systems that can meet them. Shneiderman (2021) argues against classifying a non-human as a teammate; however, the military already has a long history of humans working with birds and non-human mammals, and the committee therefore rejects the notion that all members of a military team must be human. Instead, we focus on functional considerations of what individual actors (regardless of type) must do, need to know, and contribute to be considered effective team members. McGrath describes teams as co-acting agents with a shared mission and task-oriented goals, and distinguishes various typologies of teams, ranging from naturally occurring, long-duration standing crews to dynamic, problem-solving teams purposefully created for mission-specific functions (McGrath, 1984, 1990). To adapt to changing environmental and task conditions, teams, according to McGrath’s definition, require the effective (and effectively integrated) performance of each team member. Thus, the use of the teaming metaphor is based on the coordination and interdependence that needs to occur in a dynamic setting.

Landmark studies of military tactical teams during training provide a fundamental assessment of the behavior and performance attributes of successfully performing teams (Oser et al., 1989; Salas, Bowers, and Cannon-Bowers, 1995). The most successful teams demonstrated clear, effective, and assistance-based communications, as well as the ability to identify and begin additional tasks when needed. Team performance, then, represents not only interdependent performance, but also temporal and functional performance alignment and communication to support that alignment.

Further, AI is essentially different from other forms of technology with which humans interact. In multiple task settings, humans can develop synergistic interactions with tools that enhance their own task performance, but these interactions do not constitute a team. Salas et al. (1992) explicitly define a team as a group whose members are inherently interdependent in carrying out a common goal. Although an infantry soldier may rely heavily on a gun, helmet, or map for improved performance, these tools do not represent team members with interdependent capabilities or shared understandings. Likewise, an assemblage of humans, individually and independently providing information to a superior officer and receiving individual orders for next actions, also would not be characterized as a team. These types of exclusions imply some important considerations for future studies of human-AI team dynamics. For an AI system to be a part of a team, it must be capable of interdependence in its operations, as well as a degree of autonomy in its execution (Reyes, Dinh, and Salas, 2019).

For this reason, the committee feels that there is considerable value in the team metaphor. First, it is possible for a human-AI unit to meet the definition of a team, with interdependent capabilities, contributions, and roles in the performance of a complex task beyond the capacity of a single agent. Second, by considering humans and AI as teammates, the value of team interactions in producing performance superior to that of independent individuals can be brought to bear, including an improved ability to adapt to changing demands and to provide each other with mutual support and back-up. Third, it has been noted that the need for team coordination increases as the capabilities of a technology or agent increase, as is the case with AI systems (Johnson, Vignatti, and Duran, 2020). Finally, the committee rejects the assumption that defining a person and an AI system as a team implies that those agents are equivalent in their agency, functionality, capabilities, responsibilities, or authority. Additional discussion of the processes and capabilities associated with shared mental models is provided in Chapter 3.

Frameworks to describe information and task coordination at higher levels of aggregation, such as Malone’s collective intelligence (Malone, 2018; Malone and Crowston, 2001), or Miller’s supranational systems level of living systems (Miller and Miller, 1991), further elaborate the need to allocate functions of cognitive processing, information flow, and task coordination beyond the scope or capability of individuals. As an example, the coordinated humanitarian aid and disaster response after the Surfside condominium collapse in Florida in 2021 included the interdependent roles of humans, from military and local law enforcement agencies, with trained search-and-rescue dogs and uninhabited flight vehicles requiring manual post-flight processing (Murphy, 2021). From an operational standpoint, these operations support a metaphor of human-AI teaming as the joint activity of multiple, heterogeneous actors with coordination requirements. Murphy’s analysis underscores the importance of distinguishing team member functions from the attributes of specific actors. For example, for a dog or a drone to be seen as an important part of a disaster response team, it should not be assumed that the dog or drone must perform the search function exactly the same way a human would, using the same perceptual cues (Burke et al., 2004; Murphy, 2021). Stipulating that AI teammates must function as though they were equivalently capable humans contradicts extant research on the performance of various types of groups and teams.

Simplifying assumptions about the nature of effective human-human task coordination, including studies of military teams, often underestimate the teamwork functions necessary for mission-essential competencies and appropriate team performance outcomes (Alliger et al., 2007; Salas, Bowers, and Cannon-Bowers, 1995). Teamwork is defined as an interrelated set of knowledge, skills, and attitudes that enables teams to perform in a coordinated, adaptive manner. Teamwork includes an understanding of roles, responsibilities, interdependencies, interaction patterns, communications, and information flow (Cannon-Bowers, Salas, and Converse, 1993). Teamwork is often contrasted with taskwork, which focuses on the activities, skills, and knowledge associated with performing the tasks required for a job (i.e., operating procedures, capabilities, and limitations of equipment and technology; task procedures, strategies, constraints; relationships between components; and likely contingencies and scenarios) (Cannon-Bowers, Salas, and Converse, 1993). The use of AI capabilities in these contexts extends the work of prior authors, such as Hutchins (1990), who emphasize the growing role of information technologies to support the communication and coordination of distributed expertise and to provide dynamic, current updates of a situation.

Across much of teamwork research, coordination is defined as “managing dependencies between activities” (Malone and Crowston, 2001, p. 10), while the related concept of groupwork highlights not only member characteristics, but also local situations, tasks, and organizational contexts (Olson and Olson, 2001) (see Chapter 3 and Marks, Mathieu, and Zaccaro, 2001 for a discussion of team processes). The process of coordinating between team members involves using distributed expertise and technologies to manage time constraints, resolve uncertainty, and support shared information needs (Cannon-Bowers, Salas, and Converse, 1993; Hutchins, 1990; Malone, 2018). Based on these considerations, in the committee’s opinion the dynamic, performance-based contexts, tasks, and timescales of MDO present a major challenge for defining and evaluating effective human-AI teaming configurations.
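As a rough illustration of coordination as "managing dependencies between activities," the following Python sketch treats a small human-AI task as a dependency graph. It orders the activities so that none starts before the activities it depends on, and it lists the dependencies that cross between team members, which is where coordination effort is likely to concentrate. The activity names, the assignment of activities to a human or an AI system, and the helper functions are hypothetical and are not drawn from the cited literature; this is a minimal sketch, not a model proposed by the committee.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical activities; each maps to the set of activities it depends on.
dependencies = {
    "collect_sensor_data": set(),
    "classify_contacts": {"collect_sensor_data"},
    "review_classification": {"classify_contacts"},
    "plan_route": {"collect_sensor_data"},
    "approve_plan": {"review_classification", "plan_route"},
}

# Hypothetical assignment of each activity to a team member.
assigned_to = {
    "collect_sensor_data": "ai",
    "classify_contacts": "ai",
    "review_classification": "human",
    "plan_route": "ai",
    "approve_plan": "human",
}


def coordination_waves(deps):
    """Group activities into waves; a wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def handoffs(deps, owner):
    """List (upstream, downstream) pairs whose dependency crosses team members."""
    return sorted(
        (up, down)
        for down, ups in deps.items()
        for up in ups
        if owner[up] != owner[down]
    )


waves = coordination_waves(dependencies)
cross_member = handoffs(dependencies, assigned_to)
print(waves)
print(cross_member)
```

In this toy example, the two cross-member dependencies (AI classification feeding human review, and AI route planning feeding human approval) mark the points where shared information needs and time constraints would have to be managed.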

The terms “human supervisory control” and “levels of automation” (Sheridan, 1988, 1992, 2011; Sheridan and Verplank, 1978), originally used as general descriptions of human-automation interactions, have been inaccurately interpreted to imply a conflict between human and AI control (see Chapter 6). Roethlisberger and Dickson (1934) used the term supervisory control to describe differences in function allocation between human workers in a production team. In supervisory control, the human handles high-level tasks, decides on overall system goals, and monitors the system “to determine whether operations are normal and proceeding as desired, and to diagnose difficulties and intervene in the case of abnormality or undesirable outcomes” (Sheridan and Johannsen, 1976, p. v). Determination of appropriate task functions and evaluations of appropriate performance (both quantity and quality) are traditionally considered to be elements of human supervisory responsibility. Assignments of control and responsibility between humans and automation include determinations of who should be assigned which tasks and where responsibility should lie in cases of performance breakdown. Equal participation or distinct independence of action or decision making by all team members is never assumed.

IMPROVED MODELS FOR HUMAN-AI TEAMS

The use of the term “model” here is deliberately ambiguous, as it can relate alternatively to computational descriptions of performance dynamics, theoretical constructs of required components and processes, or best practices demonstrated from operational experience. In the committee’s judgment, while there has been some work, particularly using descriptive models, to describe the elements and factors relevant to human-AI teaming, to date none of these efforts has progressed toward computational models or quantifications of the relative importance of team characteristics, processes, or other factors. Further, in the committee’s opinion, teaming models need to be informed by an understanding of the real-world demands and needs associated with military command and control operations.

Studies conducted in the New Command and Control Concepts and Capabilities (NATO SAS-050) program examined the evolution from traditional command and control to network-enabled capability paradigms (Stanton, Baber, and Harris, 2008; Walker et al., 2009), reinforcing similar research conducted in the U.S. military context (Bolstad et al., 2002; Burns, Bryant, and Chalmers, 2005; Cooke et al., 2007; Graham et al., 2004; Kott, 2008; Moore et al., 2003; Riley et al., 2006). These studies, while not specifically focused on incorporating AI systems as functional team members, strongly emphasize that information distribution, patterns of interaction, and allocation of decision rights are crucial to coordinating the expertise of team members to achieve effective task execution. The results of these studies and others addressing network-enabled capability and mission-essential competencies in the military environment (Alliger et al., 2007; Bennett et al., 2017), provide important insights and research priorities for the development of human-AI teams, as well as for both human warfighter training and the creation of simulations that could be used in human-AI development and performance testing (McDermott et al., 2018).

As described above, teams exist as, and are trained to function as, integrated systems—not simply as aggregated components (Burke et al., 2004; Salas, Bowers, and Cannon-Bowers, 1995; Tsifetakis and Kontogiannis, 2019). Feedback-based mechanisms that allow team members to monitor and assess task performance, and opportunities to improve skills through ongoing practice, are important mechanisms to improve the performance of human team members (Salas et al., 1992; Sottilare et al., 2017; Swezey and Salas, 1992). Communication and support behaviors between team members represent feedback-based processes for developing shared experience on which mutual trust is based (Cuevas et al., 2007). Human-AI interactions present an opportunity for humans (and AI systems) to develop and calibrate mutual understanding and expectations of how other team members will function, across a range of task scenarios and environmental constraints.

In the committee’s opinion, another key challenge lies in the development of AI systems that can function in the challenging real-world complexity of MDO, which may be very different from laboratory scenarios, and the use of metrics to quantify system performance. For example, the explainability and transparency of AI systems performing tasks as a part of human-AI teams is one of the most important AI design challenges (see Chapter 5) and highlights the differences between sandbox-based research and real-world applications (see Chapter 10).

From a computer science perspective, the explainability and transparency of a machine learning (ML) algorithm are often based on its ability to be queried by a computer scientist in a post-hoc examination (Bhatt et al., 2020; Burkart and Huber, 2021). However, many ML-based AI systems are especially brittle in the face of data not anticipated in their training sets, or when training sets do not apply to the real-world context of application. More importantly, post-hoc querying by a computer scientist to assess an AI system’s team performance is in no way equivalent to real-time understanding by human members of a human-AI team, who may be facing life-and-death decisions and experiencing significant uncertainty and time constraints. Therefore, the committee believes that measures of AI explainability derived in research environments may not generalize well to the levels of explainability necessary in real-world MDO.

Resolution of uncertainty and reduction of entropy are essential intelligence functions associated with any complex, dynamic, evolving task. The nature of these tasks often precludes training under relevant real-world conditions. In the committee’s judgment it is therefore highly unlikely that future generations of AI systems will be able to address such unstructured challenges within mission-relevant time constraints. The higher the proposed or expected level of autonomous capability of AI systems, the greater the frustration and distrust of real-world users asked to rely on such systems, regardless of the results of testing in a constrained research context. Thus, the committee emphasizes the importance of computational and functional models of AI systems relevant to real-world challenges of MDO, as opposed to those developed using traditional assessments in research settings.
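One way to make "resolution of uncertainty and reduction of entropy" concrete is to track the Shannon entropy of a belief distribution over competing hypotheses and record how long it takes to fall below a chosen level. The Python sketch below does this for a hypothetical belief history. The report does not specify such a computation; the threshold, the timestamps, the probabilities, and the function names are all assumptions made purely for illustration.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def time_to_resolve(belief_history, threshold_bits):
    """Return the first timestamp at which entropy is at or below the threshold.

    belief_history is a list of (timestamp, probabilities) pairs in time order.
    Returns None if uncertainty never falls to the threshold.
    """
    for timestamp, probs in belief_history:
        if shannon_entropy(probs) <= threshold_bits:
            return timestamp
    return None


# Hypothetical belief over three competing hypotheses, sampled at three
# times (in seconds) as new information arrives.
history = [
    (0.0, [1 / 3, 1 / 3, 1 / 3]),  # about 1.58 bits: maximal uncertainty
    (5.0, [0.5, 0.25, 0.25]),      # 1.5 bits
    (12.0, [0.9, 0.05, 0.05]),     # about 0.57 bits
]

resolved_at = time_to_resolve(history, threshold_bits=1.0)
print(resolved_at)
```

With a threshold of 1.0 bit, uncertainty in this toy history is resolved at the 12-second sample. A measure of this kind would be meaningful only if the threshold and the hypotheses were chosen to reflect real operational demands rather than laboratory convenience.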

KEY CHALLENGES AND RESEARCH GAPS

The committee finds four key challenges in the development of effective models and measures for human-AI teams.

Existing human-AI research is severely limited in terms of the conceptualizations of functions, metrics, and performance-process outcomes associated with dynamically evolving, distributed, and adaptive collaborative tasks. Research programs that focus primarily on the independent performance of AI systems generally fail to consider the functionality that AI must provide within the context of dynamic, adaptive, and collaborative teams. Research should specifically consider the dynamic process factors and timing constraints involved when human-AI team members address uncertainties in task progress or the evolution of performance over work sessions, shifts, task episodes, software updates, and longer time horizons (see Goodwin, Blacksmith, and Coats, 2018).

Many measures of team performance do not address the real-world performance demands of complex and dynamic MDO tasks, which often have high consequences and low tolerance for either delay or information input classification errors. These challenges are multiplied when researchers do not understand, value, or weight the cost of timely, high-confidence resolution of crucial sources of uncertainty in the entropic fog of war.

Currently, human-AI team performance evaluation does not adequately address the role of AI systems in providing support and coordination as an effective and trusted teammate. These performance evaluation considerations are needed for model-optimization criteria and/or as skill assessments of AI performance in real-world tasks. One operational example illustrating the trust that is needed in an AI system is whether the system performs as promised, with degradations in trust occurring due to violations of promised functional capabilities (Bhatti et al., 2021; Demir et al., 2021).

Descriptive models of human-AI team performance need to be extended into computational models that can predict the relative value of teaming compositions, processes, knowledge structures, interface mechanisms, and other characteristics.

RESEARCH NEEDS

The committee recommends addressing four major research objectives to improve human-AI teaming performance.

Research Objective 2-1: Human-AI Team Effectiveness Metrics.

Research is needed to define metrics describing how AI systems help to manage dependencies between themselves and other team members performing mutually supportive, dynamic, adaptive, and collaborative tasks, within relevant functions. It is advisable that this research consider the limits of how and when an AI system is fixed, meaning unable to recognize the functional roles required and how its capabilities might support those roles. Further, metrics associated with the flexibility of an AI system to adjust its role and contributions to team needs are needed, similar to those metrics assessed in human-only, network-enabled capability teams (Stanton, Baber, and Harris, 2008; Walker et al., 2009). These metrics would best be considered components and figures of merit in the performance specifications and skill-evaluation scores associated with AI systems, similar to skills assessments for human warfighters. This research should specifically consider the different timing constraints on the team members as well as the evolution of performance over work sessions, shifts, software updates, and longer time horizons.

Research Objective 2-2: AI Uncertainty Resolution.

The capability of the AI to resolve temporal and operational uncertainty in situation, role, and plan needs to be quantified. This includes the time to resolve uncertainty (TRU) in situation assessments, in the required AI role in the current network-enabled capability/multi-domain operations configuration, and in the required AI role in updated plans for action. These TRU measures can be integrated with existing network-enabled capability studies of combat estimates or other high-fidelity simulations/synthetic environments (Goodwin, Blacksmith, and Coats, 2018). Even if an AI system has high confidence assessments and good post-hoc explainability, its role in a dynamic human-AI team is extremely limited if the TRU is large, especially compared to the time available for decision making and performance (Caldwell and Wang, 2009). Thus, the committee suggests that modeling emphasize TRU rates and the ratio of TRU to time available as parameters to minimize, in a variety of dynamic contexts with varying situation and information entropy and uncertainty levels.¹
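As an illustration only, the ratio of TRU to time available could be computed per decision episode as in the following Python sketch. The names `DecisionEpisode`, `tru_ratio`, and `episodes_over_limit`, the use of seconds as the unit, and the example limit of 0.25 are assumptions made for this sketch; the report specifies only that the maximum acceptable ratio should be much less than 1.0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecisionEpisode:
    """One decision episode (hypothetical record layout for this sketch)."""
    tru_seconds: float        # time the AI needed to resolve the uncertainty
    available_seconds: float  # time available for decision making and performance


def tru_ratio(episode: DecisionEpisode) -> float:
    """Return TRU divided by time available; smaller values are better."""
    if episode.available_seconds <= 0:
        raise ValueError("available_seconds must be positive")
    return episode.tru_seconds / episode.available_seconds


def episodes_over_limit(episodes, max_ratio=0.25):
    """Return the episodes whose ratio exceeds max_ratio.

    The default of 0.25 is an illustrative assumption, not a value
    taken from the report.
    """
    return [e for e in episodes if tru_ratio(e) > max_ratio]


if __name__ == "__main__":
    log = [DecisionEpisode(5.0, 20.0), DecisionEpisode(30.0, 40.0)]
    for episode in episodes_over_limit(log):
        print(episode, tru_ratio(episode))
```

A modeling effort along these lines would presumably track the ratio across contexts with differing entropy and uncertainty levels rather than against a single fixed limit.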

Research Objective 2-3: AI Over-Promise Rate.

The ability of an AI system to appropriately calibrate and execute its expected functions needs to be quantified. The ability of the AI system to deliver as promised contributes to human trust in autonomous systems (Sheridan and Parasuraman, 2005). Trust in others, whether human team members or AI systems, is an experiential, asymmetric process based on whether the actor meets, exceeds, or falls short of performance demands compared to performance expectations. For example, an analog watch has a limited range of functions and performance capabilities compared to a modern software-enabled smartwatch; however, trust in the analog watch is not based on its performance of complex smartwatch operations, but on its ability to perform its required function of displaying time accurately. Therefore, a relevant performance (and model-optimizing) measure for an AI system might be its over-promise rate (OPR), defined as the number and variety of situations in which its level of automation, expertise, or support performance does not meet expectations, expressed as a fraction of the total number of relevant human-AI task situations in which the AI system is involved. Both a reduction in expectations and an increase in AI capability can reduce an OPR to an ideal minimum, close to zero. This conceptualization of OPR is in opposition to a marketing-based AI approach, in which proposed expectations for system performance are intentionally set high to increase the probability of research funding or product purchase. However, in the proposed area of multi-domain operations, an OPR based on well-calibrated expectations is far more likely to engender trust and effective overall human-AI team performance.
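The following Python sketch shows one simplified way the OPR fraction might be computed. It is not a definitive operationalization: it counts only the number of shortfall situations and ignores their variety, and the `TaskSituation` record, the 0-to-1 performance scale, and the example values are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskSituation:
    """One relevant human-AI task situation (hypothetical record layout)."""
    promised: float   # expected performance level, on an assumed 0-1 scale
    delivered: float  # observed performance level, on the same scale


def over_promise_rate(situations) -> float:
    """Fraction of situations in which delivered performance fell short of
    what was promised. Counts shortfalls only; variety is not modeled."""
    situations = list(situations)
    if not situations:
        raise ValueError("at least one relevant situation is required")
    shortfalls = sum(1 for s in situations if s.delivered < s.promised)
    return shortfalls / len(situations)


if __name__ == "__main__":
    delivered = [0.95, 0.70, 0.85, 0.92]  # made-up observations
    # Same delivered performance under two different levels of promise.
    ambitious = [TaskSituation(0.9, d) for d in delivered]
    calibrated = [TaskSituation(0.8, d) for d in delivered]
    print(over_promise_rate(ambitious), over_promise_rate(calibrated))
```

In this toy comparison the delivered performance is held constant and only the promised level changes, which mirrors the point that lowering expectations, as well as raising capability, can reduce the OPR.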

Research Objective 2-4: Human-AI Team Models.

Predictive models of human-AI performance are needed to provide quantitative predictions of operator performance and interaction in both routine and failure conditions (Kaber, 2018). These models would do well to build on existing modeling approaches to specifically address design decisions for the human-AI interaction (Kaber, 2018; Sebok and Wickens, 2017). Computational models of human-AI team performance need to be developed to quantify expected performance outcomes along relevant metrics, and across relevant team compositions, characteristics, processes, and designs. These models would benefit from a consideration of both normal and unexpected events (outside of AI training sets), as well as issues of situation awareness, trust, and the potential for both human and AI biases.

SUMMARY

Teaming provides significant performance advantages that go beyond the aggregation of individual teammate performances. Given sufficient levels of team intelligence, including the processes, knowledge structures, and behaviors necessary to promote effective teamwork, humans can team with AI systems to achieve these benefits. Methods for promoting effective teaming between humans and AI systems need to be captured in both descriptive and computational models that can quantify the nature of human-AI team performance, its constituent components, and outcome metrics that capture team dynamics, uncertainty resolution, and the ability to meet performance expectations.

___________________

¹ The ratio parameter should have a maximum acceptable level much less than 1.0.