National Academies Press: OpenBook
Suggested Citation:"2 Managing Safety in Complex Systems." National Academies of Sciences, Engineering, and Medicine. 2022. Emerging Hazards in Commercial Aviation—Report 1: Initial Assessment of Safety Data and Analysis Processes. Washington, DC: The National Academies Press. doi: 10.17226/26673.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

2 Managing Safety in Complex Systems

Commercial aviation, and other safety-critical activities that society depends on, recognize that hazards¹ can never be fully eliminated, particularly those that come from outside the system (e.g., weather’s impact on aviation operations). Therefore, these systems employ systematic processes that begin with identifying hazards and, where possible, eliminating them. Where hazards cannot be eliminated, these systems work to reduce their occurrence (e.g., building in safety margins or redundancy), control the hazard or the system’s response to the hazard (e.g., dynamically adjusting air traffic routes around convective weather), and develop backup systems to mitigate the effect of a hazardous state that has arisen. An example is the on-board alerting systems that command avoidance maneuvers to pilots or autoflight systems when an aircraft’s trajectory reflects a breakdown in the separation from terrain or other aircraft expected from air traffic control (ATC).

In doing so, aviation is well protected against single-point failures by multiple “layers of defense.” Accidents that do occur involve multiple lapses throughout the organization. Some of these lapses stem from decisions made well before the accident: during design and certification or approval of the technology and of the operating procedures; during selection and training of personnel; and in the operational practices that create the environment in which the “front-line” personnel operate, with its corresponding supports for, or stressors on, their performance. Thus, these accidents are sometimes termed “organizational accidents,” reflecting the many contributors to them and the many facets of processes required throughout all aspects of the organization to protect against them (Reason, 1997).

¹ The term “hazard” is defined in the International Civil Aviation Organization Safety Management Manual (Doc 9859) as “a condition or object with the potential of causing injuries to personnel, damage to equipment or structures, loss of material, or reduction of ability to perform a prescribed function.” See https://www.icao.int/SAM/Documents/2017-SSP-BOL/CICTT%20Hazard%20Taxonomy.pdf.

Over the past decades, simultaneous with the development of safety processes extending across all aspects of commercial aviation, the air transportation system itself has grown in size, diversity, and complexity. The broader aviation system spans the operations of air carriers, general aviation, new entrants such as unmanned aircraft systems (UAS) and commercial space operations, airports, and air traffic controllers; the design and manufacturing of aircraft and of safety-critical communications, navigation, and surveillance (CNS), ATC, and air traffic management systems and technologies; and the regulation, oversight, and certification or approval of aircraft, equipment, pilots, procedures, operations, airspace design, and training by the Federal Aviation Administration (FAA).

This chapter reviews the processes for identifying and analyzing hazards, and developing mitigations against them, to sustain and enhance the safety of commercial air travel in the United States and of U.S. carriers everywhere. It begins with an overview of key safety management principles and then discusses how these principles are applied within or across organizations. The chapter then describes the major components of the aviation safety socio-technical system, which include air carriers, airports, manufacturers, suppliers, regulators, and air traffic service providers; their respective roles; and how they capture and share, or should share, safety-related information.
The sections on safety management by aviation organizations and by the system as a whole provide illustrative examples of the processes by which the various components of the aviation system generate data to help identify and characterize emerging hazards. A summary of generic types of accident precursor measures that could be monitored, and some of the challenges in doing so, are discussed in the final section and developed further in subsequent chapters.

KEY SAFETY MANAGEMENT PRINCIPLES

Industries such as nuclear power, petroleum exploration and refining, chemical processing, and commercial aviation all create the possibility of catastrophic events. All of these industries have developed conceptually similar approaches for identifying and managing the hazards inherent in their operations, although implementation differs considerably across industries. Central to these approaches are the hazard identification and risk assessment process and the use of barriers, controls, or other defenses to prevent accidents and incidents. Hazard identification and risk assessment are processes that identify and characterize all the hazards that could lead to catastrophes, in order to inform safety management. Industries that produce explosive or otherwise hazardous products typically use the term “barrier” for the means of controlling hazards because the primary intent is to keep hazardous materials from being released into the workspace and broader environment. In commercial aviation, the terms “controls,” “safeguards,” and “defenses” are more common, but the principle is the same. Once hazards are identified and characterized, controls or safeguards are put in place to eliminate, control, or mitigate them.

These controls can take several basic forms in aviation. Physical controls include design and material features compliant with standards that ensure aircraft frames, wings, and other elements can withstand the vicissitudes of assumed flight conditions. They also include redundancy of safety-critical operational elements to reduce the impact of single-point failures. Aircraft also contain multiple alarm systems that alert pilots to, and/or automatically resolve, a range of hazards. Aviation also increasingly relies on automation and software to execute flights safely, often minimizing the impact of pilot “slips” (Reason, 1990), though this reliance can introduce other hazards, as discussed later.

Operational controls typically take the form of procedures applied to all operations in general, combined with selecting and training the humans expected to execute them. These operational controls can take many forms, ranging from how pilots fly aircraft and are trained to do so, to the standards for separation assurance between aircraft required of air traffic controllers, to the procedures mechanics follow in maintenance and repair.
Furthermore, these operational controls may extend to creating a proper operating environment through controls on factors that can degrade individuals’ performance, such as fatigue and distraction, or implied pressure to “cut corners” within an organization.

Additional controls operate over longer time frames than the day-to-day controls, such as organizational safety management programs that review their processes and operational data. Similarly, regulatory processes, including both certification of systems and personnel and operational approval of operating processes, must be conducted primarily in advance of the operation. (When updated requirements on the design are recognized as necessary during subsequent operation, they are usually implemented in the form of airworthiness directives, which typically call out only small, focused changes; the grounding of an entire fleet is rare.) A common path to certification for safety-critical systems relies on design, analysis, and testing according to technical performance standards. These standards include a description of the assumed operational environment, minimum operational system-wide performance requirements, and minimum technical performance requirements of each piece of equipment and system on the aircraft. Standards also cover principles for the design and test of safety-critical hardware and software. The standards are developed based on in-depth assessment of hazards and build mitigations into the design. Similar guidelines are used when developing tools for air traffic controllers.

Similarly, standards or regulatory guidance typically define minimum specifications on operations in general and on the use of specific systems to support operational approval of many facets of commercial aviation. For example, document DO-185B, “Minimum Operational Performance Standards (MOPS) for TCAS II version 7.1,” which defines the technical standards for certifying the Traffic Collision Avoidance System (TCAS; a technological component), is complemented by FAA Advisory Circular 120-55, “Air Carrier Operational Approval and Use of TCAS II,” which describes requirements for operational approval for airlines to use TCAS on their aircraft, such as stipulations on pilot training in its use.

Some safety theorists have conceived of controls or defenses as layered so as to simultaneously provide multiple mitigations against a hazard, often referred to as “defense in depth.” James Reason popularized (Reason, 1997, pp. 12–17), but did not invent, this concept, referred to as the “Swiss cheese” model of accidents (Larouzee and Le Coze, 2000). This model recognizes that all defenses have limits. When the “holes” in multiple defenses line up, a hazardous state can progress to an accident.

A new, more comprehensive accident causality model has been proposed by Leveson (2011). The simple model of accident causality as the result of linear chains of failure events, with each failure event following from the preceding one, is several hundred years old. Linear models cannot fully explain the occurrence of most accidents in today’s complex systems.
To understand why accidents occur in today’s high-tech socio-technical systems, Leveson suggests that a more sophisticated model is needed. Leveson’s new causality model, called STAMP (System-Theoretic Accident Model and Processes), is based on systems theory and includes all types of interactions between system components, not just the traditional component failures found in linear causality models. In this sense, it is an extension of the old models and can handle today’s and future systems that contain software, sophisticated automated components, management decision making, complex cognitive decision making by operators such as pilots or air traffic controllers, etc.

Traditional linear models assume that accidents can be prevented by preventing component failures or by handling them in some way, such as designing barriers to ensure that one failure does not propagate and cause another failure. In contrast, STAMP treats accident causality as a control problem and suggests that accidents result from inadequate control over unsafe behavior of system components and their interactions, including but not limited to component failures. In this way, STAMP includes more sophisticated types of causality, including complex interactions among system components and even circular causality, which forms the basis for system dynamics (Sterman, 2000). STAMP includes not only traditional accident causes, such as failure events slipping past defenses as described in the Swiss cheese model, but also accidents resulting from system design errors, where each component operates as designed but their operation together results in a hazardous state, perhaps in circumstances not considered during design or in operating conditions that violate design assumptions. In this model, accidents occur when some layer of the socio-technical system places inadequate constraints on the interactions of the system’s components.

Accidents are considered to result from a lack of appropriate constraints on system design. The role of the system engineer or system safety engineer is to identify the design constraints necessary to maintain safety and to ensure that the system design, including the social and organizational aspects of the system and not just the physical ones, enforces them (Leveson, 2004, p. 254).

Designers attempt to limit or eliminate hazards during the design process but typically can do so only within an assumed operating envelope. For example, air traffic operations may specify constraints on aircraft spacing on final approach, and flight deck procedures may constrain how fast the aircraft should be flown in turbulence. Violations of these constraints during operations can lead to accidents.
Fundamental to Leveson’s concept of constraints to avoid accidents are feedback loops across the aviation system, such as aircraft components providing feedback to software and to pilots, design information being provided to operators, and operators feeding back to designers their insights about performance and about operations beyond design assumptions, so that controls can be improved and future designs (and criteria for certification and operational approval) modified. An example is the initial deployment of TCAS, which issued false alerts because traffic density had not been fully understood in the design; subsequent designs and versions remedied that problem. Thus, such feedback between many parties within the overall system is a valuable information source when analyzing for emerging trends in safety.

Other safety management principles are mentioned in other chapters, but one more example is included here because of its importance and because it is commonly misunderstood. Almost all catastrophes include some combination of human error during design, management decision making, or operation; violation of correct procedure or policy; and flaws in the procedures (Reason, 1997, p. 61). Human beings, including highly trained professionals, have cognitive biases that lead to mistaken judgments, particularly about risk (Kahneman, 2011, pp. 234–244), and errors can result from misunderstandings and faulty sense making² (Weick and Sutcliffe, 2015). Limits on working memory are considered by some to be a particularly important contributor to human error, especially when individuals are overloaded with multiple tasks and large amounts of information (Reason, 1990). Humans are also prone to unintentional slips,³ particularly when interacting with poorly designed interfaces or when fatigued by their work schedules, and for many other reasons. The factors that tend to promote such errors are well characterized and predictable. Thus, careful accident investigations today consider human errors as typically being promoted or triggered by workplace or organizational factors; as such, they are often consequences rather than causes (Reason, 1997, p. 126).

It is important to bear in mind that, overall, human operators also contribute to safety by catching errors (their own and those of others), and they are often the assumed reversionary mode⁴ in the case of any limitation or malfunction of the technology. Of note, detailed analyses find that 20% or more of normal commercial aviation flights involve some “aircraft malfunction” that requires human intervention to complete the flight as desired (Performance-based Operations Rulemaking Committee and Commercial Aviation Safety Team Flight Deck Automation Working Group, 2013, pp. 29–31). Such failure resolution usually depends on real-time operators, such as on-board pilots or air traffic controllers. A pilot can fly a B777, for example, even if the autopilot or autothrottle is inoperative. Thus, dealing with failure is an active part of normal operations, and current operations routinely assume operator intervention to address problems that emerge during flight.
ORGANIZATIONAL SAFETY CONTROLS

As one important form of safety control, organizations (whether carriers, manufacturers, suppliers, or air traffic operations) have developed multiple procedures and policies designed to guide the activities of personnel and organizations so that they stay within the behavioral constraints assumed during design and safety assurance.⁵ For air carriers, these take many forms that apply to the different tasks of pilots, mechanics, schedulers, and others whose work influences the safety of flights. Often these procedures are detailed and prescriptive, to a degree requiring detailed design analysis and testing to ensure the procedure is accurate and appropriate to the task at hand (Ockerman and Pritchett, 2000). When the procedures are experienced as being incompatible with the task at hand, they can be bypassed, often without full cognizance of the attendant risks (Leveson, 2004, p. 245), a general phenomenon noted in several accidents attributed to non-compliance with procedures by maintainers and ground handlers. (See the accident reports for Alaska Airlines flight 261, 2000, and the B747-400 freighter flight out of Bagram on April 29, 2013, for examples implicating procedures in maintenance and ground handling, respectively.)

² “Sense making” is defined by Maitlis and Christianson (2014) as a process through which people work to understand issues or events that are novel, ambiguous, confusing, or in some way violate expectations.
³ A slip is defined as when a person has the correct intention but, due to some cognitive block, does not execute it correctly (e.g., pressing the wrong button).
⁴ In reversionary mode, essential flight information is shown on the flight display should there be a failure in the display.
⁵ This section draws considerably from the first four chapters of Reason (1997).

Aviation is heavily focused on procedures, both as an effective guide to pilots’ activity and teamwork on the flight deck and as a way to establish a consistent structure supporting ATC and overall traffic management. Flying aircraft in variable conditions requires adherence to procedures, while also requiring pilots to exercise considerable judgment to apply the correct procedure for the circumstance, or to respond appropriately when no procedure applies or the procedure is not working. Carriers address this need through recurrent training, determinations of competence, ongoing observation, and thorough adherence to multiple checklists to assist in decision making and ensure that no shortcuts are taken in pre-flight, flight, or post-flight.

Carriers, manufacturers, and other aviation service providers exist to serve and profit from the public’s demand for travel. These private companies operate in a highly competitive, highly regulated industry in which safety is paramount, but they must also generate sufficient revenues to continue operating.⁶ Although FAA’s mission for both air traffic operations and the regulatory function of its aviation safety office is primarily safety, it also must operate an efficient system reflecting the nation’s demand for air travel.
The ability of the aviation system, private and public sectors collectively, to safely serve the demand for air travel requires constant vigilance and ongoing application of safety controls. One of the hazards in managing against major accidents is that vigilance and ongoing safety controls are hard to sustain when fatal accidents are rare and the controls may be seen as inefficient or unduly onerous.

SYSTEM ACCIDENTS

A paradox of all organizational safety management efforts dealing with potential catastrophic events is that the complexity of the controls put in place to avoid accidents, especially those involving complex technologies and software, can itself create hazardous situations (Reason, 1997; see also Perrow, 1984; Reason, 2000; Turner, 1978). For example, a common safety system within modern autoflight serves to protect against low speed and stall. However, several accident reports revealed that when the autoflight system is commanded in an unanticipated manner, or used in unusual flight operations, or a vital sensor fails, these systems can instead create unexpected behaviors or fail to act as expected, contributing to accidents, particularly when personnel are trained to depend on them.⁷

⁶ Reason (1997) refers to this as the “production-protection” space.

In 2015, FAA required Part 121 air carriers to implement safety management systems (SMSs).⁸ (For other aviation organizations, namely manufacturers, repair stations, and small carriers, use of SMSs is voluntary and may be required at a later date.) In a robust SMS, an organization routinely checks or audits whether procedures and policies are adhered to and/or require updating or replacement, including whether the organization has a rigorous process for identifying and mitigating hazards. When incidents or deviances occur, managers are expected to investigate with a root-cause analysis that extends beyond proximate causes to determine the probable contributing causes, develop a corrective action plan (CAP), and carry it out in a timely manner. SMSs are built around the basic components of all quality management systems: a continuous cycle of “plan, do, check, act.” They provide the formal management processes by which hazards are identified and safety controls are put in place, assessed, and improved. SMSs depend heavily on voluntary, non-punitive reporting of incidents and concerns by employees; the tracking and analysis of incidents and deviances; and follow-up.

The necessary motivation and environment for reporting and for implementing an SMS as intended depend on an organization’s culture, particularly the priority it gives to safety compared with other values and goals.
These topics are discussed in more detail in the next chapter, but for this section the fundamental concepts to carry forward are that (a) organizational culture represents the organization’s espoused beliefs, values, and actions and the underlying, deeply rooted assumptions that drive organizational behavior and (b) safety culture can be represented by an organization’s subset of values and assumptions about safety (Schein, 2010). Thorough reporting on mistakes, errors, and incidents depends on a culture that avoids blame, encourages and rewards reporting, provides non-punitive reporting systems, seeks to understand the range of contributing causes, and follows up to correct the conditions that can lead to errors (Reason, 1997, pp. 196–218).

⁷ See the following aircraft accident reports: National Transportation Safety Board 2014 report on Asiana Airlines Flight 214 (https://www.ntsb.gov/investigations/accidentreports/reports/aar1401.pdf) and Bureau d’Enquêtes et d’Analyses 2012 report on Air France 447 (https://bea.aero/docspa/2009/f-cp090601.en/pdf/f-cp090601.en.pdf).
⁸ FAA grants the authority to operate scheduled air service in the form of a Federal Aviation Regulations 121 certificate. Air carriers authorized to operate under a Part 121 certificate are generally large, U.S.-based airlines, regional air carriers, and all-cargo operators.

Importantly for this report, an effective SMS ensures that an organization generates and analyzes indicator and precursor measures that managers can apply to a continuous safety improvement process. This information can be generated from all facets of safety-critical organization operations, including flight, communications, and maintenance. SMSs as a source of information for precursor measures are described further in Chapter 4.

PRECURSORS

The committee is certainly aware of the accident data it was asked to review in the Statement of Task, but it is also aware that fatal commercial aviation accidents in the United States have become too infrequent to reveal emerging trends. Instead, the aviation industry needs the capability to identify emerging trends in safety before they manifest as accidents or serious incidents, as enabled by the significant amount of data that is collected about many aspects of the aviation system. Thus, the remainder of this chapter focuses on precursor measures that are reasonably hypothesized to reflect hazards warranting investigation, characterization, and research into proper mitigations. This term may be modified by “potential” as an adjective where it is important to avoid the implication that the measure has a proven link to hazard and risk. Precursor measures are used in many industries to monitor practice in controlling hazards, as illustrated in Figure 2-1.

FIGURE 2-1 Precursor role in risk identification and mitigation.

Accidents typically involve more than a single contributing cause. Trends in any of these measures, however, could be worrisome and could highlight where deeper analysis of the system of controls and shifts in safety management may be needed. We try to avoid confusion in the use of the term “incidents” because it has formal meanings for FAA and National Transportation Safety Board data collection, as described in Chapter 4. Incidents, however, can be thought of as precursors in the sense that they were events that might have turned into accidents but did not, because some other control, or random circumstances, kept them from escalating. In general, “precursor measures” are viewed as leading indicators and “incidents” as lagging indicators, but these distinctions are inexact.

Leveson (2015) proposes an alternative to the process described above, which is to develop leading indicators based on the assumptions used in design and operations and their vulnerability, rather than on the likelihood of accidents. This approach shows promise because it focuses on hazards that emerge from activities that exceed what systems were designed for, which can arise as the overall system evolves and aspects of the system are extended to new types of operation. This approach does depend on being able to identify the myriad assumptions supporting design and operations, to relate them appropriately to the safety concern, and to minimize the cognitive biases that can cloud analysts’ judgment. The committee intends to explore this concept further in future reports.

For the purposes of this report, the committee also considers emerging trends along a time dimension. Some emergent hazards may be known and actively monitored as controls are implemented to reduce or manage the hazard. These we define as within the immediate time frame.
Some emerging hazards may represent low risk today and be monitored, but may be growing and require some sort of control in the reasonably near future, say, within 5–10 years. These we define as within the intermediate time frame. Beyond 5–10 years are emerging hazards that may be considered potential issues of concern, but for which data do not yet exist to assess them; indeed, in the case of significant changes to operations and technologies, predictive modeling of potential hazards may be required. These we refer to as future concerns.

Responses to emerging trends in the immediate, intermediate, and future time frames may differ. In the immediate term, operational mitigations are often the most appropriate unless a radical change to the technology is found necessary. Future emerging hazards, if sufficiently concerning, lend themselves to mitigations such as aircraft and equipment design changes and revised standards. In the intermediate time frame, appropriate responses would be a mix of operational mitigations and redesign of specific equipment, procedures, or processes. The approaches one uses to identify and analyze emerging hazards along

this time dimension would also differ. For future and intermediate emerging hazards for which no measures are available, one might rely on simulations or workshops to elicit judgments from subject-matter experts representing a range of perspectives. In contrast, for immediate and intermediate emerging hazards for which metrics are available, a mix of hypothesis-driven and hypothesis-free discovery methods, as these terms are defined in Chapter 4, would be preferable. The committee returns to this time dimension in Chapters 4 and 5.

AVIATION SOCIO-TECHNICAL SYSTEM SAFETY CONTROLS

Commercial aviation safety is managed by a complex set of interdependencies among carriers; designers; manufacturers; suppliers; maintenance, repair, and overhaul organizations; the Air Traffic Organization; regulators; and, ultimately, Congress. These interdependencies are depicted in Figure 2-2, which is an example of how the STAMP model can be applied to convey the interrelationships among the multiple components of the aviation system that create the level of safety in commercial operations that currently exists. In the figure, the downward arrows indicate controls exercised by one layer of the system on other layers below. The upward and sideways arrows reflect actual, as well as desirable, feedback of information within and across organizations.

One example of this feedback starts at the bottom right side of the figure where issues arising in the operation of one set of controls in the operating process generate reports and audits about problems or incidents that are fed back to a separate company responsible for the design of the technology relied on by the operator. The company responsible for design feeds back information in the form of hazard analyses into design management, which sets new safety standards for design.
The designers establish new safety constraints, which are fed back to the operating company in the form of revised operating constraints. The designers should also modify designs for future systems, which may then be manufactured by yet another company. The operator in this example would also be feeding reports upward to the regulator, which may revise regulations, procedures, or certification and share information with other operators. The regulator may also share information upward to Congress, which may, in turn, revise laws that affect all components of the system.

The feedback loops in STAMP generate different kinds of safety information, the use of which should enable the system to adapt to manage hazards as they emerge. Some of the data and information generated in these feedback loops can also serve as sources of precursor measures. In Chapter 4, we describe examples of databases used to monitor commercial aviation safety in greater detail and how the data are being monitored

and analyzed to identify trends. In the next section we describe precursors generically and relate them to the various levels of socio-technical control.

FIGURE 2-2 Control in a socio-technical system. SOURCE: Based on Figure 4, the STAMP model, in Leveson (2004).

SCOPE OF PRECURSOR MONITORING AND ANALYSIS

The general categories of data available, or potentially available, for identifying emerging hazards are illustrated in Table 2-1 using the same functions shown in Figure 2-2. The examples provided are largely drawn from operational data derived from existing operations and safety management.

(Identifying emerging hazards from issues such as a new generation of vertical take-off and landing air taxis and UASs, as described in Chapter 5, would require different approaches at the design and certification levels.)

As depicted in Table 2-1, the organizational level can be thought of as having three elements: the operational front line or “sharp end” of the organization; management; and culture. These elements are discussed separately here for the purpose of categorizing different precursors, but they are often tightly interwoven. Managers, for example, can be conducting operational, front-line functions and the culture of an organization is not a distinct element; instead, it pervades everything the organization does and how effectively it manages safety at both the front line and management levels.

For air carriers, the front-line column refers to flight crews flying aircraft and the technologies they rely on; maintenance crews responsible for their upkeep; and so forth. For FAA’s ATC service, the front line would include the controllers responsible for separating aircraft. For original equipment manufacturers (OEMs) it includes the front-line staff designing and manufacturing aircraft. For FAA’s Aviation Safety Office (AVS), the front line includes the inspectors and certifiers of operations and equipment, and training.

The management column of the table represents the levels of management of any organization involved in commercial aviation. The culture column and sources of precursors are described in more detail in Chapter 3. The column labeled interactions among organizations is meant to capture the horizontal flows of information across organizations as illustrated in Figure 2-2.
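The row-and-column organization of Table 2-1 can be pictured as a simple tagging scheme in which each candidate precursor measure carries a control function (a table row) and an organizational element (a table column). The sketch below is a hypothetical illustration only: the type names, the helper function, and the particular row and column assigned to each sample measure are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Element(Enum):
    """Columns of Table 2-1."""
    FRONT_LINE = "front line"
    MANAGEMENT = "management"
    CULTURE = "safety culture"
    INTERACTIONS = "interactions among organizations"


class Function(Enum):
    """Rows of Table 2-1."""
    FLIGHT_OPERATIONS = "flight operations"
    MAINTENANCE_AND_REPAIR = "maintenance and repair"
    DESIGN = "design"
    OEMS = "OEMs"
    SUPPLIERS = "suppliers"
    STANDARDS = "standards"
    REGULATION = "regulation"


@dataclass(frozen=True)
class PrecursorMeasure:
    name: str
    function: Function
    element: Element


# Sample entries; the row/column tags are illustrative assumptions.
CATALOG = (
    PrecursorMeasure("line audits", Function.FLIGHT_OPERATIONS, Element.FRONT_LINE),
    PrecursorMeasure("reports of service difficulties",
                     Function.MAINTENANCE_AND_REPAIR, Element.FRONT_LINE),
    PrecursorMeasure("timeliness of corrective action plans",
                     Function.FLIGHT_OPERATIONS, Element.MANAGEMENT),
    PrecursorMeasure("assessments of organizational safety culture",
                     Function.FLIGHT_OPERATIONS, Element.CULTURE),
)


def measures_for(element, catalog=CATALOG):
    """Return the names of catalog entries tagged with the given element."""
    return [m.name for m in catalog if m.element is element]


print(measures_for(Element.FRONT_LINE))
```

Because the elements are often tightly interwoven in practice, a fuller version of such a scheme would probably need to allow a measure to carry more than one element tag.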
Front Line

At the organizational-level front line, many different kinds of potential precursor measures can be assessed, particularly from voluntary reports of safety issues and concerns submitted by front-line workers (see Table 2-1). More detail about existing data sources used in aviation is provided in Chapter 4. Other kinds of reports are mandatory, such as ATC reports about near-midair collisions and runway incursions and excursions. Line audits capture observers’ measures of how flight crews are handling normal issues well and where they struggle. Some analysis is enabled by digital recordings of operations, such as the thousands of data streams in digital flight data recorders capturing the state of the aircraft and its systems throughout the flight, allowing for analysis of malfunctions, alarms, and flights operated outside of company-set limits (exceedances); similarly, air traffic surveillance logs provide indicators of loss of separation as defined by operational errors. As described in more detail in Chapter 4, the availability of such massive heterogeneous sources can facilitate analysis

TABLE 2-1 Illustrative Generic Accident Precursor Measures

Column headings: Control Layers Function; Socio-Technical System, Individual Organizations (Front Line, Management, Safety Culture); Example Interactions Among Organizations Providing Sources of Precursor Measures

Flight Operations: Adverse events, alarms, flight deviations, runway incursions, etc.; Line audits; Voluntary and mandatory reports; Research indicating inadequate human-systems design in displays and automation; Software issues; Rigor of hazard ID and risk assessment; Procedure/control lapses; Voluntary and mandatory reports; SMS outputs (self-audit reports and timeliness of CAPs); Fatigue measures; Indicators of training and staff competence; Voluntary reports; Assessments of organizational safety culture; Measures of vigilance/drift; Airline operator–regulator check rides; Multiple data flows between operators and original equipment manufacturers (OEMs) on aircraft equipment, pilot-automation system interactions, and software performance and maintenance; Feedback about design assumptions, operational plans, and revisions to guidance based on actual operations; OEM regulator certification, supplemental type certificates, airworthiness directives

Maintenance and Repair: Reports of service difficulties; Lapses in procedures; Lapses in maintenance/repair requirements and procedures; Lapses in software updates and revisions

Design: Expected performance and behavior; Assumptions, particularly about interactions with other systems and operating conditions

OEMs: Reports of service difficulties; Quality assurance/quality control measures

Suppliers: Indicators of components and equipment not meeting design tolerances/industry standards

Standards: Timeliness of new standards and updates to existing standards to address emerging hazards

Regulation: Information generated from FAA certification, inspection, audits; Advisory circulars, notices to flight crew


methods of abnormalities and potential hazards beyond those that measure exceedances.

The above examples come from operational experience, but precursors can also be generated from research and experimentation. For example, analysis of a variety of data sets combined with flight simulator testing can produce indicators of whether designers have anticipated the ways that pilots will interact with flight management systems and the kinds of errors that inadequate design features can cause (Performance-based Operations Rulemaking Committee et al., 2013).

Management

Safety information about the management of the organization takes different forms, although it, too, is highly dependent on voluntary reporting, which typically comes from the same sources of voluntary reports at the front line. For example, a pilot, controller, or maintenance technician may self-report a violation of a procedure, which occurred at the front line, but also indicate that he or she was fatigued because of management’s call for overtime due to staffing issues. Other organizational information could be generated from SMSs, such as audit reports, corrective action plans taken in response to audits or FAA inspections, or from internal organization reporting system reports about such issues as supervisors telling front-line workers to side-step a procedure or to proceed with operations despite an identified and uncontrolled hazard. Ongoing independent audits of flight operations (such as Line Operations Safety Audits, or LOSAs) may indicate issues with company training or procedures that need to be adjusted (see Table 2-1).

Culture

As noted, an organization’s safety culture determines whether employees feel free to report without fear of recrimination and how effectively the company manages safety and carries out its SMS functions.
Internal voluntary reporting systems may provide insight about a company’s culture as perceived by those reporting. Culture, particularly deeply embedded assumptions and values, is challenging but not impossible to assess, as described in the next chapter (NASEM, 2016). Safety precursor information about an organization’s safety culture could be drawn from regular employee surveys (which are unable to measure culture directly but can reflect current employee perceptions), focus groups, and other efforts if they are being done regularly and consistently to a high standard. Indicators of organizational drift from its safety goals can be extrapolated from mundane organizational imperfections such as tolerance for routine

operational errors, failures in carrying out procedures and policies, failures in compliance, weak monitoring and control practices, increasing weakening and misalignment of organizational culture (differences in espoused and actual behavior), managerial defensiveness and simplified views, decreases in audits, decreases in training, increases in regulatory complaints, missed audits, missed deadlines, and increased staff turnover (Roux-Dufort, 2007; Williams et al., 2017).

Areas of Safety Controls

The rows of Table 2-1 provide illustrative examples of precursors across the different areas of aviation in which safety is controlled in some manner: flights, maintenance, design, and so forth. The committee is most aware of precursor measures being assessed in data taken from flight operations, as, indeed, most current safety analysis and data collection are focused at this level (as described in greater detail in Chapter 4). Examples in other areas and using other data sets are more conceptual at this point in the committee’s information gathering and deliberation.

Interactions Among Organizations

In concept, precursor information could be gleaned from the information flows across organizations as depicted in Figure 2-2. For example, problem reports of pilot interactions with automation or maintenance issues with equipment can originate within operators and be fed back to designers and OEMs, who, in turn, may offer changed guidance for operations; the regulator may also choose to update its guidance and advisories and, in significant situations, update criteria for certification and operational approval. The column of interactions among organizations of Table 2-1 offers sources of information for precursors rather than generic examples since they remain conceptual at this point in the committee’s deliberations.
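One way to picture monitoring a precursor measure for a troublesome trend is to normalize event counts by exposure and fit a straight line through the resulting rates. The sketch below is only an illustration: the function names, the per-100,000-flights normalization, and the sample counts are hypothetical and are not taken from any data set discussed in this report, and a real analysis would also need to account for changes in reporting behavior and for statistical uncertainty.

```python
def rates_per_100k(event_counts, flight_counts):
    """Convert per-period event counts into events per 100,000 flights."""
    if len(event_counts) != len(flight_counts):
        raise ValueError("need one flight count per event count")
    return [100_000 * events / flights
            for events, flights in zip(event_counts, flight_counts)]


def linear_slope(values):
    """Least-squares slope of `values` against period index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two periods to estimate a trend")
    x_bar = (n - 1) / 2
    y_bar = sum(values) / n
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(values))
    denominator = sum((x - x_bar) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical monthly counts for one precursor measure (not real data).
monthly_events = [12, 15, 14, 18, 21, 24]
monthly_flights = [400_000] * 6

rates = rates_per_100k(monthly_events, monthly_flights)
slope = linear_slope(rates)
print(f"rates per 100,000 flights: {rates}")
print(f"estimated change per month: {slope:+.2f}")
```

A persistently positive slope in such a rate would not establish a hazard by itself; in the terms used in this chapter, it would flag a potential precursor that warrants deeper analysis of the relevant controls.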
SUMMARY AND ASSESSMENT

This chapter introduces the complexity and layered nature of the aviation industry and its approach to controlling hazards. It also describes the general attributes of precursor information that can be monitored for how well safety controls are working. It categorizes at a conceptual level the landscape of potential indicators by which emerging trends may be identified and characterized, which is meant to convey the multiple layers of controls, lapses in which can lead to accidents, and examples of measures of weaknesses in these controls. It also describes a time dimension for different emerging hazards. Hazards can possibly arise within any component of the

socio-technical system. Therefore, it is desirable to have precursor measures available for monitoring whether troublesome trends are becoming apparent in any area of the system. This conclusion will drive the committee’s future information-gathering efforts and reports.

The collection, aggregation, analysis, and use of precursors to maintain and improve commercial aviation safety are discussed in Chapter 4. Also discussed in subsequent chapters are the dimensions of safety management that lack good precursor measures, as well as the state of the committee’s understanding at the time of this writing of how available precursor measures are being measured, analyzed, and applied to safety management. The committee closes this chapter with two observations that subsequent chapters build on.

Concerns with Sharing and Integrating Proprietary and Sensitive Information

An important observation from this chapter is that an ideal system would analyze multiple data sources both to enlarge its overall data set and to include many perspectives into potential failures in controls. Such aggregation could, in theory, result in multiple feedback loops supporting safety controls at many levels throughout the industry. However, as detailed in Chapter 4, the aviation industry comprises many independent entities that each gather information for their own purposes. Thus, sharing and integrating data can be difficult for both pragmatic reasons (e.g., different data formats or types of measures made between different operators) and for policy reasons (e.g., data that are considered by their owners too sensitive or proprietary to gather if they will be forced to broadly share).
These practical difficulties suggest that, in many cases, feedback loops within the system, and monitoring for precursors, will need to be constructed deliberately in a manner that best supports safety control within the constraints imposed on data sharing and integration.

Potential Uncontrolled Hazards

There are perhaps three potential weaknesses with the organizational systems of control described in this chapter. First, random and rare events can align and occur in a number of ways beyond what current processes can measure or control, particularly as flight systems, and their software, increase in complexity. The committee assumes that such randomness would result in isolated events and not trends.

The second is the tendency of organizational managers, lulled by the rare occurrence of catastrophes, to take advantage of the multiple defenses by extending operations in ways that support new functions or more efficient operations but were not part of the initial safety analysis when

first designed and implemented. Reason describes this tendency as “trading off added protection for improved production” (Reason, 1997, p. 6; see also Rasmussen, 1997). This tendency can potentially be addressed by high-reliability organizations with strong safety cultures, as described in the next chapter, but could also be reflected in emerging trends.

A third possible weakness is that the system of controls described herein depends heavily on how comprehensively hazards are identified and characterized. Controls cannot be put in place for hazards that have not been identified or properly understood. After more than seven decades of commercial aviation on a large scale, one presumes that most hazards have been identified, but the system is evolving over time and there is always the possibility of a hazard emerging that was not expected in design or before the existing system of controls was put in place. Examples would include introduction of drones into airspace around airports, the increasing sophistication and reliance on software for safety-critical functions, and problems with occasional fires from lithium-ion batteries experienced in recent years. We return to the issue of identifying unanticipated emerging hazards in Chapter 5.

REFERENCES

Kahneman, D. 2011. Thinking, Fast and Slow. Farrar, Straus and Giroux.
Larouzee, J., and J-C. Le Coze. 2020. Good and bad reasons: The Swiss cheese model and its critics. Safety Science 126.
Leveson, N. 2004. A new accident model for engineering safer systems. Safety Science 42:237–270.
Leveson, N. G. 2011. Engineering a Safer World. MIT Press.
Leveson, N. 2015. A systems approach to risk management through leading safety indicators. Reliability Engineering and System Safety 136.
Maitlis, S., and M. Christianson. 2014. Sensemaking in organizations: Taking stock and moving forward. The Academy of Management Annals 8(1):57–125.
NASEM (National Academies of Sciences, Engineering, and Medicine). 2016. Strengthening the Safety Culture of the Offshore Oil and Gas Industry. The National Academies Press.
Ockerman, J., and A. Pritchett. 2000. A review and reappraisal of task guidance: Aiding workers in procedure following. International Journal of Cognitive Ergonomics 4(3):191–212.
Performance-based Operations Rulemaking Committee and Commercial Aviation Safety Team Flight Deck Automation Working Group. 2013. Operational Management of Flight Path Management Systems. Federal Aviation Administration, U.S. Department of Transportation.
Perrow, C. 1984. Normal Accidents: Living with High-Risk Technologies. Basic Books.
Rasmussen, J. 1997. Risk management in a dynamic society: A modeling problem. Safety Science 21(2–3):189–197.
Reason, J. 1990. Human Error. Cambridge University Press.
Reason, J. 1997. Managing the Risks of Organizational Accidents. Ashgate Publishing Limited.
Reason, J. 2000. Safety paradoxes and safety culture. Injury Control and Safety Promotion 7(1):3–14.
Roux-Dufort, C. 2007. Is crisis management (only) a management of exceptions? Journal of Contingencies and Crisis Management 15(2).
Schein, E. H. 2010. Organizational Culture and Leadership, 4th ed. John Wiley.
Sterman, J. 2000. Business Dynamics: Systems Thinking and Modeling for a Complex World. Irwin McGraw-Hill, Boston.
Turner, B. A. 1978. Man-Made Disasters. Wykeham Publications.
Weick, K., and K. Sutcliffe. 2015. Managing the Unexpected: Sustained Performance in a Complex World. Wiley.
Williams, T. A., D. A. Gruber, K. M. Sutcliffe, D. A. Shepherd, and E. Y. Zhao. 2017. Organizational response to adversity: Fusing crisis management and resilience research streams. Academy of Management Annals 11(2). https://doi.org/10.5465/annals.2015.0134.

Commercial aviation safety in the United States has improved more than 40-fold over the last several decades, according to industry statistics. The biggest risks include managing safety in the face of climate change, increasingly complex systems, changing workforce needs, and new players, business models, and technologies.

TRB Special Report 344: Emerging Hazards in Commercial Aviation—Report 1: Initial Assessment of Safety Data and Analysis Processes is the first of a series of six reports that will be issued from TRB and the National Academies of Sciences, Engineering, and Medicine over the next 10 years on commercial aviation safety trends in the U.S.
