Evaluation of the Transport Airplane Risk Assessment Methodology (2022)

Suggested Citation:"5 Improvements to the TARAM Process." National Academies of Sciences, Engineering, and Medicine. 2022. Evaluation of the Transport Airplane Risk Assessment Methodology. Washington, DC: The National Academies Press. doi: 10.17226/26519.

5

Improvements to the TARAM Process

This chapter discusses findings related to gaps identified in the current Transport Airplane Risk Assessment Methodology (TARAM) analysis process and provides recommendations for improvements.

IMPROVING SYSTEMATIC RISK MODELING IN THE TARAM PROCESS

In Section 4.1 of the Handbook, TARAM discusses the creation of a causal chain, which starts with the “condition under study” and ends with the “unsafe outcome(s).” This causal chain describes a series of airplane-level events that may result in unsafe outcome(s). In probabilistic risk assessment (PRA), as utilized for other technological systems such as nuclear power plants1 and space exploration,2 a causal chain is commonly modeled by event trees. Event trees have inductive logic and are used to model (using Boolean logic) the chronological sequences of system-level events from an initiating event to an end state. In PRA, fault trees have deductive logic and are used to model (using Boolean logic) the causal and functional relationships between system-level events in the event trees and their underlying subsystems and components/equipment. Similarly, 14 CFR 25.1309 fault trees or other probabilistic analyses could be integrated with the causal chains in TARAM. This integration of TARAM’s causal chains with the 14 CFR 25.1309 fault trees could help identify missing failure conditions and highlight potential design gaps. This integration would also provide a more comprehensive probabilistic risk assessment that could help address the lack of data for the conditional probabilities3 (CPs) in the TARAM causal chains. Although the TARAM Handbook indicates the potential use of fault trees from the design certificate to address the lack of data for CPs, in practice these CPs are estimated based on engineering judgment.
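As a sketch of how a 14 CFR 25.1309-style fault tree could feed a TARAM causal chain, the code below quantifies a fault tree top event from its minimal cut sets and chains it into an event sequence. All component names and probabilities are hypothetical, for illustration only, and are not drawn from any certification analysis.

```python
from math import prod

# Hypothetical fault tree for one causal-chain event: the event occurs if
# (component A fails AND component B fails) OR the shared power supply fails.
basic_events = {"A": 1e-3, "B": 1e-3, "PWR": 1e-4}
minimal_cut_sets = [{"A", "B"}, {"PWR"}]  # from reducing the Boolean logic

def top_event_probability(cut_sets, p):
    """Exact top-event probability by inclusion-exclusion over the cut sets,
    assuming independent basic events."""
    total = 0.0
    n = len(cut_sets)
    for mask in range(1, 2 ** n):          # every non-empty subset of cut sets
        union = set()
        size = 0
        for i in range(n):
            if mask & (1 << i):
                union |= cut_sets[i]
                size += 1
        term = prod(p[e] for e in union)   # joint probability over the union
        total += term if size % 2 == 1 else -term
    return total

p_top = top_event_probability(minimal_cut_sets, basic_events)
# Rare-event approximation: sum of the cut-set probabilities (upper bound here).
p_rare = sum(prod(basic_events[e] for e in cs) for cs in minimal_cut_sets)

# Event-sequence quantification: chain the top event after an initiating event.
p_init = 1e-2                              # illustrative initiating-event probability
p_sequence = p_init * p_top
```

Tools such as CAFTA and SAPHIRE, discussed below, automate exactly this kind of cut-set derivation and sequence quantification at realistic scale.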

Finding: TARAM analysis does not reference 14 CFR 25.1309 fault tree analysis for failure conditions, which may need to be integrated with field data.

___________________

1 U.S. Nuclear Regulatory Commission, 2020, Acceptability of Probabilistic Risk Assessment Results for Risk-Informed Activities, Regulatory Guide 1.200, Revision 3, Washington, DC.

2 National Aeronautics and Space Administration, 2011, Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners, NASA/SP-2011-3421, Washington, DC: NASA Center for AeroSpace Information.

3 Defined as the probabilities of unsafe outcomes given the occurrence of the initial event under study.


Finding: TARAM’s causal chain needs to build from 14 CFR 25.1309 fault tree analysis (or other probabilistic analyses) to provide a more complete assessment of risk-contributing causal factors.

TARAM currently utilizes worksheets for calculating and presenting the risk outputs. To integrate the airplane-level causal chain with fault trees, a software tool is needed for the derivation of minimal cut sets in Boolean logic, event sequence quantification, and risk calculation. There are a number of existing software tools available that have the capability to perform these functions. For instance, the fault tree analysis in the type certification utilizes the Computer Aided Fault Tree Analysis System (CAFTA) software tool,4 which also has the functionality of building and quantifying event tree models to represent the airplane-level causal chain and integrate the event trees with the fault trees. As another example, the U.S. Nuclear Regulatory Commission (U.S. NRC) utilizes the Systems Analysis Programs for Hands-on Integrated Reliability Evaluation (SAPHIRE) software tool5 to develop a standardized PRA model by integrating event trees and fault trees for each operating nuclear power plant.

When the data to quantify component/equipment-level inputs (such as failure probabilities for basic events in fault trees) are unavailable or insufficient, one option would be to integrate explicit models of failure mechanisms underlying the component/equipment-level events with event trees and fault trees as was done, for instance, in the development of the Integrated Risk Information System (IRIS) software tool for a previous Federal Aviation Administration (FAA)-funded research project.6 The IRIS software integrates an airplane-level causal chain (modeled by an Event Sequence Diagram) and fault trees with a Bayesian Belief Network (BBN) that models the underlying causal factors. A BBN is a directed acyclic graphical modeling technique, where the causal factors and their influence paths are represented by nodes and edges, respectively. The causal relationship between two factors is quantified using conditional probabilities, typically estimated based on data and subjective judgment. As another example, recent research in the nuclear power domain has developed an Integrated PRA (I-PRA) methodology7 to integrate event trees and fault trees with simulation models of underlying failure mechanisms by generating a probabilistic interface equipped with key functions to convert the simulation data to the PRA inputs considering uncertainty analysis and dependent failure analysis. The I-PRA methodology models the underlying causation using a system performance simulation rather than translating the system behavior to a probabilistic graphical model as done in IRIS.
For instance, as stated in Chapter 4 and in the previous section of this chapter, if the wear-out failure TARAM analysis lacks sufficient data to fit the Weibull distribution for calculating the expected value “DA,”8 additional data could be generated by simulation modeling for the physical degradation mechanism of concern using the probabilistic physics-of-failure (PPoF) approach.9 In this case, the PPoF model could be interfaced with event trees and fault trees using the I-PRA methodology.
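The Weibull-fitting step can be sketched as follows. The code simulates Weibull-distributed failure times (standing in for PPoF-generated data), fits the shape and scale parameters by median-rank regression, and computes a DA-like expected number of affected airplanes. The fleet size, time horizon, and all distribution parameters are illustrative assumptions, not values from any TARAM analysis.

```python
import math
import random

random.seed(0)

# Stand-in for a PPoF simulation: sample failure times from an assumed
# Weibull degradation model (shape beta and scale eta are illustrative).
def simulate_failure_time(beta=2.5, eta=40000.0):
    u = random.random()
    return eta * (-math.log(1.0 - u)) ** (1.0 / beta)   # inverse-CDF sampling

times = sorted(simulate_failure_time() for _ in range(200))

# Median-rank regression (Benard's approximation) on the linearized CDF:
#   ln(-ln(1 - F_i)) = beta * ln(t_i) - beta * ln(eta)
n = len(times)
xs = [math.log(t) for t in times]
ys = [math.log(-math.log(1.0 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
mx, my = sum(xs) / n, sum(ys) / n
beta_hat = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))
eta_hat = math.exp(mx - my / beta_hat)

# DA-like quantity: expected airplanes experiencing the failure by horizon T,
# if left undetected, for an illustrative fleet of 500 airplanes.
fleet_size, T = 500, 20000.0
expected_failures = fleet_size * (1.0 - math.exp(-(T / eta_hat) ** beta_hat))
```

In practice the simulated samples would come from the physics-of-failure model itself rather than from a Weibull generator, and the fitted distribution would then feed the interfaced event trees and fault trees.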

For risk estimation, adequate treatment of dependency is crucial. In risk analysis, scenarios are represented by the intersections of multiple events; hence, risk quantification requires the calculation of their joint probabilities. If the events (E1, E2, …, EN) are independent, their joint probability10 can be calculated by multiplying their marginal probabilities, Pr(E1, E2, …, EN) = Pr(E1) * Pr(E2) * … * Pr(EN). Meanwhile, if the events are not independent, the joint probability must be computed using the chain rule of probability, Pr(E1, E2, …, EN) = Pr(E1) * Pr(E2 | E1) * … * Pr(EN | E1, E2, …, EN–1). In the context of risk analysis, most often, the existence of dependency tends to increase the conditional probability of a failure event, given its preceding

___________________

4 Electric Power Research Institute, 2014, “Computer Aided Fault Tree Analysis System (CAFTA), Version 6.0b,” Palo Alto, CA.

5 U.S. Nuclear Regulatory Commission, 2011, “Systems Analysis Programs for Hands-on Integrated Reliability Evaluations (SAPHIRE) Version 8, NUREG/CR-7039,” Washington, DC: Office of Nuclear Regulatory Research.

6 K. Groth, C. Wang, and A. Mosleh, 2010, “Hybrid Causal Methodology and Software Platform for Probabilistic Risk Assessment and Safety Monitoring of Socio-Technical Systems,” Reliability Engineering & System Safety 95(12):1276–1285.

7 H. Bui, T. Sakurahara, J. Pence, S. Reihani, E. Kee, and Z. Mohaghegh, 2019, “An Algorithm for Enhancing Spatiotemporal Resolution of Probabilistic Risk Assessment to Address Emergent Safety Concerns in Nuclear Power Plants,” Reliability Engineering & System Safety 185:405–428.

8 Defined as “the expected number of airplanes that would experience the subject failure, if left undetected, during the time period under study,” in Chapter 5 in the TARAM Handbook.

9 M. Azarkhail and M. Modarres, 2012, “The Evolution and History of Reliability Engineering: Rise of Mechanistic Reliability Modeling,” International Journal of Performability Engineering 8(1):35–47.

10 The joint probability of events A and B is represented as the probability of their intersection P (AB).


failure event(s), compared to the marginal probability, for instance, Pr(E2 | E1) > Pr(E2). Therefore, inadequate consideration of known dependencies in risk quantification can result in underestimating risk and, ultimately, lead to an unsafe decision.
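A minimal numeric illustration of this point, with purely hypothetical probabilities:

```python
# Two events in a risk scenario where the occurrence of E1 makes E2 much more
# likely (e.g., a shared environmental stress). Numbers are illustrative only.
p_e1 = 1e-4           # Pr(E1)
p_e2 = 1e-3           # marginal Pr(E2)
p_e2_given_e1 = 1e-1  # Pr(E2 | E1), reflecting the dependency

p_assuming_independence = p_e1 * p_e2        # 1e-7: multiply the marginals
p_with_dependency = p_e1 * p_e2_given_e1     # 1e-5: chain rule of probability

underestimation = p_with_dependency / p_assuming_independence  # factor of 100
```

Here, ignoring the known dependency understates the scenario probability by two orders of magnitude, which is exactly the kind of unsafe underestimation the text warns against.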

In the current TARAM, risk scenarios are represented by an intersection of airplane-level events in the causal chain.11 The dependency among those airplane-level events is addressed by directly estimating the CPs of unsafe outcomes, given the condition or event being analyzed, as input to the TARAM risk calculations. This approach can work if adequate data are available to support the airplane-level CP estimation. However, it is not always feasible to find either sufficient relevant data for the simultaneous occurrence of multiple events at the airplane level or operational data for the airplane-level scenarios that have led to a catastrophic outcome; thus, reliance on the data-driven CP estimation can result in inaccurate risk outputs and significant uncertainties. The TARAM Handbook states that, when historical or test data are not available, CPs can be estimated based on design and certification fault tree analyses. Based on the presentations provided to the committee by the FAA, the lack of data for CPs is, however, mainly addressed by using engineering judgment.
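One way to make the reliance on engineering judgment explicit and updatable is a Bayesian estimate, in which a prior encoding that judgment is combined with whatever sparse service data exist. The sketch below uses a beta-binomial update; all counts and prior parameters are illustrative assumptions, not TARAM values.

```python
# Beta-binomial sketch for estimating a conditional probability (CP) when
# service data are sparse: k unsafe outcomes observed in n occurrences of
# the condition under study. All numbers are illustrative.
k, n = 0, 25                          # no unsafe outcomes seen so far
prior_alpha, prior_beta = 0.5, 50.0   # prior encoding engineering judgment

posterior_alpha = prior_alpha + k
posterior_beta = prior_beta + (n - k)
cp_estimate = posterior_alpha / (posterior_alpha + posterior_beta)

# A pure frequency estimate k/n would be 0.0 here, overstating confidence;
# the posterior mean stays small but nonzero, and it sharpens as data accrue.
```

The same machinery quantifies the residual uncertainty in the CP (via the full posterior distribution), rather than reporting a single judgment-based point value.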

Additionally, the current continued operational safety (COS) decision-making practice accounts for the common cause failure (CCF) as one of the qualitative decision criteria (Table 9, “Qualitative Safety Criteria,” in the FAA Seattle ACO Branch Transport Airplane Safety Manual) for determining whether the condition is unsafe. The Qualitative Safety Criteria include Criterion 1.c, “The condition is a foreseeable single failure, cascading failure sequence, or common cause failure scenario that could result in a catastrophic event,” and if this criterion is assessed to be YES, that is sufficient to classify the issue as an unsafe condition, regardless of the TARAM risk outputs and the other safety criteria. The treatment of common cause failure in the current COS decision-making is qualitative and relies on expert judgment by the Corrective Action Review Board (CARB), while the likelihood and airplane-level consequence of CCF are not explicitly modeled in TARAM.

PRA for technological systems in other domains, such as nuclear and space, has a model-based approach for dependency treatment as follows: (1) it integrates the system-level causal chain (typically modeled by event trees) with fault trees that model detailed functional causation among subsystems and components/equipment; and (2) it uses parametric CCF approaches to quantify dependency at the component/equipment level. The integration of the causal chain with fault trees addresses functional dependency among subsystems in the causal chain induced by supporting components/equipment; for instance, both subsystems A and B require input from the shared components/equipment. The treatment of functional dependency is implemented by the reduction of Boolean logic. The parametric CCF approaches treat dependencies among the components/equipment in each minimal cut set in a fault tree. The parametric CCF analysis in PRA is conducted in three phases.12 In the first phase, a screening analysis is conducted to identify all the potential CCF vulnerabilities in the system being analyzed and to generate a list of the component/equipment groups within the system whose CCF events can contribute significantly to the system risk. The purpose of the screening analysis is to narrow the scope of the detailed analysis (in the second and third phases) to reduce the burden of analysis while ensuring a reasonable level of accuracy in the estimated risk. The potential CCF vulnerabilities with insignificant risk contribution are screened out and, in the subsequent phases, only the remaining component/equipment groups are further analyzed. In the second phase, a detailed qualitative analysis is conducted to understand the system-specific CCF vulnerabilities and defenses by reviewing detailed system characteristics, such as design, operation, environmental conditions, and maintenance practices. 
In the third phase, based on the results from the first and second phases, the CCF probabilities are quantified by (1) detailed logic modeling through the extension of the fault trees, (2) CCF probability quantification using the parametric models, and (3) CCF event data analysis.
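The quantification step can be sketched with the beta-factor model, one of the simplest parametric CCF approaches; the parameter values below are hypothetical, not drawn from any CCF database.

```python
# Beta-factor parametric CCF model: a fraction beta of each component's total
# failure probability is attributed to a common cause that fails the whole
# redundancy group at once. Parameter values are illustrative only.
q_total = 1e-3   # total failure probability of each component
beta = 0.05      # common cause fraction

q_independent = (1.0 - beta) * q_total   # independent part
q_ccf = beta * q_total                   # common cause part (fails the group)

# Two-component redundant pair (both must fail for loss of function):
p_ignoring_ccf = q_total ** 2                 # ~1e-6 if CCF is ignored
p_with_ccf = q_independent ** 2 + q_ccf       # ~5.1e-5: the CCF term dominates
```

Even a modest common cause fraction dominates the failure probability of a redundant pair, which is why a purely qualitative CCF treatment can miss risk-significant contributions.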

TARAM would benefit from having a model-based approach to treat both CCFs and functional dependencies. Regarding the treatment of functional dependencies, for instance, the integration of 14 CFR 25.1309 fault trees with TARAM causal chain (converted to Event Trees) would help. As part of the Safety Assessment Process in

___________________

11 Federal Aviation Administration, 2011, Transport Airplane Risk Assessment Methodology (TARAM) Handbook, PS-ANM-25-05, Washington, DC: Transport Airplane Directorate ANM-100, https://rgl.faa.gov/Regulatory_and_Guidance_Library/rgPolicy.nsf/0/4E5AE8707164674A862579510061F96B?OpenDocument&Highlight=ps-anm-25-05.

12 U.S. Nuclear Regulatory Commission, 2011, “Systems Analysis Programs for Hands-on Integrated Reliability Evaluations (SAPHIRE) Version 8, NUREG/CR-7039,” Washington, DC: Office of Nuclear Regulatory Research.


support of the Design Certificate analysis, Common Cause Failure Analysis (CCFA) is “qualitatively” conducted.13 This means that the CCFA in the Design Certificate focuses on understanding potential CCFs qualitatively, identifying credible failure modes, and developing corrective action rather than quantifying the CCF probabilities and their impact on the airplane-level risk. Quantitative CCF analysis, performed under PRA, could be leveraged, evaluated, adjusted (if needed), and, when practical, be implemented in TARAM. Conducting quantitative CCF analysis would also need a CCF database to support the required input data, as explained in Chapter 4.

Finding: The current TARAM addresses dependencies by directly estimating conditional probabilities based on service data or engineering judgment. In the COS decision-making for transport airplanes, common cause failure is considered as one of the qualitative decision criteria. A model-based methodology to quantify the likelihood and consequence of dependencies among the subsystem- or component-level events is not utilized.

Recommendation 4: Within 6 months of receipt of this report, the Federal Aviation Administration should evaluate and document its approach to the use of quantitative common cause failure analysis, performed under probabilistic risk assessment, to determine its applicability for the continued operational safety process.

INCORPORATING HUMAN RELIABILITY ANALYSIS IN THE TARAM PROCESS

On the human side, it must be recognized that flight, cabin, and maintenance crews all play important and interconnected roles in maintaining safe operations. Operational safety relies on specific actions undertaken by these crews; yet TARAM provides no mechanism for properly assessing the reliability of these crews in their appropriate contexts.

In other domains that are overseen by the U.S. NRC and the National Aeronautics and Space Administration (NASA), such as nuclear power production and space exploration, Human Reliability Analysis (HRA) methods have been adopted. Several variations of HRA methods have been created over the years; however, most of them share the following common steps: (1) qualitative analysis to construct human action scenarios by identifying elementary tasks and their relationships to the human failure event considered in the risk model, typically using an HRA event tree; (2) analysis of the context of human action and the determination of possible failure modes; (3) calculation of human error probabilities for elementary tasks (the basic human error probabilities for elementary tasks are often established based on human performance data, such as simulator data); (4) a method for modifying the basic human error probabilities using performance-influencing factors to account for the differing contexts that have been shown to impact human behavior, for example, training, fatigue, and stress; and (5) a method for combining these elementary human error probabilities for each human action scenario to estimate the human failure event probability.
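Step (4) can be sketched in the style of multiplier-based methods such as SPAR-H. The nominal probability and multiplier values below are illustrative placeholders, not values from any published HRA tables.

```python
# Multiplier-style adjustment of a nominal human error probability (HEP):
# each applicable performance-influencing factor scales the nominal value.
# All numbers are illustrative, not from SPAR-H or any other HRA method.
nominal_hep = 1e-3
psf_multipliers = {
    "high stress": 2.0,
    "inadequate training": 5.0,
    "fatigue": 3.0,
}

adjustment = 1.0
for factor, multiplier in psf_multipliers.items():
    adjustment *= multiplier          # combined context adjustment (here 30x)

hep = min(1.0, nominal_hep * adjustment)  # cap the probability at 1.0
```

Step (5) would then combine such task-level HEPs along each human action scenario to yield the human failure event probability used in the risk model.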

In the nuclear power plant domain, where most of the existing HRA methods originated, a common approach to classifying different HRA methods is to group them into generations by evaluating four aspects14: (1) chronology, which refers to the era in which the method was developed; (2) cognition, which refers to whether the method explicitly considers cognitive functions and mechanisms as part of its performance-influencing factors; (3) context, which refers to whether the method considers the environmental, situational, and organizational factors that could impact human behavior; and (4) commission, which refers to the capability and focus of the method in modeling errors of commission (in addition to errors of omission).

Although there is always room for debate, first-generation HRA methods do not model cognition, context, and/or errors of commission. Examples of first-generation methods are the THERP, ASEP, SPAR-H, and HEART.15

___________________

13 Federal Aviation Administration, 2000, “Analysis Techniques,” Ch. 9 in FAA System Safety Handbook, Washington, DC, http://rapeutation.com/FAAChap9_1200.pdf.

14 R.L. Boring, R.E. Shirley, J.C. Joe, and D. Mandelli, 2014, Simulation and Non-Simulation Based Human Reliability Analysis Approaches, Idaho Falls, ID: Idaho National Laboratory.

15 THERP: Technique for Human Error Rate Prediction; ASEP: Accident Sequence Evaluation Program; SPAR-H: Standardized Plant Analysis Risk-Human; and HEART: Human Error Assessment and Reduction Technique.


Second-generation HRA methods generally attempt to capture the cognition, context, and/or errors of commission aspects. Notable second-generation HRA methods include ATHEANA, CREAM, and MERMOS.16 These first- and second-generation HRA methods, however, rely significantly on static task analyses of human failure events and cannot capture the dynamic nature and implication of many important human actions, especially in contexts where the presence and evolution of harsh environmental conditions significantly impact the work processes and psychological states of humans that are likely to be present in many civil aviation scenarios. This limitation motivated the development of the so-called simulation-based HRA methods, which provide a dynamic basis for HRA modeling and quantification and are usually referred to as the third-generation HRA methods. Some notable simulation-based HRA methods include ADS-IDAC, MIDAS, and HUNTER.17 Apart from these generational categories, there exist other HRA methods that rely more on expert judgment for evaluating human error likelihood in a specific operational context, such as the SLIM-MAUD and its failure-centric counterpart FLIM.18 Summaries of these HRA methods and a discussion on good practices of HRA in the nuclear industry were provided by the U.S. NRC through their NUREG-184219 and NUREG-2127.20 In space exploration, NASA experts also provided their guidance in a technical report21 on the selection of HRA methods that can support PRA.

Some of the first- and second-generation HRA methods have been leveraged for civil aviation studies, for instance, THERP,22 HEART,23,24 SPAR-H,25 ATHEANA,26 and CREAM.27 Two potential deficiencies when applying or leveraging first- or second-generation HRA methods are that (1) these methods are not capable of capturing organizational factors in design, manufacturing, operation and maintenance, or logistics activities that could very well affect the performance of flight and maintenance crews and air traffic controllers; and (2) these methods are not capable of capturing the dynamics of human actions in quantifying human error.

There have been some efforts to address the first deficiency—that is, the lack of models to account for organizational factors in human performance analysis in civil aviation. For instance, the Human Factors Analysis and Classification System (HFACS) was developed by Wiegmann and Shappell to classify human errors and the associated causal factors in aviation accidents and mishaps.28 This method qualitatively models latent and active human errors by considering organizational influences, unsafe supervision, unsafe acts, and factors that impact the operator’s mental and physical behavior (i.e., preconditions of the unsafe operator acts). In the HFACS method, latent errors refer to those of designers and managers, while active errors refer to those of operators while

___________________

16 ATHEANA: A Technique for Human Error Analysis; CREAM: Cognitive Reliability and Error Analysis Method; and MERMOS: Méthode d’Evaluation de la Réalisation des Missions Opérateur pour la Sûreté.

17 ADS-IDAC: Accident Dynamics Simulator-Information Decision and Action in Crew; MIDAS: Man-Machine Integration Design and Analysis System; and HUNTER: Human Unimodel for Nuclear Technology to Enhance Reliability.

18 SLIM-MAUD: Success Likelihood Index Methodology, Multi-Attribute Utility Decomposition; and FLIM: Failure Likelihood Index Methodology.

19 U.S. Nuclear Regulatory Commission, 2006, Evaluation of Human Reliability Analysis Methods Against Good Practices, NUREG-1842, Washington, DC: Office of Nuclear Regulatory Research.

20 U.S. Nuclear Regulatory Commission, 2014, The International HRA Empirical Study: Lessons Learned from Comparing HRA Methods Predictions to HAMMLAB Simulator Data, NUREG-2127, Washington, DC.

21 F. Chandler, J. Chang, A. Mosleh, J. Marble, R. Boring, and D. Gertman, 2006, Human Reliability Analysis Methods: Selection Guidance for NASA, Washington, DC: NASA Headquarters Office of Safety and Mission Assurance.

22 N. Mitomo, A. Hashimoto, and K. Homma, 2015, “An Example of an Accident Analysis of Aircrafts Based on Human Reliability Analysis Method,” Pp. 1–5 in 2015 International Conference on Informatics, Electronics & Vision (ICIEV), IEEE.

23 R. Maguire, 2005, “Validating a Process for Understanding Human Error Probabilities in Complex Human Computer Interfaces,” Complexity in Design and Engineering 313–326.

24 Y. Guo and Y. Sun, 2020, “Flight Safety Assessment Based on an Integrated Human Reliability Quantification Approach,” PLOS One 15(4):e0231391.

25 K. Burns and C. Bonaceto, 2020, “An Empirically Benchmarked Human Reliability Analysis of General Aviation,” Reliability Engineering & System Safety 194:106227.

26 D. Miller and J. Forester, 2000, “Aviation Safety Human Reliability Analysis Method,” Albuquerque, NM: Sandia National Laboratories (SNL-NM).

27 Y. Lin, X. Pan, and C. He, 2015, “Human Reliability Analysis in Carrier-Based Aircraft Recovery Procedure Based on CREAM,” Pp. 1–6 in 2015 First International Conference on Reliability Systems Engineering (ICRSE), IEEE.

28 D.A. Wiegmann and S.A. Shappell, 2003, A Human Error Approach to Aviation Accident Analysis: The Human Factors Analysis and Classification System, Farnham, United Kingdom: Ashgate Publishing.

interacting with the complex system. HFACS was used by the FAA to examine and identify underlying causes of air traffic control operational errors.29 Another effort, funded by the FAA, was a study conducted by Mohaghegh et al. (2019) to incorporate human and organizational factors, associated with airline maintenance quality, into quantitative aviation risk assessment.30 This was done by integrating PRA (a combination of event sequence diagram and fault tree) with the System Dynamics and the Bayesian belief network methods to capture the dynamic effects of organizational factors on system risk. Such an approach for explicit modeling of in-depth causal factors (e.g., maintenance organizational factors underlying human performance–influencing factors) can provide more complete risk information and, thus, facilitate the identification and selection of corrective actions based on their impacts on system risk (e.g., the Control Program Fleet and Individual Risk). Later, Chen and Huang integrated a Bayesian network approach with the HFACS method to provide a quantitative analysis for human reliability in aviation maintenance.31 In this study, causal factors that affect the maintenance crew behavior were identified using HFACS, while the causal relationships were modeled with a Bayesian network. In general, the use of the Bayesian network approach in these studies allows for the integration of valuable judgments of subject-matter experts with historical and operational data.
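The kind of Bayesian-network reasoning used in these studies can be sketched with a deliberately tiny model; the network structure and all probabilities below are hypothetical, for illustration only.

```python
# Tiny Bayesian-network sketch: one organizational factor ("schedule
# pressure") influencing a maintenance error through a conditional
# probability table (CPT). Structure and numbers are hypothetical.
p_pressure = 0.3
p_error_given_pressure = {True: 0.02, False: 0.005}   # CPT: error | pressure

# Forward inference: marginalize over the parent node.
p_error = (p_pressure * p_error_given_pressure[True]
           + (1.0 - p_pressure) * p_error_given_pressure[False])

# Diagnostic inference via Bayes' rule: pressure given an observed error.
p_pressure_given_error = p_pressure * p_error_given_pressure[True] / p_error
```

A real model would have many nodes spanning HFACS layers (organizational influences, supervision, preconditions, unsafe acts), with CPTs elicited from experts and updated with operational data; the two inference directions above are what such models contribute to risk assessment.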

Regarding the second deficiency—that is, the lack of models that can capture the dynamics of human actions in quantifying human error—there are methods created in other domains; for example, the U.S. NRC has recently developed the Integrated Human Event Analysis System for Event and Condition Assessment (IDHEAS-ECA) for HRA in support of risk-informed regulation.32 In IDHEAS-ECA, the basic human error probabilities are obtained considering five macro-cognitive functions (Detection, Understanding, Decision-making, Action execution, and Inter-team coordination) and can be modified based on the context of a specific action being analyzed using 20 performance-influencing factors. The human error probability quantification in IDHEAS-ECA is supported by human error data in IDHEAS-DATA,33 where human performance data are collected and compiled from various data sources, such as simulator data and operator data from nuclear power plants, operational performance data from other domains (e.g., transportation, oil and gas, military operations, manufacturing), and experimental studies in academic literature. In modeling a human failure event, IDHEAS uses a Crew Response Diagram (CRD) that represents expected crew response paths along with the detailed timeline of critical responses to support the identification, analysis, and quantification of critical human tasks along the CRD. In this fashion, the human performance model in IDHEAS explicitly considers the temporal dimension of the critical human tasks. Research needs to be conducted to evaluate the feasibility of the existing HRA methods for the COS analysis and, if any of the existing ones can satisfy the needs, they can be adopted for TARAM; otherwise, an aviation-specific HRA may need to be developed for TARAM.

Finding: It is clear that at least three distinct sources of failure occur in modern commercial aviation: hardware failure, software fault, and human error. The interactions among these three further complicate the challenge of safety assessment. Each of these sources of failure has distinct characteristics and requires distinct measurement and modeling methods. Methods to study their combined effect are necessary to understand not only the primary but also the secondary, compound, or system-level risk. Assessment of the current TARAM methodology indicates that the modeling techniques for probabilistic assessment of human reliability and software reliability need to be aligned with current standards.

___________________

29 Federal Aviation Administration, 2005, Examining ATC Operational Errors Using the Human Factors Analysis and Classification System, DOT/FAA/AM-05/25, Ft. Belvior, VA: Defense Technical Information Center.

30 Z. Mohaghegh, R. Kazemi, and A. Mosleh, 2019, “Incorporating Organizational Factors into Probabilistic Risk Assessment (PRA) of Complex Socio-Technical Systems: A Hybrid Technique Formalization,” Reliability Engineering & System Safety 94(5):1000–1018.

31 W. Chen and S.P. Huang, 2013, “Human Reliability Analysis in Aviation Maintenance by a Bayesian Network Approach,” presented at 11th International Conference on Structural Safety and Reliability (ICOSSAR 2013), New York.

32 U.S. Nuclear Regulatory Commission, 2020, “Integrated Human Event Analysis System for Event and Condition Assessment (IDHEAS-ECA),” In RIL-2020-02, Washington, DC.

33 U.S. Nuclear Regulatory Commission, 2020, “DRAFT—Integrated Human Event Analysis System for Human Reliability Data (IDHEAS-DATA),” In RIL-2021-XX, Washington, DC.


Recommendation 5 addresses the human reliability aspects, mentioned in the above finding, while the software reliability aspects are discussed in the next section.

Recommendation 5: Within 18 months of receipt of this report, the Federal Aviation Administration should initiate and report on an effort to quantify the human performance of flight, maintenance, and cabin crews under the wide range of contexts experienced in civil aviation. This should be a broad-based effort including regulatory agencies, manufacturers, operators, and industry associations. The resultant data set of baseline human capabilities should be regularly maintained and be appropriate for a modern Human Reliability Analysis and used for continued operational safety analyses.

In a response letter from the FAA to the NTSB dated July 16, 2021, the FAA stated that it is forming an internal Human Factors and Flight crew Coordinating Group (HFFCG) in response to recommendations associated with the 737 MAX. The purpose of the HFFCG is to coordinate FAA activities associated with human factors–centric recommendations described in reports from the Boeing 737 MAX Flight Control System JATR, the DOT Special Committee to Review the FAA’s Aircraft Certification Process, NTSB Safety Recommendations A-19-13 through A-19-16, and the Aircraft Certification, Safety, and Accountability Act of 2020. In the letter, the FAA stated that the HFFCG will coordinate various activities, ensure that the FAA responds holistically to all recommendations, and minimize potential duplication of work. This group could also be responsible for the above recommended activity.

INCORPORATING SOFTWARE RELIABILITY ANALYSIS IN THE TARAM PROCESS

Until now, efforts to improve software reliability on commercial airplanes have centered mainly on software fault-avoidance and fault-tolerance technologies.34 Fault-avoidance technologies are common in software reliability engineering; they rely on compliance with formal development guidelines, design requirements, and testing and validation procedures to reduce ambiguity, uncertainty, and potential software faults. Fault-tolerance technologies35,36 often include (1) single-version methods, which equip software with mechanisms to detect and recover from faults; and (2) multi-version methods, which implement diversity measures (e.g., separate development teams, different algorithms, and different programming languages/tools) to defend against common-cause software errors.

The safety assessment processes utilized for 14 CFR 25.1309 type certification compliance include consideration of errors in the development of functions, software, and airborne electronic hardware (AEH). The process defined in SAE ARP4754A describes a methodology to determine the level of rigor—the Development Assurance Level (DAL)—to apply to the development of functions, software, and AEH, based on the failure condition with which those elements are associated. These DALs guide the development process by increasing the rigor applied to the development as the severity of the failure condition increases. The DALs are utilized in the structured development processes defined in RTCA/DO-17837 for software and in RTCA/DO-254 for AEH.38 While DALs are not quantitative demonstrations of software and AEH reliability, the DAL used in developing an item can be used to check whether that item’s assurance level is commensurate with an unsafe condition that the TARAM process may identify.
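The severity-to-DAL relationship described above can be sketched as a simple commensurability check. The severity-to-DAL mapping below follows the convention shared by SAE ARP4754A and RTCA/DO-178C; the check function itself is illustrative, not part of any FAA or RTCA tooling.

```python
# Sketch: checking whether an item's assigned Development Assurance Level (DAL)
# is commensurate with the severity of a failure condition identified in a
# TARAM analysis. The mapping follows SAE ARP4754A / RTCA DO-178C convention;
# the check function is illustrative only.

# Minimum DAL intended to support each failure-condition severity.
SEVERITY_TO_MIN_DAL = {
    "catastrophic": "A",
    "hazardous": "B",
    "major": "C",
    "minor": "D",
    "no safety effect": "E",
}

def dal_supports_severity(assigned_dal: str, severity: str) -> bool:
    """True if the assigned DAL is at least as rigorous as the DAL
    required for the given failure-condition severity."""
    required = SEVERITY_TO_MIN_DAL[severity.lower()]
    # DAL rigor decreases alphabetically, so "A" <= "B" means "at least B rigor".
    return assigned_dal.upper() <= required

# Example: software developed to DAL C cannot support a hazardous condition.
print(dal_supports_severity("C", "hazardous"))  # False
print(dal_supports_severity("A", "hazardous"))  # True
```

A check of this kind could flag, during a TARAM review, items whose development assurance falls short of the severity implied by the unsafe condition under study.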

Characterization and quantification of software errors in the TARAM process need a probabilistic modeling approach to account for unavoidable uncertainties associated with the process and its variables.39 TARAM, in its

___________________

34 M.R. Lyu, 2007, “Software Reliability Engineering: A Roadmap,” Pp. 153–170 in Future of Software Engineering (FOSE ’07), IEEE.

35 M.R. Lyu and X. Cai, 2007, “Fault-Tolerant Software,” Wiley Encyclopedia of Computer Science and Engineering, https://doi.org/10.1002/9780470050118.ecse154.

36 M. Sghairi, A. De Bonneval, Y. Crouzet, J.-J. Aubert, and P. Brot, 2008, “Challenges in Building Fault-Tolerant Flight Control System for a Civil Aircraft,” IAENG International Journal of Computer Science 35(4).

37 See Radio Technical Commission for Aeronautics, “DO-178C Training,” https://www.rtca.org/training/do-178c-training, accessed February 19, 2022.

38 See Radio Technical Commission for Aeronautics, “DO-254 Training,” https://www.rtca.org/training/do-254-training, accessed February 19, 2022.

39 U.S. Nuclear Regulatory Commission, 2009, Workshop on Philosophical Basis for Incorporating Software Failures into a Probabilistic Risk Assessment, Technical Report BNL-90571-2009-IR, Upton, NY: Brookhaven National Laboratory.


current form, does not offer a documented approach to analyze software errors in a probabilistic manner, especially when the software is a source of latent failure, or the software contributes to progression of the scenarios after the occurrence of the condition under study (represented by CPs).

In other domains, there have been efforts to develop probabilistic models for software reliability to support risk assessment. For example, the U.S. NRC has been conducting research on the identification and development of methods, analytical tools, and regulatory guidance for probabilistically modeling the reliability of digital instrumentation and control systems and including them in PRAs of nuclear power plants. A review of available quantitative software reliability methods (QSRMs) was conducted,40 in which the existing methods were grouped into four major categories: software reliability growth methods, Bayesian belief network (BBN) methods, test-based methods, and other methods such as the context-based software risk model (CSRM). The BBN and test-based methods were eventually selected for further development. The BBN method can incorporate expert judgment and information about the software’s life-cycle activities into the evaluation of safety-critical software. In addition, the BBN provides a mathematical framework for propagating epistemic uncertainties while calculating software error probabilities. Meanwhile, the test-based method uses standard statistical methods with software testing and operating data (if available) and includes the treatment of parameter uncertainties. The two methods were then combined to develop a Bayesian updating algorithm in which a prior distribution of the software error probability is first developed via the BBN approach (or using a non-informative prior distribution), and the test-based method is then used to generate the data needed for the Bayesian updating.41 To incorporate software reliability into current PRA frameworks, software functions or components are modeled as events on the PRA model’s event trees and/or fault trees. The failure probabilities of these events, estimated using methods such as the above-mentioned Bayesian updating algorithm, are then used for PRA quantification. In parallel to these efforts, the U.S. NRC also sponsored research to investigate the modeling of digital systems using dynamic PRA methods, as detailed in NUREG/CR-6901,42 NUREG/CR-6942,43 and NUREG/CR-6985.44
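The Bayesian updating scheme described above can be sketched with a conjugate Beta-Binomial model: a Beta prior on the per-demand software error probability (standing in for one elicited via a BBN, or a non-informative choice) is updated with the outcome of statistical tests. The prior parameters and test counts below are hypothetical.

```python
# Conjugate Beta-Binomial update: Beta(a, b) prior on the per-demand software
# error probability, updated with n_failures observed in n_tests statistical
# tests. All numbers are hypothetical illustrations.

def beta_update(a: float, b: float, n_tests: int, n_failures: int):
    """Posterior Beta parameters after observing n_failures in n_tests."""
    return a + n_failures, b + (n_tests - n_failures)

def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)

# Non-informative Jeffreys prior Beta(0.5, 0.5); 1,000 tests, zero failures.
a0, b0 = 0.5, 0.5
a1, b1 = beta_update(a0, b0, n_tests=1000, n_failures=0)
print(f"posterior mean error probability: {beta_mean(a1, b1):.2e}")
```

The full posterior distribution (not just its mean) would then feed the uncertainty propagation step when the software event is placed on an event tree or fault tree.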

In the space exploration domain, NASA suggested using the CSRM method.45 CSRM combines event tree and fault tree techniques of traditional PRA with an advanced modeling approach (e.g., the dynamic flowgraph methodology) to integrate the contributions of both hardware and software into an overall system risk model. With this design, CSRM is not specifically an approach to estimate the failure probability or failure rate of a particular software error mode and, therefore, other classical QSRMs or context-based, risk-informed testing could be relied upon for such estimation. CSRM targets logic errors triggered by off-normal system conditions, which are considered the dominant contributors to system risk from software errors yet are often overlooked by classical QSRMs.

The different methods developed by the U.S. NRC and NASA, however, still require further evaluation, as they face a number of challenges: (1) the BBN methods require a substantial development effort and depend significantly on the expertise of the BBN developers, expert opinion, and the availability and quality of software development documentation; (2) the test-based methods, and any other QSRM that relies on test data (e.g., software reliability growth methods), require a large number of software tests and are susceptible to the uncertainty that the testing designs and conditions may not represent the actual environment in which the software is operated; (3) the

___________________

40 U.S. Nuclear Regulatory Commission, 2010, “Review of Quantitative Software Reliability Methods,” Upton, NY: Brookhaven National Laboratory.

41 U.S. Nuclear Regulatory Commission, 2013, “Development of Quantitative Software Reliability Models for Digital Protection Systems of Nuclear Power Plants,” NUREG/CR-7044, Washington, DC: Office of Nuclear Regulatory Research.

42 See U.S. Nuclear Regulatory Commission, 2006, Current State of Reliability Modeling Methodologies for Digital Systems and Their Acceptance Criteria for Nuclear Power Plant Assessments, NUREG/CR-6901, Washington, DC: Office of Nuclear Regulatory Research, https://www.nrc.gov/docs/ML0608/ML060800179.pdf.

43 See U.S. Nuclear Regulatory Commission, 2007, Dynamic Reliability Modeling of Digital Instrumentation and Control Systems for Nuclear Reactor Probabilistic Risk Assessments, NUREG/CR-6942, Washington, DC: Office of Nuclear Regulatory Research, https://www.nrc.gov/docs/ML0730/ML073030092.pdf.

44 See U.S. Nuclear Regulatory Commission, 2009, A Benchmark Implementation of Two Dynamic Methodologies for the Reliability Modeling of Digital Instrumentation and Control Systems, NUREG/CR-6985, Washington, DC: Office of Nuclear Regulatory Research, https://www.nrc.gov/docs/ML0907/ML090750687.pdf.

45 National Aeronautics and Space Administration, 2011, Probabilistic Risk Assessment Procedures Guide for NASA Managers and Practitioners, NASA/SP-2011-3421, Washington, DC: NASA Center for AeroSpace Information.


CSRM approach relies on context-based, risk-informed testing for scenarios that involve off-nominal conditions, which would require substantial time and resources; and (4) many software reliability growth methods rely on empirical formulas for the expected number of failures as a function of time, yet these assumed formulas are not applicable in all situations. Owing to these limitations, further research is required to advance the existing QSRMs for safety-critical applications.

In a relevant area of research and development, efforts in the nuclear industry have been initiated to address software-related technical challenges that emerge from the introduction of digital technologies (e.g., automation, digital instrumentation, and control). These technical challenges include but are not limited to (1) new potential software-based hazards/failures in critical safety and control functions, (2) common mode failure and common cause failure in software, and (3) increased complexity in human-software-hardware interactions leading to possible programming errors and incorrect outputs. While addressing the first two challenges requires software reliability analysis and its integration into a risk assessment framework, the third challenge falls under the umbrella of software trustworthiness evaluation. A line of research46 has recently been initiated within the Department of Energy Light Water Reactor Sustainability Program Plant Modernization Pathway to develop a generic (instead of technology-specific) methodology to evaluate and improve automation trustworthiness. This methodology extends the scientific usage of epistemic uncertainty to generate sufficient evidence for verifying that the automation would be explainable, trustworthy, and operationally acceptable.

Finding: TARAM does not offer a documented approach to analyze software errors in a probabilistic manner, especially when the software is a source of latent failure, or the software contributes to progression of the scenarios after the occurrence of the condition under study (represented by CPs). In support of Recommendation 6, research needs to be conducted to evaluate the feasibility of the existing methods for the probabilistic assessment of software reliability in TARAM and, if any of the existing methods can satisfy the needs, they can be adopted for TARAM; otherwise, new methods/tools may need to be developed to analyze software reliability in support of the COS decision-making.

In the current TARAM, the risk outputs are calculated and presented in spreadsheets. When the scope of TARAM is expanded based on the recommendations in this report, the current spreadsheet format may not be practical for timely analysis and decision-making. Computational tools that fit the practical needs of the COS analysis would need to be evaluated; if any existing tools are suitable, they can be adopted for TARAM. Otherwise, a new computational tool may need to be developed for TARAM, leveraging the existing tools.

Recommendation 6: Within 18 months of receipt of this report, the Federal Aviation Administration should identify or develop and implement methods and computational tools that leverage 14 CFR 25.1309 (SAE ARP4761) compliance for use in conducting the in-service safety process. These methods and tools should take advantage of Development Assurance Level assessments of software/airborne electronic hardware, Fault Tree analysis, and other probabilistic risk assessment methodologies that support software reliability analyses.

INCORPORATING UNCERTAINTY ANALYSIS IN THE TARAM PROCESS

As stated in Chapter 4, the TARAM methodology needs to incorporate and make use of a formal uncertainty analysis. In PRA, uncertainty analysis typically consists of two elements: uncertainty quantification (UQ) and sensitivity analysis.47 Sensitivity analysis looks at the deviations of the quantity of interest when the inputs are perturbed, typically one input variable at a time, by a small but fixed amount. In contrast, UQ quantifies the

___________________

46 See U.S. Department of Energy, 2021, “Probabilistic Validation and Risk Importance Ranking Methodology for Automation Trustworthiness and Transparency in Nuclear Power Plants,” 2021 CFA Technical Abstract 21-24380, https://neup.inl.gov/SiteAssets/FY%202021%20Abstracts/CFA-21-24380_TechnicalAbstract_2021CFATechnicalAbstract21-24380.pdf.

47 U.S. Nuclear Regulatory Commission, 2017, Guidance on the Treatment of Uncertainties Associated with PRAs in Risk-Informed Decisionmaking, NUREG-1855, Revision 1, Washington, DC.


uncertainty for risk outputs induced by considering the uncertainties of all the inputs simultaneously, utilizing statistical measures, such as probability distributions. The use of UQ is needed because it provides a more holistic and realistic assessment of uncertainty. An end-to-end approach to UQ could be adopted, consisting of the three phases as described in Chapter 4: (1) identifying dominant sources of uncertainties, (2) characterizing each of the identified uncertainty sources, and (3) propagating the characterized uncertainties to the TARAM risk outputs and quantifying the aggregated uncertainty for the final risk estimate. Chapter 4 discussed the first two phases of the UQ process. In this section, two topics are discussed: (1) the third phase of UQ, uncertainty propagation and quantification of the final risk estimates, and (2) sensitivity analysis.

Sensitivity analysis examines how specified variations in the inputs and models alter the risk outputs. Sensitivity analysis helps quantify the robustness of the model to its inputs and determines how the uncertainty in the risk estimates can be attributed to different sources of uncertainty in the inputs. Knowing which input variables contribute most to the risk helps determine an acceptable level of uncertainty in the input variables in order to control the uncertainty in the estimated risks. Thus, sensitivity analysis is a useful tool when the uncertainties in the input variables are well characterized. Sensitivity results are also useful for understanding the models. For example, it can be informative to know whether most of the uncertainty is owing to a possible equipment failure or to a human action.

The TARAM Handbook and MSAD Order (FAA Order 8110.107A) provide no explicit guidance on uncertainty analysis or sensitivity analysis. The COS decision-making practice, documented in the Seattle ACO Branch Transport Airplane Safety Manual, includes no guidance on uncertainty analysis but provides a limited-scope sensitivity analysis that studies how the risk outputs change when TARAM inputs are varied in a predefined manner. For instance, the current approach to calculating the peak individual flight risk for an issue under study in the constant failure rate analysis (Section 2.2 of the Seattle ACO Branch Transport Airplane Safety Manual), where each conditional probability is set to unity and the highest-risk case is presented to the CARB, can be considered one form of sensitivity analysis. Based on an FAA briefing to the committee regarding the Seattle ACO Transport Airplane Safety Manual, sensitivity analyses are sometimes conducted based on the analyst’s judgment regarding how the TARAM outputs could be influenced when each TARAM input (or modeling assumption) is varied individually to a certain value or condition.
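The bounding case described above can be sketched as follows: the risk of an unsafe outcome is modeled as the frequency of the condition under study multiplied by a chain of conditional probabilities (CPs), and setting each CP to unity yields the highest-risk case. The risk formula's multiplicative structure reflects the TARAM causal chain, but the specific numbers are hypothetical.

```python
# Illustration of the CP-to-unity bounding sensitivity case: risk per flight
# hour is the frequency of the condition under study times the product of
# the conditional probabilities along the causal chain. Numbers are
# hypothetical.

def flight_risk(condition_freq_per_fh: float, cps: list[float]) -> float:
    """Per-flight-hour risk: frequency times the product of the CPs."""
    risk = condition_freq_per_fh
    for cp in cps:
        risk *= cp
    return risk

nominal_cps = [0.1, 0.5, 0.2]            # best-estimate conditional probabilities
bounding_cps = [1.0] * len(nominal_cps)  # each CP set to unity

nominal = flight_risk(1e-6, nominal_cps)
bounding = flight_risk(1e-6, bounding_cps)
print(nominal, bounding)  # bounding >= nominal by construction
```

Because every CP lies in [0, 1], the bounding case is guaranteed to be at least the nominal estimate, which is why it can serve as a conservative screen before the CARB.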

Uncertainty analysis in the current TARAM process has two limitations. First, sensitivity analysis is executed only at the analysts’ discretion, and the procedure is not documented: there are no guidelines for determining the range of input values and modeling assumptions to be examined or for selecting which sensitivity analysis methods to use.

Second, sensitivity analysis alone is not a substitute for uncertainty quantification. Varying each input or modeling assumption to a predefined discrete value in a one-at-a-time manner does not quantify the impact of interactions among multiple inputs and modeling assumptions on the TARAM risk outputs, possibly missing cases in which the estimated risks exceed the risk guideline thresholds. UQ addresses this problem by varying all inputs simultaneously and quantifying the uncertainty in the final risk estimates. Monte Carlo simulation provides a common and relatively straightforward method to propagate uncertainties and probabilistically quantify their aggregated impact on the risk outputs. The basic principle is to draw many samples from the distribution of each input parameter, calculate the implied risk for each sample, and thereby produce a distribution of the risks. Statistical properties of this distribution can be used to represent and communicate the uncertainties.
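The Monte Carlo principle just described can be sketched in a few lines: each uncertain input is sampled from an assumed distribution, the risk is computed per sample, and the resulting risk distribution is summarized by its mean and an upper percentile. The distributions and parameters below are purely illustrative, not calibrated to any TARAM input.

```python
# Minimal Monte Carlo propagation of input uncertainty through a
# multiplicative risk model: sample all inputs simultaneously, compute the
# risk per sample, and summarize the resulting distribution. All
# distributions and parameters are hypothetical.
import random
import statistics

random.seed(1)  # fixed seed for reproducibility of this sketch

def sample_risk() -> float:
    # Lognormal frequency of the condition under study (per flight hour)
    # and two uncertain conditional probabilities drawn from assumed ranges.
    freq = random.lognormvariate(-14.0, 0.5)
    cp1 = random.uniform(0.05, 0.3)
    cp2 = random.uniform(0.2, 0.8)
    return freq * cp1 * cp2

risks = sorted(sample_risk() for _ in range(10_000))
mean = statistics.fmean(risks)
p95 = risks[int(0.95 * len(risks))]
print(f"mean risk {mean:.2e}, 95th percentile {p95:.2e}")
```

Reporting a percentile alongside the point estimate is one way the aggregated uncertainty could be communicated to decision makers, rather than a single bounding number.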

Formal uncertainty analysis could also contribute to the validation of the TARAM risk outputs. A National Research Council report48 highlighted uncertainty analysis as one of the principles in the validation of computational models. This report states that, in support of validation, the uncertainty in the model outputs “must be aggregated from uncertainties and errors introduced by many sources, including discrepancies in the mathematical model, numerical and code errors in the computational model, and uncertainties in model inputs and parameters.” The safety and risk analysis community takes a similar view in that the scope and quality of uncertainty analysis is an important aspect in assessing the level of maturity and validity of risk assessment and needs to be addressed as one of the criteria in an independent review for quality assurance.49

___________________

48 National Research Council, 2012, Assessing the Reliability of Complex Models: Mathematical and Statistical Foundations of Verification, Validation, and Uncertainty Quantification, Washington, DC: The National Academies Press.

49 F. Goerlandt, N. Khakzad, and G. Reniers, 2017, “Validity and Validation of Safety-Related Quantitative Risk Analysis: A Review,” Safety Science 99:127–139.


Finding: The TARAM Handbook is silent regarding how uncertainties associated with TARAM inputs and models are analyzed. In the current practice of COS decision-making for transport airplanes, limited-scope sensitivity analyses are sometimes conducted, where individual inputs are varied to predefined discrete values (often representing the bounds of the possible input ranges) in a one-at-a-time manner.

Recommendation 7: Within 12 months of receipt of this report, the Federal Aviation Administration should establish and document guidance to account for the uncertainties associated with inputs and models used in the Transport Airplane Risk Assessment Methodology process. To the extent practical, quantitative uncertainty analysis should be adopted.
