Fielded defense systems that fail to meet their reliability goals or requirements reduce the effectiveness and safety of the system and incur costs that generally require funds to be diverted from other defense needs. This is not a new problem, as readers of this report likely know. A synopsis of the relevant history is presented in the second section of this chapter. In recognition of this continuing problem, the U.S. Department of Defense (DoD) asked the National Research Council, through its Committee on National Statistics, to undertake a study on reliability.
DoD originally asked the Panel on Reliability Growth Methods for Defense Systems to provide an assessment only of the use of reliability growth models to address a portion of the problem. Reliability growth models are used to track the extent to which the reliability of a system in development is on a trajectory that is consistent with achieving the system requirement by the time of its anticipated promotion to full-rate production. However, the importance of the larger problem of the failure of defense systems to achieve required reliability levels resulted in the broadening of the panel’s charge. The sponsor and the panel recognized that reliability growth is more than a set of statistical models applied to testing histories. Reliability is grown through development of reasonable requirements, through design, through engineering, and through testing. Thus, DoD broadened its charge to the panel to include recommendations for procedures and techniques to improve system reliability during the acquisition process:1
The present project on reliability growth methods is the latest in a series of studies at the National Research Council (NRC) on improving the statistical methods used in defense systems development and acquisition. It features a public workshop with an agenda that will explore ways in which reliability growth processes (including design, testing, and management activities) and dedicated analysis models (including estimation, tracking, and prediction methodologies) can be used to improve the development and operational performance of defense systems. Through invited presentations and discussion, the workshop will characterize commonly used and potentially applicable reliability growth methods for their suitability to defense acquisition. The scope of the workshop and list of program participants will be developed by an expert ad hoc panel that will also write the final report that summarizes the findings of the workshop and provides recommendations to the U.S. Department of Defense.
In response, the panel examined the full process of design, testing, and analysis. We began with the requested workshop that featured DoD, contractor, academic, and commercial perspectives on the issue of reliability growth methods; see Appendix B for the workshop agenda and list of participants. And, as noted in the charge, this report builds on previous work by the NRC’s Committee on National Statistics.
The procedures and techniques that can be applied during system design and development include system design techniques that explicitly address reliability and testing focused on reliability improvement. We also consider when and how reliability growth models can be used to assess and track system reliability during development and in the field. In addition, given the broad mandate from the sponsor, we examined four other topics:
- the process by which reliability requirements are developed,
- the contents of acquisition requests for proposals that are relevant to reliability,
- the contents of the resulting proposals that are relevant to reliability, and
- the contents of acquisition contracts that are relevant to reliability.
Broadly stated, we argue throughout this report that DoD has to give higher priority to the development of reliable systems throughout all phases of the system acquisition process. This does not necessarily mean additional funds, because in many cases what is paid for up front to improve reliability can be recovered multiple times by reducing system life-cycle costs. This latter point is supported by a U.S. Government Accountability Office report (2008, p. 7), which found that many programs had encountered substantial reliability problems in development or after initial fielding:

Although major defense contractors have adopted commercial quality standards in recent years, quality and reliability problems persist in DOD weapon systems. Of the 11 weapon systems GAO reviewed, these problems have resulted in billions of dollars in cost overruns, years of schedule delays, and reduced weapon system availability. Prime contractors’ poor systems engineering practices related to requirements analysis, design, and testing were key contributors to these quality problems.

1 For a list of the findings and recommendations from the previous reports noted in the first sentence of the charge, see Appendix A.
The report further noted (U.S. Government Accountability Office, 2008, p. 19):
[I]n DOD’s environment, reliability is not usually emphasized when a program begins, which forces the department to fund more costly redesign or retrofit activities when reliability problems surface later in development or after a system is fielded. The F-22A program illustrates this point. Because DOD as the customer assumed most of the financial risk on the program, it made the decision that system development resources primarily should be focused on requirements other than reliability, leading to costly quality problems. After seven years in production, the Air Force had to budget an additional unplanned $400 million for the F-22A to address numerous quality problems and help the system achieve its baseline reliability requirements.
The magnitude of the problem in achieving reliability requirements was described at the panel’s workshop by Michael Gilmore, the Director of Operational Test and Evaluation (DOT&E), and by Frank Kendall, Acting Under Secretary of Defense for Acquisition, Technology, and Logistics (USD AT&L). Since 1985, 30 percent of 170 systems under the purview of DOT&E had been reported as not having demonstrated their reliability requirements. A separate review by DOT&E in fiscal 2011 found that 9 of 15 systems, 60 percent, failed to meet their reliability thresholds. Figures for the preceding three fiscal years, 2008-2010, as documented in DOT&E Annual Reports, were 46 percent, 34 percent, and 25 percent, respectively.
It should be noted that failure to meet a reliability requirement does not necessarily result in a system’s not being promoted to full-rate production. The DOT&E 2011 Annual Report summarizes operational reliability and operational suitability results for 52 system evaluations that DOT&E provided to Congress for 2006 to 2011: a full 50 percent of the systems failed to meet their reliability threshold, and 30 percent of the systems were judged to be unsuitable. However, none of the 52 systems was cancelled.
DoD estimates that 41 percent of operational tests for acquisition category I (ACAT I) defense systems met reliability requirements between 1985 and 1990, while only 20 percent of such tests met reliability requirements between 1996 and 2000.2 For Army systems alone, the Defense Science Board report on developmental test and evaluation (U.S. Department of Defense, 2008a) plotted estimated reliabilities against requirements for all operational tests of ACAT I Army systems between 1997 and 2006: only one-third of the systems met their reliability requirements. The Defense Science Board also found substantial declines in the percentage of Navy systems meeting their reliability requirements from 1999 to 2007.3 These plots strongly indicate a growing problem in DoD’s ability to achieve reliability requirements in recent defense acquisition programs, especially for the higher-priced systems (those in which the Office of the Secretary of Defense is obligated to become involved) between 1996 and 2007.
At the workshop, Kendall stressed the importance of reliability engineering to address DoD’s acknowledged reliability deficiency. He said that the department has not brought sufficient expertise in systems engineering to bear on defense acquisition for many years, and, as a result, many defense systems that have been recently deployed have not attained their anticipated level of reliability either in operational testing or when fielded. He said that this problem became very serious during the mid-1990s, when various DoD acquisition reform policies were instituted: they resulted in the elimination of sets of military standards; the relinquishment by the Office of the Secretary of Defense (OSD) of a role in overseeing quality control, systems engineering, reliability, and developmental testing; and associated severe cuts in staffing in service and program offices.
In a presentation to the International Test and Evaluation Association on January 15, 2009, Charles McQueary, then director of DOT&E, said that DOT&E needed to become more vigilant in improving the reliability of defense systems: for example, in 2008 two of six ACAT I systems in operational testing were found not suitable.4 This was a particularly serious issue because sustainment costs, which are largely driven by reliability, represent the largest fraction of system life-cycle costs. Also, as systems are developed to remain in service longer, the importance of sustainment costs only increases.

2 U.S. Department of Defense (2005, pp. 1-4).

3 U.S. Department of Defense (2008a, pp. 3, 18).
McQueary stressed that small investments in improving reliability could substantially reduce life-cycle costs. He provided two specific examples, two Seahawk helicopters: the HH-60H and the MH-60S. For one, an increase of 2.4 hours in mean time to failure would have saved $592.3 million in the estimated 20-year life-cycle costs; for the other, an increase of 3.6 hours in mean time to failure would have saved $107.2 million in the estimated 20-year life-cycle costs.5 (A good analysis of the budgetary argument can be found in Long et al., 2007.)
Several years ago, the Defense Science Board issued a report with a number of findings and recommendations to address reliability deficiencies. It included the following key recommendation (U.S. Department of Defense, 2008a, pp. 23-24):
The single most important step necessary to correct high suitability failure rates is to ensure programs are formulated to execute a viable systems engineering strategy from the beginning, including a robust RAM [reliability, availability, and maintainability] program, as an integral part of design and development. No amount of testing will compensate for deficiencies in RAM program formulation [emphasis added].
In other words, it is necessary to focus on systems engineering techniques to design in as much reliability as possible at the initial stage of development. Inadequate initial design work often forces late-stage adjustments to the system design, and redesigning a system to address reliability deficiencies after the design is relatively fixed is far more expensive than addressing reliability during the initial stages of system design. Moreover, late-stage design changes can often introduce new problems into a system’s development.
The report of the Defense Science Board (U.S. Department of Defense, 2008a, p. 27) also contained the following finding:

The aggregate lack of process guidance due to the elimination of specifications and standards, massive workforce reductions in acquisition and test personnel, acquisition process changes, as well as the high retirement rate of the most experienced technical and managerial personnel in government and industry has a major negative impact on DoD’s ability to successfully execute increasingly complex acquisition programs.

4 ACAT I—acquisition category I programs—are those for major defense acquisitions. They are defined by USD AT&L as requiring eventual expenditures for research, development, testing, and evaluation of more than $365 million (in fiscal 2000 constant dollars); requiring procurement expenditures of more than $2.19 billion (in fiscal 2000 constant dollars); or being designated as high priority.

5 See Chapter 3 for a discussion of mean time to failure.
Over the past 5 years, DoD has become more responsive to the failure of many defense systems to meet their reliability requirements during development. In response, the department has produced or modified a number of guidance documents, handbooks, directives, and related materials to try to change existing practices. These documents support the use of more up-front reliability engineering, more comprehensive developmental testing focused on reliability growth, and greater use of reliability growth modeling for planning and other purposes.6
Two important recent documents are DTM-11-003 (whose improvements have been incorporated into the most recent version of DoDI 5000.02)7 and ANSI/GEIA-STD-00098 (for details, see Appendix C). Although ANSI/GEIA-STD-0009 does not have an obligatory role in defense acquisition, it can be agreed upon by the acquisition program manager and the contractor as a standard to be used for the development of reliable defense systems.
We are generally supportive of the contents of both of these documents. They help to produce defense systems that have more reasonable reliability requirements and that are more likely to meet those requirements through design and development. However, these documents were designed to be relatively general: they are not intended to provide details on specific techniques and tools for engineering a system or component for high reliability, nor do they mandate the methods or tools a developer would use to implement the process requirements. The tailoring will depend on a “customer’s funding profile, developer’s internal policies and procedures and negotiations between the customer and developer” (ANSI/GEIA-STD-0009, p. 2).9 Proposals are to include a reliability program plan, a conceptual reliability model, an initial reliability flow-down of requirements, an initial system reliability assessment, candidate reliability trade studies, and a reliability requirements verification strategy.

6 For a discussion of some of these documents, see Appendix C.

7 A DTM (Directive-Type Memorandum) is a memorandum issued by the Secretary of Defense, Deputy Secretary of Defense, or OSD principal staff assistants that cannot be published in the DoD Directives System because of lack of time to meet the requirements for implementing policy documents.

8 While not a DoD standard, ANSI/GEIA-STD-0009, “Reliability Program Standard for Systems Design, Development, and Manufacturing,” was adopted for use by DoD in 2009.

9 Available: http://www.techstreet.com/publishers/285174?sid=msn&utm_source=bing&utm_medium=cpc [August 2014].
But there is no indication of how these activities are to be carried out. For example, how should one produce the initial reliability assessment for a system that exists only in diagrams? What should design for reliability entail, and how should it be carried out for different types of systems? How can one determine whether a test plan is adequate to take a system with a given initial reliability and improve that system’s reliability to the required level through test-analyze-and-fix? How should reliability be tracked over time, with a relatively small number of developmental or operationally relevant test events? How does one know when a prototype for a system is ready for an operational test?
A handbook has been produced with one goal of providing more operational specificity,10 but it understandably does not cover all the questions and possibilities. We believe that it would be worthwhile for an external group to assist in providing additional specificity as to how some of these steps should be carried out.
Given the lengthy development time of ACAT I systems, the impact of these new guidance documents, standards, and memoranda, especially the changes to DoDI 5000.02 (due to DTM-11-003) and ANSI/GEIA-STD-0009, will not be known for some time. However, we expect that adherence to these documents will have very positive effects on defense system reliability. We generally support the many recent changes by OSD. In this report, we offer analysis and recommendations that build on those changes, detailing the engineering and statistical issues that still need to be addressed.
The assessment of defense systems is typically separated into two general operational assessments: the assessment of system effectiveness and the assessment of system suitability:
- Operational effectiveness is the “overall degree of mission accomplishment of a system when used by representative personnel in the environment planned or expected for operational employment of the system considering organization, doctrine, tactics, survivability or operational security, vulnerability, and threat.” (U.S. Department of Defense, 2013a, p. 749).
- Operational suitability is “the degree to which a system can be placed satisfactorily in field use, with consideration given to reliability, availability, compatibility, transportability, interoperability, wartime usage rates, maintainability, safety, human factors, manpower, supportability, logistics supportability, documentation, environmental effects, and training requirements” (U.S. Department of Defense, 2013a, pp. 749-750).
Essentially, system effectiveness is whether a system can accomplish its intended missions when everything is fully functional, and system suitability is the extent to which, when needed, the system is fully functional.
Reliability is defined as “the ability of a system and its parts to perform their mission without failure, degradation, or demand on the support system under a prescribed set of conditions” (U.S. Department of Defense, 2012, p. 212). It can be measured in a number of different ways depending on the type of system (continuously operating or one-shot systems), whether a system is repairable or not, whether repairs return the system to “good as new,” and how a system’s reliability changes over time. For continuously operating systems that are not repairable, a common DoD metric is mean time to failure (see Chapter 3).
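For concreteness, the MTTF of a continuously operating, non-repairable system is commonly estimated as the sample mean of observed times to failure; under an assumed exponential lifetime distribution, that estimate also yields a mission reliability. The sketch below is our illustration, not the report’s, and the failure times are hypothetical:

```python
# Sketch: estimating mean time to failure (MTTF) for a continuously operating,
# non-repairable system from observed failure times. Data are hypothetical.
import math

def mttf_estimate(failure_times):
    """Point estimate of MTTF: the sample mean of the observed times to failure."""
    return sum(failure_times) / len(failure_times)

def reliability_at(t, mttf):
    """Probability of surviving to time t, assuming exponentially distributed lifetimes."""
    return math.exp(-t / mttf)

# Hypothetical times to failure (hours) for ten tested units.
times = [120.0, 340.0, 95.0, 410.0, 220.0, 180.0, 505.0, 60.0, 275.0, 150.0]
mttf = mttf_estimate(times)  # 235.5 hours for these data
print(f"estimated MTTF: {mttf:.1f} hours")
print(f"P(survive a 24-hour mission): {reliability_at(24.0, mttf):.3f}")
```

The exponential assumption is the simplest case; other lifetime distributions (e.g., Weibull) are needed when the failure rate changes over a system’s life.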
The evaluation of system suitability often involves two other components, availability and maintainability. Availability is “the degree to which an item is in an operable state and can be committed at the start of a mission when the mission is called for at an unknown (random) point in time” (U.S. Department of Defense, 2012, p. 214). Maintainability is “the ability of an item to be retained in, or restored to, a specified condition when maintenance is performed by personnel having specified skill levels, using prescribed procedures and resources, at each prescribed level of maintenance and repair” (U.S. Department of Defense, 2012, p. 215). Most of this report concentrates on the development and assessment of system reliability, though some of what we discuss has implications for the assessment of system availability. The three components of suitability (reliability, availability, and maintainability) are sometimes referred to as RAM.
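These components connect quantitatively: a common steady-state summary is availability computed as MTBF / (MTBF + MDT), where MTBF is mean time between failures and MDT is mean downtime. The formula and numbers below are our hedged illustration, not taken from the report:

```python
# Sketch: steady-state availability from reliability and maintainability inputs.
# A = MTBF / (MTBF + MDT) is the long-run fraction of time the system is operable.
# The numbers are hypothetical, chosen only for illustration.

def availability(mtbf_hours, mean_downtime_hours):
    """Long-run availability given mean time between failures and mean downtime."""
    return mtbf_hours / (mtbf_hours + mean_downtime_hours)

# Improving reliability raises MTBF; improving maintainability lowers downtime.
baseline = availability(mtbf_hours=100.0, mean_downtime_hours=25.0)   # 0.800
better_r = availability(mtbf_hours=150.0, mean_downtime_hours=25.0)   # ~0.857
better_m = availability(mtbf_hours=100.0, mean_downtime_hours=10.0)   # ~0.909

print(f"baseline {baseline:.3f}, higher MTBF {better_r:.3f}, lower downtime {better_m:.3f}")
```

The comparison shows why suitability evaluation must consider reliability and maintainability jointly: either lever raises availability.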
Ensuring system effectiveness does and should take precedence over concerns about system suitability. After all, if a system cannot carry out its intended mission even when it is fully functional, then certainly the degree to which the system is fully functional is much less important. However, this subordination of system suitability has been overdone. Until recently, DoD has focused the great majority of its design, testing, and evaluation efforts on system effectiveness, under the assumption that reliability (suitability) problems can be addressed either later in development or after initial fielding (see, e.g., U.S. Government Accountability Office, 2008).
The production of reliable defense systems begins, in a sense, with the setting of reliability requirements, which are expected to be (1) necessary for successful completion of the anticipated missions, (2) technically attainable, (3) testable, and (4) associated with reasonable life-cycle costs. After a contract is awarded, DoD has to devote adequate funds, oversight, and possibly additional time in development and testing to support and oversee both the application of reliability engineering techniques at the design stage and testing focused on reliability during contractor and government testing. These steps greatly increase the chances that the final system will satisfy its reliability requirements. The reliability engineering techniques that are currently used in industry to produce a system design consistent with a reliable system prior to reliability testing are referred to collectively as “design for reliability” (see Chapters 2 and 5). After the initial design stage, various types of testing are used to improve the initial design and to assess system reliability. A set of models are used throughout this development process to help oversee and guide the application of testing, and are commonly referred to as reliability growth models (see Chapter 4).
After the design stage, defense systems go through three phases of testing. “Contractor testing” is a catchall term for all testing that a contractor conducts during design and development, prior to delivery of prototypes to DoD. Contractor testing is initially component- and subsystem-level testing of various kinds. Some is in isolation, some is with representation of interfaces and interoperability, some is under laboratory conditions, some is under more realistic operating conditions, and some is under accelerated stresses and loads. Contractor testing also includes full-system testing after testing of components and subsystems: the final versions of these tests should attempt to emulate the structure of DoD operational testing so that the prototypes have a high probability of passing operational tests. There are obvious situations in which the degree to which the contractor’s testing can approximate operational testing is limited, including some aircraft and ship testing.
Contractor testing, at least in the initial phase, ends with the delivery of prototypes to DoD for its own testing. The first phase of DoD testing is developmental testing, which is often initially focused on component- and subsystem-level testing; later, it focuses on full-system testing. There can be many respects in which developmental testing does not fully represent operational use. First, developmental testing is generally fully scripted, that is, the sequence of events and actions of the friendly and enemy forces, if represented, is generally known in advance by the system operators. Also, developmental testing does not often involve typical users as operators or typical maintainers. Furthermore, developmental testing often fails to fully represent the activities of enemy systems and countermeasures. However, in some situations, developmental testing can be more stressful than would be experienced in operational use, most notably in accelerated testing. The full process of government developmental testing is often conducted over a number of years.
After developmental testing, DoD carries out operational testing. Initial operational testing, which is generally a relatively short test of only a few months’ duration, is full-system testing under as realistic a set of operational conditions as can be produced given safety, noise, environmental, and related constraints. Operational testing is much less scripted than developmental testing and uses typical maintainers and operators. It is used as a means of determining which systems are ready to be promoted to full-rate production, i.e., deployed. Toward this end, measurements of key performance parameters collected during operational testing are compared with the associated requirements for effectiveness and suitability, with those systems successfully meeting their requirements being promoted to the field.
Ideally, developmental testing would identify the great majority of causes of reliability deficiency prior to operational testing, so that any needed design changes would be recognized before the system design is fully specified, when they are less expensive to implement. Furthermore, because the fairly limited time frame of operational testing makes it poorly suited to discovering many reliability deficiencies, operational testing should not be depended on to capture a large number of such problems.
Moreover, because developmental testing often does not stress a system with the full spectrum of operational stresses, it often fails to discover many design deficiencies, some of which then surface during operational testing. This failure could also be due, at times, to changes in the data collection techniques and estimation methodology in the testing. There is no requirement for consistent failure definitions and scoring criteria across developmental and operational testing. In fact, as Paul Ellner11 described at the panel’s workshop, for a substantial percentage of defense systems, reliability as assessed in operational testing is substantially lower than reliability of the same system as assessed in developmental testing. This difference is often not explicitly accounted for in assessing which systems are on track to meet their requirements: this lack of recognition of the difference may in turn account for the failure of many systems to meet their reliability requirements in operational testing after being judged as making good progress toward the reliability requirement in developmental testing and evaluation.
11 Presentation to the panel at its September 2011 workshop.
ACAT I defense systems, the systems that provide the focus of this report, are complicated.12 They can generally be represented as systems of systems, involving multiple hardware and software subsystems, each of which may comprise many components. The hardware subsystems sometimes involve complicated electronics, and the software subsystems often involve millions of lines of code, all with interfaces that need to support the integration and interoperability of these components, and all at times operating under very stressful conditions.
While defense systems are growing increasingly complex, producing reliable systems is not an insurmountable challenge, and much progress can be made through the use of best industrial practices.
There are, however, important differences between defense acquisition and industrial system development (see Chapter 2). For instance, defense acquisition involves a number of “actors” with somewhat different incentives, including the contractor, the program manager, testers, and users, which can affect the degree of collaboration between DoD and the contractor. Furthermore, DoD assumes the great majority of risk of development, which is handled in the private sector through the use of warranties and other incentives and penalties. Acknowledgment of these distinctions has implications as to when and how to best apply design for reliability, reliability testing, and formal reliability growth modeling.
In this report, we examine the applicability of industrial practices to DoD, we assess the appropriateness of recent reliability enhancement initiatives undertaken by DoD, and we recommend further modifications to current DoD acquisition processes.
As noted, in addition to the use of existing design for reliability and reliability testing techniques, we were asked to review the current role of formal reliability growth models. These models are used to plan reliability testing budgets and schedules, to track progress toward attaining requirements, and to predict when various reliability levels will be attained. Reliability growth results from design changes or corrective actions, arising either from engineering analysis or from the correction of defects that become apparent in testing. Reliability growth modeling often includes fix effectiveness factors, which estimate the degree to which design changes fully eliminate a reliability failure mode. Formal reliability growth models depend strongly on often unvalidated assumptions about the linkage between time on test and the discovery of reliability defects and failure modes. Relying on these unvalidated assumptions can result in poor inferences through use of the predicted values and other model output. Therefore, the panel was asked to examine the proper use of these models in the three areas of application listed above.

12 Though ACAT I systems are a focus, the findings and recommendations contained here are very generally applicable.
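As one concrete example of such a model, the power-law process (often called the Crow-AMSAA model) posits that the expected number of failures by test time t is lam * t**beta, with beta < 1 indicating reliability growth (a declining failure intensity). The sketch below, which fits the model by maximum likelihood for a time-truncated test, is our illustration of the general technique with hypothetical failure times; the report does not prescribe this particular model here:

```python
# Sketch: fitting a power-law (Crow-AMSAA) reliability growth model by maximum
# likelihood for a time-truncated test. Expected failures by time t: lam * t**beta.
# beta < 1 implies a decreasing failure intensity, i.e., reliability growth.
# Failure times are hypothetical, chosen only for illustration.
import math

def fit_crow_amsaa(failure_times, total_test_time):
    """MLEs of (beta, lam) for a time-truncated power-law process."""
    n = len(failure_times)
    beta = n / sum(math.log(total_test_time / t) for t in failure_times)
    lam = n / total_test_time ** beta
    return beta, lam

def instantaneous_mtbf(beta, lam, t):
    """Reciprocal of the fitted failure intensity lam * beta * t**(beta - 1) at time t."""
    return 1.0 / (lam * beta * t ** (beta - 1))

# Hypothetical failure times (hours) in a 1,000-hour test-analyze-and-fix program.
times = [10.0, 50.0, 150.0, 400.0, 800.0]
beta, lam = fit_crow_amsaa(times, total_test_time=1000.0)
print(f"beta = {beta:.3f} (beta < 1 indicates growth)")
print(f"instantaneous MTBF at 1,000 hours = {instantaneous_mtbf(beta, lam, 1000.0):.0f} hours")
```

The widening gaps between successive failures drive the fitted beta below 1; projections from such a fit are only as good as the assumed linkage between test time and defect discovery, which is exactly the caution raised above.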
The goal of this report is to provide advice as to the engineering, testing, and management practices that can improve defense system reliability by promoting better initial designs and enhancing prospects for reliability growth through testing as the system advances through development. We include consideration of the role of formal reliability growth modeling in assisting in the application of testing for reliability growth.
There is a wide variety of defense systems, and many different approaches are taken to their development. Clearly, effective methods differ for different kinds of systems, and therefore there can be no single set of recommended practices for general use. Furthermore, while there is general agreement that there have been problems with defense system reliability over the past two decades, even during this period some defense systems used state-of-the-art reliability design and testing methods and, as a result, met or even exceeded their reliability requirements. The problem is not that the methods we describe are foreign to defense acquisition; rather, they have not been used consistently, or have been planned for use but then cut for budget and schedule considerations.
The remainder of this report comprises nine chapters and five appendixes. Chapter 2 reports on the panel’s workshop, which focused on reliability practices in the commercial sector and their applicability to defense acquisition. Chapter 3 discusses the different reliability metrics that are appropriate for different types of defense systems. Chapter 4 discusses the appropriate methods and uses for formal reliability growth modeling. Chapter 5 covers the tools and techniques of design for reliability. Chapter 6 documents the tools and techniques of reliability growth testing. Chapter 7 discusses the design and evaluation of reliability growth testing relevant to developmental testing. Chapter 8 details the design and evaluation of reliability growth testing relevant to operational testing. Chapter 9 covers software reliability methods. Chapter 10 presents the panel’s recommendations. Appendix A lists the recommendations of previous reports of the Committee on National Statistics that are relevant to this one. Appendix B provides the agenda for the panel’s workshop. Appendix C describes recent changes in DoD formal documents in support of reliability growth. Appendix D provides a critique of MIL-HDBK-217, a defense handbook that provides information on the reliability of electronic components. Finally, Appendix E provides biographical sketches of the panel members and staff.