7
An Assessment of the Utility of NAOMS Data
This chapter continues with task 4 of the charge to the committee, which requires that it “conduct an analysis of the NAOMS project survey data provided by NASA to determine its potential utility…. This analysis may include an assessment of the data’s validity using other known sources of information.” Issues of data quality, data validation, and the estimation of rates and trends are discussed in this chapter.
7.1
DATA QUALITY
7.1.1
Anomalies
As noted in Chapter 6, NAOMS survey data in the Phase 2 release were not cleaned or modified (unlike those released in Phase 1). One can thus examine the quality of the raw data from the survey. The committee’s analysis found that a significant proportion of the nonzero numerical values were implausible, both for event counts (numerators) and for numbers of legs/hours flown (denominators). Selected examples are discussed below.
Table 7.1 shows the distributions of data values for the number of flight legs flown for all pilots who reported that they flew more than 60 flight legs in the 60-day recall period. Data for 3 of the 4 years of the survey (2002 through 2004) are shown separately.^{1} Note the high numbers of flight legs flown, with responses as high as 300-650 during the recall period in some cases. Even values in the 150-200 range may be unlikely for a 60-day recall period in an FAR Part 121 operation because of the limitations on pilot flying in regulations and operator scheduling policies. Further, the number of pilots who reported such values is not small (15 percent of the pilots reported having flown more than 150 flight legs). Table 7.2 shows the corresponding distributions of the number of hours flown for all pilots who reported that they flew more than 150 hours. Again, note the implausibly high numbers of hours flown and their frequencies, including responses of as many as 400-600 hours flown during the recall period.
An equally serious problem exists with the event counts. Since many of these events, such as in-flight engine failure, are rare (that is, most of the responses are zero, and only a small fraction of pilots reported nonzero counts), even a few anomalous values can completely distort the estimates of event rates.
TABLE 7.1 Distributions of the Number of Flight Legs Flown During the 60-Day Recall Period, for the Years 2002-2004
TABLE 7.2 Distributions of the Number of Hours Flown During the 60-Day Recall Period, for the Years 2002-2004
The committee’s analysis showed that the problem with anomalous values is common across many event types. Table 7.3 provides selected examples. For the event AH4 (“inadvertently landed without clearance at an airport with an active control tower”), a total of 541 events were reported by 161 pilots. Of these, 4 pilots reported 10, 20, 30, and 303 events (the last response corresponding to a pilot who flew between 46 and 70 hours and fewer than 14 legs during the recall period). These 4 pilots accounted for 363 (67%) of the 541 events reported. Table 7.3 shows several other examples with unusually high numbers of events reported. If the instances of such anomalies were limited to only a few event types, one might be able to investigate them in greater detail. Unfortunately, however, the problem was extensive, with one, and often several, implausible values for many event types.
There are at least two possible reasons for these anomalous values: (1) the pilots gave erroneous answers, or (2) errors were made during data entry. A verification question for high values of hours flown was included in the questionnaire, but the committee does not know whether other data-audit procedures were in place to flag implausible values reported by the respondents or entered into the database.
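A data-audit procedure of the kind mentioned above can be quite simple. The following is a minimal sketch; the function name and the plausibility thresholds are illustrative assumptions, not values from the NAOMS questionnaire or documentation.

```python
# Hypothetical audit rule: flag exposure responses outside an assumed
# plausible range for a 60-day recall period. Thresholds are illustrative
# and would need to be set from regulations and scheduling policies.
MAX_LEGS_60_DAYS = 150    # assumed ceiling on flight legs in 60 days
MAX_HOURS_60_DAYS = 200   # assumed ceiling on flight hours in 60 days

def flag_implausible(legs_flown, hours_flown):
    """Return audit flags for one pilot's reported flying exposure."""
    flags = []
    if legs_flown > MAX_LEGS_60_DAYS:
        flags.append("legs_implausible")
    if hours_flown > MAX_HOURS_60_DAYS:
        flags.append("hours_implausible")
    return flags

# An exposure report like some of those in Tables 7.1 and 7.2 would
# trigger a flag, prompting the interviewer to verify the answer:
print(flag_implausible(legs_flown=14, hours_flown=650))  # → ['hours_implausible']
```

A rule like this, applied live during a computer-assisted telephone interview, lets the interviewer re-ask the question before the anomalous value enters the database.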
7.1.2
Rounding
Another characteristic common in the survey data was the rounding of responses (raw data) by the respondents (pilots). A disproportionate number of observations were rounded to have 0 or 5 as the last digit. Figure 7.1 shows an example for the number of hours flown in Category 2 (46-70 hours) during the 4-year period of the survey. Similar problems arose with the numbers of events reported. This type of rounding is common when respondents cannot recall exact numbers. The committee did not conduct an extensive analysis to assess the magnitude of the rounding bias on the computed event rates. Nevertheless, the distribution of the numbers in Figure 7.1 suggests that it may be significant. This problem could have been alleviated in part by asking respondents to retrieve their logbooks to verify their answers. A request along these lines could have been included in the pre-notification letter that was sent to the respondents.

TABLE 7.3 Examples of Implausibly High Non-Zero Counts for Events

Survey Question | Total Number of Events | Number of Pilots Who Reported at Least One Event | Unusually High Numbers of Events as Reported by Individual Pilots^{a}
AH4: Number of times respondent inadvertently landed without clearance at an airport with an active control tower. | 541 | 161 | 303, 30, 20, 10. Of the 161 pilots who reported a nonzero count, 4 pilots accounted for 363 of the 541 events (or 67% of the total).
AH8: Number of times respondent experienced a tail strike on landing. | 80 | 24 | 30, 10(2), 9. Of the 24 pilots who reported a nonzero count, 4 pilots accounted for 59 of the 80 events (or 74% of the total).
AH13: Number of times respondent experienced an unusual attitude for any reason. | 508 | 450 | 100, 60, 30, 12, 11. Of the 450 pilots who reported a nonzero count, 5 pilots accounted for 213 of the 508 events (or 42% of the total).
ER6: Number of times an in-flight aircraft experienced a precautionary engine shutdown. | 365 | 215 | 30(3), 20, 10, 9. Of the 215 pilots who reported a nonzero count, 6 pilots accounted for 129 of the 365 events (or 35% of the total).
ER7: Number of times an in-flight aircraft experienced a total engine failure. | 132 | 82 | 30, 10, 3(2). Of the 82 pilots who reported a nonzero count, 4 pilots accounted for 46 of the 82 events (or 56% of the total).
GE5: Number of times respondent went off the edge of a runway while taking off or landing. | 350 | 33 | 90, 70, 30(4), 20, 10(2), 9(2). Of the 33 pilots who reported a nonzero count, 11 pilots accounted for 338 of the 350 events (or 97% of the total).
GE9: Number of times respondent landed while another aircraft occupied or was crossing the same runway. | 928 | 240 | 100, 80, 50, 40(2), 20, 15, 12, 10(14). Of the 240 pilots who reported a nonzero count, 20 pilots accounted for 497 of the 928 events (or 54% of the total).
^{a} Numbers in parentheses refer to the number of pilots who reported that number of events.
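The kind of rounding (“heaping”) visible in Figure 7.1 can be quantified by tabulating the final digits of the responses. The sketch below is illustrative only; the sample responses are invented, not NAOMS data.

```python
from collections import Counter

def last_digit_distribution(responses):
    """Proportion of integer responses ending in each final digit 0-9."""
    counts = Counter(r % 10 for r in responses)
    n = len(responses)
    return {d: counts.get(d, 0) / n for d in range(10)}

# Invented responses heaped on multiples of 5, as in Figure 7.1:
reported_hours = [50, 55, 60, 60, 65, 50, 62, 55, 70, 50]
dist = last_digit_distribution(reported_hours)

# If final digits were uniform, digits 0 and 5 together would account
# for about 20% of responses; a much larger share indicates heaping.
print(dist[0] + dist[5])  # → 0.9
```

A chi-squared test of the final-digit distribution against uniformity would give a formal measure of the heaping, though correcting the resulting bias in event rates is a harder problem.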
Finding: There are several problems with the quality of the NAOMS data:

• Substantial fractions of the reported nonzero counts of events had implausibly large values, as did the reported flight legs and hours flown. Simple audits to alert for such values should have been used during the computer-assisted telephone interviews and during data-cleaning steps to reduce the occurrence of these problems.

• It appears that respondents often rounded their answers to convenient numbers (for example, there were unusually high occurrences of numbers with final digits of 0 and 5).

The extent and magnitude of these problems raise serious concerns about the accuracy and reliability of the data. The development of appropriate strategies for handling some of these problems will require access to the unredacted data.
7.2
EXTERNAL DATA VALIDATION
7.2.1
Comparisons with Other Data Sources
One type of external validation involves comparing the attributes of the respondents in the sample to corresponding population data from other sources. For example, if the distribution of certain characteristics (the mix of aircraft types or of pilots by experience level) is quite different from the distribution of the same characteristics in another reliable source, the survey results might not be representative. Table 4.1 in Chapter 4 shows that the distribution of aircraft types in the NAOMS survey differed markedly from that in the BTS data, with wide-body aircraft being overrepresented in the NAOMS survey.
Similarly, if other data sources were available for event counts, these sources could be used for an external validation of the counts. NAOMS representatives indicated to the committee that they saw no point in asking questions whose answers could be obtained elsewhere. While this is a valid point, a limited amount of redundancy is often included in surveys for purposes of validation. The committee recognizes the potential for problems in comparing data across different sources (differences in contexts, in the way the data were collected, etc.), but such comparisons are often conducted in other surveys and have been extremely valuable.
7.2.2
Use of Logbooks
Another potential source of external validation is the use of respondents’ logbooks during the survey. The invitation letter requesting survey participation suggested that respondents have their logbooks readily available during the survey. However, the committee did not find information in the survey or other documents indicating
whether the respondents actually referred to their logbooks while answering the questions. The survey could have included a question on this matter, such as, “Are you using your logbook in providing responses during this survey?” This information would have been helpful in assessing the validity of the responses. The response to question D1 in the final section (“How confident are you that you accurately counted all of the safety-related events that I asked you about?”) provides a rough measure of a respondent’s confidence in the accuracy of the responses, but it is unclear how this information could be incorporated into the estimation process.
Finding: Limited comparison of NAOMS data with those of other sources indicates an overrepresentation of some groups and an underrepresentation of others. Such sampling biases must be addressed in the estimation of event rates and trends. More effort should have been spent on ensuring data accuracy at the interview stage, such as by asking respondents to refer to their logbooks. Preliminary analysis of the data would likely have raised these problems in time to modify the survey implementation accordingly.
7.3
ESTIMATION AND WEIGHTING
7.3.1
Overall Rates
Consider the estimation of a particular event type, k, during a given recall period, t, in the AC survey (the issues are similar for the GA survey). Let D_{kt} be the number of events of type k that were observed by all AC pilots during the recall period t. Similarly, let M_{t} be the total number of flight units (legs or hours as appropriate) flown by all AC pilots during the recall period t. Then, the true population rate for event k during period t is
R_{kt} = D_{kt} / M_{t}    (7.1)
For example, event k may refer to landings on an occupied runway, and t may denote the time period January 1 through March 31, 2003. In this case, the appropriate denominator (flight units) is the number of flight legs; for other events, it may be the number of flight hours.
Let d_{kt} be the total number of events of type k that were observed in the sample of AC pilots during the recall period t. Similarly, let m_{t} be the total number of flight units (legs or hours, as appropriate) flown by all AC pilots in the sample during the recall period t. If the survey results are based on a simple random sample (or more generally, an equalprobability design), then the population ratio R_{kt} can be estimated by the corresponding sample ratio
r_{kt} = d_{kt} / m_{t}    (7.2)
The properties of this estimate and expressions for its variance under simple random sampling can be found in most textbooks on sample surveys.^{2}
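Under simple random sampling, the ratio estimate and its standard linearization (Taylor-series) variance are straightforward to compute. The sketch below is a minimal illustration; the function name and the five-pilot sample are invented for the example, and the variance formula is the textbook linearization approximation, not a NAOMS-specific procedure.

```python
def ratio_estimate(d, m, N=None):
    """Ratio estimate r = sum(d)/sum(m) and its approximate variance
    under simple random sampling.

    d : per-pilot event counts in the recall period
    m : per-pilot flight units (legs or hours) in the recall period
    N : population size for the finite-population correction (optional)
    """
    n = len(d)
    r = sum(d) / sum(m)
    m_bar = sum(m) / n
    # Sample variance of the linearization residuals e_i = d_i - r * m_i
    s2 = sum((di - r * mi) ** 2 for di, mi in zip(d, m)) / (n - 1)
    fpc = 1.0 - n / N if N else 1.0
    var = fpc * s2 / (n * m_bar ** 2)
    return r, var

# Invented sample of 5 pilots:
d = [0, 1, 0, 0, 2]        # events of type k observed
m = [30, 45, 20, 35, 50]   # flight legs flown
r, var = ratio_estimate(d, m)
print(round(r, 4))  # → 0.0167 (i.e., 3 events per 180 legs)
```

This computation is exactly what the anomalies in Section 7.1 undermine: a single implausibly large d_i dominates both the numerator of r and the residual variance.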
However, there are several types of biases present in the NAOMS study that preclude the use of the simple estimate in Equation 7.2. Chapter 4 discussed various types of coverage biases. The overrepresentation of widebody aircraft and underrepresentation of smaller aircraft in the study were noted there. In addition, the sampling probabilities of flight legs varied with the number of pilots in the aircraft, and these unequal probabilities have to be accounted for when estimating event rates.
If there is sufficient information about the precise nature and the magnitude of these biases, it is possible that at least some of them can be accounted for by weighting the responses appropriately. For example, if one knew the unequal sampling probabilities for the flight legs due to the presence of multiple pilots in the aircraft, the responses could be weighted inversely by the sampling probabilities. There is extensive discussion of these methods in the sampling literature.^{3} However, this type of information must be documented during the planning and implementation stages of the study and does not appear to be available for the NAOMS survey.
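Inverse-probability weighting of this kind can be sketched as follows. This is an illustration of the general technique from the sampling literature, not a reconstruction of any NAOMS procedure; the function name, the sampling probabilities, and the two-pilot example are all invented.

```python
def weighted_rate(d, m, p):
    """Weighted ratio estimate with each response weighted by 1/p_i.

    p_i is the (assumed known) probability that pilot i's flying entered
    the sample, e.g. higher for multi-pilot aircraft whose legs could be
    reported by any member of the crew.
    """
    w = [1.0 / pi for pi in p]
    num = sum(wi * di for wi, di in zip(w, d))
    den = sum(wi * mi for wi, mi in zip(w, m))
    return num / den

# Invented example: a leg flown by a two-pilot crew is twice as likely
# to be sampled as a single-pilot leg, so it receives half the weight.
d = [1, 1]            # events reported by each pilot
m = [40, 40]          # flight legs flown by each pilot
p = [0.5, 0.25]       # assumed sampling probabilities
print(weighted_rate(d, m, p))  # → 0.025
```

The difficulty noted above is precisely that the p_i were not documented for NAOMS, so weights like these cannot be constructed from the available records.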
For the unredacted data, event rates can be computed for a period as short as 2 months (the recall period). For the redacted data, the information is grouped into years, so periods of 1 year are the shortest for which rates can be calculated. As noted in Chapter 5, this level of aggregation severely limits the usefulness of the data. It makes it difficult to detect seasonal variations, short-term effects of changes in aviation procedures, and other effects likely to be of interest for safety-monitoring purposes.
7.3.2
Rates by Subpopulations
In addition to the overall event rates in Equation 7.1, users of aviation safety data will also be interested in event rates for various subpopulations, such as rates by aircraft size or by pilot experience. Consider, for example, the event “landing on an occupied runway,” and suppose that one wants to compare how the rate for this event varies across three subpopulations of pilot experience: low, medium, and high levels. Let D_{jkt} be the number of flights that landed on an occupied runway (event type k) during the recall period t by pilots with experience level j. Similarly, let M_{jt} be the total number of flights during the recall period t by pilots with experience level j. Then, the rate of interest is
R_{jkt} = D_{jkt} / M_{jt}    (7.3)
Let d_{jkt} be the number of flights that landed on an occupied runway (event type k) that were observed in the sample of AC pilots with experience level j during the recall period t. Further, let m_{jt} be the number of flights during the recall period t by pilots with experience level j in the sample. Then, if the survey yields a simple random sample of pilots and the full data are available, one can estimate the population ratio R_{jkt} by the sample ratio
r_{jkt} = d_{jkt} / m_{jt}    (7.4)
However, it is not possible to estimate these rates from the redacted data, as the counts d_{jkt} and m_{jt} are not available for subpopulations. As noted for Equation 7.2, the estimates in Equation 7.4 are not valid when there are substantial biases, as appears to be the case with the NAOMS project. Since the nature and extent of the biases were not documented at the planning stage, it was not possible for the committee to examine the use of weighting or other adjustment methods to account for the biases.
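With full (unredacted) data, the subpopulation rates of Equation 7.4 amount to a grouped version of the overall ratio. A minimal sketch, with an invented four-pilot sample and function name:

```python
from collections import defaultdict

def subgroup_rates(records):
    """Rates r_jkt = d_jkt / m_jt by subpopulation j (e.g. experience level).

    records: iterable of (group, events, flight_units) tuples,
    one per sampled pilot.
    """
    d = defaultdict(int)  # event totals per group
    m = defaultdict(int)  # flight-unit totals per group
    for group, events, units in records:
        d[group] += events
        m[group] += units
    return {g: d[g] / m[g] for g in m}

# Invented sample: (experience level, events, flight legs)
sample = [("low", 1, 40), ("low", 0, 50), ("high", 0, 60), ("high", 1, 30)]
print(subgroup_rates(sample))
```

It is exactly the per-group tallies d_jkt and m_jt computed here that the redaction removes, which is why these rates cannot be recovered from the public data.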
Finding: The intended simple random sampling for the NAOMS study would have facilitated the easy computation of event rates. However, the final sample does not appear to be representative of the target population as indicated by the limited data analysis conducted by the committee. The survey sampling literature contains many approaches that attempt to address such coverage problems, but they require detailed information on how the survey was implemented, including the type and nature of problems that were experienced, and access to the original data.
7.3.3
Estimation of Trends
The most consistently articulated goal of the NAOMS project was to use survey data to learn about trends. Information on trends allows one to assess the effects of safety innovations on event rates. Preliminary analyses by the NAOMS team appear to indicate that the trends for a number of safety events were consistent over time. However, the committee did not conduct any analysis to verify the results, as it had access only to redacted data, in which the time variable was aggregated to full years.
It is important to recognize that the event rate biases discussed thus far in this report would not affect trends to the extent that the biases are constant over time. For example, if any biases due to nonresponse were constant across years, those biases would cancel out in estimates of trends. However, some types of biases may not have been constant or may have drifted over the survey period. For example, as discussed in Chapter 5, the AC questionnaire included operations and events from a broad array of aviation industry segments. If the mix of these operations changed over time, it would have caused biases in the trend estimates. In addition, biases associated with subjective assessments by pilots may have changed abruptly in response to external events such as those of September 11, 2001.
Finding: Many of the biases that affect the estimates of event rates may be mitigated for trend analysis to the extent that the biases remain relatively constant over time. However, the degree of mitigation might vary substantially across event types.
7.4
CONFIDENCE INTERVALS
The charge to the committee asked for specific recommendations on how to compute error bars for the estimates, or in statistical terminology, confidence intervals. The key information needed for computing a confidence interval is the variance of the estimated event rate. Under an equal-probability sampling scheme, the variance of the simple ratio estimate in Equation 7.2 can be computed easily.^{4} Given the variance estimate, a normal approximation is generally used to compute the confidence interval. Since these issues have been discussed extensively elsewhere, the committee will not repeat the details here.
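For an equal-probability design, the normal-approximation interval is a one-line calculation once the variance is in hand. The sketch below is the generic construction; the rate and variance values are invented for illustration.

```python
from math import sqrt

def normal_ci(rate, variance, z=1.96):
    """Approximate 95% confidence interval: rate ± z * sqrt(variance)."""
    half = z * sqrt(variance)
    return rate - half, rate + half

# Invented rate and variance for illustration:
lo, hi = normal_ci(rate=0.0167, variance=2.5e-5)
print(round(lo, 4), round(hi, 4))  # → 0.0069 0.0265
```

The construction itself is routine; the obstacle for NAOMS is that the undocumented biases make both the rate and its variance unreliable, so the resulting interval would not have its nominal coverage.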
The development of confidence intervals (error bars) for the NAOMS study faces the same difficulties that were discussed for the estimates in Section 7.3 and would require knowledge of the nature and extent of the biases that was not available to the committee. Without such information, the committee cannot provide recommendations that will be useful in this particular context.
7.5
SUMMARY
Careful planning of any statistical investigation, including surveys, involves the following steps: (1) the development of initial methods for data analysis and estimation, (2) the analysis of pilot data or early survey data to discover potential problems, and (3) the use of this information to refine the survey methodology. The committee was surprised by the apparent lack of these activities (or at least lack of documentation of these activities). The NAOMS team also did not conduct a formal analysis of the survey data as they became available. This was an especially serious oversight for a project with a research component in which one goal was to learn and to refine the ideas to improve the methodology. In the committee’s view, many of the problems that have been identified with the NAOMS survey might well have been detected and corrected if these aspects of the survey planning had been better executed.
Finding: The committee did not find any evidence that the NAOMS team had developed or documented data analysis plans or conducted preliminary analyses as initial data became available in order to identify problems early and refine the survey methodology. This is inconsistent with the practice of any well-conducted research study.
The final charge to the committee asks for recommendations regarding the most effective ways to use the NAOMS data. Because the committee did not have access to the unredacted data, a recommendation on this front, by necessity, relates only to the redacted, publicly available data.
As in any research study, a full description of the NAOMS project and the results of any analysis should be submitted for possible publication, which will involve a peer review. Because of the problems associated with analyzing the redacted data set, discussed in Chapter 6, the analysis would have to be based on the unredacted data and would need to address challenges such as treatment of data errors and potential effects of biases on trends.
However, because of the methodological and implementation problems cited in Chapters 4 and 5 as well as the difficulties associated with data analysis discussed in this chapter, the committee does not recommend using the publicly available NAOMS data set to identify systemwide trends in the rates of safety-related events.