4. Statistical Design
Pages 59-75



From page 59...
... BROAD PERSPECTIVE ON EXPERIMENTAL DESIGN OF OPERATIONAL TESTS Constrained Designs of ATEC Operational Tests ATEC has designed the IOT to be consistent with the following constraints: 1. Aside from statistical power calculations, little information on the performance of IBCT/Stryker or the baseline Light Infantry Brigade
From page 60...
... (LIB) from modeling or simulation, developmental testing, or the performance of similar systems is used to impact the allocation of test samples in the test design.
From page 61...
... Multiple Objectives of Operational Testing and Operational Test Design: Moving Beyond Statistical Significance as a Goal Operational test designs need to satisfy a number of objectives. Major defense systems are enormously complicated, with performance that can change in important ways as a result of changes in many factors of interest.
From page 62...
... The hope is that the other measures of interest will be related in some fashion to the primary measure, and therefore the test design to evaluate the primary measure will be reasonably effective in evaluating most of the remaining measures of interest. (If there are two measures of greatest interest, a design can be found that strikes a balance between the performance for the two measures.)
From page 63...
... However, when that is not the case, the objectives of learning about system performance, in addition to that of confirming improvement over a baseline, argue for additional sample size so that these additional objectives can be addressed. Therefore, rather than base sample size arguments solely on power calculations, the Army needs to allocate as much funding as various external constraints permit to support operational test design.
From page 64...
... In addition to test objectives, test designs are optimized using previous information on system performance, typically the means and variances of performance measures for the system under test and for the baseline system. This is a catch-22: the better one is able to target the design based on estimates of these quantities, the less one would need to test, because the results would already be known.
From page 65...
... While this may be less clear for the specific scenarios under which IBCT/Stryker is being tested, allocating 42 scenarios to the baseline system may be inefficient compared with the allocation of greater test samples to IBCT/Stryker scenarios. Testing with Factors at High Stress Levels A general rule of thumb in test design is that testing at extremes is often more informative than testing at intermediate levels, because information from the extremes can often be used to estimate what would have happened at intermediate levels.
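A minimal sketch (ours, not the report's) makes this rule of thumb concrete for an approximately straight-line response: the variance of the fitted slope is sigma^2 / sum((x_i - mean(x))^2), so pushing runs to the extremes of a factor's range maximizes the denominator and minimizes that variance. All designs below are illustrative.

    # Slope variance for three hypothetical 8-run designs on a factor
    # scaled to [-1, 1]; smaller is better for estimating a linear trend.
    import numpy as np

    def slope_variance(x, sigma=1.0):
        """Variance of the OLS slope estimate for design points x."""
        x = np.asarray(x, dtype=float)
        return sigma**2 / np.sum((x - x.mean())**2)

    designs = {
        "extremes":      np.repeat([-1.0, 1.0], 4),  # all runs at the ends
        "evenly spaced": np.linspace(-1.0, 1.0, 8),
        "intermediate":  np.repeat([-0.5, 0.5], 4),
    }
    for name, x in designs.items():
        print(f"{name:14s} Var(slope) = {slope_variance(x):.3f}")

Running this gives 0.125, 0.292, and 0.500, respectively: the all-extremes design estimates the trend most precisely, and intermediate performance can then be interpolated. The caveat is that a design concentrated entirely at the extremes cannot detect curvature, one reason the rule says "often" rather than "always."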
From page 66...
... Alternatives to One Large Operational Test In the National Research Council's 1998 report Statistics, Testing, and Defense Acquisition, two possibilities were suggested as alternatives to large operational tests: operational testing carried out in stages and small-scale pilot tests. In this section, we discuss how these ideas might be implemented by ATEC.
From page 67...
... COMMENTS ON THE CURRENT DESIGN IN THE CONTEXT OF CURRENT ATEC CONSTRAINTS Using the arguments developed above and referring to the current design of the operational test as described in Chapter 2 (and illustrated in Table 2-1), the discussion that follows takes into account the following constraints of the current test design: Essentially no information about the performance of IBCT/Stryker or the baseline has been used to impact the allocation of test samples in the test design.
From page 68...
... The primary disadvantage of the current design is that there is a very strong chance that observed differences will be confounded by important sources of uncontrolled variation. The panel discussed one potential source of confounding in its October 2002 letter report (see Appendix A)
From page 69...
... In addition, ATEC designed the operational test for IBCT/Stryker to support comparisons relative to attrition at the company level. ATEC provided analyses to justify the assertion that the current test design has sufficient power to support some of these comparisons.
From page 70...
... Then ATEC modeled SME ratings for both IBCT/Stryker and the baseline using linear functions of the controlled variables from the test design. These linear functions were chosen to produce SME scores in the range from 1 to 8.
From page 71...
... ATEC's analysis argues that since 0.75 is larger than 0.40, the operational test will have sufficient statistical power to find a difference of 0.75 in SME ratings. The same argument was used to show that interaction effects estimated with test sample sizes of 18 or 12 would also have sufficient statistical power, but that those estimated with sample sizes of 6 or 4 would not be able to identify SME rating differences of 0.75.
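The report does not reproduce ATEC's computation, but the shape of the argument can be illustrated with a hedged sketch: under a normal approximation, the minimum detectable difference for a two-sample comparison of means is roughly (z_{1-alpha/2} + z_{power}) * sigma * sqrt(2/n). The residual standard deviation below is an assumed, purely illustrative value, not one taken from ATEC's analysis.

    # Minimum detectable difference (MDD) in mean SME ratings by cell size.
    # sigma is ASSUMED for illustration; it is not a value from the report.
    from math import sqrt
    from scipy.stats import norm

    alpha, power, sigma = 0.05, 0.80, 0.50
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)

    for n in (18, 12, 6, 4):          # cell sizes discussed in the text
        mdd = z * sigma * sqrt(2.0 / n)
        verdict = "sufficient" if mdd <= 0.75 else "insufficient"
        print(f"n = {n:2d}: MDD = {mdd:.2f} ({verdict} for a 0.75 difference)")

With this illustrative sigma the pattern matches the text: cells of 18 or 12 can detect a 0.75-point difference, while cells of 6 or 4 cannot. The conclusion is only as good as the assumed sigma, which is the concern behind recommendation 5 below.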
From page 72...
... (a) two real battalion commanders, each supported by one real company and two simulated companies, and (b) one simulated battalion commander supported by three simulated companies.
From page 73...
... 2. ATEC should consider applying to future operational testing in general a two-phase test design: first, learning-phase studies that examine the test object under different conditions, thereby helping testers design further tests to elucidate the areas of greatest uncertainty and importance; and second, confirmatory tests that address hypotheses concerning performance vis-à-vis a baseline system or in comparison with requirements.
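One way to make the hand-off between the two phases concrete is sketched below (our illustration, not an ATEC procedure): the learning phase yields a preliminary estimate of the residual standard deviation, which then sizes the confirmatory comparison. The pilot data and design targets are assumptions.

    # Phase 1 (learning): a small hypothetical pilot estimates the residual SD.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    pilot = rng.normal(loc=5.0, scale=0.6, size=8)   # assumed pilot ratings
    sigma_hat = pilot.std(ddof=1)

    # Phase 2 (confirmatory): size a two-sample test to detect a 0.75
    # difference with 80% power at the 5% level, given sigma_hat.
    delta, alpha, power = 0.75, 0.05, 0.80
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    n_per_arm = int(np.ceil(2 * (z * sigma_hat / delta) ** 2))
    print(f"estimated SD = {sigma_hat:.2f}; confirmatory n = {n_per_arm} per arm")

In practice the learning phase would also screen conditions and factor levels; the sizing step is shown only to make the connection between the phases explicit.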
From page 74...
... 5. ATEC should reconsider, for the IBCT/Stryker, its assumption concerning the distribution of SME scores and should estimate the residual standard errors directly, for example, by running a small pilot study to provide preliminary estimates; or, if that is too expensive, by identifying those SME score differences for which the residual standard errors provide sufficient power at a number of test sample sizes, as a means of assessing the sensitivity of the analysis to the estimation of these standard errors.
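The sensitivity analysis suggested here could take the following form (a hedged sketch; every grid value is an illustrative assumption): tabulate the minimum detectable SME score difference over candidate residual standard deviations and cell sizes, then check how robust any "sufficient power for 0.75" conclusion is.

    # MDD as a function of assumed residual SD (columns) and cell size (rows).
    from math import sqrt
    from scipy.stats import norm

    z = norm.ppf(1 - 0.05 / 2) + norm.ppf(0.80)   # alpha = 0.05, power = 0.80
    sds = (0.3, 0.5, 0.7, 0.9)                    # candidate residual SDs
    print("  n " + "".join(f"{'sd=%.1f' % s:>8s}" for s in sds))
    for n in (18, 12, 6, 4):
        print(f" {n:2d} " + "".join(f"{z * s * sqrt(2.0 / n):8.2f}" for s in sds))

Any entry above 0.75 marks a combination of sample size and residual SD for which the design would lack power; reading across a row shows how fragile each sample size is to the SD assumption.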
From page 75...
... one simulated battalion commander supported by three simulated companies.

