Generalizability of Military Performance Measurements: I. Individual Performance
Pages 207-257


From page 207...
... Two additional, pernicious sources of error are inaccuracies due to scoring, where observers typically score performance in real time, and inaccuracies ...
The author gratefully acknowledges helpful and provocative comments provided by Lee Cronbach and the graduate students in his seminar on generalizability theory. The author alone is responsible for the contents of this paper.
From page 208...
... The fifth section concludes the paper by discussing some limitations of the theory.
APPLICATION OF GENERALIZABILITY THEORY TO THE MEASUREMENT OF MILITARY PERFORMANCE
Background
Military decision makers, ideally, seek perfectly reliable measures of individuals' performance in their military occupational specialties. Even with imperfect measures, the decision maker typically treats as interchangeable measures of an individual's performance on one or another representative sample of military occupational specialty tasks (and subtasks)
From page 209...
... More realistically, however, if the individual's score depends on the particular sample of tasks to which he was assigned, on the particular occasion or station at which the measurement was taken, and/or on the particular observer scoring the performance, the measurement is less than ideally dependable. In this case, interest attaches to determining how to minimize the impact of different sources of measurement error.
From page 210...
... TABLE 1 Caliber .38 Revolver Operation and Maintenance Task. Columns: Task, Subtask, Score (Go / No Go). Tasks: load the weapon; reduce a stoppage; unload and clear the weapon. Subtask (1) ...
From page 211...
... TABLE 2 Caliber .38 Revolver Operation and Maintenance Task: Time to Complete Tasks. Columns: Station, Occasion, Task, Observer (1 to 4). The station, occasion, and task labels were scrambled in extraction; the recoverable times, one row per line with observers 1 to 4 across, are:
84 85 86 87
82 84 85 85
91 92 92 94
83 82 84 85
75 76 78 78
76 76 77 77
75 84 75 76
83 81 83 81
77 78 76 77
69 70 70 70
94 95 96 97
91 92 93 94
99 99 99 99
93 94 94 95
83 83 84 85
...
From page 212...
... Generalizability Theory Approach
Two limitations of classical theory are readily apparent. The first limitation is that a lot of information in Table 2 is ignored (i.e., "averaged over").
From page 213...
... shows whether mean performance times, averaged over all other factors, vary systematically with the location at which the measurement was taken. Apparently performance time did not differ according to station (variance component for station = 0)
From page 214...
... The task effect reflects this characteristic of the performance measurement (variance component for task = 20). And variation across judges shows whether observers are using the same criterion when timing performance.
From page 215...
... The remaining sources of variation in Table 3 reflect combinations or "statistical interactions" among the factors. Interactions between persons and other sources of error variation represent unique, unpredictable effects; the particular performance times assigned to soldiers have one or more components of unpredictability (error)
From page 216...
... can be used to determine whether several judges are needed and whether different judges can be used to score the performance of different soldiers, or whether the same judges must rate all soldiers because of disagreements among them. The analysis of the performance-time data in Table 3 suggests, based on the pattern of the variance component magnitudes, that several judges are needed and that the same set of judges should time all soldiers (e.g., variance components for PJ and OJ)
From page 217...
... 60 . Regardless of whether relative or absolute decisions are to be made on the basis of the performance measurement, the dependability of the measure based on the G theory analysis is considerably different than the analysis based on classical theory.
From page 218...
... TABLE 4 Caliber .38 Revolver Operation and Maintenance Task: Accuracy. [Binary (0/1) accuracy scores by occasion, task, and subtask (rows) and observer (columns); the cell values are not recoverable from the extraction.]
From page 219...
... Footnote 6: A comparison of variance components in Table 5 with variance components in Table 3 reveals substantial differences in magnitudes due to differences in metrics. Compare Tables 2 and 4.
From page 220...
... Only the last recommended change can be evaluated with the hypothetical data. By using four observers, the G coefficients are .36 and .29 for relative and absolute decisions, respectively.
From page 221...
... Likewise, the theory speaks of generalizability coefficients rather than the reliability coefficient, realizing that the computed value of the coefficient may change as the definition of the universe changes.
Variance Components
In G theory, a measurement is decomposed into a component for the universe score (analogous to the true score in classical theory)
From page 222...
... Generalizability coefficient for relative decisions: $\hat{\rho}^2 = \hat{\sigma}_s^2 / (\hat{\sigma}_s^2 + \hat{\sigma}_\delta^2)$. Generalizability coefficient for absolute decisions ...
... facet design is used for simplicity: soldiers x judges x occasions. The object of measurement, soldiers, is not a source of error and, therefore, is not a facet.
From page 223...
... variation in performance times among soldiers; it is the signal the decision maker is looking for. The variance component for occasions is sizable; it represents mean differences in the times on occasions 1 and 2.
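As a rough illustration of where such variance components come from, the sketch below applies the usual ANOVA expected-mean-square estimators to a simplified soldiers x judges design with one score per cell. It drops the occasion facet for brevity, and the function name `variance_components` and the times are made up for this example; they are not the paper's data.

```python
def variance_components(scores):
    """ANOVA estimates for a soldiers x judges design, one score per cell.

    scores[s][j] is the time that judge j recorded for soldier s.
    """
    n_s, n_j = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_s * n_j)
    soldier_means = [sum(row) / n_j for row in scores]
    judge_means = [sum(row[j] for row in scores) / n_s for j in range(n_j)]

    # Mean squares for the two main effects and the residual.
    ms_s = n_j * sum((m - grand) ** 2 for m in soldier_means) / (n_s - 1)
    ms_j = n_s * sum((m - grand) ** 2 for m in judge_means) / (n_j - 1)
    ss_res = sum(
        (scores[s][j] - soldier_means[s] - judge_means[j] + grand) ** 2
        for s in range(n_s)
        for j in range(n_j)
    )
    ms_res = ss_res / ((n_s - 1) * (n_j - 1))

    # Solve the expected-mean-square equations for the components.
    return {
        "soldiers": (ms_s - ms_res) / n_j,
        "judges": (ms_j - ms_res) / n_s,
        "residual": ms_res,
    }


# Made-up times in seconds: 3 soldiers (rows) timed by 3 judges (columns).
times = [[80, 81, 82], [90, 91, 92], [100, 101, 102]]
components = variance_components(times)
```

In this made-up data set the judges differ from one another by a constant amount for every soldier, so the residual component is zero and nearly all variation is attributed to soldiers.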
From page 225...
... This procedure is dictated in large part by cost and convenience, and perhaps also by lack of information on the consequences this procedure has for the reliability of the performance measurement. Generalizability theory provides a method for estimating measurement error due to inconsistencies arising from one occasion to another, or from one judge to another.
From page 226...
... This tack has the effect of reducing all variance components involving occasions by a factor of 1/n_o', where n_o' is the number of occasions to be sampled in the D study. For example, suppose that, to reduce measurement error due to inconsistencies among occasions, a D study were planned to take the average of three occasions' performance times on the revolver test.
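The scaling described here can be sketched in a few lines. The component values below are hypothetical placeholders rather than the estimates from the paper's tables, and the one-letter key convention ("s" soldiers, "j" judges, "o" occasions) and the function name `d_study_components` are assumptions made for this example.

```python
def d_study_components(g_components, n_judges=1, n_occasions=1):
    """Scale G-study components to a D study that averages over
    n_judges judges and n_occasions occasions.

    Keys name the source of variation with one letter per factor:
    "s" soldiers, "j" judges, "o" occasions (so "so" is the soldier x
    occasion interaction and "sjo" the residual).
    """
    sample_sizes = {"j": n_judges, "o": n_occasions}
    scaled = {}
    for source, value in g_components.items():
        divisor = 1
        for facet, n in sample_sizes.items():
            if facet in source:
                divisor *= n  # every component involving the facet shrinks
        scaled[source] = value / divisor
    return scaled


# Hypothetical G-study estimates, not the values from the paper's tables.
g_study = {"s": 30.0, "j": 3.0, "o": 9.0, "sj": 6.0,
           "so": 12.0, "jo": 1.5, "sjo": 18.0}
three_occasions = d_study_components(g_study, n_occasions=3)
```

Only the components whose key contains "o" change; the soldier component and the soldier x judge interaction are left as estimated.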
From page 227...
... Measurement Error for Relative Decisions
For relative decisions, the error variance consists of all variance components that affect the rank-ordering of soldiers, that is, all variance components representing interactions of the facets with the object of measurement (soldiers, in our example). The error variance for relative decisions is shown in equation 3 of Table 6.
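For a soldiers x judges x occasions design with both facets random, the verbal definition above corresponds to the standard G theory expressions written out below. This is a reconstruction from the description, not a copy of Table 6; the subscripts s, j, o and the D-study sample sizes n_j' and n_o' are notation assumed here. The absolute error variance is included for comparison: it adds the facet main effects and their interaction.

```latex
% Relative error: interactions of the facets with soldiers (s)
\sigma^2_{\delta} = \frac{\sigma^2_{sj}}{n_j'} + \frac{\sigma^2_{so}}{n_o'}
                  + \frac{\sigma^2_{sjo,e}}{n_j' n_o'}

% Absolute error: relative error plus facet main effects and their interaction
\sigma^2_{\Delta} = \sigma^2_{\delta} + \frac{\sigma^2_{j}}{n_j'}
                  + \frac{\sigma^2_{o}}{n_o'} + \frac{\sigma^2_{jo}}{n_j' n_o'}
```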
From page 228...
... Thus, it may be important to obtain measures of soldiers' performance on several occasions using several judges so that these influences will be averaged out.
Generalizability Coefficients for Relative and Absolute Decisions
While stressing the importance of the variance component, G theory also provides a coefficient analogous to the reliability coefficient in classical theory.
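A minimal sketch of how the two coefficients respond to averaging over more judges and occasions, assuming a soldiers x judges x occasions design with random facets. The component values are hypothetical, and the function name `g_coefficients` and the key convention are assumptions for this example only.

```python
def g_coefficients(c, n_judges=1, n_occasions=1):
    """Return (relative, absolute) G coefficients for a
    soldiers x judges x occasions design with random facets.

    c maps sources of variation to estimated variance components:
    "s" soldiers, "j" judges, "o" occasions, and their interactions.
    """
    nj, no = n_judges, n_occasions
    # Relative error: only interactions with soldiers change rank order.
    rel_error = c["sj"] / nj + c["so"] / no + c["sjo"] / (nj * no)
    # Absolute error: facet main effects and their interaction also count.
    abs_error = rel_error + c["j"] / nj + c["o"] / no + c["jo"] / (nj * no)
    universe = c["s"]
    return universe / (universe + rel_error), universe / (universe + abs_error)


# Hypothetical variance components, not the paper's estimates.
comps = {"s": 30.0, "j": 3.0, "o": 9.0, "sj": 6.0,
         "so": 12.0, "jo": 1.5, "sjo": 18.0}
single = g_coefficients(comps)                               # 1 judge, 1 occasion
averaged = g_coefficients(comps, n_judges=2, n_occasions=3)  # 2 judges, 3 occasions
```

With these made-up values the relative coefficient rises from about .45 for a single judge and occasion to .75 when averaging over two judges and three occasions, and the absolute coefficient stays below the relative one in both cases.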
From page 229...
... Generalizability theory, then, can model the military decision maker's ideal performance measurement. This is a measurement that generalizes over all possible stations at which the test might be given, over all possible occasions on which the test might be given, over all possible tasks in a military occupational specialty, and over all possible observers who might time and score soldiers' performance.
From page 230...
... 73). Generalizability theory handles sampling issues in two ways.
From page 231...
... This is not the case for the absolute G coefficient where the variance component for task enters into the definition of measurement error in the random model and is sizable (20). Even after dividing it by the number of tasks sampled (3)
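The arithmetic behind this remark, using the two figures quoted in the excerpt (a task component of 20 and three sampled tasks); the symbols are notation assumed here:

```latex
\frac{\sigma^2_{t}}{n_t'} = \frac{20}{3} \approx 6.67
```

So under the random model the task facet still contributes roughly 6.67 to the absolute error variance after averaging over the three tasks.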
From page 232...
... 12 1.20 1.20 (c) Generalizability Coefficients: Random Model, Mixed Model (T*
From page 233...
... Large variance components associated with the candidate facet would suggest that performance differs across conditions; a large person by candidate-facet interaction would indicate that individuals' performances are not ordered the same under different conditions of the candidate facet. If this is the case, a second stage might be to conduct G-study analyses for each condition of the candidate facet separately.
From page 234...
... In general, interaction components of variance cannot be estimated for interactions containing the nested variable and the variable in which it is nested. Since nested designs do not provide variance components for all sources of measurement error, Cronbach et al.
From page 235...
... Generalizability theory provides a method for taking into account the covariances among performance-measurement scores. Just as univariate generalizability theory stresses interpretations of the pattern of variance components, multivariate G theory stresses interpretation of variance and covariance components (see Webb et al., 1983, for a concise, elementary presentation of the theory)
From page 236...
... For this two-facet, univariate design, $\hat{\sigma}_s^2$ is the estimated universe-score variance. For relative decisions, the estimate of the multifaceted error variance is:
$\hat{\sigma}_\delta^2 = \frac{\hat{\sigma}_{so}^2}{n_o'} + \frac{\hat{\sigma}_{st}^2}{n_t'} + \frac{\hat{\sigma}_{res}^2}{n_o' n_t'}$,
where $n_o'$ and $n_t'$ are the numbers of conditions of the facets in the decision study, and the generalizability coefficient is:
$\hat{\rho}^2 = \frac{\hat{\sigma}_s^2}{\hat{\sigma}_s^2 + \hat{\sigma}_\delta^2}$.
In extending the notion of multifaceted error variance to multivariate designs, we treat tasks not as a facet but as three dependent variables: time to load ...
From page 237...
... Universe-score variance components are found along the main diagonal, and covariance components are given off the diagonal. The high covariance components among the universe scores across the three tasks, relative to the residual covariance components, indicate that soldiers who load the revolver quickly also remove the stoppage and unload the revolver quickly.
From page 238...
... (d) Multivariate G Coefficient for Relative Decisions:
$\hat{\rho}^2 = \frac{\mathbf{a}'\mathbf{V}_s\mathbf{a}}{\mathbf{a}'\mathbf{V}_s\mathbf{a} + \mathbf{a}'\mathbf{V}_{so}\mathbf{a} / n_o'}$,
where $\mathbf{V}$ is a matrix of variance and covariance components estimated from the mean products matrices, $n_o'$ is the number of conditions of facet O in a D study, and $\mathbf{a}$ is the vector of canonical coefficients that maximizes the ratio of universe-score variation to universe-score plus error variation.
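The ratio in this definition can be evaluated for any chosen weight vector. The sketch below does only that: it does not search for the canonical coefficients that maximize the ratio, and the 2 x 2 matrices, the equal weights, and the function names are hypothetical values chosen for illustration.

```python
def quadratic_form(a, m):
    """Compute a' M a for a weight vector a and a square matrix m."""
    size = len(a)
    return sum(a[i] * m[i][j] * a[j] for i in range(size) for j in range(size))


def composite_g(a, v_universe, v_error, n_occasions=1):
    """Relative G coefficient of the composite formed with weights a.

    v_universe holds universe-score variance and covariance components,
    v_error the soldier x occasion (error) components.
    """
    signal = quadratic_form(a, v_universe)
    noise = quadratic_form(a, v_error) / n_occasions
    return signal / (signal + noise)


# Hypothetical component matrices for two dependent variables.
v_s = [[4.0, 2.0], [2.0, 3.0]]
v_so = [[2.0, 0.5], [0.5, 1.0]]
weights = [1.0, 1.0]  # equal weights, not the optimal canonical weights

one_occasion = composite_g(weights, v_s, v_so, n_occasions=1)
two_occasions = composite_g(weights, v_s, v_so, n_occasions=2)
```

Finding the weights that maximize this ratio is a generalized eigenvalue problem, which is outside the scope of this sketch.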
From page 239...
... In our hypothetical example, the multivariate generalizability of a composite composed of the three tasks with performance times obtained at one occasion is .73 for the first canonical variate. In forming the composite to optimize the multivariate G coefficient, time to remove a stoppage was given the greatest weight (.37)
From page 240...
... Estimation: decisions about whether the facets are finite or infinite, random or fixed, and the estimation of variance components; (3) Measurement: specification of the facet (or combination of facets)
From page 241...
... ILLUSTRATIVE APPLICATIONS OF GENERALIZABILITY THEORY
To this point, generalizability theory has been applied to military performance measurements using hypothetical data. In this section, two published G studies are presented.
From page 242...
... All other variance components were considered measurement error in this study since absolute decisions are made regarding general educational development requirements for each job. These include the components for raters nested within center, center, occasion, and all interactions.
From page 243...
... For this analysis, the matrices of variance components, coefficients of generalizability, and canonical weights corresponding to each coefficient of generalizability were computed. The estimated variance and covariance component matrices are presented in Table 13.
From page 244...
... To obtain results for four raters, the components corresponding to the rater effect and rater interactions need only to be divided by four. As a consequence of the calculation procedure, the variance components in Table 13 are the same as those produced by the univariate analysis.
From page 245...
... The estimates of variance components ...
TABLE 14 Canonical Variates for Multivariate Generalizability Study of General Educational Development Ratings

                                   Canonical Coefficients
General Educational                n_r = 1, n_o = 1    n_r = 4, n_o = 1
Development Component              I                   I         II
Reasoning                          0.34                0.38      0.05
Mathematics                        0.06                0.06      -1.95
Language                           0.51                0.57      1.33
Coefficient of generalizability    0.74                0.92      0.62

NOTE: The design is raters crossed with jobs and occasions.
From page 246...
... (1978; see Shavelson and Webb, 1981). The magnitudes and pattern of estimated variance components from the three analyses were very similar.
From page 247...
... partially nested design.
Interpretation of Variance Components
In theory, a variance component cannot be negative, yet a negative estimate occurred (as indicated in ...
TABLE 15 Variance Components for the Study of Tank Crew Performance Measurement. Columns: Source of Variation, Mean Squares, Estimated Variance Component. First row: Companies (C) ...
From page 248...
... If, however, the decision maker is interested in the generalizability of the score of a single tank crew selected randomly and observed on a single occasion, the coefficient drops to .48 due to the large residual variance component. The principle of symmetry states that the universe-score variance is comprised of all components that give rise to systematic variation among crews.
From page 249...
... Even the best of theories have limitations in their applications, and generalizability theory is no exception. In concluding, I address the following topics: negative estimated variance components; assump
From page 250...
... Small Samples and Negative Estimated Variance Components
Two major contributions of generalizability theory are its emphasis on multiple sources of measurement error and its deemphasis of the role played by summary reliability or generalizability coefficients. Estimated variance components are the basis for indexing the relative contribution of each source of error and the undependability of a measurement.
From page 251...
... provides an alternative algorithm that sets all negative variance components to 0. Each variance component, then, "is expressed as a function of mean squares and sample sizes, and these do not change when some other estimated variance component is negative" (Brennan, 1983:47).
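One way to read this approach is sketched below for a simple persons x items design: every component is first computed from the mean squares and sample sizes, and only afterwards are negative estimates replaced by 0, so clipping one component never alters another. The design, the function name `estimate_components`, and the mean squares are assumptions for illustration and are not taken from Brennan (1983).

```python
def estimate_components(ms_p, ms_i, ms_res, n_p, n_i):
    """Return (raw, clipped) variance component estimates for a
    persons x items design with one observation per cell."""
    # Each estimate depends only on mean squares and sample sizes.
    raw = {
        "persons": (ms_p - ms_res) / n_i,
        "items": (ms_i - ms_res) / n_p,
        "residual": ms_res,
    }
    # Negative estimates are set to 0 after all have been computed.
    clipped = {name: max(0.0, value) for name, value in raw.items()}
    return raw, clipped


# Made-up mean squares in which the item estimate comes out negative.
raw, clipped = estimate_components(ms_p=10.0, ms_i=2.0, ms_res=4.0, n_p=5, n_i=3)
```

Here the raw item component is negative because the item mean square is smaller than the residual mean square; after clipping it is reported as 0 while the person and residual estimates are unchanged.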
From page 252...
... are examining a restricted maximum likelihood approach that, in simulations so far, appears to offer considerable promise in dealing with the negative variance component problem.
Assumption of Constant Universe Scores
Nearly all behavioral measurement theories assume that the behavior being studied remains constant over observations; this is the steady-state assumption made by both classical theory and G theory.
From page 253...
... The variance component that reflects the stability of individual differences over time is the interaction between individuals and occasions. A small component for the interaction (compared to the variance component for universe scores)
From page 254...
... , produced estimates of variance components and generalizability coefficients that were closer to the true values than those from the ANOVA.
Concluding Comment
Used wisely, none of the foregoing limitations invalidates G theory.
From page 255...
... Tsutakawa 1981 Estimation in covariance components models. Journal of the American Statistical Association 76:341-353.
From page 256...
... 1983 The Estimation of Variance Components for the Dichotomous Dependent Variables: Applications to Test Theory. Unpublished doctoral dissertation, University of California, Los Angeles.
From page 257...
... Maddahian 1983 Multivariate generalizability theory.

