Part II: Job Performance Measurement Issues
Pages 35-100


From page 35...
... A second concern was that the scaling of hands-on performance scores should go beyond rank ordering. That is, there was a need for a score scale that could be interpreted in terms of, at a minimum, acceptable and unacceptable performance, and preferably at finer gradations.
From page 36...
... This issue has been of particular importance because only a few jobs were selected for detailed study in the JPM Project and there was a need to generalize the findings to the several hundred jobs performed by first-term enlisted personnel. For the current model, the multilevel regression analysis method was recommended because of its contributions to performance prediction at the job level.
From page 37...
... The full cost/performance trade-off model takes specifications for required performance levels for each different job or family of jobs, deter...
From page 38...
... The general idea is to look at predicted performance levels for new recruits at different times and see how these levels varied across time and by job. At the very least, this normative approach will provide plausible ranges for performance level goals.
From page 39...
... Task Sampling In the JPM Project, a limited number of tasks was selected for measuring performance in each job. If these tasks were selected randomly from an exhaustive list of job tasks, generalization from scores on the hands-on tests to the entire domain of tasks would be simple and easy to defend.
From page 40...
... This generalization is not necessarily bad, but the relevant domain should be kept in mind when it comes to setting performance standards. Higher performance levels would almost surely be expected for important and frequent tasks than for less important and less frequent tasks; lower performance levels might be required for less dangerous tasks in comparison to more dangerous tasks; lower performance levels would also be expected for more difficult tasks in comparison to trivially easy tasks.
From page 41...
... Groupings of individual task steps into four knowledge and two skill categories were also analyzed. The Marine Corps created a matrix that mapped task steps onto different "behavioral elements." Although interesting, these subscores did not lead to
From page 42...
... If this were the case, then performance level requirements might generalize to new jobs more easily than quality requirements would. Another question is how much predicted performance levels have varied over time, overall and by job.
From page 43...
... This is important because the performance prediction equations in the linkage model are based on the subtests in the current ASVAB. Earlier ASVAB forms had different subtests and a number of assumptions would be required in generating AFQT and technical composite scores from these prior forms for use in the prediction equation.
From page 44...
... Mean predicted performance scores by job and year are also plotted in the table. Table 2 shows the means for the AFQT and technical composites and predicted performance by year.
From page 45...
... Second, there was some consistent variation among jobs in predicted performance levels, with lows of around 60 for Army field artillery (13B) and truck driver (64C)
From page 46...
... TABLE 3 F-Ratios Testing Components of Variance for Aptitude and Predicted Performance Scores (Based on 106,663 observations). Column headings: Component (Sum of SS), AFQT, Technical, Predicted
From page 47...
... In the SYNVAL project, four different skill levels were defined for each job. These skill levels were tied to operational decisions that supervisors would make about job incumbents in an effort to derive cost implications for the different performance levels: Unacceptable: the recruit cannot perform the job, is not likely to become an acceptable performer with additional training, and should be discharged; Marginal: the recruit is not performing acceptably and should be given additional training to bring performance up to standard; Acceptable: the recruit is performing at an acceptable level and making a positive contribution to force readiness; and Outstanding: the recruit is performing well above minimal standards and should be given a promotion or other recognition for superior performance.
From page 48...
... Overall performance goals would then reflect an average of scores from these three performance levels, with each level weighted according to this optimal mix. As indicated below, the SYNVAL project results speak primarily to the first step of defining the different performance levels.
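The weighting described above can be illustrated with a minimal Python sketch. The level names, scores, and mix proportions below are hypothetical values chosen for illustration; the excerpt does not give the actual figures, and the function name is an assumption.

```python
def overall_performance_goal(level_scores, mix):
    """Average the level scores, weighting each level by its share of the mix.

    level_scores: mapping of performance level -> representative score
    mix: mapping of performance level -> proportion (or count) of personnel
    """
    if set(level_scores) != set(mix):
        raise ValueError("level_scores and mix must cover the same levels")
    total_weight = sum(mix.values())
    if total_weight <= 0:
        raise ValueError("mix weights must sum to a positive value")
    weighted_sum = sum(level_scores[level] * mix[level] for level in level_scores)
    return weighted_sum / total_weight


# Hypothetical inputs: three levels with illustrative scores and mix.
scores = {"marginal": 50.0, "acceptable": 70.0, "outstanding": 90.0}
mix = {"marginal": 0.2, "acceptable": 0.6, "outstanding": 0.2}
goal = overall_performance_goal(scores, mix)
```

Dividing by the total weight means the mix may be supplied either as proportions or as raw head counts.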
From page 49...
... This approach assumes that empirical data on soldier performance are available (in the form of hands-on tests scored GO/NO-GO) on a representative sample of the soldiers in question so that these "percentage-performing" estimates can be related to actual performance scores.
From page 50...
... The specific methods used to estimate the percentages of soldiers performing at or below specific critical incident or Percent-GO score levels are detailed in Whetzel and Wise (1990). Phase II Results Table 4 shows the means and standard deviations of the judges' ratings of the percentage of soldiers performing at each acceptability level for each combination of performance dimension and MOS.
From page 54...
... Comparison of Task-DPG and Task-APG Results The task-DPG and task-APG methods are of particular interest because they use the same Percent-GO scale used in the DoD model. The only
From page 55...
... was provided. Raters were asked to draw lines between the score levels to indicate divisions between different performance levels (unacceptable versus marginal, marginal versus acceptable, and acceptable versus outstanding)
From page 56...
... Table 8 shows the estimated percentage of current job incumbents at the lower and higher performance levels. The Phase III approach attempted to combine the task-based and soldier-based approaches by providing both criterion information (about the tasks)
From page 57...
... There were a number of arguments and concerns about differences in procedures and instruments, and the reliability of individual judgments was not extremely high. One persistent finding was that standards set using the hands-on performance tests appeared harsh in comparison with direct estimates of performance level distributions.
From page 58...
... Mean predicted performance scores ranged between 60 and 80 percent across different jobs with an overall average of 66 percent. Based on these data, it appears reasonable to use past predicted performance levels in setting performance targets for each job, but generalization across jobs will be somewhat limited.
From page 59...
... A similar effort with the hands-on test performance levels would help in the development of explicit rationales for economic consequences of performance at specific levels. Linking Job Characteristics to Performance Distribution Targets While common performance level descriptions appeared feasible, there was considerable variation across jobs in the proportion of incumbents at each level.
From page 60...
... Wise, L.L., Peterson, N.G., Hoffman, R.G., Campbell, J.P., and Arabian, J.M. 1991 The Army Synthetic Validity Project: Report of Phase III Results, Volume I
From page 61...
... These events simultaneously focused attention on the need to relate the ASVAB to measures of job performance and the absence of such measures. The outcome of this series of events was an all-Service effort to measure job performance and to determine the relationship between job performance and military enlistment standards.
From page 62...
... However, there can be limitations regarding the types of tasks they can assess (e.g., it would be difficult to assess a military policeman's proficiency at riot control using a hands-on measure), and their resource primacy is virtually required given their expense to develop (see [...], 1993, and Wigdor and Green, 1991, for detailed descriptions of the development of the hands-on performance tests)
From page 63...
... Transporting validation results beyond a specific setting to other settings has been the concern of two methods in the industrial/organizational psychology literature: validity generalization and synthetic validation. In this paper, following a brief discussion of these two methods, a third method that can be used to provide performance predictions for jobs that are devoid of criterion data, multilevel regression, is introduced and discussed in detail.
From page 64...
... Although the corrected mean validity could be used to forecast performance scores, the approach is too indirect. Selection decisions are often based on prediction equations, which may in turn comprise a number of tests.
From page 65...
... To illustrate the process of obtaining synthetic validity estimates, consider the Army's Synthetic Validation Project (SYNVAL) as an example.
From page 66...
... In the next section, multilevel regression analysis is proffered as an alternative method for deriving job performance predictions for jobs without criterion data.
From page 67...
... This is the general form for the equation linking individual job performance to enlistment standards: the linkage equation. This model says that $\alpha_j$, $\beta_j$, and $\gamma_j$ can, in principle, vary across jobs.
From page 68...
... A multilevel regression model was chosen for the linkage equation because the JPM data are multilevel or "nested." Specifically, in the JPM database, individuals are nested within jobs. Ordinary least-squares (OLS) regression models are inappropriate for multilevel data.
From page 69...
... in the Linkage project, the variables constituting it deserve comment. The Building Blocks of the Linkage Equation: Individual- and Job-Characteristic Variables As noted above, equation (1)
From page 70...
... Using each of the ASVAB subtest scores as a predictor along with the other measures of individual characteristics and job characteristics would have involved estimating too many parameters, whereas using only the Armed Forces Qualification Test (AFQT) or some other general ability factor might have missed important job differences.
From page 71...
... Table 2 contains job-specific means and standard deviations for the individual characteristics for each of the jobs in the JPM Project sample. Derivation of the Job-Characteristic Variables Development of the job-level variables for the multilevel model was based on an analysis of civilian jobs.
From page 74...
... These job-specific component scores were then used as the job-characteristic variables in the multilevel regression model of military job performance.
From page 75...
... The structure of the model parameters for the linkage equation is the following: [equation (9), which defines the job-level parameters $\alpha_j$, $\beta_j$, and $\gamma_j$, is not legible in the source]
From page 76...
... Because the linkage equation contains both individual- and job-level predictors, it qualifies as a multilevel model, with individuals being level one and jobs being level two. This structure for the model parameters assumes that some of their variation (except for $\gamma$) is due to characteristics of the jobs.
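For orientation, a generic two-level regression with one individual-level predictor and one job-level predictor can be written as below. This is standard textbook notation offered only as a sketch of what "individuals at level one, jobs at level two" means; it is not a reconstruction of the chapter's own equations, and the symbols $X_{ij}$, $M_j$, $a_0$, $a_1$, $b_0$, $b_1$, $u_{0j}$, and $u_{1j}$ are assumptions introduced here.

```latex
\begin{align}
P_{ij}   &= \alpha_j + \beta_j X_{ij} + \varepsilon_{ij}
         && \text{(level one: individual $i$ in job $j$)} \\
\alpha_j &= a_0 + a_1 M_j + u_{0j}
         && \text{(level two: intercept depends on job characteristic $M_j$)} \\
\beta_j  &= b_0 + b_1 M_j + u_{1j}
         && \text{(level two: slope depends on job characteristic $M_j$)}
\end{align}
```

In this form, part of the between-job variation in each coefficient is explained by $M_j$ and the remainder is carried by the job-level residuals $u_{0j}$ and $u_{1j}$.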
From page 77...
... This program uses maximum likelihood estimation to obtain parameter values for the model. The specific, unstandardized parameter estimates for the linkage equation are given in Table 3, along with their associated standard errors.
From page 78...
... As an example of the information provided in the table, the substantial negative correlation between the intercept and TECH indicates that the TECH parameter tends to be smaller in jobs having a higher overall mean performance level.
From page 79...
... Variance of Predicted Performance Scores Although VARCL provides standard errors of the model parameters, no standard error of estimate is printed. One may be calculated, however, by taking the square root of the equation: $V(P_{ij}) = V(\ldots)$
From page 80...
... Table 5 contains the means of the predicted performance scores and associated standard errors of estimate for the 24 JPM jobs. The values of the standard error range from 6.80 for Army MOS 64C (motor transport operator)
From page 81...
... In contrast, the Army's infantry HOPT assesses performance on first-term tasks only. This difference in performance test design is reflected in the mean HOPT scores for Marine Corps rifleman (mean = 52.62, SD = 8.96)
From page 82...
... The principal advantage of the primary linkage equation is that it allows performance predictions for jobs having no criterion data. Using ordinary regression, performance scores can be estimated for individuals without criterion data by weighting their predictor information by the appropriate regression coefficients.
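The ordinary-regression step mentioned above, weighting predictor information by regression coefficients, can be sketched in Python as follows. The intercept, coefficient values, and predictor scores are hypothetical and are not taken from the chapter's tables; only the predictor labels AFQT and TECH appear in the source.

```python
def predict_performance(intercept, coefficients, predictors):
    """Return intercept + sum(coefficient * predictor score) over all predictors."""
    missing = set(coefficients) - set(predictors)
    if missing:
        raise ValueError(f"missing predictor scores: {sorted(missing)}")
    return intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )


# Hypothetical equation and recruit scores, for illustration only.
coefficients = {"AFQT": 0.25, "TECH": 0.5}
recruit = {"AFQT": 60.0, "TECH": 50.0}
predicted = predict_performance(40.0, coefficients, recruit)
```

With these illustrative numbers the prediction is 40 + 0.25 × 60 + 0.5 × 50 = 80.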
From page 83...
... The coefficients for the job-specific linkage equations for the 24 JPM jobs used in the Linkage project are given in Table 6.
From page 84...
... As above, the weighted mean factor scores would be inserted into the primary linkage equation to generate performance equations for each of the occupational codes. The model also may be amended to include additional or different individual and job characteristics.
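One way to picture "inserting job-characteristic scores into the primary equation" is the sketch below. It assumes a common two-level form in which each job-specific coefficient equals a base value plus a weighted sum of the job's characteristic scores; the excerpt does not show the chapter's actual equation, so this structure, the function name, and every number are assumptions.

```python
def job_specific_coefficients(primary, job_scores):
    """Build one job's prediction equation from a primary (pooled) equation.

    primary: mapping of term -> (base value, loadings on job characteristics)
    job_scores: the job's characteristic scores, in the same order as the loadings
    """
    equation = {}
    for term, (base, loadings) in primary.items():
        if len(loadings) != len(job_scores):
            raise ValueError(f"loadings for {term!r} do not match job_scores")
        adjustment = sum(w * m for w, m in zip(loadings, job_scores))
        equation[term] = base + adjustment
    return equation


# Hypothetical primary equation with two job-characteristic components.
primary = {
    "intercept": (60.0, [2.0, 1.0]),
    "TECH": (0.5, [0.25, 0.0]),
}
job_scores = [1.0, -0.5]  # hypothetical component scores for a new job
equation = job_specific_coefficients(primary, job_scores)
```

The resulting dictionary holds the intercept and slope for that job, which can then be applied to individual predictor scores in the usual regression fashion.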
From page 85...
... As a result, there is some question about the degree to which the parameters from the primary linkage equation and the corresponding job-specific equations would change if any one of the 24 jobs were deleted from the sample. Second, quite apart from the capability simply to generate job-specific linkage equations for jobs devoid of criterion data, and the independence of those equations from the 24 jobs included in the estimation sample, there is the issue of how well those generated equations actually predict performance in the out-of-sample jobs.
From page 86...
... The capacity to generate predicted performance scores (via job-specific linkage equations) for individuals in jobs for which no criterion data are available raises the question of how well the job-specific linkage equations predict performance for various jobs, in particular any out-of-sample jobs without criterion data.
From page 87...
... Nevertheless, both sets of analyses provide an empirical test of the ability of the job-specific linkage equations to provide accurate predictions of actual performance scores for out-of-sample jobs. Such information is vital because these situations reproduce the scenario in which the primary linkage equation would be implemented by manpower planners.
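A simple way to quantify how well generated equations predict actual scores in an out-of-sample job is the squared correlation between predicted and observed scores. The sketch below assumes that definition of cross-validity; the excerpt does not show the chapter's exact computation, and the function name and data are hypothetical.

```python
def cross_validity_r2(predicted, observed):
    """Squared Pearson correlation between predicted and observed scores."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return (cov * cov) / (var_p * var_o)


# Hypothetical scores for individuals in a job left out of estimation.
r2_perfect = cross_validity_r2([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
r2_partial = cross_validity_r2([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```

Because the equation is applied to a job that played no part in estimating it, this value is not inflated by capitalization on chance in the way a within-sample fit would be.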
From page 90...
... The question remaining is what to make of this difference in $R^2$ values. Shrinkage Formulae The value of $R^2$ obtained for the least-squares job-specific regression equation ($R^2_{OLS}$ in Table 8)
From page 91...
... were then compared to the cross-validity $R^2$ values obtained from the job-specific equations generated by the 23-job and primary (24-job) linkage equations in the holdout and new-job analyses, respectively (i.e., $R^2_{CV}$)
From page 92...
... For the 7 new jobs that were not part of the original 24-job estimation sample, however, the sample-based linkage equations were generated using full-sample weights (i.e., the 24-job primary linkage equation) and used to estimate performance scores for all individuals in a new job (as will be the case upon implementation by manpower planners)
From page 93...
... The $R^2$ values for the job-specific least-squares and linkage equations given in Table 8 have not been corrected for range restriction or criterion unreliability. For the job-specific least-squares regression equations, values of the multiple correlation range from r = .26 (Army MOS 13B)
From page 94...
... These results are positive and supportive of the multilevel regression approach to predicting performance for jobs without criterion data. Discussion One characteristic shared by validity generalization, synthetic validation, and multilevel regression is that they act as "data multipliers": they take the results of a set of data and expand their application to other settings when the collection of complete data is too expensive or impossible.
From page 95...
... One potential drawback of applying multilevel regression techniques is that a number of jobs must have criterion data for estimating the primary linkage equation. The 24 jobs used in the Linkage project did supply enough stability to obtain statistically reliable results based on across-job variation, but including more jobs in the estimation sample would certainly have resulted in better estimates.
From page 96...
... Both procedures should be examined closely in future research because they have the potential for turning an initial investment into substantial cost savings: they make a few data go a long, long way. ACKNOWLEDGMENTS The author wishes to thank Larry Hedges and Bengt Muthen for their invaluable help and patience in communicating the details of multilevel regression models and their application, the Committee on Military Enlistment Standards for their challenging comments and creative ideas, Linkage project director Dickie Harris for his support and good humor throughout this research, and the reviewers of the manuscript for their careful reading of a previous version of this chapter.
From page 97...
... Maier, M.H., and Hiatt, C.M. 1984 An Evaluation of Using Job Performance Tests to Validate ASVAB Qualification Standards (CNR 89)
From page 98...
... 1984 Synthetic validity: A conceptual and comparative review. Journal of Applied Psychology 69:322-333.
From page 99...
... Wise, L.L., Peterson, N.G., Hoffman, R.G., Campbell, J.P., and Arabian, J.M. 1991 Army Synthetic Validity Project Report of Phase III Results, Volume I (Report 922)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.