8. Evaluating the Quality of Performance Measures: Criterion-Related Validity Evidence
Pages 141-183



From page 141...
... Such investigations, commonly known as criterion-related validity studies, seek evidence that performance on criteria valued by an organization can be predicted with a useful degree of accuracy from test scores or other predictor variables. The implications of a criterion-related validity study depend only secondarily, however, on the strength of the statistical relationship that is obtained.
From page 142...
... Among the justifications that might be presented for the use of a test to select or classify applicants, none is apt to be more persuasive or intuitively appealing than the demonstration that test scores predict actual on-the-job performance. Like Thorndike's ultimate criterion, however, actual on-the-job performance
From page 143...
... Similarly, criterion deficiency is most serious when the criterion measure fails to include elements of job performance that are related to the predictor constructs (Brogden and Taylor, 1950). Of particular concern are situations in which criterion deficiency or contamination "enhance[s]
From page 144...
... The number of people with each possible combination of test and criterion scores is shown in Table 8-1. This simple table contains all the information about the relationship between the test scores and the criterion measure.
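A table of this kind can be built by counting each (test score, criterion score) pair. The following is a minimal sketch in Python; the function name and the eight scores are hypothetical and are not the data in Table 8-1.

```python
from collections import Counter


def cross_tabulate(test_scores, criterion_scores):
    """Count how many people have each (test score, criterion score) pair."""
    if len(test_scores) != len(criterion_scores):
        raise ValueError("each person needs one test score and one criterion score")
    return Counter(zip(test_scores, criterion_scores))


# Hypothetical scores for eight people (illustration only).
test = [1, 1, 2, 2, 2, 3, 3, 3]
criterion = [0, 1, 0, 1, 1, 1, 1, 0]
table = cross_tabulate(test, criterion)

# Print one row per observed combination, sorted by test score then criterion score.
for (t, c), n in sorted(table.items()):
    print(f"test={t} criterion={c} count={n}")
```

Because every person contributes exactly one pair, the counts sum to the sample size, and any summary statistic of the relationship (a correlation, for instance) can be recomputed from the table alone.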
From page 145...
... A variety of tabular and graphical summaries similar in general nature to the above tables can be useful in summarizing relationships between test scores and criterion measures.
From page 146...
... For example, the effects of the reliability of the criterion measure, the effects of basing coefficients only on samples of job incumbents who have already been selected on the basis of test scores and successful completion of training, and the possibility that validities and predictive equations may differ as a function of subgroup (e.g., men and women or blacks, whites, and Hispanics) are all important considerations in a criterion-related validity study.
From page 147...
... The threats to validity of criterion contamination and criterion deficiency apply as much to hands-on measures as to alternative criterion measures, such as job knowledge tests, ratings, or administrative records. As is true of other types of measures, the quality of hands-on measures also depends on the reliability of the measures, that is, the degree to which the scores that are obtained can be generalized across test administrators, tasks, and administration occasions.
From page 148...
... + .15 (UNIQUE2) where CORE is the weighted sum of the basic infantry core duty area/task scores, UNIQUE1 is the score from the rifle live fire task, and UNIQUE2 is the score from the MOS supplementary tasks (tactical measures 2 and squad automatic weapon 2)
From page 149...
... This apparent complexity and the differential dependency of subtasks on cognitive ability are supported by an inspection of the correlations of the task scores with GT, the General Technical ASVAB composite score. As would be expected, tasks judged to have a greater cognitive component have relatively high correlations with GT, and the duty areas that generally involve manipulation of weapons (rifle, live fire, squad automatic weapon 1 and 2, and hand grenades)
From page 150...
... Correlations of job knowledge tests with other criterion measures, particularly with hands-on job performance measures, take on particular importance due to the concern about criterion contamination that may be correlated with the predictor test scores. Results reported for 15 specialties/ratings (9 Army, 4 Marine Corps, and 2 Navy)
From page 151...
... Compared with other variables, however, the link between ...

TABLE 8-5 Correlations of Paper-and-Pencil Job Knowledge Test Scores With Hands-On Job Performance Total Score

Service        Specialty (MOS/Rating)                              Correlation
Army           Infantryman                                         .44
               Cannon crewman                                      .41
               Tank crewman                                        .47
               Radio teletype operator                             .56
               Light wheel vehicle/power generator mechanic        .35
               Motor transport operator                            .43
               Administrative specialist                           .57
               Medical specialist                                  .46
               Military police                                     .37
Marine Corps   Infantry assaultman                                 .49
               Infantry machinegunner                              .61
               Infantry mortarman                                  .55
               Infantry rifleman                                   .52
Navy           Machinist's mate (engine room)
From page 152...
... As a criterion measure, interviews are apt to be both deficient and contaminated. A person may be able to describe what would be done, for example, without being able actually to perform the task.
From page 153...
... The correlations of MOS-specific ratings with total hands-on performance scores were lower than the correlations of paper-and-pencil job knowledge tests with hands-on total scores for each of the nine occupational specialties for which the Army obtained hands-on measures. As shown in Table 8-7, the correlations of ratings with TOTAL ranged from a low of .18 to a high of .28 for the nine Army occupational specialties.
From page 154...
... Supervisor ratings; Peer ratings; Self ratings
Machinist's mates (generator room): Supervisor ratings; Peer ratings; Self ratings
Radioman: Supervisor ratings; Peer ratings
Air Force
Air traffic control operator: Supervisor dimensional ratings; Peer dimensional ratings; Self dimensional ratings ...
From page 155...
... The school knowledge tests, thus, were limited in content to topics covered in training classes, whereas the job knowledge tests included areas that might be dealt with in on-the-job training. The correlations of school knowledge test scores with job knowledge test scores and with hands-on performance were reported by Campbell et al.
From page 156...
... the correlation of school knowledge and job knowledge scores is greater than .60. Although none of the correlations of the school knowledge test scores with the hands-on performance measures reaches .60, school knowledge and hands-on performance scores are clearly related.
From page 157...
... In the Army's research, six spatial ability tests (Assembling Objects, Map, Mazes, Object Rotation, Orientation, and Figural Reasoning) were combined to form a single spatial ability composite score (McHenry et al.,
From page 158...
...

Problems in Linking Predictor and Criterion Constructs

Correlations of predictor scores and scores on a criterion measure are affected by the facts that criterion measures are always less than perfectly reliable and that the correlations can be computed only for people who have been selected for the job and are still on the job at the time the criterion scores are obtained.
From page 159...
... But criterion reliabilities are often substantially lower than those of predictor tests. Hence, adjustments for the unreliability of criterion measures can have a more substantial effect.
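The size of such an adjustment can be illustrated with the classical correction for attenuation applied to the criterion only, in which the observed correlation is divided by the square root of the criterion reliability. The sketch below assumes that formula; the excerpt does not state which adjustment was used, and the function name and numeric values are hypothetical.

```python
import math


def correct_for_criterion_unreliability(r_xy, r_yy):
    """Classical correction for attenuation, criterion side only.

    r_xy: observed predictor-criterion correlation.
    r_yy: reliability of the criterion measure (0 < r_yy <= 1).
    """
    if not 0.0 < r_yy <= 1.0:
        raise ValueError("criterion reliability must be in (0, 1]")
    return r_xy / math.sqrt(r_yy)


# Hypothetical values: the same observed validity under two criterion reliabilities.
observed = 0.30
for reliability in (0.90, 0.60):
    corrected = correct_for_criterion_unreliability(observed, reliability)
    print(f"reliability={reliability:.2f} corrected={corrected:.3f}")
```

The lower the criterion reliability, the larger the upward adjustment, which is why the correction matters more for criterion measures than for highly reliable predictor tests.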
From page 160...
...

Summary of Relationships Between Predictors and Criterion Measures

Prediction of Hands-On Performance

The Office of the Assistant Secretary of Defense (Force Management and Personnel) (1989) summarized the validity of the AFQT for predicting the hands-on performance measures of the JPM Project in its January 1989 report to the House Committee on Appropriations.
From page 161...
...                                                                .17   .36   NA
               Jet engine mechanic                                 .10   .29   NA
               Info. systems radio operator                        .32   .35   NA
               Personnel specialist                                .29   .53   NA
Army           Infantryman                                         .25   .34   .41
               Cannon crewman                                      .13   .15   .20
               Tank crewman                                        .26   .31   .37
               Radio teletype operator                             .34   .51   .53
               Light wheel vehicle/power generator mechanic        .13   .13   .28
               Motor transport operator                            .24   .39   .52
               Administrative specialist                           .35   .49   .50
               Medical specialist                                  .28   .46   .57
               Military police                                     .23   .49   .57
Marine Corps   Rifleman                                            .40   .55   .62
               Machinegunner                                       .49   .66   .68
               Mortarman                                           .33   .38   .48
               Assaultman                                          .38   .46   .50
Navy           Machinist's mate                                    .23   .27   NA
               Radioman                                            .22   .15   NA
Median Correlation                                                 .26   .38   .50

NOTE: AA correlations reported only for Army and Marine Corps.
From page 162...
... The leaf of 9 next to the stem of .4, for example, indicates that the uncorrected correlation between the AFQT and the hands-on performance total score was .49 for 1 of the 23 occupational specialties (Marine Corps machinegunner, see Table 8-10). The stem-and-leaf plot to the right of center displays the distribution of correlations after corrections for range restriction have been made.
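To give a sense of how a range restriction correction changes a coefficient, the sketch below applies the standard univariate formula for direct selection on the predictor (often called Thorndike's Case II). This is an assumption for illustration: the excerpt does not say which correction procedure the Services applied, and the ratio of standard deviations used here is hypothetical.

```python
import math


def correct_for_range_restriction(r_restricted, sd_ratio):
    """Univariate correction for direct range restriction on the predictor.

    r_restricted: correlation observed in the selected (restricted) sample.
    sd_ratio: predictor SD in the unrestricted population divided by the
              predictor SD in the selected sample (>= 1 under restriction).
    """
    if sd_ratio <= 0:
        raise ValueError("sd_ratio must be positive")
    r, u = r_restricted, sd_ratio
    return (r * u) / math.sqrt(1.0 + r * r * (u * u - 1.0))


# Hypothetical illustration: an observed validity of .49 with an assumed SD ratio of 1.5.
print(round(correct_for_range_restriction(0.49, 1.5), 3))
```

When the selected sample is as variable as the applicant population (a ratio of 1), the formula returns the observed correlation unchanged; the more the selected sample's spread has been narrowed, the larger the corrected value.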
From page 163...
... The validity coefficients, while generally somewhat lower than validities that have been reported using school or job knowledge tests as criterion measures, are consistently positive and in most cases are high enough to have practical value for purposes of selection and classification. There clearly is a substantial degree of variability across specialties in the validity of the ASVAB for predicting hands-on performance.
From page 164...
... Because they are paper-and-pencil tests, however, job knowledge tests raise concerns about criterion contamination and criterion deficiency. Questions about criterion deficiency can be addressed in part by analyzing the strength of the relationship between the job knowledge test scores and hands-on performance measures, such as those summarized in Table 8-5.
From page 165...
... Since job knowledge tests and the ASVAB depend on the results of paper-and-pencil, multiple-choice testing formats, it might be expected that this common method variance would inflate the correlation between the two types of measures. Comparisons of the uncorrected predictive validities of the appropriate aptitude area composite on the ASVAB using job knowledge tests and hands-on performance measures are provided in Figure 8-2 for the nine Army and four Marine Corps occupational specialties for which ...

[FIGURE 8-2 Aptitude area validities for hands-on and job knowledge tests for 13 occupational specialties. Vertical axis ticks run from 0 to 0.6; horizontal axis: Occupational Specialties; plotted series: Hands-on Test, Job Knowledge Test. Army: 1 = Infantryman, 2 = Cannon crewman, 3 = Tank crewman, 4 = Radio teletype operator, 5 = Light wheel vehicle/power generator mechanic, 6 = Motor transport operator, 7 = Administrative specialist, 8 = Medical specialist, 9 = Military police. Marine Corps: 10 = Rifleman, 11 = Machinegunner, 12 = Mortarman, 13 = Assaultman.]
From page 166...
... Although the differences shown in the figure are sometimes relatively large, a given aptitude area composite appears to have a useful degree of validity for both criterion measures in all 13 jobs. Thus, it would appear that validation studies using job knowledge tests as criterion measures can provide useful indications of the predictive validity of ASVAB scores for military jobs.
From page 167...
... Note: The criterion measures that comprise each factor are as indicated.
From page 168...
...

Predictor and Criterion Measure Construct Similarities

The corrected validities of the four ASVAB aptitude factors for one of the Marine Corps specialties (rifleman) are shown in Figure 8-4 using hands-on total scores and job knowledge test scores as the criterion measures.
From page 169...
... rather than using the shorter and more traditional labels of differential validity and prediction. Demonstrating that the ASVAB has a useful degree of validity for hands-on criterion measures in all military jobs would be sufficient if the ASVAB were used only for making selection decisions.
From page 170...
... It was expected that the lack of greater variation in the pattern of validities across jobs might be partially due to the substantial general cognitive component in all of the nonspeeded ASVAB tests and in the training criterion measures. Assuming that the hands-on criterion measures reflect greater differentiation between jobs than the more cognitive training measures, it might therefore be expected that the pattern of validities would be more variable from job to job with hands-on measures than had been previously obtained using training measures.
From page 171...
...

FAIRNESS ANALYSIS

The discussion now turns to group differences in predictor test scores and in job performance scores. Implementation of the Civil Rights Act of
From page 172...
... Hence the need was felt to find out whether these differences in average test scores are related to real performance differences or are artifactual. The now-conventional fairness analysis focuses on whether selection tests like the ASVAB function in the same way for different population groups. It examines two questions: whether the correlations of test scores with on-the-job criterion measures differ by racial or ethnic group or gender, and whether predictions of criterion performance from test scores differ for employees who are of different racial or ethnic identity or gender.
From page 173...
... Table 8-13 summarizes correlations between criterion measures and selection composites used by the Services. (One occupational specialty with only three black
From page 174...
... When viewed in this global manner, these results reveal the already-noted fact that validities tend to be higher when a written job knowledge test is used as the criterion as opposed to a hands-on measure of performance (due at least in part to method effects)
From page 175...
...

Standard Errors of Prediction

Studies of group-to-group differences in prediction systems have generally found regression equations to yield more accurate predictions of performance for nonminorities and women than for minorities and men (Linn, 1982). The use of hands-on performance criteria in the JPM Project, as opposed to paper-and-pencil tests, does not appear to have altered that finding to any great extent.
From page 176...
... Thus, the stable values in the table suggest relatively minor group differences in the accuracy of prediction.

Slopes and Intercepts

Hypothesis tests of group-to-group differences in slopes were conducted for all occupational specialties with appropriate data.
From page 177...
... Although in most cases small in magnitude, the intercept differences do show a trend found in previous differential prediction studies toward positive values (Linn, 1982; Houston and Novick, 1987), which implies that the use of prediction equations based on the pooled sample in these jobs would result in overprediction of black performance more often than not.
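The link between an intercept difference and overprediction can be made concrete by fitting one regression line to a pooled sample and another to a single group, then comparing their predictions at the same test score. The following is a minimal sketch with ordinary least squares; the function names and all scores are hypothetical and are not JPM Project data.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def overprediction(pooled, group, x_value):
    """Pooled-equation prediction minus group-equation prediction at x_value.

    A positive result means the pooled equation overpredicts the group's
    criterion performance at that test score.
    """
    pooled_slope, pooled_intercept = pooled
    group_slope, group_intercept = group
    return (pooled_intercept + pooled_slope * x_value) - (
        group_intercept + group_slope * x_value
    )


# Hypothetical test scores (x) and hands-on scores (y) for two groups.
x_a, y_a = [40, 50, 60, 70], [42, 50, 58, 66]
x_b, y_b = [30, 40, 50, 60], [30, 38, 46, 54]

pooled_fit = fit_line(x_a + x_b, y_a + y_b)
group_b_fit = fit_line(x_b, y_b)

# Compare the two equations at group B's mean test score.
mean_x_b = sum(x_b) / len(x_b)
gap = overprediction(pooled_fit, group_b_fit, mean_x_b)
print(f"pooled minus group prediction at x={mean_x_b}: {gap:.2f}")
```

In this made-up example the two groups share a slope but group B has the lower intercept, so the pooled line sits above group B's own line and the gap is positive. Dividing such a gap by the criterion standard deviation would give a standardized difference, evaluated at whichever test scores are of interest.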
From page 178...
... This last result, again, may well be due to the fact that the AFQT is not an explicit selection variable for most jobs; hence, group differences in this case are confounded by differing degrees of range restriction in the comparison groups.

TABLE 8-17 Standardized Differences Between Predicted Scores from Black and Pooled Equations

Black Sample Size     Selection Composite: -1 SD, Mean, +1 SD     AFQT: -1 SD, Mean, +1 SD
Fewer (Avg.)
From page 179...
... The average group difference in scores across the 21 studies was -.85 of a standard deviation on the AFQT, -.78 of a standard deviation on the job knowledge test, and -.36 of a standard deviation on the hands-on test. In addition to these mean differences, the stem-and-leaf plot in Table 8-19, which compares group differences on hands-on and job knowledge criterion measures, shows that the entire distribution of differences on the hands-on criterion is shifted in a direction that indicates greater similarity between blacks and nonminorities on the more concrete and direct measure of job performance.
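A difference expressed in standard deviation units can be computed as the gap between two group means divided by a standard deviation. The sketch below assumes the pooled within-group standard deviation as the divisor; the excerpt does not say which standard deviation the studies used, and the scores are hypothetical.

```python
import statistics


def standardized_difference(focal, reference):
    """Mean of focal minus mean of reference, in pooled within-group SD units."""
    n1, n2 = len(focal), len(reference)
    pooled_variance = (
        (n1 - 1) * statistics.variance(focal)
        + (n2 - 1) * statistics.variance(reference)
    ) / (n1 + n2 - 2)
    return (statistics.mean(focal) - statistics.mean(reference)) / pooled_variance ** 0.5


# Hypothetical scores for two small groups (illustration only).
focal_scores = [48, 50, 52]
reference_scores = [50, 52, 54]
d = standardized_difference(focal_scores, reference_scores)
print(f"standardized difference: {d:.2f}")
```

A negative value means the focal group's mean is lower than the reference group's mean. Values closer to zero, such as the hands-on figure reported above, indicate that the two score distributions overlap more.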
From page 180...
... Mean AFQT and Hands-On Criterion Performance of Black and Nonminority Enlisted Personnel

[Back-to-back stem-and-leaf plot (columns: Leaf, Stem, Leaf), panel a. Job Knowledge Criterion: Mean = -.78, SD = .24. AFQT: Mean = -.85, SD = .41. Stems run from +.0 to -1.8.] b.
From page 181...
...

Regression Equations

Standard errors of estimate, slopes, and intercepts from the regressions of hands-on performance on the ASVAB selection composite were examined in the 16 occupational specialties for which data
From page 182...
... Whereas the typical finding in studies of differential prediction by gender is that slopes among samples of women are steeper than among samples of men, half of the jobs in the current data showed the opposite trend. One might suspect that the use of a performance criterion rather than a written test of job knowledge might account for this discrepancy with previous studies.
From page 183...
...

CONCLUSION

The JPM Project has succeeded in demonstrating that it is possible to develop hands-on measures of job performance for a wide range of military jobs. The project has also shown that the ASVAB can be used to predict performance on these hands-on measures with a useful degree of validity.

