Procedures for Eliciting and Using Judgments of the Value of Observed Behaviors on Military Job Performance Tests
Pages 258-304



From page 258...
... Such standards could be used to discriminate between enlistees who would not be expected to exhibit satisfactory (or, perhaps, cost-beneficial) on-the-job performance in a military occupational specialty and those who would be expected to exhibit such performance.
From page 259...
... that have produced convincing evidence of the complexity of the various military occupational specialties and the need to describe military occupational specialties in terms of disjoint clusters of tasks. Even when attention is restricted to the job proficiencies expected of personnel at the initial level of skill defined for a military occupational specialty, the military occupational specialty might be defined by several hundred tasks that can reasonably be allocated to anywhere from 2 to 25 or more disjoint clusters (U.S.
From page 260...
... Fourth, the worth or value associated with a given proficiency level in a single task cluster would likely differ, depending on the military occupational specialty in which the task cluster was imbedded. To address these issues, the problem of establishing functions and eliciting judgments that assign value to levels of proficiency in various military occupational specialties (hereafter called "value functions")
From page 261...
... Related issues that must be considered include the comparability of value assignments across tasks within a military occupational specialty, as well as the scale equivalence of value assignments to levels of performance in different military occupational specialties. Using Predicted Test Performances and Value Judgments in Personnel Classification Assuming it is possible to predict enlistees' performances on military job performance tests from the ASVAB or other predictor batteries, and assuming that judgments of the values of these predicted performances can be elicited and combined to produce summary scores for military occupational specialties, there remains the problem of using these summaries in classifying enlistees among military occupational specialties.
From page 262...
... In the third section, we consider the prospects for applying these methods to the problem of setting standards on military job performance tests. Finally, we examine a variety of operational questions that arise in the application of any standard-setting procedure, such as the types and numbers of persons from whom judgments on appropriate standards are sought, the form in which judgments are sought, and the information provided to those from whom judgments are sought.
From page 263...
... Since this review of standard-setting procedures will be restricted to those that have been widely used and/or hold promise for use in establishing standards on military job performance tests, a simple, two-category classification method will be used. Procedures that require judgments about test items will be described apart from procedures that require judgments about the competence of examinees.
From page 264...
... recommended that this initial test standard be adjusted to control the probability that an examinee whose true performance was just equal to the initial test standard could be classified as incompetent due solely to measurement error in the testing process. The adjustment procedure recommended by Nedelsky depends on the assumption that the standard deviation of the test standards derived from the predictions of a sample of judges is equal to the standard error of measurement of the test.
From page 265...
... If these assumptions were correct, and if judges were able to correctly predict the average number of options a minimally competent examinee could eliminate as being clearly incorrect, the initial test standard resulting from the Nedelsky procedure would be an unbiased estimate of the mean test score that would be earned by minimally competent examinees. However, studies by Poggio et al.
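As a hedged illustration of the arithmetic usually associated with the Nedelsky procedure (a minimal sketch, not taken from the chapter): for each multiple-choice item a judge reports how many options a minimally competent examinee could eliminate, the item's expected score is taken as the reciprocal of the number of options that remain, and the judge's standard is the sum over items. The function names and the sample data below are hypothetical.

```python
from statistics import mean


def nedelsky_judge_standard(options_per_item, eliminated_per_item):
    """Sum over items of 1 / (options remaining after elimination)."""
    total = 0.0
    for n_options, n_eliminated in zip(options_per_item, eliminated_per_item):
        remaining = n_options - n_eliminated
        if remaining < 1:
            raise ValueError("the correct option must always remain")
        total += 1.0 / remaining
    return total


def nedelsky_initial_standard(options_per_item, eliminations_by_judge):
    """Average the per-judge standards to obtain the initial test standard."""
    return mean(
        nedelsky_judge_standard(options_per_item, eliminated)
        for eliminated in eliminations_by_judge
    )


# Hypothetical three-item test rated by two judges.
options = [4, 4, 5]
eliminations = [[2, 3, 1], [1, 3, 3]]
initial_standard = nedelsky_initial_standard(options, eliminations)
```

The result is expressed in raw-score units, so it can be compared directly with the number of items answered correctly.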
From page 266...
... The average of the recommended test standards produced by the entire sample of judges is the test standard that results from Angoff's procedure. If for each item on the test the average of the probabilities predicted by the sample of judges was correct, the test standard produced by Angoff's procedure would equal the mean score earned by a population of minimally competent examinees.
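The averaging described here can be sketched in a few lines. This is an illustrative sketch only: it assumes each judge supplies, for every item, the probability that a minimally competent examinee answers correctly, and that a judge's recommended standard is the sum of those probabilities. The names and ratings are hypothetical.

```python
from statistics import mean


def angoff_standard(probabilities_by_judge):
    """Mean, across judges, of each judge's summed item probabilities."""
    judge_standards = [sum(item_probs) for item_probs in probabilities_by_judge]
    return mean(judge_standards)


# Hypothetical ratings: two judges, four items.
ratings = [
    [0.9, 0.6, 0.5, 0.8],
    [0.8, 0.7, 0.4, 0.9],
]
standard = angoff_standard(ratings)
```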
From page 267...
... The judges' final task is to answer the following question for each category of test items (Livingston and Zieky, 1982:25): If a borderline test-taker had to answer a large number of questions like these, what percentage would he or she answer correctly? When a test standard is computed using Ebel's method, a judge's recommended percentage for a cell of the taxonomy is multiplied by the number of test items the judge allocated to that cell.
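A minimal sketch of the multiplication described for Ebel's method follows. It assumes, beyond what this excerpt states, that the cell products are summed to give one judge's recommended standard in raw-score units. The cell data are hypothetical.

```python
def ebel_judge_standard(cells):
    """cells: iterable of (percent_correct, n_items) pairs, one per taxonomy cell."""
    return sum(percent / 100.0 * n_items for percent, n_items in cells)


# Hypothetical judge: three taxonomy cells covering 20 items.
judge_cells = [(90, 10), (70, 6), (50, 4)]
judge_standard = ebel_judge_standard(judge_cells)
```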
From page 268...
... Following a review of these data, judges were asked to reconsider their initial recommendations and once again answer, for each item, the question of whether every "successful" examinee should be able to answer the test item correctly. These answers were used to compute a new set of recommended test standards in preparation for a final judgment session.
From page 269...
... Proponents of these procedures claim that the types of judgments required, concerning persons rather than test items, are more consistent with the experience and capabilities of educators and supervisory personnel. The resulting test standards are thus claimed to be more reasonable and realistic.
From page 270...
... First, unless the sample of examinees that is classified by the judges is in its distribution of test scores representative of the population of examinees to which the test standard is applied, a biased standard will result. Second, in making their classifications it is essential that judges restrict their attention to knowledge and/or skill that is assessed by the test for which a standard is sought.
From page 271...
... Hambleton and Eignor (1980) recommended that the two test score distributions be plotted on the same graph and that the test standard be defined as the score at which these two distributions intersect.
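The intersection rule can be approximated numerically. The sketch below is an assumption-laden illustration, not the authors' procedure: it uses unsmoothed relative frequencies at integer scores and returns the first score at which the group judged competent becomes at least as frequent as the other group. With real data the two curves would normally be smoothed first, since raw frequencies can cross more than once. All names and scores are hypothetical.

```python
from collections import Counter


def relative_frequencies(scores, score_range):
    """Proportion of the group observed at each score in score_range."""
    counts = Counter(scores)
    n = len(scores)
    return {s: counts.get(s, 0) / n for s in score_range}


def crossing_score(not_competent_scores, competent_scores, score_range):
    """First score where the competent curve rises to meet or exceed the other."""
    low = relative_frequencies(not_competent_scores, score_range)
    high = relative_frequencies(competent_scores, score_range)
    previous = None
    for s in score_range:
        if previous is not None and high[previous] < low[previous] and high[s] >= low[s]:
            return s
        previous = s
    return None  # the curves never cross in the given range


# Hypothetical test scores for the two judged groups.
not_competent = [2, 3, 3, 4, 4, 4, 5, 5, 6]
competent = [5, 6, 6, 7, 7, 7, 8, 8, 9]
cut = crossing_score(not_competent, competent, range(0, 11))
```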
From page 272...
... We will consider the applicability of the procedures in the order of their initial description. Procedures That Require Judgments About Test Items The Nedelsky procedure may be only partially applicable in setting standards on military job performance tests because it can be used only with multiple-choice test items, while the assessment of "manifest, observable job behaviors" is a central purpose of the military job performance tests.
From page 273...
... , the percentage of examinees who would satisfy the overall military occupational specialty criterion on the pencil-and-paper portion of the job performance test would be $100 \times (1 - 0.05)^{10} = 59.9$ percent. Thus almost 40 percent of the examinees would fail the pencil-and-paper portion of the job performance test, even though only 5 percent would fail to complete any given task.
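The 59.9 percent figure can be checked directly. The sketch below assumes ten tasks, each failed by 5 percent of examinees, with failures independent across tasks. The task count is inferred from the reported result, and the independence assumption is ours.

```python
def percent_passing_all(task_failure_rate, n_tasks):
    """Percentage of examinees passing every task, assuming independent tasks."""
    return 100.0 * (1.0 - task_failure_rate) ** n_tasks


pct = percent_passing_all(0.05, 10)
```

Under the same assumptions, the pass rate falls quickly as tasks are added, which is why a conjunctive criterion over many tasks can fail a large share of examinees even when each task is easy.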
From page 274...
... Theoretically, Ebel's method could be applied to an overall job performance test to yield a standard for an entire military occupational specialty rather than a single task. However, several assumptions inherent in the method would then be highly questionable.
From page 275...
... If judges based their recommendations on "the book" they would likely answer questions about most, if not all, activities affirmatively, thus resulting in impossibly high test standards. Our expectation, then, is that Jaeger's procedure could be adapted to military job performance tests quite readily, but would likely yield test standards that were impractically high.
From page 276...
... In a somewhat different form, the same problem must be dealt with for the person-based procedures: Should judges be asked to classify examinees as "unacceptable," "borderline," or "acceptable" in the skills defined by the task cluster represented by the test or in all skills needed to function within a military occupational specialty? Since a standard is likely to be desired for a test that is restricted to a single task cluster, one could argue that the appropriate referent population is obvious.
From page 277...
... Recall that classification of examinees to the "unacceptable," "borderline," and "acceptable" groups must be based on information other than scores on the tests for which standards are sought. In the present context, that information would have to consist of observations of on-the-job performance of enlistees early in their initial tours of duty in a military occupational specialty.
From page 278...
... Numbers of Judges to be Used In any standard-setting procedure, the numbers of judges to be used should be determined by considering the probable magnitude of the standard error of the recommended test standard as a function of sample size. Since in all of the standard-setting procedures described in this paper, the recommendations of individual judges are derived more or less independently and are aggregated only at the point of computing a final test standard, the standard error of that recommendation will vary inversely as the square root of the size of the sample of judges.
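The inverse-square-root relationship can be made concrete with a short sketch. It assumes independent judges and treats the between-judge standard deviation as known. The numbers are hypothetical.

```python
from math import ceil, sqrt


def standard_error(sd_across_judges, n_judges):
    """Standard error of the mean recommended standard for n independent judges."""
    return sd_across_judges / sqrt(n_judges)


def judges_needed(sd_across_judges, target_standard_error):
    """Smallest number of judges whose mean has at most the target standard error."""
    return ceil((sd_across_judges / target_standard_error) ** 2)


# Hypothetical: recommendations vary across judges with SD = 4 raw-score points.
se_16 = standard_error(4.0, 16)
se_64 = standard_error(4.0, 64)
n_required = judges_needed(4.0, 0.5)
```

Halving the standard error requires four times as many judges, which is the practical cost implied by the square-root rule.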
From page 279...
... The goal in developing stimulus materials should be to minimize the variance of recommendations across judges due to factors other than true differences in their judgment of an appropriate test standard.
From page 280...
... It should not be assumed that judges will know that these numbers represent the proportions of examinees who answered each test item correctly when the test was last administered. Normative data on examinees' test performances have also been provided in the form of an ogive (cumulative distribution function graph)
From page 281...
... In support of their contentions, these authors cite a number of studies in which recommended test standards would have resulted in outlandish examinee failure rates. For example, Educational Testing Service's study to determine standards for the National Teacher Examinations in North Carolina produced recommendations that would have resulted in denial of teacher certification to half the graduates of the state's accredited and approved teacher education programs.
From page 282...
... To protect against the possibility of failing an examinee as a result of measurement error, several researchers have proposed that initial test standards be adjusted downward by some multiple of the standard error of measurement of the test for which a standard is desired.
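A minimal sketch of such an adjustment follows. It assumes the classical test theory expression for the standard error of measurement, SD × sqrt(1 − reliability), which is not stated in this excerpt. The multiple and all numbers are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(score_sd, reliability):
    """Classical test theory SEM: SD * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return score_sd * sqrt(1.0 - reliability)


def adjusted_standard(initial_standard, sem, multiple=1.0):
    """Lower the initial standard by a chosen multiple of the SEM."""
    return initial_standard - multiple * sem


# Hypothetical test: score SD of 5 points, reliability 0.84, initial standard 30.
sem = standard_error_of_measurement(5.0, 0.84)
final_standard = adjusted_standard(30.0, sem, multiple=1.0)
```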
From page 283...
... DEFINITION AND CONSTRUCTION OF VALUE FUNCTIONS The problems discussed in this section concern the establishment of functions that assign value (or worth) to different levels of proficiency in completing various military occupational specialty tasks, and the use of these value functions in assessing the overall worth of an enlistee in a specific military occupational specialty.
From page 284...
... . The same performance set would be used in determining minimally acceptable performance levels when establishing job performance test standards.
From page 285...
... There is no reason to believe that value functions would be linear. Intuitively, it would seem that small deviations from minimally acceptable performance levels would result in large changes in value, whereas at some higher levels of performance value functions would change more gradually.
From page 286...
... over different clusters both within and across military occupational specialties. Comparability of value functions is essential to the classification of enlistees into the military occupational specialties and to the assignment of duties within a military occupational specialty.
From page 287...
... The resulting military occupational specialty value functions will then be comparable across military occupational specialties. With military occupational specialty value functions defined in this way, it will be possible to determine the military occupational specialty for which a given enlistee has the greatest value or worth.
From page 288...
... Eliciting Value Judgments Task value functions must be based on two types of judgments. One type concerns the assignment of value to each possible level of performance on each of the dimensions that compose a task performance set.
From page 289...
... Of interest here is the information that is supplied to the judges to assist them in deriving their value functions. The following explanation of the method is set in the context of deriving value functions for dimensions of a military occupational specialty task performance set.
From page 290...
... The final value function for a given dimension in the performance set is the average of the value functions recommended by all judges for that dimension, based on the second scenario. The average value function procedure has merit, in that the scenarios used can be constructed so as to mirror the scenarios used in "hands-on" performance tests or the scenarios used in assessing the relative importance of tasks within a military occupational specialty (U.S.
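The averaging step can be sketched as a pointwise mean. This illustration assumes each judge's value function is recorded as a list of values, one per performance level, on a common scale. The data are hypothetical.

```python
def average_value_function(judge_functions):
    """Pointwise mean of the judges' value functions for one dimension."""
    n_levels = len(judge_functions[0])
    if any(len(f) != n_levels for f in judge_functions):
        raise ValueError("all judges must rate the same performance levels")
    n_judges = len(judge_functions)
    return [sum(level_values) / n_judges for level_values in zip(*judge_functions)]


# Hypothetical: three judges, three performance levels (low, medium, high).
second_round = [
    [0.0, 0.5, 1.0],
    [0.0, 0.7, 0.9],
    [0.0, 0.6, 0.8],
]
final_function = average_value_function(second_round)
```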
From page 291...
... A separate discussion of operational issues in this section would therefore be largely redundant. One consideration that is appropriate here but would not be appropriate in the establishment of minimally acceptable performance standards is the use of enlistees themselves to determine value functions.
From page 292...
... These classification strategies fall into two groups: individual classification strategies and institutional classification strategies. Individual Classification Strategies The simplest classification scheme would be to let each enlistee choose his/her preferred military occupational specialty from a pool of military occupational specialties for which his/her predicted performance scores satisfied the minimally acceptable standards.
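The simplest individual strategy amounts to a filter followed by a choice. The sketch below is illustrative only. The specialty labels, predicted scores, standards, and preference ordering are all hypothetical.

```python
def eligible_specialties(predicted, minimums):
    """Specialties whose minimally acceptable standard the enlistee is predicted to meet."""
    return [
        mos for mos, cutoff in minimums.items()
        if mos in predicted and predicted[mos] >= cutoff
    ]


def preferred_eligible(preferences, predicted, minimums):
    """The enlistee's most preferred specialty among those in the eligible pool."""
    pool = set(eligible_specialties(predicted, minimums))
    for mos in preferences:
        if mos in pool:
            return mos
    return None  # no specialty satisfies the standards


# Hypothetical enlistee.
predicted = {"MOS1": 62.0, "MOS2": 48.0, "MOS3": 71.0}
minimums = {"MOS1": 60.0, "MOS2": 55.0, "MOS3": 65.0}
pool = eligible_specialties(predicted, minimums)
choice = preferred_eligible(["MOS2", "MOS3", "MOS1"], predicted, minimums)
```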
From page 293...
... in the average values of individuals assigned to all of the military occupational specialties. In this classification scheme, individual classification decisions would be based on predicted value functions for the entire group of new enlistees about whom classification decisions are to be made and also on information about enlistees currently assigned to the military occupational specialties.
From page 294...
... specialty because this enlistee's predicted value levels are higher than the current estimated average values of individuals assigned to both of the military occupational specialties. Since $V_{mos1}$ is so much larger than $V_{mos2}$, the military's immediate interest would be to assign enlistees to MOS2 who would have the greatest potential to help raise the current average value of individuals already assigned to MOS2 (provided the military's goal is as we stated earlier)
From page 295...
... FIGURE 2 Two-way table of new enlistees' predicted MOS value functions.
From page 296...
... The predicted average military occupational specialty values for the new enlistees can be expressed as

$$\hat{V}_{mos1} = \left( n_{11} p_{11} v^{1}_{11} + n_{12} p_{12} v^{1}_{12} + n_{21} p_{21} v^{1}_{21} + n_{22} p_{22} v^{1}_{22} \right) / Q_1$$

and

$$\hat{V}_{mos2} = \left( n_{11} (1 - p_{11}) v^{2}_{11} + n_{12} (1 - p_{12}) v^{2}_{12} + n_{21} (1 - p_{21}) v^{2}_{21} + n_{22} (1 - p_{22}) v^{2}_{22} \right) / Q_2$$

The goal is to find the $p_{ij}$'s which jointly maximize $\hat{V}_{mos1}$ and $\hat{V}_{mos2}$ while jointly minimizing, if necessary, the amount these values may fall below the current estimated average values of individuals in the military occupational specialties, $V_{mos1}$ and $V_{mos2}$.
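To make the two predicted averages concrete, the sketch below evaluates them for one candidate set of assignment proportions. It is an illustration under stated assumptions, not the chapter's linear programming analysis: it assumes that cell (i,j) of a 2 × 2 table holds n[i][j] enlistees with predicted values v1[i][j] in MOS1 and v2[i][j] in MOS2, that a proportion p[i][j] of each cell goes to MOS1 and the remainder to MOS2, and that Q1 and Q2 are the numbers assigned to each specialty. All numbers are hypothetical.

```python
def predicted_averages(n, p, v1, v2, q1, q2):
    """Predicted average value of new enlistees assigned to MOS1 and to MOS2."""
    cells = [(i, j) for i in range(len(n)) for j in range(len(n[0]))]
    total_1 = sum(n[i][j] * p[i][j] * v1[i][j] for i, j in cells)
    total_2 = sum(n[i][j] * (1.0 - p[i][j]) * v2[i][j] for i, j in cells)
    return total_1 / q1, total_2 / q2


# Hypothetical 2 x 2 example with 100 new enlistees and quotas of 50 each.
n = [[20, 30], [30, 20]]           # enlistees per cell
p = [[1.0, 0.5], [0.5, 0.0]]       # proportion of each cell sent to MOS1
v1 = [[0.8, 0.8], [0.4, 0.4]]      # predicted value in MOS1
v2 = [[0.6, 0.2], [0.6, 0.2]]      # predicted value in MOS2
v_hat_1, v_hat_2 = predicted_averages(n, p, v1, v2, q1=50, q2=50)
```

An optimizer would search over the proportions subject to the quota constraints. This function only scores a single candidate allocation.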
From page 297...
... The data are fictitious. Example 1 The following two-way table shows the distribution of 100 new enlistees' predicted value functions for two military occupational specialties.
From page 298...
... These choices assign a larger penalty to MOS2 than to MOS1, for decreases in anticipated average values of personnel currently assigned to the military occupational specialties. The linear programming analysis produced the following results: Number of Enlistees Number of Enlistees Cell (i,j)
From page 299...
... $n_{31} = 0$, $n_{32} = 0$, $n_{33} = 0$. Let $Q_1 = 90$ and $Q_2 = 90$. Assume estimates of the average values of personnel currently assigned to the military occupational specialties are $V_{mos1} = .7$ and $V_{mos2} = .3$.
From page 300...
... The second problem we addressed involves procedures for eliciting and combining judgments of the values of enlistees' behaviors on military job performance tests. We examined the potential contributions of psychological decision theory and social behavior theory to solving this problem and concluded that they were largely inapplicable.
From page 301...
... The results of these studies can and should be employed in developing methods for combining judged values associated with performance of the tasks that compose a military occupational specialty. A method based on weighted averages of value functions, with weights proportional to the judged importance of tasks, was described in detail.
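The weighted-average combination can be sketched as follows. This is a minimal illustration that assumes each task already has a single value on a common scale and that importance ratings are nonnegative. The task names and numbers are hypothetical.

```python
def combined_value(task_values, task_importance):
    """Weighted average of task values, weights proportional to judged importance."""
    total_importance = sum(task_importance[task] for task in task_values)
    if total_importance <= 0:
        raise ValueError("importance ratings must sum to a positive number")
    return sum(
        task_importance[task] / total_importance * value
        for task, value in task_values.items()
    )


# Hypothetical specialty with three tasks.
values = {"task_a": 0.8, "task_b": 0.5, "task_c": 0.2}
importance = {"task_a": 5, "task_b": 3, "task_c": 2}
mos_value = combined_value(values, importance)
```

Because the weights are normalized to sum to one, the combined value stays on the same scale as the task values, which supports comparison across specialties.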
From page 302...
... TR-79-29. Air Force Human Resources Laboratory, Occupation and Manpower Research Division, Brooks Air Force Base, Tex.
From page 303...
... SR82-2. Air Force Human Resources Laboratory, Manpower and Personnel Division, Brooks Air Force Base, Tex.
From page 304...
... JAEGER AND SALLIE KELLER-McNULTY U.S. Army Research Institute for the Behavioral and Social Sciences 1984 Selecting Job Tasks for Criterion Referenced Tests of MOS Proficiency.

