
Currently Skimming:

4. Developing an Evaluation Framework for Teacher Licensure Tests
Pages 70-82

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 70...
... CRITERIA FOR EVALUATING TESTS
The Standards for Educational and Psychological Testing (American Educational Research Association et al., 1999) provide guidelines for evaluating educational and psychological tests.
From page 71...
... This framework, which relies heavily on the Crocker paper, suggests criteria for test development and evaluation. The framework includes criteria for stating the purposes of testing; deciding on the competencies to test; developing the test; field testing and analyzing results of the test; administering and scoring tests; protecting tests from corruptibility; setting standards; attending to reliability and related issues; reporting scores and providing documentation; conducting validation studies; determining feasibility and costs; and studying the long-term consequences of the broader licensure program.
From page 72...
... Similarly, Popham (1992) says that, "although it would clearly be more desirable to appraise teacher licensure tests using both criterion-related and content-related evidence of validity, this is precluded by technical obstacles, as well as the enormous costs of getting a genuinely defensible fix on the instructional competence of a large number of teachers." The feasibility of identifying in a professionally acceptable way teachers who are and are not minimally competent is unknown.
From page 73...
... This group of researchers applies the recommendations of the standards for validity research on educational and psychological tests to the licensure testing arena. The 1999 standards suggest, but do not require, gathering additional validity evidence for educational and psychological tests, such as evidence on the fit between the intended interpretations of test scores and (a)
From page 74...
... Poggio and colleagues obtained evidence of validity by comparing the performance of education and noneducation majors at the University of Kansas on one of the precursor tests to Praxis, the National Teachers Examination Test of Professional Knowledge. The committee contends that current licensing and employment conditions provide new opportunities to collect criterion-related evidence for teacher licensure tests.
From page 75...
... EVALUATION FRAMEWORK
This broader conception of validity is reflected in the committee's framework for evaluating teacher licensure tests. The framework does not necessarily call for validity studies that examine the relationships between performance on the tests and future performance in the classroom.
From page 76...
... of the candidates is appropriate;
· the assessment tasks, scoring keys, rubrics, and scoring anchor exemplars should be reviewed for content accuracy, clarity, relevance, and technical quality;
· the assessment tasks, scoring keys, rubrics, and scoring anchor exemplars should be reviewed for sensitivity and freedom from biases that might advantage or disadvantage candidates from particular geographic regions, cultures, or educational ideologies or those with disabilities;
· the developers and reviewers should be representative, diverse, and trained in their task;
From page 77...
... In particular, the criteria for this phase include the following:
· the assessments should be field tested on an adequate sample that is representative of the intended candidates;
· where feasible, assessment responses should be examined for differential functioning by major population groups to help ensure that the exercises do not advantage or disadvantage candidates from particular geographic regions, races, genders, cultures, or educational ideologies or those with disabilities;
· assessment analysis methods (e.g., item difficulty and discrimination) should be consistent with the intended use and interpretation of scores; and
· clearly specified criteria and procedures should be used to identify, revise, and remove flawed assessment exercises.
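The item difficulty and discrimination statistics named in the field-testing criteria can be illustrated with a small sketch. This is not the committee's method: it assumes dichotomously scored items (1 = correct, 0 = incorrect), uses the classical proportion-correct index for difficulty, and uses a point-biserial correlation with the rest score for discrimination. The `responses` matrix and both function names are hypothetical.

```python
from math import sqrt

# Hypothetical field-test data: one row per candidate, one column per item.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
]

def item_difficulty(rows, item):
    """Proportion of candidates who answered the item correctly."""
    scores = [row[item] for row in rows]
    return sum(scores) / len(scores)

def item_discrimination(rows, item):
    """Point-biserial correlation between the item score and the rest score
    (total score excluding the item). Returns None when either variable
    has no variance, since the correlation is then undefined."""
    x = [row[item] for row in rows]
    y = [sum(row) - row[item] for row in rows]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return None
    return cov / sqrt(var_x * var_y)

for i in range(len(responses[0])):
    print(i, item_difficulty(responses, i), item_discrimination(responses, i))
```

In this made-up sample the last item discriminates negatively: candidates with lower rest scores are more likely to answer it correctly. An item like that is a candidate for the "identify, revise, and remove" step, though any flagging threshold would be a program-specific choice.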
From page 78...
... Protection from Corruptibility
Procedures should be used to ensure that candidates' products are authentic, that assessment materials are secure, and that inappropriate coaching strategies do not improve scores. In particular, the committee's criteria include the following:
· instructions and procedures should be in place to ensure the authenticity of candidates' responses;
· administrative procedures should protect the security of test items and scoring rubrics from copying or plagiarism;
· coaching strategies that are inappropriate or inconsistent with the knowledge and skills tested do not improve performance;
· sanctions for possible candidate improprieties related to the assessment should be specified; and
· if the assessment is designed to be secure, there should be a sufficient number of exercises and forms available to maintain the assessment over time and to accommodate any retake policy, and effective design should be in place for limiting exercise exposure over time, particularly for memorable exercises.
From page 79...
... These procedures should include statistical equating of forms, procedures for training raters, and procedures for arriving at consensus among raters;
· the consistency of decisions should be estimated and reported, taking into account various sources of error, including different assessment exercises, different raters, and different times of assessment;
· misclassification rates should be estimated and reported for the entire population and by population groups defined by gender, racial/ethnic status, and other relevant characteristics; and
· defensible designs and procedures should exist for equating alternate assessment forms.
Score Reporting and Documentation
Candidates should be provided a study guide, including sample assessments and guidelines regarding scoring procedures, prior to administration of an assessment.
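As a rough illustration of the page 79 criterion that the consistency of decisions be estimated and reported, overall and by population group, the sketch below computes raw agreement and Cohen's kappa for pass/fail decisions. It rests on an assumption the source does not state: that each candidate has two independent pass/fail decisions (for example, from two forms or two raters). The function names and sample records are hypothetical, and operational programs often estimate consistency from a single administration using model-based methods instead.

```python
def decision_consistency(first, second):
    """Return (agreement, kappa) for two lists of pass/fail decisions.

    agreement: proportion of candidates classified the same way both times.
    kappa: agreement corrected for chance; None when chance agreement is 1.
    """
    n = len(first)
    agreement = sum(a == b for a, b in zip(first, second)) / n
    pass_1 = sum(first) / n
    pass_2 = sum(second) / n
    chance = pass_1 * pass_2 + (1 - pass_1) * (1 - pass_2)
    kappa = None if chance == 1 else (agreement - chance) / (1 - chance)
    return agreement, kappa

def consistency_by_group(records):
    """records: iterable of (group, first_decision, second_decision)."""
    groups = {}
    for group, a, b in records:
        firsts, seconds = groups.setdefault(group, ([], []))
        firsts.append(a)
        seconds.append(b)
    return {g: decision_consistency(f, s) for g, (f, s) in groups.items()}

# Hypothetical records: (group label, passed form A, passed form B)
records = [
    ("group_1", True, True),
    ("group_1", True, False),
    ("group_1", False, False),
    ("group_1", False, False),
    ("group_2", True, True),
    ("group_2", False, False),
]
print(consistency_by_group(records))
```

Reporting the two statistics side by side matters because raw agreement can look high simply when most candidates pass; kappa discounts the agreement expected by chance.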
From page 80...
... , priorities for the additional evidence needed, designs for data collection, the process for disseminating results, and a time line;
· the validation plan should include a focus on the fairness of the assessment for candidates and on disparate impacts for major candidate population groups; the plan should specify examination of the initial and eventual passing rates;
· major stakeholders should have input into the validation plan, and assessment experts should review the plan against current professional standards;
· the plan should require periodic review of accumulated validity evidence by external reviewers and appropriate follow-up;
From page 81...
... An analysis of costs and feasibility should consider all components of the testing program, including test development, administration, applicant assessment time, scoring, and reporting. The analysis should be documented.
From page 82...
... Throughout its evaluation framework, the committee has stressed the necessity of adequate documentation as well as the importance of informing candidates about procedures.
The committee's criteria for judging test quality include the following: tests should have a statement of purpose; systematic processes should be used in deciding what to test and in assuring balanced and adequate coverage of these competencies; test material should be tried out and analyzed before operational decisions are made; test administration and scoring should be uniform and fair; test materials and results should be protected from corruptibility; standard-setting procedures should be systematic and well documented; test results should be consistent across test forms and scorers; information about tests and scoring should be available to candidates; technical documentation should be accessible for public and professional review; validity evidence should be gathered and presented; costs and feasibility should be considered in test development and selection; and the long-term consequences of licensing tests should be monitored and examined.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.