5 The Psychometric Quality of the Assessments
Pages 79-118


From page 79...
... The assessments are the tools with which the board's primary goals are accomplished, and thus their psychometric quality is critical to the program's effectiveness. Our evaluation framework includes a number of other questions, but we view the psychometric evaluation as central to a review of a credentialing test.
From page 80...
... With regard to credentialing assessments, they lay out guidelines for the process of identifying the competencies to be assessed; developing the assessment and exercises; field-testing exercises; administering the exercises and scoring the responses; setting the passing standard; and evaluating the reliability of the scores, the validity of interpretations based on the assessment results, and the fairness of the interpretations and uses of these results. From our review of these standards, we identified a set of specific questions to investigate with regard to the development and technical characteristics of the NBPTS assessments.
From page 81...
... COMMITTEE'S APPROACH TO THE PSYCHOMETRIC EVALUATION Sources of Information Reviewed Our primary resource for information about the psychometric characteristics of the assessments is the annual reports prepared for the NBPTS by its contractor at the time, the Educational Testing Service, to summarize information related to each year's administrations, called Assessment Analysis Reports. We reviewed the three most recent sets of these reports, which provided information for administration cycles in 2002-2003, 2003-2004, and 2004-2005.
From page 82...
... In March 2007, the NBPTS provided us with a newly prepared technical report in draft form (National Board for Professional Teaching Standards, 2007), presumably in response to our repeated efforts to collect information about the testing program.
From page 83...
... In this section we describe the procedures the national board established for conducting this work, and we note instances in which their procedures deviate markedly from established norms. Development of the Content Standards The content standards are the cornerstone of any assessment program. In the case of the national board, the overall design of the program called for a set of assessments for each of many areas of specialization, the standards for all of which would be closely linked to the five core propositions regarding the characteristics of accomplished, experienced teachers (see the list in Chapter 4)
From page 84...
... According to the board's handbook (National Board for Professional Teaching Standards, 2006a), the NBPTS posts requests for nominations to the standards committees on its website, circulates the requests at conferences and meetings, and solicits nominations directly for committee members from disciplinary and other education organizations, state curriculum specialists and chief state school officers, education leaders, board-certified teachers, and the NBPTS board of directors.
From page 85...
... They also confer with other professionals in the field and the public on the appropriateness of the content standards and provide advice on the implementation of the certification process. Standards committee members are expected to be up to date on the contemporary pedagogical research in their particular field, and NBPTS staff indicated that reviews of this literature (or at least, lists of articles to read)
From page 86...
... The board recruits practicing teachers in the subject area and developmental level of each particular certificate -- soliciting nominations from professional organizations, teachers who have been involved in previous assessment development activities, and other interested teachers who volunteer. The recruited teachers are assigned to assessment development teams, which work with the test developer to construct draft portfolio and assessment center exercises and scoring rubrics that reflect the standards for the certificate area.
From page 87...
... The content standards are written with the aid of professional writers, which results in an easily readable "vision" of accomplished practice but not one that automatically translates into an assessment plan. With regard to the development of the content standards and assessment exercises, we conclude: Conclusion 5-1: The process used to identify the knowledge, skills, dispositions, and judgments to be assessed was conducted in a reasonable fashion for a certification test, using diverse and informed experts.
From page 88...
... THE NBPTS APPROACH TO SCORING THE ASSESSMENTS AND SETTING THE PERFORMANCE STANDARDS Scoring of Assessments Training the Raters Portfolio and assessment center exercises are scored during different scoring sessions by different groups of raters (scorers). Raters are not required to be board-certified teachers but must have a baccalaureate degree, a valid teaching license, and a minimum of three years of teaching experience.
From page 89...
... Overall, the expert panels judged that the classroom-based portfolio entries should be accorded the most weight, with somewhat less weight assigned to the assessment center exercises and the documentation of other accomplishments. Currently each of the three classroom-based
From page 90...
... During the assessment development phase, TAG explored a variety of processes for determining the cut score for the NBPTS assessments. These standard-setting studies are reported in various TAG reports and documented in Jaeger (1998)
From page 91...
... As we have noted, the NBPTS adjusted the cut scores in 1997 to make them consistent across assessments and to limit the impact of false-negative misclassifications; thus it is clear that the NBPTS considers the cut score to be adjustable when warranted. Given the structure of the assessment and the general approach taken in 1997, a case could be made for setting the passing score at 250, halfway between the average scores of 2 and 3.
From page 92...
... By equating the objective test scores across administrations and scaling the performance test scores to the objective test, one can indirectly link ("equate") the assessments across administrations (years)
From page 93...
... For the NBPTS, a portion of the exercise is scored by two raters and a portion is scored by a single rater. The scores of both the single-scored and double-scored raters are used in estimating interrater consistency (National Board for Professional Teaching Standards, 2007)
From page 94...
... Twenty-five percent of the exercises are scored by two [...] the three administration cycles. The rater reliability estimates ranged from .51 to .94, with higher estimates reported for the early adolescence mathematics assessment.
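As an illustration of what a rater reliability estimate summarizes, the sketch below computes a Pearson correlation between two raters' scores on the same double-scored responses. This is only one common index of interrater consistency; the excerpt does not state which estimator the NBPTS uses, and the function name and the scores are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores given by two raters to the same six responses.
rater_a = [2.0, 3.0, 2.5, 4.0, 1.5, 3.5]
rater_b = [2.5, 3.0, 2.0, 3.5, 2.0, 3.5]
r = pearson(rater_a, rater_b)
print(round(r, 2))
```

Values near 1 indicate that the two raters rank the responses similarly; lower values indicate that a candidate's score depends more heavily on which rater happened to score the response.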
From page 95...
... The committee reviewed data from three administration cycles and
From page 96...
... TABLE 5-2 Average Rater Reliability Across Three Administration Cycles (2002-2005) for Early Adolescence Mathematics and Middle Childhood Generalist

Exercise                                                        Type        Average Reliability
Early adolescence mathematics
Developing and assessing mathematical thinking and reasoning    Portfolio   .65
Instructional analysis: whole class mathematical discourse      Portfolio   .57
Instructional analysis: small group mathematical collaboration  Portfolio   .67
Documented accomplishments: contributions to student learning   Portfolio   .63
Median portfolios                                                           .66
Algebra and functions                                           Assessment  .94
Connections                                                     Assessment  .80
Data analysis                                                   Assessment  .85
Geometry                                                        Assessment  .86
Number and operations sense                                     Assessment  .94
Technology and manipulatives                                    Assessment  .73
Median assessment center exercises                                          .86
Middle childhood generalist
Writing: thinking through the process                           Portfolio   .59
Building a classroom community through social studies           Portfolio   .53
Integrating mathematics with science                            Portfolio   .54
Documented accomplishments: contributions in student learning   Portfolio   .58
Median portfolios                                                           .56
Supporting reading skills                                       Assessment  .53
Analyzing student work                                          Assessment  .54
Knowledge of science                                            Assessment  .62
Social studies                                                  Assessment  .56
Understanding health                                            Assessment  .51
Integrating the arts                                            Assessment  .59
Median assessment center exercises                                          .55

24 certificates.
From page 97...
... The NBPTS estimates the overall internal-consistency reliability using an estimate developed by Cronbach and reported in Jaeger (1998) and National Board for Professional Teaching Standards (2007)
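For readers unfamiliar with internal-consistency estimates, the sketch below computes coefficient alpha, the best-known estimate associated with Cronbach. It is offered only as an illustration: the excerpt does not say which Cronbach estimate the NBPTS uses (an estimate for a weighted composite would differ), and the function name and data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a list of candidates, each a list of k exercise scores."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha requires at least two exercises")
    # Variance of each exercise across candidates.
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    # Variance of each candidate's total score.
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical scores: four candidates, three exercises each.
example = [
    [2.0, 2.5, 2.0],
    [3.0, 3.5, 3.0],
    [1.5, 2.0, 2.5],
    [4.0, 3.5, 3.5],
]
alpha = cronbach_alpha(example)
print(round(alpha, 2))
```

Alpha rises when exercise scores move together across candidates, which is why low correlations among exercises produce low internal-consistency estimates even when each exercise is scored consistently.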
From page 98...
... A compromise approach involving the replacement of some assessment exercises by a number of shorter assessment exercises could improve internal consistency reliability without incurring much additional cost and without interfering with the relevance and representativeness of the exercises. It would not be easy to shorten or simplify the portfolios without also making them less representative of the performances of interest.
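The link between the number of exercises and reliability can be illustrated with the classical Spearman-Brown formula. This formula is not named in the excerpt and is used here only as a sketch; it assumes the added exercises are parallel to the existing ones, which real exercises only approximate. The function name and input values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Projected reliability when a test is lengthened by `length_factor`.

    Assumes the added components are parallel to the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)


# Hypothetical case: a composite with reliability .50 whose number of
# (parallel) exercises is doubled.
projected = spearman_brown(0.50, 2)
print(round(projected, 2))  # 0.67
```

Under these assumptions, doubling the number of comparable exercises raises a reliability of .50 to about .67, which illustrates why replacing a long exercise with several shorter ones can help.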
From page 99...
... The reverse is true for the middle childhood generalist, in which the reliabilities for the portfolio exercises tend to be higher than for the assessment center exercises. Estimating Decision Accuracy The accuracy with which the assessments identify which candidates should pass and which should not is at the heart of the assessment challenge for a certification program, and two types of decision errors can occur.
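The two kinds of decision error can be made concrete with a small sketch: a false positive passes a candidate whose true score is below the cut score, and a false negative fails a candidate whose true score is at or above it. True scores are never observed in practice, so the values below, the cut score, and the function name are all hypothetical and serve only to show the bookkeeping.

```python
def decision_error_rates(true_scores, observed_scores, cut_score):
    """Return (false_positive_rate, false_negative_rate) over all candidates."""
    pairs = list(zip(true_scores, observed_scores))
    n = len(pairs)
    # False positive: observed score passes although the true score does not.
    false_pos = sum(1 for t, o in pairs if t < cut_score <= o)
    # False negative: observed score fails although the true score passes.
    false_neg = sum(1 for t, o in pairs if o < cut_score <= t)
    return false_pos / n, false_neg / n


# Hypothetical true and observed total scores for five candidates.
true_scores = [260, 280, 270, 290, 240]
observed_scores = [278, 272, 268, 295, 250]
fp_rate, fn_rate = decision_error_rates(true_scores, observed_scores, cut_score=275)
print(fp_rate, fn_rate)  # 0.2 0.2
```

Both error rates grow as measurement error increases and as more candidates score close to the cut score.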
From page 100...
... for Early Adolescence Mathematics and Middle Childhood Generalist

Exercise                                                        Type        Average RXX
Early adolescence mathematics
Developing and assessing mathematical thinking and reasoning    Portfolio   .21
Instructional analysis: whole class mathematical discourse      Portfolio   .14
Instructional analysis: small group mathematical collaboration  Portfolio   .20
Documented accomplishments: contributions to student learning   Portfolio   .17
Algebra and functions                                           Assessment  .48
Connections                                                     Assessment  .27
Data analysis                                                   Assessment  .23
Geometry                                                        Assessment  .34
Number and operations sense                                     Assessment  .37
Technology and manipulatives                                    Assessment  .27
Median                                                                      .25
Middle childhood generalist
Writing: thinking through the process                           Portfolio   .21
Building a classroom community through social studies           Portfolio   .19
Integrating mathematics with science                            Portfolio   .21
Documented accomplishments: contributions in student learning   Portfolio   .19
Supporting reading skills                                       Assessment  .12
Analyzing student work                                          Assessment  .12
Knowledge of science                                            Assessment  .07
Social studies                                                  Assessment  .09
Understanding health                                            Assessment  .14
Integrating the arts                                            Assessment  .14
Median                                                                      .14

not (that is, between 6 and 9 percent of 22,041)
From page 101...
... based on the internal consistency estimates would be more appropriate than the error rates based on the interrater reliability. We examined the decision accuracy specifically for the early adolescence mathematics assessments and the middle childhood generalist (see Table 5-4)
From page 102...
... , it is useful to check on the consistency with which different raters apply the scoring rubrics. Even if the assessment exercises and scoring rubrics are carefully developed and the raters are thoroughly trained, there is likely to be some variability, and this variability is likely to increase as the complexity of the exercises increases (and the NBPTS exercises call for complex performances)
From page 103...
... Over time, however, performance data on large numbers of candidates are generated and could be used to identify exercises that exhibit relatively low reliability or disparate impact. It is not clear how closely the board tracks such "item-level" data and uses them to potentially adjust either the scoring rubrics or the content of individual exercises.
From page 104...
... No results are reported in the Technical Report (National Board for Professional Teaching Standards, 2007), but an example is available in Loyd (1995)
From page 105...
... describes this study but does not report the results, saying only that the results were satisfactory and the full reports were provided to the national board. Construct-Based Validity Evidence The board, with the assistance of its TAG, has also collected construct-based validity evidence for the NBPTS assessments.
From page 106...
... Analysis of the student work samples showed a tendency toward more depth on the part of students taught by board-certified teachers than those taught by the unsuccessful candidates, although the differences were not statistically significant.
From page 107...
... The selection bias associated with the recruitment of non-board-certified teachers makes it difficult to draw firm conclusions from this study. Criterion-Related Validity Evidence Criterion-related validity evidence is not typically expected for certification tests.
From page 108...
... We take it as a given that an individual who does not have such knowledge and skill should not be certified as a neurologist, and, at a more mundane level, we assume that a person who does not know what a stop sign looks like should not earn a driver's license. Given this interpretation and use of certification testing, traditional criterion-related validity evidence is not necessarily required, and one does not generally examine the validity of certification tests by correlating individual test scores with a measure of the outcomes produced by the individuals.
From page 109...
... Committee Comments The studies discussed in this chapter document efforts to validate the procedures used to identify the content standards, the extent to which assessment exercises and rubrics are consistent with the content standards and intended domain, the application of the rubrics and scoring procedures, and the extent to which teachers who become board certified demonstrate the targeted skills in their day-to-day practice. All of these studies tend to support the proposed interpretation of board certification as an indication of accomplished teaching, in that the board-certified teachers were found to be engaging in teaching activities identified as exemplary practice.
From page 110...
... With regard to the validity evidence, we draw two conclusions: Conclusion 5-5: Although content-based validity evidence is limited, our review indicates that the NBPTS assessment exercises probably reflect performance on the content standards. Conclusion 5-6: The construct-based validity evidence is derived from a set of studies with modest sample sizes, but they provide support for the proposed interpretation of national board certification as evidence of accomplished teaching.
From page 111...
... Table 5-5 shows the effect sizes resulting from comparing performance for whites and African Americans on individual exercises on the middle childhood generalist and early adolescence mathematics assessments.
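An effect size of this kind is commonly a standardized mean difference: the gap between two group means divided by a pooled standard deviation. The sketch below assumes that definition (often called Cohen's d); the excerpt does not say exactly how the effect sizes in Table 5-5 were computed, and the function name and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


# Hypothetical exercise scores for two groups of candidates.
group_a = [3.0, 4.0, 5.0]
group_b = [1.0, 2.0, 3.0]
d = standardized_mean_difference(group_a, group_b)
print(d)  # 2.0
```

Expressing the gap in standard deviation units allows differences to be compared across exercises that have different score spreads.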
From page 112...
... This trend also appears for the middle childhood generalist exercises, but the magnitude of the effect size difference is not as large.
From page 113...
... For the middle childhood generalist, the average overall pass rate was 35 percent across the three administration cycles. The African American and Hispanic pass rates were 12 and 21 percent, respectively, and the pass rate for whites was 38 percent.
From page 114...
... The board also conducted analyses to assess the possibility that disparate impact might be a function of rater judgments and biases. Initially, they identified a small number of cases in the scoring process in which African American and white raters evaluated the performances of the same candidates.
From page 115...
... The board certification process exhibits disparate impact, particularly for African American candidates, but research suggests that this is not the result of bias in the assessments. FINDINGS, CONCLUSIONS, AND RECOMMENDATIONS Our primary questions pertaining to the psychometric evaluation of the national board certification program for accomplished teachers are (a)
From page 116...
... It was difficult to obtain basic information about the design and development of the NBPTS assessments that was sufficiently detailed to allow independent evaluation. In early 2007, the NBPTS drafted a technical report in order to fill some of the information gaps, but for the program to be in compliance with professional testing standards in this regard, this material should have been readily available soon after the program became operational and should have been regularly updated (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education, 1999; Society for Industrial and Organizational Psychology, 2003)
From page 117...
... The use of portfolios and performance assessments allows the national board to focus the assessment on the competencies that it views as the core of advanced teaching practice and therefore tends to improve the validity of the assessments as a measure of these core competencies. The use of these assessments may also enhance the credibility of the assessment for various groups of stakeholders.
From page 118...
... Recommendation 5-4: The NBPTS should collect and use the available operational data about the individual assessment exercises to improve the validity and reliability of the assessments for each certificate, as well as to minimize adverse impact. Recommendation 5-5: The NBPTS should revisit the methods it uses to estimate the reliabilities of its assessments to determine whether the methods should be updated.

