

2. The Measurement of Student Achievement in International Studies
Pages 25-57

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 25...
... Part I: Study Design
From page 27...
... The more specific the purpose, the more homogeneous the population of students, and the narrower the domain of measurement, however, the easier is the task of developing measures that will yield results that support valid interpretations and uses. A teacher who prepares an end-of-unit test in algebra faces a task with a fairly clearly defined content domain and knows a great deal about the common experiences of the students who will take the test.
From page 28...
... An immediately apparent complication is that assessments have to be translated into the multiple languages of participating countries. Variations among countries in educational systems, cultures, and traditions of assessment add to the complexity of the problems of international assessments.
From page 29...
... Hence, the domains defined for international assessments have negotiated limits that fall between the extremes. Once the boundaries have been agreed on, questions remain about the relative emphasis to be given to topics within the domain, about the relative importance of different levels of cognitive demands of the assessment tasks within each topic, about the length of the assessment, and about the mix of item types.
From page 30...
... In the latter instance, they are critical to evaluating whether a test is appropriate for a specific applied purpose. The details of the approaches used to develop specifications have varied somewhat across previous international assessments, but the approaches have had a great deal in common in their general nature.
From page 31...
... In the Second International Mathematics Study (SIMS), the main emphasis continued to be placed on content categories, but there was substantially greater involvement of mathematics educators and much greater salience was given to the mathematics curricula of the participating countries.
From page 32...
... Such variation in grain size can result in disproportionate numbers of items for some clusters of topics relative to the intended emphasis in relation to the whole content domain." The content domains for the international studies have been defined in practice to be somewhere between the intersection and the union of the content domains covered by the curricula of the participating countries, but are closer to the intersection than the union. Because of the promi
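The excerpt above places the negotiated content domain somewhere between the intersection and the union of the participating countries' curricula. Set operations can illustrate that range. What follows is a minimal sketch. It assumes each curriculum can be reduced to a set of topic labels. The countries, the topics, and the `min_share` threshold are all hypothetical. Modeling "closer to the intersection" as a minimum share of countries that teach a topic is an illustrative assumption, not the negotiation procedure the studies followed.

```python
# Sketch: a negotiated domain between the intersection and the union of
# national curricula. Countries and topics are HYPOTHETICAL examples.
curricula = {
    "Country A": {"fractions", "linear equations", "geometry proofs", "statistics"},
    "Country B": {"fractions", "linear equations", "probability"},
    "Country C": {"fractions", "linear equations", "geometry proofs"},
}


def coverage_bounds(curricula):
    """Return (intersection, union) of the countries' topic sets."""
    sets = list(curricula.values())
    return set.intersection(*sets), set.union(*sets)


def negotiated_domain(curricula, min_share):
    """Keep topics taught in at least `min_share` of the countries.

    Illustrative assumption: a domain "closer to the intersection" is modeled
    as a high min_share. min_share=1.0 reproduces the intersection; any value
    at or below 1/len(curricula) reproduces the union.
    """
    n = len(curricula)
    support = {}
    for topics in curricula.values():
        for topic in topics:
            support[topic] = support.get(topic, 0) + 1
    return {t for t, count in support.items() if count / n >= min_share}


if __name__ == "__main__":
    lower, upper = coverage_bounds(curricula)
    print(sorted(lower))
    print(sorted(negotiated_domain(curricula, 2 / 3)))
    print(sorted(upper))
```

Raising `min_share` toward 1.0 shrinks the domain toward the intersection. Lowering it expands the domain toward the union.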
From page 33...
...
COGNITIVE PROCESSES
As noted, the content dimension of test specification tables has been primary in international assessments. The second dimension of the framework or table of specifications for the assessments generally has focused on the cognitive processes that the items or assessment tasks are intended to measure.
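A table of specifications crosses the content dimension with the cognitive-process dimension. The following is a minimal sketch of turning such a table into item counts per cell. It rests on three assumptions:

- The negotiated emphasis can be written as integer relative weights (for example, percentages) on each dimension.
- Each cell's share is the product of its row and column shares. This independence assumption is made only for simplicity; actual agreements may set individual cells directly.
- Rounding uses the largest-remainder method, so the cell counts add up exactly to the test length.

The category names and weights are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product


def allocate_items(total, content_w, process_w):
    """Allocate `total` items to content-by-process cells.

    content_w, process_w: dicts of non-negative INTEGER relative weights.
    Assumption: a cell's target share is the product of its two marginal
    shares. Targets are exact fractions, rounded with the largest-remainder
    method so counts sum exactly to `total`; ties go to cells listed first.
    """
    c_sum, p_sum = sum(content_w.values()), sum(process_w.values())
    cells = list(product(content_w, process_w))
    targets = {
        (c, p): Fraction(total * content_w[c] * process_w[p], c_sum * p_sum)
        for c, p in cells
    }
    counts = {cell: math.floor(t) for cell, t in targets.items()}
    leftover = total - sum(counts.values())
    # Give the remaining items to the cells with the largest fractional parts
    # (sorted() is stable, so ties keep their original order).
    ranked = sorted(cells, key=lambda cell: targets[cell] - counts[cell], reverse=True)
    for cell in ranked[:leftover]:
        counts[cell] += 1
    return counts


# Hypothetical negotiated weights for a 50-item test.
content = {"number": 40, "algebra": 30, "geometry": 30}
process = {"knowing": 50, "applying": 30, "reasoning": 20}
table = allocate_items(50, content, process)

if __name__ == "__main__":
    for (c, p), n in table.items():
        print(f"{c:>9} x {p:<9} {n}")
```

Using exact fractions avoids floating-point floors that land one item short when a target such as 3.0 is computed as 2.999....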
From page 34...
... Measuring such higher-order cognitive processes and achievement outcomes is more challenging than measuring factual knowledge and skill in applying routine algorithms. Nearly any test development effort that solicits items from a broad range of subject-matter experts, as the IEA studies have done, ends up with an overabundance of items measuring factual knowledge and routine skills. This is symptomatic of the greater difficulty of writing items that tap the higher-level problem-solving, analysis, explanation, and interpretation skills sought for the assessments.
From page 35...
... a fraction of the limited time that was available.
ITEM FORMATS
The criticism of international assessments on the grounds that they assess only relatively low-level cognitive processes reflects, in part, the difficulty of writing items that tap higher-level skills and understanding.
From page 36...
... Students in the two grades where most students of the target age were enrolled were assessed for the two younger populations. Students in the last year of secondary school were broken down into a mathematics and science literacy subpopulation and subpopulations of students taking advanced mathematics and taking an advanced physics course.
From page 37...
... Finding a good balance between the need for efficiency and the desire to measure a full range of cognitive processes poses a continuing challenge for international assessments.
CURRICULUM AND INSTRUCTIONAL MATERIALS AND DEFINITION OF TEST CONTENT
One of the prominent features of international studies of achievement conducted under the auspices of the IEA has been the emphasis on both
From page 38...
... 49) as follows: The tests were constructed on the basis of the common intended curriculum in all of the participating countries.
From page 39...
... Considerable effort was required to refine the framework, through negotiation, to the point where it could serve as a detailed table of specifications for the assessment. In that form it specified the topics and intended cognitive processes to be measured, as well as the numbers of items in different categories, reflecting negotiated agreements on coverage and emphasis. The IEA studies have drawn an important distinction between two things:

- the explicit and implicit goals of a nation's curriculum, known as the "intended curriculum," and
- the content that is actually taught, known as the "implemented curriculum" (e.g., Schmidt & McKnight, 1995).
From page 40...
... Relatively large pools of items can be assembled by drawing on two sources: previous international assessments, and items contributed by participating nations, either taken from their own national assessments or written specifically for the international assessment. The quality of these items, however, and their distribution relative to the requirements of the specifications are more problematic. SISS sent a matrix of science topics by teaching objectives to national centers with a request for items to measure the cells of the matrix.
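Checking a pool against a topic-by-objective matrix of the kind SISS circulated is a simple counting exercise. The following is a minimal sketch. It assumes each pool item is tagged with exactly one topic and one objective, and that the specifications give a required count for each cell. The item identifiers, the objective labels, and the required counts are hypothetical.

```python
from collections import Counter


def pool_gaps(pool, required):
    """Compare an item pool with the required number of items per cell.

    pool: iterable of (item_id, topic, objective) tuples.
    required: dict mapping (topic, objective) -> number of items wanted.
    Returns (shortfalls, surpluses), each a dict keyed by cell.
    """
    have = Counter((topic, objective) for _, topic, objective in pool)
    # Counter returns 0 for missing cells, so empty cells show a full shortfall.
    shortfalls = {cell: need - have[cell] for cell, need in required.items() if have[cell] < need}
    surpluses = {
        cell: n - required.get(cell, 0)
        for cell, n in have.items()
        if n > required.get(cell, 0)
    }
    return shortfalls, surpluses


# HYPOTHETICAL pool: plenty of knowledge items, few analysis items.
pool = (
    [(f"B{i}", "biology", "knowledge") for i in range(5)]
    + [("B5", "biology", "analysis")]
    + [(f"P{i}", "physics", "knowledge") for i in range(3)]
)
required = {
    ("biology", "knowledge"): 2,
    ("biology", "analysis"): 2,
    ("physics", "knowledge"): 2,
    ("physics", "analysis"): 2,
}

if __name__ == "__main__":
    short, extra = pool_gaps(pool, required)
    print("short:", short)
    print("extra:", extra)
```

In this made-up example the knowledge cells have surplus items and the analysis cells fall short. That is the kind of imbalance between lower- and higher-level items the chapter describes.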
From page 41...
... Although these considerations may have enhanced the relative performance of students in North America, it is impossible to know how much, if any, difference this potential bias had on the actual results of any of the international assessments. Certainly the review and approval of the content domains and the assessment specifications by participating countries were intended to minimize any such bias.
From page 42...
...
TRANSLATION
Before test items for an international assessment can be evaluated by representatives of participating countries, much less field tested, they must be translated from the language in which they were originally written into all the languages needed to use the assessment in the participating countries. Test translation is a demanding enterprise.
From page 43...
... In a similar vein, items that do not discriminate or that have negative discrimination will not make useful contributions to the measurement on the main dimensions used for reporting results. Consequently, the more recent international studies generally have established certain guidelines for field-test item statistics.
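Field-test screening of this kind usually starts from two classical statistics per item:

- difficulty, the proportion of students answering correctly;
- discrimination, here taken as the correlation between the item score and the total score on the remaining items.

The following is a minimal sketch under the assumption that items are scored 0/1. The cutoffs in `flag_items` are hypothetical placeholders, because the excerpt does not state the guideline values the studies adopted. The response data are invented.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def item_stats(responses):
    """Classical statistics for 0/1-scored items.

    responses: one list of item scores per student.
    Returns (p_value, corrected item-total correlation) per item; the
    correlation uses the total on the OTHER items so an item is not
    correlated with itself.
    """
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(len(responses[0])):
        item = [row[j] for row in responses]
        rest = [t - s for t, s in zip(totals, item)]
        stats.append((mean(item), pearson(item, rest)))
    return stats


def flag_items(stats, p_range=(0.20, 0.95), min_r=0.20):
    """Indices of items outside HYPOTHETICAL field-test guidelines."""
    low, high = p_range
    return [j for j, (p, r) in enumerate(stats) if not low <= p <= high or r < min_r]


# Invented field-test data: items 0-2 behave well, item 3 discriminates
# negatively, and item 4 is answered correctly by everyone.
responses = [
    [1, 1, 1, 0, 1],
    [1, 1, 1, 0, 1],
    [1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]

if __name__ == "__main__":
    print(flag_items(item_stats(responses)))
```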
From page 44...
... Of course, as noted previously, the item sets have been criticized for not doing a better job of measuring higher cognitive processes and for limited coverage of the curricula of participating countries. A few problematic items have also been identified whose scoring keys count as correct a response that is either incorrect or not as good as an alternative response scored as incorrect.
From page 45...
... The most recent studies conducted by IAEP and IEA also have included some statistical analyses of the item responses as a means of flagging items that might be judged to be problematic. The IAEP studies included differential item functioning (DIF)
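The excerpt breaks off before describing the DIF procedure IAEP used. One widely used DIF statistic is the Mantel-Haenszel procedure. It is sketched here only as an illustration, not as the method IAEP applied. The procedure works as follows:

1. Students from a reference group and a focal group are matched on a stratifying variable such as total score. The groups could be, for example, two countries or two language versions; the labels are hypothetical.
2. A common odds ratio, alpha_MH, is pooled across the strata.
3. The sketch reports it on the ETS delta scale, MH D-DIF = -2.35 ln(alpha_MH). Negative values indicate that the item favors the reference group.

```python
import math
from collections import defaultdict

# Cell positions in each stratum's 2x2 table: A, B (reference), C, D (focal).
CELL = {("ref", True): 0, ("ref", False): 1, ("focal", True): 2, ("focal", False): 3}


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF for one item.

    records: iterable of (group, stratum, correct) with group "ref" or
    "focal"; `stratum` is the matching variable (e.g., total score).
    Returns (alpha_MH, MH D-DIF on the ETS delta scale).
    """
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in records:
        tables[stratum][CELL[(group, bool(correct))]] += 1
    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n  # reference correct x focal incorrect
        den += b * c / n  # reference incorrect x focal correct
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


if __name__ == "__main__":
    # One invented score stratum in which the reference group does better.
    recs = ([("ref", 0, 1)] * 8 + [("ref", 0, 0)] * 2
            + [("focal", 0, 1)] * 4 + [("focal", 0, 0)] * 6)
    print(mantel_haenszel_dif(recs))
```

Operational analyses pool many strata and usually add a significance test. Both are omitted here for brevity.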
From page 46...
... TIMSS computed within-country item statistics of various kinds and used them to identify items that might be problematic for particular countries. In addition to the usual within-country difficulty and discrimination statistics, multiple-choice items were flagged if an incorrect option had positive point-biserial correlations with the total score, or if an item had a poor Rasch fit statistic.
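The TIMSS distractor check described above can be sketched as follows:

1. For each multiple-choice item, compute the point-biserial correlation between choosing each incorrect option and the total score.
2. Flag any option whose correlation is positive. A distractor that attracts stronger students may signal an ambiguous item or a miskey.

This is a minimal sketch. It assumes responses are recorded as option letters and that the total score is the number correct on all items. The data are invented.

```python
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson (here point-biserial) correlation; 0.0 if either is constant."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sx * sy)


def flag_distractors(choices, key):
    """Return (item_index, option, r) for incorrect options with r > 0.

    choices: one list of selected options per student.
    key: the correct option for each item.
    """
    totals = [sum(c == k for c, k in zip(row, key)) for row in choices]
    flagged = []
    for j, correct in enumerate(key):
        # Only options somebody chose; an unchosen option has r = 0 anyway.
        distractors = sorted({row[j] for row in choices} - {correct})
        for option in distractors:
            chose = [1 if row[j] == option else 0 for row in choices]
            r = pearson(chose, totals)
            if r > 0:
                flagged.append((j, option, round(r, 3)))
    return flagged


# Invented data: on item 3 the strongest students pick "A" instead of the key.
key = ["A", "B", "C", "D"]
choices = [
    ["A", "B", "C", "A"],
    ["A", "B", "C", "A"],
    ["A", "B", "B", "B"],
    ["A", "C", "B", "D"],
    ["B", "C", "B", "D"],
    ["B", "C", "A", "D"],
]

if __name__ == "__main__":
    print(flag_distractors(choices, key))
```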
From page 47...
... The 70 items for the 14-year-olds were divided into a core of 30 items administered to all students and four rotated forms of ten items each. For students in the last year of secondary school, the items were distinguished by subject area (biology, chemistry, or physics)
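The design for 14-year-olds (70 items: a core of 30 taken by all students plus four rotated forms of ten items each) can be assembled mechanically. The following is a minimal sketch, built on these assumptions:

- Each student takes the core plus one rotated form.
- The first 30 items form the core, and the remaining items are split in order into rotated blocks.
- Forms are spiraled across students in sequence.

The item identifiers and the spiraling rule are illustrative, not taken from the study.

```python
def build_forms(item_ids, core_size=30, n_rotated=4):
    """Split items into a core plus rotated blocks; return core+block forms."""
    core = item_ids[:core_size]
    rest = item_ids[core_size:]
    if not rest or len(rest) % n_rotated:
        raise ValueError("non-core items must divide evenly into rotated blocks")
    size = len(rest) // n_rotated
    blocks = [rest[i * size:(i + 1) * size] for i in range(n_rotated)]
    # Every form repeats the full core and adds exactly one rotated block.
    return [core + block for block in blocks]


def assign_forms(student_ids, n_forms):
    """Spiral forms across students in order: 0, 1, 2, 3, 0, 1, ..."""
    return {sid: i % n_forms for i, sid in enumerate(student_ids)}


if __name__ == "__main__":
    items = [f"item{i:02d}" for i in range(1, 71)]  # hypothetical IDs
    forms = build_forms(items)
    print([len(f) for f in forms])
    print(assign_forms(["s1", "s2", "s3", "s4", "s5"], len(forms)))
```

Each student answers 40 items, yet all 70 items are administered across the sample. This gain in content coverage is what matrix sampling provides.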
From page 48...
... The IEA and IAEP studies have made effective use of matrix sampling designs to allow broader coverage of content domains than otherwise would have been possible.
SUMMARY SCORES
Results of early international assessments were commonly reported as total number-correct scores or average percent-correct scores.
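The two traditional summary scores are straightforward to compute. The following is a minimal sketch that assumes 0/1 item scores. Percent correct divides by the number of items a student actually took, so it stays comparable when forms differ in length. The data are made up.

```python
from statistics import mean


def number_correct(scores):
    """Number-correct score for one student's 0/1 item scores."""
    return sum(scores)


def average_percent_correct(students):
    """Mean over students of 100 * (number correct / items taken)."""
    return mean(100 * number_correct(s) / len(s) for s in students)


if __name__ == "__main__":
    # Hypothetical scores for two countries' students.
    countries = {
        "Country X": [[1, 1, 0, 0], [1, 1, 1, 0]],
        "Country Y": [[1, 0, 0, 0], [1, 1, 0, 0]],
    }
    for name, students in countries.items():
        print(name, average_percent_correct(students))
```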
From page 49...
... Reporting results for the full assessment, including rotated forms, was more complicated and involved forms of linking rotated forms that were less theory based (see, for example, Miller & Linn, 1989) and somewhat problematic because the forms were not comparable in difficulty or content coverage.
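When rotated forms differ in difficulty, raw scores on them cannot be pooled directly. One simple, classical option is linear equating under a random-groups design. If forms are spiraled so that the groups taking each form are randomly equivalent, a form-X score x can be mapped to the form-Y scale by matching means and standard deviations:

y = mu_Y + (sigma_Y / sigma_X)(x - mu_X)

This sketch only illustrates what linking involves. It is not necessarily the approach used in the studies or the one reviewed by Miller and Linn (1989), and the scores are invented.

```python
from statistics import mean, pstdev


def linear_equate(scores_x, scores_y):
    """Return a function mapping form-X scores onto the form-Y scale.

    Assumes a random-groups design: the students taking X and Y are randomly
    equivalent, so differences between the two score distributions reflect
    form difficulty rather than ability.
    """
    mx, my = mean(scores_x), mean(scores_y)
    sx, sy = pstdev(scores_x), pstdev(scores_y)
    if sx == 0:
        raise ValueError("form X scores have no spread")
    return lambda x: my + (sy / sx) * (x - mx)


if __name__ == "__main__":
    # Invented data: form X is two points harder than form Y.
    to_y = linear_equate([3, 5, 7, 9], [5, 7, 9, 11])
    print(to_y(7))
```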
From page 50...
... The curricula and the inter-related practices of teaching, learning, and testing to which the pupils are accustomed are of equal, arguably greater, importance.
MULTIPLE SCORES
As has been discussed, the international studies of achievement conducted by IEA traditionally have placed considerable emphasis on issues of curriculum differences and student opportunity to learn.
From page 51...
... "[T]he selection of items for the participating countries varied somewhat in average difficulty, ranging from 55-59 percent correct at the eighth grade and from 49-56 percent at the seventh grade.
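The quoted result concerns how the average difficulty of the test changes with the set of items each country judged appropriate. The following is a minimal sketch of such a comparison. It assumes two inputs: per-country proportions correct on each item, and a set of items selected by each country. Each country then receives an average percent correct on every country's selection. All data and labels are hypothetical.

```python
from statistics import mean


def selection_matrix(p_values, selections):
    """Average percent correct for each country on each country's selection.

    p_values: {country: {item: proportion correct in that country}}
    selections: {selecting_country: set of items it judged appropriate}
    Returns {(scored_country, selecting_country): mean percent correct}.
    """
    return {
        (scored, selector): 100 * mean(p_values[scored][i] for i in items)
        for scored in p_values
        for selector, items in selections.items()
    }


# HYPOTHETICAL item statistics and selections.
p_values = {
    "X": {"i1": 0.8, "i2": 0.6, "i3": 0.4, "i4": 0.5},
    "Y": {"i1": 0.7, "i2": 0.5, "i3": 0.6, "i4": 0.4},
}
selections = {"X": {"i1", "i2"}, "Y": {"i1", "i3", "i4"}}

if __name__ == "__main__":
    for (scored, selector), pct in sorted(selection_matrix(p_values, selections).items()):
        print(f"{scored} on {selector}'s items: {pct:.1f}%")
```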
From page 52...
... Starting from scratch, the FIMS and FISS accomplished remarkable feats in assembling a pool of items that passed muster with national committees of reviewers and in producing reliable measures covering relatively broad content domains. Certainly, those studies were subjected to considerable criticism, mostly about issues other than the quality of the measures, such as the comparability of populations of students in different countries that retained widely variable fractions of the age cohorts, and
From page 53...
... There were also, however, some criticisms of the quality of the tests, particularly their relevance to the curricula of different countries and the heavy reliance on multiple-choice items. The second round of studies made substantial strides to improve the alignment of the tests to the curricula of the participating countries, and went to great lengths to get information about the intended curriculum in each country and to develop measures of opportunity to learn so that the implemented curriculum could be related to the attained curriculum as measured by the SIMS and SISS assessments.
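Relating the implemented curriculum to the attained curriculum can be as simple as correlating an opportunity-to-learn (OTL) measure with achievement across items. The following is a minimal sketch built on two assumptions: OTL is recorded per item as the proportion of teachers reporting that the content was taught, and achievement is the item's proportion correct in the same country. Both measures and the data are hypothetical. An item-level correlation is only one of several ways such relationships can be examined.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / sqrt(sxx * syy)


def otl_achievement_r(otl, p_values):
    """Correlate OTL with proportion correct over the items in both dicts."""
    items = sorted(otl.keys() & p_values.keys())
    return pearson([otl[i] for i in items], [p_values[i] for i in items])


# HYPOTHETICAL item-level data for one country.
otl_example = {"i1": 0.9, "i2": 0.7, "i3": 0.3, "i4": 0.1}
pc_example = {"i1": 0.65, "i2": 0.55, "i3": 0.35, "i4": 0.25}

if __name__ == "__main__":
    print(round(otl_achievement_r(otl_example, pc_example), 3))
```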
From page 54...
... . Mathematics achievement in the middle school years: IEA's Third International Mathematics and Science Study (TIMSS)
From page 55...
... , Third International Mathematics and Science Study (TIMSS) technical report, Vol.
From page 56...
... , Third International Mathematics and Science Study: Quality assurance in data collection (Chapter 5, pp.
From page 57...
... performance on international assessments of education. Educational Researcher, 23(7)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.