3. Adapting Achievement Tests into Multiple Languages for International Assessments
Pages 58-79



From page 58...
... standardizing test administration conditions (including matching motivational levels of the students taking the test in participating countries)
From page 59...
... are carried out well (and this usually means carrying out a combination of careful translations and reviews, conducting field tests of the adapted tests, and compiling validity evidence), the results of an international comparative study such as TIMSS or OECD/PISA may be confounded by poorly translated assessment materials. To the extent that a test adaptation changes the psychological meaning and/or the difficulty of the test in the target languages or cultures, comparisons of student performance across language and cultural groups may have limited validity.
From page 60...
... For example, in one recent study, it was necessary to ask Chinese students to "check their answers" rather than "fill in the bubbles" that appeared on the American version of the test and to change the placement of the artwork in the Chinese version of the test (Hambleton, Yu, & Slater, 1999). Radical changes may be needed to make the item formats suitable.
From page 61...
... talked about his difficulty in finding equivalent meanings for words like "cold fish" and "bleeding heart" and translating expressions such as "every cloud has a silver lining." The list of changes that are required to make a test valid in multiple languages and cultures often goes well beyond the already difficult task of translating a test. FIVE MYTHS ABOUT TEST ADAPTATIONS Hambleton and Patsula (1999)
From page 62...
... Would a translator with little knowledge of test development principles know to keep answer choices of approximately the same length, so that length of answer choice does not become a clue to the correct answer? All too often in the cross-cultural literature, there is evidence of unqualified persons being involved in the test adaptation process.
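The answer-choice length clue mentioned above can be screened for automatically. The sketch below is a hypothetical illustration: the function name, the item data structure, and the two sample items are my own inventions, not drawn from any real test or from the studies cited in this chapter.

```python
# Hypothetical check: flag items whose keyed (correct) option is strictly
# the longest choice, since option length can cue test-wise examinees.
# The item data below is illustrative only.

def flag_length_cues(items):
    """Return ids of items where the correct option is strictly longest."""
    flagged = []
    for item in items:
        lengths = [len(opt) for opt in item["options"]]
        key_len = lengths[item["key"]]
        if all(key_len > length
               for j, length in enumerate(lengths) if j != item["key"]):
            flagged.append(item["id"])
    return flagged

items = [
    {"id": "Q1", "key": 2,
     "options": ["7", "12", "12, because both factors are even", "9"]},
    {"id": "Q2", "key": 0,
     "options": ["Paris", "Berlin", "Madrid", "Lisbon"]},
]

print(flag_length_cues(items))  # only Q1's keyed option is conspicuously long
```

A reviewer adapting items into a target language could run a check like this on both the source and the adapted versions, since translation can change option lengths and introduce a cue that was absent in the original.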
From page 63...
... describe numerous additional examples in which cross-cultural comparisons are flawed because of the nongeneralizability of construct definitions across cultures. Another example would be the definition, say, of the content to cover on the OECD/PISA 15-year-old assessment of mathematics achievement.
From page 64...
... Field testing is not usually necessary. The cross-national testing literature includes thousands of examples of poorly adapted test items.
From page 65...
... In international comparative studies, it is important to establish whether construct equivalence exists among participating countries; if it does not, researchers should either consider "decentering" (i.e., revising the definition of the construct so that it is equivalent in each language and cultural group) or discontinue the project.
From page 66...
... may be the only option. But when cross-cultural comparisons are not of interest, it may be easier to actually produce a new test that meets the cultural parameters in the second-language group than to adapt an existing test that may have a number of shortcomings (e.g., a less than satisfactory definition of the construct, inappropriate item formats, an overly long test, or use of some culturally specific content)
From page 67...
... After the test items are judged to be technically sound, the adapted version and the original Hebrew version are compared for equivalence. Translators look at several features of the adapted items: accuracy of the translation, clarity of the sentences, level of difficulty of the words, and fluency of the translation.
From page 68...
... Not only is empirical evidence needed to support the validity of inferences from an adapted version of a test, but multiple empirical studies may be needed. A good example of what researchers might learn from a tryout of test items in a second language and culture is highlighted clearly in the papers by Allalouf and Sireci (1998)
From page 69...
... The validity issue concerned whether or not the estimation items should be retained in the findings from the comparative study. The adapted test should be field tested using, whenever possible, a large sample of individuals representative of the eventual target population, and preliminary statistical analyses should be carried out, such as a reliability analysis and a classical item analysis.
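The reliability analysis and classical item analysis recommended above can be sketched in a few lines. Everything in this example is hypothetical: the 0/1 response matrix is invented illustrative data (real field tests would use large, representative samples), and the function names are my own.

```python
# Sketch of a classical item analysis on a small field-test sample.
# Rows are examinees, columns are items (1 = correct, 0 = incorrect).
# The data are illustrative, not from TIMSS or PISA.

def item_difficulties(responses):
    """Proportion correct (classical p-value) for each item."""
    n = len(responses)
    k = len(responses[0])
    return [sum(row[j] for row in responses) / n for j in range(k)]

def cronbach_alpha(responses):
    """Internal-consistency reliability from item and total-score variances."""
    k = len(responses[0])

    def var(xs):  # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [var([row[j] for row in responses]) for j in range(k)]
    totals = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum(item_vars) / var(totals))

responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

print(item_difficulties(responses))
print(round(cronbach_alpha(responses), 3))
```

In a cross-lingual field test, comparing item p-values between the source- and target-language samples is one simple first screen for items whose difficulty may have shifted in adaptation; markedly discrepant items would then be examined judgmentally and with more formal differential-item-functioning analyses.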
From page 70...
... needed to be shorter and more focused. Without specific guidelines, countries were coming up with their own guidelines for translators, but then standardization of translation procedures across participating countries was compromised.
From page 71...
... Many countries found themselves translating sentences in the passive voice into the active voice. But such changes probably influenced the structure of the language in the adapted tests and may have affected the difficulty of the test items.
From page 72...
... At the same time, a considerable investment in time and resources was needed to produce formally equivalent English and French versions of the test. But to do otherwise would have made it more difficult for equivalent tests to be produced in participating countries.
From page 73...
... The test adaptation process is being fully documented and includes important features such as forward and backward translations, double-translation designs from single- and double-source-language versions of the test, national verification, and even international verification. All of these features enhance the quality of test adaptations for international comparative studies.
From page 74...
... At the same time, these adapted tests will have limited value unless they are adapted with a high degree of concern for issues of usability, reliability, and validity in participating countries. There is a rapidly emerging psychometric literature on the topic of test adaptation methodology, and more advances can be expected in the coming years as researchers respond to the expanding need for adapted tests of high technical quality (see, for example, Hambleton, Merenda, & Spielberger, in press)
From page 75...
... (1997). TIMSS instrument adaptation process: A formative evaluation (Laboratory of Psychometric and Evaluative Research Rep.
From page 76...
... European Journal of Psychological Assessment, 13, 29-37. van de Vijver, F
From page 77...
... D.4 Test developers/publishers should provide evidence that item content and stimulus materials are familiar to all intended populations. D.5 Test developers/publishers should implement systematic judgmental evidence, both linguistic and psychological, to improve the accuracy of the adaptation process and compile evidence on the equivalence of all language versions.
From page 78...
... A.3 Those aspects of the environment that influence the administration of a test should be made as similar as possible across populations for whom the test is intended. A.4 Test administration instructions should be in the source and target languages to minimize the influence of unwanted sources of variation across populations.
From page 79...
... I.4 The test developer should provide specific information on the ways in which the sociocultural and ecological contexts of the populations might affect performance on the test, and should suggest procedures to account for these effects in the interpretation of results.

