Appendix 10. An Experiment in Evaluating the Quality of Translations
Pages 67-75



From page 67...
... Conversely, a translation could be highly accurate and yet lacking in intelligibility; this would be likely to occur, however, only in cases where the original had low intelligibility. Essentially, the method for evaluating translations employed in this experiment involved obtaining subjective ratings for these two characteristics—intelligibility and fidelity—of sentences selected
From page 68...
... In rating "informativeness" these raters were provided with carefully prepared English translations of the original sentences, so that in effect they were comparing two sentences in English—one the sentence from the translation being evaluated, and the other the carefully prepared translation of the original. The second set of raters ("bilinguals")
From page 69...
... : "The general idea is intelligible only after considerable study, but after this study one is fairly confident that he understands. Poor word choice, grotesque syntactic arrangement, untranslated words, and similar phenomena are present, but constitute mainly 'noise' through which the main idea is still perceptible."

PREPARATION OF TEST MATERIALS AND COLLECTION OF DATA

The measurement procedure was tested by applying it to six varied English translations—three human and three mechanical—
From page 70...
... These translations were of five passages varying considerably in type of content. (All the passages selected for this experiment, with the original Russian versions, have now been published by the Office of Technical Services, U.S.
From page 71...
... Table 6 gives the over-all mean ratings and time scores for the six translations, arranged in order of general excellence according to our data. Consider first the mean ratings for intelligibility by the monolinguals.
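The averaging step described here—per-sentence ratings collapsed into a mean score per translation, then the translations ordered by that mean—can be sketched as follows. This is a minimal illustration with invented ratings on the report's 1-to-9 scale, not the report's data or its actual procedure in code.

```python
# Hypothetical sketch: mean intelligibility rating per translation,
# with translations ranked by that mean. Ratings are invented toy
# values on a 1-9 scale (the experiment rated 144 sentences each).

def mean(ratings):
    return sum(ratings) / len(ratings)

# invented per-sentence ratings for three illustrative translations
ratings_by_translation = {
    "Translation A": [8, 7, 9, 8, 6],
    "Translation B": [5, 6, 4, 5, 7],
    "Translation C": [3, 4, 2, 5, 3],
}

# order translations by mean rating, best first
ranked = sorted(ratings_by_translation.items(),
                key=lambda kv: mean(kv[1]),
                reverse=True)

for name, ratings in ranked:
    print(f"{name}: mean intelligibility = {mean(ratings):.2f}")
```

Ranking by the mean of many per-sentence ratings, rather than by any single sentence, is what allows Table 6's "order of general excellence" to be read off directly from the scores.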
From page 72...
... As machine translations improve, it should be possible to scale them by the present rating procedure to determine how nearly they approach human translations in intelligibility. The monolinguals' mean ratings on "informativeness" (reflecting the lack of fidelity of the translations)
From page 73...
... In fact, as may be seen from Figure 1, there is some overlap between sentences from human translations and from mechanical translations; or, in other words, there are some sentences translated by machine that have higher ratings than some other sentences translated by human translators, even though, on the average, the human-translated sentences are better than the machine-translated ones. These results imply that in order to obtain reliable mean ratings for translations, a fairly large sample of sentences must be rated.
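The overlap argument can be illustrated with a small simulation. Everything below is assumed for illustration—the distributions, their means, and the spread are invented—but it shows the same qualitative point: even when one group of sentences is better on average, individual sentences from the weaker group can outscore individual sentences from the stronger one, so only the mean over a fairly large sample separates the two reliably.

```python
# Assumed illustration, not the report's data: simulate per-sentence
# intelligibility ratings for a "human" and a "machine" translation
# and show that their distributions overlap while their means differ.
import random

random.seed(0)

def sample_ratings(center, n):
    """Draw n per-sentence ratings, clipped to the 1-9 rating scale."""
    return [min(9, max(1, round(random.gauss(center, 1.5))))
            for _ in range(n)]

human = sample_ratings(7.0, 144)    # 144 sentences, as in the experiment
machine = sample_ratings(5.0, 144)  # lower assumed average quality

# machine-translated sentences rated above the worst human-translated one
overlap = sum(1 for r in machine if r > min(human))

print("overlapping machine sentences:", overlap)
print("mean human rating:", sum(human) / len(human))
print("mean machine rating:", sum(machine) / len(machine))
```

With samples this size the two means are cleanly separated even though many individual ratings are not, which is why a single sentence (or a handful) would be an unreliable basis for comparing translations.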
From page 74...
... [Figure 1: six histogram panels, one per translation; x-axis: mean intelligibility rating, y-axis: frequency.] FIGURE 1. Frequency distribution of monolinguals' mean intelligibility ratings of the 144 sentences in each of six translations.

