Currently Skimming:

6 Evaluating Validity in the FSI Context
Pages 69-78

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 69...
... An ongoing evaluation of the FSI test will need to consider such questions as the following: • Do the results of the assessment mean what the test designers think they mean for the context in which the assessment is used, and does the assessment function in the way they intended?
From page 70...
... This evidence should also include results from studies of fairness, including analyses to check whether test performance is biased by any factors that are irrelevant to the test's construct, such as the test-taker's gender, age, race, or ethnicity. Evidence should also be collected about the consequences of the use of the test on the decisions the test is used to support, on the test takers, and on others in the Foreign Service.
From page 71...
... In the context of FSI, this claim concerns the relationships between the proficiency needed to address the kinds of language-based tasks that are on the FSI test and the proficiency needed to address the kinds of tasks that Foreign Service officers need to carry out in the target language. Performance on tasks in the assessment situation is never identical to performance in the real world.
From page 72...
... For example, a language proficiency test that focuses on making formal presentations and reading news articles will not capture the full range of linguistic resources needed for a particular job task that primarily requires social conversation and the exchange of email messages. Example Claim 2: Evaluating Task Performances to Produce Test Scores The second example claim looks at the way the performances on the test are evaluated (the second and third lower boxes and the inner ring in Figure 6-1)
From page 73...
... Studies of the scoring process using duplicate scoring by highly trained scorers might be a source of evidence about the way task performances are evaluated in the test situation. Example Claim 3: Scores and Their Interpretation The third example claim looks at the interpretations of the scores that are produced by the test (the middle two boxes in Figure 6-1)
From page 74...
... , and the interpretation suggests that test takers who receive a score of 3 or higher have adequate language proficiency to perform the tasks they will need to perform at their posts. Investigating this claim in the FSI context might involve collecting information from Foreign Service officers in the field about their ability to successfully carry out different typical tasks in the target language and comparing that information to their test scores.
From page 75...
... The committee reviewed nine sets of standards documents, paying specific attention to the guidelines most relevant to language assessment and assessment related to professional purposes. Two of these sets of standards focus specifically on language assessment: the International Language Testing Association Guidelines for Practice (footnote 2: See https://www.iltaonline.com/page/ILTAGuidelinesforPra.) and the European Association for ...
From page 76...
... and the American National Standards Institute standards for personnel certification programs. These different standards documents lay out guidelines for identifying competencies to be assessed; developing the assessment and specific assessment tasks; field-testing assessment tasks; administering assessment tasks and scoring responses; setting the passing standard; and evaluating the reliability of the scores, the validity of interpretations based on the assessment results, and the fairness of the interpretations and uses of these results. Although the standards articulated in each document are tailored to different contexts, they address a number of common points.
From page 77...
... Test developers should maintain documentation about efforts to ensure that scores obtained under the specified test administration procedures have the same meaning for all population groups in the intended testing population and that test takers with comparable proficiency receive comparable scores.
From page 78...
... Test developers should maintain up-to-date test-taker familiarization guides to reduce differences across test takers in familiarity with the test format. These best practices provide a concrete guide for the different aspects of a testing program that should be evaluated to help ensure and establish the overall validity of the program's test results for its intended uses.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.