3 Recent Innovative Assessments
Pages 33-42

The Chapter Skim interface presents the single passage algorithmically identified as the most significant chunk of text on each page of the chapter.

From page 33...
... Brian Stecher and Laura Hamilton provided an overview of both current innovations and those that have not continued, and a panel of people connected with the programs offered their comments. Stecher pointed out that the sort of test that is currently typical -- multiple choice, paper and pencil -- was innovative when it was introduced on a large scale in the early 20th century, but is now precisely the sort that innovators want to replace.
From page 34...
... After efforts to standardize scoring rubrics and selection criteria, the reliability improved, but evaluators concluded that the scores were not accurate enough to support judgments about school quality.
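
The "reliability" at issue here is typically an index of how closely independent raters agree when scoring the same student work. The Python sketch below is purely illustrative, using hypothetical rubric scores rather than data from the evaluations described above; it computes two common agreement indices, the correlation between two raters and their exact-agreement rate.

    # Illustrative inter-rater reliability check (hypothetical scores,
    # not data from the portfolio evaluations discussed above).
    from statistics import correlation  # Python 3.10+

    # Hypothetical 1-4 rubric scores from two raters on ten portfolios.
    rater_a = [3, 2, 4, 1, 3, 2, 4, 3, 1, 2]
    rater_b = [2, 2, 4, 2, 3, 1, 3, 3, 1, 3]

    # Pearson correlation between raters: a common reliability index.
    r = correlation(rater_a, rater_b)

    # Exact-agreement rate: fraction of portfolios scored identically.
    agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

    print(f"inter-rater correlation: {r:.2f}")
    print(f"exact agreement: {agreement:.0%}")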
From page 35...
... Many performance assessments asked students to work both in groups and individually to solve problems and to use manipulatives in hands-on tasks. KIRIS included locally scored portfolios in writing and mathematics.
From page 36...
... WASL produced individual scores and was used to evaluate schools and districts; it was also expected to have a positive influence on instruction. Evaluations of WASL found that it met accepted standards for technical quality.
From page 37...
... Willhoft said that the initial test development contract was very inexpensive given the nature of the task, but when the contract was rebid, costs escalated dramatically. Then, as public opinion turned increasingly negative about the program, the policy makers who had initially sponsored it and worked to build consensus in its favor were leaving office because of the state's term-limit law, so few political supporters remained to defend the program when it was challenged.
From page 38...
... Had developers and policy makers moved more slowly and spent longer on pilot testing and refinement, it might have been possible to iron out many of the problems with scoring, reporting, reliability, and other complex elements of the assessments. Moreover, he noted that many of the states pushed forward with bold changes without necessarily having a firm scientific foundation for what they wanted to do.
From page 39...
... has a clinical skills component in which prospective physicians interact with patients who are trained participants. The trained patient presents a standardized set of symptoms so that candidates' capacity to collect information, perform physical examinations, and communicate their findings to patients and colleagues can be assessed.
From page 40...
... The Queensland program and the essay portions of state-administered bar exams both involve local educators or other local officials in task selection and scoring, which may limit the comparability of scores. When the stakes attached to the results are high, centralized task selection and scoring may be preferred, but at the cost of forgoing teacher involvement and the positive influence on instruction that such involvement can bring.
From page 41...
... Automated scoring systems have been developed using various methodologies, but the software generally analyzes a corpus of human-scored essays and identifies a set of criteria and weights that predict the human-assigned scores. The resulting criteria are not the same as those a human would use: essay length, for example, correlates with other desirable essay traits and so can receive substantial weight, even though length itself would not be valued by a human scorer.
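
As a rough illustration of that approach, the sketch below fits per-feature weights by least squares so that surface features of already-scored essays predict the human-assigned scores. The specific features, the data, and the use of plain linear regression are assumptions made for illustration; production systems use far richer criteria and much larger training sets.

    # Sketch: learn weights over surface features of human-scored essays
    # so the model predicts the human-assigned scores (hypothetical data).
    import numpy as np

    # (word_count, distinct_words, avg_sentence_length) per essay.
    features = np.array([
        [120.0,  80.0, 12.0],
        [250.0, 150.0, 15.5],
        [400.0, 210.0, 18.0],
        [ 90.0,  60.0, 10.0],
        [320.0, 180.0, 16.5],
    ])
    human_scores = np.array([2.0, 3.0, 4.0, 1.0, 3.5])

    # Least-squares fit with an intercept column. Essay length will pick
    # up weight because it correlates with the scores, even though no
    # human rater values length for its own sake.
    X = np.hstack([features, np.ones((len(features), 1))])
    weights, *_ = np.linalg.lstsq(X, human_scores, rcond=None)

    # Score a new, unseen essay with the learned weights.
    new_essay = np.array([200.0, 130.0, 14.0, 1.0])
    print("predicted score:", float(new_essay @ weights))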
From page 42...
... can provide an effective means of integrating classroom-based and large-scale assessment. A few issues are relevant across these technologies, Hamilton noted.

