
Currently Skimming:

Commissioned Papers
Pages 147-224

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 149...
... As the school population increased 15 percent from 1960 to 1989, revenues from the sales of standardized tests increased 10 times as fast. More than a third of the elementary school teachers in a recent survey saw the emphasis on standardized testing in U.S.
From page 150...
... Because most of the standardized tests used in mandated testing programs are of the multiple-choice variety, particular attention has been given to the argument that these tests promote a narrow approach to teaching, passive and low-level forms of learning, and a fragmented school curriculum. The amount of available research to address these concerns and arguments, however, is quite sparse.
From page 151...
... When asked what impact standardized tests had upon their classroom behavior, the most common response was that they changed their curriculum emphasis. Some teachers reported that the emphasis on standard EFFECTS OF MANDATED TESTING 151
From page 152...
... The Kansas teachers also reported that the state minimum competency testing reduced the time they spent teaching skills that the tests did not cover. Smith and Rottenberg13 interviewed 19 elementary school teachers and then observed the classes of four of these teachers for an entire semester, during which externally mandated tests were administered.
From page 153...
... In interviews conducted by Darling-Hammond and Wise, teachers typically reported that when tests are used to measure teacher effectiveness or student competence, incentives are created to teach the precise test content instead of underlying concepts or untested content. Corbett and Wilson19 studied the effects of state-mandated minimum competency testing programs in Maryland and Pennsylvania.
From page 154...
... Three quarters of the teachers of high-minority classes agreed that they felt pressure from their districts to improve their students' scores on mandated mathematics tests. Asked about the influences that mandated standardized tests have on their instructional practice, teachers of high-minority classes indicated stronger curriculum effects than did teachers of low-minority classes.
From page 155...
... Asked for the single most positive contribution that testing makes in their school, they most often cited the increased time spent teaching basic skills. Corbett and Wilson25 found that 85 percent of the Maryland educators and 30 percent of the Pennsylvania educators surveyed perceived at least a moderate spread of basic skills instruction throughout the curriculum as a result of their state minimum competency testing programs.
From page 156...
... For example, Freeman et al.37 conducted year-long case studies of several fourth-grade teachers to analyze their styles of textbook use and to determine how the different styles affected content overlap between the mathematics textbook used and five standardized tests of fourth-grade mathematics. The researchers defined five models of textbook use on the basis of their classroom observations.
From page 157...
... Only 16 percent of those same Kansas teachers had seen indications that the school curriculum was being narrowed as a result of the state minimum competency tests.40 In fact, according to Stake and Theobold,41 199 out of 285 teachers surveyed reported that a general broadening of the curriculum had taken place in their schools over the last few years. Perhaps these differences in perspective can be attributed to the different ways in which researchers and teachers interpret "narrowing".
From page 158...
... A similar phenomenon affects the use of technology in teaching when such technology is not incorporated into mandated testing programs. Whereas 5 percent of the eighth-grade teachers reported increasing their emphasis on calculator activities, 20 percent reported a decreased emphasis.48 Only 2 percent reported an increased emphasis on computer activities whereas 16 percent reported a decrease in computer activities.
From page 159...
... Also, how strongly a mandated testing program influences instructional methods may, as with instructional content, be a function of the test stakes. The Kansas minimum competency test was a low-stakes test.50 When Corbett and Wilson51 asked whether teachers had adopted new instructional approaches as a result of the state minimum competency testing program, they found dramatically different responses in high-stakes Maryland compared with low-stakes New Jersey.
From page 160...
... Three recent research studies have asked teachers about the extent of their test preparation practices. Smith and Rottenberg58 report that in the four elementary classes they observed, an average of 54 hours of class time was spent preparing for externally mandated standardized tests, which, in addition, required about 18 hours to administer.
From page 161...
... These educators were from districts across the country where standardized, group-administered, norm-referenced tests were given. Of the 176 teachers surveyed, 55 percent reported using some type of test
From page 162...
... Research suggests, however, that there is a move toward test preparation practices that are debatable at best, from the standpoint of both professional ethics and mathematics education.
CLASSROOM ASSESSMENT PRACTICES
The question of whether externally mandated testing programs affect a teacher's own assessment practices has been considered by a few researchers.
From page 163...
... According to Glasnapp et al.,76 62 percent of the Kansas teachers surveyed in 1987 either agreed or strongly agreed that the pressure on local districts to perform well on the state minimum competency tests led to undesirable educational practices. In the same survey, however, 60 percent of the teachers indicated that they considered the Kansas minimum competency testing, overall, to be beneficial to education in the state.
From page 164...
... Teachers committed to developing a new school curriculum at an elementary school observed by Livingston et al.80 and at a magnet high school observed by McNeil81 had a common reaction to the demands of mandated testing. Livingston et al.82 reported the experiences of Westwood School, a K-2 school in Dalton, Georgia, where the teachers undertook a revision of the state mathematics curriculum.
From page 165...
... The Westwood School curriculum committee members contended that discrepancies between teachers' judgments and students' test scores lead to a deprofessionalization of teachers because of the view by parents that test scores are absolute indicators of students' learning.86 Of the Maryland teachers surveyed by Corbett and Wilson,87 58 percent reported that mandated testing has led to at least a moderate decrease in professional judgment in instructional matters. Included in the survey instrument were questions on the effects of mandated testing on teachers' work life: 70 percent of the respondents reported a major increase in demands on their time, 66 percent a major increase in paperwork, 64 percent a major increase in pressure for student performance, 55 percent at least moderate changes in staff reassignment, and 44 percent at least a moderate increase in worry about lawsuits.
From page 166...
... They note that test scores were the single most important factor used in the decision to place children into gifted programs and into an advanced junior high school curriculum. In their study, Romberg et al.93 report that 35 percent of the eighth-grade teachers surveyed indicate that district-mandated tests influence decisions about grouping students within the class for instruction, and 62 percent say the district test scores influence recommendations of students for course or program assignments.
From page 167...
... When the demands of mandated testing programs conflict with practices they deem more appropriate, however, they tend not to challenge these programs. Rather, they seek a middle ground in which they strive to meet both the demands of the testing program and their own view of what and how they should be teaching.
From page 168...
... As Silver100 observed, "perhaps WYTIWYG should be more accurately dubbed WYGIWICT: what you get is what I can teach."101 Silver's point becomes especially important as some mandated testing programs change to incorporate reforms sought in such documents as the NCTM Standards. As these programs incorporate items with extended answers, calling upon students to 168 MEASURING WHAT COUNTS
From page 169...
... The limitations on their ability and their willingness to teach in the ways sought by reformers will then begin to govern how the mandated testing affects their instruction. We may begin to see some teachers challenging or attempting to subvert a system of assessment that suits neither their teaching style nor their beliefs about essential mathematics content.
From page 170...
... D Miller, "Impact of a 'low stakes' state minimum competency testing program on policy, attitudes, and achievement," in Advances in program evaluation: Effects of mandated assessment on teaching, ed.
From page 171...
... 28 "Impact of 'low stakes' testing." 29 Mandated testing. 30 National Council of Teachers of Mathematics, Curriculum and evaluation standards for school mathematics, (Reston, VA: Author, 1989).
From page 172...
... R Schwille, "The influence of different styles of textbook use on instructional validity of standardized tests," Journal of Educational Measurement, 20, (1983)
From page 173...
... 72 Mathematical Sciences Education Board and Board on Mathematical Sciences, National Research Council, Everybody counts: A report to the nation on the future of mathematics education, (Washington, DC: National Research Council, 1989); Jean Kerr Stenmark, ed., Mathematics assessment: Myths, models, good questions, and practical suggestions, (Reston, VA: National Council of Teachers of Mathematics, 1991).
From page 174...
... Silver, "Assessment and mathematics education reform in the United States," International Journal of Educational Research, 17, (1992)
From page 175...
... In a retrospective commentary on that movement and the uses and abuses of examinations in the pursuit of the educational reform efforts of that movement, McConn1 described the avowed purpose of nearly all achievement testing at the time as ensuring the maintenance of standards, including, as already noted, the enforcement of both prescribed subject matter and of some more or less definitely envisaged degree of attainment. If one is to raise any objections here, he must tread softly, because he is approaching what is to many educators in service, especially many of the older ones, the Ark of the Covenant.
From page 176...
... Mathematics instruction is presently seen as a principal vehicle through which American schools will prepare students in this domain, so it is clearly appropriate to consider the role of new tests in enhancing mathematics education. It is equally important to recognize the possible contradiction between the ideals of diversified approaches to assessment on the one hand and the specification of uniform standards of achievement for all students on the other. History does tell us that the primacy of the latter can completely undermine the anticipated benefits of the former, and today's rhetoric on standards is characterized by a uniformity of goals of instruction, albeit a well-intentioned uniformity.
From page 177...
... CONTENT CONSIDERATIONS FOR MATHEMATICS ACHIEVEMENT
By and large, the data available from large-scale, performance-based assessments of educational achievement come from operational assessment programs in the area of direct writing DESIGN INNOVATIONS
From page 178...
... Over time, the content domains sampled in direct writing assessments have become organized around traditional rhetorical modes of discourse, and the content specifications for the development of writing tasks and scoring protocols in many testing programs reflect an evolved conception of domains to be sampled.7 In considering anticipated features of innovative assessments in mathematics, the definitions of content domains should be carefully evaluated. Traditionally, mathematics has been regarded by test developers as an area in which substantial consensus existed with regard to content and the sequencing of subject matter.
From page 179...
... As was done originally in the development of the Writing Supplement for the Iowa Tests of Basic Skills, the QUASAR mathematics assessment developed a focused-holistic scoring protocols' for each task. The scoring protocols were organized with respect to three criteria for evaluating responses: mathematical knowledge, strategic knowledge, and communication.
From page 180...
... Depending on how domain definitions from the NCTM Standards are made operational, this component of variance may represent a confounding factor in the use of results from extended samples of performance on complex assessment tasks. Whether or not it is considered a confounding factor, the variance associated with verbal aspects of the responses to mathematical problems is likely to loom larger than it has in more traditional approaches to measuring mathematics achievement.
From page 181...
... The development of standards in mathematics, as well as in other parts of the school curriculum, presents a new challenge to the developers of achievement measures with respect to content quality and cognitive complexity, two aspects of the validity question discussed by Linn et al.19 All major test publishers are presently engaged in efforts to revise instruments so that their content is more closely aligned with the NCTM Standards. The methods used to ascertain alignment typically involve the review of test materials by specialists in mathematics education and the classification of items according to the explicit statements of mathematics objectives.
From page 182...
... The empirical evidence gathered so far indicates that the judgments of content experts may not be highly reliable.20 Data that are available from content classifications of traditional test items raise questions about the fidelity of expert judges in evaluating test content. Comparisons of recent evaluations of the content of standardized achievement tests in mathematics21 with the content specifications supplied by developers (typically determined by subject matter experts' formal analysis of content and process required to obtain correct solutions)
From page 183...
... Glaser et al.24 selected several science performance assessment tasks for examination via student protocol analysis, including extended interviews with subjects participating in these assessments. Such analyses aim to reveal the degree of correspondence between the cognitive processes and skills the tasks were intended to measure and those actually elicited.
From page 184...
... Still another may have been so skilled or knowledgeable that the problem was solvable almost instantaneously without any conscious awareness of the cognitive steps involved. Experts are often less able than novices to provide detailed descriptions of their problem-solving activities.30 These concerns about the interpretability of a think-aloud protocol raise similar questions about the interpretability of written responses to probes about solution strategies in an operational assessment program.
From page 185...
... 31 Teaching to the test poses difficulties for score interpretation not just because it compromises normative information that accompanies most standardized tests. It is a practice that challenges the validity of test scores as indicators of the achievement domain sampled during test construction and has been shown in high-stakes situations to distort inferences to that domain.32 In evaluating novel approaches to assessment in mathematics, the generalizability of scores over raters, tasks, formats, and even subdomains has received considerable attention.
From page 186...
... Further, Koretz et al. indicate that the maximum boost to rater reliability that could be achieved with the data collected for the statewide assessment, obtained by aggregating over both portfolio entries and criterion scales, was only
From page 187...
... The authors argue that explaining low rater reliability in terms of statistical artifacts such as attenuation due to range restriction does not answer the more important question of what caused the reduction in variability in the first place. Although early studies of performance-based assessment concentrated on raters in the estimation of components of score variance, recent studies of the generalizability of extended responses to complex tasks also have raised fundamental questions about the behavior of examinees during the response process.
From page 188...
... Despite the large variance component due to person-by-task interaction, the overall generalizability coefficients for the nine tasks included on a given form of the assessment were in the .7 to .8 range. These values are markedly higher than many generalizability coefficients reported in the performance assessment literature, and they suggest a principle for performance assessment design that has long been recognized in the development of conventional achievement tests, namely that high levels of person-by-task interaction can be tolerated as long as the number of tasks (items)
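The principle in the page 188 excerpt, that person-by-task interaction can be tolerated when enough tasks are sampled, can be illustrated with a small sketch. It assumes a simple persons-by-tasks design in which the relative generalizability coefficient is var_person / (var_person + var_residual / n_tasks). The function name and the variance components below are hypothetical values chosen for illustration; they are not taken from the study excerpted above.

```python
def generalizability_coefficient(var_person, var_residual, n_tasks):
    """Relative generalizability coefficient for a persons-by-tasks design.

    var_person:   variance due to true differences among examinees
    var_residual: person-by-task interaction variance (confounded with error)
    n_tasks:      number of tasks averaged to form each examinee's score
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    # Averaging over n_tasks divides the interaction/error variance by n_tasks.
    return var_person / (var_person + var_residual / n_tasks)


# Hypothetical variance components (assumed, not from the source):
# the interaction term is three times the person term.
VAR_PERSON = 0.25
VAR_PERSON_BY_TASK = 0.75

if __name__ == "__main__":
    for n in (1, 3, 9, 18):
        coef = generalizability_coefficient(VAR_PERSON, VAR_PERSON_BY_TASK, n)
        print(f"{n:2d} tasks: {coef:.2f}")
```

With these assumed components, a single task gives a coefficient of 0.25, while averaging over nine tasks gives 0.75, even though the interaction variance is three times the person variance.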
From page 189...
... , but it would do so at considerable cost."45 Another example of the use of open-ended tasks in large-scale mathematics assessment provides another perspective for understanding the nature of information about achievement in mathematics that is obtained by innovative approaches. Stevenson et al.46 describe the characteristics of open-ended geometry proofs administered to more than 43,000 high school students in North Carolina.
From page 190...
... Given the results of the North Carolina study, a relevant question for the development of mathematics performance tasks and rating scales might be phrased as follows: When the factors that produce lack of generalizability on complex tasks in mathematics are controlled to a degree deemed necessary for large-scale applications, will the constructs measured by the tasks rank-order examinees any differently than would a conventional test of related mathematics skills? The influence of problem format on variability in responses is an obvious consideration in understanding the levels of score reliability or generalizability that have been observed in open-ended assessments of mathematics achievement.
From page 191...
... Webb and Yasui demonstrate the substantive importance of understanding the reasons for lack of generalizability in the context of extended samples of student performance in mathematics. In so doing they reveal an important connection in the validity argument between generalizability analyses and construct interpretations of the results of a mathematics assessment.
From page 192...
... Unfortunately, there is limited empirical evidence from experimental measures in such domains of the kinds of generalizability that might be expected and, hence, little empirical basis for recommendations concerning the number of tasks that might be necessary for a given use of results. What is known about content sampling from the standpoint of conventional achievement tests provides clear evidence that the meaning of a test score can be quite easily manipulated by purposeful selection of items to match the objectives of a local curriculum or policy initiative.52 Whenever there is a general concern about the sampling of tasks, there is a concomitant concern over the possibility that influences on task performance will be concentrated in subpopulations of examinees, that is, subpopulations differentiated by race, gender, or some other correlate of opportunity to learn.53 On the subject of differential functioning of test questions by group, some specialists go so far as to argue there is no such thing as an unbiased item; rather, the responsibility of test developers is to ensure that content domains are sampled in such a way as to balance out the bias, that is, to include enough variety in stimulus materials and balance in content that the assessment as a whole does not systematically favor one group over another.
From page 193...
... Generally speaking, the kinds of assessments that are currently being proposed in the context of educational reform efforts can only be linked across sites through some form of calibration, but calibration by professional judgment. One empirical example of an attempt to link direct writing assessments across states was described by Linn et al.58 Essays written by students from one state were evaluated with the scoring protocols from another state.
From page 194...
... Consensus among experts has often proved more difficult to attain than one might expect. Evidence of content validity provided by professional judgments needs to be supplemented with empirical evidence of cognitive validity.
From page 195...
... Students may vary greatly in their performance on mathematics assessments depending on the particular tasks by which they are tested. Apparently some aspects of performance in mathematics are highly subdomain-specific or require specified knowledge above and beyond transferable skills.
From page 196...
... According to the broad brush of optimism, by setting standards high and holding all students to them, leaving none behind in Hamlin, educational leaders expect to see tomorrow's students stride into adulthood fully prepared for the demands of life and work in the twenty-first century. As the last century saw the rise of new assessment procedures to support the maintenance of the educational standards of that time, so now a proliferation of new, innovative assessments is already arising to measure progress in meeting today's new stan .
From page 197...
... 18 National Council on Education Standards and Testing [NCEST], Raising standards for American education, (Washington, DC: United States Congress, 1992).
From page 198...
... Wang, "Validity evidence for cognitive complexity of performance assessments: An analysis of selected QUASAR tasks," International Journal of Educational Research, in press. 24 Cognitive theory 25 G
From page 199...
... Liu, "Empirical evidence for the reliability and validity of performance assessments," International Journal of Educational Research, in press)
From page 200...
... 59 Cf. National Council on Education Standards and Testing, Raising standards for American education.
From page 201...
... These national efforts consist of two types of often intersecting approaches to driving enhanced educational productivity. First, at the urging of the nation's governors, the federal government has initiated a series of efforts to promote national curriculum, performance, and opportunity to learn standards that would be adopted by states and local school districts on a voluntary basis.
From page 202...
... First, assessment should be used to support or improve teaching of important mathematics content and procedures. Second, mathematics assessment should support good instructional practice.
From page 203...
... An earlier and similar educational reform effort involved the use of minimum competency tests (MCT) to determine whether a student would receive a high school diploma.
From page 204...
... v. Turlington, a challenge to the state of Florida's program to condition the award of a high school diploma upon successful performance on a minimum competency test.3 Florida's legislative goals were to promote educational accountability and insure that every school district provided "instructional programs which meet minimum performance standards compatible with the state's plan for education.
From page 205...
... Any use of mathematics assessments to place students in educational tracks will probably not LEGAL AND ETHICAL ISSUES 205
From page 206...
... For example, for mathematics and computational skills, SCANS concludes that "virtually all employees should be prepared to maintain records, estimate results, use spread sheets, or apply statistical process controls if they negotiate, identify trends, or suggest new courses of action."22 SCANS estimated that "less than half of young adults can demonstrate the SCANS reading and writing minimums; even fewer can handle the mathematics."23 To the extent that mathematics assessments track the SCANS' goals, there may be both education and employment-related legal consequences attached to mathematics reform initiatives. If a mathematics assess
From page 207...
... that other tests or selection devices, without a similarly undesirable racial effect, would also serve the employer's legitimate interest in 'efficient and trustworthy workmanship.'"28 It is important to note that the types of tests and criteria struck down by the Court under these "job relatedness" standards have included general high school diploma requirements and standardized tests of general ability, such as the Wonderlic.29 Discriminatory tests have been found impermissible.
From page 208...
... Duke Power37 defining the "business necessity" defense for discriminatory acts in employment and the "job relatedness" requirement for employment requirements. The Act also bars the practice used in some employment testing programs of statistically adjusting or using different cutoff scores on the basis of race, color, religion, sex, or national origin.38 Many of those currently promoting the use of assessment to enhance educational achievement and the infusion of workplace-related skills into the assessments have proposed the use of assessment data to determine such things as the award of certificates of mastery of workplace skills.
From page 209...
... ,41 suggest close scrutiny of mathematics assessment proposals to determine whether they present potential problems under these statutes. Each content standard needs to be scrutinized to determine whether the standard would serve as an unlawful bar to participation by a handicapped person in either an educational program or in employment.
From page 210...
... Gender discrimination in education is directly addressed by the provisions of Title IX of the Education Amendments of 1972 42 and its implementing regulations.43 Title IX bars discrimination on the basis of sex in all educational programs and activities conducted by recipients of federal financial aid. Many states have similar provisions.44 The legal analysis of Title IX challenges to gender disparities on mathematics assessments would probably follow the type of analysis used under Title VII of the Civil Rights Act of 196445 to assess discrimination in employment testing.46 In addition, the provisions of Title VII barring gender discrimination in employment could also apply to use of the assessments in the workplace.
From page 211...
... on curricular validity reinforced a behaviorist orientation in education at the time and fairly widespread attention began to be paid, for both educational and constitutional reasons, to requirements that teachers teach to the content of high-stakes tests. The constitutional standards set forth in Debra P., and reiterated in subsequent federal cases,54 will, for reasons to be discussed below, need to be considered in assessing the potential legal consequences of mathematics assessment, particularly if the individual stakes associated with assessment performance are high.
From page 212...
... On the other hand, individual challenges by students and other groups on broad constitutional grounds may increase in number and rate of success as the more litigious members of our society apply their financial resources to efforts to obtain legal redress for educational grievances. This appears to be happening at present, for example, in cases involving challenges to disputed test scores from the Educational Testing Service.56 The Constitutional standards set forth in Debra P
From page 213...
... ISSUES OF PERSONAL CHOICE
Additional constitutional issues may arise if programs using mathematics assessments follow the lead set by SCANS in its definitions of skills for the workplace. Some of these issues touch on some of the more controversial political matters presently confronting the nation.
From page 214...
... EQUITY AND THE GOVERNANCE OF EDUCATION
With the proposal from some to establish a national assessment system that would truly be national, not federal, current reform initiatives acknowledge the long-standing tradition of state control of education.
From page 215...
... A laudable goal such as that of the National Council on Education Standards and Testing (NCEST) to create a system of "tests worth teaching
From page 216...
... Some states, such as New Jersey, have implemented receiverships for certain low-performing districts. If mathematics assessment information or other educational accountability reports begin to inform state-level reviews of local district educational achievement, then such variables as the mathematics assessment will come to have very high stakes consequences not only for students but also for local school districts.
From page 217...
... The Implementation Task Force of the National Council for Education Standards and Testing suggests that equitable distribution of resources among districts and among schools within districts is a critical component for implementation at each level of government.6~3 That group recognized that equity in funding is a key factor in the success of the endeavors and will become a major issue in all of the states.70 Federal programs in the past have been critical in providing assistance for the educationally disadvantaged. Such endeavors will need to continue but should be linked tightly to the common content and performance standards.71 NCEST in some respects seems to dismiss problems related to fiscal equity, hoping instead that national standards can create targets toward which educators can strive.72 NCEST argues that states and local districts could work together to overcome deficiencies in resources.73 Given the substantial difficulties that even one state, Texas, has had attempting to arrive at an equalization formula to
From page 218...
... Supreme Court refused to recognize that education is a fundamental right under the Constitution; however, if a fundamental right is, in essence, created as the result of the creation of an entitlement, then the level of judicial scrutiny of a governmental practice may be subject to the burdensome "strict scrutiny" level of analysis of practices that work to deny citizens' fundamental interests, a burden nearly impossible for government to meet. A related issue concerns the fact that the government will have created a legitimate expectation on the part of students that school attendance will result in attainment of a certain level of mathematics skills.
From page 219...
... From an economic perspective, a failure to effectively address the needs of all students will have devastating consequences for the future economic welfare of the entire nation.
CONCLUSION
This paper provides a brief summary of the principal legal and policy issues that might arise in challenges to a mathematics assessment initiative by members of protected groups traditionally underserved by the nation's schools, by any student who performs poorly on an assessment, or by individual school districts.
From page 220...
... 15 An item from the February 1991 Maryland School Performance Assessment Program Grade 8 Mathematics Assessment teacher's guide involves a task asking students to develop a survey plan to collect information on potential respondents to assist a developer's efforts to build a new restaurant. The lowest-scoring sample student answer is "I would ask people in the rich part of the county." Without doubt, that response lacks a richness of detail that reflects much understanding of sampling methodology even at the eighth grade level, but for a low-income student who could never contemplate having the opportunity to be a developer, the sample answer says it all.
From page 221...
... 27 See P Patterson, "Employment Testing and Title VII of the Civil Rights Act of 1964" in Gifford and O'Connor, pp.
From page 222...
... 57 Raising Standards for American Education: A Report to Congress, the Secretary of Education, the National Education Goals Panel, and the American People. The National Council on Education Standards and Testing (hereinafter NCEST)
From page 223...
... E-15 73 Id. 74 Lonnie Harp, "Texas Finance Bill Signed Into Law, Challenges Anticipated," Education Week, 9 June 1993; Lonnie Harp, "Impact of Texas Finance Law, Budget Increase Gauged," Education Week, 16 June 1993; Millicent Lawton, "Alabama Judge Sets October Deadline for Reform Remedy," 23 June 1993.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.