3
ASSESSING IMPORTANT MATHEMATICAL CONTENT
High-quality mathematics assessment must be shaped and defined by important mathematical content. This fundamental concept is embodied in the first of three educational principles to guide assessment.
THE CONTENT PRINCIPLE
Assessment should reflect the mathematics that is most important for students to learn.
The content principle has profound implications for designing, developing, and scoring mathematics assessments as well as reporting their results. Some form of the content principle may have always implicitly guided assessment development, but in the past the notion of content has been construed in the narrow topic-coverage sense. Now content must be viewed much more broadly, incorporating the processes of mathematical thinking, the habits of mathematical problem solving, and an array of mathematical topics and applications, and this view must be made explicit. What follows is, nonetheless, a beginning description; much remains to be learned from research and from the wisdom of expert practice.
DESIGNING NEW ASSESSMENT FRAMEWORKS
Many of the assessments in use today, such as standardized achievement tests in mathematics, have reinforced the view that the mathematics curriculum is built from lists of narrow, isolated skills that can easily be decomposed for appraisal. The new vision of mathematics requires that assessment reinforce a new conceptualization that is both broader and more integrated.
Tests have traditionally been built from test blueprints, which have often been two-dimensional arrays with topics to be covered along one axis and types of skills (or processes) on the other.^{1} The assessment is then created by developing questions that fit into one cell or another of this matrix. But important mathematics is not always amenable to this cell-by-cell analysis.^{2} Assessments need to involve more than one mathematical topic if students are to make appropriate connections among the mathematical ideas they have learned. Moreover, challenging assessments are usually open to a variety of approaches, typically using varied and multiple processes. Indeed, they can and often should be designed so that students are rewarded for finding alternative solutions. Designing tasks to fit a single topic and process distorts the kinds of assessments students should be able to do.
BEYOND TOPIC-BY-PROCESS FORMATS
Assessment developers need characterizations of the important mathematical knowledge to be assessed that reflect both the necessary coverage of content and the interconnectedness of topics and process. Interesting assessment tasks that do not elicit important mathematical thinking and problem solving are of no use. To avoid this, preliminary efforts have been made on several fronts to seek new ways to characterize the learning domain and the corresponding assessment. For example, lattice structures have recently been proposed as an improvement over matrix classifications of tasks.^{3} Such structures provide a different and perhaps more interconnected view of mathematical understanding that should be reflected in assessment.
The approach taken by the National Assessment of Educational Progress (NAEP) to develop its assessments is an example of the effort to move beyond topic-by-process formats. Since its inception, NAEP has used a matrix design for developing its mathematics assessments. The dimensions of these designs have varied over the years, with a 35-cell design used in 1986 and the design below for the 1990 and 1992 assessments. Although classical test theory strongly encouraged the use of matrices to structure and provide balance to examinations, the designs also were often the root cause of the decontextualizing of assessments. If 35 percent of the items on the assessment were to be from the area of measurement and 40 percent of those were to assess students' procedural knowledge, then 14 percent of the items would measure procedural knowledge in the content domain of measurement. These items were developed to suit one cell of the matrix, without adequate consideration of the context and connections to other parts of mathematics.
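The cell arithmetic in this example can be made concrete. The short Python sketch below multiplies a content-area share by an ability share to get the share of items that lands in one matrix cell. Only the 35 percent and 40 percent figures come from the text; the function name `cell_share` is illustrative.

```python
from fractions import Fraction

def cell_share(content_share, ability_share):
    """Share of all items that falls in one cell of a topic-by-process matrix."""
    return content_share * ability_share

# Figures from the text: 35 percent measurement, 40 percent of those procedural.
measurement = Fraction(35, 100)
procedural_within_measurement = Fraction(40, 100)

share = cell_share(measurement, procedural_within_measurement)
print(f"{float(share):.0%} of items: procedural knowledge in measurement")
```

The same multiplication applies to every cell, which is why a matrix blueprint fixes the weight of each topic-process pair before any task has been written.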
NAEP 1990-1992 Matrix
Content (columns): Numbers and Operations; Measurement; Geometry; Data Analysis, Probability, and Statistics; Algebra and Functions
Mathematical Ability (rows): Conceptual Understanding; Procedural Knowledge; Problem Solving
Starting with the 1995 NAEP mathematics assessment, the use of matrices as a design feature has been discontinued. Percentages of items will be specified for each of the five major content areas, but some of these items will be double-coded because they measure content in more than one of the domains. Mathematical abilities categories (conceptual understanding, procedural knowledge, and problem solving) will come into play only at the final stage of development to ensure that there is balance among the three categories over the entire assessment (although not necessarily by each content area) at each grade level. This change, along with the continued use of items requiring students to construct their own responses, has helped provide a new basis for the NAEP mathematics examination.^{4}
One promising approach to assessment frameworks is being developed by the Balanced Assessment Project, which is a National Science Foundation-supported effort to create a set of assessment packages, at various grade levels, that provide students, teachers, and administrators with a fair and deep characterization of student attainment in mathematics.^{5} The seven main dimensions of the framework are sketched below:

content (which is very broadly construed to include concepts, senses, procedures and techniques, representations, and connections),

thinking processes (conjecturing, organizing, explaining, proving, etc.),

products (plans, models, reports, etc.),

mathematical point of view (real-world modeling, for example),

diversity (accessibility, sensitivity to language and culture, etc.),

circumstances of performance (amount of time allowed, whether the task is to be done individually or in groups, etc.), and

pedagogics-aesthetics (the extent to which a task or assessment is believable, engaging, etc.).
The first four dimensions describe aspects of the mathematical competency that the students are asked to demonstrate, whereas the last three dimensions pertain to characteristics of the assessment itself and the circumstances or conditions under which the assessment is undertaken.
One noteworthy feature of the framework from the Balanced Assessment Project is that it can be used at two different levels: at the level of the individual task and at the level of the assessment as a whole. When applied to an individual task, the framework can be used as more than a categorizing mechanism: it can be used to enrich or extend tasks by suggesting other thinking processes that might be involved, for example, or additional products that students might be asked to create. Just as important, the framework provides a way of examining the balance of a set of tasks that goes beyond checking off cells in a matrix. Any sufficiently rich task will involve aspects of several dimensions and hence will strengthen the overall balance of the entire assessment by contributing to several areas. Given a set of tasks, one can then examine the extent to which each aspect of the framework is represented, and this can be done without limiting oneself to tasks that fit entirely inside a particular cell in a matrix.
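One way to picture this kind of balance check is a simple tally. The Python sketch below is illustrative only: the task names and their tags are hypothetical, and the tag vocabulary is borrowed loosely from the thinking processes and products listed above. It counts how many tasks touch each aspect without forcing any task into a single cell.

```python
from collections import Counter

# Hypothetical tasks, each tagged with every aspect it involves.
tasks = {
    "aquarium plan": {"organizing", "explaining", "plans", "reports"},
    "nested squares": {"conjecturing", "proving", "models"},
    "budget display": {"organizing", "explaining", "plans"},
}

# Aspects the assessment as a whole is meant to cover (illustrative subset).
aspects = ["conjecturing", "organizing", "explaining", "proving",
           "plans", "models", "reports"]

def coverage(tasks, aspects):
    """Count how many tasks involve each aspect; one task may count many times."""
    counts = Counter(tag for tags in tasks.values() for tag in tags)
    return {aspect: counts[aspect] for aspect in aspects}

balance = coverage(tasks, aspects)
for aspect, n in balance.items():
    print(f"{aspect:13s} {n}")

# Aspects that no task reaches would signal a gap in the set.
gaps = [aspect for aspect, n in balance.items() if n == 0]
print("gaps:", gaps)
```

Because a rich task carries several tags, it contributes to several counts at once, which is the sense in which it strengthens the balance of the whole set.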
As these and other efforts demonstrate, researchers are attempting to take account of the fact that assessment should do much more than test discrete procedural skills.^{6} The goal ought to be schemes for assessment that go beyond matrix classification to assessment that elicits student work on the meaning, process, and uses of mathematics. Although the goal is clearly defined, methods to achieve it are still being explored by researchers and practitioners alike.
SPECIFYING ASSESSMENT FRAMEWORKS
Assessment frameworks provide test developers with the guidance they need for creating new assessments. Embedded in the framework should be information to answer the following kinds of questions: What mathematics should students know before undertaking an assessment? What mathematics might they learn from the assessment? What might the assessment reveal about their understanding and their mathematical power? What mathematical background are they assumed to have? What information will they be given before, during, and after the assessment? How might the tasks be varied, extended, and incorporated into current instruction?
Developers also need criteria for determining appropriate student behavior on the assessment: Will students be expected to come up with conjectures on their own, for example, or will they be given some guidance, perhaps identification of a faulty conjecture, which can then be replaced by a better one? Will they be asked to write a convincing argument? Will they be expected to explain their conjecture to a colleague or to the teacher? What level of conjecture and argument will be deemed satisfactory for these tasks? A complete framework might also include standards for student performance (i.e., standards in harmony with the desired curriculum).
Very few examples of such assessment frameworks currently exist. Until there are more, educators are turning to curriculum frameworks, such as those developed by state departments of education across the country, and adapting them for assessment purposes. The state of California, for example, has a curriculum framework that asserts the primacy of developing mathematical power for all students: "Mathematically powerful students think and communicate, drawing on mathematical ideas and using mathematical tools and techniques."^{7} The framework portrays the content of mathematics in three ways:

Strands (such as number, measurement, and geometry) run throughout the curriculum from kindergarten through grade 12. They describe the range of mathematics to be represented in the curriculum and provide a way to assess its balance.

Unifying ideas (such as proportional relationships, patterns, and algorithms) are major mathematical ideas that cut across strands and grades. They represent central goals for learning and set priorities for study, bringing depth and connectedness to the student's mathematical experience.

Units of instruction (such as dealing with data, visualizing shapes, and measuring inaccessible distances) provide a means of organizing teaching. Strands are commingled in instruction, and unifying ideas give too big a picture to be useful day to day. Instruction is organized into coherent, manageable units consisting of investigations, problems, and other learning activities.
Through the California Learning Assessment System, researchers at the state department of education are working to create new forms of assessment and new assessment tasks to match the curriculum framework.^{8}
Further exploration is needed to learn more about the development and appropriate use of assessment frameworks in mathematics education. Frameworks that depict the complexity of mathematics enhance assessment by providing teachers with better targets for teaching and by clearly communicating what is valued to students, their parents, and the general public.^{9} Although an individual assessment may not treat all facets of the framework, the collection of assessments needed to evaluate what students are learning should be comprehensive. Such completeness is necessary if assessments are to provide the right kind of leadership for educational change. If an assessment represents a significant but small fraction of important mathematical knowledge and performance, then the same assessment should not be used over and over again. Repeated use could inappropriately narrow the curriculum.
DEVELOPING NEW ASSESSMENT TASKS
Several desired characteristics of assessment tasks can be deduced from the content principle and should guide the development of new assessment tasks.
TASKS REFLECTING MATHEMATICAL CONNECTIONS
Current mathematics education reform literature emphasizes the importance of the interconnections among mathematical topics and the connections of mathematics to other domains and disciplines. Much assessment tradition is based, however, on an atomistic approach that in practice, if not in theory, hides the connections among aspects of mathematics and between mathematics and other domains. Assessment developers will need to find new ways to reflect these connections in the assessment tasks posed for students.
One way to help ensure the interconnectedness is to create tasks that ask students to bring to bear a variety of aspects of mathematics. An example involving topics from arithmetic, geometry, and measurement appears below.^{10} Similarly, tasks may ask students to draw connections across various disciplines. Such tasks may provide some structure or hints for the students in finding the connections or may be more open-ended, leaving responsibility for finding connections to the students. Each strategy has its proper role in assessment, depending on the students' experience and accomplishment.
Another approach to reflecting important connections is to set tasks in a real-world context. Such tasks will more likely capture students' interest and enthusiasm and may also suggest new ways of understanding the world through mathematical models so that the assessment becomes part of the learning process. Moreover, the "situated cognition" literature^{11} suggests that the specific settings and contexts in which a mathematical situation is embedded are critical determinants of problem solvers' responses to that situation. Developers should not assume, however, that just because a mathematical task is interesting to students, it therefore contains important mathematics. The mathematics in the task may be rather trivial and therefore inappropriate.
Lightning Strikes Again!
One way to estimate the distance from where lightning strikes to you is to count the number of seconds until you hear the thunder and then divide by five. The number you get is the approximate distance in miles.
One person is standing at each of the four points A, B, C, and D. They saw lightning strike at E. Because sound travels more slowly than light, they did not hear the thunder right away.
1. Who heard the thunder first? _____ Why? Who heard it last? _____ Why? Who heard it after 17 seconds? _____ Explain your answer.
2. How long did the person at B have to wait to hear the thunder?
3. Now suppose lightning strikes again at a different place. The person at A and the person at C both hear the thunder after the same amount of time. Show on the map below where the lightning might have struck.
4. In question 3, are there other places where the lightning could have struck? Explain your answer.
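The rule of thumb in the Lightning Strikes Again! task, and the geometry behind its questions 3 and 4, can be sketched in a few lines of Python. The map is not reproduced here, so the coordinates for A and C below are hypothetical; only the divide-by-five rule and the 17-second figure come from the task.

```python
import math

def miles_from_seconds(seconds):
    """Rule of thumb from the task: seconds until thunder, divided by five."""
    return seconds / 5

def same_delay(strike, p, q, tol=1e-9):
    """True if a strike is equally far from p and q, so both hear it together."""
    return math.isclose(math.dist(strike, p), math.dist(strike, q), abs_tol=tol)

# Approximate distance, in miles, for the listener who waited 17 seconds.
print(miles_from_seconds(17))

# Hypothetical map positions, in miles (the real map is not available).
A, C = (0.0, 0.0), (4.0, 0.0)

# Every point on the perpendicular bisector of AC (here the line x = 2)
# is equidistant from A and C, so question 4 has many valid answers.
candidates = [(2.0, y) for y in (-3.0, 0.0, 1.5, 5.0)]
print(all(same_delay(strike, A, C) for strike in candidates))

# A point off the bisector does not give equal delays.
print(same_delay((1.0, 1.0), A, C))
```

The sketch shows how one task draws on arithmetic (the division), measurement (reading distances), and geometry (the locus of equidistant points).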
Test items that assess one isolated fragment of a student's mathematical knowledge may take very little time and may yield reliable scores when added together. However, because they are set in no reasonable context, they do not provide a full picture of the student's reasoning. They cannot show how the student connects mathematical ideas, and they seldom allow the student an opportunity to explain or justify a line of thinking.
Students should be clear about the context in which a question is being asked. Either the assumptions necessary for students to use mathematics in a problem situation should be made clear in the instructions or students should be given credit for correct reasoning under various assumptions. The context of a task, of course, need not be derived from mathematics. The example below contains a task from a Kentucky statewide assessment for twelfth-graders that is based on the notion of planning a budget within certain practical restrictions.^{12}
Budget Planning Task
You graduated from Fairdale High School 2 years ago, and although you did not attend college, you have been attending night school to learn skills to repair video cassette recorders while you worked for minimum wages at a video center by day. Now you have been fortunate to find an excellent job that requires the special skills you have developed. Your salary will be $18,000. This new job excites you because for some time you have been wanting to move out of your parents' home to your own apartment. During the past 2 years you have been able to buy your own bedroom set, a television, a stereo, and some of your own dishes and utensils. To move to your own apartment, you will need to develop a budget. Your assignment is to develop a monthly budget showing how you will live on the income from your new job. To guide you, read the list below. (A packet of resource materials is provided, including a newspaper and brochures with consumer information.)
Your budget for this project should be presented as a one-page, two-column display. Supporting this one-page budget summary, you should submit an explanation for each budget figure, telling how/where you got the information.
Other examples of age-appropriate contexts can be found in the fourth-grade assessments developed by the New Standards Project (NSP), a working partnership of researchers and state and local school districts formed to develop new systems of performance-based assessments. One such problem includes a fairly complex task in which children are given a table of information about various kinds of tropical fish (their lengths, habits, prices, etc.) and are asked to propose how to spend a fixed amount of money to buy a variety of fish for an aquarium of limited capacity, under certain realistic constraints.^{13} The child must develop a solution that takes the various constraints into account. The task offers ample possibilities for students to display reasoning that connects mathematics with the underlying content.
THE CHALLENGES IN MAKING CONNECTIONS
The need to reflect mathematical connections pushes task development in new directions, each presenting challenges that require attention.
Differential Familiarity. Whatever the context of a mathematical task, some students will be more familiar with it than other students, possibly giving some an unfair advantage. One compensating approach is to spend time acquainting all students with the context. The NSP, for example, introduces the context of a problem in an assessment exercise in a separate lesson, taught before the assessment is administered.^{14} Presumably the lesson reduces the variability among the students in their familiarity with the task setting. The same idea can be found in some of the assessment prototypes in Measuring Up: Prototypes for Mathematics Assessment. In one prototype, for instance, a script of a videotaped introduction was suggested;^{15} playing such a videotape immediately before students work on the assessment task helps to ensure that everyone is equally familiar with the underlying context.
Another approach is to make the setting unusual, yet realistic, so that everyone will be starting with a minimum of prior knowledge. This technique was used in a study of children's problem solving conducted through extended individual task-based interviews.^{16} The context used as the basis of the problem situation, a complex game involving probability, was deliberately constructed so that it would be unfamiliar to everyone. After extensive pilot testing of many variations, an abstract version of the game was devised in which children's prior feelings and intuitive knowledge about winning and losing (and about competitions generally) could be kept separate from their mathematical analyses of the situation.
Clarifying Assumptions. Task developers must consider seriously the impact of assumptions on any task, particularly as the assumptions affect the mathematics that is called for in solution of the problem. An example of the need to clarify assumptions is a performance assessment^{17} that involves four tasks, all in the setting of an industrial arts class and all involving measuring and cutting wood. As written, the tasks ignore an important idea from the realm of wood shop: When one cuts wood with a saw, a small but significant amount of wood is turned into sawdust. This narrow band of wood, called the saw's kerf, must always be taken into account, for otherwise the measurements will be off. The tasks contain many instances of this oversight: If, for example, a 16-inch piece is cut from a board that is 64 inches long, the remaining piece is not 48 inches long. Thus students who are fully familiar with the realities of wood shop could be at a disadvantage, since the problems posed are considerably more difficult when kerf is taken into account. Any scoring guide should provide an array of plausible answers for such tasks to ensure that students who answer the questions more accurately in real-world settings are given ample credit for their work. Better yet, the task should be designed so that assumptions about kerf (in this case) are immaterial to a solution.
Another assessment item^{18} that has been widely discussed^{19} also shows the need to clarify assumptions. In 1982, this item appeared in the third NAEP mathematics assessment: "An army bus holds 36 soldiers. If 1128 soldiers are being bussed to their training site, how many buses are needed?" The responses have been taken as evidence of U.S. students' weak understanding of mathematics, because only 33 percent of the 13-year-old students surveyed gave 32 as the answer, whereas 29 percent gave the quotient 31 with a remainder, and 18 percent gave just the quotient 31. There are of course many possible explanations as to why students who performed the division failed to give the expected whole-number answer. One plausible explanation may be that some students did not see a need to use one more bus to transport the remaining 12 soldiers. They could squeeze into the other buses; they could go by car. Asked about their answers in interviews or in writing, some students offer such explanations.^{20} The point is that the answer 32 assumes that no bus can hold more than 36 soldiers and that alternative modes of transportation cannot be arranged.
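The three answers students gave to the bus item correspond to three readings of the same division. The Python sketch below, with an illustrative function name, computes the quotient, the remainder, and the rounded-up count. The rounded-up count encodes the assumptions named above: no bus carries more than its stated capacity and no other transportation is available.

```python
def division_answers(soldiers, capacity):
    """Return (full buses, soldiers left over, buses needed with no overfilling)."""
    full_buses, left_over = divmod(soldiers, capacity)
    # One extra bus is needed only if someone is left over.
    buses_needed = full_buses + (1 if left_over else 0)
    return full_buses, left_over, buses_needed

full_buses, left_over, buses_needed = division_answers(1128, 36)
print("quotient:", full_buses)
print("remainder:", left_over)
print("buses if none may be overfilled:", buses_needed)
```

A scoring guide that credits only the last value is, in effect, scoring the assumption as well as the arithmetic.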
Ease of Entry and Various Points of Exit. Students should be allowed to become engaged in assessment tasks through a sequence of questions of increasing difficulty. They ought to be able to exit the task at different points reflecting differing levels of understanding and insight into the mathematical content. For too long, assessment tasks have not provided all students with the opportunity to start the task, let alone work part way through it. As a result some students have inadequate opportunities to display their understanding. The emerging emphasis on connections lets assessment tasks be designed to permit various points of entry and exit.
As an example consider a problem from the California Assessment Program's Survey of Academic Skills.^{21} The task starts with a square, into which a nested sequence of smaller squares is to be drawn, with the vertices of each square connecting the midpoints of the sides of the previous one. Starting with this purely geometric drawing task, the student determines the sequence of numbers corresponding to the areas of the squares and then writes a general rule that can be used to find the area of the n^{th} interior square. Such a task allows the student who may not be able to produce a general rule at least to demonstrate his or her understanding of the geometrical aspects of the situation.
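The progression from drawing to number sequence to general rule in the nested-squares task can be checked with a short Python sketch. It assumes a starting square of side 1 (the task's actual dimensions are not given here) and uses helper names chosen for illustration. Each new square joins the midpoints of the previous one, and the areas are computed from the vertices with the shoelace formula.

```python
from fractions import Fraction

def midpoint_polygon(vertices):
    """Join the midpoints of consecutive sides of a polygon."""
    n = len(vertices)
    return [((vertices[i][0] + vertices[(i + 1) % n][0]) / 2,
             (vertices[i][1] + vertices[(i + 1) % n][1]) / 2)
            for i in range(n)]

def shoelace_area(vertices):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    n = len(vertices)
    twice_area = sum(vertices[i][0] * vertices[(i + 1) % n][1]
                     - vertices[(i + 1) % n][0] * vertices[i][1]
                     for i in range(n))
    return abs(twice_area) / 2

# Assumed starting square with side 1, kept in exact arithmetic.
zero, one = Fraction(0), Fraction(1)
square = [(zero, zero), (one, zero), (one, one), (zero, one)]

areas = []
for _ in range(5):
    areas.append(shoelace_area(square))
    square = midpoint_polygon(square)

print([str(area) for area in areas])
```

Under this assumption each area is half the one before, so a general rule of the kind the task asks for is that the n-th interior square has area 1/2^n of the original, counting the original as n = 0.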
TASKS REQUIRING COMMUNICATION
The vision of mathematics and mathematics assessment described in earlier chapters emphasizes communication as a critical feature. Developers are beginning to recognize that there are many ways to communicate about mathematical ideas and that assessment tasks have seldom made sufficient use of these alternatives.
Incorporating communication about mathematics into assessment tasks obviously calls for different forms of responses than have been common in the past. Students may respond in a wide variety of ways: by writing an essay, giving an oral report, participating in a group discussion, constructing a chart or graph, or programming a computer.
In the aquarium task cited above, students were asked to write a letter to their principal explaining what fish they would buy and why they made their choices. In a task from an Educational Testing Service program for middle school mathematics,^{22} students are asked to predict future running speeds for male and female athletes, typically derived from graphs or tables they have constructed, and to justify their predictions in written form.
Some assessments can be carried out as individual interviews. The example below describes a question on an oral examination for 17-year-old Danish mathematics students in the Gymnasium.^{23} (The Gymnasium enrolls less than 30 percent of the age cohort, and not all Gymnasium students take mathematics.) One clear benefit of an oral assessment is that it allows the assessor to identify immediately how students are interpreting the problem context and what assumptions they are making.
Tasks that require students to communicate about mathematics pose the following challenge: To what extent are differences in ability to communicate to be considered legitimate differences in mathematical power? Clearly efforts must be made to ensure that students are given the opportunity to respond to assessments in the language they speak, read, and write best. Different students will choose different preferred modes of mathematical communication. For example, some will be able to explain their reasoning more effectively with a table or chart than with an equation or formula; for others the reverse will be true. Hence tasks should be constructed, insofar as possible, to allow various approaches and various response modes. The scoring of the assessments must take the variety of valid approaches into account.
Nonetheless, some differences in performance will remain. To the extent that communication is a part of mathematics, differences in communication skill must be seen as differences in mathematical power. Means of fairly evaluating responses, accounting for both the student's and the assessor's preferred modes of communication for any given task, must be developed.
Oral Assessments
Expound on the exponential growth model, including formulas and graphs (the following data may be used as a basis). Under favorable circumstances the bacterium Escherichia coli divides every 20 minutes:
The hourly wages (in Danish kroner) of female workers in Denmark were for the years 1963-1970:
At the oral examination it is expected that the student, unassisted, will explain, e.g.,
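The exponential growth model named in the Danish oral-examination prompt can be written down briefly. The Python sketch below is one possible formulation, not the examination's own: it assumes a single starting bacterium and uses only the 20-minute division time stated in the prompt. The examination's data tables are not reproduced, so the wage data is not modeled.

```python
def population(initial, minutes, doubling_minutes=20):
    """Exponential growth N(t) = N0 * 2 ** (t / T), with doubling time T."""
    return initial * 2 ** (minutes / doubling_minutes)

# Assumed starting count of one bacterium; tabulate two hours of growth.
for minutes in range(0, 121, 20):
    print(f"{minutes:3d} min  {population(1, minutes):.0f}")
```

A student might communicate the same model as this table, as a graph, or as the formula in the docstring, which is the variety of response modes the discussion above has in mind.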
Of course, communication is a two-way street. Questions should be understandable and clear. Careful use of language and notation in stating a task has long been a goal of assessment developers, although one not always successfully achieved.
The example below from the Second International Mathematics Study (SIMS) illustrates this difficulty in a multiple-choice item.^{24} The aim of the item is evidently to tap students' abilities to apply the Pythagorean Theorem to a right triangle formed at the top of the figure by drawing a segment parallel to the bottom segment, concluding that x equals 8 m. Because the question is posed in a multiple-choice format, however, and because the figure is drawn so close to scale, the correct choice, C, can be found by estimating the length visually. Certainly this is a fully legitimate, and indeed more elegant, way of solving the problem than the posers had probably intended, but if the goal is to determine students' knowledge of the Pythagorean relationship, then the item is misguided. Figures, like all other components of a task, must be carefully examined to see if they convey what the assessment designer intends.
Problem Solving Item from SIMS
SOLVING NONROUTINE PROBLEMS
Problem solving is the cornerstone of the reform effort in mathematics education. The types of problems that matter—the types we really wish to have students learn how to solve—are the ones that are nonroutine. It is not sufficient for students to be able to solve highly structured, even formulaic problems that all require the same approach. Students must face problems they have not encountered before and learn how to approach and solve them. Thus, in an assessment, to learn what a student can do with nonroutine problems, the problems must be genuinely new to the student. The goal is to draw on the habits of thinking developed by students in their studies, not on specific problems they have learned to solve in their earlier work.
There is some tension between the claim that nonroutine problems are the most legitimate kind of problem, on the one hand, and the need for fairness, on the other. Tasks must be constructed in such a way that the intended audience has all the prerequisite skills (or should have had all the prerequisite skills) yet has never seen a problem just like the one it is confronted with on the assessment. Nonroutine problems pose special issues for the security of assessment. To be novel, assessment tasks must not have been seen in advance. However, students should be exposed to a variety of novel problems in daily work to demonstrate to them that nonroutine problems are valued in mathematics education and that they should expect to see many such problems.
The challenge, ultimately, is to ensure that all students being assessed have had substantial experience in grappling with nonroutine problems as well as the opportunity to learn the mathematical ideas embedded in the problem. Teachers must be open to alternative solutions to nonroutine problems. When instruction has not been directed to preparing students for nonroutine problem solving, performance will likely be related more to what has been called aptitude than to instructionally related learning.
MATHEMATICAL EXPERTISE
New kinds of assessments call for new kinds of expertise among those who develop the tasks. The special features of the mathematics content and the special challenges faced in constructing assessment tasks illustrate a need for additional types of expertise in developing assessment tasks and evaluation schema. Task developers need to have a high level of understanding of children, how they think about things mathematical and how they learn mathematics, well beyond the levels assumed to be required to develop assessment tasks in the past. Developers must also have a deep understanding of mathematics and its applications. We can no longer rely on task developers with superficial understanding of mathematics to develop assessment tasks that will elicit creative and novel mathematical thinking.
SCORING NEW ASSESSMENTS
The content principle also has implications for the mathematical expertise of those who score assessments and the scoring approaches that they use.
JOINING TASK DEVELOPMENT TO STUDENT RESPONSES
A multiple-choice question is developed with identification of the correct answer. Similarly, an open-ended task is incomplete without a scoring rubric (a scoring guide) that specifies how the response will be evaluated. Joining the two processes is critical because the basis on which the response will be evaluated has many implications for the way the task is designed, and the way the task is designed has implications for its evaluation.
Just as there is a need to try out multiple-choice test questions prior to administration, so there is a need to try out the combination of a task and its scoring rubric for open-ended questions. Students' responses give information about the design of both the task and the rubric. Feedback loops, where assessment tasks are modified and sharpened in response to student work, are especially important, in part because of the variety of possible responses.
EVALUATING RESPONSES TO REFLECT THE CONTENT PRINCIPLE
The key to evaluating responses to new kinds of assessment tasks is having a scoring rubric that is tied to the prevailing vision of mathematics education. If an assessment consists of multiple-choice items, the job of determining which responses are correct is straightforward, although assessment designers have little information to go on in trying to decide why students have made certain choices. They can interview students after a pilot administration of the test to try to understand why they chose the answers they did. The designers can then revise the item so that the erroneous choices may be more interpretable. If ambiguity remains and students approach the item with sound interpretations that differ from those of the designers, the response evaluation cannot help matters much. The item is almost always scored either right or wrong.^{25}
Designers of open-ended tasks, on the other hand, ordinarily describe the kinds of responses expected in a more general way. Unanticipated responses can be dealt with by judges who discuss how those responses fit into the scoring scheme. The standard-setting process used to train judges to evaluate open-ended responses, including portfolios, in the Advanced Placement (AP) program of the College Board, for example, alternates between the verbal rubrics laid out in advance and samples of student work from the assessment itself.^{26} Portfolios in the AP Studio Art evaluation are graded by judges who first hold a standard-setting session at which sample portfolios representing all the possible scores are examined and discussed. The samples are used during the judging of the remaining portfolios as references for the readers to use in place of a general scoring rubric. Multiple readings and moderation by more experienced graders help to hold the scores to the agreed standard.^{27} Together, graders create a shared understanding of the rubrics they are to use on the students' work. Examination boards in Britain follow a similar procedure in marking students' examination papers in subjects such as mathematics, except that a rubric is used along with sample examinations discussed by the group to help examiners agree on marks.^{28}
The development of high-quality scoring guides to match new assessment is a fairly recent undertaking. One approach has been first to identify in general terms the levels of desired performance and then to create task-specific rubrics. An example from a New Jersey eighth-grade "Early Warning" assessment appears below.^{29}
A general rubric can be used to support a holistic scoring system, as New Jersey has done, in which the student's response is examined and scored as a whole. Alternatively, a much more refined analytic scheme could be devised in which specific features or qualities of a student's response are identified, according to predetermined criteria, and given separate scores. In the example from New Jersey, one can imagine a rubric that yields two independent scores: one for the accuracy of the numerical answer and one for the adequacy of the explanation.
Assessors are experimenting with both analytic and holistic approaches, as well as an amalgam of the two. For example, in the Mathematics Performance Assessment developed by The Psychological Corporation,^{30} responses are scored along the dimensions of reasoning, conceptual knowledge, communication, and procedures, with a separate rubric for each dimension. In contrast, QUASAR, a project to improve the mathematics instruction of middle school students in economically disadvantaged communities,^{31} uses an approach that blends task-specific rubrics with a more general rubric, resulting in scoring in which mathematical knowledge, strategic knowledge, and communication are considered interrelated components. These components are not rated separately but rather are to be considered in arriving at a holistic rating.^{32} Another approach is through so-called protorubrics, which were developed for the tasks in Measuring Up.^{33} The protorubrics can be adapted for either holistic or analytic approaches and are designed to give only selected characteristics and examples of high, medium, and low responses.
Profound challenges confront the developer of a rating scheme regardless of the system of scoring or the type of rubric used. If a rubric is developed to deal with a single task or a type of task, the important mathematical ideas and processes involved in the task can be specified so that the student can be judged on how well those appear to have been mastered, perhaps sacrificing some degree of interconnectedness among tasks. On the other hand, general rubrics may not allow scorers to capture some important qualities of students' thinking about a particular task. Instead, anecdotal evidence suggests that students may be given credit for verbal fluency or for elegance of presentation rather than mathematical acumen. The student who mentions everything possible about the problem posed in the task and rambles on about minor points the teacher has mentioned in class may receive more credit than a student who has deeper insights into the problem but produces only a terse, minimalist solution. The beautiful but prosaic presentation with elaborate drawings may inappropriately outweigh the unexpected but elegant solution. Such difficulties are bound to arise when communication with others is emphasized as part of mathematical thinking, but they can be dealt with more successfully when assessors include those with expertise in mathematics.
From a Generalized Holistic Scoring Guide to a Specific Annotated Item Scoring Guide
Generalized Scoring Guide:
Student demonstrates proficiency: Score Point = 3. The student provides a satisfactory response with explanations that are plausible, reasonably clear, and reasonably correct, e.g., includes appropriate diagram(s), uses appropriate symbols or language to communicate effectively, exhibits an understanding of the mathematics of the problem, uses appropriate processes and/or descriptions to answer the question, and presents sensible supporting arguments. Any flaws in the response are minor.
Student demonstrates minimal proficiency: Score Point = 2. The student provides a nearly satisfactory response which contains some flaws, e.g., begins to answer the question correctly but fails to answer all of its parts or omits appropriate explanation, draws diagram(s) with minor flaws, makes some errors in computation, misuses mathematical language, or uses inappropriate strategies to answer the question.
Student demonstrates a lack of proficiency: Score Point = 1. The student provides a less than satisfactory response that only begins to answer the question, but fails to answer it completely, e.g., provides little or no appropriate explanation, draws diagram(s) which are unclear, exhibits little or no understanding of the question being asked, or makes major computational errors.
Student demonstrates no proficiency: Score Point = 0. The student provides an unsatisfactory response that answers the question inappropriately, e.g., uses algorithms which do not reflect any understanding of the question, makes drawings which are inappropriate to the question, provides a copy of the question without an appropriate answer, fails to provide any information which is appropriate to the question, or fails to attempt to answer the question.
Specific Problem: What digit is in the fiftieth decimal place of the decimal form of 3/11? Explain your answer.
Annotated Scoring Guide:
3 points: The student provides a satisfactory response; e.g., indicates that the digit in the fiftieth place is 7 and shows that the digits 2 and 7 in the quotient (.272727 …) alternate; the explanation of why 7 is the digit in the fiftieth place is either based on some counting procedure or on the pattern of how the digits are positioned after the decimal point. (The student could read fiftieth as fifteenth or fifth, identify 2 as the digit, and provide an explanation similar to the ones above.)
2 points: The student provides a nearly satisfactory response which contains some flaws, e.g., identifies the pattern of the digits 2 and 7 (.272727 …) and provides either a weak or no explanation of why 7 is the digit in the fiftieth place OR converts 3/11 incorrectly to 3.666 … and provides some explanation of why 6 is the digit in the fiftieth place.
1 point: The student provides a less than satisfactory response that only begins to answer the question; e.g., begins to divide correctly (minor flaws in division are allowed) but fails to identify "the digit" OR identifies 7 as the correct digit with no explanation or work shown.
0 points: The student provides an unsatisfactory response; e.g., either answers the question inappropriately or fails to attempt to answer the question.
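The arithmetic behind the specific problem in the scoring guide above (the fiftieth decimal digit of 3/11) can be checked with a short sketch. The Python function below carries out long division one digit at a time. The name `decimal_digit` and its parameters are illustrative assumptions and are not part of the New Jersey assessment; the sketch only confirms the digits cited in the annotated guide.

```python
def decimal_digit(numerator: int, denominator: int, place: int) -> int:
    """Return the digit in the given decimal place (1 = first place after
    the decimal point) of numerator / denominator, using long division.

    Hypothetical helper written for illustration only.
    """
    if numerator < 0 or denominator <= 0 or place < 1:
        raise ValueError("expected numerator >= 0, denominator > 0, place >= 1")

    # Discard the whole-number part; only the remainder affects the decimals.
    remainder = numerator % denominator
    digit = 0
    for _ in range(place):
        # One long-division step: bring down a zero, then divide.
        digit, remainder = divmod(remainder * 10, denominator)
    return digit


print(decimal_digit(3, 11, 50))  # 7, the answer expected for full credit
print(decimal_digit(3, 11, 15))  # 2, if "fiftieth" is misread as "fifteenth"
print(decimal_digit(11, 3, 50))  # 6, if 3/11 is inverted to 11/3 = 3.666...
```

For 3/11 the remainders cycle 3, 8, 3, 8, and so the digits alternate 2, 7. Every even-numbered place therefore holds 7 and every odd-numbered place holds 2, which is the kind of counting or pattern argument the 3-point response is expected to give.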
Unanticipated responses require knowledgeable graders who can recognize and evaluate them. 
In any case, regardless of the type of rubric, graders must be alert to the unconventional, unexpected answer, which, in fact, may contain insights that the assessor had not anticipated. The likelihood of unanticipated responses will depend in part upon the mathematical richness and complexity of the task. Of course, the greater the chances of unanticipated responses, the greater the mathematical sophistication needed by the persons grading the tasks: the graders must be sufficiently knowledgeable to recognize kernels of mathematical insight when they occur. Similarly, graders must sharpen their listening skills for those instances in which task results are communicated orally. Teachers are uniquely positioned to interpret their students' work on internal and external assessments. Personal knowledge of the students enhances their ability to be good listeners and to recognize the direction of their students' thinking.
There may also be a need for somewhat different rubrics even on the same task because judgment of draft work should be different from judgment of polished work. With problem solving a main thrust of mathematics education, there is a place for both kinds of judgments. Some efforts are under way, for example, to establish iterative processes of assessment: Students work on tasks, handing them in to teachers to receive comments about their work in progress. With these comments in hand, students may revise and extend their work. Again, the work goes to the teacher for comment. This back-and-forth process may continue several times, optimizing the opportunity for students to learn from the assessment. Such a model will require appropriate rubrics for teachers and students alike to judge progress at different points.
REPORTING ASSESSMENT RESULTS
Issues about the dissemination of results are often not confronted until after an assessment has been administered. This represents a missed opportunity, particularly from the perspective of the content principle. Serious attention to what kind of information is needed from the assessment and who needs it should influence the design of the assessment and can help prevent some of the common misuses of assessment data by educators, researchers, and the public. The reporting framework itself must relate to the mathematics content that is important for all students to learn.
There has been a long tradition in external assessment of providing a single overall summary score, coupled in some cases with subscores that provide a more fine-grained analysis. The most typical basis for a summary score has been a student's relative standing among his or her group of peers. There have been numerous efforts to move to other information in a summary score, such as percent mastery in the criterion-related measurement framework. One innovative approach has been taken by the Western Australia Monitoring Standards in Education program. For each of five strands (number; measurement; space; chance and data; algebra) a student's performances on perhaps 20 assessment tasks are arrayed in such a way that overall achievement is readily apparent while at the same time some detailed diagnostic information is conveyed.^{34} NAEP developed an alternative approach to try to give meaning to summary scores beyond relative standing. NAEP used statistical techniques to put all mathematics items on the same mathematics proficiency scale so that sets of items can be used to describe the level of proficiency a particular score represents.^{35} Although these scales have been criticized for yielding misinterpretations about what students know and can do in mathematics,^{36} they represent one attempt to make score information more meaningful.
Similarly, some teachers focus only on the correctness of the final answer on teacher-made tests, with insufficient attention to the mathematical problem solving that preceded it. Implementation of the content principle supports a reexamination of this approach. Problem solving legitimately may involve some false starts or blind alleys; students whose work includes such things are doing important mathematics.
Rather than forcing mathematics to fit assessment, assessment must be tailored to whatever mathematics is important to learn. 
Along with the efforts to develop national standards in various fields, there is a push to provide assessment information in ways that relate to progress toward those national standards. Precisely how such scores would be designed to relate to national standards and what they would actually mean are unanswered questions. Nonetheless, this push also is toward reporting methods that tell people directly about the important mathematics students have learned. This is the approach that NAEP takes when it illustrates what basic, proficient, and advanced mean by giving specific examples of tasks at these levels.
An assessment framework that is used as the foundation for the development of an assessment may provide, at least in part, a lead to how results of the assessment might be reported. In particular, the major themes or components of a framework will give some guidance with regard to the appropriate categories for reporting. For example, the first four dimensions of the Balanced Assessment Project's framework suggest that attention be paid to describing students' performance in terms of thinking processes used and products produced as well as in terms of the various components of content. In any case, whether or not a direct connection between aspects of the framework and reporting categories is made, a determination of reporting categories should affect and be affected by the categories of an assessment framework.
The mathematics in an assessment should never be distorted or trivialized for the convenience of assessment. Design, development, scoring, and reporting of assessments must take into account the mathematics that is important for students to learn.
In summary, rather than forcing mathematics to fit assessment, assessment must be tailored to whatever mathematics is important to assess.