
Currently Skimming:


Pages 111-174

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 111...
... The problem lies not so much with the range of measurement models available, but with the outdated conceptions of learning and observation that underlie most widely used assessments. Further, existing models and methods may appear to be more rigid than they actually are because they have long been associated with certain familiar kinds of test formats and with conceptions of student learning that emphasize general '.
From page 112...
... In this chapter, the three elements are defined more specifically, using terminology from the field of measurement: the aspects of cognition and learning that are the targets for the assessment are referred to as the construct or construct variables, observation is referred to as the observation model, and interpretation is discussed in terms of formal statistical methods referred to as measurement models. The methods and practices of standard test theory constitute a special type of reasoning from evidence.
From page 113...
... path analyses, Lazarsfeld's (1950) latent class models, item response theory (Langley, 1943)
From page 114...
... The notion of "telling stories that match up with what we see" corresponds to the technical concept of conditional independence in formal probability-based reasoning. Conditional independence means that any systematic relationships among multiple observations are due entirely to the unobservable construct variables they tap.
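A minimal sketch of what conditional independence buys computationally, assuming dichotomous (right/wrong) items and made-up response probabilities. The function name and the numbers are illustrative, not from the chapter: once the construct value is fixed, the probability of a whole response pattern is just the product of the per-item probabilities.

```python
from math import prod

def joint_probability(responses, p_correct_given_construct):
    """P(response pattern | construct value) under conditional independence.

    responses: list of 0/1 item scores.
    p_correct_given_construct: hypothetical P(correct) for each item,
    already conditioned on one fixed value of the construct variable.
    """
    return prod(
        p if r == 1 else 1.0 - p
        for r, p in zip(responses, p_correct_given_construct)
    )

# Illustrative values only: three items, one fixed construct value.
pattern_probability = joint_probability([1, 1, 0], [0.8, 0.6, 0.5])
```

If responses remained related to one another after conditioning on the construct, this product would misstate the joint probability, which is the sense in which the "story" would fail to match the observations.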
From page 115...
... Nevertheless, the BEAR example illustrates many of the principles that the committee is setting forth, including the need to pay attention to all three vertices of the assessment triangle and how they fit together. The IEY curriculum developers have conceptualized the learner as progressing along five progress variables that organize what students are to learn into five topic areas and a progression of concepts and skills (see Box 4-1).
From page 116...
... Observations of student performance consist of assessment tasks (which are embedded in the instructional program, and each of which has direct links to the progress variables) and link tests (which are composed of short-answer items also linked to the progress variables)
From page 117...
... The interpretation of these judgments is carried out using progress maps, graphic displays used to record the progress of each student on particular progress variables over the course of the year. The statistical underpinning for these maps is a multidimensional item response model (explained later)
From page 118...
... of the learner, representing relatively more or less of the competency that is common to the set of items and their responses. This can be summarized graphically as in Figure 4-2, where the latent construct variable θ (represented inside an oval shape in the figure to denote that it is unobservable)
From page 119...
... . The representation in Figure 4-2 corresponds to a class of measurement models called item response models, which are discussed below.
From page 120...
... The observation model is simplified to focus only on the sum of the responses, with the individual item responses being omitted (see Figure 4-3). For example, if a CTT measurement model were used in the BEAR example, it would take the sum of the student scores on a set of assessment tasks as the observed score.
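A minimal sketch of that simplification, using made-up item scores (the function name and data are illustrative, not from the chapter). Two different response patterns collapse to the same observed score, which is exactly the item-level information the CTT observation model sets aside.

```python
def observed_score(item_scores):
    """CTT-style observed score: the sum of the item scores."""
    return sum(item_scores)

# Hypothetical students with different response patterns on four tasks.
student_a = [1, 0, 1, 1]
student_b = [0, 1, 1, 1]

same_observed_score = observed_score(student_a) == observed_score(student_b)
```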
From page 121...
... Because of serious practical limitations, however, other theories such as generalizability theory, item response modeling, and factor analysis were developed to enable study of aspects of items. Generalizability Theory The purpose of generalizability theory (often referred to as G-theory)
From page 122...
... First, they allow one to characterize how the conditions under which the observations were made affect the reliability of the evidence. Second, this information is expressed in terms that allow one to project from the current assessment design to other potential designs.
From page 123...
... Furthermore, with IRM it is possible to predict the properties of a test from the properties of the items of which it is composed. In IRM, the construct model is still represented as a single continuous variable, but the observation model is expressed in terms of the items (as in Figure 4-5).
From page 124...
... As with CTT, with IRM one can still tell a story about a student's proficiency with regard to the latent construct. One can now, additionally, talk about what tends to happen with specific items, as expressed by their item parameters.
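A minimal sketch of how item parameters enter an item response model, assuming the one-parameter (Rasch) form with a single difficulty per item. The excerpt does not commit to a particular IRM here, and the function names and difficulty values are illustrative. The second function shows one sense in which test properties follow from item properties: the expected test score is the sum of the item probabilities.

```python
import math

def p_correct(theta, difficulty):
    """Rasch-form probability of a correct response.

    theta: student location on the latent construct.
    difficulty: item location on the same scale.
    """
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def expected_test_score(theta, difficulties):
    """Expected number-correct score, built up from the items."""
    return sum(p_correct(theta, b) for b in difficulties)

# Hypothetical three-item test and one student located at theta = 0.
difficulties = [-1.0, 0.0, 1.0]
score_at_zero = expected_test_score(0.0, difficulties)
```

A student located exactly at an item's difficulty has a probability of 0.5 on that item; easier items (lower difficulty) push the probability up and harder items push it down.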
From page 125...
... Figure 4-5 can also be used to portray a one-dimensional factor analysis, although, in its traditional formulation, factor analysis differs somewhat from IRM. Like IRM, unidimensional factor analysis models the relationship between a latent construct or factor (e.g., mathematics computation skill)
From page 126...
... In practice very few tests are constructed in a way that would allow the items to be truly considered a random sampling from an item population. Latent Class Models In the measurement approaches described thus far, the latent construct has been assumed to be a continuous variable.
From page 127...
... If one assumed that a student's responses are "caused" by being in ordered latent classes corresponding to the successive scores in the Designing and Conducting Investigations scoring guide, one could construct something like the progress map in Figure 4-6, although the vertical dimension would lose its metric and become a set of four categories. For interpretation purposes, this map would probably be just about as useful as the current one.
From page 128...
... Note that in this example, one might have analyzed the results separately for each of the progress variables and obtained four independent IRM estimations of the student and item parameters, sometimes referred to as a consecutive approach (Adams, Wilson, and Wang, 1997).
From page 129...
... This approach is not the same, however, as explicitly modeling changes in performance. The account that follows should make clear that quite flexible and complex formal models of growth and change are available to complement the status models described in the previous section.
From page 130...
... This can be done with each of the three types of models described above: true-score models (e.g., CTT), models with continuous latent variables (e.g., IRM)
From page 131...
... 4 CONTRIBUTIONS OF MEASUREMENT AND STATISTICAL MODELING TO ASSESSMENT. FIGURE 4-10 Families of linear models: "slopes as outcomes." Three panels plot the outcome against time: four regression lines varying in intercept; four regression lines varying in slope (y = 2.0x, y = 1.5x, y = 1.0x, y = 0.5x); and four regression lines varying in intercept and slope. Adapted from Kreft and De Leeuw (1998, pp.
From page 132...
... KNOWING WHAT STUDENTS KNOW. FIGURE 4-11 Families of linear models: "random coefficients." Adapted from Kreft and De Leeuw (1998, pp.
From page 133...
... Modeling of Change In Continuous Latent Variables There are a number of approaches that extend the above ideas into the continuous latent variables domain. One such approach is to modify the true-score approaches to incorporate measurement error; an example is the "V-known" option in the HLM software (Raudenbush, Bryk, and Congdon, 1999)
From page 134...
... and PANMARK (van de Pol, Langeheine, and de Jong, 1989). INCORPORATION OF COGNITIVE ELEMENTS IN EXISTING MEASUREMENT MODELS The array of models described in the previous section represents a formidable toolkit for current psychometrics.
From page 135...
... Second, if we indeed value clinical judgment and a diversity of opinions among appraisers (such as certainly occurs in professional settings and postsecondary education), we will have to revise our notions of high-agreement reliability as a cardinal symptom of a useful and viable approach to scoring student performance....
From page 136...
... And the third call, for something beyond "a single, summary statistic for student performance," could be addressed using multidimensional item response models or, more broadly, the range of multiattribute models (as in Figure 4-7, above)
From page 137...
... Enhancement Through Diagnostics Another fairly common type of enhancement in educational applications is the incorporation of diagnostic indices into measurement models to add richer interpretations. For example, as noted above, the call for something beyond "a single, summary statistic for student performance" could be addressed using multidimensional IRMs; this assumes, however, that the ... Footnote 6: See Chapter 5 for discussion of norm-referenced vs.
From page 138...
... BOX 4-3 Reporting Individual Achievement in Spelling. Pictured below is a developmental continuum that has been constructed as a map for monitoring children's developing competence in spelling.
From page 139...
... BOX 4-4 Keymath Diagnostic Arithmetic Test. Here the range for successive grades is shown on the left side of the figure. Shown on the right are arithmetic tasks that are typically learned during that year.
From page 140...
... BOX 4-5 Map of Writing Achievement at the National Level. The figures below illustrate the use of progress maps to depict changes in development over time, applied in this case to writing achievement at the national level.
From page 141...
... portion of the writing achievement continuum. In the ... panel are "box and whisker" plots representing student achievement as found in a national survey of Australian school students.
From page 142...
... For example, within the Rasch approach to continuous IRM, a technique has been developed for interpreting discrepancies in the prediction of individual student responses to individual items. (These types of diagnostics can also be constructed for items with most IRM software packages.)
From page 143...
... Regardless of whether one finds these enhancements satisfying in a formal statistical modeling sense, there are three important points to be made about the use of such strategies. First, they provide a bridge from familiar to new and from simpler to more complex formulations; thus they may be needed to aid the comprehension of those not immersed in the details of measurement modeling.
From page 144...
... to be ones the student would be more likely to get right; we would expect items located above the student to be ones the student would be more likely to get wrong; and we would not be surprised to see items located near the student to be gotten either right or wrong. The exact delineation of "near" is somewhat arbitrary, and the Quest authors have chosen a particular value that we will not question here.
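A minimal sketch of that reading of the map, assuming student and item locations sit on the same scale. The function name and the `near_band` width are hypothetical; in particular, 0.5 is an arbitrary illustration and not the value chosen by the Quest authors.

```python
def expectation_for_item(student_location, item_location, near_band=0.5):
    """Classify what we would expect on an item from its map position.

    near_band is a hypothetical half-width for "near the student";
    it is NOT the value used in Quest.
    """
    gap = item_location - student_location
    if abs(gap) <= near_band:
        return "either"          # near the student: right or wrong is unsurprising
    if gap > 0:
        return "likely wrong"    # item located above the student
    return "likely right"        # item located below the student

# Hypothetical student at 0.0 and three items at different heights.
expectations = [expectation_for_item(0.0, loc) for loc in (-2.0, 0.3, 1.8)]
```

A response that contradicts the "likely right" or "likely wrong" expectation is the kind of discrepancy such diagnostic displays are meant to surface.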
From page 145...
... . Reprinted with permission of the Australian Council for Educational Research, Ltd.
From page 146...
... In this figure, the expected performance on each item is shown by the gray band through the middle of the figure. The actual responses for each item are shown by the height of the darker gray shading in each of the columns (representing individual items)
From page 147...
... ADDING COGNITIVE STRUCTURE TO MEASUREMENT MODELS The preferred statistical strategy for incorporating substantive structure into measurement models is to make the measurement model more complex by adding new parameters.
From page 148...
... Hierarchization Returning to the Wolf et al. quotation given earlier, their initial call was for "a developmentally ordered series of accomplishments." A statistical approach to this would be to posit a measurement model consisting of a series of latent classes, each representing a specific type of task.
From page 149...
... There are a number of ways to develop measurement models suitable for hierarchical contexts, depending on which of the several approaches outlined above (true-score models, IRM, and latent class modeling) one is using. For example, hierarchical factor analysis has been used to postulate hierarchies of dimensions (i.e., dimensions that "cause" subdimensions)
From page 150...
... FIGURE 4-15 Latent class model with a hierarchy on the student construct model.
From page 151...
... In this approach, students are considered to be members of one of a set of latent classes, but each latent class may be characterized by a latent continuum or some other latent structure (for an alternative formulation, see Yamamoto and Gitomer, 1993). For example, Wilson's (1984, 1989)
From page 152...
... Two examples are reviewed in the next section: the unified model and M2RCML. A third example, referred to as Bayes nets, is then described. Unified Model and M2RCML The first instance to be described was developed specifically for the sort of case in which it can be assumed that students' performance on tasks can be categorized into distinct and qualitative latent classes.
From page 153...
... Second, it encompasses a class of measurement models, based on Rasch modeling, which the authors view as being particularly favorable to the development of cognitive theories. According to the model, in an observable situation, the knowledge a student has determines the actions he or she selects to achieve a desired goal.
From page 154...
... , greater flexibility is needed, and that is one of the possibilities offered by the approach described in the next section. Bayes Nets A more general modeling approach that has proven useful in a wide range of applications is Bayesian inference networks, also called Bayes nets
From page 155...
... Conditional independence relationships introduced earlier in this chapter play a key role, both conceptually and computationally. The basic idea of conditional independence in Bayes nets is that the important interrelationships among even a large number of variables can be expressed mainly in terms of relationships within relatively small, overlapping, subgroups of these variables.
From page 156...
... The applications of Bayes nets to assessment described below have this character, reflecting their heritage in a psychometric history even as they attack problems that lie beyond the span of standard models. After presenting the rationale for Bayes nets in assessment, we provide an example and offer some speculations on the role of Bayes nets in assessment in the coming years.
From page 157...
... That is, even given the student's knowledge, skills, and strategy preference and given the demands of an item under both of the strategies, we cannot say for sure whether the student will solve the problem correctly; the best we can do is model the probability that he or she will do so. Bayes nets rely on conditional independence relationships.
From page 158...
... TABLE 4-2 Skill Requirements for Fraction Items. Columns: item number, item text, skills used if Method A is used, and skills used if Method B is used; an x marks each skill an item requires under the given method.
From page 159...
... Figure 4-17 depicts base rate probabilities of students possessing a certain skill and getting a certain item correct, or the prior knowledge one would have about a student known to use method B before observing any of the student's responses to the items. Figure 4-18 shows how one's beliefs about a particular student change after seeing that student answer correctly ... Footnote 9: A recursive representation of the joint distribution of a set of random variables X,.
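A minimal sketch of the kind of belief updating described above, reduced to one skill node and one item node. All names and probabilities are made up for illustration and are not taken from Figures 4-17 or 4-18; a full Bayes net would propagate the same calculation through many linked variables.

```python
def posterior_skill(prior_has_skill, p_correct_with_skill,
                    p_correct_without_skill, answered_correctly):
    """Bayes' rule for P(student has the skill | one item response)."""
    if answered_correctly:
        like_with = p_correct_with_skill
        like_without = p_correct_without_skill
    else:
        like_with = 1.0 - p_correct_with_skill
        like_without = 1.0 - p_correct_without_skill
    numerator = prior_has_skill * like_with
    denominator = numerator + (1.0 - prior_has_skill) * like_without
    return numerator / denominator

# Hypothetical base rate of 0.5; the item is usually answered correctly
# with the skill (0.9) and seldom without it (0.2).
after_correct = posterior_skill(0.5, 0.9, 0.2, answered_correctly=True)
after_incorrect = posterior_skill(0.5, 0.9, 0.2, answered_correctly=False)
```

With these illustrative numbers, a correct answer raises the belief that the student has the skill above the base rate, and an incorrect answer lowers it.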
From page 160...
... Basic fraction subtraction (Skill 1); Simplify/... (Skill 2)
From page 161...
... FIGURE 4-17 Base rate probabilities for Method B. NOTE: Bars represent probabilities, summing to one for all possible values of a variable.
From page 162...
... FIGURE 4-18 Updated probabilities for Method B following item responses. NOTE: Bars represent probabilities, summing to one for all the possible values of a variable.
From page 163...
... Each item now has three parents: minimally sufficient sets of procedures under Method A and Method B plus a new node, "Is the student using Method A or Method B?
From page 164...
... Situations involving such mixes pose notoriously difficult statistical problems, and carrying out inference in the context of this more ambitious student model would certainly require the richer information mentioned above. Some intelligent tutoring systems of the type described in Chapter 3 make use of Bayes nets, explicitly in the case of VanLehn's OLAE tutor (Martin and VanLehn, 1993, 1995)
From page 165...
... MODELING OF STRATEGY CHANGES In the preceding account, measurement models were discussed in order of increasing complexity with regard to how aspects of learning are modi... Footnote 10: This section draws heavily on the commissioned paper by Brian Junker. For the paper, go to .
From page 166...
... Many models for Case 1 (that is, modeling strategy changes among students, but assuming that strategy is constant across assessment tasks) are variations on the latent class model of Figure 4-8.
From page 167...
... In fact, one can build a version of the Mislevy and Verhelst model that does much the same thing; one simply builds the latent class model within task instead of among tasks. It is not difficult to build the full model or to formulate estimating equations for it.
From page 168...
... Measurement models currently available can support many of the kinds of inferences that cognitive science suggests are important to pursue. In particular, it is now possible to characterize students in terms of multiple aspects of proficiency, rather than a single score; chart students' progress over time, instead of simply measuring performance at a particular point in time; deal with multiple paths or alternative methods of valued performance; model, monitor, and improve judgments on the basis of informed evaluations; and model performance not only at the level of students, but also at the levels of groups, classes, schools, and states.
From page 169...
... The long-standing tradition of leaving scientists, educators, task designers, and psychometricians each to their own realms represents perhaps the most serious barrier to progress. ANNEX 4-1: AN APPLICATION OF BAYES NETS IN AN INTELLIGENT TUTORING SYSTEM As described in Chapter 3, intelligent tutoring systems depend on some form of student modeling to guide tutor behavior.
From page 170...
... ANNEX FIGURE 4-1 Simplified version of portions of the inference network through which the HYDRIVE student model is operationalized and updated. Legible node labels include Use of Gauges, Evaluations of Canopy Actions, and Evaluations of Landing Gear Actions. NOTE: Bars represent probabilities, summing to one for all the possible values of a variable.
From page 171...
... Potential observable variables cannot be predetermined and uniquely defined in the manner of usual assessment items since a student could follow countless paths through the problem. Rather than attempting to model all possible system states and specific possible actions within them, HYDRIVE posits equivalence classes of system-situation states, each of which could arise many times or not at all in a given student's work.
From page 172...
... The grain size and the nature of a student model in an intelligent tutoring system should be compatible with the instructional options available (Kieras, 1988). The subsystem and strategy student-model variables in HYDRIVE summarize patterns in troubleshooting solutions at the level addressed by the intelligent tutoring system's instruction.
From page 173...
... Assessment Design and Use: Principles, Practices, ...
From page 174...
... Whenever the top digit in a column is 0, the student writes the bottom digit in the answer; i.e., 0 - N = N. Whenever the top digit in a column is 0, the student writes 0 in the answer; i.e., 0 - N = 0.
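A minimal sketch of how these two bug rules could be simulated for multi-digit subtraction. Only the two rules themselves come from the excerpt. The function name, the rule labels, and the assumptions that the student subtracts correctly (with borrowing) in every other column and drops any pending borrow when a bug fires are illustrative choices.

```python
def buggy_subtract(top, bottom, bug):
    """Column-by-column subtraction with one bug applied at zero top digits.

    bug: "0-N=N" (write the bottom digit) or "0-N=0" (write 0).
    Assumption: all other columns are worked correctly, and a pending
    borrow is dropped whenever the bug rule is applied.
    """
    if bug not in ("0-N=N", "0-N=0"):
        raise ValueError("unknown bug rule")
    if not 0 <= bottom <= top:
        raise ValueError("expects 0 <= bottom <= top")
    top_digits = [int(d) for d in reversed(str(top))]
    bottom_digits = [int(d) for d in reversed(str(bottom))]
    bottom_digits += [0] * (len(top_digits) - len(bottom_digits))

    answer_digits = []
    borrow = 0
    for t, b in zip(top_digits, bottom_digits):
        if t == 0 and b != 0:
            # The bug fires: no borrowing is attempted in this column.
            answer_digits.append(b if bug == "0-N=N" else 0)
            borrow = 0
            continue
        t -= borrow
        if t < b:
            t += 10
            borrow = 1
        else:
            borrow = 0
        answer_digits.append(t - b)
    return int("".join(str(d) for d in reversed(answer_digits)))

# Hypothetical item: 405 - 123 (the correct answer is 282).
answer_with_first_bug = buggy_subtract(405, 123, "0-N=N")
answer_with_second_bug = buggy_subtract(405, 123, "0-N=0")
```

Because each rule yields a distinct wrong answer on the same item, a pattern of responses across items can point to which rule, if any, a student is following.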


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.