3 Assessment Methods for College Competencies
Pages 77-116

From page 77...
... The quality of assessment matters. This chapter focuses on the nature and quality of existing assessments of the identified competencies; summarizes well-established principles of assessment development and validation that higher education stakeholders should keep in mind as they develop, select, and/or use assessment to support student success; and considers available options and future directions for improving assessment practices.
From page 78...
... The chapter ends with conclusions and recommendations.

CURRENT ASSESSMENT METHODS

The committee examined the status of current assessments based on its review of the literature on intra- and interpersonal competencies, focusing primarily on the eight identified competencies.
From page 79...
... and the eight identified competencies in particular. Among the 87 assessment instruments used across the intervention studies discussed in Chapter 2, 74 (85%)
From page 80...
... Despite their prevalence in assessments of the eight competencies, self-rating scales have several well-known limitations, which are discussed later in this chapter. The first is social desirability: respondents may distort their responses to avoid embarrassment and project a favorable image (Zerbe and Paulhus, 1987)
From page 81...
... Current assessments of the eight competencies identified in Chapter 2 do not rely on others' ratings, although ETS formerly offered the Personal Potential Index (PPI), which used this approach to measure several competencies, including planning and organization -- behaviors related to conscientiousness.
From page 82...
...

Interviews

Interviews are common in higher education admissions, particularly for graduate school, medical school, and other professional schools. They are likely intended, even if implicitly and imperfectly, to tap some of the eight identified intrapersonal competencies, such as growth mindset, and they suffer from the same limitations that characterize other self-report measures.
From page 83...
... In the K-12 arena, for example, it is becoming increasingly common to ask students to assess their effort and efficacy in completing their work. Similarly, students' responses to challenging problems provided indicators of behaviors related to conscientiousness in a number of the intervention studies discussed in Chapter 2.
From page 84...
... SJTs differ from self- and others' ratings in that they provide a hypothetical situation and ask the respondent to select the most appropriate response to that situation from a set of possibilities or to rate the appropriateness of each possibility. To date, assessment instruments designed to measure the eight competencies identified in Chapter 2 have not used this format.
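To make the format concrete, the following is a minimal scoring sketch for a "select the most appropriate response" SJT; the item identifiers, answer key, and responses are hypothetical illustrations, not items from any instrument discussed in this report.

```python
# Minimal sketch of keyed scoring for a hypothetical situational judgment test.
SJT_KEY = {          # item id -> keyed (most appropriate) option
    "item_1": "B",
    "item_2": "D",
    "item_3": "A",
}

def score_sjt(responses: dict[str, str]) -> int:
    """Count how many items were answered with the keyed option."""
    return sum(responses.get(item) == key for item, key in SJT_KEY.items())

# A hypothetical respondent who matches the key on two of three items.
print(score_sjt({"item_1": "B", "item_2": "C", "item_3": "A"}))  # -> 2
```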
From page 85...
... These assessment methods tend to address implicitly some of the eight identified competencies, including behaviors related to conscientiousness, intrinsic goals and interest, growth mindset, and academic self-efficacy. However, these methods are not evident in the research reviewed in Chapter 2.
From page 86...
... These examples are exceptions, however. Overall, the assessments used to measure each of the eight identified competencies differed across investigators.
From page 87...
...

Reliability

Reliability refers to the degree to which "test scores for a group of test takers are consistent over repeated applications of a measurement procedure and hence are inferred to be dependable and consistent for an individual test taker" (American Educational Research Association et al., 2014, p.
From page 88...
... The Standards point to construct underrepresentation -- defined as "the extent to which a test fails to capture important aspects of the construct domain that the test is designed to measure, resulting in test scores that do not fully represent that construct" (American Educational Research Association et al., 2014, p.
From page 89...
... . In fact, assessment instruments used to measure all of these competencies might contain similar or identical item content.
From page 90...
... . For example, because sense of belonging and growth mindset are conceptually distinct constructs,
From page 91...
... , and then correlating scores from the assessment with such outcome measures as graduation, GPA, absenteeism, time to degree, or any number of academic outcomes valued by higher education leaders. Evidence supporting the interpretation of test scores as indicating readiness for college success comes from these correlations between test scores and indicators of college success.
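As a concrete illustration of this kind of criterion-related evidence, the sketch below correlates assessment scores with first-year GPA using a Pearson correlation; all numbers are invented for illustration only.

```python
# Minimal sketch: validity coefficient between competency scores and an outcome.
import numpy as np

# Hypothetical scale means for eight students and their hypothetical first-year GPAs.
assessment_scores = np.array([3.1, 4.2, 2.8, 3.9, 4.5, 2.5, 3.4, 4.0])
first_year_gpa    = np.array([2.9, 3.6, 2.4, 3.2, 3.8, 2.6, 3.0, 3.5])

# Pearson correlation between test scores and the college-success indicator.
r = np.corrcoef(assessment_scores, first_year_gpa)[0, 1]
print(f"validity coefficient r = {r:.2f}")
```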
From page 92...
... A more immediate validity issue concerns faking responses on intra- and interpersonal competency assessments, particularly if they include Likert-scale response items, which is the case for 85 percent of the assessments used in the interventions reviewed. Respondents can be tempted to use the extremes of the scale -- for example, to endorse all positive statements as "most like me" and all negative statements as "least like me." Indeed, students and applicants are often motivated to appear diligent, enthusiastic, and appreciative to those for whom they are completing an assessment (e.g., faculty members, potential employers, even researchers)
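One simple, illustrative way to screen for the extreme-responding pattern described above is to compute the share of a respondent's Likert items endorsed at either scale endpoint, as sketched below; the data and the flagging threshold are hypothetical and this is not a validated faking detector.

```python
# Minimal sketch: flag respondents whose answers cluster at the Likert endpoints.
import numpy as np

def extreme_response_rate(responses: np.ndarray, low: int = 1, high: int = 5) -> np.ndarray:
    """responses: respondents x items matrix on a 'low'..'high' Likert scale."""
    at_endpoints = (responses == low) | (responses == high)
    return at_endpoints.mean(axis=1)   # per-respondent proportion of extreme answers

likert = np.array([
    [5, 5, 1, 5, 5, 1],   # nearly all endpoints: possible extreme responding
    [4, 3, 2, 4, 3, 3],   # mostly mid-scale responding
])
rates = extreme_response_rate(likert)
flagged = rates > 0.8      # hypothetical screening threshold, not a validated cutoff
print(rates, flagged)
```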
From page 93...
... . Situational judgment tests and performance tests, discussed earlier, and forced-choice formats, discussed below, are intended to reduce the possibility of faking on intra- and interpersonal competency assessments and thereby increase the validity of scores.
From page 94...
... (American Educational Research Association et al., 2014, p.
From page 95...
...

THE QUALITY OF CURRENT ASSESSMENTS OF THE IDENTIFIED COMPETENCIES

This section considers evidence on the reliability, validity, and fairness of existing assessments of the eight intra- and interpersonal competencies identified in Chapter 2.
From page 96...
... As shown in Table 3-1, of the 46 studies that assess at least one of the eight competencies, fewer than half provide evidence of the reliability of the assessments used. Studies reporting on reliability almost uniformly report coefficient alpha, a measure of internal consistency.
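For readers unfamiliar with the statistic, the sketch below computes coefficient (Cronbach's) alpha from a small matrix of item responses; the data are invented and the code illustrates only the standard internal-consistency formula.

```python
# Minimal sketch of coefficient (Cronbach's) alpha for a short Likert scale.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scored Likert responses."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the scale total
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses from six students to a four-item scale scored 1-5.
responses = np.array([
    [4, 4, 5, 4],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
    [3, 2, 3, 3],
])
print(f"coefficient alpha = {cronbach_alpha(responses):.2f}")
```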
From page 97...
...

Assessment Quality in Established Instruments

To date, relatively few assessment instruments measuring one or more of the eight competencies have undergone enough research, development, and testing to yield durable evidence of reliability, validity, and fairness. That said, there are a few notable exceptions.
From page 98...
... . The relationship between the 15 scales and final course grades showed modest evidence of predictive validity, with self-efficacy for learning and performance showing the highest validity (r = 0.41)
From page 99...
...

Engage and SuccessNavigator

ACT and ETS, the major publishers of college admissions tests, have each developed assessments of college readiness -- Engage and SuccessNavigator, respectively -- that measure a few of the eight competencies identified by the committee, along with a range of other competencies.
From page 100...
...

TOWARD BETTER MEASUREMENT

The committee's analysis of the quality of existing assessments of the eight identified competencies indicates room for improvement.
From page 101...
... In addition, test specifications lay out a detailed plan for who will be tested and how the test will work. The Standards (American Educational Research Association et al., 2014, p.
From page 102...
... The Standards also suggest that the process of documenting the validity of the interpretation of test scores starts with the rationale for the test specifications. The specifications should be subject to external review of their quality by qualified and diverse experts (American Educational Research Association et al., 2014, p.
From page 103...
... . In the present context, universal design means designing items that will be accessible to as wide a range of the intended examinee population as possible -- for example, by eliminating unnecessarily complex language (when such language is construct-irrelevant)
From page 104...
...

Test Administration

Test administration encompasses proctoring, the qualifications of those involved in administering the test, security, timing, translations, and issues associated with accommodations for test takers with special needs. All of these issues are considered to ensure that an assessment measures the construct it is intended to measure and to minimize the effects of cheating, adverse testing conditions, and other factors that might otherwise introduce construct-irrelevant bias or variance in test scores.
From page 105...
...

New Testing Techniques Supporting Improvement

Rigorous test development processes can help improve the quality of existing measures of the eight competencies, as can new approaches to ameliorating some of the shortcomings of self-report measures, the most common type of measure currently used to assess the eight identified competencies.
From page 106...
... An example item reflecting sense of belonging can be found in the PISA 2012 survey (OECD, 2013b): "I feel like I belong at school." Students report their response on the four-point Likert scale "strongly agree," "agree," "disagree," and "strongly disagree." A corresponding anchoring vignette for this item might be something like the following: "After a class lecture, Rodrigo will discuss the class with his peers comfortably and without a sense of competition.
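A minimal sketch of the rescoring idea behind anchoring vignettes follows: a student's self-rating is re-expressed relative to how that same student rated a set of vignettes ordered from low to high on the construct, which helps adjust for person-specific use of the Likert scale. The function, scale coding (4 = "strongly agree" through 1 = "strongly disagree"), and data are hypothetical simplifications of nonparametric vignette scoring, not the scoring used in PISA.

```python
# Minimal sketch: re-express a self-rating relative to a respondent's own
# ratings of anchoring vignettes (ordered from lowest to highest on the construct).

def vignette_adjusted_score(self_rating: int, vignette_ratings: list[int]) -> int:
    """Return the count of vignette ratings at or below the self-rating,
    i.e., the self-rating's rank position among the anchors (higher = more
    of the construct after adjusting for scale use)."""
    return sum(self_rating >= v for v in vignette_ratings)

# Two students give the same raw self-rating of 3 but anchor the scale differently.
print(vignette_adjusted_score(3, [1, 2, 4]))  # -> 2
print(vignette_adjusted_score(3, [1, 2, 3]))  # -> 3
```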
From page 107...
... Results of an experimental laboratory study suggest that Likert scale and forced-choice methods provided similar information about respondents with respect to the traits measured, although assessment scores in both formats were affected by instructed faking conditions (Heggestad et al., 2006)
From page 108...
... used a technology-assisted "diligence task" to measure academic self-regulation, a behavior related to conscientiousness. Designed to mirror students' real-world behavior when trying to complete homework in the face of digital distractions, the task gave participants the choice of completing boring single-digit subtraction problems or consuming media (either watching brief viral videos or playing the video game Tetris)
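One plausible way such a task could be scored is the proportion of session time spent on the academic problems rather than on the media distraction, sketched below; the event-log format and field names are hypothetical and are not taken from the study described above.

```python
# Minimal sketch: score a diligence task as the fraction of time spent on-task.
from typing import List, Tuple

def diligence_score(log: List[Tuple[str, float]]) -> float:
    """log: (activity, seconds) pairs, where activity is 'math' or 'media'."""
    on_task = sum(sec for activity, sec in log if activity == "math")
    total = sum(sec for _, sec in log)
    return on_task / total if total else 0.0

# Hypothetical session log for one participant.
session = [("math", 120.0), ("media", 45.0), ("math", 90.0), ("media", 15.0)]
print(f"proportion of time on task = {diligence_score(session):.2f}")
```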
From page 109...
... A few of the intervention studies in Chapter 2 used multiple assessment methods to gather additional information about the target competency (see Appendix B)
From page 110...
... Nonetheless, it must be recognized that the context of intra- and interpersonal competency assessment is wide-ranging and encompasses numerous individual, group, and institutional entities operating and interacting simultaneously (e.g., diverse students and peer groups, instructors with varying roles and experience, classrooms with the potential to create and facilitate opportunities to exhibit and develop intra- and interpersonal competencies, institutions and departments that help establish both mission and culture)
From page 111...
... In addition to answering this sort of question, multilevel models can incorporate longitudinal data (e.g., whether a construct measured among instructors at time A affected another construct among students at time B); measurement error variance (e.g., modeling relationships while accounting for the fact that psychological measures are never perfectly reliable)
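As a concrete illustration, the sketch below fits a two-level random-intercept model (students nested in classrooms) to simulated data using statsmodels; the variable names and data-generating values are hypothetical and chosen only to show the modeling setup.

```python
# Minimal sketch: two-level multilevel model with students nested in classrooms.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_classrooms, n_students = 20, 25
classroom = np.repeat(np.arange(n_classrooms), n_students)

# Simulated classroom-level (level-2) and student-level (level-1) effects.
classroom_effect = rng.normal(0, 0.3, n_classrooms)[classroom]
self_efficacy = rng.normal(0, 1, n_classrooms * n_students)
gpa = 3.0 + 0.2 * self_efficacy + classroom_effect + rng.normal(0, 0.4, n_classrooms * n_students)

data = pd.DataFrame({"gpa": gpa, "self_efficacy": self_efficacy, "classroom": classroom})

# Random intercept for classroom; fixed effect of the student-level competency score.
model = smf.mixedlm("gpa ~ self_efficacy", data, groups=data["classroom"])
result = model.fit()
print(result.summary())
```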
From page 112...
... The test development practices used to create assessments of cognitive knowledge and skills that meet professional standards are equally applicable to intra- and interpersonal competency assessments.

Current Assessments

The committee examined the assessments used in the intervention studies targeting the eight competencies identified above and commissioned a literature search on measurement of these competencies.
From page 113...
... RECOMMENDATION 6: Institutions of higher education should not make high-stakes decisions based solely on current assessments of the eight identified competencies, given the relatively limited research to date demonstrating their validity for predicting college success.

Assessments for Low-Stakes Purposes

Researchers and practitioners in higher education also use assessments for low-stakes purposes, such as to evaluate the quality of interventions, policies, and instructional practices or simply to monitor student change over time.
From page 114...
...

Definition of Constructs Being Assessed

After reviewing both general principles for assessment development and use and recent research on measurement of the eight identified competencies, the committee concluded that defining each competency clearly and comprehensively is a critical first step in developing high-quality assessments. Clear definitions are especially important in light of the wide variety of terms used for these competencies.
From page 115...
... RECOMMENDATION 8: Federal agencies and foundations should support additional research, development, and validation of new intra- and interpersonal competency assessments that address the shortcomings of existing measures.

Fairness in Assessment

The Standards make clear that fairness to all individuals for whom an assessment is intended should be a driving concern throughout assessment development, validation, and use.
From page 116...
... RECOMMENDATION 9: Researchers and practitioners in higher education should consider evidence on fairness during the development, selection, and validation of intra- and interpersonal competency assessments.

Consideration of Contextual Factors

Self-, peer, or instructor ratings of such an intrapersonal competency as conscientiousness or such an interpersonal competency as teamwork may vary depending on local norms (e.g., reference group effects)

