
4 The Methodologies Used to Derive Two Illustrative Rankings
Pages 49-64

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 49...
... Users of rankings should clearly understand the basis of the ranking, the choice of measures, and the source and extent of uncertainty in them. It is highly unlikely that rankings calculated from composite measures will serve all or even most purposes in comparing the quality of doctoral programs.
From page 50...
... The rankings were calculated using the opinions of faculty in each program area about both what was important to program quality in the abstract and, separately, how experts implicitly valued the same measures when asked to rate the quality of specific programs. • Transparent.
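Both sources of opinion ultimately yield a vector of weights that is applied to a set of standardized program measures. The sketch below is a minimal illustration of that final step only, with invented measure values and weights; it is not the committee's formula, and the weight vector could come either from stated importance (the survey-based ranking) or from importance inferred from ratings of specific programs (the R ranking discussed later in the chapter).

```python
import numpy as np

# Toy data: rows are programs, columns are standardized measures
# (e.g., publications per faculty member, citations, grants).
# Values and names are illustrative, not drawn from the actual study.
measures = np.array([
    [ 0.8,  1.2, -0.3],   # Program A
    [-0.2,  0.4,  1.1],   # Program B
    [ 1.5, -0.6,  0.2],   # Program C
])

# A weight vector could come from either source described in the excerpt:
# stated importance (survey-based) or importance implied by program ratings.
weights = np.array([0.5, 0.3, 0.2])   # assumed non-negative, summing to 1

# Composite score: weighted sum of standardized measures for each program.
scores = measures @ weights

# Rank programs from highest to lowest composite score (1 = best).
order = np.argsort(-scores)
ranks = np.empty_like(order)
ranks[order] = np.arange(1, len(scores) + 1)

for name, s, r in zip("ABC", scores, ranks):
    print(f"Program {name}: score={s:+.2f}, rank={r}")
```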
From page 51...
... USE OF RANKINGS In attempts to rank doctoral programs, sports analogies are especially inappropriate. There are no doctoral programs that, after a long regular season of competition followed by a month or more of elimination playoffs, survive to claim "We're Number 1!
From page 52...
... Those who view rankings as a competition may find this abundance of rankings confusing, but those who care about informative indicators of the quality of doctoral programs will likely be pleased to have access to data that will help them to improve their doctoral programs.
From page 53...
... DATA
Answers to questions provided by 4,838 doctoral programs at 221 institutions and combinations of institutions, in 59 fields across the sciences, engineering, social sciences, arts, and humanities, covering institutional practices, program characteristics, and faculty and student demographics, obtained through a combination of original surveys and existing data sources (NSF surveys and Thomson Reuters publication and citation data).
From page 54...
... The survey-based approach is idealized. It asks about the characteristics that faculty feel contribute to the quality of doctoral programs, without reference to any particular program.
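As a rough sketch of how importance expressed "in the abstract" might become numerical weights, the snippet below tallies how often each characteristic is selected across hypothetical survey responses and normalizes the tallies to sum to one. The response data and the simple frequency-based weighting are assumptions for illustration; the committee's actual questionnaire and weighting procedure are more elaborate.

```python
from collections import Counter

# Hypothetical survey responses: each faculty respondent lists the program
# characteristics they consider most important to doctoral program quality,
# without reference to any particular program.
responses = [
    ["publications_per_faculty", "citations", "student_funding"],
    ["publications_per_faculty", "time_to_degree"],
    ["citations", "student_funding", "publications_per_faculty"],
    ["time_to_degree", "publications_per_faculty"],
]

# Tally how often each characteristic is chosen.
counts = Counter(c for response in responses for c in response)

# Normalize the tallies into weights that sum to one.
total = sum(counts.values())
weights = {characteristic: n / total for characteristic, n in counts.items()}

for characteristic, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{characteristic}: weight = {w:.2f}")
```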
From page 55...
... This method of calculating ratings and rankings takes into account variability in rater assessment of the things that contribute to program quality within a field, variability in the values of the measures for a particular program, and the range of error in the statistical estimation. Importantly, these techniques yield a range of rankings, rather than a single rank, for most programs.
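One way to produce a range of rankings rather than a single rank is to resample or perturb the inputs, recompute the ranking many times, and report percentile bounds on each program's rank. The sketch below illustrates that general idea with invented data; the perturbation scheme and the 5th-to-95th-percentile bounds are assumptions for illustration, not the committee's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy standardized measures (rows = programs, columns = measures).
measures = np.array([
    [ 0.9,  0.8,  0.2],
    [ 0.7,  1.0,  0.1],
    [-0.5, -0.2,  0.4],
])
base_weights = np.array([0.4, 0.4, 0.2])
n_trials = 1000

ranks = np.zeros((n_trials, len(measures)), dtype=int)
for t in range(n_trials):
    # Perturb weights and measures to mimic rater variability and
    # estimation error, then renormalize the weights.
    w = np.clip(base_weights + rng.normal(0, 0.1, size=3), 0, None)
    w /= w.sum()
    x = measures + rng.normal(0, 0.1, size=measures.shape)
    scores = x @ w
    order = np.argsort(-scores)                      # best first
    ranks[t, order] = np.arange(1, len(measures) + 1)

# Report a range of rankings (5th to 95th percentile) for each program.
lo, hi = np.percentile(ranks, [5, 95], axis=0)
for i, name in enumerate("ABC"):
    print(f"Program {name}: rank range {int(lo[i])} to {int(hi[i])}")
```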
From page 56...
... All faculty were given a questionnaire and asked to identify the program characteristics in three categories that they felt were most important, and then to identify the categories that were most important. [Raters] were sent the National Survey of Graduate Faculty, which contained a faculty list for up to 50 programs in the field. Raters were asked to indicate familiarity with program faculty, scholarly quality of program
From page 57...
... Once the ratings were obtained, they were related to the 20 measures through a modified regression technique.12
Specification of the Measures
In addition to the reputational measures, the 1995 study provided a few program characteristics: faculty size, percentage of full professors, and percentage of faculty with research support. For the arts and humanities, it also reported awards and honors received in the previous five years and the percentage of program faculty who had received at least one honor or award in that period.
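As a simplified illustration of the general idea behind relating ratings to measures, the sketch below fits an ordinary least-squares regression of program ratings on standardized measures and treats the coefficients as implicit weights. All data are invented, and the committee's "modified regression technique" (footnote 12) involves additional steps not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 40 rated programs described by 5 standardized measures
# (the study used 20 measures; 5 keeps the illustration small).
n_programs, n_measures = 40, 5
X = rng.normal(size=(n_programs, n_measures))

# Hypothetical "true" implicit weights that raters apply, plus noise,
# produce the observed ratings.
true_weights = np.array([0.5, 0.3, 0.1, 0.05, 0.05])
ratings = X @ true_weights + rng.normal(0, 0.2, size=n_programs)

# Ordinary least squares: recover the implicit weights from the ratings.
coef, *_ = np.linalg.lstsq(X, ratings, rcond=None)

# Use the recovered coefficients to score (and rank) all programs.
scores = X @ coef
print("estimated implicit weights:", np.round(coef, 2))
print("top-ranked program index:", int(np.argmax(scores)))
```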
From page 58...
... Again, the committee was unable to collect citation data for the humanities. • The committee for the current study asked the institutional coordinators to name the programs they wished to include, but it did define a program as an academic unit that meets at least three of these four criteria:
⎯ Enrolls doctoral students
⎯ Has a designated faculty
⎯ Develops a curriculum for doctoral study
⎯ Makes recommendations for the award of degrees.13
Because separate programs were housed in different academic units, a few institutions used this definition to split what would normally be considered a single program into smaller units that still met the criteria; that is, what is normally perceived as a unified program was ranked as separate programs.
From page 59...
... Such errors can arise from clerical mistakes, from misunderstandings by respondents about the nature of the data requested from them, or from problems within the public databases used. That said, even though the input data underwent numerous consistency checks and the participating respondent institutions were given the opportunity to provide additional quality assurance, the committee is certain that errors in input data remain, and that these errors will propagate through to the final posted rankings.
From page 60...
... On the one hand, reputational measures are generally recognized to have many strengths, including subtlety and breadth of assessment and the widespread use of such markers. On the other hand, reputational measures may reflect outdated perceptions of program strength as well as the well-known halo effect, by which some weak programs at a strong institution may be overrated.15 On balance, these shortcomings led the committee to reject the direct use of these perceived-quality measures.
From page 61...
... The measures of importance to the faculty were correlated with the perceived-quality measure, suggesting that both describe valid measures of real program quality. The R ranking, then, reflects the relation of the subjective ratings to the data; but because it relies entirely on objective data, even this measure in effect eliminates any subjective adjustments raters might make in how they perceive the quality of specific programs, as contrasted with the general rules they might apply to evaluate programs.
From page 62...
... Each of these programs was rated separately, but all were included in the computation of the range of rankings. For example, Harvard has three doctoral programs under "Economics," and Princeton has two doctoral programs under "History."16 Because the assessed quality of these programs tends to be similar, multiple programs from the same university could occupy several adjacent slots in the range of rankings, thereby "crowding out" other programs from higher-ranking ranges, lowering their rankings and thus distorting the reported results.
From page 63...
... Small differences in the variables can result in major differences in the range of rankings, especially when a program is very similar on other measures to other programs in its field. Individual instances may therefore emerge of programs that should have been ranked considerably lower or higher than the tables indicate, so it is strongly recommended that the rankings of individual programs be treated with caution and analyzed carefully.
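To make the sensitivity concrete, the toy example below constructs two programs whose composite scores differ only slightly; a small shift in a single measure is enough to reverse their order. All numbers are invented solely to illustrate why single-point ranks of near-tied programs deserve caution.

```python
import numpy as np

weights = np.array([0.5, 0.3, 0.2])

# Two near-tied programs on three standardized measures (invented values).
program_a = np.array([0.80, 0.50, 0.30])
program_b = np.array([0.78, 0.52, 0.31])

def rank_order(a, b, w):
    """Return which program scores higher under weights w."""
    return "A ahead of B" if a @ w > b @ w else "B ahead of A"

print("original data:   ", rank_order(program_a, program_b, weights))

# A small change in one measure for program B flips the ordering.
perturbed_b = program_b + np.array([0.05, 0.0, 0.0])
print("after small shift:", rank_order(program_a, perturbed_b, weights))
```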

