Suggested Citation:"5 Methods of Assessing Science." National Research Council. 2007. A Strategy for Assessing Science: Behavioral and Social Research on Aging. Washington, DC: The National Academies Press. doi: 10.17226/11788.

5
Methods of Assessing Science

This chapter presents a general framework for thinking about methods of assessing science retrospectively or prospectively, reviews the conceptual and empirical literatures on the selected methods, and discusses their likely relevance and feasibility for research priority–setting decisions in the Behavioral and Social Research (BSR) Program of the National Institute on Aging (NIA). The focus here is on seeking methods that can provide science managers with the best possible input to priority-setting decisions while also achieving basic goals of accountability and rational decision making. Quantitative methods are attractive in terms of accountability, in the accountant’s sense of comparing different investments in research on a common numerical scale. They are also conducive to improved outcomes to the extent that the measures are valid indicators of what they purport to measure.

Most of this chapter is devoted to examining the strengths and weaknesses of various methods of assessing science. Science assessments have become commonplace and include assessments of fundamental science (National Science and Technology Council, 1996), of technology development programs (Link, 1996; Ruegg and Feller, 2003), and of the performance of specific academic and research laboratories, in both the United States and other countries (e.g., Bozeman and Melkers, 1993; Moed et al., 2004). The assessments include commission reports, agency-specific commissioned evaluations, and academic works. As Table 5-1 shows, various assessment methodologies have arisen from different disciplinary and multidisciplinary perspectives, measuring different aspects of scientific activity, and addressing various science and technology policy and assessment questions.


TABLE 5-1 Some Methodologies for Science Assessment and the Attributes of Scientific Activity They Measure

Methods | Attributes Measured
Coauthorship links, multinational research articles | Scientific collaboration, globalization
Patent citation analysis | Economic value of patents
Cross-disciplinary coauthorships and citations | Multidisciplinarity and interdisciplinarity of research
Citations from clinical guidelines, regulations, and newspapers | Practical use of research
Scientist-inventor relationships, citations from articles to patents | Knowledge flows from science to technology
Co-occurring word and citation analysis | Sociocognitive structures in science
Use of first names of authors or inventors | Participation of women in science

SOURCE: Moed et al. (2004).

These assessment efforts have generated several broadly accepted “best practice” principles. For example, the National Science and Technology Council (1996:xii) set forth the following nine principles for assessment of fundamental science programs:

  • Begin with a clearly defined statement of program goals.

  • Develop criteria intended to sustain and advance the excellence and responsiveness of the research system.

  • Establish performance indicators that are useful to managers and encourage risk-taking.

  • Avoid assessments that would be inordinately burdensome or costly or that would create incentives that are counterproductive.

  • Incorporate merit review and peer evaluation of program performance.

  • Use multiple sources and types of evidence, for example, a mix of quantitative and qualitative indicators and narrative text.

  • Experiment in order to develop an effective set of assessment tools.

  • Produce assessment reports that will inform future policy development and subsequent refinement of program plans.

  • Communicate results to the public and elected representatives.

These principles are generally sensible, but they leave some important questions unaddressed. One of these is how to establish useful performance indicators and incorporate peer review and evaluation at the same time. In this chapter, we adopt a conceptual framework for thinking about assessment methods that we think will allow such questions to be addressed more systematically. Our recommendations are in Chapter 6.

A FRAMEWORK: ANALYSIS AND DELIBERATION AS ASSESSMENT STRATEGIES

We find it useful to consider the issues of research assessment, both prospective and retrospective, in light of a distinction made in a previous National Research Council (NRC) study. In Understanding Risk: Informing Decisions in a Democratic Society (1996), an NRC committee distinguished between two methods for seeking practical understanding that it called analysis and deliberation. Analysis “uses rigorous, replicable methods developed by experts to arrive at answers to factual questions”; deliberation “uses processes such as discussion, reflection, and persuasion to communicate, raise and collectively consider issues, increase understanding, and arrive at substantive decisions” (p. 20). In stylized terms, counting patents or citations to studies, constructing network diagrams of communication patterns, and enumerating publications in designated major journals are analytic methods, whereas peer review conducted through discussions in advisory panels is a deliberative method.

Understanding Risk noted that science policy decisions typically employ both analysis and deliberation and argued that it is appropriate for them to do so. Among the reasons identified for using deliberation are that the most useful type of analysis often is not self-evident and is best determined through dialogue involving both the potential producers and the users of the analysis, and that judgment is inevitably involved in finding the meaning of analytic findings and uncertainties for specific decisions, particularly when the decisions must be made against multiple objectives. The report defined the challenge for public policy as one of finding procedures (called analytic-deliberative processes in the report) that appropriately integrate the two methods. In an effective analytic-deliberative decision process, those involved in making a decision determine the kinds of analysis they need, see that the analysis is conducted as needed, and deliberate on the choices they face, informed by the analysis and discussion of its strengths and limitations.

A central point of Understanding Risk was that even in such enterprises as environmental risk assessment, which are commonly seen as relying almost completely on analysis, the need for deliberation is critical. Expenditures on analysis can have little practical value if the analysis is not directed to the most important questions for decision makers. Deliberation is needed to ensure that government procures the right science for the purpose. Deliberation is also critical because a single set of scientific findings can have various implications for policy, depending on judgments about matters that analysis alone cannot resolve, such as how much weight to give to the different outcomes of a policy choice that has multiple consequences and how to act in the face of gaps and uncertainties in available knowledge. Deliberation is needed to give due consideration to the possible meanings of what is and is not known. So even in very analysis-heavy areas of policy, the value of analysis ultimately depends on the quality of the deliberation that shapes and interprets the analysis.

Research policy presents a different situation from environmental and health policy with respect to the roles of analysis and deliberation. The value of deliberation is well established for making decisions about scientific research portfolios, as reflected in the careful efforts that research agencies such as the National Science Foundation (NSF) and the National Institutes of Health (NIH) make to devise and reevaluate their peer review and advisory processes. However, the value of analysis, especially that grounded in the use of quantitative measures, remains in dispute. Debate also continues about whether use of analytical methods has contributed to improved science policy decision making or has been dysfunctional (Perrin, 1998; Radin, 2000; Feller, 2002; Weingart, 2005). As already noted, the prevailing view in the scientific community emphasizes expert peer review as the most effective available method.

Reframing the debate along the lines suggested by Understanding Risk, that is, in terms of the appropriate roles of analysis and deliberation, may help to find optimal ways to use both sources of information. We begin by noting that all policy decisions in a democracy are ultimately deliberative. The issue is not whether to replace deliberation with analysis in making decisions, because decisions will continue to be deliberative. The issues are whether there are useful roles for analysis in a deliberative decision process and, if so, how the use and interpretation of analysis should be organized (and by whom) in research policy making. Thus, it is useful to focus attention on a set of empirical questions such as these:

  • Can deliberations about the past progress of scientific fields and the best way to shape research portfolios be better informed by the use of appropriate analytic methods?

  • If so, which analytical tools hold promise for better informing judgments about behavioral and social research on aging?

  • What institutional structures and procedures are effective for selecting, shaping, and interpreting analysis to inform research policy choices?

  • How do different structures and procedures for analytic deliberation affect the distribution of decision-making influence and authority among researchers, research managers, and representatives of society?


In keeping with the tenor of mainstream conclusions of the academic research community that is the primary performer of BSR-funded research, we find it convenient to start with the judgment of the Committee on Science, Engineering, and Public Policy (1999a) report on evaluating research programs, that expert judgment is the best means of evaluating research. We further accept the widespread assessment found in the bibliometric literature that there is no single approach that will always work best and therefore that it makes sense to develop a toolbox of methods, both analytic and deliberative, for informing judgment (e.g., Grupp and Mogee, 2004). Different analytic tools might have value for different assessment purposes. They might be useful for measuring research results, organizing information brought to bear by applying other analytic tools (e.g., to arrive at numerical weights for different kinds of information), or helping to structure deliberative processes. Thus, it is appropriate to ask both about the validity of particular measures or indicators for particular purposes and about how such measures might add value to a deliberative, judgment-based process.

For convenience, we divide the following discussion into methods that are primarily analytical, those that are primarily deliberative, and those that combine both strategies. On the basis of an initial review of a larger set of decision-making techniques, we have selected three analytical approaches as most applicable to the needs for prospective and retrospective assessment as defined by BSR: bibliometric analysis of the results of research and the connections among research efforts, reputational studies (such as those obtained by surveying the members of research communities), and decision analysis.1 We also discuss peer evaluation procedures, usually a purely deliberative method. Finally, we turn to analytic-deliberative approaches. A familiar one in the context of the NIH is the Consensus Development Conference, which combines analysis and deliberation but has not been adapted for making research policy decisions. We also discuss one ongoing effort in the NRC to employ an analytic-deliberative approach to a problem of comparing research in different fields, in this case, energy research.

ANALYTICAL METHODS

As noted in Chapter 4, comparative analysis of scientific progress across fields presents major challenges. The uneven pace and seemingly unpredictable paths of scientific progress and of its application to practical problems make it hard to get unambiguous meaning from even the most systematic analysis of past events in a field. Comparisons across fields are even more difficult because the paths toward progress and the barriers to it may vary systematically from one field to another. These are among the reasons that scientists and science managers have at times resisted the use of analytical techniques, especially quantitative ones, for assessing science. In addition, there is the possibility that quantitative methods may be applied in automatic ways that exclude the judgment of the people who know the science best. In the discussion that follows, we presume that the value of analyses is not to replace judgment, but to inform it. We consider the potential roles of analytic techniques in that light.

Bibliometric Analysis

The term scientometrics broadly relates to the generation, analysis, and interpretation of quantitative measures of science and technology. As described by van Raan (1988b:1), the field is based on the use of “mathematical, statistical, and data-analytical methods and techniques for gathering, handling, interpreting, and predicting a variety of features of the science and technology enterprise, such as performance, development, dynamics.” Bibliometrics, the quantitative study of patterns of published scientific output and their use (e.g., citations), is the subset of scientometrics that is our primary focus of attention.2

Bibliometric and other scientometric methods were developed originally for exploring the workings of the scientific enterprise, that is, as descriptive and analytical tools, not as evaluative or predictive ones (Thackray, 1978; Godin, 2002). Their descriptive accuracy was originally validated against expert opinion. Scientometric researchers believe that a better quantitative understanding of scientific processes is needed in order to build and validate theories in the sociology of knowledge (e.g., van Raan, 2004). The distinction between the descriptive uses of bibliometrics to understand the working of science and the evaluative uses to assess performance (van Leeuwen, 2004) is important because the strengths and weaknesses of any quantitative approach, and its value to its users, depend on the questions being posed and the use to which the technique is put.

Measurement of publications and citations can be used to describe the activities of a nation, an institution, a research group, or an individual; the dynamics of fields of science that can be specified in bibliometric terms (e.g., by their leading journals or by keywords that can be found in the titles or abstracts of publications); and the relationships between and among specified fields. It can be used to build and test theories of the content and structure of science (e.g., Price, 1963), to demonstrate the contribution of publicly funded science to technological innovation (e.g., Narin et al., 1997), to highlight “hot” areas of science or hot researchers, or to track the import and export of ideas among fields.3 When bibliometric measures are treated as outputs, they can be combined with input measures, such as expenditures or personnel complements, to compare the past performance of research institutes, departments, and the like, or of fields, subfields, and disciplines. In this use, they have the advantage of making different things comparable on the same scale.
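
The output-over-input comparison described above can be sketched in a few lines. This is an illustration only: the field names, publication and citation counts, and funding figures below are invented for the example, not drawn from any actual assessment.

```python
# Illustrative sketch with hypothetical numbers: when bibliometric counts are
# treated as outputs and expenditures as inputs, different research units or
# fields can be placed on a common scale (here, papers per million dollars
# and citations per paper).
fields = {
    # field: (publications, citations, funding in $ millions) -- invented values
    "cognitive aging": (120, 3400, 14.0),
    "health economics of aging": (80, 2100, 9.5),
    "demography of aging": (60, 1900, 6.0),
}

indicators = {
    name: {
        "papers_per_million": pubs / funding,
        "citations_per_paper": cites / pubs,
    }
    for name, (pubs, cites, funding) in fields.items()
}

for name, ind in indicators.items():
    print(f"{name}: {ind['papers_per_million']:.1f} papers/$M, "
          f"{ind['citations_per_paper']:.1f} citations/paper")
```

The arithmetic is trivial; as the chapter argues, the hard part is deciding whether such ratios are valid indicators of what they purport to measure.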

One potentially valuable contribution of bibliometrics to the assessment of scientific fields is that it makes possible the assessment of the import and export of ideas between fields by following cross-citation patterns (van Leeuwen and Tijssen, 2000). By identifying the authors of articles published in or cited by a diverse set of journals, it is possible in principle to identify patterns of scientific collaboration across fields. It also is possible, by examining the scholarly profiles of the collaborators or their institutions or both, to assess whether particular established or newly emerging research fields are attracting the best and brightest of a nation’s current and future scientists (Glanzel and Schubert, 2004; Morillo et al., 2001). Bibliometric data might also be useful for discerning and offsetting observed tendencies of proposal review panels to discriminate against “crossdisciplinary proposals that lack an established peer group” (Porter and Rossini, 1985:38).4
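
The cross-citation bookkeeping behind such import-export analyses can be sketched as follows. The citation records and field labels are hypothetical; in practice they would be derived from journal classifications in a bibliographic database.

```python
# Sketch of cross-citation analysis between fields (hypothetical records):
# each record is a (citing field, cited field) pair.
from collections import Counter

citations = [
    ("economics", "psychology"), ("economics", "psychology"),
    ("psychology", "economics"), ("sociology", "psychology"),
    ("psychology", "psychology"),  # within-field citation, excluded below
]

# cross_matrix[(a, b)] = number of citations from field a to field b
cross_matrix = Counter((src, dst) for src, dst in citations if src != dst)

# "Imports" of ideas into a field = cross-field citations it makes;
# "exports" from a field = cross-field citations made to it.
imports_to = Counter(src for src, dst in citations if src != dst)
exports_from = Counter(dst for src, dst in citations if src != dst)
```

On these toy records, economics cites psychology twice, so psychology "exports" three ideas in total and "imports" one.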

However, bibliometric measures have shortcomings as a guide to evaluative use and research decision making by mission agencies. Bibliometrics emphasizes publications in peer-reviewed journals. It does not account for practical applications that may be of value to research sponsors, research performers, and society. It provides no place for nonacademics to apply their values in gauging the societal importance of research findings. As usually implemented, it advantages journal authors over book authors or others whose works are not in major databases (Lamont and Mallard, 2005), and it favors quantitative work (which is more likely to appear in journals than books) and authors who speak to narrower and academic audiences (Clements et al., 1995, looking at sociology, in Lamont and Mallard, 2005). It favors types of research that suit high-impact journals over other types of research, such as clinical and application-based research (Kaiser, 2006). And it may overvalue scientific outputs that are frequently cited because they are controversial or wrong. Many of these shortcomings can be alleviated to a degree by careful research design, but however well this is done, the evaluative meaning of bibliometric comparisons requires interpretation, as we discuss below.

To move beyond a general review of the strengths and weaknesses of bibliometric techniques as a means of setting research priorities, we were briefed by Anthony F.J. van Raan, a leading developer and analyst of bibliometric techniques, on what one might learn from those techniques; we then commissioned a pilot study designed to determine whether it was possible to map the direction of behavioral and social science research in aging using bibliometric indicators. Committee members, whose expertise extends across (and beyond) the behavioral and social science domain of BSR’s program, specified keywords intended to define certain areas of research on aging of programmatic concern to BSR. For each area, committee members supplied an initial list of core journals in which research containing these words was likely to be published. Ed Noyons, a distinguished bibliometrician and specialist in bibliometric mapping from the University of Leiden, Netherlands, was commissioned to conduct a fuller search of bibliometric citations based on these keywords and to develop bibliometric maps of relationships between and among research clusters, journals, and authors.

The pilot study quickly revealed that the basic outputs of the exercise, such as the size of the corpus of work in a field and the boundaries of the field (e.g., which key articles are and are not included) were quite sensitive to the choice of keywords. The pilot study strongly suggested that if bibliometric indicators are to be used for research assessment, considerable reliance must be placed on the subject-matter experts to guide and review the work of the specialists in bibliometrics who will perform the actual studies. Several iterations of generation and analysis of data will probably be needed before the assigned experts are satisfied with the output. The reliability of this method, that is, the extent to which different experts’ lists of keywords would yield similar results, is unknown. Thus, the meaning of analyses that are sensitive to expert judgment on the input end is likely to be open to different interpretations by experts who have different views of the research area in question. These concerns are likely to be most serious when bibliometric analysis is used to assess the dynamics of emerging research fields that lack established publication outlets or generally shared terminology.
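
The keyword-sensitivity problem the pilot study encountered is easy to demonstrate in miniature. The article titles and keyword lists below are invented; the point is only that small changes to the keyword list change both the size and the boundaries of the delineated corpus.

```python
# Sketch of keyword-based field delineation (hypothetical article records):
# the resulting corpus depends heavily on which keywords the experts choose.
articles = {
    "A1": "cognitive decline in older adults",
    "A2": "retirement savings and aging populations",
    "A3": "memory and aging: a longitudinal study",
    "A4": "pension reform in europe",
}

def delineate(keywords):
    """Return the set of article ids whose titles contain any keyword."""
    return {aid for aid, title in articles.items()
            if any(kw in title for kw in keywords)}

corpus_narrow = delineate(["aging"])                      # misses A1 and A4
corpus_broad = delineate(["aging", "older", "pension"])   # captures all four
```

Here a one-word corpus omits half the relevant articles, which is why several iterations between bibliometric specialists and subject-matter experts are likely to be needed.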

Reputational Studies

Surveys and interviews have often been used to solicit the views of representative samples of scientific communities about issues on which judgments are to be made. An example is the periodic surveys the NRC has organized to assess research doctorate programs in American universities (e.g., National Research Council, 1995b, 2003). The reputational approach has the advantages that, unlike informal peer-review discussions that draw on reviewers’ understandings of the reputations of researchers and research fields, it is systematic, it can be used continually over time, and its methods can be made transparent. The approach also has significant validity problems for its usual purpose, which is to compare entities that are presumed to be of the same type (such as university departments of psychology or economics). The problems include biases that may be introduced by relying on reputation (e.g., sensitivity to name recognition effects driven by the size of the research unit or the presence of a single well-known individual) and the difficulties of comparability among entities that may have the same names but are quite different in composition or objectives. In addition, the nature of the entities being compared can change over time, as, for example, when taxonomies of fields become outmoded (see National Research Council, 2003, for further discussion).

Reputational approaches have additional limitations for the task of concern in this study, making comparisons across different scientific fields or subfields. Most fundamental is that there is no single scientific community that can be surveyed to get meaningful information. Very few individuals, if any, are equally well informed about each of the fields to be compared, so that sampling techniques create a difficult, perhaps insoluble, dilemma. It is possible to create an acceptable representative sample of researchers across the broad area in which comparisons are to be made (behavioral and social research on aging), but such a sample will include many respondents who are well informed about their own parts of this broad field but not about other parts. Alternatively, it is possible to create acceptable representative samples for each of the narrower fields to be compared, but this procedure will reproduce the problem that led to this study in the first place: the possibility that different standards of quality are being used in different subfields, making community judgments noncomparable across fields. We have been unable to identify a way out of this dilemma.

We do not see value in reputational studies for making comparisons of different fields without a prior demonstration that there is a valid method of eliciting comparable judgments. Value may be gained by systematically eliciting judgments of research progress from samples of narrow research communities, especially if there may be differences in judgments within the field (e.g., between younger scholars and the ones most likely to be placed on deliberative peer review groups). However, surveys should not replace judgment, and research managers need to judge whether the potential knowledge to be gained from adding a survey to judgment is worth the incremental cost of survey research. Our judgment is that it will be worthwhile only in special cases.

Decision Analysis

The above analytical methods all inform judgment by providing decision-relevant information that decision participants would not otherwise have. Decision analysis, by contrast, provides a set of techniques that can be used to organize and structure deliberation.

Decision-analytic techniques have not been given much attention in science policy, and, when proposed as decision aids, they have often met stiff resistance from scientists (Fischhoff, 2000; Arkes, 2003). We see these techniques as worthy of renewed attention because they have proved useful for assisting choices in other practical contexts in which (a) decisions are complex, (b) decisions have consequences for multiple important outcomes, (c) considerable uncertainty exists about how each choice will affect the outcomes, and (d) opinions diverge about the relative value of the outcomes. For example, these techniques have been used to help design safety features in complex technologies, to assess the environmental and public health risks of chemicals, and to inform decisions about the siting of hazardous waste facilities. Decision-analytic techniques help clarify and allow for separate consideration of the key elements of a decision, particularly the relationships between actions and their various consequences, the valuation of these consequences, and the relationships among the decision elements (e.g., Edwards, 1954; Behn and Vaupel, 1982; Howard and Matheson, 1989; Pinkau and Renn, 1998; van Asselt, 2000; Jaeger et al., 2001; North and Renn, 2005).

Decision analysis offers both quantitative and qualitative methods. Quantitative techniques include benefit-cost analysis, multiattribute utility analysis (von Winterfeldt and Edwards, 1986), value-tree analysis (e.g., Keeney and Raiffa, 1976), value-of-information analysis (Raiffa, 1968), quantitative characterization of uncertainties (Morgan and Henrion, 1990), and prediction markets (Berg and Reitz, 2003). The usefulness of these techniques depends on the availability of quantitative estimates of the effects of policy choices or new scientific information on highly valued outcomes that are reasonably accurate or have estimable uncertainties. It also depends on developing some justifiable method for aggregating different kinds of outcomes. Because of the shortcomings of fundamental understanding of how research activities lead to scientific or technological progress (see Chapter 4), the continuing uncertainty or loose coupling of such progress when it occurs to the desired societal objectives, and the difficulties associated with aggregating different kinds of outcomes, these basic requirements are not currently met for research policy on behavioral and social science and aging. Thus, we do not recommend the use of quantitative techniques of decision analysis to inform decisions about setting priorities for basic behavioral and social science research on aging.
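
To make concrete what the quantitative techniques demand as inputs, here is a minimal multiattribute utility calculation. All weights and scores are invented for illustration; the chapter's argument is precisely that defensible inputs of this kind are not currently available for research priority setting on behavioral and social science and aging.

```python
# Illustration only (invented weights and scores): a minimal multiattribute
# utility calculation. Each option is scored 0-1 on each attribute, and a
# weighted sum aggregates the attributes into a single number.
weights = {"scientific_advance": 0.5, "societal_value": 0.3, "likely_use": 0.2}

options = {  # hypothetical research areas
    "area A": {"scientific_advance": 0.8, "societal_value": 0.4, "likely_use": 0.5},
    "area B": {"scientific_advance": 0.5, "societal_value": 0.9, "likely_use": 0.7},
}

def utility(scores):
    """Weighted additive utility over the shared attribute set."""
    return sum(weights[a] * scores[a] for a in weights)

ranked = sorted(options, key=lambda o: utility(options[o]), reverse=True)
```

Every number above embodies a contested judgment (how to score uncertain scientific payoffs, how to weigh them against societal value), which is why the committee does not recommend this quantitative machinery for BSR's problem.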

Qualitative techniques of decision analysis, by which we mean techniques for structuring or organizing decision problems without attempting to quantify the effects of decisions, are more modest in their objectives than the quantitative approaches, but they seem to have greater potential for assisting with priority setting in research policy. Decision analysis, used to structure choices, can make decision processes more transparent, thus contributing to accountability, by creating frameworks for examining issues, focusing deliberation on explicit evaluative criteria, and helping diverse groups understand the bases of their divergent judgments (North and Renn, 2005). It is likely that the best ways to employ decision science approaches for structuring research policy choices will have to be developed over time and adapted to meet particular needs (Fischhoff, 2000).

Here we note two approaches that may provide useful starting points for such development. Both involve developing simple conceptual models of how research might contribute to a set of science policy objectives. One approach to doing this involves influence diagrams (Clemen, 1991). These are directed graphs in which each node represents a variable, arrows point from predictor variables to predicted variables, and the practical outcome variables that motivate research funding are prominently included (see Box 5-1).
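
A qualitative influence diagram can be represented as a simple directed graph. The nodes and arrows below are hypothetical, loosely patterned on Fischhoff's Cryptosporidium example, as is the coverage map of funded research.

```python
# Sketch of an influence diagram as a directed graph: each arrow points from
# a predictor variable to a predicted variable. Nodes and edges are invented
# for illustration.
influence = {
    "flood event": ["water contamination"],
    "water contamination": ["public health risk"],
    "routine water testing": ["water contamination", "public health risk"],
    "media coverage": ["individual responses"],
    "individual responses": ["public health risk"],
}

# One qualitative use: check whether every variable in the diagram is covered
# by some funded line of research (the coverage set here is hypothetical).
funded = {"water contamination", "public health risk"}

all_nodes = set(influence) | {v for targets in influence.values() for v in targets}
uncovered = all_nodes - funded
```

Even without numbers on the arrows, listing the uncovered nodes gives a structured way to ask whether research investments are commensurate with the opportunities.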

Another approach specifies the objectives of the choice at hand and further specifies elements or contributing factors to each objective as a way to structure consideration of the available options. This approach was used in a recent NRC study (2005a) that recommended five priority areas for social and behavioral science research to improve environmental decision

BOX 5-1

Influence Diagrams of the Impacts of Scientific Research

Fischhoff (2000) suggests that the process of developing influence diagrams of the pathways from research to its scientific results and societal benefits can clarify the place of various research activities in the larger enterprise and promote more focused discussion of priorities, even if credible numbers cannot be calculated to estimate the strengths of the relationships that the arrows represent. Such discussion could systematically address such questions as whether anyone in the scientific community is receiving research support to understand each element in the influence diagram and “whether the research investments are commensurate with the opportunities” (Fischhoff, 2000:82).

As an example, Fischhoff presents an influence diagram in which the variable of central interest is the public health risks of Cryptosporidium. The diagram shows the roles of events in the biophysical environment (e.g., contamination of drinking water resulting from a flood), responses of individuals and organizations to the events, engineering practices (e.g., routine testing of the water), mass media coverage, and other factors. In such a diagram, various kinds of scientists can locate the points at which their research is relevant to reducing the risks.

This diagram emphasizes a practical, health-related outcome that research might help improve. Similar conceptual models might be developed for NIA’s practical goals for research, such as to “improve health and quality of life of older people” and to “reduce health disparities among older persons and populations” (National Institute on Aging, 2001); for considering other important NIA goals for research, such as to “understand healthy aging processes”; or for comparing research programs that contribute differentially to different research goals.

making in the public and private sectors. The study panel was given three criteria for selecting the top-priority areas: “the likelihood of achieving significant scientific advances, the potential value of the expected knowledge for improving decisions having important environmental implications, and the likelihood that the research would be used to improve those decisions” (National Research Council, 2005a:12). The panel decided that it could reduce problems of differing interpretations of these three broad criteria by specifying each criterion in more detail. Thus, it began by identifying factors that are likely to act as means to the ends implied by each criterion. For example, the panel members agreed to rate potential science priorities highly on the criterion of likelihood of achieving significant scientific advances if the following factors were judged to apply (p. 15):

  • The research community is ready and able to conduct the research (e.g., concepts, methods, and data are available but not yet adequately applied in this area).

  • Successful research would provide new frameworks for thinking or sources of understanding (e.g., data, methods) that could lead to advances in environmental decision making over time.

  • Successful research would overcome or reduce gaps in knowledge or skill that now inhibit opportunities for improved environmental decisions in a given context.

Each panel member agreed to consider how each of the contributing factors applied to each of the potential science priorities. The panel then engaged in a discussion of each of the suggested priorities in light of the criteria and the contributing factors to each, aimed at reaching consensus on a short list of recommended science priorities.
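
The screening step described above can be sketched in code. The factor names below are abbreviated paraphrases of the three bulleted factors, and the decision rule (rate a candidate priority highly only when every contributing factor is judged to apply) is a simplified reading of the panel’s procedure.

```python
# Sketch of the screening step: a candidate priority is rated highly on a
# criterion when each of its contributing factors is judged to apply.
# Factor names are abbreviated paraphrases of the bulleted factors above.
FACTORS = [
    "community ready",
    "new frameworks likely",
    "closes decision-relevant gaps",
]

def rate_priority(judgments):
    """judgments maps factor name -> bool; rate 'high' only if all apply."""
    return "high" if all(judgments[f] for f in FACTORS) else "discuss further"

print(rate_priority({"community ready": True,
                     "new frameworks likely": True,
                     "closes decision-relevant gaps": True}))
```

In practice each panel member would make such judgments separately, and the panel would then discuss disagreements rather than tally them mechanically.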

BSR might develop a similar list of dimensions or types of scientific progress that might be made in the research fields the office supports to use in priority-setting discussions of its advisory board or other appropriate deliberative bodies. For example, research fields might be judged to be making progress along such dimensions as (Lamont, 2004):

  • generativity or intellectual productivity (leading to new discoveries and theories);

  • growth (e.g., attracting students and researchers, creating journals and societies, gaining funding);

  • range (investigating an increasing scope of issues);

  • theory development (linking a widening scope of issues within a shared conceptual framework; developing and testing hypotheses about phenomena within a common framework);

  • interdisciplinarity (engaging questions raised by or of interest to other fields; framing issues that integrate previously separate fields);

  • attraction (gaining the attention of researchers in other fields);

  • intellectual diffusion (developing ideas or methods that are used in other fields); and

  • diffusion to practice (dissemination of scientific information to potential users in fields of policy, business, law, medical practice, etc.).
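
A minimal sketch of how such a list of dimensions might structure a deliberation: a field’s progress is recorded as a qualitative profile, one entry per dimension, rather than collapsed into a single score. All ratings below are invented.

```python
# Hypothetical profile of one research field across the dimensions above.
# The qualitative ratings are invented for illustration.
dimensions = [
    "generativity", "growth", "range", "theory development",
    "interdisciplinarity", "attraction", "intellectual diffusion",
    "diffusion to practice",
]

profile = dict.fromkeys(dimensions, "not yet assessed")
profile.update({"growth": "strong", "range": "moderate"})

# The profile structures discussion dimension by dimension; it is not
# meant to be collapsed into a summary number.
unassessed = [d for d in dimensions if profile[d] == "not yet assessed"]
print(len(unassessed))
```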

BSR might sponsor a small series of organized discussions in which researchers working in different parts of the program’s research portfolio would propose important scientific outcomes of BSR-sponsored research, such as those above. These discussions would then be used to generate a working list of scientific objectives for BSR-sponsored research.

On the basis of these discussions, BSR might hold further exploratory exercises to identify possible contributing factors to the key dimensions of scientific progress they identify. Such exercises might make possible more nuanced, focused, and transparent discussions about how different elements of the BSR research portfolio contribute to the institute’s scientific objectives. We do not recommend that BSR attempt at this time to develop quantitative measures of these dimensions that can be used to summarize the progress of different scientific fields. Any such measures will need considerable development and validation if they are to become useful, and, regardless of how much validation work is done, we emphasize that measures of the dimensions of scientific progress should be used as inputs to deliberative processes, not as replacements for them. This is because priority setting inevitably requires a weighing of the various dimensions of progress and the various program goals.

To help structure consideration of how BSR-supported research activities may contribute to the practical goals outlined in the NIA Strategic Plan, the BSR Program might convene diverse groups of scientists and potential users and beneficiaries of the research in exercises to create influence diagrams or other simple models of the ways in which BSR research might contribute to these practical goals. The models could be used to focus deliberations about how different research activities fit into the BSR Program’s objectives and where shifts in research emphasis might be justified in terms of these objectives. Again, we see these simple models as potentially useful to focus and inform deliberations about the program’s research portfolio, not as a step toward developing quantitative algorithms that would take the place of deliberation.

Subsequent to developing such exercises to elaborate the practical and scientific objectives of BSR-sponsored research, the program should consider experimenting with exercises in which groups representing the producers and various important potential users of its research deliberate together about how the BSR research portfolio can better advance the office’s scientific and practical objectives. Such exercises might adapt the procedures used in the NIH Consensus Development Conference approach (see below). Such deliberations should proceed from the explicit recognition that research in different fields may be justified appropriately on different grounds. Because the objectives of BSR are widely shared across NIA, the decision-analytic exercises suggested here may be useful across the institute and deserve NIA-wide support.

As the results of various Foresight undertakings (described below) suggest, such structured deliberations may reach different conclusions depending on how various constituencies are represented in the processes and what roles NIH staff and other public officials have in the process. Therefore, the processes by which they are organized should also be studied. We recommend that BSR support a series of structured deliberations involving constituency-diverse groups, with the aim of identifying ways of constituting and instructing such groups so that they arrive at a consistent consensus that is defensible to both working scientists and agency officials. We emphasize that the value of all these decision-analytic approaches is as inputs to decision making, to make consideration more systematic and transparent, not as substitutes for careful deliberation.

DELIBERATIVE METHODS

The best-known deliberative approach for assessing research is peer review. Peer review panels, study sections, advisory committees, and visiting committees all engage groups of experts, usually researchers, in deliberations about science policy choices. They typically meet in person and, through discussion, arrive at collective judgments about the quality of research proposals or programs that are used as advisory input by research managers. Peer review panels typically do not rely in any explicit way on scientometric or other analytical methods (see Bornmann and Daniel, 2005). Still, peer review is conducted according to established methods. These typically involve procedures to ensure that review groups represent the full range of relevant expertise, are balanced with respect to viewpoints on matters to be deliberated, are independent of undue pressures from outside influences, and do not embody conflicts of interest, as well as procedures for review and oversight of the composition of review groups and sometimes also of their reports. Such methods are often recorded in the procedural guidelines of federal agencies, the NRC, and other organizations that routinely organize peer review groups. Decision makers in agencies may be given varying levels of discretion with regard to deviating from the collective judgment of peer review panels.

Most of the research on peer review processes concerns their use to compare research proposals prospectively in a single field (e.g., Chubin and Hackett, 1990). Depending on the review process, the fields may vary from narrow in content to quite diverse and interdisciplinary. Although the copious literature on peer review contains numerous examples and personal observations on how the system operates in specific cases, we have found no systematic research on peer review for making comparisons among scientific fields, a type of assessment that raises issues somewhat different from those central for assessing individual research proposals. Thus, research on peer review processes must be interpreted carefully to draw conclusions about how such processes might best be used for comparing fields.

Early studies of peer review in the natural sciences show that success in obtaining funding was associated with bibliometric indicators of the quality of the investigators’ past work, such as the numbers of past publications and of citations to those publications, but not with other characteristics (see Lamont and Mallard, 2005, for a review; also Campanario, 1998a, 1998b; Blank, 1991; Wessely, 1996). Other studies point to consensus among natural scientists as to the concept of quality (Dirk, 1999). These findings were widely interpreted as supporting the validity of peer review. In the social sciences, however, consensus among reviewers about quality has been seen as the exception rather than the rule (Cole, 1983; Hargens, 1987), perhaps reflecting the existence of competing standards of quality (Mallard et al., 2005). For example, a study of 12 review panels sponsored by 5 funding organizations found that reviewers in the social sciences and humanities operate with a variety of concepts of the originality of research and that some of these pertain to nonintellectual characteristics of the investigator (e.g., risk-taking, integrity) (Guetzkow et al., 2004). Reviewers also differed in the importance they placed on the potential social impact of the research vis-à-vis the intellectual quality of the scholarship and in the rationales they favored in arguing for or against supporting proposals. These differences were related to the experts’ fields (e.g., social science versus humanities) and to the priorities evident in the reviewers’ own research. They were also strongly influenced by the instructions that funding agencies gave to their reviewers (Mallard et al., 2005). These findings suggest that a funding organization that clearly defines its own criteria for evaluating research can convene review panels that will apply those criteria, but that without such definition, social science review panels may use inconsistent evaluative criteria.
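
The degree of reviewer consensus that these studies examine can be made concrete with a simple agreement statistic. The sketch below computes the fraction of proposal pairs that two reviewers order the same way; the scores are invented for illustration.

```python
# Minimal sketch of measuring reviewer consensus: the share of proposal
# pairs that two reviewers rank in the same order (a Kendall-style
# concordance). All scores are invented for illustration.
from itertools import combinations

def pairwise_concordance(scores_a, scores_b):
    """Fraction of proposal pairs ordered the same way by both reviewers."""
    pairs = list(combinations(range(len(scores_a)), 2))
    agree = sum(
        1 for i, j in pairs
        if (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j]) > 0
    )
    return agree / len(pairs)

# Two reviewers scoring the same five proposals (1 = weak, 5 = strong):
print(pairwise_concordance([5, 4, 3, 2, 1], [5, 3, 4, 2, 1]))
```

Low values of such a statistic across social science panels would correspond to the low inter-reviewer consensus reported in the studies cited above.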

There is some evidence to suggest that peer review disadvantages interdisciplinary research. Specifically, there is evidence that reviewers tend to favor research that belongs to their own field or school of thought (Porter and Rossini, 1985; Travis and Collins, 1991) and that follows established paths and is therefore low risk (Langfeldt, 2001). Laudel (2006) suggests, however, that such biases can be overcome by creating review panels like those that review the German Sonderforschungsbereiche (SFB). The SFBs are interdisciplinary consortia of research groups that come up for funding renewal every 3-4 years. Their review groups are interdisciplinary and stable, and they interact in a deliberative manner over time with the SFBs, thus increasing understanding within the review panel and between the panel and the research group (see also Lamont et al., 2006).

As noted, although peer review is frequently used for assessing the relative progress of research fields and for setting priorities among them, little research exists on these efforts. For example, the NRC frequently convenes groups of experts to recommend research priorities in scientific disciplines or in interdisciplinary areas. However, these panels rarely report in detail on how they selected or applied evaluative criteria (an exception is discussed in the next section). Some commentators on peer review for comparative assessment are quite cynical about the strategic use of appointment to a peer review panel to promote reviewers’ own organizational interests (e.g., Stigler, 1993; Rhoades, 2002). We have found no comparative studies of peer review processes for priority setting that examine how the structure or process of their deliberations affects the results. Similarly, we have found no studies that show how peer reviewers or peer review groups deal with the need to compare research activities that have different objectives or with the existence of diverse perspectives within review groups on the relative importance of these objectives.

Existing research on peer review thus raises but does not resolve several additional issues that we think are important for judging the progress and potential of research fields. One is whether expert review processes tend to exclude breakthrough innovations. Limited evidence can be found on both sides of this debate (Langfeldt, 2001; Rinia et al., 2001). Another is how peer review groups can deal with the different meanings of creativity in different fields.

Another important issue concerns who should be involved in the deliberations that assess scientific progress and set research directions. Scientific peer review processes by definition assume that only experts in the relevant scientific fields (i.e., the peers of the researchers) are competent to participate in deliberative review processes. This assumption has been called into question when the science is interdisciplinary (a situation that can greatly diminish the availability of true peers) and when the research is being funded for its potential practical value as well as for its potential contribution to knowledge for its own sake.

When research is being supported in part because of its potential practical value, it is often argued that the research agenda should be influenced by broadly based deliberations of groups that include both producers and potential users of the research (e.g., Committee on Science, Engineering, and Public Policy, 1999a; National Research Council, 1996, 2005a; Renn et al., 1996). The argument is based not only on reference to principles of democracy, but also on the claim that more competent and decision-relevant choices are made when decision-making bodies have this kind of mixed representation. The approach has been employed fairly extensively in environmental and energy policy arenas, and sometimes in making decisions about basic research. It has sometimes been used in panels with a narrow purview, such as for reviewing research proposals, for which benefits have also been claimed from the inclusion of user representatives. Thus, deliberative processes that include both producers and the various kinds of users or beneficiaries of projected research deserve serious consideration by BSR in setting research directions for areas that are suspected of needing improvement in terms of their production of useful knowledge.

ANALYTIC-DELIBERATIVE METHODS

Analytic-deliberative methods are those in which judgment is based in part on information from scientific theory and data or other systematic sources of knowledge. The idea of expert judgment informed by quantitative data has several relevant exemplars, in which experts interpret quantitative measures and evaluate their import for a decision at hand. Medical diagnosis and treatment provide good examples. Physicians, acting alone or in multispecialty groups, consider available test results, the patient’s reported symptoms and observable condition, and other quantitative and qualitative information before making their judgments about the correct diagnosis and the appropriate course of treatment. They monitor these same sources of information to evaluate the success of the treatment and to consider changes in it. They can make better judgments with the right quantitative test results than without them, but they use judgment in interpreting the data. Trial juries also deliberate on information that includes the results of analyses of evidence and sometimes the conflicting interpretations of the evidence by experts. As the example of trial juries suggests, it is possible to conduct useful analytic-deliberative processes in which some of the participants are not experts in the scientific issues being considered. In fact, many of the participatory processes noted above, such as those employed in environmental policy, include important roles for nonexperts in guiding and interpreting scientific analyses.

Analytic-deliberative methods in public policy are normally used by groups of people, sometimes consisting only of experts in a field, and sometimes also involving science managers or representatives of groups that might be affected by the decisions being considered. Below, we consider three examples that may be relevant to the needs of BSR.

NRC Comparative Assessment of Fields of Energy Research

The NRC’s Board on Energy and Environmental Systems has produced a series of studies that compare disparate areas of energy research supported by the U.S. Department of Energy (DOE). These studies judge the benefits gained from past research, and those that might flow from future research, in order to assess the past performance of research areas and to inform judgments about the relative priority that should be given to future research in those areas. Although energy research is in many respects quite different from behavioral and social research on aging, it is similar in certain respects that may make the energy case instructive: the research comes from disparate fields and draws on different expertise, its potential benefits are both scientific and practical, and the practical benefits are both economic and noneconomic in nature (e.g., national security, in the case of energy policy). Thus, in neither field is it easy to find expert reviewers who are competent across the range of substantive areas to be reviewed, and in neither field is there a satisfactory common metric for comparing research progress.

The committees working under the board on this effort have developed an analytical matrix designed to meet three criteria: simplicity but flexibility, transparency to decision makers, and consistency (in the sense of allowing analysis of different fields of research within a common category system) (National Research Council, 2001a, 2005c; Fri, 2004). The retrospective assessment (National Research Council, 2001a) of the benefits and costs of research used a two-dimensional matrix. The rows of the matrix distinguish three kinds of benefits and costs defined by the objectives of the DOE research effort: economic, environmental, and security. The columns reflect the certainty of the benefits. They distinguish “realized benefits,” for which the technology is developed and economic and policy conditions are favorable for commercialization, “options benefits,” which refer to technologies that are developed but for which economic or policy conditions are not now favorable, and “knowledge benefits,” defined as “economic, environmental, or security net benefits that flow from technology for which R&D has not been completed or that will not be commercialized.”
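
The matrix itself can be sketched as a small data structure. The row and column labels follow the report’s categories; the cell contents below are placeholders, since the actual cells held qualitative benefit estimates.

```python
# The 2001 committee's matrix: rows are benefit types tied to DOE
# objectives; columns are the certainty categories. Cell contents here
# are placeholders; the real cells held qualitative benefit estimates.
BENEFIT_TYPES = ["economic", "environmental", "security"]
CERTAINTY = ["realized", "options", "knowledge"]

matrix = {b: {c: None for c in CERTAINTY} for b in BENEFIT_TYPES}
matrix["economic"]["realized"] = "commercialized efficiency gains (example)"

# The consistency criterion: every research area is described in the same
# nine cells, so disparate programs can be compared category by category.
print(sum(len(row) for row in matrix.values()))
```

Holding every program to the same nine cells is what makes the matrix "consistent" in the committee's sense, while leaving the cell entries qualitative keeps it simple and transparent.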

The 2001 report provides considerable detail on the kinds of benefits and costs that belong in each category and on how those benefits were to be estimated (National Research Council, 2001a:Appendix D). It emphasizes methodological considerations, such as the need to assess net benefit and the need to rely on data from sources independent of the research sponsor. The committee emphasized the need to consider all types of benefits, not only the economic ones, which are the most easily quantified. It made an explicit decision not to try to reduce each type of benefits to a dollar metric for comparisons. It collected information on the costs and benefits of the selected research programs from program managers and comments from industry and public interest groups. Thus, the committee collected the available analytic information and organized it around the cells of the matrix, but it ultimately relied on deliberative processes to reach its conclusions (National Research Council, 2001a).

The Committee on Prospective Benefits of DOE’s Energy Efficiency and Fossil Energy R&D Programs prepared a prospective assessment (National Research Council, 2005c) using a modified analytic matrix that retained the distinction among the three objectives of the R&D programs. It considered the probability of the program achieving its goals of producing new technologies and the conditional probability of market acceptance of those technologies, evaluating these outcomes in relation to three scenarios of possible energy futures.
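
The prospective logic can be sketched as follows: the chance that a program yields an adopted technology is the probability of technical success times the conditional probability of market acceptance, judged separately under each energy-future scenario. The scenario names and probabilities below are invented for illustration.

```python
# Sketch of the prospective assessment logic: P(adoption) =
# P(technical success) * P(market acceptance | success), judged under
# each scenario. Scenario names and probabilities are invented.
scenarios = {
    "reference case": (0.6, 0.5),
    "high oil price": (0.6, 0.8),
    "carbon constraint": (0.6, 0.7),
}

adoption_chance = {
    name: round(p_success * p_accept_given_success, 2)
    for name, (p_success, p_accept_given_success) in scenarios.items()
}
print(adoption_chance["high oil price"])
```

Separating the two probabilities, and conditioning them on scenarios, keeps the technical judgment distinct from the market judgment, which is the feature of the committee's approach most worth borrowing.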

It would be possible to develop an analogous approach for assessing behavioral and social research on aging. BSR could develop a simple but flexible evaluation methodology that is transparent and that could be applied consistently across fields. NIA strategic planning documents could specify the key objectives of BSR research. Retrospective analyses could consider a matrix of results that assessed realized benefits (e.g., to health and well-being), options benefits (e.g., development of techniques and procedures in health care), and knowledge benefits from each field in relation to the NIA research objectives. Knowledge benefits include not only knowledge that is applicable to technology or health care, but also improved basic understanding of processes of aging even if that knowledge has no foreseeable application. Prospective analyses would involve judgments of the likelihood that research investments would yield knowledge, options, and realized benefits of the types desired by BSR and NIA.

Foresight Techniques

Foresight as a technique for aiding science policy decisions has been defined as “the process involved in systematically attempting to look into the longer-term future of science, technology, the economy, environment and society with the aim of identifying the areas of strategic research and the emerging generic technologies likely to yield the greatest economic and social benefits” (Martin, 1996:158). The approach is predicated on the beliefs that there are many possible futures, and that “the choices made today can shape or even create the future” (p. 159). Foresight approaches emphasize consultative processes among relevant stakeholders, with extensive provision for feedback among participants. A variety of techniques are employed to elicit projections of future trends and opportunities; these include scenario creation, trend analysis, Delphi techniques, and technology roadmapping, among others. Foresight differs from the use made of advisory groups by federal agencies in the United States to project or recommend future trends and opportunities in science in at least the following ways: it systematically engages a more diverse set of stakeholders in a single exercise; it employs specific techniques to structure future possibilities; and it incorporates iterative processes in which participants may modify their projections in light of information garnered about projections of other participants.
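
The iterative feedback that distinguishes such processes can be illustrated with a minimal Delphi-style sketch, in which each participant revises an estimate partway toward the group median in each round. All numbers are invented.

```python
# Minimal sketch of a Delphi round: each participant sees the group
# median and moves partway toward it before the next round. The
# estimates (e.g., years until some capability) are invented.
import statistics

def delphi_round(estimates, pull=0.5):
    """Revise each estimate `pull` of the way toward the group median."""
    median = statistics.median(estimates)
    return [e + pull * (median - e) for e in estimates]

estimates = [5.0, 10.0, 15.0, 30.0]
for _ in range(3):
    estimates = delphi_round(estimates)

# After a few rounds the spread narrows, mimicking convergence toward
# consensus while still recording each participant's position.
print(max(estimates) - min(estimates))
```

Real Delphi exercises circulate reasons along with the numbers, so convergence reflects argument rather than mere averaging; the sketch captures only the feedback structure.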

Foresight is a well-established approach for assessing prospective developments in science in several European countries, Canada, Australia, and Japan (Martin, 1996). For example, a recent review of the United Kingdom’s Foresight Programme, launched in 2002, concludes that “the Programme has achieved its objectives of identifying ways in which future science and technology could address future challenges for society and identifying potential opportunities. It has succeeded in being regarded as a neutral interdisciplinary space in which forward thinking on science-based issues can take place” (Policy Research in Engineering, Science and Technology, 2006:3).

U.S. science agencies have given the technique selective consideration and use (National Academy of Public Administration, 1999). However, it has been used less frequently than standing or specially constituted advisory panels that do not employ structured Foresight techniques, although some advisory committees and external study commissions may have used variations of it. Several reasons may be adduced for the comparatively limited formal use of Foresight techniques in assessing and projecting future scientific trends in the United States. One is its association with the political imbroglios that led to the demise of the Office of Technology Assessment. Although we have not attempted to evaluate past experiences or current usage of Foresight methods, we note their relevance for possible adaptation to the needs of BSR for improved methods for informing science policy decisions.

NIH Consensus Development Conference Model

The Consensus Development Conference is a familiar analytic-deliberative process in NIH. This model, which has been used more than 120 times since 1977, follows a carefully thought out rationale and set of procedures (for a detailed description, see http://consensus.nih.gov/ABOUTCDP.htm). It is used to produce State of the Science Statements, which summarize available knowledge on controversial issues in medicine of importance to health care providers, patients, and the public. It is also used to produce Consensus Statements, which address issues of medical safety and efficacy, may go into economic, social, legal, and ethical issues, and may include recommendations.

Consensus development conferences are deliberative in that the appointed panels discuss the implications of available scientific information for medical practice and related issues and seek a consensus that reflects a collective judgment. They are different from the usual scientific peer review panels in that the membership is not restricted to scientists. They are analytic-deliberative because they rely not solely on judgment, but on systematic efforts to review the scientific literature and gather information from experts on the medical technology or treatment in question (analyses), and because the experts respond to inquiries from the panel and engage in discussion with it, thus closing the circle between analysis and deliberation in ways that can potentially change both processes. The consensus development process includes various safeguards of the independence and credibility of the panels, whose members are screened for bias and conflict of interest and deliberate in executive session to protect their independence from outside influence. The consensus statements are widely disseminated by NIH, but they are not government documents. They are statements by the panel, and their credibility flows from the reputations of the panel members and the procedures for ensuring that the panel is balanced, well informed, and independent.

Consensus panels are notable for their breadth of participation. They are chaired by “a knowledgeable and prestigious person in the field of medical science under consideration” who is not “identified with an advocacy position on the conference topic.” They include research investigators in the field, health professionals who use the technology in question, methodologists, and “public representatives, such as ethicists, lawyers, theologians, economists, public interest group or voluntary health association representatives, consumers, and patients.” Members are selected for their ability to weigh evidence and to do collaborative work, as well as for their absence of identification with advocacy positions or financial interest related to the conference topic.

The consensus development model has been used in NIH for providing advice on a variety of policy-related topics, but not in the area of research policy. In principle, though, elements of this model could be included in an analytic-deliberative process for advising on research policy in BSR or more broadly in NIA. To do this, several issues would need to be confronted:

  • Who would be represented? For example, how broad should the participation be beyond the research community? In particular, what roles should various beneficiaries of research, from health care professionals to patients, have in advising on NIA research priorities?

  • How would analysis be organized to support deliberation? Given the limitations of all the analytical approaches available in research policy, attention would have to be given to ensuring that the results of bibliometric or other methods of analysis are presented as data to be interpreted judiciously, much as data from medical research are. The process would have to take into account the fact that the evidence on science policy choices is usually of lower quality than the evidence on medical treatments.

  • How would results from a research policy consensus conference feed into institute decisions? This raises the same issues of research managers’ levels of discretion and of the balance of influence and power between research managers and others that arise with ordinary scientific peer review. With nonresearchers at the table in a consensus conference setting, these issues take on a different tone.

  • What are the advantages and disadvantages of this model compared with current, more purely deliberative, peer review and advisory processes?

As these examples, and experience in other areas of public policy decision making, suggest, processes that incorporate relevant analytic techniques and information into deliberations, in groups representing the range of scientific knowledge and policy perspectives needed for wise decisions, can result in recommendations and decisions with several desirable properties. The recommendations and decisions can be well informed about the available evidence, systematic in considering that evidence from all relevant policy perspectives, accountable, and even consensual among groups representing diverse perspectives. Because well-organized analytic-deliberative processes can entrain the full range of knowledge sources and of perspectives on their interpretation, they are well suited to producing these desirable results, but they do not always produce them. Although research in some fields of public policy is beginning to identify the conditions and practices that are conducive to achieving these results (e.g., National Research Council, 1996, 1999; Renn et al., 1996), similar bodies of research have not yet been developed for the use of analytic-deliberative processes in science assessment. At present, it is worthwhile to adapt practices from other fields, such as those described above, while also working to improve systematic knowledge about which processes of science assessment best meet the needs of organizations like BSR.

CONCLUSIONS AND RESEARCH NEEDS

Conclusions

  1. Assessing the progress and potential of scientific fields is a complex problem of multiattribute decision making under uncertainty. Scientific research activities have multiple objectives, including advancing pure science, building scientific capacity, and providing various kinds of societal benefits. Every research policy choice and every research activity has its own profile of effects on these objectives, and there is no agreed weighting among the objectives. Consequently, judgment is required both to assess the evidence on how science is progressing toward each objective and to decide what weight to give progress toward each.
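The weighting problem can be made concrete with a small sketch. Everything here is hypothetical (the fields, scores, and weights are invented for illustration); it uses a simple additive multiattribute value model to show how two defensible weightings of the same evidence reverse a priority ranking:

```python
# Hypothetical illustration: why multiattribute priority setting needs judgment.
# Scores (0-1) for two invented research fields on three objectives.
fields = {
    "Field A": {"pure_science": 0.9, "capacity": 0.4, "societal_benefit": 0.3},
    "Field B": {"pure_science": 0.4, "capacity": 0.6, "societal_benefit": 0.9},
}

def weighted_score(scores, weights):
    # Additive multiattribute value: sum over objectives of weight * score.
    return sum(weights[obj] * scores[obj] for obj in weights)

# Two defensible weightings of the same three objectives.
w_science_first = {"pure_science": 0.6, "capacity": 0.2, "societal_benefit": 0.2}
w_mission_first = {"pure_science": 0.2, "capacity": 0.2, "societal_benefit": 0.6}

for label, w in [("science first", w_science_first), ("mission first", w_mission_first)]:
    ranking = sorted(fields, key=lambda f: weighted_score(fields[f], w), reverse=True)
    print(label, ranking)
# The two weightings reverse the ranking: the same evidence supports either
# priority, so the choice of weights, which is a judgment, decides the outcome.
```

With these invented numbers, the "science first" weights rank Field A ahead of Field B (0.68 vs. 0.54), while the "mission first" weights reverse the order (0.44 vs. 0.74): the data alone cannot settle the priority.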

  2. None of the available analytical methods of science assessment is sufficiently valid to justify its use for assessing scientific fields or setting priorities among them. Judgment must be applied to interpret the results from these methods and discern their implications for policy choices. This situation seems unlikely to change any time soon. Therefore, the most appropriate use of quantitative methods is as inputs to analytic-deliberative processes of decision making. Analytic methods have the advantage in principle of making it possible to account for the progress of different fields in the same units, thus supporting priority-setting decisions in an accountable way. Each of them, however, has significant practical limitations. For example, bibliometric studies provide measures of scientific activity and of the extent to which disciplines and fields influence one another. They also have well-known limitations: they emphasize publications in the periodical literature over other scientific activities, and information about publications and citations must be interpreted in terms of the importance, correctness, and mission relevance of those activities. In addition, citation measures have been criticized as being susceptible to gaming, and reputational studies have the same limitation. Surveys of scientists to elicit their judgments are unlikely to be useful for comparing different research fields because few scientists are knowledgeable across fields, and no method is available for ensuring the comparability of judgments across the potential respondents. Quantitative methods from decision analysis are not suitable for informing science policy decisions by BSR because there is insufficient basic understanding of the paths from research activities to scientific or technological progress.

  3. Choices within NIA that involve comparisons among fields of behavioral and social science research can be better informed, more systematic, more accountable, and more strongly defensible if they are informed by appropriate systematic analyses of what these fields have produced and are likely to produce. We consider it possible to constitute expert review panels that draw on their own experiences and insights, augmented by quantitative data on the outputs, outcomes, impacts, productivity, or quality of research, to arrive at better informed and more systematically considered expert judgments about the progress and prospects of scientific fields than they could reach without quantitative data. We think that processes that organize ongoing exchanges of judgments between bodies of scientists and science managers can produce wiser decisions than processes based on either-or thinking. Although analytic techniques should not substitute for careful deliberation, deliberation informed by analysis can produce better results than deliberation not so informed. In Chapter 6, we offer recommendations for structuring decision processes toward this end.

Research Needs

Several lines of research can contribute to the knowledge base needed for a social science of science policy (Marburger, 2005) that would improve science policy decision making. This research would aim to fill the gaps in knowledge identified above. The research effort should be broadly based, both to provide benefits beyond a single agency and to yield clearer knowledge about which aspects of scientific progress are general and which are domain- or discipline-specific. In addition, a broad effort may provide general lessons about advancing interdisciplinary and mission-relevant science that can flow from research sponsored by any of a number of agencies. Research is needed to achieve the following three objectives.

  1. Improving basic understanding of scientific progress and the roles of research funding agencies in promoting it. Research is needed to examine the nature and paths of progress in science, including the roles of decisions by science agencies. To support BSR, research is needed on progress in fields of behavioral and social science related to aging.

    Scientific progress is usefully understood in terms of a causal stream that roughly moves from (a) processes that structure research to (b) inputs to research to (c) scientific outputs to (d) scientific outcomes to (e) impacts on society, as these terms are defined in National Research Council (2005c) and elaborated in Chapter 4. Society closes the circle by providing inputs and structure for research, generating research questions, and in other ways. But for the purpose of evaluating the programs of science agencies, it is useful to focus on how variables earlier in the stream affect variables later in the stream. Thus, scientific progress is usually evaluated on its outcomes and impacts. Assessments of research programs must consider these consequences in light of the level of effort (e.g., processes and inputs) that went into trying to achieve them.
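The causal stream just described can be sketched as a small directed graph. This is only an illustrative encoding (the stage names paraphrase the (a)-(e) labels above); a depth-first search confirms that influence can propagate from early stages to late ones and, through society's feedback, back again:

```python
# A minimal sketch of the causal stream described above, as a directed graph.
# Stage names paraphrase the report's (a)-(e) labels; each edge means
# "the earlier stage affects the later one". The impacts -> processes edge
# represents society closing the circle by restructuring research.
stream = {
    "processes": ["inputs"],
    "inputs": ["outputs"],
    "outputs": ["outcomes"],
    "outcomes": ["impacts"],
    "impacts": ["processes"],  # societal feedback into how research is structured
}

def reaches(graph, start, goal):
    # Depth-first search: can influence propagate from start to goal?
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

print(reaches(stream, "processes", "impacts"))  # early stages affect late ones
print(reaches(stream, "impacts", "inputs"))     # and feedback closes the circle
```

The sketch makes the evaluation point concrete: because every stage eventually reaches every other, assessments that focus on outcomes and impacts still need to trace them back to the processes and inputs that produced them.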

Research on the nature and paths of scientific progress can build basic understanding of conditions that facilitate and impede such progress. The research might include:

  • Historical analyses of the evolution of scientific fields—their rise, continued fecundity, or decline—performed by or vetted by professional historians to ensure adherence to professional standards, especially in attributing causation. “Stories of discovery” or progress, as supported by BSR and other federal science agencies, while useful in putting a face to agency claims of contributing to scientific advance, are limited as tools of analysis. They are subject to selection bias that arises from examining only the successes from among the investments made by an agency or program. They also tend to highlight agency-specific contributions, without considering the importance of other sources of contributions to progress. What are needed are studies of fields that are generally considered to have been productive and of fields that are not so considered, conducted in a manner that meets professional historical standards (e.g., Nye, 1993; Kohler, 2002). These studies could usefully focus on how the processes that organize research programs and the inputs to those programs have affected scientific outputs, outcomes, and impacts.

  • Advanced bibliometric analyses of the development of research fields, to provide a window into the development of research fields over time and the flows of influence among them. These studies should look at outputs in relation to measures of inputs and processes and also in relation to indicators of scientific outcomes and impacts. Particular emphasis should be placed on the cross-fertilization of research findings from one disciplinary domain to another and the emergence of new fields of knowledge. Some of the studies should consider the usefulness of bibliometric indicators of research outputs as measures of research progress, defined in terms of each of the sponsoring agency’s program goals. Such potential indicators will require careful methodological analysis to assess their validity and potential biases before they are ready for practical use, even as inputs to decisions.

  • Studies of scientific progress using emerging databases of conference proceedings or other prepublication scientific outputs. In many fields, new research results are first presented in technical reports or at conferences. Data on such kinds of activity may provide earlier indicators of scientific progress than bibliometric measures.

  • Analyses of research vitality or interest shown by active scientists in lines of research, focusing on research directions that are widely considered in hindsight to have been successful or unsuccessful in terms of yielding major scientific advances or societal impacts. The studies should examine the ways that the vitality of scientific fields may relate to subsequent scientific outcomes and impacts. For example, studies should be made of the career paths of productive scientists (“stars”) in terms of their choice of research topics, the journals in which they publish, and the career paths of the graduate students they train. Such studies could test the hypothesis that progress in a field can be predicted from the quality of the researchers who are willing to allocate their time to a specific line of inquiry.

  • Studies of the effects of the structure of research fields on their progress. These studies might compare the consequences for the development of scientific fields, particularly new fields, of research portfolios that emphasize large centers, database development efforts, or interactive workshops, with more traditional research portfolios emphasizing funding to individual investigators and small research groups.

Research on the roles of science agency decisions in scientific progress can help the offices and agencies that sponsor it to make decisions about how to select and train research managers and organize advisory groups so as to better promote program goals for advancing science. This research might include:

  • Studies of the role of officials in science agencies in promoting scientific progress. Some of these studies might follow the example of past research done for U.S. foundations (e.g., Kohler, 1991; Rooks, 2006) that has investigated how program managers have acted as entrepreneurs who help build new research fields and as stewards of vital fields. The research might also include studies of the characteristics of effective research entrepreneurs and stewards and studies of the effects of science agencies’ practices of hiring, training, and evaluating program managers on their scientific entrepreneurship and stewardship.

  • Studies of how expert advisory groups, including study sections and advisory councils, make decisions affecting scientific progress (e.g., comparing decision making in disciplinary versus interdisciplinary advisory groups; examining the effects of emphasizing explicit review criteria, such as innovativeness, on group decisions; examining how review groups consider multiple decision criteria; investigating hypotheses, such as that peer review groups generally select in favor of methodological rigor at the expense of innovation and that different advisory groups have distinct cultural differences that affect their ability to nurture scientific innovation).

  • Studies of the effects of the organization of advisory groups on their success at promoting interdisciplinary and problem-focused scientific activity and ultimately at improving scientific outcomes and societal impacts. These studies might examine the roles of advisory group chairs in shaping group decision rules; the effects of the characteristics of group members individually and collectively; and the processes of training, mentoring, and socializing advisory group members and of oversight of advisory group processes.

  2. Improving understanding of the uses of analytic techniques in making research policy decisions. This research would support the development, trial use, and empirical investigation of a variety of quantitative measures and decision-analytic techniques for assessing the results of past research investments and setting research priorities. The studies would seek to validate analytical techniques and to determine their best uses, which may be different for different analytic techniques. The research might include:

  • Studies comparing multiple indicators of research vitality, outputs, outcomes, or impacts of lines of research with each other and with the unaided judgment of experts in these areas to see whether it is possible to develop reliable and valid quantitative measures of scientific progress through a convergence of indicators and to determine whether any such measures might be useful as leading indicators that predict critical scientific outcomes or impacts.

  • Comparative studies of fields that are widely judged to differ in rates of progress toward positive outcomes and impacts to see whether particular quantitative indicators or a convergence of indicators yield results consistent with expert judgment.

  • Studies to assess the value of providing information developed through specific analytic techniques, such as bibliometric studies, for research priority setting. Studies using cross-citation patterns or analyses of academic and professional career trajectories of researchers and students can show whether such analyses add significantly to the decision-relevant knowledge of expert review groups and whether and how this information alters their recommendations.

  • Studies of scientific impact using databases that cover citations in policy documents and the popular press, with the results examined from the perspectives of research scientists and policy makers.

  • Tests of ways to employ a convergence of information from different analytic methods to inform priority setting. This research might identify whether certain ways of combining information from multiple sources can contribute to more robust and reliable decision making than reliance on any single method.

  3. Improving the incorporation of techniques for analysis and systematic deliberation into advisory and decision-making procedures. This research should explore and assess techniques for structured deliberation, some of them incorporating information from potential quantitative indicators of scientific progress and potential, for retrospective assessment and priority setting. The research would be used to elaborate and refine deliberative methods for organizing peer review and expert advice. The research should include the following:

  • Studies to develop techniques for structuring decision analysis for use in the research priority-setting tasks facing BSR. Some studies might develop influence diagrams modeling the relationships of scientific activities (processes, inputs, outputs) to BSR goals (especially outcomes and impacts) and explore the feasibility of using these to structure deliberation. The influence diagrams might be developed by outside researchers or BSR staff, in consultation with the program’s advisory council. Some studies might explore ways to structure discussions within deliberative groups around the multiple goals in the NIA program plan or around lists of types of scientific outputs, outcomes (e.g., dimensions of scientific progress), and impacts. These studies might involve the use of simulated advisory groups.

  • Trials of analytic techniques for informing and structuring decisions in the deliberations of actual review and advisory panels or shadow panels created for experimental purposes. Some studies might provide panels with the most relevant available quantitative indicators for their tasks and leave them a period of time during their deliberations to discuss the meaning of the indicators for the decision at hand. Resources permitting, parallel panels could serve as comparison groups. In some studies, panels would be asked to apply structured methods for considering quantitative and qualitative information about the activities in the fields to be compared in relation to explicit criteria, such as lists of BSR strategic goals or dimensions of scientific progress, or to use influence diagrams showing plausible paths from research to the achievement of desired program goals. The studies would examine the effect of the interventions on (a) panel members’ reports of whether and how their thinking or their recommendations were affected; (b) indicators of decision quality, such as the number of relevant decision objectives and pathways from decisions to the achievement of objectives that are considered in the deliberations; and (c) the creation of a sufficiently explicit record of the rationale for the advisory panel’s recommendations to improve accountability and allow for a better informed exchange of judgments between researchers and research managers.

  • Studies to adapt existing analytic-deliberative assessment approaches, such as the NIH Consensus Development Conference model, to the assessment of research areas and the setting of research priorities in BSR. Some of these studies might incorporate the above techniques for informing and structuring decisions. Some of the studies might include nonscientists, selected to represent the perspectives of the potential users or beneficiaries of the research, in the analytic-deliberative process. These studies could explore how adding these perspectives may affect the ways in which the advisory groups assess the benefits of research for basic understanding and for society.

  • Comparative studies of advisory panels of different composition, particularly for recommending research priorities. For example, BSR, NIA, and the NIH Center for Scientific Review might vary the breadth of expertise represented on a panel or the balance between senior and junior researchers. Such research would provide an empirical base for assessing the reliability of deliberative advice and the sensitivity of that advice to the intellectual backgrounds and practical orientations of panel members. Such experiments would also offer evidence for evaluating claims that panels of researchers are too conservative to support promising high-risk research, or too uncritical in areas in which only one or two panel members have expertise.

  • Studies involving the instruction and training of advisory panel members to consider specific BSR and NIA objectives, including mission relevance, that go beyond generic considerations of the quality of proposed research.

NOTES

1. Treated as subsets of the broader methodologies covered in this report, and thus omitted from specific discussion, are various Foresight techniques (Irvine and Martin, 1984) and mechanisms for scoring R&D priorities.

2. Analyses of other prominent performance measures within the larger scope of scientometric inquiry, such as patent statistics and publication-patent relationships, are not relevant to much of BSR’s research portfolio, which produces different kinds of impacts.

3. Debates about the relative contributions of theoretical and empirical approaches to scientific advance, and about leader-follower relationships between them, are staples in the history of science and entail issues that extend well beyond the scope of this report (see, e.g., Galison, 1999).

4. Bibliometric evidence on the social sciences, for example, consistently shows that sociologists and political scientists cite articles from economics journals more frequently than economists cite sociological or political science journals. These findings have been interpreted alternatively as indicating the greater generalizability and precision of economic modes of analysis, and thus the greater intellectual vitality of economics, and as documenting the intellectually closed-loop, solipsistic nature of economic thinking (Laband and Piette, 1994; MacRae and Feller, 1998; Reuter and Smith-Ready, 2002).
