Evaluating Research in ECSE
Why does experimental computer science and engineering (ECSE) flourish on some campuses and struggle on others? Some of the differences are the result of historical accident, but the issue is much deeper than that. Many experimentalists believe that the academic career deck is stacked against them. The committee also found that publication practices in ECSE emphasize conference publication over archival journal publication, a fact likely to be negatively interpreted by the "paper counters" of university promotion and tenure committees.
Furthermore, there are differing interpretations within the computer science and engineering (CS&E) field itself of what constitutes scholarly work. The issue can perhaps be constructively introduced by reporting the results of a small, informal survey in which about 20 computer scientists from around the country were asked by the chair of the committee whether they thought a faculty member should "get tenure for inventing the mouse."
The mouse is an example of an artifact that has realized the goals of ECSE and exemplifies an ECSE research success. It is an encapsulation of ECSE research knowledge in the following ways:
The mouse falls within the scope of ECSE, having mechanical, electronic, and software components concerned with human-computer interfaces.
The concepts underlying the mouse fundamentally improve the functionality of the human-computer interface.
The concepts were shown to be "better" quantitatively.
The mouse has had a significant impact as witnessed by a variety of subsequent implementations, improvements, and applications, as well as widespread use.
Despite these qualities, the replies to the chair's informal survey correlated strongly with whether the respondent was an experimentalist (yes) or a theoretician (no). The question exposed fundamental differences of opinion concerning the nature of research accomplishments. It also emphasizes that the research of junior faculty members—either theoreticians or experimentalists—whose senior faculty are predominantly in the other area might not be fully appreciated at promotion time.
In this chapter two general questions of evaluation are considered. The first concerns how CS&E implements its quality standards for research. Treating this matter entails a careful review of publication forums and traditions in ECSE. The second concerns differences in experimental and theoretical research and how these differences affect a professor's evaluation.
PUBLICATION AND OTHER FORMS OF DISSEMINATION
The scholarly articulation of a contribution is a key characteristic of research, and all intellectual communities have mechanisms through which new knowledge and information are disseminated and explicated. In addition, certain communities place considerable value on establishing priority and claiming credit for new ideas and innovations. Not surprisingly, the particular mechanisms used by any given community depend on the efficacy with which those mechanisms facilitate the dissemination of information and the establishment of priority.
Communication with other researchers in ECSE has several aspects. As in all fields, the first goal is to convey the content of the work. Next in importance, the academic researcher in ECSE wishes to convince other researchers or developers to use an idea or implementation. This requires the researcher to demonstrate the worth of the idea. Such arguments can be made on a quantitative or qualitative basis, although the former is likely to be more easily conveyed. The idea must be reported in sufficient detail to allow others to reproduce it, or the actual implementation that embodies the idea (i.e., the artifact) must be provided to the community. Reproducing experimental data may also require access to a working implementation.
For potential adopters of an idea, both timeliness and quality of publication are important. Timeliness is critical because ECSE moves so rapidly, and ideas that take a long time to reach potential users often become irrelevant or obsolete. Quality is important because new ideas must be well explained, as well as convincing in their technical arguments, with comparative discussion of other approaches and often extensive quantitative evidence that substantiates the merit of an approach. A strong refereeing process plays a valuable role in identifying important and innovative ideas and in promoting those that are well justified. It also helps to ensure that earlier work is properly attributed and that a claimed innovation is in fact new work.
Researchers in ECSE use several forums for the publication of their research: conference proceedings, archival journals, and technical reports. As importantly, they also disseminate information through a variety of "nonstandard" channels (e.g., distributing software artifacts, creating and distributing videotapes, presenting demonstrations off-site) so that they can demonstrate intangible and dynamic properties of artifacts for other researchers who wish to interact directly with their work. Such nonstandard channels are critical to ECSE research, especially for proof-of-concept or proof-of-existence artifacts.
Conference proceedings and journal articles are the most important publication channels and are discussed in greater detail below. Technical reports provide a detailed description of work in progress that enables other researchers to collaborate with the author(s) and validate and enhance the work. They are the main vehicle for immediate distribution of technical information and gaining feedback on the value of a work. Under most circumstances, technical reports are not refereed beyond the immediate department; in some cases, such publication requires at most the approval of the department head. Technical reports are freely distributed, and many technical reports are available on-line (via Internet access).
A substantial majority of respondents to the CRA-CSTB survey of ECSE faculty preferred conferences as the means of dissemination by which to achieve maximum intellectual impact; many fewer preferred journals. Conferences were preferred primarily because of timeliness and, to a lesser extent, the better audience offered by conferences (i.e., they are better focused). Researchers who favored journals were almost equally split among three motivations: university recognition, stronger refereeing, and a wider audience.
Although researchers favored conference publication by a significant majority, a large majority of the researchers surveyed also indicated their belief that journals were much more effective in gaining university recognition. Most indicated that the reason for this was that university administrators put more emphasis on journals; very few indicated that journals had higher prestige or greater impact.1 Put differently, only a small number of respondents to the CRA-CSTB survey agreed that the best publication vehicle to gain university recognition was also the best vehicle for intellectual impact on the field; the remainder felt that there was a conflict between these two vehicles.
The leading journals in ECSE include the Association for Computing Machinery (ACM) journals, the Institute of Electrical and Electronics Engineers (IEEE) transactions as well as the more selective IEEE magazines, and the leading independent private journals such as the Journal of Parallel and Distributed Computing or Artificial Intelligence.2 These journals are characterized by a rigorous and demanding refereeing procedure and rather selective publication, although selectivity varies considerably among publications.
The primary characteristic of these journal publications is a thorough and often lengthy review process. (See Appendix B for a fuller discussion.) This review process enables the referees to request changes to a paper and to ensure that such changes are carried out with the help of the editor. Typically, papers are reviewed by three outside referees, although in many cases one of the referees may fail to produce a report, leaving the editor with only two recommendations.
On the basis of referee reports, the editor makes a decision to accept the paper, to request minor or major revisions, or to reject the paper. When major revisions are required, most editors send the paper back to at least a subset of the original referees. When minor revisions are required, the editor either examines the paper or sends it back to the referees.
The questions on which journal reviewers tend to focus are, Are the results right? Are the weaknesses fixable? and, What value will this have for posterity? Journal reviewers typically spend considerable time understanding the argument presented in the paper and finding ways to strengthen it. Even highly favorable reviews of a paper usually have extensive comments about how to improve it.
Many people believe that most papers are significantly improved by the refereeing and revision process. Among the improvements are clearer exposition, higher level of completeness and correctness, and better comparisons with other work. Editors of major journals have observed that papers written by less experienced authors are often seriously lacking in one of these areas.
Journal papers typically are not constrained by length, although budget limitations have led to requests for authors to shorten papers or to divide them into two parts for publication. Journals are typically classified as private journals or professional society journals. Of these, the society journals are regarded as more prestigious. However, many authors in ECSE are drawn to private journals because they tend to publish more rapidly, while still maintaining high standards for refereeing.
In ECSE, journal articles have special value in consolidating and summarizing work for the long term. Because there are few limitations on length and because of the greater emphasis on completeness, possibly at the expense of timeliness, journal articles are an ideal mechanism to review what has been learned throughout a major portion of a project's lifetime, and to place that knowledge into a broader context.3 By contrast, journal articles are less suitable as a means for disseminating information about intermediate results whose long-term significance may become clear only when the full context of the work can be presented.
The following is a good example of an archival journal article serving this role: Davidson, Jack W., and Christopher W. Fraser. 1984. "Code Selection Through Object Code Optimization," ACM Transactions on Programming Languages and Systems 6(4):505–526. This paper is the "consolidation" paper for the peephole optimizer described in Chapter 1.
The leading conferences in ECSE are typically carefully refereed (although by a different process than is used for journals) and have high standards for acceptance, as indicated by relatively low rates of acceptance. Conferences that meet these standards include the International Symposium on Computer Architecture (ISCA), the conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), the conference on Programming Language Design and Implementation (PLDI), the Symposium on Operating System Principles (SOSP), and the SIGCOMM and SIGGRAPH conferences. Papers published as part of these conferences are of comparable significance to those published in the best of journals.
The paper selection process for these conferences relies heavily on the program committee as the primary referees. Although external referees are also used, typically at least one-half of the refereeing process is handled by the program committee. A paper usually receives at least four, and often five, reviews. (For example, a recent ISCA conference averaged 4.4 reviews for each of 209 submitted papers.) Because these are the leading conferences, the program committee generally consists of highly respected individuals. Thus, this round of refereeing is often as thorough and discriminating as the refereeing done by journals, and sometimes more so. Indeed, because conferences are often unable to request extensive revision of submitted papers, strong papers with flaws are often rejected, whereas for a journal they would be revised for additional consideration. Because the conference selection process is relatively rapid, a paper that is rejected can be revised and resubmitted to another conference or to a journal. The ability to do this depends on having high-quality feedback on the papers.
The major disadvantage of the conference review and selection process is the lack of an opportunity to review revisions to papers. This capability is the major additional quality control that can be exercised by journals. The committee's data show that a second review is required for about one-half of the papers published in journals such as ACM's Transactions on Computer Systems and Transactions on Programming Languages and Systems. To provide the opportunity for improving papers in a similar fashion to that achieved by a second refereeing, several conferences have adopted a method called shepherding, in which papers that are worthy of acceptance but have some problems are handled by an appointed member of the program committee. This person, called the shepherd, works with the authors to convey additional comments from the program committee and referees and reviews the revised paper prior to publication. The idea of shepherding arose in connection with the SOSP conference, where it is used heavily (often more than one-half of the papers are shepherded), but it has recently been adopted by other conferences, although usually on a smaller scale (with only about one-quarter of the papers being shepherded).
The major disadvantage of conference publication is the limitation in length. Many conferences limit the final paper as well as the version submitted for refereeing. For example, the submitted summaries are often limited to 5,000 to 7,500 words. With such limits, the submitted version may differ somewhat from the final printed paper. Experience has shown that well-written papers can fit within the limit and still contain enough information to allow referees to make an accurate judgment, if the program committee understands the area well. Papers that require substantial additional background may not easily fit within these constraints, or within the final length limitations imposed when the paper is accepted.
A minor disadvantage of conference publication is the somewhat limited distribution, compared with that of journals. Conference attendees, who generally include the majority of researchers actively studying a topic, all receive a copy. Additionally, ACM's special interest groups (SIGs) often send a copy to all members. For example, ACM's Special Interest Group on Computer Architecture (SIGARCH) sends a copy of the ISCA and ASPLOS proceedings to all of its members. However, not all SIGs follow this custom, and even for those that do, broad circulation is not customary for all of the conferences sponsored by the SIG. Finally, libraries have in the past not always appreciated the importance of conference publication to ECSE, and so conference holdings at many libraries are often incomplete or nonexistent.
The major advantage of conference publication is the greatly reduced time to publication. The typical leading conferences have submission dates that are roughly six or seven months before the conference date. The leading journals have average submission-to-publication times of more than two years.4 This time differential is discussed further in Appendix B.
Like journals, conferences vary widely in their selectivity. In addition to the highly selective smaller conferences, there is also a set of conferences that, while demanding, tend to have larger programs and multiple, parallel tracks. Because many of these conferences are large, the refereeing process is necessarily less focused and cannot be as carefully done. These conferences probably compare in selectivity to the less demanding journals. There are also workshops and informally refereed conferences with even lower selectivity. The better papers presented in such workshops are often revised and extended for submission to a leading conference or journal.
Although the overall selectivity of a conference is one indicator of the quality of papers presented, it is at best a gross measure. Thus, a conference for which 30 papers of 100 are accepted (acceptance rate of 30 percent) may well have a higher overall quality than one for which 50 of 200 papers are accepted (acceptance rate of 25 percent). Accordingly, acceptance rate is only one factor to consider in determining the intellectual importance of any given conference.
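The arithmetic behind this comparison can be made explicit. The following minimal sketch uses the hypothetical submission and acceptance counts from the text; the function name is illustrative, not drawn from the report.

```python
# Acceptance rate is simply accepted papers divided by submissions,
# expressed as a percentage. As the text notes, a higher rate does not
# by itself imply lower overall paper quality.
def acceptance_rate(accepted, submitted):
    return 100.0 * accepted / submitted

print(acceptance_rate(30, 100))  # 30.0 (the smaller conference)
print(acceptance_rate(50, 200))  # 25.0 (the larger conference)
```

Despite the second conference's lower rate, the text's point is that such figures are at best a gross proxy for quality.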
The questions on which conference paper reviewers tend to focus have a different character from those of journal reviewers. Although concerned with technical accuracy, conference reviewers tend to pay much more attention to questions such as, Is this work important? Will others in the community care about this work? and, Is it timely? As noted earlier, conference referees tend to prefer outright rejection rather than extensive revision because of their tight time constraints, although they often make comments intended to strengthen the paper. These two factors—importance/timeliness and tight reviewing deadlines—mean that papers rejected by important conferences would often have passed the quality threshold for journal publication, although perhaps with revisions required.
Many of the same observations about timeliness and selectivity were made in a study of information needs in the sciences undertaken by the Research Libraries Group. Its study of publication and publication dissemination in computer science states the following: "In computer science, conferences are the venue for presenting important new research, and competition for the opportunity to do so is intense. In fact, presenting a paper at the more prestigious conferences is preferred to publication in a leading journal."5 This source also indicates the important advantage that conferences offer in time to publication, as does the Abrahams letter cited above.6
Conference proceedings have one additional advantage over journal publication—they are presented to live audiences, typically in 20 to 30 minutes. Feedback from the audience, both as part of the formal presentation and in informal conversations in the hall or over meals, often has a direct and immediate impact on the progress of a project.
Artifacts as a Medium for Dissemination
Because an artifact often embodies aspects of the intellectual content of ECSE research that may be intangible, it is important to consider how this content is communicated to the research community.
Artifacts released to the research community are a very different medium from publications. Where publications describe work, artifacts are themselves the work. As noted earlier, good publications pass through peer review that typically involves the judgments of several reviewers and a few editors. Artifacts must instead pass a "marketplace" test, in which the relevant community as a whole votes "with its feet" (or its keyboards!) and defines work with significant impact. People are often most easily persuaded that an artifact provides better functionality by trying it out rather than by reading about it. Note that in terms of "getting the details right," nothing is more exacting than the artifact itself—it has to work!
How can impact be measured? The principle underlying impact in ECSE is simple: an artifact or an idea has impact if it changes the way other people work. Useful artifacts are by definition useful to many people. Other potential measures of impact include how long a given artifact has been used, how many people spend substantial time modifying and enhancing it, and how many other pieces of experimental research build on it, although none of these measures are easy to obtain or even to define precisely.7
One immediate consequence of the focus on impact is that the importance and significance of a given research contribution may not be immediately evident. This is due in part to the complexity of artifacts—complicated phenomena often take time to understand no matter how articulate the researcher is. Additionally, even when an idea is evidently good, its impact depends in part on others adopting it, possibly in the creation of artifacts, which in turn takes time and delays when this impact can be measured. Often, the more novel the artifact or idea, the longer it takes to propagate into the community.
Each form of artifact—software, computers, chips, graphic images, databases—carries with it different requirements for dissemination. Nearly all of the nonpublication forms of artifact dissemination rely on the Internet. Internet access is therefore a practical necessity for conducting experimental research.
Typical forms of dissemination are as follows:
Software. The source text of the program and its documentation are generally made available for access by anonymous FTP from the host computer of the researcher who produced it.8 The program is usually free of charge to other researchers. To a lesser degree, software is distributed on magnetic tape provided by the creator (usually for a nominal handling charge), through central libraries such as Netlib (free of charge), or through secondary sources such as vendors.
Computers. Access to experimental computers is usually provided by researchers to other researchers via "remote log-in," which allows them to run programs on the machine over the Internet without being physically present. In addition to providing access, the researcher must provide documentation on the machine and its specific software, some amount of local disk storage, and some amount of "hand holding."
Chip designs. Standard structures, such as "pads" or the "multibus design frame," are distributed like software via the Internet, but complete chip designs themselves are only rarely exchanged. The systems built using the chips may be displayed in some form (e.g., by remote access if they are computers or via demonstrations if they are not).
Graphic images. Dissemination most often takes the form of software to generate the images, but this may require that the recipient have a sophisticated graphics display device. Films and demonstrations at conferences, such as SIGGRAPH, are also important.
Computer-aided design (CAD) tools. Like graphics, distribution is most often in the form of software, but demonstrations at conferences are extremely significant.
Data. A wide range of data is distributed by anonymous FTP over the Internet. Examples include trace data of programs, graphic data sets such as the Utah teapot, benchmark programs such as the Perfect Club test suites (from the University of Illinois), chip designs for evaluating CAD tools, and test data sets.
These are generic forms of dissemination and do not include personal exchanges between researchers.
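The anonymous-FTP retrieval described above can be sketched as follows. This is a minimal illustration using Python's standard ftplib module; the host name and file path are hypothetical examples, not real distribution sites named in this report.

```python
# Sketch of retrieving a research artifact by anonymous FTP.
# The server and path below are hypothetical.
from ftplib import FTP

def fetch_anonymous(host, remote_path, local_path):
    """Log in to an FTP server anonymously and download one file."""
    ftp = FTP(host)
    ftp.login()  # no arguments: conventional "anonymous" login
    with open(local_path, "wb") as f:
        ftp.retrbinary("RETR " + remote_path, f.write)
    ftp.quit()

def ftp_url(host, remote_path):
    """Conventional ftp:// URL for citing a distribution point."""
    return "ftp://%s/%s" % (host, remote_path.lstrip("/"))

# Example usage (hypothetical host; requires network access):
# fetch_anonymous("ftp.example.edu", "pub/tools/editor.tar.Z", "editor.tar.Z")
```

The "distribution" planning discussed below is precisely what makes such a retrieval useful to an outsider: the fetched archive must install and run without help from its creators.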
The distribution of artifacts is an activity performed by academic ECSE researchers that is not typical in traditional academic disciplines. The distribution of artifacts often demands a substantial commitment of time and resources, and the added work—although valuable to the ECSE community—tends to be intellectually unrewarding.
For example, research software that embodies a novel and useful research idea may be stable and complete enough for tests to be made, measurements to be taken, and papers to be written, and generally be capable of providing answers to intellectually interesting questions. At the same time, it may still be undocumented, incomplete, and quite fragile, with numerous bugs remaining in the system. Such software is useful to those who created it primarily because its creators understand its quirks and "workarounds," and know how to fix it when it breaks; in short, the creators are not unduly hampered by these problems.
On the other hand, outsiders without such knowledge would find the software unusable. Before research-quality software can be disseminated, documentation must be written, bugs removed, omissions filled in, and so on. Additionally, a "distribution" must be planned so that the recipient can install and use the software without intervention by the creators. For a substantial software system, this packaging activity can easily require a person-year; fielding user questions after it has been distributed takes up additional time.
Demonstrations—generally needed for one-of-a-kind hardware or for software running on platforms not widely available—can be a particularly aggravating form of dissemination. In addition to the artifact having to be primped to make it suitable for display (a condition that may require much more effort than originally needed to extract the "research content"), the artifact or the equipment it runs on must be packaged for travel, moved around the country, and set up and interfaced to the local operating environment. Furthermore, it requires a presenter to actually perform the demonstration.
A less bulky, but often no less aggravating, alternative involves using a computing platform at the demonstration site. In this case, although a research software system may have been created by using the home institution's Brand A, Model 2 computer, "minor" differences in the "same" Brand A, Model 2 computer at the demonstration site (e.g., differences in operating system version) may well prevent the demonstration from running smoothly or even at all.
Although the extra work to prepare an artifact for use or access by other researchers may be substantial, it is willingly done by the community and is part of the tradition of the field. The reason, of course, is that distribution is often a necessity for communicating one's ideas and for obtaining professional recognition. As importantly, demonstration to and actual use by independent observers is often the only way to evaluate the true worth of a contribution.
When another party uses an artifact created by the researcher, the researcher receives recognition, but the etiquette of the ECSE community is such that acknowledgment rather than co-authorship is appropriate. Moreover, if the artifact comes into wide use, even acknowledgments become less frequent, especially when it is not the actual program text that is used but rather its underlying algorithm or idea.
THEORETICIANS' AND EXPERIMENTALISTS' VIEWS ON EXPERIMENTAL RESEARCH
The committee believes that accomplishments in ECSE research should be evaluated in the context of the field's tradition as outlined above. However, one of the most serious problems treated in this report concerns a tension that exists between theoretical and experimental computer scientists. This concern manifests itself in the research evaluation process as the question, Is the mouse worthy of tenure? Behind closed doors and never for attribution, one may hear outrageous remarks from both communities: "Experimentalists don't get tenure because their work is no good." "Theory is irrelevant, as are theoreticians." Such comments are clearly counterproductive and demonstrate a lack of appreciation of the real contributions of the other group. Obviously, neither community can make a claim of being the "true" computer researchers, and mutual understanding and respect are essential.
In the committee's view, the crux of the problem is a critical difference in the way the theoretical and experimental research methodologies approach research questions. The problem derives from the enormous complexity that is fundamental to computational problems, as outlined in the discussion of artifacts in Chapter 1. This complexity is confronted in theoretical and experimental research in different ways, as the following oversimplified formulation exhibits.
When presented with a computational problem, a theoretician tries to simplify it to a clean, core question that can be defined with mathematical rigor and analyzed completely. In the simplification, significant parts of the problem may be removed to expose the core question, and simplifying assumptions may be introduced. The goal is to reduce the complexity to a point where it is analytically tractable. As anyone who has tried it knows, theoretical analysis can be extremely difficult, even for apparently straightforward questions.
When presented with a computational problem, an experimentalist tries to decompose it into subproblems, so that each can be solved separately and reassembled for an overall solution. In the decomposition, careful attention is paid to the partitioning so that clean interfaces with controlled interactions remain. The goal is to contain the complexity, and limit the number and variety of mechanisms needed to solve the problem. As anyone who has tried it knows, experimentation can be extremely difficult to get right, requiring science, engineering, and occasionally, good judgment and taste.
The distinction between these two methodologies naturally fosters a point of view that looks with disdain on the research of the other. When experimentalists consider a problem that has been attacked theoretically and study the related theorems that have been produced, they may see the work as irrelevant. After all, the aspects that were abstracted away embodied critical complicating features of the original problem, and these have not been addressed. The theoretician knows no analysis would have been possible had they been retained, whereas the experimentalist sees that "hard parts" of the problem have been left untouched.
Conversely, when theoreticians examine a problem attacked experimentally and spot subproblems for which they recognize theoretical solutions, they may see the work as uninformed and nonscientific. After all, basic, known results of computing have not been applied in this artifact, and so the experimentalist is not doing research, just "hacking." The experimentalist knows that it is the other aspects of the system that represent the research accomplishment, and the fact that it works by using a "wrong" solution implies that the subproblem could not have been too significant anyway (Box 4.1).
So, as with the blind men encountering an elephant, practitioners form their impressions of the significance, integrity, and worth of computing research. Although it is natural for researchers to believe that their own methodology is better, no claim of superiority can be sustained by either camp: fundamental advances in CS&E have been achieved by both experiment and theory. Recognizing that fact promotes tolerance and reduces tension. Unfortunately, these impressions will likely be used in the process of evaluating professors for promotion and tenure. Although both theoretical and experimental junior faculty are at risk if the senior faculty are predominantly of a "different stripe," the problem may be more serious for experimentalists, for two reasons. First, theoretical computer science has strong antecedents in mathematics: its traditional practices resemble those of mathematics, which in turn resemble the traditional practices of most universities. Second, the relatively recent emergence of experimentation suggests that senior experimentalists are in the minority.
BOX 4.1 An Example of Tension Between Theorists and Experimentalists
An example will help to illustrate the tension between theoretical and experimental computer science. This example is hypothetical in that its particulars are fictional, although it is grounded in the personal experience of a committee member.
An ECSE faculty member designed a text editor (i.e., a program to arrange text on a page) that incorporates a variety of features that improve its ease of use for novices and also increase its power for expert users. Some of these features are based on novel algorithms and approaches to managing text strings. The portion of the system responsible for displaying text on the screen, however, uses an algorithm that would be relatively inefficient for displaying large amounts of text (e.g., 100,000 characters) but is perfectly adequate for the amounts of text that will in fact be displayed on any plausible terminal screen (e.g., less than 10,000 characters).
The theorist may criticize the editor on the grounds that the display algorithm is known to be inefficient, and that more efficient algorithms are known and should have been used. The experimentalist may well respond that such criticism is irrelevant, because the algorithm used was good enough for all practical purposes, and the editor should be evaluated primarily on the basis of its power and usability.
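The tradeoff at the heart of Box 4.1, that an asymptotically inferior algorithm can be entirely adequate at realistic input sizes, can be sketched with a small timing experiment. Because the editor in the box is hypothetical, the two rendering routines below are invented stand-ins: a quadratic renderer that rebuilds the display string by repeated prepending, and a linear one that uses a single join.

```python
import timeit

def quadratic_render(text):
    """Rebuild the display string by repeated prepending: O(n^2) character copies."""
    out = ""
    for ch in reversed(text):
        out = ch + out  # copies the entire accumulated string on every step
    return out

def linear_render(text):
    """Build the display string with a single pass: O(n)."""
    return "".join(text)

if __name__ == "__main__":
    screen = "x" * 10_000      # roughly a full terminal screen of text
    document = "x" * 100_000   # a large file

    # Both routines produce identical output.
    assert quadratic_render(screen) == linear_render(screen)

    for label, text in [("10,000 chars (screen)", screen),
                        ("100,000 chars (document)", document)]:
        slow = timeit.timeit(lambda: quadratic_render(text), number=1)
        fast = timeit.timeit(lambda: linear_render(text), number=1)
        print(f"{label}: quadratic {slow:.3f}s, linear {fast:.5f}s")
```

At screen scale both routines finish essentially instantly; only at document scale does the asymptotic difference become noticeable in the timings, which is precisely the experimentalist's point that the "wrong" algorithm can be good enough for all practical purposes.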
The tension described above was demonstrated in the comments of a number of ECSE faculty responding to the CRA-CSTB survey. Assistant professors wrote,
It is clear to me that experimental computer science is not considered to be of broad intellectual interest by the vast majority of the senior faculty in my department. [assistant professor at a well-known private university]
Ironically, I believe that experimental research is frequently viewed as non-scientific. Many people in my department seem to feel that theorems and proofs are the only valid method of argument. [assistant professor at a large public university]
There is a premium on journal publications, and preferably those with some theoretical leanings. I have decreased the amount of experimental work to orient my work toward this more demonstrably recognizable research contribution. [assistant professor at a well-known private research university]
Opinions expressing the "opposite" impression might be heard from theoreticians in predominantly experimental departments, but the CRA-CSTB survey did not sample theoreticians.
For some faculty, an affirmative tenure decision does not change their perception that experimental work is not highly valued. An associate professor at a large public university stated: "I just received tenure this year. I felt under intense pressure to move away from experimental work, and to concentrate on formalization and theory."
Even when the department seems to understand the burdens of the ECSE discipline, it is not always evident to assistant professors that the understanding will be converted to action at promotion time. One insightful assistant professor at a private university observed:
The department senior faculty and university-level tenure committee do seem to understand that experimental systems work is time-consuming, important and needs to be evaluated differently. On the other hand, there is still a strict demand for demonstration of intellectual ability which is more easily and readily met by focusing on more theoretical journal publications.
Others confirm this:
Not only does [experimental research] take time and money, but there is no indication that it is appreciated, especially when the primary tenure measure is publication. [assistant professor at a private university]
I have attempted to do more theoretical work because of the tenuring process even when I feel that the research is not substantially improved by the theoretical aspects and it would have been more productive to spend the same effort on experimental evaluation. [assistant professor at a large public university]
Although it is not possible to determine whether these perceptions are true in fact, the committee believes that they are widely held and that they affect a faculty member's willingness to pursue experimental research as an assistant professor. At the same time, although it is a common belief that experimentalists are disadvantaged at tenure or promotion time in comparison with "equally" qualified theoreticians for the kinds of reasons discussed in this report, the committee made no attempt to document such cases. A tenure decision is based on many considerations, of which research and scholarship is but one (and, as in other areas, it may be that the quality of a researcher's work is low by the standards of the field itself). Other academic duties (e.g., teaching and service) figure into the decision, as do the strength of letters of evaluation, the personalities of the people involved, the prospects for continued scholarly output, student interactions, and so on. Most of these data are not publicly available. It would be impossible, without being present at all of the deliberations and being party to the participants' thoughts, to second-guess an individual tenure decision and assert that someone was denied tenure simply because of prejudice against experimentalists.
THE EFFECT OF EVALUATION ON PROBLEM CHOICES AND RESEARCH AREA
The practicalities of evaluating ECSE research have substantial impact on how faculty members moving up the career ladder see their own careers. The committee was struck by the considerable intensity of feeling among CRA-CSTB survey respondents that the traditional tenure and promotion (T&P) review process works against their interests and those of the field. The following quotes (all from associate professors at large public universities) illustrate this sentiment:
It is absolutely apparent to me that tracking the market upon which I depend—that is, staying aware of tools, trends, systems and applications available for use in my research—is quite at odds with the promotion process. Staying on top, as I'd been accustomed, exacts a high cost in time and energy; three years into my position as assistant professor I needed to make a conscious decision to abandon these efforts in favor of work that is technically less crisp, having shorter-term pay off, and perhaps done with less outside impact and applicability.... Now having tenure, I have the opportunity to try to return to the front lines of technology.
[The positive tenure] decision has changed the character of my research to a degree, mostly by giving me the freedom to make the right choice during system development rather than simply the expedient one.
[The tenure decision] pressed me to publish "something," "anything" decent, even though my systems were not mature, and tended to press me to apply for grants and submit papers regarding work that really was not in the best shape for that.
Such responses indicate an understanding on the part of junior faculty of a career strategy in which one should modulate one's ambitions before the tenure decision is made. Yet it is also clear that many ECSE faculty—junior and senior—consider doing so equivalent to choosing to pursue less important work. That is, because research problems should be selected on the basis of importance, a researcher who chooses not to pursue his or her highest-priority problem is by definition working on a less important problem. This is true regardless of the reasons for not pursuing the original problem (e.g., that it is too ambitious to pursue while under tenure pressure).
What is the origin of the sentiments reflected above? In some ways, it is understandable that junior faculty are often frustrated by the need to subordinate their desire to pursue the most promising intellectual paths in order to respond to the immediate demands of producing documentary proof of achievement for the time-limited tenure processes. (The promotion from associate to full professor seems to be less of an issue in this regard.) Individuals who have chosen to pursue careers in a given field have generally done so because they believe they have good ideas to contribute that will move the field forward.
Any field—ECSE included—must allow individuals to undertake high-risk activities for potentially high gain. Indeed, many senior faculty believe that the field progresses most rapidly and vigorously as the result of such activities on a wide range of fronts. In the words of Frederick Brooks of the University of North Carolina, ECSE would clearly benefit from "people with a vision who go aggressively after the vision, heeding no distraction."9 However, it is also clear that high-risk/high-gain activities should not be the only constituent of a field's overall research portfolio. Incremental research with low to moderate risk of failure also has a key role to play in the advancement of any field, although such work is generally far less glamorous. Both types of research are essential to moving ECSE forward.
Almost every researcher would like to be working on high-gain activities if success could be ensured, but the real question for the field is the following: Given that most high-gain research activities have an inherently high risk of failure (as well as being generally more demanding of resources) and that the field will benefit from incremental lower-risk research as well, who should be doing the high-risk/high-gain work?
In one sense, the answer is clear and is recognized by junior and senior faculty alike: tenured faculty have much greater freedom of action to pursue high-risk research activities, although such freedom is not unlimited. The traditional tenure process in most institutions provides strong incentives for junior faculty to undertake lower-risk (and correspondingly lower-gain) activities prior to the tenure decision. Given this simple statement of reality, the real problem is how to nurture talent and competence at performing both lower-risk and higher-risk activities.
The most straightforward strategy for coping with this conflict is for a faculty member to perform lower-risk/lower-payoff work before tenure and higher-risk/higher-payoff work after tenure. This is indeed the approach that many junior ECSE faculty say they have adopted. However, in the absence of any detailed study, it seems that the "high-payoff" accomplishments of ECSE have tended to be those of individuals who "hit for home runs" regardless of their tenure status, calling this apparently straightforward strategy into question. In practice, it would seem, many do low-risk research before the tenure decision and continue to do low-risk research even after they have received tenure. As one particularly insightful junior faculty member (an assistant professor at a large public university) said:
Once I have tenure, my current interest in generating a larger number of publications will probably shift toward a smaller number of high-quality publications. Or then, again, it may not. My saddest reflection on the tenure process is that six years is long enough that the shaping that occurs may be permanent.
One aspect of the low-risk strategy is worth noting: there is a difference between structuring a project so that it allows for meaningful intermediate output (a desirable mode of research, consistent with the discussion above) and maximizing one's publication count by adopting a "least publishable unit" strategy, in which the smallest possible increments of progress are published at frequent intervals (a highly undesirable mode of research that many faculty believe characterizes the reality of the tenure process at their institutions).
Some respondents to the CRA-CSTB survey did say that they ignored the tenure process and concentrated their work on what they thought was interesting, relevant, and important. On the whole, these individuals tended to be from very highly ranked schools.10 A substantial minority of respondents to the CRA-CSTB survey said that an impending tenure decision drove them to focus more on publishing work that was demonstrably recognizable as research rather than pursuing projects with long time horizons. Some quotations follow:
Now that I have tenure, I feel more free to pursue research on systems that I want to do rather than more short-term projects which will lead quickly to a collection of publications. [associate professor at a smaller private university]
The tenure pressure has forced me to explore less speculative areas. Since I cannot afford to expend a significant amount of energy in an area that "didn't pan out," I am forced to do low-risk work. This tends to reduce the potential benefit of the work. [assistant professor at a large public university]
I now feel more free to address longer-term problems that may not yield publications as regularly but will, I believe, turn out to be of more lasting value. [associate professor at a smaller public university]
The T&P process also influenced the work of junior faculty in other ways. For example, a substantial number indicated that the realities of the T&P process at their institutions drove them to do work that was more theoretical in nature than they would have preferred, simply because experimental work in their environment was not as highly valued. In most cases, this issue arose because the T&P process placed the greatest weight on journal publications in its evaluation of research, journal publications that are themselves biased away from experimental work. Survey respondents commented:
I am coming up for tenure this year. It has already affected me in a big way because some of my most time-consuming activities in building actual systems do not produce sufficient publications per level of work, and so I expect to rely on some of my more theoretical work, which has indeed produced publications more easily. It is like a split personality, ... what I consider my most substantial contributions are likely to be ignored, and I may earn tenure on more conventional work. [assistant professor at a smaller private university]
I have tenure, but did primarily theoretical research before that. It was obvious from day one that systems building and getting tenure were not going to mix very well. Since receiving tenure, I have concentrated more on systems-oriented work, and indeed I have not published as many pages of material. Systems building has been part of the reason for this (there is not enough time to build systems, write as many papers as theoretical researchers, etc.). But I have also changed my idea of what is publishable, and [now] insist on having a real contribution before trying to publish something. [associate professor at a large public university]
In conclusion, it would seem to be in the scholar's, as well as the discipline's, best interest for everyone to work on his or her highest-priority problems. In the presence of tenure pressure, care must be taken to identify the risks and to minimize them. A mentor (see Chapter 5) can provide the benefits of experience and can guide the application of this generalized information to a junior faculty member's specific situation. Although the intellectual risks inherent in research problems will remain, much can be done to reduce methodological risks. In any event, there will not be enough time before the tenure decision to see an ambitious experimental project through to completion. Yet a few papers in respected conferences documenting progress toward solving an important experimental problem represent a better accomplishment than a mountain of irrelevant paper.
A NOTE ON OTHER DISCIPLINES
The issues faced by academic experimentalists in ECSE have partial analogues in other disciplines. For example, artists, performing musicians, and dramatists generate work products (sculptures, musical performances, plays) that are analogous to the artifacts of ECSE. However, in these fields, the standard for intellectual accomplishment includes both scholarly analysis or publication and "artistic creativity," a standard that—although subjective—is nevertheless amenable to peer review.11 Universities seeking to evaluate the work of faculty artists, musicians, and dramatists consider the venues in which the works of these individuals are displayed (e.g., an exhibition at a major gallery is worth much more than one at the local community center), peer reviews of these works, and the stature of those peers. In addition, potential letter writers are generally given copies of the portfolio to the extent that it can be reproduced.
Engineers in noncomputer fields also produce artifacts that are judged on the basis of their utility to substantial audiences. However, as a broad generalization, it can be said that these artifacts are often based on a well-accepted theoretical foundation. An aeronautical engineer may design a system to control the flight of an airplane under particular circumstances (e.g., strong wind shear), and the flight control system will eventually be evaluated on the basis of its utility in preventing crashes due to wind shear. However, control theory is a well-established and well-codified body of knowledge that enjoys paradigmatic status among control engineers. Thus, the aeronautical engineer is also likely to leave a "paper trail" on the way to implementing the flight control system.
In conducting ECSE research, faculty members will imagine new computing ideas, create artifacts to implement them, and measure properties of the artifacts. It is important that the artifacts work and equally important that they be made available.
The research may be of high quality by the goals, standards, and traditions of ECSE and yet not accord with the expectations of a theoretician or with the usual academic "publish-or-perish" standards. However, experimentalists hired onto a faculty deserve to be evaluated by the criteria of their chosen specialty. Accommodation may be necessary.
The committee believes that at the point a tenure decision is made, an experimentalist may have
Predominantly conference publications;
Nonstandard forms of dissemination, such as software; and
No completed graduate students
and still be a truly spectacular researcher. A judgment should be based instead on the presence or absence of the following:
One or more impact-producing computational artifacts completed;
Research results disseminated to and used by the community;
A reputation for novel systems solutions or ingenious experiments; and
A filled or filling pipeline of well-trained graduate students.
It is the responsibility of the candidate to achieve distinction. It is the responsibility of the department and institution to recognize and reward it.
As a final thought, the committee emphasizes the consequences of two points developed in this chapter. It takes a long time to produce artifacts, and there are often long delays before the impact of an artifact can be determined. Given that the probationary period is brief in relation to the length of this process, it often happens that universities must "gamble" on promoting a promising assistant professor because the data to support the case are not definitive. Although this happens from time to time in all disciplines, it happens so often in ECSE that it may be the norm. There are examples of spectacular successes and of unfortunate mistakes. Because a conservative strategy is not likely to succeed in the long run, universities are encouraged to seek the widest possible input into the promotion decision in order to increase their confidence in that decision.