The fusion of computers and electronic communications has the potential to dramatically enhance the output and productivity of U.S. researchers. A major step toward realizing that potential can come from combining the interests of the scientific community at large with those of the computer science and engineering community to create integrated, tool-oriented computing and communications systems to support scientific collaboration. Such systems can be called "collaboratories."
Collaboration among colleagues is a challenge for the scientific community that takes many forms, most notably the sharing of data and/or special instruments, joint authoring of papers, and cooperative research. More and more scientific problems demand collaboration for their resolution as a consequence of increasing complexity and scale, a growing amount of which reflects the proliferation of fundamentally interdisciplinary problems. The study of global change phenomena illustrates all of these dimensions; it requires the expertise of oceanographers, meteorologists, biologists, chemists, physicists, experts in modeling and simulation, and others from around the world.
In many areas scientists have sought computer-based tools and techniques for data gathering, storage, analysis, modeling, and communication, making use of both generic (including off-the-shelf) technology and the tools they have developed to meet their own, specific needs. These bottom-up efforts have been productive, but their implementation has been difficult: funding for tool development has been inadequate, tools have been deemed awkward to use, and the building of tools is regarded by most scientists as less prestigious than the direct conduct of research.
At the same time (but largely in isolation), computer scientists and engineers have continued to advance the state of computer technology, developing better and less expensive tools for storing, accessing, and manipulating data; for monitoring and controlling instruments and other equipment; for supporting communications and collaboration among dispersed parties; and so on. But their general-purpose tools do not always match the needs of user communities. A more explicit partnership between scientists in general and computer scientists in particular can inspire development of computing technology that better meets the needs of scientists, better leverages the efforts of computer scientists, and provides broad benefits to scientists across the research community. That prospect was the motivation for this study, which focused on the potential for computer-based technology to facilitate scientific collaboration and improve the utility of computer-based resources used in scientific research.
Although technology will never cause the unwilling to collaborate, it can facilitate collaboration among those who are motivated and can also make it more attractive to others. There is evidence that this is happening. One example is the phenomenal growth in the provision and use of services offered through the Internet, the global network spawned by federally funded research into computer-based communications and now used by millions of scientists, engineers, and educators. Through the Internet, researchers access databases, share software and documents, and communicate with colleagues. The Internet has made collaboration among dispersed scientists practical, and it has been used for that purpose. Nevertheless, despite technological improvements, new tools, and guides, the Internet remains a somewhat primitive tool for collaboration, especially for those scientists who do not enjoy learning how to use it or do not have the time to do so.
Observation of the rise of computational science and the popularity of the Internet and its constituent research networks among scientists in general led a group of computer scientists and engineers to conceive of and begin to explore the concept of a "collaboratory," which is an environment in which all of a scientist's instruments and information are virtually local, regardless of their actual locations. The virtual environment of the collaboratory supports interaction among scientists; among scientists, instruments, and data; and among networked computing tools used in the conduct of scientific research.
The development of a national collaboratory capability would facilitate collaboration of individuals and groups without regard to their physical locations. Collaboration may occur among scientists within a given facility or institution, but collaboration and interaction among distant researchers and resources are becoming increasingly important.
Although articulating the rationale for collaboration may be easy, achieving effective collaboration is not. In part, the situation reflects the basic training of scientists: scientists have been educated to focus on individual activity and achievement. Moreover, scientists have had to compete with each other to attain recognition and resources. Collaboration tends to be easier on a small scale and when it is local: when a small number of individuals collaborate it is generally possible to proceed on the basis of mutual trust, but "rules of the road" are needed for larger-scale collaboration. These and other human considerations shape and constrain the collaborations that do take place; in some instances they also inform the design of incentives to promote collaboration.
To gain insights into the motivations for collaboration among and within different fields of science, the obstacles to effective collaboration, and the potential benefits of computer-based tools for collaboration, the committee held workshops addressing these issues in the contexts of molecular biology, oceanography, and space physics, three fields that vary greatly in their use of computing and communications technology and in the applicability of the collaboratory concept. Despite these variations, all three fields share a common dependence on the collection and analysis of large amounts of data.
Through the workshops the committee found that collaboration is becoming more common (albeit at different rates) in these fields, within and between disciplines; that the conditions under which individual scientists work vary substantially; and that the familiarity with, access to, and use of computer-based technology vary significantly across fields. The workshops suggested that the broader community of researchers is aware of some of the relevant technological advances but often lacks the technical and financial support necessary for applying new technology.
The committee found that, in general, any science that makes extensive use of computing for modeling, simulation, data analysis, and data storage and retrieval can benefit from the use of collaboratories, particularly in circumstances where collaboration has already begun. Bottom-up motivation will be an essential factor in the success of any collaboratory effort.
The committee concluded that a research program to further knowledge of how to implement and effectively use collaboratories would have broad impact. Such a program could involve the development, adaptation, or integration of several technologies: wide-bandwidth communication between two or more sites, allowing transmission of sight and sound good enough to achieve a virtual presence of an individual in someone else's laboratory; sets of database tools with common access and sophisticated graphics capabilities for interpreting masses of information; collaborative authoring and editing tools; and so on. While all of these developments can contribute to the conduct of science, the greatest impact will come from integrating these technologies and implementing them on a scale large enough to serve significant scientific communities and to provide scientists with new and better options for designing and executing their projects.
The committee envisions a program that would bring together computer scientists and other members of the scientific community. Such a program would present many challenges, given both the likely variations in the cultures of the disciplines involved and the potential awkwardness of having one partner in the position of a supplier and one in the position of customer. Nevertheless, the limited experience to date with such cross-disciplinary partnerships has been encouragingly beneficial to science.1 Both the opening of all fields to more interdisciplinary activity and the recent, dramatic advances in computing and communications technologies make the time propitious for joint collaboratory building by the scientific community.
The committee concluded that a collaboratory testbed program has the potential to address important scientific needs while simultaneously representing a key step toward developing national and global information infrastructure. The committee thus recommends an initiative to:
Establish a research program without delay to further knowledge of how to build, operate, and use collaboratories in the support of science. This program should have two major components of equal importance:
A research component dedicated to developing and integrating the software and hardware needed to build and apply collaboratories.
An education component dedicated to educating and training the people needed to build and use collaboratories.
The overarching goal of the program is to aid science and scientists through the construction and operation of working collaboratories. To achieve this goal the committee further recommends that the program:
Establish several collaboratory testbeds, each funded at a level of $6 million per year over a period of 5 years.
Provide for two national demonstrations of each collaboratory testbed.
Initiate multiple and complementary activities to develop the human resources needed to carry out the collaboratory program, including, but not limited to:
A summer fellowship program to provide hands-on training for scientists and technologists in the use and development of collaboratory technologies in the conduct of science.
Regularly scheduled national symposia for testbed principal investigators, research staff, and graduate students, providing opportunities to share information, findings, and conclusions regarding the technical aspects of building, operating, and using collaboratories.
Because scientific disciplines vary in numerous ways, it should not be assumed that one mode of collaboration or one set of collaboration tools can fit all needs. Therefore, the committee recommends that the proposed research program establish several collaboratory testbeds, which should be tailored to the needs of particular groups of scientists in their respective fields. Although meeting specific needs is essential for collaboratories, developing multiple testbeds will allow an assessment of how much common infrastructure (such as the common support provided by the Internet) may be needed in comparison to more specialized tools. A 5-year, 3-testbed collaboratory program, for example, is estimated to cost about $100 million.
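The cost estimate above can be checked roughly against the testbed funding level stated in the recommendations ($6 million per year per testbed over 5 years). The sketch below uses only those figures. Attributing the remainder to the program's other recommended activities (national demonstrations, fellowships, symposia) is an assumption, since this summary does not itemize the estimate.

```latex
\begin{align*}
\text{testbed funding} &= 3 \times \$6\ \text{million/year} \times 5\ \text{years} = \$90\ \text{million} \\
\text{remainder} &\approx \$100\ \text{million} - \$90\ \text{million} = \$10\ \text{million}
\end{align*}
```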
Achieving sufficient testbed scale is critical. The National Science Foundation's program on collaboration technology under the Directorate for Computer and Information Science and Engineering is already exploring research opportunities in this arena, but on a scale and scope less than the committee believes necessary to create a national collaboratory capability. Demonstrating the revolutionary potential of collaboratories requires a new scale in the funding of research projects in information technology, because complete systems must be built, rather than isolated software, to demonstrate the usefulness of the full set of integrated technologies as well as the sociological effects. A national collaboratory can support a nationwide community interacting with a distributed digital library or with a remote large instrument. Implementing and operating such a system, even for an experimental testbed, requires an effort on the scale of national centers for databases or instruments. Hence, the major recommendation of this report is for the construction and evaluation of a few large collaboratory testbeds; the scale of the testbeds is more important than the number.
The introduction of a collaboratory testbed program must address explicitly a fundamental concern among the scientists who would be the users of collaboratories, namely the fear that spending on tools, regardless of their value, would diminish available resources for basic research in individual disciplines. This reaction may reflect past experience, inasmuch as scientists' efforts to build their own tools may have drawn on the funding and other resources nominally allocated for research. The proposed collaboratory program has been formulated to overcome this concern. First, it anticipates that as computation and communication become more integral to science, the distinction between "research" and "tools" will become less meaningful, and spending choices may appear less zero-sum. This is already happening in some areas (witness the broader attention to computational science). Second, by providing scientists with an alternative to do-it-yourself tool building, the program would help scientists (and their funders) to focus on their primary research while helping to assure that tools are built more efficiently and effectively. Meanwhile, the concept of a partnership between computer scientists and other scientists recognizes that there is mutual research benefit to the development of collaboration technology.
The proposed collaboratory testbed program is consonant with the objectives of the High Performance Computing and Communications (HPCC) initiative, which has the mission of developing high-performance computing systems, advanced software and algorithms, computer networking for research and education, and basic research programs and the human resources to support the mission. Many of the technologies developed as a result of HPCC will play important roles in the development of collaboratories. At the same time many components of HPCC, such as research into medical computation and environmental computation, access to academic medical centers and environmental databases, prototyping of experimental high-performance computing systems, research into software tools, and computer access in general will require collaboration between scientists and technologists. Finally, HPCC involves the (Defense) Advanced Research Projects Agency, Department of Energy, National Aeronautics and Space Administration, and National Institutes of Health as well as NSF, suggesting a broad potential base in government for relevant support and funding.
The committee believes that the program outlined by these recommendations is the minimum level of effort that should be undertaken. The research and education components, taken together as a program, constitute a strategic effort to improve the information infrastructure supporting the conduct of computationally intensive science in the United States.