Providing for a National Collaboratory Program
Computing and communications tools are already being adopted by a wide variety of scientists to automate data retrieval and analysis, to handle the growing scale and complexity of scientific problems, and to facilitate sharing of data and insights with colleagues. Thus a foundation has been laid for more systematic and more integrated uses of computer-based tools in scientific applications where collaboration is necessary or desirable. Achieving this next level in the use of technology, referred to as the collaboratory concept, will involve both technical and social challenges. Computer scientists and engineers will have to join with scientists in other fields in partnerships focused on improving collaboration in specific scientific arenas. Effective partnerships promise to yield both gains in the productivity of collaborating scientists and gains in scientific achievement in areas where the technology makes possible the exploration of new kinds of questions and the use of new methodologies for scientific research. At a time when the cost of scientific research is expanding while research support is tightening, such gains take on a special appeal. The committee believes that the time is right for a focused initiative to pursue scientific collaboratory projects and develop associated technologies.
As discussed in Chapters 2 through 4, several prototype collaboratories have been developed for limited applications. These prototypes have demonstrated the usefulness and viability of the collaboratory concept as a means of facilitating scientific research. Based on its findings, the committee recommends an initiative to:
Establish a research program without delay to further knowledge of how to build, operate, and use collaboratories in the support of science. This program should have two major components of equal importance:
A research component dedicated to developing and integrating the software and hardware needed to build and apply collaboratories.
An education component dedicated to educating and training the people needed to build and use collaboratories.
The overarching goal of the program is to aid science and scientists through the construction and operation of working collaboratories. To achieve this goal the committee further recommends that the program:
Establish several collaboratory testbeds, funded at a level of $6 million per year each over a period of 5 years each.
The committee has learned from the workshops held in oceanography, space physics, and genome mapping and sequencing that these particular fields of science can benefit from the use of collaboratories, and it is clear that opportunities exist for their use in other disciplines such as seismology and neuroscience (Institute of Medicine, 1991). Concurrent construction of collaboratory testbeds (Box 6.1) will encourage synergistic technology development and permit several sciences to explore the technology in the short term. At the same time, generic collaboratory technology will benefit from being developed for and adapted to several disciplines.
Multiple testbeds tailored to the needs of particular disciplines are recommended to investigate thoroughly the use of collaboratories to discover, teach, and transfer scientific knowledge. Testbeds are the only effective way to explore the multifaceted nature of these tool-oriented computing and communications systems. The committee believes that these testbeds can play a major role in demonstrating how science will be done in the 21st century and how a national program for information infrastructure for research can be implemented.
A collaboratory testbed program has the potential to support science in at least four different ways: (1) by giving scientists tools to do more and better science; (2) by giving teachers tools that they and their students can use to experiment, explore, and collaborate; (3) by involving industry in collaboratory development, thus giving scientists a means to transfer technology from the laboratory to the business sector, which can then make collaboratory technologies and services commercially available; and (4) by providing opportunities to understand better the social and organizational dynamics of scientific research conducted using collaboratories. Such research can be used to refine and improve collaboratories, making them more useful to science and more easily used by scientists. An important consequence of such a program will be the development of the human resources needed to support the building and operation of scientific collaboratories.
A program length of 5 years is recommended because building collaboratories will require that existing technology be adapted and integrated in new ways and that new technology be developed. A significant period is needed in which to develop and apply the technology and to refine it based on experience.
It is expected that each testbed will involve senior natural scientists, senior computer scientists, senior social scientists, and a variety of junior-level people. The committee estimates that about 50 full-time-equivalent (FTE) workers will be needed per testbed to create a critical mass of expertise. Fifty FTEs will cost about $5 million per year, based on a mean cost per FTE of $100,000, including overhead. Further, 50 FTEs will need about $1 million for computer equipment and networking facilities per year averaged over the 5-year period of the program. Hence the total annual cost is estimated at $6 million per testbed. The relatively high equipment capitalization costs are a consequence of intensive use of computing technology for instrument control, large-scale databases, and scientific visualization tools associated with the kinds of oceanographic, meteorological, and genome mapping collaboratories considered by the committee. The $6 million estimate assumes that testbed collaboratories will make use of existing scientific instruments and equipment. The committee notes that the NSF supercomputer centers, the NSF science and technology centers, and the NIH and DOE Human Genome Project require similar levels of support to accomplish comparably ambitious objectives.1
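The arithmetic behind this estimate can be restated as a short calculation. The Python sketch below is illustrative only: the figures are those given in the estimate above, while the function and variable names are assumptions introduced here and are not part of the committee's analysis.

```python
# Illustrative restatement of the per-testbed cost estimate.
# All dollar figures are taken from the text; the names are hypothetical.

FTES_PER_TESTBED = 50            # full-time-equivalent workers per testbed
COST_PER_FTE = 100_000           # mean dollars per FTE per year, including overhead
EQUIPMENT_PER_YEAR = 1_000_000   # computing and networking, averaged over the program
PROGRAM_YEARS = 5                # recommended length of each testbed


def annual_testbed_cost(ftes=FTES_PER_TESTBED,
                        cost_per_fte=COST_PER_FTE,
                        equipment=EQUIPMENT_PER_YEAR):
    """Return the estimated annual cost of one testbed, in dollars."""
    personnel = ftes * cost_per_fte
    return personnel + equipment


annual = annual_testbed_cost()
five_year = annual * PROGRAM_YEARS

print(f"Annual cost per testbed: ${annual:,}")
print(f"Five-year cost per testbed: ${five_year:,}")
```

Varying the inputs shows how sensitive the annual figure is to staffing. For example, if the equipment budget is held fixed (an assumption, since the text ties the equipment figure to 50 FTEs), 40 FTEs at the same mean cost would come to $5 million per year.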
A program involving three testbeds implies a total funding level of $90 million to $100 million. The committee believes that this level would enable the program to achieve critical mass. However, it is recognized that this is a time of tight resources. Although shrinking the scale of the overall program somewhat may be a necessity, the committee emphasizes the importance of providing adequate resources per testbed to achieve sufficient scale (see Box 6.1). Prototype collaboratories and technologies examined by the committee have been developed and implemented on a small scale; the challenges of developing and implementing collaboration technologies on a larger, effectively national, scale are substantial. Consequently, the committee formulated its recommendation to include a minimal number of testbeds while providing for a level of resources per testbed that, based on the experience of other projects supporting collaborative research, appears necessary to achieve success. Given the importance of scale, undertaking fewer testbeds at the same time may be one approach to stretching resources.
BOX 6.1 COLLABORATORY TESTBEDS—IMPACT OF SCALE
The purpose of a collaboratory testbed is to build and propagate a complete collaboration system to a specialized community of scientists. The goal is to run a large-scale experiment in both the technology of building a collaboratory and the sociology of using it, in order to infer what some of the components of an effective national research information infrastructure should be. Funding must accordingly be available for a period sufficient to run a large-scale experiment.
As shown by the limited scale of staffing for current collaboratory "testbeds,"* these testbeds are really prototypes. They include a small sample of users, implement a small sample of a technology, bring the users up on the technology with minimal support, and track a sample of the usage. For example, the Worm Community System is currently only a model implementation and will have to be expanded considerably before it can function as a complete information infrastructure. In particular, the sets of data libraries and analysis programs must be expanded, a true distributed system across platforms and networks must be developed, and a case-hardened implementation must be evolved before the technology can be used as a springboard for a generic infrastructure. The expansion factor from prototype to testbed is partially due to increased functionality but primarily due to increased service, in providing more complete coverage and availability. In communications system development, it is common for the functionality to require only 10 percent of the effort, while universality and maintainability require the other 90 percent. So it is not surprising that the existing collaboratory "testbeds" are experiments in feasibility, rather than complete testbeds.
Expanding a prototype into a testbed involves supporting a much wider range of data types, data sources, hardware platforms, user interfaces, analysis programs, explanatory documents, and technical support. It would also involve defining the interfaces and providing the toolkits to permit many scientists to add their own wide range of data, programs, and interactions.
To see what might be necessary for a complete collaboratory testbed, it is instructive to examine centers for production systems. For example, the Genome Data Base, the central archive for the genetic data from the Human Genome Project, is a production project that provides reliable retrieval and periodic updates to a broad user community with significant user support; that is, it runs continuously and posts daily updates of information contributed by several thousand users. Communications infrastructure projects such as Arpanet and the Internet, which are the direct predecessors of collaboratories, began as research projects at a few sites, grew into research prototypes at vertically integrated, quasi-academic organizations, and then became institutionalized as production developments at commercial corporations or, in the case of Arpanet, in the U.S. defense command and control system.
The development of collaboratories will likely move through this same path; the research projects are just now beginning with prototype collaboratory efforts such as the Worm Community System, the Sondre Stromfjord collaboratory, and highly collaborative research efforts such as the Tropical Ocean-Global Atmosphere and World Ocean Circulation Experiment programs. Prototype and testbed efforts point the way toward an institutional stage. Collaboratory development needs to become institutionalized, with resources for it built into national science funding policies.
Consistent with the federal government's tradition of supporting basic research, the collaboratory program is envisioned as a federal program, building on related research efforts at NSF and various mission agencies such as NASA, DOE, and DOD. However, consonant with the "partnership" vision, a collaboratory program should be designed to draw on efforts and support from other sectors. There are a variety of options for industrial participation, for example, since industry not only has begun to develop commercial collaboration technology, but also employs people engaged in collaborative research (sometimes in collaboration with academic researchers). Universities not only house and benefit from scientific centers, but also employ scientists who collaborate with distant colleagues. Scientific organizations support and promote a variety of research programs, some of which (especially those viewed as "big science") require collaboration and several of which are under budget pressures. The full range of organizations that can benefit from collaboratories should be recognized and leveraged; a collaboratory program should be designed to encompass contributions of personnel, equipment, research facilities, and other resources from industry, academia, federal laboratories, and scientific organizations. Those contributions could both diminish and leverage the federal contribution.
As a part of the technology component of the program, the committee further recommends that the program:
Provide for two national demonstrations of each collaboratory testbed.
The national demonstration would provide a showcase of the program to the larger scientific and technical community. Such a demonstration could be held in conjunction with the meeting of a large scientific organization, such as the American Association for the Advancement of Science, to reach a wide-ranging audience or could be presented at discipline-specific conferences to promote and focus discussion within the community using the collaboratory. A national demonstration serves at least three purposes. First, it motivates participants in the testbed program to meet deadlines and drive the technology development for working collaboratories. Second, it educates the scientific community about how collaboratories can be used to advance science. Third, it provides scientists and technologists further opportunities to interact and collaborate. An initial national demonstration of each testbed could take place about halfway through the proposed 5-year project cycle. A second demonstration could take place at the end of the cycle.
The cost of this feature of the program would be about $2 million per demonstration per testbed, based on the committee's estimates of the costs for other similar national demonstrations. A major value of such demonstrations is that the results are not ephemeral, but are integrated into and become a part of each demonstrated testbed.
As part of the education component of the program, the committee also recommends that plans be made to:
Initiate multiple and complementary activities to develop the human resources needed to carry out the collaboratory program, including, but not limited to:
A summer fellowship program to provide hands-on training for scientists and technologists in the use and development of collaboratory technologies in the conduct of science.
The summer fellowship program is aimed at increasing the community of scientists and technologists experienced in the development, implementation, and application of collaboratories and collaboratory technology. Fellowships could be sponsored by scientific societies, such as the American Association for the Advancement of Science, and the government in conjunction with academic institutions. Fellowships also provide a forum for interdisciplinary training of scientists and technologists. Such training is crucial to the success of the collaboratory program.
The cost of the fellowship component of the collaboratory program is estimated at $1 million per year, an amount that would support 50 summer fellows per year at $20,000 each.
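The fellowship figure follows from a single multiplication, restated below as a brief Python sketch. The numbers are those given above; the function name and parameters are hypothetical and serve only to make the calculation explicit.

```python
# Illustrative check of the fellowship budget estimate.
# Figures come from the text; the names are hypothetical.

def fellowship_budget(fellows_per_year, cost_per_fellow):
    """Return the annual cost of the summer fellowship program, in dollars."""
    return fellows_per_year * cost_per_fellow


annual_fellowship_cost = fellowship_budget(fellows_per_year=50,
                                           cost_per_fellow=20_000)
print(f"Annual fellowship cost: ${annual_fellowship_cost:,}")
```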
Regularly scheduled national symposia for testbed principal investigators, research staff, and graduate students, providing opportunities to share information, findings, and conclusions regarding the technical aspects of building, operating, and using collaboratories.
The objective of this recommendation is to foster the creation of a community of builders and users of collaboratory technology. It is envisioned that principal investigators, research staff, and graduate students working on the testbeds would attend these meetings to share experiences and the results of their work.
In conclusion, the committee believes that the program outlined by these recommendations represents the appropriate level of effort. Although it would be possible to carry out only the research component of the proposed program, the committee believes that the education component is a critical aspect of the effort. Without the education component, it is much less likely that a skilled and growing community of collaboratory users and developers will emerge. Taken together, the two components constitute a strategic effort to improve the basic infrastructure supporting the conduct of computationally intensive science in the United States.