The Use of Information Technology in Research

In this chapter we examine the effect of information technology on the conduct of research. New technologies offer new opportunities, although pervasive use of computers in research has not come about without problems. Some of these problems are technological, some financial. Underlying many of them are complex institutional and behavioral constraints.

Nearly five decades ago, the first programmable, electronic, digital computer was switched on. That day science acquired a tool that at first simply facilitated research, then began to change the way research was done. Today these changes continue, and now amount to a revolution.

Electronic digital computers at first simply replaced earlier technologies. Researchers used computers to do arithmetic calculations previously done with paper and pencil, slide rules, abacuses, or roomfuls of people running mechanical calculators. Benefits offered by the earliest computers were more quantitative than qualitative; bigger computations could be done faster, with greater reliability, and perhaps more cheaply. But computers were large, expensive, required technically expert operators and programmers, and consequently were accessible only to a relatively small fraction of scientists and engineers.

One human generation and several computer generations later, with the advent of the integrated circuit (the semiconductor "chip"), computational speed increased by a factor of 1 trillion, computational cost decreased by a factor of 10 million, and the smallest useful calculator went from the size of a typewriter to the size of a wristwatch. At present, personal computers selling for a few thousand dollars can put significant computing power on the desk of every scientist. Meanwhile, advances in the software through which people interact with and instruct computers have made computers potentially accessible to people with no specific training in computation.
INFORMATION TECHNOLOGY AND THE CONDUCT OF RESEARCH

More recently, computer technology has joined telecommunications technology to create a new entity, "information technology." Information technology has done much to remove from the researcher the constraints of speed, cost, and distance.

On the whole, information technology has led to improvements in research. New avenues for scientific exploration have opened. The amount of data that can be analyzed has expanded, as has the complexity of analyses. And researchers can collaborate more widely and efficiently.

Boxes supplement or expand points in the text; the first two below deal with specific disciplines.

Different scientific disciplines use information technology differently. Uses vary according to the phenomena the discipline studies and the rate at which the discipline obtains information. In such disciplines as high energy physics, neurobiology, chemistry, or materials science, experiments generate millions of observations per second, and these must be screened and recorded as they happen. For these disciplines, computers that can handle large amounts of information quickly are essential and have made possible research that was previously impractical. Other disciplines, such as economics, psychology, or public health, gather data on events that accumulate slowly over relatively long periods of time. These disciplines also need computers with large capacities, but do not need the capability to react in "real time." Most disciplines use information technology in ways that fall somewhere in the range between these two extremes.

HIGH ENERGY PHYSICS: SCIENCE DRIVES THE LEADING EDGE OF INFORMATION TECHNOLOGY

An example helps to illustrate the direction in which many disciplines are moving: high energy physics could not be done without information technology, and offers an extreme example of the trends for computing and communication needs in many scientific disciplines.
Most high energy physicists work on the same set of questions: what is the behavior of the most elementary particles, and what is the nature of the fundamental forces between them? Their experiments are conducted in machines called accelerators, devices that produce beams of protons, electrons, or other particles that are accelerated to high speeds and huge energies. There are two types of accelerators: those in which two beams of particles are made to collide with each other (colliders), and those in which a beam hits stationary targets. Physicists then reconstruct the collision to find new phenomena.

Remarkable results have emerged from high energy physics experiments conducted over the past two decades. For instance, a Nobel prize-winning experiment carried out at the proton-antiproton collider at the European Center for Nuclear Research (CERN) in Switzerland discovered two new particles known as the W and the Z. Their existence had been predicted by a theory claiming that the weak and electromagnetic forces, seemingly unrelated at low energy levels, were in fact manifestations of a single force, called the electroweak interaction, which would appear at sufficiently high energies. This discovery is a significant step toward the description of all known interactions, that is, gravity, electromagnetism, and the strong (nuclear) and weak (radioactive decay) forces, as manifestations of a single unifying force.

The process by which some tens of these new W and Z particles were isolated from millions of collision events in the CERN accelerator offers a striking illustration of the dependence of high energy physics on the most advanced aspects of information technology. Three steps are involved. First, data are acquired in real time as the experiment progresses; second, the data obtained are transformed into flight paths, from which the particles making the paths are identified; and third, the event itself is reconstructed, and those few events exhibiting the very special characteristics of the new phenomenon are identified. In each of these steps computers are vital: to trigger the identification of interesting events; to establish particle tracks from the data; and to carry out analysis and interpretation.

In the future, high energy physicists will demand more from information technology than it can now deliver. Proposed new particle accelerators, such as the Superconducting Super Collider (SSC), are expected to produce several million collisions every second, of which only one or two collisions a second can be recorded. Selecting this tiny fraction of the produced events in a manner that does not throw away other interesting data is a tremendous challenge. It is hoped that "farms" of dedicated microprocessors might be able to examine tens of thousands of collision events per second, so that sophisticated selection mechanisms can screen all collisions and select the very few that are to be recorded. The computer programs that need to be developed for these tasks are of unprecedented size and complexity, and will challenge the capabilities of both the physicists programming them and the information technology software support available to the programmers.

Even the small fraction of recorded events will result in some ten million collisions to be analyzed in a year. Processing one year's worth of saved data from the SSC would take a modern mid-sized computer 500 years;

The Panel recognizes the diversity in research methods, and differences in needs for information technology. But the needs of researchers show sufficient commonalities across research fields to make a search for common solutions worthwhile.

THE CONDUCT OF RESEARCH

The everyday work of a researcher involves such activities as writing proposals, developing theoretical models, designing experiments and collecting data, analyzing data, communicating with colleagues, studying research literature, reviewing colleagues' work, and writing articles. Information technology has had important effects on all these activities, and more change is in the offing. To illustrate these effects, we examine three particular aspects of research: data collection and analysis, communications and collaboration, and information storage and retrieval. In each area, we discuss how researchers currently use information technology and what difficulties they encounter. In a final part of this section, we discuss new technological opportunities and their implications for the conduct of research.
DATA COLLECTION AND ANALYSIS

Current Use

Collecting and analyzing data with computers are among the most widespread uses of information technology in research. Computer hardware for these purposes comes in all sizes, ranging from personal computers to microprocessors dedicated to specific instrumentational tasks, large mainframe computers serving a university campus or research facility, and supercomputers. Computer software ranges from general-purpose programs that compute numeric functions or conduct statistical analyses to specialized applications of all sorts.

The Panel has identified five trends in the use of information technology in data collection and analysis:

· Increased use of computers for research. This trend coincides with large and continued increases in the speed and power of computers and corresponding declines in their costs.

· Dramatic increases in the amount of information researchers can store and analyze. For example, researchers can now process and manipulate observations in a database consisting of 18 years x 3,400 individuals x 1,000 variables per individual for each year, create sets of relationships among these observations,

obviously, a faster processing rate is required. Although no computer currently on the market would handle this load in reasonable time, existing plans suggest that, by the time it is needed, some combination of dedicated microprocessors and large mainframe systems will be available.

High energy physicists are also highly dependent on networks. Accelerators are located in only seven main laboratories in the United States, Switzerland, West Germany, the Soviet Union, and Japan; the physicists who use them are located in many hundreds of universities and institutions scattered around the world.
Almost every high energy experiment, large or small, is a result of international collaboration: for instance, one detector installed around one of the collision points of the accelerator at the Fermi National Laboratory is run by a collaboration of four foreign and thirteen U.S. institutions, involving some 200 physicists. Physicists at several institutions designed different parts of the detector; since the detector has to work as an integrated apparatus, the physicists had to coordinate their work closely. Different physicists are also interested in different aspects of the experiment, and subsequent analysis of the data depends crucially on adequate networking.

Future networking needs for high energy physics involve very high transmission speeds (as high as 10 megabits per second) between laboratories, with provision for exchange of collision event files, graphics, and video conferencing. Present long distance communication links are limited to lower transmission speeds (typically, 56 kilobits per second); each university physics group could use a 1.5 megabit per second line for its own research needs. The provision of these facilities would be of enormous benefit to university-based physicists and students who cannot travel frequently to accelerator sites.
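The figures quoted in the high energy physics box can be turned into a rough back-of-the-envelope calculation. The sketch below is illustrative only: the event count, the 500-year estimate, and the three link speeds come from the box, while the 1-megabyte event file size and all function names are assumptions introduced here.

```python
# Back-of-the-envelope arithmetic for the SSC and networking figures.
# The 1 MB file size is a hypothetical value chosen for illustration.

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # 31,557,600 seconds


def seconds_per_event(events: float, total_years: float) -> float:
    """Average processing time per event if `events` take `total_years`."""
    return total_years * SECONDS_PER_YEAR / events


def required_speedup(current_years: float, target_years: float) -> float:
    """Factor by which processing must speed up to finish in `target_years`."""
    return current_years / target_years


def transfer_seconds(size_bytes: float, bits_per_second: float) -> float:
    """Ideal transfer time, ignoring protocol overhead and retransmission."""
    return size_bytes * 8 / bits_per_second


# Link speeds named in the box (bits per second).
LINKS = {
    "56 kbit/s (present long distance link)": 56_000,
    "1.5 Mbit/s (per university group)": 1_500_000,
    "10 Mbit/s (between laboratories)": 10_000_000,
}

EVENT_FILE_BYTES = 1_000_000  # assumption: a 1 MB collision event file

if __name__ == "__main__":
    per_event = seconds_per_event(events=10_000_000, total_years=500)
    print(f"Implied time per event: {per_event / 60:.1f} minutes")
    print(f"Speedup to keep pace with one year of data: "
          f"{required_speedup(500, 1):.0f}x")
    for name, bps in LINKS.items():
        print(f"{name}: {transfer_seconds(EVENT_FILE_BYTES, bps):.1f} s")
```

Under these assumptions, keeping pace with the accelerator implies roughly a 500-fold increase in processing rate, and the same file that ties up a 56 kilobit per second link for more than two minutes would move over a 10 megabit per second link in under a second.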
and then subject the data to complex statistical analyses, all at a cost of less than $100. Two decades ago, that kind of analysis could not have been conducted, and a much simpler analysis would have cost at least ten times as much.

· The creation of new families of instruments in which computer control and data processing are at the core of observation. For example, in new telescopes, image-matching programs on specialized computers align small mirrors to produce the equivalent light-gathering power of much larger telescopes with a single mirror. For instruments such as radio-telescope interferometers, the computer integrates data from instruments that are miles apart. For computer-assisted tomographic scanners, the computer integrates and converts masses of data into three-dimensional images of the body.

· Increased communication among researchers, resulting from the proliferation of computer networks dedicated to research, from a handful in the early 1970s to over 100 nationwide at present. Different networks connect different communities. Biologists, high energy physicists, magnetic fusion physicists, and computer scientists each have their own network; oceanographers, space scientists, and meteorologists are also linked together. Networks also connect researchers with one another regionally; an example is NYSERNET, the New York State Education and Research Network. Researchers with defense agency contracts are linked with one network, as are scientists working under contract to the National Aeronautics and Space Administration (NASA). Such networks allow data collection and analysis to be done remotely, and data to be shared among colleagues.

· Increasing availability of software "packages" for standard research activities.
Robust, standardized software packages allow researchers to do statistical analyses of their data, compute complex mathematical functions, simplify mathematical expressions, maintain large databases, and design everything from circuits to factories. Many of these packages are commercial products, with high-quality documentation, service, and periodic updates. Others are freely shared software of use to a specialized community without the costs or benefits of commercial software.

One example illustrating several of the above trends is a system that geophysicists have set up to predict earthquakes more accurately. Networks of seismographs cover the western United States. One such network in northern California is called CALNET. Information from the 264 seismographs in CALNET goes to a special-purpose computer called the real-time picker. The software on the real-time picker looks at data as they come in and identifies exceptional events: patterns that indicate a coming earthquake. Then it notifies scientists of the events by telephone and sends graphics displays of locations and magnitudes, all within minutes.

Difficulties Encountered

The difficulties that researchers encounter using information technology to collect and analyze data vary in importance depending on the particular discipline.
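To make the CALNET real-time picker example earlier in this section more concrete, the sketch below shows one common way software can flag exceptional events in a stream of seismograph samples: comparing a short-term average amplitude with a long-term average (often called an STA/LTA trigger). The text does not say which algorithm the real-time picker uses, so this technique, the function name, the window lengths, and the threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence


def sta_lta_triggers(
    samples: Sequence[float],
    short_n: int = 5,       # assumed short-term window length (samples)
    long_n: int = 50,       # assumed long-term window length (samples)
    threshold: float = 4.0  # assumed trigger ratio
) -> List[int]:
    """Return indices where short-term / long-term mean amplitude > threshold.

    The long-term window ends where the short-term window begins, so the
    background estimate is not contaminated by the samples being tested.
    """
    amplitudes = [abs(s) for s in samples]
    triggers = []
    for i in range(long_n + short_n - 1, len(amplitudes)):
        sta_start = i - short_n + 1
        sta = sum(amplitudes[sta_start:i + 1]) / short_n
        lta = sum(amplitudes[sta_start - long_n:sta_start]) / long_n
        if lta > 0 and sta / lta > threshold:
            triggers.append(i)
    return triggers


if __name__ == "__main__":
    # Synthetic trace: quiet background with a ten-sample burst.
    trace = [1.0] * 100 + [10.0] * 10 + [1.0] * 100
    hits = sta_lta_triggers(trace)
    print("first trigger at sample", hits[0] if hits else None)
```

A production system would work on streaming data from many stations and combine triggers across the network before notifying anyone; this sketch only illustrates the single-channel screening step.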
One difficulty is uneven access to computing resources. Information technology is not equally accessible to all researchers who could benefit from its use, even though broadening access is a continuing focus of institutions and funding agencies. To take an example from the field of statistics: according to a 1986 report on the Workshop on the Use of Computers in Statistical Research, sponsored by the Institute for Mathematical Statistics, ". . . the quality and quantity of computational resources available to researchers today varies dramatically from department to department . . . Perceived needs appear to vary just as dramatically. . . . [While] departments that already have significant computer hardware feel a strong need for operating support, . . . departments that do not have their own computational resources feel an equally strong need for hardware" (Eddy, 1986, p. iii).

Exclusion from resources happens for a variety of reasons, all reducible in the end to financial constraints. Not all academic or research institutions have links to networks; in addition, access to networks can be expensive, so not everyone who wants it can afford it. In some cases, since access to networks often mediates access to resources such as supercomputers, exclusion from networks can mean exclusion from advanced computing.

See box on software, page 18.

One of the most frustrating difficulties for researchers is finding the right software. Software that is commercially available is often unsuited to the specialized needs of the researcher. In those fields in which industry has an interest, however, commercial software is being developed in response to a perceived market. Software could be custom designed for the researcher, but relatively few researchers pay directly for software development, partly because research grants often cannot be used to support it. Consequently, most researchers, although they are not often skilled software creators, develop their own software with the help of graduate students. The result meets researchers' minimum needs but typically lacks documentation and is designed for one purpose only. Such software is not fully understood by any one person, making it difficult to maintain or transport to other computing environments. This means that the software often cannot be used for related projects, and the scientific community wastes time, effort, and money duplicating one another's efforts. In sections to follow we examine how this problem is being addressed by professional associations, nonprofit groups, and corporations.

Some disciplines are limited by available computer power because computers needed are not on the market. Some contemplated calculations in theoretical physics, quantum chemistry, or molecular dynamics, for example, could use computers with much greater capacity than any even on the drawing boards. In other cases, data gathering is limited by the hardware presently available. Most commercial computers are not designed to accommodate hardware and programs that select out interesting information from observational data, and scientists who want such computers must build them.

Another difficulty researchers encounter is in transmitting data over networks at high speed. For researchers such as global geophysicists who use data collected by satellite, a large enough volume of information can be sent in a short enough time, but transmission is unreliable. Researchers often encounter delays and incur extra costs to compensate for "noise" on high-speed networks. Technological solutions such as optical fiber and error-correcting coding are currently expensive to install and implement and are often unavailable in certain geographic regions or for certain applications.

RESEARCH MATHEMATICS AND COMPUTATION

Computation and theory in mathematics are symbiotic processes. Machine computing power has matured to the point where mathematical problems too complicated to be understood analytically can be computed and observed. Phenomena have been observed for the first time that have initiated entirely new theoretical investigations. The theory of the chaotic behavior of dynamic systems depends fundamentally on numerical simulations; the concept of a "strange attractor" was formulated to understand the results of a series of numerical computations. Recent advances in the theory of knots have relied on algebraic computations carried out on computers. These advances can be directly applied to such important topics as understanding the folding of DNA molecules. In the field of geometry, numerical simulation has been used recently to discover new surfaces whose analytic form was too difficult to analyze directly. The simulations were understood by the use of computer graphics, and led to the explicit construction of infinite families of new examples.

The modern computer is the first laboratory instrument in the history of mathematics. Not only is it being used increasingly for research in pure mathematics, but, equally important, the prevalence of scientific computing in other fields has provided the medium for communication between the mathematician and the physical scientist. Here modern graphics plays a critical role. This interaction is particularly strong in materials science, where the behavior of liquid crystals and the shapes of complex polymers are being understood through a combination of theoretical and computational advances.

In spite of all this, mathematics has been one of the last scientific disciplines to be computerized. More than other fields, it lacks instrumentation and training. This prevents the mathematician from using modern computing hardware and techniques in attacking research problems, and at the same time isolates him/her from productive communication with scientific colleagues. Of course, mathematics is an important part of the foundation and intellectual basis of most of the methods that underlie all scientific use of computational machinery.

To use today's high-speed computing machines, new techniques have been devised. The need for new techniques is providing a serious challenge to the applied mathematician, and has placed new and difficult problems on the desk of the theorist; algorithms themselves have become an object of serious investigation. Their refinement and improvement have become at least as important to the speed and utility of high-speed computing as the improvement of hardware.
COMMUNICATION AND COLLABORATION AMONG RESEARCHERS

Current Use

Researchers cannot work without access to collaborators, to instruments, to information sources and, sometimes, to distant computers. Computers and communication networks are increasingly necessary for that access. Three technologies are concerned with communications and collaboration: word processing, electronic mail, and networks.

Word processing and electronic mail are arguably the most pervasive of all the routine uses of computers in research communication. Electronic mail, sending text from one computer user to another over the networks, is replacing written and telephone communication among many communities of scientists, and is changing the ways in which these communities are defined. Large, collaborative projects, such as oceanographic voyages, use electronic mail to organize and schedule experiments, coordinate equipment arrivals, and handle other logistical details. With the advent of electronic publishing tools that help lay out and integrate text, graphics, and pictures, mail systems that allow interchange of complex documents will become essential.

See box on document processing, page 19.

Networks range in size from small networks that connect users in a certain geographic area, to national and international networks. Scientists at different sites increasingly use networks for conversations by electronic mail and for repeated exchanges of text and data files.

The Panel has identified two major trends in the way information technology is changing collaboration and communication in scientific research:

· Information can be shared more and more quickly. For example, one of the first actions of the federal government after the discovery of the new high-temperature superconductors was to fund, through the Department of Energy's Ames Laboratory, the creation of a superconductivity information exchange. The laboratory publishes a biweekly newsletter on advances in high-temperature superconductivity research, available in both paper and electronic forms; the electronic version is sent out to some 250 researchers.

· Researchers are making new collaborative arrangements. The technology of networks provides increased convenience and faster turnaround times, often several completed message exchanges in one day. For shorter messages, special software allows real-time exchanges.

See box on collaboration, page 20.

IF KITCHEN APPLIANCES WERE LIKE SOFTWARE

If kitchen appliances were like programs, they would all look alike sitting on the counter. They would all be gray, featureless boxes, into which one places the food to be processed. The door to the box, like the box itself, is completely opaque. On the outside of each box is a general description of what the box does. For instance, one box might say: "Makes anything a meal"; another: "Cooks perfectly every time"; another: "Never more than 100 calories a serving."

You can never be exactly sure what happens to food when it is placed in these boxes. They don't work with the door open, and the 200-page user's manual doesn't give any details.

Working in a kitchen would be a matter of becoming familiar with the idiosyncrasies of a small number of these boxes and then trying to get done what you really want done using them. For instance, if you want a fried-egg sandwich, you might try the "Makes anything a meal" box, since a sandwich is a sort of meal. But because you know from past experience that this box leaves everything coated with grease, you use the "Never more than 100 calories" box to postprocess the output. And so on. The result is never what you really want, but it is all you can do.

You aren't allowed to look inside the boxes to help you do what you really want to do. Each box is sealed in epoxy. No one can break the seal. If the box seems not to be working right, there is nothing you can do. Even calling the manufacturer is no help, because the box is not under warranty to be fit for any particular purpose. The manufacturers do have help lines, but not for help with broken boxes; rather, to help you figure out how to use functioning boxes. But don't try to ask how your box works. The help-line people don't know, or if they do, they won't tell you.

Several times a year you get a letter from the manufacturer telling you to ship them your old box and they will send you a new one. If you do so, you find yourself with a shinier box, which does whatever it did before a little faster, or perhaps it does a little more; but since you were never sure what it did before, you cannot be sure it's better now.

SOURCE: Mark Weiser, 1987. "Source Code," IEEE Computer, 20(11): 66-73.

DOCUMENT PROCESSING

[An] area of significant change is document processing. This began in the 1960s with a few simple programs that would format typed text. In the context of UNIX in the 1970s, these ideas led to a new generation of document processing programs and languages, such as SCRIBE and the UNIX-based tools troff, eqn, tbl, and pic. The quintessence of these ideas are Knuth's TeX and METAFONT systems, which have begun to revolutionize the world's printing industry. In workstations, these ideas have produced WYSIWYG (wizzy-wig, or "what you see is what you get") systems that display formatted text exactly as it will appear in print.

International standards organizations are considering languages for describing documents, and some software manufacturers are constructing systems, such as the POSTSCRIPT protocols, embodying these ideas. The NSF-sponsored EXPRES project, at the University of Michigan and Carnegie Mellon University, illustrates a serious effort to develop a standard method of exchanging full scientific documents by network.

Low-cost laser printers now make advanced document preparation and printing facilities available to many people with workstations and personal computers. It is now possible for everyone to submit high-quality, camera-ready copy directly to publishers, thus speeding the publication of new results; however, it is no longer true that a well-formatted document can be trusted to have undergone a careful review and editing before being printed.

SOURCE: Peter J. Denning, 1987, Position Paper: Information Technology in Computing.
As Lederberg noted a decade ago (Lederberg, 1978), digital communication allows scientists to define collegial relationships along the lines of specialized interests rather than spatial location. This is immensely beneficial to science as a whole, but causes some consternation among administrators who find scientists more loyal to disciplines than to institutions.

Technologies in the process of development show the networks' remarkable potential. Multimedia mail allows researchers to send a combination of still images, video, sound, and text. Teleconferencing provides simultaneous electronic links among several groups. Electronic chalkboards allow researchers to draw on their chalkboard and have the drawing appear on their computer and on the computers of collaborators across the country. Directory services, or "nameservers," supply directories of the names and network addresses of users, processes, and resources on a given network or on a series of connected networks. Program distribution services include the supply of mathematical software to subscribers.

A spectacular new technology is represented in the Metal Oxide Semiconductor Implementation System (MOSIS), a service that contracts for the manufacture of very large-scale integrated (VLSI) chips from circuit diagrams pictured on a subscriber's screen. Fabrication time is often less than 30 days. In one notable example, the researchers designing a radiotelescope in Australia designed custom chips for controlling the telescope. MOSIS returned the chips in a matter of days; the normal manufacturing process would have taken months and would have delayed the development of the instrument considerably.
NEW FORMS OF COLLABORATION THROUGH THE NETWORKS

The development of COMMON LISP (a programming language) would most probably not have been possible without the electronic message system provided by ARPANET, the Department of Defense's Advanced Research Projects Agency network. Design decisions were made on several hundred distinct points, for the most part by consensus, and by simple majority vote when necessary. Except for two one-day face-to-face meetings, all of the language design and discussion was done through the ARPANET message system, which permitted effortless dissemination of messages to dozens of people, and several interchanges per day.

The message system also provided automatic archiving of the entire discussion, which has proved invaluable in preparation of this reference manual. Over the course of thirty months, approximately 3000 messages were sent (an average of three per day), ranging in length from one line to twenty pages... It would have been substantially more difficult to have conducted this discussion by any other means, and would have required much more time.

SOURCE: Guy Steele, 1984. COMMON LISP: The Language. Bedford, MA: Digital Press, pp. xi-xii. Reprinted with permission. Copyright Digital Press/Digital Equipment Corporation.
To share complex information (such as satellite images) over the networks, researchers will need to be able to send entire pictures in a few seconds. One technique that is likely to receive more attention in the future is data compression, which removes redundant information and converts data and images to more compact forms that require less time to transmit.

Among the most important of potential applications of information technology is the emergence of a truly national research network, that is, a set of connections, or gateways, between networks to which every researcher has access. The National Science Foundation has announced its intention to serve as a lead agency in the development of such a network, beginning with a backbone, called NSFNET, that links the NSF-supported supercomputing centers, and widening to include other existing networks.

Widespread access to networks will also offer much more than just communications links. They can become what the network serving the molecular biology community aims to be: a full-fledged information system.

Difficulties Encountered

The principal difficulty with communicating across research communities via electronic mail and file transfer technologies is incompatibility. The networks were formed independently, evolved over many years, and are now numerous. Consequently, networks use different protocols, that is, different conventions for packaging data or text for transmission, for locating an appropriate route from sender to receiver over the physical network, and for signaling the start and stop of a message. For example, a physicist on the High Energy Physics network (HEPNET) trying to send data to a physicist on one of the regional networks would first have to ask "What network are you on?"; "How do I address you?"; and "What form do you want the information in?" In the gateway between two networks, the protocols of the first network must be removed from the message and the protocols for the second added.
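The gateway step just described, stripping one network's packaging and adding another's, can be sketched in a few lines. Everything specific in this example is hypothetical: the two frame formats ("network A" with a text header, "network B" with binary length fields), the address table, and the function names are invented for illustration and do not describe HEPNET or any real network.

```python
import struct
from typing import Dict, Tuple


def wrap_net_a(address: str, payload: bytes) -> bytes:
    """Hypothetical network A: text header 'A1 <address> <length>' + newline."""
    return f"A1 {address} {len(payload)}\n".encode("ascii") + payload


def unwrap_net_a(frame: bytes) -> Tuple[str, bytes]:
    """Remove network A's header; addresses must not contain spaces."""
    header, _, body = frame.partition(b"\n")
    magic, address, length = header.decode("ascii").split(" ")
    if magic != "A1" or int(length) != len(body):
        raise ValueError("malformed network-A frame")
    return address, body


def wrap_net_b(address: str, payload: bytes) -> bytes:
    """Hypothetical network B: 2-byte address length, address,
    4-byte payload length, payload (all big-endian)."""
    addr = address.encode("ascii")
    return (struct.pack(">H", len(addr)) + addr
            + struct.pack(">I", len(payload)) + payload)


def unwrap_net_b(frame: bytes) -> Tuple[str, bytes]:
    """Remove network B's binary header and check the declared length."""
    (addr_len,) = struct.unpack(">H", frame[:2])
    address = frame[2:2 + addr_len].decode("ascii")
    (size,) = struct.unpack(">I", frame[2 + addr_len:6 + addr_len])
    payload = frame[6 + addr_len:6 + addr_len + size]
    if len(payload) != size:
        raise ValueError("malformed network-B frame")
    return address, payload


def gateway_a_to_b(frame_a: bytes, address_map: Dict[str, str]) -> bytes:
    """Strip network A's packaging, translate the address, add network B's."""
    address, payload = unwrap_net_a(frame_a)
    return wrap_net_b(address_map[address], payload)


# Hypothetical address translation table held by the gateway.
ADDRESS_MAP = {"smith@neta": "smith%physics@netb"}

if __name__ == "__main__":
    frame = wrap_net_a("smith@neta", b"event 42\nraw data")
    print(unwrap_net_b(gateway_a_to_b(frame, ADDRESS_MAP)))
```

Even in this toy form, the gateway has to know both formats and hold an address table, which hints at why every additional network a message must cross adds complexity and another possible point of failure.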
Under heavy traffic loads, the gateways can become bottlenecks. As a result, navigating from one network to a researcher on another is time-consuming, tiresome, and often unreliable; navigating over two networks to a researcher on a third is prohibitively complex.

Text can frequently be moved from one word processing system to another only with significant loss of formatting information, including the control of spacing, underlining, margins, or indentations. Graphics can only rarely be included with text. Such issues of compatibility may delay the expansion of electronic publishing as well as electronic proposal submission and review, the goals of the National Science Foundation's EXPRES project.

The issues are summarized succinctly by Denning: "Most word processors are inadequate for scientific needs: they cannot handle graphs, illustrations, mathematics and layout, and myriad file formats make exchange extremely difficult. With so many experts and so much competition in the market, it is hard to win agreement on standards. There is virtually no electronic support for the remainder of the process of scientific publication: submission, review, publication, and distribution. These issues can be expected to be resolved over the next few years, as document interchange formats are adopted by standards organizations and incorporated into software revisions and equipment upgrades. However, the transition process will not be painless" (Denning, 1987, pp. 2~27).

In addition, some networks limit use under certain circumstances; for instance, one network bars communication among researchers at industrial laboratories. The fear is that corporations would use a research network for commercial profit or even for sales or marketing. The Panel believes such fear is misplaced and that networks should be open for all research communication.

Boxes on pages 22-27 examine network use alternatives.

On the whole, the management of the networks is anarchic. Networks operate not as though they were a service vital to the health of the nation's research community but as small fiefdoms, each with strong disciplinary direction, with little incentive to collaborate. The National Science Foundation has taken an early leadership role, with such initiatives as NSFNET, which addresses many of the current networking problems, and the EXPRES project, which establishes standards for the electronic exchange of complex documents. Such efforts to provide integration and leadership are vital to increased research productivity.

INFORMATION STORAGE AND RETRIEVAL

Current Uses

How information is stored determines how accessible it is. Scientific texts are generally stored in print (in the jargon, in hard copy) and are accessible through the indices and catalogs of a library. Some texts, along with programs and data, however, are stored electronically on disks or magnetic tapes to be run in computers, and are generally more easily accessible. In addition, collections of data, known as databases, are sometimes stored in a central location. In general, electronic storage of information holds enormous advantages: it can be stored economically, found quickly without going to another location, and moved easily.

One kind of database holds factual scientific data. The Chemical Abstracts Service, for example, has a library of the molecular structures of all chemical substances reported in the literature since 1961. GenBank is a library of known genetic sequences. Both the National Aeronautics and Space Administration and the National Oceanic and Atmospheric Administration have thousands of tapes holding data on space and the earth and atmosphere.

FROM A NETWORK TO AN INFORMATION RESOURCE PROTOTYPE: BIONET

BIONET is a nonprofit resource for molecular biology computing that provides access to software, recent versions of databases relevant to molecular biology, and electronic communications facilities. Work is in progress to expand BIONET as a logical network reaching molecular biologists throughout the research community worldwide. Many existing physical networks are in use by molecular biologists, and it is BIONET's aim to utilize them all. BIONET is working on plans to provide molecular biologists with access to one or more supercomputers or parallel processing resources. Special programs will be developed to provide molecular biologists with an easy interface to submit supercomputer jobs.

Especially active are the METHODS-AND-REAGENTS bulletin board (for requesting information on lab protocols and/or experimental reagents) and the RESEARCH-NEWS bulletin board, which has become a forum for posting interesting scientific developments and also a place where scientists can introduce their labs and research interests to the rest of the electronic community. Bulletin boards have been instituted for the GenBank and EMBL nucleic acid sequence databases. Copies of messages on these bulletin boards are forwarded to the database staff members for their attention. These bulletin boards serve as a medium for discussing issues relating to the databases and as a place where users of the databases can obtain assistance. Along these same lines BIONET has developed the GENPUB program that facilitates submission of sequence data and author-entered annotations in computer-readable form directly to GenBank and EMBL via the electronic mail network.

The journals CELL and CABIOS have established accounts on BIONET, and the Journal of Biological Chemistry and several others will also soon be on board. Several journals have indicated an interest in publishing research abstracts on BIONET in advance of hardcopy articles.

Annotated examples of program usage have been included in the HELP ME system. The examples, formatted to be suitable for printing out as a manual, cover the major uses of the BIONET software for data entry, gel management, sequence, structure and restriction site analysis, cloning simulations, database searches, and sequence similarities and alignments. A manual of standard molecular biology lab protocols has also been added to HELP ME for users to reference.

One of BIONET's major goals is to serve as a focus for the development and sharing of new software tools. Towards achieving this goal, BIONET has made available to the community a wide variety of important computer programs donated by a number of software developers. A collaborative effort has occurred between the BIONET staff and the software authors to expand the usefulness of important software by making it compatible with a number of hardware and user community constraints.

BIONET provides an increasing number of databases online: lists of restriction enzymes; a bank of common cloning vector restriction maps and complete vector sequences; a database of regular expressions derived from published consensus sequences; the searchable full text of a recent revision of "Genetic Variations of Drosophila melanogaster" by Dan L. Lindsley and E.H. Grell (the Drosophila "Red Book"). Some of these can be used as input to search programs. BIONET invites curators of genetic and physical genome maps to use this resource for the collection, maintenance, and distribution of their databases.

SOURCE: Roode et al., 1988. "New Developments at BIONET," Nucleic Acids Research, 16(5):1857-1859.
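The idea of deriving a regular expression from a consensus sequence and feeding it to a search program can be illustrated with a short sketch. It assumes the consensus is written in the standard IUPAC nucleotide ambiguity codes; the function names, the consensus `TATAWAW`, and the test sequence are hypothetical examples and are not taken from BIONET's database or software.

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a consensus sequence into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def find_sites(consensus: str, sequence: str) -> list[tuple[int, str]]:
    """Return (start position, matched text) for each non-overlapping match."""
    pattern = consensus_to_regex(consensus)
    return [(m.start(), m.group()) for m in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    # Illustrative consensus and sequence only.
    print(find_sites("TATAWAW", "GGCTATAAAAGGCTATATATCC"))
    # prints [(3, 'TATAAAA'), (13, 'TATATAT')]
```

Because `finditer` reports non-overlapping matches, a search program that needs overlapping sites would have to restart the search one position after each hit.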
A second kind of database, a reference database, stores information on the literature of the sciences. For example, Chemical Abstracts Service has abstracted all articles published in journals of chemistry since 1970 and makes the abstracts available electronically. The National Library of Medicine operates services that index, abstract, and search the literature database (known as MEDLARS). In addition, it distributes copies of the database for use on local computers and has developed a communications package, called GRATEFUL MED, that simplifies searching the major MEDLARS files (over six million records through 1987). In addition to biomedicine and clinical medicine, the National Library of Medicine partially covers the literature of the disciplines of population control, bioethics, nursing, health administration, and chemistry. One of its most important databases, for instance, is TOXLINE, which references the chemical analysis of toxins. Information search services have grown up around these and other databases, including a number of commercial ones, and now constitute a substantial industry.

A database, taken together with the procedures for indexing, cataloging, and searching it, makes up an information management system. Some potentials of information management systems have been predicted for years, beginning with Vannevar Bush's MEMEX (Bush, 1945). The box on pages 28-29 illustrates a current working information management system that links texts and databases in genetics and medicine.

Difficulties Encountered

For all disciplines, both factual and reference databases promise to be significant sources of knowledge for basic research. But to keep this promise, a Pandora's box of problems will have to be solved.

Difficulties encountered with factual databases, stated succinctly, are: the researcher cannot get access to data; if he can, he cannot read them; if he can read them, he does not know how good they are; and if he finds them good, he cannot merge them with other data.

Researchers have difficulty getting access to data stored by other researchers. Such access permits reanalysis and replication, both essential elements of the scientific process. At present, with a few exceptions, data storage is largely an individual researcher's concern, in line with the tradition that researchers have first rights to their data. The result has been a proliferation of idiosyncratic methods for storing, organizing, and indexing data, with one researcher's data essentially inaccessible to all other researchers.

BIRTH OF A NETWORK: A HISTORY OF BITNET (EXCERPTED)

BITNET (Because It's Time NETwork) began as a single leased telephone line between the computer centers of The City University of New York (CUNY) and Yale University. It has developed into an international network of computer systems at over 800 institutions worldwide. Because membership is not restricted by disciplinary specialty or funding ability, BITNET plays a unique role in fostering the use of computer networking for scholarly and administrative communication both nationally and internationally.

In 1981, CUNY and Yale had been using internal telecommunications networks to link computers of their own. The New York/New Haven link allowed the same exchanges to take place between two universities. The founders of BITNET (Ira Fuchs, then a CUNY vice chancellor, and Greydon Freeman, the director of the Yale Computing Center) realized that the fledgling network could be used to share a wide range of data. Furthermore, the ease and power of electronic mail showed new potential for cooperative work among scholars; collective projects could now be undertaken that would have been difficult or impossible if conducted by postal mail or by phone.

Fuchs and Freeman approached the directors of other academic computer centers with major IBM installations to invite them to become members of the new network. The plan of shared resources that BITNET offered included two proposals: a) that each institution pay for its own communications link to the network; and b) that each provide facilities for at least one new member to connect. Software was used to create a store-and-forward chain of computers in which files, messages, and commands are passed on without charge from site to site to their final destination. BITNET became a transcontinental network in 1982 when the University of California at Berkeley leased its own line to CUNY. Berkeley agreed to allow other California institutions to link to the network through its line, in return for some expense sharing.

In 1984, IBM agreed to support CUNY and EDUCOM (a nonprofit consortium of colleges, universities, and other institutions founded in 1964 to facilitate the use and management of information technology) in organizing a centralized source of information and services to accommodate the growing number of BITNET users. EDUCOM set up a Network Information Center (BITNIC), whose ongoing functions include the handling of registration of new members; at the same time, CUNY established a Development and Operations Center (BITDOC), which develops tools for the network.

BITNET's success (it is now in all fifty states) led to the formation of a worldwide network of computers using the same networking software: in Europe and the Middle East (EARN, the European Academic Research Network), Canada (NetNorth), Japan, Mexico, Chile, and Singapore (all of which are members of BITNET). There is also active interest from other countries in the Far East, Australia and New Zealand, and South America. Although political and funding considerations have forced their administrative segregation, BITNET, EARN, and NetNorth form one topologically interconnected network.

Success has also meant some further structuring of what had once been essentially a buddy system. BITNET is now governed by a board of trustees elected by and from its membership. The members of the board each participate in various policy-making committees focusing on network usage, finance and administration, BITNIC services and activities, and technical issues. What began as a simple device for intercampus sharing is simple no longer.

SOURCE: Holland Cotter, 1988. Birth of a network: A history of BITNET. CUNY/University Computer Center Communications, 14:~-10.
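The store-and-forward chain described in the box can be pictured with a toy simulation. This sketch assumes a fixed next-hop table at each site and loop-free routes; the `Node` class, the `run` function, the site labels, and the file name are illustrative inventions and do not represent BITNET's actual software.

```python
from collections import deque


class Node:
    """One site in the chain: it stores incoming items, then forwards them."""

    def __init__(self, name, next_hop):
        self.name = name
        self.next_hop = next_hop   # destination name -> neighbouring site name
        self.queue = deque()       # items stored here, waiting to be forwarded
        self.delivered = []        # items whose final destination is this site


def run(nodes):
    """Forward queued items hop by hop until every queue is empty.

    Returns the list of (from_site, to_site) hops taken. Assumes the
    next-hop tables contain no routing loops.
    """
    trace = []
    busy = True
    while busy:
        busy = False
        for node in nodes.values():
            while node.queue:
                destination, payload = node.queue.popleft()
                if destination == node.name:
                    node.delivered.append(payload)
                else:
                    neighbour = node.next_hop[destination]
                    nodes[neighbour].queue.append((destination, payload))
                    trace.append((node.name, neighbour))
                busy = True
    return trace


if __name__ == "__main__":
    # Hypothetical three-site chain: BERKELEY reaches YALE only through CUNY.
    nodes = {
        "BERKELEY": Node("BERKELEY", {"YALE": "CUNY"}),
        "CUNY": Node("CUNY", {"YALE": "YALE"}),
        "YALE": Node("YALE", {}),
    }
    nodes["BERKELEY"].queue.append(("YALE", "draft.txt"))
    print(run(nodes), nodes["YALE"].delivered)
```

Each site needs a link only to its neighbour, which is why a new member could join by paying for a single line to an existing member.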
Even if a researcher gets access to a colleague's data, he may not be able to read them. The formats with which data are written on magnetic tape (like the formats used in word processing systems) vary from researcher to researcher, even within disciplines. The same formatting problems prohibit the researcher from merging someone else's data into his own database. In order either to read or to merge another's data, considerable effort must be dedicated to converting tape formats.

Finally, when a researcher gets access to and reads another's database, he often has no notion of the quality of the data it contains. A number of proposals (see Branscomb, 1983; National Research Council, 1978) have been made for the creation of what are called evaluated databases, in which data have been verified by independent assessment.

In fields such as organizational science or public health, the costs of collecting and storing data are so large that researchers often have to depend on case studies of organizations or communities to test hypotheses. Researchers in these fields have proposed combining data from many surveys into databases of national scope. If differences in research protocols and database formats can be resolved, such national databases can increase the quality and effectiveness of research.

The primary difficulty encountered with reference databases is in conducting searches. Most information searches at present are incomplete, cumbersome, inefficient, expensive, and executable only by specialists. Searches are incomplete because databases themselves are incomplete (updating a database is difficult and expensive) and because information is stored in more than one database. Searches are cumbersome and inefficient because different databases are organized according to different principles and cannot readily be searched except by commands specific to each database. Searches are expensive because access is expensive (as much as $300 per hour), because network linkages to the databases impose substantial surcharges, and because the inefficiency of the systems means that searches may have to be repeated.

A difficulty common to both scientific and reference databases is a pressing need for new and more compact forms of data storage. Disciplines such as oceanography, meteorology, space sciences, and high energy physics have already gathered so much data that more efficient means of storage are essential; and others are following close behind. One solution seems to lie in optical disk storage, for which various alternative technologies are under development. Currently, these new techniques lack commonly accepted standards.

THE STUDY PANEL'S EXPERIENCE WITH ITS OWN ELECTRONIC MAIL IS INSTRUCTIVE.

Most of the members of the Panel use electronic mail in their professional work; some use it extensively, exchanging as many as seventy messages in one day. At their first meeting, Panel members and staff decided it would be useful to establish electronic communication links for the Panel. Using a network to which he had access, one of the Panel members devised a distribution-list scheme for the Panel. He designed a system that would allow Panel members to exchange messages or documents easily by naming a common group "address." This group address would connect everyone by name from their own network. Panel members would not have to remember special codes or routes to other networks, but could use their own familiar network. Also, messages could be sent to one, several, or all of the Panel members at once.

Between December 1986 and March 1988, nearly 2,000 messages went out using the Panel's special electronic group address. In line with what has been found in systematic research on electronic mail by ad hoc task groups (Finholt, Sproull and Kiesler, 1987), most of the messages went from study staff managing the project to Panel members. Typically, staff used electronic mail to perform coordinating and attentional functions, e.g., to structure meetings, to ask Panel members for information or to perform writing tasks, and to provide members with progress reports. In addition, some Panel members sent mail through other network channels to each other; for instance, two Panel members exchanged electronic mail about computers in the oceanographic community through BITNET, ARPANET, and OMNET.

Although previous research and our own informal observations agree in suggesting that the electronic group mail scheme helped the Panel to work more efficiently, the system was used much less extensively than had been originally envisioned. For example, when delivery of report drafts was crucial, the staff relied on overnight postal mail. Network service inadequacies and technical problems are partly to blame; for example, it took months before messages could be sent predictably and reliably to every Panel member. Because the networks do not facilitate access to service support (comparable to telephone system operators, for example), Panel members had to rely on their own resources to remedy any system inefficiencies. For example, changes to electronic mail addresses in the system could not be made after a few months, so that new addresses had to be added to individual messages.

Such technical problems, though by no means insurmountable, were annoying. Analysis of a sample of messages received by Panel staff indicates that approximately 10 percent contained some complaint about delays, losses of material in transmission, or unavailability of the group mail system. Often, documents were difficult to read because document formatting codes embedded in the document files were removed prior to transmission. A message legible on one system might be filled with unintelligible characters when received on another. With considerable difficulty, some Panel members converted messages received electronically to formats they could read using their text editors. Then they would type in their own revisions, which once again would have to be converted to plain formats to be sent back through the networks. This experience suggests that much needs to be done to make internetwork communication by groups more efficient and easier to use.
Another difficulty is that stored data gradually become useless, either because the storage media decay or the storage technology itself becomes obsolete. Data stored on variant forms of punched cards, on paper tape, or on certain magnetic tape formats may be lost due to the lack of reading devices for such media. Even if the devices still exist, some data stored on magnetic tapes will be lost as the tapes age, unless tapes are copied periodically. Needless to say, such preservation activities often receive low priority. An important archival activity that also receives a low priority is the conversion of primary and reference data from pre-computer days into machine readable form. In this regard, the efforts of the Chemical Abstracts Service to extend their chemical substance and reference databases are praiseworthy.

See box on satellite-derived data, page 30.

Another difficulty in storing information is private ownership. By tradition, researchers hold their data privately. In general, they neither submit their data to central archives nor make their data available via computer. Increasingly, however, in disciplines like meteorology and the biomedical sciences, submission of primary data to data banks has become accepted as a duty. In the field of economics, the National Science Foundation now requires that data collected with the support of the Economics Program be archived in machine readable form, and that any professional article citing program support be accompanied by a fully documented disk describing the underlying data. In the social sciences, a 1985 report of the National Research Council's Committee on National Statistics recommended both that "sharing data should be a regular practice" and that a "comprehensive reference service for computer-readable social science data should be developed" (Fienberg, Martin, and Straf, 1985).

In addition, peer review of articles and proposals has been constrained by the difficulty of gaining access to the data used for analysis. If writers were required to make their primary data available, reviewers could repeat at least part of the analyses reported. Such review would be more stringent, would demand more effort from reviewers, and would raise a number of operational questions that need careful consideration; but it would arguably lead to more careful checking of published results.

Underlying the difficulties in information storage and retrieval are problems in the institutional management of resources. Who is to manage, maintain, and update information services? Who is to create and enforce standards? At present the research community has three alternative answers: the federal government, which manages such resources as MEDLINE and the GenBank; professional

HOW A LIBRARY USES COMPUTERS TO ADVANCE PRODUCTIVITY IN SCIENCE

In 1985 the William H. Welch Medical Library of the Johns Hopkins University began a unique collaboration with Dr. Victor A. McKusick, the Johns Hopkins University Press, and the National Library of Medicine to develop and maintain an online version of McKusick's book Mendelian Inheritance in Man (known as OMIM, for Online Mendelian Inheritance in Man). While the book contains 3,900 phenotypes (a specific disorder or substance linked to a genetic disease) and updates are issued approximately every five years, OMIM currently describes more than 4,300 phenotypes and is updated every week. A gene map is available, keyed to the phenotype descriptions.

Any registered user worldwide can dial up OMIM and search its contents through a simple three-step process: 1) state the search in simple English (e.g., relationship between Duchenne muscular dystrophy and growth deficiency hormone); 2) examine the list of documents, which are presented in ranked order of relevance; and 3) select one or more documents to read in detail. Having selected a document, the searcher can determine through a single keystroke whether the phenotype has been mapped to a specific chromosome. OMIM entries are also searchable in a related file, the Human Gene Mapping Library (HGML) at Yale University. By mid-1988, researchers will be able to use the same access code to enter and search three related databases: HGML in New Haven, the Jackson Laboratory Mouse Map in Bar Harbor, and OMIM in Baltimore.

OMIM is more than an electronic text. It is a dynamic database with many applications. Searching the knowledge base is only one of its uses. It can be used as a working tool. For example, at the last biennial international Human Gene Mapping conference in Paris (September 1987) the results of the committees' deliberations were used to update and regenerate the database each evening. Every morning, the conferees had fresh files to consult. This information was available worldwide at the same time. In the future, these conferences can take place electronically as frequently as desired by the scientific community.

OMIM is a node in an emerging network of biotechnology databases, data banks, tissue repositories, and electronic journals. In a few years, it may be possible to enter any of these files from any one of the related files. Through this kind of linkage, OMIM may serve as a bridge between the molecular geneticists and the clinical geneticists. Currently, these databases are primarily text or numerical files. As technology improves and becomes ubiquitous, and as network bandwidth expands, databases will routinely include visual images and complex graphics. It may also be possible to jump from one point within a file to relevant and related points deep within other files.

OMIM and its future manifestations result from collaborative efforts and support from diverse groups. Dr. Victor A. McKusick is the scientific expert responsible for the knowledge base; his editorial staff adds new material and updates the database. The National Library of Medicine developed OMIM as part of its Online Reference Works program. The Welch Medical Library provides the computers, network gateways, database maintenance and management, and user support. Finally, the Howard Hughes Medical Institute provides partial support for access, maintenance, and future development of the system.

The Welch Library must work closely with both the author and the users to represent research knowledge in ways that best suit the users' purposes. It must be able to respond quickly to the changing needs of the author and the users. It is in a unique position to study and engineer a new kind of knowledge utility. The OMIM effort is part of a project to develop a range of online texts and databases in genetics and internal medicine, carried out in the Library's Laboratory for Applied Research in Academic Information.
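The three-step search described in the box (a plain-English query, a list ranked by relevance, then a selection) can be sketched with a very simple scoring rule. The source does not say how OMIM computes relevance, so the term-count scoring here is an assumption chosen for brevity, and the entry names and texts are made-up toy data rather than OMIM content.

```python
import re
from collections import Counter

# Common words ignored when matching; an illustrative list only.
STOPWORDS = {"a", "an", "and", "between", "in", "of", "the", "to", "with"}


def tokens(text):
    """Lower-case the text and keep alphabetic words that are not stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def rank(query, documents):
    """Return document titles ordered by how often they use the query's terms.

    Ties are broken alphabetically; documents with no matching term are omitted.
    """
    query_terms = set(tokens(query))
    scored = []
    for title, text in documents.items():
        counts = Counter(tokens(text))
        score = sum(counts[term] for term in query_terms)
        if score:
            scored.append((score, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored]


if __name__ == "__main__":
    # Toy entries invented for this sketch.
    entries = {
        "entry-A": "Duchenne muscular dystrophy with growth hormone deficiency",
        "entry-B": "muscular dystrophy, limb girdle type",
        "entry-C": "color blindness",
    }
    query = "relationship between Duchenne muscular dystrophy and growth hormone"
    print(rank(query, entries))
```

A production system would weight rare terms more heavily than common ones, but even this crude count shows why the searcher does not need to learn a database-specific command language.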
30 INFORMATION societies, such as the American Chemical Society, which manages the Chemical TECHNOLOGY AND Abstracts Se~v~ce, and the American Psychological Association, which manages THE CONDUCT Psychological Abstracts; and private for-profit enter ses such as the Institute for OF RESEARCH Scientific Information. NEW OPPORTUNITIES: APPROACHING THE REVOLUTION ASYMPI OTICALLY The information technologies and institutions of the past that revolutionized scholarly communication writing, the mails, the library, the printed book, the encyclopedia, the scientific societies, the telephone-made information more accessible, durable, or portable. The advent of digital information technology and management Continues the revolution, suggesting a vision, still somewhat HANDLING SATELLITE-DERIVED OBSERVATIONAL DATA At present both the National Aeronautics and Space Administration (NASA) and the Na- tional Oceanic and Atmospheric Administra- tion (NOAA) operate earth-orbiting satellites and collect data from them. Both NOAA and NASA store large volumes of primary data from the satellites on digital tape. Both have faced problems, although each organization's problems are different. NOAA, until 1985, had a system that, for purposes of satellite oper- ations, stored environmental satellite data on a Terabit Memory System (TBM). The TBM technology was used from 1978 to 1985, at which time it became obsolete; the more than 1,000 tapes of data collected have been reduced by about 40 percent in transforming most of the useful materials to standard dig- ital tape for storage. NASA has used standard digital tape and disk storage technologies and, since ceding the LAND SAT satellites, has re- corded and saved data from its research earth-observing satellites as needed. Both NASA and NOAA face real problems in making data accessible for scientific analysis. 
NASA has expended time, effort, and money building a number of satellite data distribu- tion systems that provide digital data archives and a catalog of satellite data holdings, as well as images and graphical analyses produced from satellite data. For example, NASA's Na- tional Space Science Data Center received and filled some 2,500 requests for tapes, films, and prints in the first half of fiscal 1988, and also provided network access to specific databases. NOAA has been largely unable to get financial support for its proposed satellite data management systems. Selection of needed information from among the data available remains a problem. Some pilot sys- tems under development at both agencies succeed in leading the user through a catalog, but fail to contain much valuable new infor- mation and data. Both agencies continue to hold great amounts of environmental satellite data in their permanent archives that are difficult to access, expensive to acquire, and as a result are ignored by many researchers who could benefit from their use. Much re- mains to be done to improve access to im- portant satellite-derived data.
incoherent, of new ways of finding, understanding, storing, and communicating information. Some technologies involved in the revolution are

· Simulations of natural (or hypothesized) phenomena;
· Visualization of phenomena through graphical displays of data; and
· Emerging use of knowledge-based systems as "intelligent assistants" in managing and interpreting data.

See box on simulation, below. See box on visualization, pages 32-33.

Simulations allow examination of hypotheses that may be untestable under normal conditions. Plasma physicists simulate ways of holding and heating a hot, turbulent plasma until it reaches the temperatures necessary for fusion. Cosmologists simulate the growth of galaxies and clusters of galaxies in an infant universe. Engineers simulate the growth of fractures in a metal airplane wing or nuclear reactor. Chemists' simulations may someday be sophisticated enough to screen out unproductive experiments in advance. Drug companies are considering the use of simulations to design drugs for a particular function, for example, a non-addictive drug that also kills pain. In general, simulations extend researchers' ability to model a system and test the model developed.

Visualization techniques turn the results of numerical computations into images. The remarkable ability of the human brain to recognize patterns in pictures allows faster understanding of results in solutions to complex problems, as well as faster ways of interacting with computer systems and models. For

USES OF SIMULATION IN ECONOMETRICS

Simulation techniques take estimated relationships or numerical models that appear to be consistent with observations of actual behavior and apply them to problems of predicting the changes induced by time, or of measuring the relationships among sets of economic variables. For example, simulation models have been utilized to study the effects of oil price changes on the rate of inflation, proposed policies regarding labor law, and future interest rates. In addition, exchanges among groups of agents in an economy have been used in dynamic input-output analysis to make inferences about the feasible or likely future course of economic growth in the entire economy or within specific industries or regions.

There is a growing interest in investigating the properties of models that represent the workings of firms, markets, and whole economies as nonlinear adaptive systems. Recently this has begun to expand the reliance placed by essentially theoretical researchers upon extensive applications of numerical simulation methods. Finally, in both extensions of the line of inquiry just noted and in other contexts, direct simulation of stochastic processes via Monte Carlo techniques can be used by economists to gain insights into the properties of stochastic systems that resist deductive techniques due to their (current) analytic intractability.

SOURCE: Paul A. David and W. Edward Steinmueller, 1987. Position paper: "The Impact of Information Technology Upon Economic Science," p. 21.
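The box on simulation in econometrics mentions direct Monte Carlo simulation of stochastic processes whose properties resist deductive analysis. A minimal sketch of that idea follows, in Python. The model is an assumption chosen for illustration and does not come from the position paper: a first-order autoregression y_t = rho * y_{t-1} + e_t, whose least-squares estimate of rho in short samples is studied here by repeated simulation. The function names, sample size, and number of replications are likewise hypothetical.

```python
import random
import statistics


def simulate_ar1(rho, n, rng):
    """Generate y_0 = 0 and n further values of y_t = rho * y_{t-1} + e_t,
    where the shocks e_t are independent standard normal draws."""
    y = [0.0]
    for _ in range(n):
        y.append(rho * y[-1] + rng.gauss(0.0, 1.0))
    return y


def ols_rho(y):
    """Least-squares estimate of rho from regressing y_t on y_{t-1} (no intercept)."""
    numerator = sum(current * lagged for current, lagged in zip(y[1:], y[:-1]))
    denominator = sum(lagged * lagged for lagged in y[:-1])
    return numerator / denominator


def monte_carlo_bias(rho=0.9, n=25, replications=2000, seed=0):
    """Repeat the experiment many times and summarize the estimates.

    Returns (mean estimate minus true rho, standard deviation of the estimates).
    """
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    estimates = [ols_rho(simulate_ar1(rho, n, rng)) for _ in range(replications)]
    return statistics.mean(estimates) - rho, statistics.stdev(estimates)


bias, spread = monte_carlo_bias()
print(f"mean bias of the estimate: {bias:.3f}; spread across replications: {spread:.3f}")
```

The design is deliberately plain: one function generates an artificial history, one applies the estimator a researcher would use on real data, and one repeats the experiment and summarizes the outcomes. Replacing `simulate_ar1` with a different stochastic model leaves the rest unchanged.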
instance, while small molecules have a few dozen atoms and are easy to visualize, large molecules, like proteins, have tens of thousands of atoms. A useful physical model of the structure of a protein might stand six feet high and cost several thousand dollars. Moreover, a researcher could not slice a physical model to see how it looks inside; with visualization techniques, he could. Visualization is the single advanced technology most widely mentioned by Panel members and position paper writers. (For a critical analysis of opportunities in visual imaging, see McCormick, DeFanti, and Brown, 1987.)

VISUALIZATION IN SCIENTIFIC COMPUTING

Scientists need an alternative to numbers. A technical reality today and a cognitive imperative tomorrow are the use of images. The ability of scientists to visualize complex computations and simulations is absolutely essential to ensure the integrity of analyses, to provoke insights, and to communicate those insights with others.

Several visually oriented computer-based technologies already exist today. Some have been exploited by the private sector, and off-the-shelf hardware and software can be purchased; others require new developments; and still others open up new research areas. Visualization technology, well integrated into today's workstation, has found practical application in such areas as product design, electronic publishing, media production and manufacturing automation. Management has found that visualization tools make their companies more productive, more competitive, and more professional.

So far, however, scientists and academics have been largely untouched by this revolution in computing. Secretaries who prepare manuscripts for scientists have better interactive control and visual feedback with their word processors than scientists have over large computing resources that cost several thousand times as much.

Traditionally, scientific problems that required large-scale computing resources needed all the available computational power to perform the analyses or simulations. The ability to visualize results or guide the calculations themselves requires substantially more computing power.

Electronic media, such as videotapes, laser disks, optical disks, and floppy disks, are now necessary for the publication and dissemination of mathematical models, processing algorithms, computer programs, experimental data, and scientific simulations. The reviewer and the reader will need to test models, evaluate algorithms, and execute programs themselves, interactively, without an author's assistance. Scientific publication needs to be extended to make use of visualization-compatible media.

Reading and writing were only democratized in the past 100 years and are the accepted communication tools for scientists and engineers today. A new communication tool, visualization, in time will also be democratized and embraced by the great researchers of the future.

The introduction of visualization technology will profoundly transform the way science is communicated and will facilitate the commission of large-scale engineering projects. Visualization and science go hand in hand as partners. No one ever expected Gutenberg to be Shakespeare as well. Perhaps we will not have to wait 150 years this time for the geniuses to catch up to the technology.

SOURCE: B. H. McCormick, T. A. DeFanti, and M. D. Brown, 1987. Visualization in Scientific Computing (NSF Report). Computer Graphics 21(6). ACM SIGGRAPH: New York, Association for Computing Machinery.

Intelligent assistants can serve as interfaces between the researcher and the computer. Just as computers increase our power to collect, store, filter, and retrieve data, they can also help us reason about the data. Over the last three decades, computer scientists have been developing methods for symbolic information processing or artificial intelligence. While these programs are not fully intelligent in the sense that humans are, they allow computers to solve problems that are not reducible to equations.

Artificial intelligence programs have been written for many scientific tasks. These tasks are not expressible in terms of numerical operations alone, and, thus, require symbolic computation. The programs fall into a general class, called expert systems, because they are programmed to reach decisions in much the same way as experts do. Expert systems have been successfully applied to industrial areas such as manufacturing and banking. To date, only a few prototype systems have been written for scientific research. Prototypes include programs that assist in chemical synthesis planning, in planning experiments in molecular genetics, in interpreting mass spectra of organic molecules, in troubleshooting particle beam lines for high energy physicists, and in automated theory formulation in chemistry, physics, and astronomy.

The methods needed to assist with complex reasoning tasks are themselves the subject of considerable research in such fields as computer science, cognitive science, and linguistics. Research in these fields, in turn, is producing tools that facilitate research in other disciplines. As these methods are used more widely in the future, some experts predict the conduct of research will change dramatically. Intelligent assistants, in the form of software, can carry out complex planning and interpretation tasks as instructed, leaving humans free to spend time on other tasks. When these reasoning programs are coupled to systems with data-gathering capabilities, much of the drudgery associated with research planning, data collection, and analysis can be reduced. Research laboratories and the conduct of research will become even more productive. When every researcher has intelligent assistants at his/her disposal and when the functions of these assistants are interlinked, science will expand the frontiers of knowledge even more rapidly than it now does.

Future technologies will provide other forms of research support. Programs that recognize and follow natural-language commands, like "Give me the data from this file," can simplify interaction between the researcher and computer systems. Spoken-language recognition offers the advantage of hands-free interaction. Speech production, in which computers generate connected sentences in response to instructions, will, according to one author, lead to a revolutionary expansion in the use of computers in business and office environments (Koening, 1987). A variety of manipulative interfaces are under active exploration (Foley, 1987). For example, the "data glove" is an on-screen image of a specially engineered glove worn on a researcher's hand. The data glove follows the motions of the researcher's hand, permitting a researcher, for instance, to manipulate a molecule directly on screen. When the data glove is coupled with feedback devices in the researcher's glove, a researcher can "feel" the fit between two molecular structure surfaces.

MOLECULAR GRAPHICS

The use of interactive computer graphics to gain insight into chemical complexity began in 1964. Interactive graphics is now an integral part of academic and industrial research on molecular structures and interactions, and the methodology is being successfully combined with supercomputers to model complex systems such as proteins and DNA. Techniques range from simple black-and-white bit-mapped representations of small molecules for substructure searches and synthetic analyses to the most sophisticated 3D color stereographic displays required for advanced work in genetic engineering and drug design.

The attitude of the research and development community toward molecular modeling has changed. What used to be viewed as a sophisticated and expensive way to make pretty pictures for publication is now seen as a valuable tool for the analysis and design of experiments. Molecular graphics complements crystallography, sequencing, chromatography, mass spectrometry, magnetic resonance, and the other tools of the experimentalist, and is an experimental tool in its own right. The pharmaceutical industry, especially in the new and flourishing fields of genetic and protein engineering, is increasingly using molecular modeling to design modifications to known drugs and to propose new therapeutic agents.

SOURCE: B. H. McCormick, T. A. DeFanti, and M. D. Brown, 1987. Visualization in Scientific Computing (NSF Report). Computer Graphics 21(6). ACM SIGGRAPH: New York, Association for Computing Machinery.

The Panel believes that the mature and emerging information technologies, taken together, suggest a vision of new approaches to scientific and engineering research. The vision focuses on an open infrastructure for research support and communication among researchers, along with the services for maintaining this infrastructure. Below are several examples of parts of the vision and of forms the vision could take. We discuss further steps in the report's final section on recommendations.

See boxes on pages 35-41.

INSTITUTIONAL AND BEHAVIORAL IMPEDIMENTS TO THE USE OF INFORMATION TECHNOLOGY IN RESEARCH

Underlying many of the difficulties we have discussed in the use of information technology in research are institutional and behavioral impediments. We have identified six such impediments that seem to affect research in most or all disciplines:
(1) Issues of costs and cost sharing;
(2) The problem of standards;
(3) Legal and ethical constraints;
(4) Gaps in training and education;
(5) Risks of organizational change; and
(6) Most fundamental, the absence of an infrastructure for the use of information technology.

Issues of Costs and Cost Sharing

Many forces drive developments in information technology and its application to research. The result of these developments is constantly increasing requirements for higher performance computer and communications equipment, making current equipment obsolete. Universities and other research organizations are spending increasing fractions of their budgets on information technology to maintain competitive research facilities and to support computer-related instruction. At a number of private research universities, for example, tuition has increased faster than inflation for a number of years, in part to cover some of these costs. It is unrealistic to rely on such funding sources to cover further cost increases that will be required to build local network infrastructures.

A related issue is who will pay for the costs of research computing support. Historically, such costs have been partially recovered by bundling them into charges for use of time-shared mainframe computers. As usage has moved from campus mainframes to other options (ranging from supercomputer centers to workstations and personal computers), this source of revenue has been lost, while the needs for administrative staff and support personnel for consulting, training, and documentation have continued. Efforts to move research support into indirect cost categories have not succeeded as many research institutions and universities face caps on indirect cost rates and have no room to accommodate new costs.

RESEARCH ON INTEGRATED INFORMATION SYSTEMS

Nearly a decade ago the Association of American Medical Colleges (AAMC) recognized the strategic importance of information technology to the conduct of biomedical research. In response to a study released by the AAMC in 1982, the National Library of Medicine has supported eleven institutions in efforts to develop strategic plans and prototypes of an Integrated Academic Information Management System (IAIMS). The objective of IAIMS is to develop the institutional information infrastructure that permits individuals to access information they need for their clinical or research work from any computer terminal, wherever and whenever it is needed, pull that information into a local environment, and read, modify, transform it, or otherwise use it for many different purposes.

Several pilot prototype models have emerged. The Baylor Medical College is developing a "virtual notebook," a set of tools for researchers to collect, manipulate, and store data. Georgetown Medical Center has a model called BIOSYNTHESIS that automatically routes a user's query from one database to another. The knowledge sector development of a comprehensive patient management clinical decision support system called HELP is the IAIMS project focus at the University of Utah; and Johns Hopkins University is developing a knowledge workstation.

Advances in communications and computing generate new services that require subsidy during the first years of their existence if they are to be successfully tested. This is particularly true of network-related services. Building services into a national network for research will require significant federal, state, and institutional subsidy, which cannot be recovered from user service charges until large-scale connectivity has been achieved and services are mature. Sources for these subsidies must be determined.

A REASONABLE MODEL

Although the Panel is unaware of anything precisely like the vision it holds for sharing information, proposals for the newly established National Center for Biotechnology Information (NCBI) at the National Library of Medicine may come close. The NCBI proposes to facilitate easy and effective access to a comprehensive array of information sources that support the molecular biology research community. Many, but not all, of these sources are electronic. They encompass raw data, text, bibliographic information, and graphic representations. Ownership and responsibility for development and maintenance of these sources range from individual researchers to departmental groups, institutes, professional organizations, and federal agencies. Each was designed to serve specific needs and audiences, created in many different hardware configurations and software applications. Consequently, NCBI's mission requires experts in both information technologies and biotechnologies. NCBI staff must

· Provide directories to knowledge sources;
· Create useful network gateways between systems;
· Assist users in using databases effectively;
· Reduce incompatibilities in retrieval approaches, vocabulary, nomenclature and data structures;
· Promote standards for representing information that will reduce redundancy and detect inconsistencies or errors;
· Provide useful tools for manipulating and displaying data; and
· Identify new analytic and descriptive services and systems.

Some computing-intensive universities (e.g., Carnegie Mellon University and Brown University) and medical centers (e.g., Johns Hopkins University, the University of Utah, Baylor University, and Duke University) are also attempting to develop instances of the vision.

Methods used for cost recovery can have significant impacts on usage. Two alternatives are to charge users for access to services or to charge users for the amount of service used. Networks such as BITNET have grown substantially in connectivity and use because they have fixed annual institutional charges for membership and connection, but charge no fees for use. Use-insensitive charge methods (often referred to as the library model) are attractive to institutions because costs can be treated as infrastructure costs and are predictable. Charges
for amount of use, in contrast, can inhibit usage; a major inhibitor to use of commercial databases for information searches, for instance, is the unpredictability of user charges for time spent searching the databases. During the development of network services, it seems desirable to recover costs through fixed access charges wherever possible.

The Problem of Standards

The development of standards for interconnection makes it possible for every telephone in the world to communicate with every other telephone. The absence of commonly held and implemented standards that would allow every computer to communicate with every other computer and to access information in an intuitive and consistent way is a major impediment to scholarly communication, to the sharing of information resources, and to research productivity.

Standards for computer communication are being developed by many groups. The pace of these efforts is painfully slow, however, and the process is intensely political. The technologies are developing faster than our ability to define standards that can make effective use of them. Further, standards that are developed prematurely can inhibit technological progress; standards developed by one group (for example, an equipment vendor) in isolation create islands of users with whom effective communication is difficult or impossible.

Development of standards not only improves efficiency but also reduces costs. Open interconnection standards permit competition among vendors, which leads to lowered costs and improved capabilities. Proprietary standards restrict competition and lead to increased costs. Federal government procurement rules have been major sources of pressure on vendors to support open standards.

Current mechanisms for reaching agreement on standards need examination and significant improvement.
Such examination needs input from user groups, which will have to exert pressure on standards bodies and on the vendors who are major players in the standard-setting process.

Legal and Ethical Constraints

The primary legal and ethical constraints to wider use of information technology are issues of the confidentiality of, and access to, data. The following discussion will only illustrate these issues; we believe they are too important and too specialized to be adequately addressed in a document as general as this one. In the report's final section, we recommend the establishment of a body that will study and advise on these issues.

Information technology has made possible large-scale research using data on human subjects. For the first time, researchers can merge data collected by national surveys with data collected in medical, insurance, or tax records. For instance, in public health research, long-term studies of workers exposed to specific hazards can be carried out by linking health insurance data on costs with Internal Revenue data on subsequent earnings, Social Security data on disability payments, and mortality data, including date and cause of death (Steinwachs, 1987, Position Paper: Information Technology and the Conduct of Public Health
Research). The scientific potential of such data mergers is enormous; the actual use of mergers is small, primarily because of concerns about privacy and confidentiality.

The right to confidentiality of personal information is held strongly in our society. Concerns about the conflict between researchers' needs and citizens' rights have been extensively explored by a number of scientific working groups, under the auspices of both governmental agencies (such as the Census Bureau) and private groups (for example, the National Academy of Sciences). As more information about individuals is collected and cross-linked, fears are raised that determined and technically sophisticated computer experts will be able to identify specific individuals, thus breaching promises of confidentiality and privacy of information. The Census Bureau, in particular, fears that publicity surrounding such breaches of confidentiality will undermine public confidence and inhibit cooperation with the decennial censuses. Although there have been discussions and legislative proposals for outright restrictions on mergers of government survey or census data, a reasonable alternative seems to be to impose severe penalties on researchers who breach confidentiality by making use of information on specific individuals.

The issue here, as elsewhere in public policy problems, is the balance of benefits against costs. Does better research balance the risk of compromising perceived fundamental rights to privacy? This is a topic that will need to be debated among both researchers and concerned constituencies in the general public.

A related issue is that of acceptable levels of informed consent for human subjects. At present, consent is usually obtained from each respondent to a survey; it is described as informed because the respondent understands what will be done with responses (usually, that they will be used only for some specific research project). Data-collecting organizations protect the confidentiality of the information obtained from respondents, but guarantee only that information about specific individuals will not be released in such a way that they can be identified. The extent to which informed consent can be given to unknown future uses of survey data, in particular to their merger with other data sources, is of great concern to survey researchers. Controlling the eventual uses of merged, widely distributed data sets would be difficult.

THE FAR SIDE OF THE DREAM: THE LIBRARY OF THE FUTURE

"Can you imagine that they used to have libraries where the books didn't talk to each other?" [Marvin Minsky, MIT]

The libraries of today are warehouses for passive objects. The books and journals sit on shelves, waiting for us to use our intelligence to find them, read them, interpret them, and cause them finally to divulge their stored knowledge. "Electronic" libraries of today are no better. Their pages are pages of data files, but the electronic page images are equally passive.

Now imagine the library as an active, intelligent "knowledge server." It stores the knowledge of the disciplines in complex knowledge structures (perhaps in a formalism yet to be invented). It can reason with this knowledge to satisfy the needs of its users. The needs are expressed naturally, with fluid discourse. The system can, of course, retrieve and exhibit (the electronic textbook). It can collect relevant information; it can summarize; it can pursue relationships.

It acts as a consultant on specific problems, offering advice on particular solutions, justifying those solutions with citations or with a fabric of general reasoning. If the user can suggest a solution or a hypothesis it can check this, even suggest extensions. Or it can critique the user viewpoint, with a detailed rationale of its agreement or disagreement. . . .

The user of the Library of the Future need not be a person. It may be another knowledge system, that is, any intelligent agent with a need for knowledge. Such a Library will be a network of knowledge systems, in which people and machines collaborate.

Publishing is an activity transformed. Authors may bypass text, adding their increment to human knowledge directly to the knowledge structures. Since the thread of responsibility must be maintained, and since there may be disagreement as knowledge grows, the contributions are authored (incidentally allowing for the computation of royalties for access and use). Knowledge base maintenance ("updating") itself becomes a vigorous part of the new publishing industry.

SOURCE: Edward A. Feigenbaum, 1986. Autoknowledge: From file servers to knowledge servers. In: Medinfo 86. R. Salamon, B. Blum, and M. Jorgensen, eds. New York: Elsevier Science Publishers B.V. (North-Holland).

Another concern that needs to be addressed is one of responsibility in computer-supported decision making. Scientists, engineers, and clinicians more and more frequently will use complex software to help analyze and interpret their data. Who then is morally and legally responsible for the correctness of their interpretations, and of actions based on them? Experiments involving dangerous materials or human lives may soon be controlled by computers, just as many commercial aircraft landings are at present. Computers may be capable of faster or more precise determinations in some situations than humans. But software designers lack strong guidelines on assignment of responsibility in case of malfunction or unforeseen disaster, and lack the expertise to guarantee against malfunctions or disasters. With complex software overlaid on complex hardware, it is impossible to prove beyond a doubt in all circumstances that both hardware and software are performing precisely as they were specified to perform.

Gaps in Training and Education

The training and education necessary for using information technology are lacking. Two decades ago many researchers dealt with computers only indirectly through computer programmers who worked in data processing centers. The development of information technology has brought computing into the researcher's laboratory and office. As a result, the level of computing competence expected of researchers, their support staff, and their students has increased manyfold.
Computers are changing what students need to learn. Undergraduate students of chemistry, for example, need more than the standard courses in organic, inorganic, analytic, and physical chemistry; in the view of many practicing chemists, they should also have courses in calculus, differential equations, linear algebra, and computer simulation techniques, and through formal courses or practical research experience, should be competent in mathematical reasoning, electronics, computer programming, numerical methods, statistical analysis, and the workings of information management systems (Counts, 1987, Position Paper: The Impact of Information Technologies on the Productivity of Chemistry).

Neither students nor researchers can obtain adequate training and education through one-time training courses. Because the numbers of new tools are multiplying, researchers need ways to continuously learn about, evaluate, and, if necessary, adopt these new tools. Using commercial programs and tutorial systems only partly alleviates the problem because the technologies often change faster than such supports can accommodate to the changes. Instructors in the uses of information technologies within the disciplines are rare.

Senior researchers are especially hard hit. The Panel took no formal survey, but informal discussions suggest that most senior researchers have had exposure to no more than a one-semester programming course and have few of the skills needed to evaluate and use the available technology.

For all researchers, learning advanced computing means taking a risk. They must interrupt their work and pay attention to something new and temporarily unproductive. They must become novices, often where sources of appropriate instruction and help are unclear or inaccessible. The investment of time and level of frustration are likely to be high. Understandably, many researchers cannot find the time and the confidence to learn technical computing; some justify their choices with negative attitudes, for example: "I get enough communications as it is; I don't need a computer network," or "If I put my data on the computer, others will steal it," or "We are doing fine as things are; why change at this point?"

Given these natural but negative attitudes, organizations are sometimes slow in responding to demands for new information technologies. Some research organizations view these attitudes as unchangeable and wait to introduce advanced computing until existing researchers move or retire. Others are actively replacing personnel or creating new departments for computational researchers. Still others are attempting to change attitudes by giving researchers the necessary time and support systems. While we have no data on changes in productivity, there is some evidence that in organizations following the last course, existing researchers at all ranks can achieve as high computing competence as new personnel (Kiesler and Sproull, 1987).

Because people are now being introduced to computing skills at earlier stages of schooling, the lag in computer expertise is disappearing. Over time, alternatives to personal expertise in the form of user-friendly software or individual assistance from specialists will also develop.

Risks of Organizational Change

Changing an organization to make way for advanced information technology and its attendant benefits entails real risks. Administrators and research managers are often reluctant to incur the costs (financial, organizational, behavioral) of new technology. In some cases, administrators and research managers relegate computer resources (hardware, software, and people-based support services) to a lower priority than the procurement and maintenance of experimental equipment. The result can be a long-term suppression of the development and use of the tools of information technology.

See box on electronic laboratory notebook, page 42.

DOCUMENTS AS LINKED PIECES: HYPERTEXT

The vision of computing technology revolutionizing how we store and access knowledge is as old as the computing age. In 1945 Vannevar Bush proposed MEMEX, an electro-optical-mechanical information retrieval system that could create links between arbitrary chunks of information and allow the user to follow the links in any desired manner. In the early 1960s, Ted Nelson introduced "hypertext," a form of nonsequential writing: a text branches and allows choices to the reader, best read at an interactive screen. In 1968, Doug Engelbart demonstrated a simple hypertext system for hierarchically structured documents (that is, a list of sections, each of which decomposes into a list of subsections, each of which decomposes into a list of paragraphs, and so on) to which annotations could be added during a multiple-workstation conference. Today hypertext refers to information storage in which documents are preserved as networks of linked pieces rather than as a single linear string of characters; readers can add links and follow links at will.

Nelson's XANADU system is perhaps the most ambitious hypertext system proposed. XANADU would make all the world's knowledge accessible in a global distributed database to which anyone can add information, and in which anyone can browse or search for information. A document is a set of one or more linked nodes of text, plus links to nodes already in the global database; a document may be mostly links, constructed out of pieces already in the database. Users pay a fee proportional to the number of characters they have stored. Anyone accessing an item in the global database pays an access charge, a portion of which is returned to the owner as a royalty. Individuals can store private documents that cannot have public links pointing to them and can attach annotations to public documents that become available to everyone reading those documents. Documents can be composed of different parts including text, graphics, voice, and video. INTERMEDIA, a hypertext system with some of these properties, has been implemented at Brown University and has been used to organize information in a humanities course for presentation to students. Small-scale hypertext systems, such as Apple's HyperCard for the Macintosh, are available on personal computers; their promoters claim these systems will change information retrieval as radically as spreadsheets changed accounting a few years ago.

SOURCE: Peter and Dorothy Denning, personal communication, 1987.
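The hypertext box describes documents as networks of linked nodes, with a storage fee proportional to the characters stored and an access charge partly returned to the owner as a royalty. The Python sketch below is a toy illustration of that description only. It is not the design of XANADU or of any actual system; the class names, the credit amounts, and the 50 percent royalty share are all hypothetical, and private documents and annotations are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A piece of text in the shared database, with links to other nodes."""
    node_id: str
    owner: str
    text: str
    links: list = field(default_factory=list)  # ids of the nodes this one points to


class HypertextStore:
    """Toy global database with storage fees, access charges, and royalties."""

    def __init__(self, fee_per_char=1, access_charge=10, royalty_percent=50):
        # All amounts are hypothetical integer credits chosen for illustration.
        self.nodes = {}
        self.fee_per_char = fee_per_char
        self.access_charge = access_charge
        self.royalty_percent = royalty_percent
        self.royalties = {}  # owner -> credits earned from accesses

    def add(self, node):
        self.nodes[node.node_id] = node

    def link(self, source_id, target_id):
        """Add a link between two nodes that are already in the database."""
        if target_id not in self.nodes:
            raise KeyError(target_id)
        self.nodes[source_id].links.append(target_id)

    def storage_fee(self, owner):
        """Fee proportional to the number of characters the owner has stored."""
        chars = sum(len(n.text) for n in self.nodes.values() if n.owner == owner)
        return chars * self.fee_per_char

    def read(self, node_id):
        """Return a node's text and credit its owner with a share of the charge."""
        node = self.nodes[node_id]
        royalty = self.access_charge * self.royalty_percent // 100
        self.royalties[node.owner] = self.royalties.get(node.owner, 0) + royalty
        return node.text

    def follow(self, start_id):
        """List every node reachable from start_id, visiting each one once."""
        seen, stack, order = set(), [start_id], []
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            order.append(current)
            # Push links in reverse so that the first link is explored first.
            stack.extend(reversed(self.nodes[current].links))
        return order


store = HypertextStore()
store.add(Node("intro", "alice", "Hypertext links pieces of text."))
store.add(Node("memex", "bob", "Bush proposed MEMEX in 1945."))
store.add(Node("essay", "carol", ""))  # a document that consists only of links
store.link("essay", "intro")
store.link("essay", "memex")
store.link("intro", "memex")
print(store.follow("essay"))
```

The "essay" node holds no text of its own, which mirrors the box's remark that a document may be mostly links to pieces already in the database; its owner therefore owes no storage fee in this sketch, while readers of the linked pieces generate royalties for their owners.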
In other cases, administrators are misled into underestimating the time and resources required to deploy new information technology. Efforts to develop effective networks have been insufficiently supported by government planners and research institution administrators, who have been led to assume that technology and services to provide network access are easily put in place. Some administrators have promoted change, but without adequate planning for the resources or infrastructure needed to support users. Problems such as these are exacerbated by overly optimistic advice given the administrators by technological enthusiasts.

This particular impediment probably cannot be overcome. It can, however, be alleviated by establishing collaborative arrangements to develop plans for and share the costs of change. EDUCOM, for example, is a consortium of research universities with large computing resources that promotes long-range planning and sharing of resources and experiences.

Absence of Infrastructure

Most fundamental of all the institutional and behavioral impediments to the use of information technology is the absence of an infrastructure that supports that use. Just as use of a large collection of books is made possible by a building and shelves in which to put them, a cataloguing system, borrowing policies, and reference librarians to assist users, so the use of a collection of computers and computer networks is supported by the existence

LEGAL CONSTRAINTS TO AN ELECTRONIC VERSION OF A LABORATORY NOTEBOOK

Today, the paper laboratory notebook is the only legally supportable document for patent applications and other regulatory procedures connected with research. Some organizations, however, routinely distribute electronic versions of laboratory notebook information to managers and other professionals who would otherwise have to visit the research site physically or request photocopies.
The benefits of legal electronic notebooks are speculative but attested to by those using them informally (Liscouski, 1987). First, they would help give researchers access to information or expertise that is otherwise lost because people have moved or reside in different departments. Second, they would allow research managers and researchers to observe and compare changes in results over time. Third, they would eliminate or make easier the assembly of paper versions of documents needed for government agencies.

The barrier to an electronic notebook is social: its lack of acceptance as a legal document. Such acceptance could take place if legal conditions for an electronic system (storage, format, security) were delineated. However, researchers, scientific associations, and government agencies have failed to develop such guidelines. This failure is probably connected to the traditions of privacy in laboratory notebooks; to the inability to forecast how an electronic system would stand up in court (and, related to that, the risk and unacceptable cost to any single institution of developing a system); and to the uncertainty of the ultimate benefits on some widely accepted index of research effectiveness. Whatever the reasons, the end result is that a complete and accepted electronic notebook remains undeveloped.
of institutions, services, policies, and experts: in short, by an infrastructure. On the whole, information technology is inadequately supported by current infrastructures.

An infrastructure that supports information technology applications to research should provide

· Access to experts who can help;
· Ways of supporting and rewarding these experts;
· Tools for developing software, and a market in which the tools are evaluated against one another and disseminated;
· Communication links among researchers, experts, and the market; and
· Analogs to the library, places where researchers can store and retrieve information.

Several different kinds of experts in information technology help researchers. Some are specialists in research computing. Some are programmers who develop and maintain software specific to research. Others are specialists who carry out searches. Still others are "gatekeepers," who help with choices of software and hardware. Gatekeepers are members of an informal network of helpers centered around advocates and specialists, experts in both a discipline and in information technology who become known by reputation. Overdependence on gatekeepers creates other problems: as with any informal service, some advice received may be narrowly focused or simply wrong, and the number of persons wanting free information often becomes larger than the number of persons able to provide it. As a result, the gatekeepers may become overloaded and eventually retreat from their gatekeeping roles.

To hold on to expert help of all types, research and funding institutions must find ways of supporting and rewarding it. While institutions and disciplines have evolved ways of rewarding researchers (publication in refereed journals, promotion, tenure), no such systems yet reward expert help.

Another aspect of the needed infrastructure is some formal provision for developing and disseminating software for specific research applications.
Tools for constructing reliable, efficient, customized, and well-documented software are not used in support of scientific research. Computer science, as a supporting discipline, needs to facilitate rapid delivery of finished software, and easy extension and revision of existing software. The Department of Defense has recently pioneered the creation of a Software Engineering Institute at Carnegie Mellon University. Efforts to create tool building and research resources for nondefense software are worth encouraging. Development and dissemination of scientific software could be speeded in many cases by adoption of emerging commercial standards. These standards are supported by many vendors for a variety of computing environments. The temptation to narrowly match software to specific applications should be resisted in favor of standard approaches.
Software, once developed, needs to be evaluated and disseminated. The research establishment now evaluates research information principally through peer review of funding proposals and manuscripts submitted for publication. Software needs to be dealt with in a similar manner. EDUCOM has recently announced its support of a peer-review process for certain kinds of academic software. Other prototypes of systems for evaluating and disseminating software already exist (see boxes on BIONET and on IBM's software market). These prototypes couple an electronic "market," through which software can be disseminated, with a conferencing capability that allows anyone with access to contribute to the evaluation of the market wares. The system provides an extremely important feature: those contributors who are most successful in the open market can automatically be identified and given credit in much the same way as authors of books and research papers now are.

See box on software market.

The infrastructure for information technology also depends on communication links. The Panel believes that one of the most important services that computer networks can provide is the link between users and expert help. Existing links often take the form of electronic bulletin boards on various networks; other mechanisms also exist. Until more formal mechanisms come about, open communication with pioneers, advocates, and enthusiasts is one of

AN EXAMPLE OF A SOFTWARE MARKET INFRASTRUCTURE: IBM RESEARCH

IBM's internal computer network connects over 2,000 individual computers worldwide, providing IBM's researchers, developers, and other employees with communications facilities such as electronic mail, file transfers, and access to remote computers. In recent years, software repositories and online conferencing facilities have grown and flourished, and become one of the primary uses of the network.
With a single command, any IBMer has access to some 3,000 software packages, developed by other IBMers around the world and made available through the network. Many of these packages are computer utilities and programming tools, but others are tools for research. They include statistical and graphics applications, simulation systems, and AI and expert system shells, as well as many everyday utilities to make general use of the computer simpler. The high level of interconnection offered by the network and the centralization of information offered by the repositories allow scientists with a particular need to see if software to satisfy that need is available, to obtain it if it is, and to develop it if it is not, with confidence that they are not duplicating the efforts of some colleague.

The online conferences (public special-purpose electronic bulletin boards), which are as widespread and accessible as the software repositories, allow users of the software (and of commercial and other software) to exchange experiences, questions, and problems. These conferences provide a form of peer review for the software developer. For internally developed software, they provide a fast and convenient channel between the software author and the users; authors with an interest in improving their programs have instant access to user suggestions and to
the best ways to allow new technologies to be disseminated and evaluated by research communities.

A final piece of infrastructure largely missing is housing and support for the storing and sharing of information. Such a function could be performed by disciplinary groups or, more generally, at the university level. Many university libraries have a professional core staff whose members hold faculty rank and function not only as librarians but also as researchers and teachers. Some university computer centers operate similarly. National laboratories, like astronomical observatories and accelerator facilities, have a core staff of astronomers or physicists whose main task is to serve outside users while also maintaining their own research programs. The existence of such a professional staff involved in the storage and retrieval of information for a discipline would provide a means of recognizing, rewarding, and providing status to these people. In some cases, a university might wish to consider integrating its information science department with its computer center and its library.

eager testers. Users with a special need or a hard question have equally fast access to the author for enhancements or answers.

The conferences also allow users with common interests to exchange other sorts of information in the traditional bulletin board style. AI researchers debate the usefulness of the concept of intentionality or discuss how software engineering methodologies apply to expert systems development; computer graphics and vision workers talk about the number of bits required to present a satisfactory image to the human eye. Over 100 individual conferences support thousands of separate discussions about computer hardware and software and virtually all other aspects of IBM's business. The software repositories provide a "reviewed" set of tools and applications for a broad population on a wide spectrum of problems.
The organization that originally sets up a repository or a conference generally provides user support for it (answering "how to do it" questions), and installation and maintenance of local services is usually handled either by an onsite group that has an interest in the specialty served by the facility, or on a more formal basis by the local Information Systems department.

The benefits of these repositories and conferences are at least as widely distributed and probably even harder to quantify, but the success of these software libraries and online conferences within IBM should serve as an encouraging sign for others with the same sorts of needs. A market can be made to succeed, provided that high levels of standardization and compatibility in both hardware and software can be achieved. Such levels of interoperability have, so far, been easier to achieve at commercial institutions such as IBM Research than at research universities.