Building Collaboratories for Oceanography
Research in oceanography covers a wide area and encompasses the marine aspects of several disciplines, including the physics, chemistry, biology, and geology of the ocean, the air-sea interface, the ocean bottom, and the shorelines. Yet the U.S. oceanographic community is relatively small, comprising about 4,500 oceanographers and ocean scientists in academia and in government laboratories. For them, the oceans continue to be a challenging environment in which to collect data: researchers must contend with the corrosive properties and hydrostatic pressure of seawater, mechanical failures caused by the forces of surface waves and currents, and the remoteness of many locations. Considerable resources, relative to the available funding, must be expended to make the in situ observations required to advance the science. To an extent, an active modeling component of the community offsets the scarcity of field data and the difficulty of obtaining it.
Current and future capabilities that support electronic collaboration and improve access to data would do much to ameliorate some of the problems that impede the conduct of ocean science and would yield benefits realized in and beyond the oceanographic community. The ocean is known to be a large reservoir of heat, of carbon dioxide, and of other chemical constituents, and improved understanding of its role in weather and global climate change will contribute, for example, to enhanced capabilities for forecasting meteorological and longer-term climate conditions. To facilitate research in oceanography, to broaden the approach to addressing complex, large-scale processes that have significant social effects, and to carry on the learning associated with the larger, increasingly more cooperative process-oriented research studies carried out by the oceanographic community, a new thrust to observe the ocean on a global scale coupled with improved means of collaboration will be needed. Among such aids to collaboration are tools that can more effectively link members of the small, diverse oceanographic research community, both within and across the subdisciplines, that can provide more timely access to data collected at sea, and that can display and analyze information in ways that allow closer interaction between modelers and field experimentalists.
This chapter briefly describes oceanography and the present state of oceanographic research in general, outlines particular collaborative research programs, discusses the potential for improving computational support for current and collaborative research in oceanography, suggests the kinds of computer-based tools that would answer research needs of oceanographers, and lists the attributes of a useful collaboratory for oceanographic research.
The U.S. oceanographic community includes physical oceanographers who study the dynamics and kinematics of fluid flows, including ocean currents and waves, and the forces that drive them; chemical oceanographers who focus on the distribution and variability of the ocean's chemical constituents; biological oceanographers who study the plants and animals, as individuals and as communities, found in the ocean; geological oceanographers who study sediments and rocks beneath the
oceans; and ocean engineers who design structures and instruments placed in the ocean and provide other related technology. Typically, academic oceanographers are either research staff or tenure-track faculty. For individuals in both groups, success in their careers depends on the ability to publish original, peer-reviewed research results in a timely fashion and at a regular rate. Within each oceanographic discipline, research efforts include field experimentation, numerical modeling, theory development, and laboratory experimentation. In field experiments, researchers have collected observations by working from ships and other crewed platforms and by deploying and leaving in place instruments that are moored, free-drifting, or placed on the sea bottom. More recently, observations of the ocean's surface have been made from satellites and aircraft.
In support of field work, the University-National Oceanographic Laboratory System (UNOLS) oversees the operation by academic institutions of research ships owned by the U.S. Navy, the National Science Foundation, or operating institutions. Available to investigators and scheduled roughly 1 year in advance, these ships make a series of research cruises all over the globe, returning to their home institutions perhaps once per year. The National Oceanic and Atmospheric Administration (NOAA) operates a fleet of about 20 ocean-going vessels, comparable in size to those of UNOLS, that are active in research, charting, marine resource assessment, and fisheries oceanography. Use of a UNOLS research ship costs approximately $15,000 per day, and 30-day cruises are typical.
The ships are used as platforms from which to make observations and to deploy instruments that are left in place on the ocean bottom, at the surface, or within the ocean (Figure 2.1). Instruments require considerable electrical power to operate and are designed to perform various tasks, such as collecting seawater, oceanic plants, or animals, or making geographic surveys of ocean properties. Berthing space on research ships limits the number of scientists on board to roughly 12 to 24 individuals. However, because more than one investigator is on board and research must be carried on 24 hours per day, often only three to four scientists and technicians are available at one time to do the work associated with a specific project. At the same time, the equipment taken to sea has become more sophisticated, relying on advanced electronics and computers, and the need often arises for two-way ship-to-shore communication with engineers, technicians, and programmers unable to be on the ship due to space or financial limitations. Given the additional difficulties of staging experimental work from foreign ports, work done from ships remains a challenge.
Drifting and moored instruments (Figure 2.2) gather time series measurements, collecting data sets tied to specific events whose occurrence is difficult to predict and to match with the rather inflexible ship schedules. In addition, these instruments can collect unbroken data sets up to several years in length and, when placed along a mooring line, can make observations at many different depths within the ocean. Oceanographic moorings are expensive, and their use requires cruises both to deploy and to recover them. Recovery is a necessity not only to bring instruments back for reuse or maintenance but also to obtain the data they have collected. Cheaper, nonrecoverable instruments, called drifters because they are released to drift freely on the sea surface or within the ocean, deliver 100 to 200 data values per day using radio or satellite telemetry. When larger amounts of data are needed, instruments that store their own data must be used.
The limited data telemetry now done from moored and drifting instruments relies on links to satellites and usually is possible only when the instrumentation is located at the surface—as, for example, the meteorological sensors on surface buoys—or when oceanographic instruments are linked electromechanically or acoustically (using coded transmission of sound through the water to carry information) to surface packages. Radio frequencies do not penetrate seawater as they do air and space. Acoustic telemetry techniques, which are more suited for use in the ocean, are under development, but transmission of data through seawater is not routine. To work around this limitation, instruments have
BOX 2.1 DATA COMMUNICATION FROM OCEANOGRAPHIC INSTRUMENTS
Polar-orbiting satellites are often used for relaying oceanographic data. Transmitters mounted on buoys or ships for use with the Argos data collection system on polar-orbiting NOAA satellites transmit 32 bytes of data roughly every minute. However, the satellite passes overhead only 5 to 15 times a day and during any overpass is in view for only about 10 minutes. Because of the limited data rate possible with this system, transmission of a complete set of hourly meteorological data (wind velocity, barometric pressure, solar radiation, long-wave radiation, sea surface temperature, air temperature, relative humidity, and precipitation) requires use of multiple transmitters and transmission of the contents of a ring buffer (the data from 4 hours are stored and transmitted over and over again, but are updated as new data become available) from each transmitter. However, with the cost of use of Argos at about $4,000 per transmitter for users in countries participating in an Argos joint operating agreement, this method is not used often. (More information about oceanographic data telemetry is available in Dickey et al. (1993) and in Briscoe and Frye (1987).)
recently been designed that rise to the surface to access the satellites and transmit the data that have been gathered. However, transmission of data collected at sea by moored buoys, drifters, and ships is the exception and not the rule, owing to limited transmission capacity and the great expense involved (Box 2.1). The most common way to collect data from instruments at sea remains to undertake a cruise to retrieve the instruments.
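The ring-buffer transmission scheme described in Box 2.1 can be sketched in a few lines. The class and method names below, the record format, and the one-record-per-frame packing are illustrative assumptions, not the actual Argos encoding; the sketch shows only the essential idea that recent hours of data are retransmitted over and over so that measurements survive missed satellite overpasses.

```python
from collections import deque

FRAME_BYTES = 32          # payload per uplink frame (per Box 2.1)
BUFFER_HOURS = 4          # hours of data retained and re-sent

class ArgosRingBuffer:
    """Retain the most recent hours of records and cycle through them
    on every transmit opportunity, so that data are not lost when a
    satellite overpass is missed. Hypothetical sketch, not the real
    Argos message format."""

    def __init__(self, hours=BUFFER_HOURS):
        self.records = deque(maxlen=hours)  # oldest record dropped automatically
        self._next = 0                      # index of next record to transmit

    def store(self, record_bytes):
        # Called once per hour with a newly assembled measurement record;
        # the deque silently discards the record that is now too old.
        self.records.append(record_bytes)

    def next_frame(self):
        # Called at each transmit slot; cycles repeatedly through the
        # buffered records, truncating each to a single frame.
        if not self.records:
            return b""
        frame = self.records[self._next % len(self.records)][:FRAME_BYTES]
        self._next += 1
        return frame
```

Because the transmitter keeps cycling through the same 4 hours of records, any single successful overpass recovers the full recent history, at the cost of sending each record many times.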
In part because of the cost of maintaining and staffing research ships and moorings and the difficulty of achieving access to near-real-time data, no worldwide, comprehensive, operational oceanographic observing system, such as exists in the atmosphere to support weather prediction, is in place to monitor the variability of the oceans. Oceanographic field work to date has focused on specific hypotheses and deployed its limited resources for short periods to investigate specific processes. However, financial constraints and the inaccessibility of data are not the only reasons for lack of a global ocean observing system. Having seen much of their growth since World War II, the ocean sciences are relatively young and are characterized by having gathered comparatively few data from the ocean. Current field work, which is very productive scientifically and is contributing to a growing understanding of the time and space scales of the processes at work in the ocean, is thus essential to building the foundation of understanding required to intelligently plan observations on a global scale.1
Modeling serves several purposes in the study of the oceans, among them (1) complementing sparse oceanographic data, (2) providing a means to test our understanding of the processes at work in the ocean, and (3) producing forecasts (there is great interest, for example, in developing the ability to predict the occurrence of El Niño). Modeling now under way addresses the ocean's role in climate change, the maintenance and variability of the large-scale ocean circulation, and the interaction of the ocean circulation with the atmosphere, the biosphere, the hydrological cycle, and the solid earth.
Modelers are found in small numbers (perhaps only one or two in some cases) at many institutions, although centers of activity exist at NOAA's Geophysical Fluid Dynamics Laboratory in Princeton, New Jersey, at the National Center for Atmospheric Research in Boulder, Colorado, and at Navy laboratories. Ocean modelers in general require access to large databases with, for example, climatological data that specify the initial state of the ocean or the annual variability of the surface forcing of the ocean by the atmosphere; high-power workstations and access to supercomputers; specialized data-display software; and the means to collaborate with colleagues at other institutions while writing proposals, reports, and publications. Modelers need to store, manipulate, exchange, and visualize the data produced by models; for a global model of the ocean with high temporal (every 4 hours over a period of many years) and spatial (every 100 km) resolution, the data files could be as large as 1 Gbyte.
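A back-of-envelope calculation shows how output volumes on this scale arise. The grid dimensions, level count, and single-precision storage below are illustrative assumptions, not parameters of any particular model; they simply show that even one surface field sampled every 4 hours for a year approaches the gigabyte scale.

```python
# Rough output-volume estimate for a global ocean model with ~100 km
# horizontal resolution and 4-hourly snapshots. All sizing choices here
# are illustrative assumptions.

LON_CELLS = 360          # 1 degree of longitude ~ 100 km at the equator
LAT_CELLS = 180
LEVELS = 20              # assumed number of vertical levels
BYTES_PER_VALUE = 4      # single-precision floating point
SNAPSHOTS_PER_DAY = 24 // 4
DAYS = 365

field_2d = LON_CELLS * LAT_CELLS * BYTES_PER_VALUE  # one surface field, one time
field_3d = field_2d * LEVELS                        # one full 3-D field, one time

# A year of 4-hourly snapshots of a single surface field:
year_2d = field_2d * SNAPSHOTS_PER_DAY * DAYS

print(f"one 3-D snapshot:      {field_3d / 1e6:.1f} MB")
print(f"year of 2-D snapshots: {year_2d / 1e9:.2f} GB")
```

A single surface field stored 4-hourly for a year already exceeds half a gigabyte, so a file holding even a few fields, or any 3-D field over time, readily reaches the 1-Gbyte scale cited above.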
Because of the complexity of ocean models (e.g., ranging from direct eddy simulation models of the small scale to global general circulation models) and the need for powerful computing resources, relatively few oceanographers have access to ocean models as research tools. However, a growing number of modelers are working with shared (community) models. Thus, the models are in a sense community resources and represent a form of collaboration that maximizes the use of precious resources for scientific advances.
COLLABORATIVE RESEARCH IN OCEANOGRAPHY
World Ocean Circulation Experiment
As described in a recent report (NRC, 1990b, pp. 4-5), the World Ocean Circulation Experiment (WOCE), an international program that is part of the World Climate Research Programme,
was created because an understanding of ocean circulation is crucial for predicting global climate change. Circulation is related to climate on a decades-to-centuries scale, through the transfer of heat and momentum between the atmosphere and the ocean. WOCE will study surface and subsurface circulation of the world's oceans over a seven-year period, with the goal of understanding circulation well enough to model its present state, to predict its future state under a variety of assumptions, and to predict feedbacks between climate change and ocean circulation. These goals would be met by describing (1) the ocean's present circulation and its variability, (2) air/sea boundary layer processes, (3) the role of exchange between different ocean basins in global circulation, and (4) the role of the oceanic heat storage and transport on the global heat balance.
The WOCE program is divided into several interlocking parts, the largest of which is the global survey carried out in international cooperation. This global hydrographic survey will carry out a number of cross-ocean sections sampling (1) water density, which helps drive ocean circulation, (2) various natural tracers, such as oxygen and nutrients, and (3) man-made tracers of water motions, such as chlorofluorocarbons. Worldwide placement of floats and current meter moorings will augment the global survey with direct observations of ocean current velocity. Important objectives are to quantify the oceanic transport of heat and the pathways of downward movement of water by which atmospheric gases are transported into the deep ocean, and to correctly model observed circulation patterns. An upper ocean program will focus on the atmosphere-ocean fluxes that drive the ocean and feed back to the atmosphere and on variations of upper ocean temperature and heat storage. Satellites, voluntary observing ships, moorings, and surface drifters will be integrated into an observation system capable of global measurements.
Tropical Ocean-Global Atmosphere Program
The Tropical Ocean-Global Atmosphere (TOGA) program is a decade-long international research program begun in 1985 to measure, model, understand, and predict variability in global climate associated with the El Niño-Southern Oscillation (ENSO) phenomenon (NRC, 1990c). ENSO is a robust, identifiable, recurrent (roughly every 4 to 7 years) climate signal whose origins can be traced to the tropical Pacific and whose impacts are felt worldwide through perturbations of the atmospheric
general circulation. The TOGA concept derives from a theory of coupled ocean-atmosphere interactions in the tropical Pacific first articulated by Jacob Bjerknes in the 1960s and subsequently expanded by other investigators; these studies provided impetus for the planning of TOGA in the early 1980s, in the midst of which the 1982-1983 ENSO event occurred. The 1982-1983 ENSO event was the most intense of the century, leaving in its wake human misery and billions of dollars in devastation on a global scale. Significantly, development of the 1982-1983 ENSO went undetected until the event was well under way. This dramatized to the scientific community the need to monitor the tropical oceans in real time to detect precursors of ENSO and to develop models capable of skillfully predicting ENSO events months to years in advance.
These two capabilities, real-time monitoring and model-based climate prediction, have guided the implementation of an ocean-observing array in support of TOGA objectives. TOGA observations fall into two broad categories, namely, in situ observations and remotely sensed satellite observations. Among the most valuable of the satellite data collected to date have been measurements of sea surface temperatures from NOAA weather satellites and of sea level from the U.S. Navy's Geosat mission. These data have been useful in defining patterns of surface variability on time and space scales relevant to understanding the ENSO phenomenon. They have also been valuable for assimilation into and/or validation of ocean models under development for climate prediction.
Given the reality of limited resources, TOGA has concentrated its in situ observational efforts in the areas of the Pacific Ocean where the scientific issues related to ENSO events are most clearly defined and, from a global perspective, are most compelling (Figure 2.3). Much of the data are transmitted in real time via polar-orbiting weather satellites (Satellite Service Argos) or geostationary satellites for use by the research community. Increasing amounts of in situ data are also being disseminated on the Global Telecommunications System to national meteorological centers for use in operational weather prediction.
One measure of progress during the first half of TOGA is the development of statistical, statistical-dynamical, and purely dynamical models that exhibit limited though significant skill for predicting ENSO events several months to a year in advance. This success has stimulated discussion not only of how to best capitalize during the second half of TOGA on the scientific advances that have been made, but also of how to best translate these advances into an operational system for climate prediction during the post-TOGA period (1995 and beyond).
TOGA is unique not only for its scientific contributions to understanding and predicting ENSO events and related phenomena, but also for the way the oceanographic community has organized itself to make these contributions. The compelling need for obtaining data in real time has led to a more collegial attitude among oceanographers toward the sharing of data for both operational and research purposes. In many instances, the customary 2-year period of exclusive or proprietary rights to the analysis of a new data set in oceanography has been waived by a particular investigator in the interests of furthering the common goal of improved understanding and prediction of short-term climate variability. This shift in attitude is by no means universal within the TOGA oceanographic community, nor has it been completely voluntary. It has been fostered in part by the peer review process, which has favored grants to investigators who propose to disseminate data in real time (or in near-real time) to a broad spectrum of investigators.
Another indication of the sociological transformation under way among oceanographers involved in TOGA is the gradual movement toward a modus operandi similar to that in meteorology and stemming from the nature of the scientific problems being addressed. In meteorology, much of the data collection effort is driven by the need for improved numerical weather prediction, and most observations for this purpose are supported by the intergovernmental World Weather Watch. A long tradition of operational support for meteorological observations obviates the need for most research meteorologists to become involved in field work. Oceanography, by comparison, is a relatively new field of study and, until recently, has not had a clearly defined operational imperative to support climate prediction. Hence much
of the data used in oceanographic research has been, and still is, collected via the mechanism of individual peer-reviewed proposals. In view of the daunting logistical and technical challenges involved in oceanographic field work, investigators who successfully compete in the peer review process have traditionally had little incentive to share hard-earned, highly prized data sets too freely. In TOGA, on the other hand, real-time and near-real-time oceanographic data streams are voluminous and continue to grow. In parallel with the evolution of the TOGA ocean-observing array, numerical models and ocean data assimilation techniques suitable for climate studies have also undergone rapid development. It is now generally recognized that long-term maintenance of the TOGA ocean-observing array should become the responsibility of operational agencies, since the justifications for large-scale measurements are cast increasingly in terms of initializing and verifying operational ocean models for climate prediction. Accordingly, NOAA's National Ocean Service has become an ever more prominent source of support for TOGA observations, as in the case of the tide-gauge sea level network and the Ship of Opportunity Program/Expendable Bathythermograph Program.
Additional Interdisciplinary Programs
Interdisciplinary oceanographic research will be increasingly common, as exemplified by programs such as the Global Ecosystems Dynamics Experiment (GLOBEC) and the Joint Global Ocean Flux Study (JGOFS). GLOBEC is designed to evaluate how changes in global climate and related physical processes influence the ability of individual animals to feed, grow, reproduce, and survive in the sea. JGOFS focuses on the climatic implications of time-varying fluxes of greenhouse gases (e.g., carbon dioxide) as related to physical forcing, bringing together at sites near Bermuda and Hawaii scientists who examine biological, geochemical, and physical processes on time scales ranging from months to years.
Successfully realizing the goals of such programs will almost certainly require a concerted effort to ensure that the electronic infrastructure supports and sustains collaboration across the disciplines, whose tools and data types can differ greatly. In addition, the diversity of the data (Figure 2.4) collected in such interdisciplinary programs will itself present a challenge to collaboration. Some physical data are digitized in the instruments and made available soon after collection via satellite telemetry, whereas some biological data, such as population statistics, may not be available for exchange until months after completion of the cruises in which the sampling was done. It will be a challenge to establish a database with the life span and flexibility needed to serve all participants in the large interdisciplinary programs under discussion, which potentially could generate 25 Gbytes of data per year.
USING COLLABORATORY COMPONENTS TO FACILITATE RESEARCH
The oceanographic research community is widely distributed geographically, comprises academic, government, and private-sector scientists, and has many subdisciplines. Researchers in field programs commonly collaborate; single-investigator experiments are now rare. Increasingly, the high cost of field work has led to large, cooperative experiments in which investigators pool instrumentation and share time on ships. Thus investigators at diverse locations need ways to plan cooperatively, write coordinated proposals, communicate between different platforms in the field, and work together to analyze and publish their results.
Improving Access to Colleagues
Increasingly, the problems of current interest are either interdisciplinary or global, or both. As a consequence, research programs not only cross the lines of the traditional subdisciplines of oceanography (physical, biological, chemical, and geological) but also require collaboration among international scientists. The need for ongoing dialogue with colleagues exists as well between theoreticians, numerical modelers, and laboratory modelers. Traditionally, much of the required interaction has been done at meetings, requiring participants to travel to a common location. Now, however, at the same time that more widespread collaboration is needed, travel funds and the time scientists have available to travel are both more difficult to come by. Ways to facilitate collaboration across disciplines and at a distance are needed.
BOX 2.2 BRIEF HISTORY AND DESCRIPTION OF OMNET INC.'S SCIENCENET
In 1979, in response to the need of participants in large research programs for a communications alternative superior to telex, real-time telephone communication, or voluminous photocopying, pilot electronic mail networks were set up for three oceanographic research programs and managed from the Massachusetts Institute of Technology. Omnet Inc., started in 1980, provides the on-line service SCIENCEnet. The first group to use SCIENCEnet was the NSF-funded Pacific Equatorial Ocean Dynamics Experiment (PEQUOD), which involved 40 program participants from the United States, Australia, and Canada. Over the next decade, the growing network included primarily the international earth sciences community.
Simple electronic mail provided program groups a means for communicating without working together in real time. Meetings were scheduled, agendas modified and clarified. Over time, SCIENCEnet expanded beyond simple communications, and participants found new ways to use the simple bulletin board structure. Notices were posted on subjects ranging from calls for papers and meeting announcements to job postings. De facto conferences sprang up. "ENSO.Info" (El Niño-Southern Oscillation Information), for instance, is a worldwide discussion of the likelihood of El Niño events occurring in the southern Pacific, the accuracy of various models, and related topics. Bulletin boards have been used to locate lost deep-sea research buoys. The "Gulf.Mex" board became a repository of maps of oceanographic data from the Gulf of Mexico, using an ASCII format devised by one of the participants. Joint documents were created. Schedules and calendars were shared.
SCIENCEnet has a library of custom electronic message forms that allow participants to submit annual reports to funding agencies such as the Office of Naval Research and the National Science Foundation, or to register for meetings. Under the pressure of individual, programmatic, and agency needs, SCIENCEnet evolved into an early, effective collaboratory working environment.
The early need to connect to the network from research vessels at sea was first accomplished in the early 1980s via a voice channel on a communications satellite, the Applications Technology Satellite (ATS), used by the ocean research community. As seagoing researchers became more dependent on such access and needed to move outside the footprint of the ATS satellite, much of the ship-to-shore traffic moved to a commercial satellite system, Inmarsat. SCIENCEnet now has participants in 50 countries and has connected researchers in remote parts of the world. For example, it currently provides communications for Antarctic research stations and a remote ice-coring party on the Greenland ice sheet. In 1989, SCIENCEnet provided a link for a climbing party on Mt. Everest to receive customized weather forecasts from the University of Pennsylvania.
SCIENCEnet does not provide computational resources, but users have not demanded that it do so, owing to the availability of cheap local computational power on individuals' desktops at one end of the computing spectrum, and access to national supercomputing centers at the other. (It is clear, though, that program source code has been a nonnegligible part of the message traffic.) Plans for the future include a closer integration of SCIENCEnet with the Internet, from which Omnet Inc. currently provides gateway access to SCIENCEnet, and, with Joint Oceanographic Institutions Inc., the development of SeaNet, an extension of the Internet to ships at sea. Also, the capabilities of the network are being expanded to provide for multiple-author document preparation, a simple conferencing system, better directory services, and an improved database capability.
Electronic communication through SCIENCEnet is one example of how the oceanographic community has attempted to establish links and infrastructure that permit collaboration (Box 2.2). Run by Omnet Inc., SCIENCEnet provides a user-friendly environment, user support, and customized communication for its paying customers and is an important part of the communications infrastructure
for oceanography. However, there continues to be a pressing need for improved infrastructure within the oceanographic community.
Improvements to the communications infrastructure that boost connectivity among researchers and between researchers and remote platforms would have clear benefits. For example, improved voice and computer access during data-gathering cruises would allow more technical support to be carried out by land-based personnel. A project under development by Omnet Inc. and Joint Oceanographic Institutions Inc. is intended to extend the Internet to ships, buoys, aircraft, and other remote platforms that exist in the ocean environment. Called SeaNet, the project seeks to provide low-cost/high-bandwidth data transmission capabilities to oceanographers. Although it is still in the earliest stages of development, this project, should it succeed, promises to have a significant positive impact on the oceanographic community.
Collaboration tools that initially help modelers to work together, such as electronic mail, video links, and methods for exchange and common display of data, would be valuable, as would tools that build intuition about a process being modeled. Such tools would also have value beyond the modeling community. Modeling can guide the planning of field work, and tools that simplify access to and understanding of model results would make it easier for field experimentalists to gain from and interact with modeling efforts. In the future, parallel processing techniques, three-dimensional modeling and visualization methods, including movies and other animation techniques, and electronic publishing are anticipated.
One positive aspect of the developing electronic communications infrastructure and potential collaboratories for oceanography is the overt recognition of the need for computer and information specialists within oceanography and the establishment of exchanges between such specialists and the larger computer science and information community in the United States. This may particularly aid the developing oceanographic modeling community.
To make progress on the "grand questions" of oceanography, oceanographers, most of whom have been trained in a traditional, narrow subset of the field, must strive to see the ocean as a system and not just as being of isolated physical, biological, or chemical interest. New ways of pooling and applying the diverse skills of scientists whose formal training may vary considerably—in effect, of developing efficient collaborations—are thus needed. One approach is to bring faculty, graduate students, and principal investigators together to explore research issues in an extended workshop (Box 2.3). The components of a collaboratory workshop include:
An interdisciplinary group of interested scientists (the core faculty) to teach the fundamental science of a particular project;
Participants consisting of a combination of advanced graduate students and principal investigators;
A period of 4 to 6 weeks in which to conduct the workshop;
Funding by an interagency group whose mission is furthered by the workshop; and
A computer network that provides access to databases and numerical models relevant to the scientific project.
Ideally, participation in such an intensive period of study stimulates graduate students to think about particular problems in a "systems" mode, establishes early in a scientific career lasting student interactions that cross the traditional subdisciplines of oceanography, and encourages principal investigators from diverse subdisciplines to collaborate immediately to share data, write papers, and propose new science.
BOX 2.3 OCEANOGRAPHIC MODELING—AN EDUCATIONAL WORKSHOP
The paradigm of a collaboratory workshop has been used by Lewis Rothstein of the University of Rhode Island, who held an educational workshop in June 1991 for participants in the Joint Global Ocean Flux Study (JGOFS). The focus of the workshop was the physics of the equatorial ocean; the goal was to define the fundamental physical processes critical to understanding biogeochemical cycling in the region. A group of 4 resident faculty, 8 guest faculty, 33 graduate students, and a number of principal investigators participated at the University of Rhode Island's Graduate School of Oceanography. A computer network established for the workshop consisted locally of 15 Unix graphics workstations, two color printers, video recording equipment, and a variety of peripheral devices networked to the CRAY Y-MP at the National Center for Atmospheric Research. Each student was allowed 10 hours of CRAY time for numerical experimentation, under the guidance of the faculty. Physical models of the equatorial circulation, as well as models of ecosystems, were prepared to provide students with a basis for asking "what if" questions of the "system": e.g., How would the ecosystem respond to a reduction in equatorial wind stress? The students were able to produce movies of the resulting simulations, which provided rapid graphical feedback to enhance the educational process. The principal investigators used the same tools to extend previous results, collaborate on new papers, and define potential future projects. Thus, the workshop furthered the short-term goals of the JGOFS program as well as the long-term educational goals of the students and faculty. Such an extended workshop is an attractive model for future efforts to provoke new ways of thinking about ocean system science.
Improving Access to Data
Oceanographic data are diverse in nature because of varied sampling platforms, the interdisciplinary problems considered, and the variety of required sampling techniques (e.g., optical, acoustical, and physical). Data take forms ranging from point time series to global spatial maps. Near-real-time data are needed for several purposes, including linking in situ and satellite data sets, conditional sampling, and modeling. More generally, models require satellite and in situ data sets for verification of both initial and boundary conditions. Often, however, one research program produces a mix of telemetered, near-real-time, and delayed data, with some data available soon after the end of the program and other data available only after several additional years of processing and analysis. Thus, access to and synthesis of data remain problematic for many oceanographers. Reliable, affordable data transmission from sea to land and policies that establish incentives and protection for sharing data are crucial components of an improved communications infrastructure for oceanography.
Particularly needed is development of a unified system for the collection, processing, and distribution of interdisciplinary, in situ data collected from ships, mooring buoys, drifters, and, in the future, autonomous underwater vehicles. The present Argos satellite data communication system (see Briscoe and Frye (1987) for a summary of current oceanographic telemetry methods) is adequate for some purposes, but not (because of limited bandwidth) for handling data from many of the emerging interdisciplinary instrumentation platforms, which collect several channels of data at high sampling rates. Other systems such as Inmarsat are prohibitively expensive for researchers.
A collaboratory facilitated by improved data communication capabilities and enabling the rapid exchange of in situ and satellite data at modest cost would help to meet one of the basic needs of oceanographic researchers. Ideally, such a system would have the capability to modify sampling rates in response to changing environmental conditions. For data from many coastal environments, transmission by cellular radio methods will suffice (e.g., Dickey et al., 1992). However, for data from the open ocean, satellite methods will most likely be needed.2
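The capability to modify sampling rates in response to changing environmental conditions can be illustrated with a simple control rule. The sketch below is hypothetical: the function name, thresholds, and intervals are illustrative choices, not values from any deployed telemetry system. The idea is simply to sample faster when the measured variable begins changing rapidly.

```python
def choose_sampling_interval(recent_values, base_interval_s=3600.0,
                             fast_interval_s=600.0, rel_change_threshold=0.2):
    """Return the next sampling interval in seconds: switch to a faster
    rate when the measured variable is changing quickly, as during an
    event of scientific interest; otherwise conserve telemetry bandwidth."""
    if len(recent_values) < 2:
        return base_interval_s
    previous, latest = recent_values[-2], recent_values[-1]
    # Relative change between the two most recent samples.
    denom = abs(previous) if previous != 0 else 1.0
    change = abs(latest - previous) / denom
    return fast_interval_s if change > rel_change_threshold else base_interval_s
```

In practice such a rule would run on the platform itself, so that only two-way communication for occasional threshold changes, rather than continuous commands, would be needed.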
Cost-effective data telemetry from uncrewed platforms at sea would give oceanographers wider access to the ocean, the majority of which falls outside the commercial sea lanes. Two-way communication with remote platforms would permit conditional sampling and remote maintenance, further enhancing the value of the platforms.
Real-time access to all data as they are collected in the ocean has the potential to revolutionize field work. Data analysis and publication would no longer wait for the recovery of the instruments. Model testing and development could be carried out in parallel with the field work, perhaps guiding revisions to sampling strategies, rather than waiting for the release of data that usually postdates instrument recovery by one or more years. The instruments, if still working, would not need to be recovered, except, perhaps, for calibration or refurbishment. In some cases, it might be more economical to leave them in place rather than to field a recovery cruise. The design of the instruments themselves would also change, with on-board data storage hardware no longer needed or used only as a backup.
Providing Tools for Collaboration in Oceanography
The research tools of the oceanographic community—such as ships, satellites, moorings, drifters, supercomputers, and specialized modeling software—are often expensive, specialized, and inaccessible (often they are one of a kind). These resources could be put to better use if the tools and/or the results of their use were more widely accessible via electronic networks.
The needs of the oceanographic research community for supportive infrastructure can be described in terms of a number of tools that would better support collaboration and further improve access to data. The integration of those tools into a collaboratory could be particularly helpful to oceanographers.
TOGA Data Catalog
TOGA data sets are routinely distributed to the oceanographic and meteorological communities by magnetic tape, CD-ROM, electronic mail, dial-up databases, tabulations in monthly bulletins, and other media. Dissemination of these data has been encouraged by the U.S. and international TOGA project offices and represents the collective effort of individuals involved in field programs, specialized TOGA data centers, and national oceanographic data centers. Although current means of data distribution are adequate for many purposes, a centralized system for interactive, on-line access to information on essential TOGA data sets would greatly enhance collaborative work in short-term climate studies. Such a tool might be called a TOGA data catalog.
A TOGA data catalog could run on an X-windows workstation with a user-friendly point-and-click interface to allow interactive browsing of available data sets. Users would be linked by the Internet to the specialized TOGA data centers and to other providers of relevant data, so that the database could be updated regularly (every day to every month, depending on data type). Data would be classified by geographic location, depth range, time of collection, geophysical variable (e.g., ocean temperature), sensor platform (e.g., mooring, drifter), quality control and degree of processing (e.g., real-time vs. delayed mode), and by cruise (for shipboard data). Analyzed fields from numerical ocean climate models such as that used at the U.S. National Meteorological Center (NMC) would be accessible, as would be the analyses distributed in hard-copy form on the pages of NOAA's Climate Diagnostics Bulletins.
Information on data sets would be displayed graphically where appropriate (e.g., on a geographical mock-up of the globe, for information on position). Each data summary would be annotated with information on the data sources and would provide instructions on how to access the data via the Internet. Display capabilities built into the data catalog would allow visualization of the data sets on a user's terminal. Displays would be in the form of time series, vertical sections, horizontal fields, movies, and so on. Users would be encouraged to forward comments on their experiences in working with particular data sets, and these comments would be incorporated into the annotated database.
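The classification scheme described above—by variable, platform, processing mode, and so on—amounts to filtering catalog entries along several independent axes. The sketch below is illustrative only; the record fields and example values are hypothetical stand-ins for whatever a real TOGA catalog would define.

```python
from dataclasses import dataclass

@dataclass
class CatalogEntry:
    """One catalog record, classified along the axes named in the text."""
    variable: str   # geophysical variable, e.g., "ocean temperature"
    platform: str   # sensor platform, e.g., "mooring", "drifter"
    lat: float
    lon: float
    year: int
    mode: str       # degree of processing: "real-time" or "delayed"

def find_entries(entries, variable=None, platform=None, mode=None):
    """Filter catalog entries on any combination of classification axes;
    axes left as None are unconstrained."""
    result = []
    for e in entries:
        if variable is not None and e.variable != variable:
            continue
        if platform is not None and e.platform != platform:
            continue
        if mode is not None and e.mode != mode:
            continue
        result.append(e)
    return result
```

A point-and-click interface would generate such queries from user selections and then hand the matching entries to the display machinery.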
A prototype version of such a data visualization and analysis system, under development at NOAA's Pacific Marine Environmental Laboratory, currently focuses on data from the TOGA Tropical Atmosphere and Ocean (TAO) moored array and NMC operational model analyses. The TOGA TAO workstation concept is expandable in principle to encompass a much broader spectrum of data sets relevant to climate studies.
Globe Data Catalog
Data collected in oceanographic research are often difficult to find and/or to access. Several oceanographic data catalogs exist now, but each is based on its own paradigm. A globe data catalog—a system that could be accessed easily with a workstation featuring advanced three-dimensional graphics—would provide all oceanographers a common means to access various data types and their locations and periods of collection (Box 2.4). One goal of such a system would be to facilitate collaboration by simplifying access to data and providing for all oceanographers the kind of shared experience that fosters understanding and trust. The challenge is to design an earth sciences data catalog paradigm that all oceanographers would refer to when discussing their data with one another.
A system of the kind described in Box 2.4 avoids the problem of data format completely. Instead, the focus is on the definition of supported views. Supported views might include a standard form submitted to a data archive describing the data set, a Tektronix plot file, a subset of the data collected, or a message to connect to some person's computer on the network and run a program, which in turn would generate an X-windows session on a user's workstation.

BOX 2.4 GLOBE DATA CATALOG

Imagine a three-dimensional representation of the globe displayed on the scientific workstation sitting on your desk. (If you want to be imaginative, consider it as a hologram spinning slowly in a large room!) Imagine further that any oceanographer in the world can see the same representation simply by typing a single command at his or her computer.

Spotted over the globe is a scattering of dots, icons, and square grids representing the locations where various types of oceanographic data have been collected. These "data catalog objects" can be organized into layers representing different types of data. Multiple layers can be shown at the same time by displaying each in a different color. Shown are standard layers defined by oceanographers, ecologists, biologists, and others. Individuals can define their own layers if they wish.

Across the top of the workstation screen are the date and the time. A user can scroll forward and backward in time. Scrolling 100 years or so backward might reveal icons on the globe that identify where to find the ship logs kept for long-ago cruises. Scrolling forward in time might display locations for future cruises. For the time between, a catalog of oceanographic information is represented.

Each data catalog object contains descriptions of one or more "data catalog items" representing various data sets. The data set itself is not stored in the system. Instead, views of the data are stored. A view is someone's interpretation of the data. The first view produced for a data item might simply be a reference indicating how the data set was collected, who collected it, and where it is now held. Another view might be a Tektronix 4010 plot of the data. A fancier view might be an X-windows movie visualizing a time series of the data. In some cases the algorithm used to generate a view might be included as part of the view.

Data catalog objects are selected with a mouse, either by pointing at an object and clicking on it or by dragging the mouse through many dimensions—latitude, longitude, depth, time, and multiple type-layers. Selecting objects on the globe generates a menu of the related available views, along with information on who published each view. As mentioned above, layers might be defined by data type, discipline, author, instrument type, or some other category. Users of the system would provide their own profiles defining how the layers would look for each individual.
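The object, item, and view relationship sketched in Box 2.4 can be made concrete with a few minimal data structures. Everything below is a hypothetical illustration, assuming only what the box describes: objects carry a location and a layer, items represent data sets, and views (not the data themselves) are what the system stores and serves.

```python
from dataclasses import dataclass, field

@dataclass
class View:
    """Someone's published interpretation of a data set."""
    author: str
    description: str  # e.g., a collection reference or a plot file

@dataclass
class DataCatalogItem:
    """A data set known to the catalog; holds views, never the raw data."""
    name: str
    views: list = field(default_factory=list)

@dataclass
class DataCatalogObject:
    """A dot or icon on the globe, assigned to a type-layer."""
    lat: float
    lon: float
    layer: str  # e.g., "physical", "biological"
    items: list = field(default_factory=list)

def views_in_layer(objects, layer):
    """Selecting the objects in a layer yields the available views, with
    authorship, mimicking the menu generated by clicking on the globe."""
    menu = []
    for obj in objects:
        if obj.layer != layer:
            continue
        for item in obj.items:
            for view in item.views:
                menu.append((item.name, view.author, view.description))
    return menu
```

Layers defined by discipline, author, or instrument type would simply be alternative values of the `layer` attribute, with a user's profile selecting which to display.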
Note that security is an important part of such a system. In fact, in some cases, attempts to access a view would produce the message "Access denied; please call Mr. Jones for further details." Groups of collaborators would be assigned access codes so that they could access each other's views. The details of how to provide reliable security for such a system would be an interesting area to pursue.
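The group access codes mentioned above suggest a simple access rule: a view is visible if it is public or if the requesting user shares a group with the view's collaborators. The sketch below is a deliberately minimal illustration with hypothetical field names; a real system would need far more, as the text notes.

```python
def can_access(view_acl, user_groups):
    """A view is visible if it is marked public or if the user shares at
    least one group (access code) with the view's collaborator list."""
    if view_acl.get("public", False):
        return True
    allowed = set(view_acl.get("groups", []))
    return bool(allowed & set(user_groups))
```

A denied request would then trigger the kind of "please call Mr. Jones" message described above, directing the user to the data owner.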
If any oceanographer on a network can bring up a globe data catalog on his or her screen, then the catalog resembles a publication. In fact, it provides an opportunity for an individual to publish his or her data and to get credit for the task of collecting the data. A view would thus be another type of authored publication, with an appropriate amount of credit attached to it. Perhaps the system could employ hypertext capabilities such that links back to another person's data view would earn credit as well.
Data set editors would be assigned to data catalog objects. Each would evaluate the views submitted for the data and assure the quality of each. In a sense this person would have the same role as a journal editor.
Although the technological aspects of such a tool are fun to consider, an important contribution, in terms of increasing collaboration, would be the development of a tool that provides a common experience for all oceanographers to work from.
Tour Tool

Many useful resources are already available on the Internet. The problem is knowing whether they are useful for one's own research. Needed is an easy-to-use tool that would allow researchers to generate a tour or "sample session" in the use of applications available on the Internet. Such a tour would be similar to the tours Claris Corp., a subsidiary of Apple Computer Inc., uses to introduce products to new users.
A tour tool would enable a user to access a tour available on a node on the network and then sit back and simply watch two or three sample sessions using the network resource of interest. Tours would require different levels of user equipment and software and would have to be advertised as such. Tours might include, for example, (1) a sample anonymous session on a Unix workstation, (2) accessing an oceanographic database service such as the Ocean Network Information Center (OCEANIC) maintained by the University of Delaware, (3) sample library searches on various library systems, and (4) an X-windows demonstration using the Lamont-Doherty View-server.
Those interested in creating a tour would need a session-recorder or movie maker (in fact, session recorders for several platforms) and perhaps a tour-file format to generate tours or sample sessions of the resources they provide on the Internet. Such a tool might provide the capabilities to easily annotate a tour with text, annotate with voice or music, record mouse movements where appropriate, or give users a chance to try doing something simple, interactively.
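A tour file of the kind suggested above might simply be a sequence of timestamped events: text or voice annotations, mouse movements, and interactive prompts. The sketch below assumes a hypothetical file format (a list of event records serialized as text); the event kinds and function names are illustrative, not an actual tour-file standard.

```python
import json

def record_event(tour, seconds, kind, payload):
    """Append a timestamped event (e.g., a text annotation, a voice clip
    reference, or a mouse movement) to an in-memory tour."""
    tour.append({"t": seconds, "kind": kind, "payload": payload})
    return tour

def save_tour(tour, path):
    # A tour file is just the event list in a portable text serialization,
    # so session recorders on several platforms could produce it.
    with open(path, "w") as f:
        json.dump(tour, f)

def load_tour(path):
    """Read a tour back for playback on the viewer's workstation."""
    with open(path) as f:
        return json.load(f)
```

Playback software would then replay events in timestamp order, pausing where a tour invites the user to try something simple interactively.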
Cruise Planning Tool
Oceanographers often collaborate through joint participation in research cruises that bring together scientists from diverse disciplinary areas and from many countries. Logistical and scientific planning for cruises could be facilitated by a tool with a variety of functions, including (1) a capability for determining space availability and utility; (2) a calendar for ship scheduling; (3) sampling regimens; (4) forms, including shipping manifests, crew medical and personal information forms, and State Department forms for clearance to work in foreign waters; (5) sample documents showing a planned cruise track; and (6) an agreement indicating the breakdown of ship time among the different investigators on board. The concepts underlying such a cruise planning tool could be applied to tools for use with distributed oceanographic facilities, including ships, moorings, drifters, field laboratory facilities, and others.
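The ship-scheduling calendar, to take one of the functions above, would at minimum need to flag overlapping requests for ship time. The sketch below is illustrative only; the investigator names and dates are hypothetical.

```python
from datetime import date  # used for the example requests below

def conflicts(requests):
    """Given (investigator, start, end) ship-time requests, return the
    pairs whose date ranges overlap -- the conflicts a scheduling
    calendar would surface for negotiation among investigators."""
    overlapping = []
    for i in range(len(requests)):
        for j in range(i + 1, len(requests)):
            name_a, start_a, end_a = requests[i]
            name_b, start_b, end_b = requests[j]
            # Two closed date ranges overlap if each starts before the
            # other ends.
            if start_a <= end_b and start_b <= end_a:
                overlapping.append((name_a, name_b))
    return overlapping
```

The same overlap test would apply to other shared facilities, such as moorings or field laboratory space.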
Ontology Tool

As used during discussions at the CSTB workshop, the term "ontology" connotes a set of perceptions, terminology, procedures, and perhaps myths that are common to a group of people. Oceanographers within subdisciplines (e.g., physical oceanography) have particular ontologies, just as oceanographers in general have ontologies that may differ from those of their mother disciplines (e.g., physics). To collaborate on interdisciplinary problems, oceanographers need to develop an ability to communicate using a common set of terms. An ontology tool could be developed using network and computer resources to facilitate the introduction of an individual to the unfamiliar ontology of collaborating oceanographers. To be effective, such an ontology tool might have to include videos and face-to-face experiences that would convey visual effects and other subtleties.
ATTRIBUTES OF A USEFUL COLLABORATORY FOR OCEANOGRAPHY
A collaboratory for oceanographic research will require a variety of information resources and services that support a broad spectrum of interactions ranging from informal communication to active collaboration, possibly on interdisciplinary research, and including "passive" collaboration via controlled databases. Such a system would consist of a suite of tools whose usefulness will depend critically on the degree to which several criteria are satisfied: interoperability, transparency, customizability, integrity, and extensibility. Although these criteria are desirable even for the less formal modes of interaction, they are essential for the more formal ones.
It is imperative that the various components of oceanographic research—experiments, data, models, graphical and tabular interpretations, textual summaries, documentation, and publications—be thoroughly integrated and interoperable. This requirement is only partially satisfied by the imposition of standard data formats and interfaces. Portability of computational models, for example, is becoming an increasingly important issue as current parallel processing practice requires that programs be customized for each type, brand, and generation of machine. Portability, an active area of computer science research that is not simply a matter of standards, is highly relevant to the notion of a collaboratory if models are to be shared.
Data and the tools for their interpretation pose a number of research issues that also go beyond mere standards. Researchers may wish to view or process data in ways that the initial investigator did not imagine. The organization of scientific databases to allow flexible access to multidimensional, temporal data is also a research area that underpins the development of a collaboratory. This need is particularly acute in interdisciplinary work involving biological and chemical as well as physical data possibly collected over a range of time and space scales.
Effective and proper use and interpretation of data submitted to a collaboratory will require adequate documentation of the experiment in which they were collected. While these "metadata" may customarily be provided as informal annotations, ideally they, too, should be formalized to permit modeling and analysis software to assess possible inappropriate use of the associated data.
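As a minimal illustration of such formalization, a required-fields check could let analysis software warn before inadequately documented data are used. The field names below are hypothetical; a real metadata standard would be far richer.

```python
# Hypothetical set of metadata fields a collaboratory might require
# before a submitted data set may be used in analysis.
REQUIRED_METADATA = ("instrument", "calibration_date", "sampling_interval_s",
                     "location", "investigator")

def missing_metadata(record):
    """Return the required metadata fields a submitted data set still
    lacks, so that modeling and analysis software can flag possibly
    inappropriate use before the data are consumed."""
    return [k for k in REQUIRED_METADATA if not record.get(k)]
```

More elaborate checks (e.g., comparing a model's required time resolution against `sampling_interval_s`) would follow the same pattern of machine-readable metadata.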
Related to interoperability, the second major criterion of a useful collaboratory is transparency. The collaboratory will likely be implemented as a physically distributed system, with individual institutions or researchers perhaps maintaining "ownership" of their own contributions and providing network access to others. Ideally, the distributed nature of the information will be transparent to users, obviating the need for explicit connections to remote machines and explicit transfers. Access time can be reduced through general mechanisms such as fast networks, caching, and replication. It has been argued by some in the oceanography community that there is no compelling need to provide on-line access to raw data, but rather only to summaries, analyses, and interpretations. The argument is that the data may always be obtained informally, if necessary, for further investigations, but that maintaining general access to all data would overwhelm the system. In a transparent system, however, the data could remain accessible without burdening the system or the users.
A properly designed and successful collaboratory must also recognize the needs of its users, both as individuals and as identifiable subgroups. Users may wish to have customized interfaces that allow interaction on familiar terms. Multiple degrees of collaboration, including single users, enumerated groups, and classes of users, should also be supported to protect information not ready for full disclosure. User-imposed restrictions on the scope of information of interest could potentially be used by the system to optimize access in a distributed environment.
Integrity and Extensibility
Integrity refers not only to maintaining system and data security across the various partitioned layers of a system, but also to assuring the validity of information submitted to the system and thus of subsequent research based on that information. Although protection of integrity depends on policies and procedures at least as much as on technology, features such as formalization of metadata and documentation of models can provide valuable support for ensuring integrity.
Finally, a national collaboratory must be extensible to allow for the consistent incorporation of new services and features. Ensuring extensibility requires that considerable forethought be given to all representation and interface standards so as not to preclude or inhibit any foreseeable extension of the system.