We are the first generation to have the tools to study the Earth as a system. During the last few decades of the 20th century, the development of an array of technologies has made it possible to observe the Earth, collect large quantities of data related to components and processes of the Earth system, and store, analyze, and retrieve these data at will. These data can be registered to specific locations on the Earth's surface and can be integrated into spatial-temporal information systems and registered at the same scale and cartographic projection as other resource data.
Another important technological advance, the Internet, has had a major impact on the nature of all scientific research. This electronic network links computers located in universities, government agencies and laboratories, as well as many commercial enterprises. The network, initially developed and partially supported by the federal government, allows rapid communication among scientists. Many groups have set up Wide Area Information Servers (WAIS), which allow users at other nodes in the network to retrieve data. These developments have encouraged researchers to retrieve and combine data sets in ways not previously attempted.
Scientists can now perform environmental research that increases our understanding of the Earth system at all spatial scales, enhances resource management and environmental decision making, and improves our capabilities for predicting significant changes in the environment. Over the past decade, in particular, these observational, computational, and communications technologies have enabled the scientific community to undertake a broad range of interdisciplinary environmental research and assessment programs. At the international level, two of the most ambitious programs are the International Geosphere-Biosphere Program (IGBP) of the International Council of Scientific Unions (ICSU), and the World Climate Research Program, jointly sponsored by the World Meteorological Organization and ICSU (NRC, 1990). At the national level, these international research initiatives are supported through the federal inter-agency Global Change Research Program (NSTC, 1994).
Global change research, by its nature and scope, is inherently complex. On the technical side, complexity increases with the number of different variables that are modeled, measured, or experimentally manipulated. These variables may interact with each other to a high degree, and the interactions can include nonlinearities or discontinuities in space or time. The sheer quantity of data at large spatial and temporal scales adds further complexity. Comparable complexity can arise on the organizational side of research, in how the work is structured, managed, and implemented, because of the sizable number of investigators and participants across a range of disciplines.
The Global Change Research Program and other large research initiatives involve the interfacing of large volumes of diverse data, commonly combining several traditionally distinct disciplines, such as meteorology, oceanography, geology, biology, chemistry, and geography, or their related subdisciplines. "Data interfacing" may be defined as the coordination, combination, or integration of data for the purpose of modeling, correlation, pattern analysis, hypothesis testing, and field investigations at various scales. Because data from each discipline and subdiscipline are organized into data sets and databases that frequently possess unique or special attributes, their effective interfacing can be difficult.
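The definition above can be made concrete with a small sketch. The example below is purely illustrative and is not drawn from any of the case studies: the site labels, the monthly rainfall and vegetation index values, and the function names `interface` and `pearson` are all hypothetical. It assumes that two data sets from different disciplines have already been keyed to the same site and time period, combines them on those shared keys, and computes a simple correlation.

```python
from math import sqrt

# Hypothetical records from two disciplines, keyed by (site, month).
# All values are invented for illustration only.
rainfall_mm = {
    ("A", 1): 12.0, ("A", 2): 30.0,
    ("B", 1): 5.0, ("B", 2): 18.0,
    ("C", 1): 40.0,                    # no matching vegetation record
}
vegetation_index = {
    ("A", 1): 0.21, ("A", 2): 0.35,
    ("B", 1): 0.15, ("B", 2): 0.28,
    ("D", 1): 0.50,                    # no matching rainfall record
}


def interface(left, right):
    """Combine two data sets on their shared keys, dropping unmatched records."""
    shared = sorted(left.keys() & right.keys())
    return [(key, left[key], right[key]) for key in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


combined = interface(rainfall_mm, vegetation_index)
r = pearson([row[1] for row in combined], [row[2] for row in combined])
print(f"{len(combined)} matched records, correlation = {r:.3f}")
```

Even in this toy case, records that lack a counterpart in the other data set (sites C and D) are silently dropped, which hints at why the special attributes of each discipline's data can make interfacing difficult in practice.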
Sound practices in database management are required to deal effectively with problems of complexity in global change studies and other large interdisciplinary research and assessment projects. Although a great deal of attention and resources has been devoted to this type of research in recent years, little guidance has been provided on overcoming the barriers frequently encountered in the interfacing of disparate data sets. And although there is a wealth of relevant experience at the working level in the research community, this experience generally has not been analyzed and organized to make it more readily available to researchers.
Because of the increasing importance of conducting interdisciplinary environmental research and assessments, both nationally and internationally, the Committee for a Pilot Study on Database Interfaces was charged to review and advise on data interfacing activities in that context. This report is the result of that study. It does not address in detail the mathematical and statistical aspects associated with data interfacing activities, which were the topic of a recent NRC report (see NRC, 1992). Nor does it address the issue of technical barriers in the electronic storage and distribution of interdisciplinary environmental data. Rather, the focus is on developing analytical and functional guidelines to help researchers and technicians engaged in interdisciplinary research, particularly those projects that involve both geophysical and ecological issues, to better plan and implement their supporting data management activities. It also is aimed at informing those individuals responsible for funding, managing, or evaluating such studies and activities.
METHODOLOGY OF THE STUDY
Early in its deliberations the committee decided to take a case study approach. The objective was to obtain some well-documented examples of successful and unsuccessful data interfacing techniques and to learn from the triumphs and failures of those who had actually conducted complex interdisciplinary environmental studies. One drawback to this approach was that there still are not many completed studies that have focused on global environmental change. Another limitation was that none of the case studies used the Internet as a key technological component. Nevertheless, the committee believes that the rapidly increasing use of the Internet in many research projects will accentuate the data management issues examined in this report.
Because the committee wanted to select the maximum number of diverse case studies feasible to examine in the time available, it identified 11 potential cases for consideration. It was particularly interested in complex, interdisciplinary studies in which a combination of physical, chemical, and biological measurements were taken and then integrated into composite data sets from which various conclusions could be drawn. The focus was on evaluating the data management activities in each case study rather than the research itself.
The committee used a modified Delphi technique (a method of quickly quantifying and weighing diverse opinions) to rank the candidate case studies. The following criteria were used in deciding which case studies to select:
If possible, the studies should involve global change research or assessment.
The research should be interdisciplinary.
There should be reasonable access to the results of the studies and to their designers.
The studies should involve some attempt at integrating multimedia data sets from both the geophysical and the biological sciences.
The studies should be spread over a variety of different scales of activities and operations, from international to local, and from large-scale to small-scale.
The studies should cover a wide range of environmental issues.
The studies should either involve completed research projects or projects that have been under way long enough to have developed and used complex data management systems.
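The general idea of ranking candidates from pooled opinions can be sketched as follows. This is an illustrative sketch only: the report does not describe the committee's actual scoring procedure, so the candidate names, the reviewer scores, the use of the median, and the `revise_toward_consensus` feedback step are all assumptions. Delphi-style exercises typically repeat rounds of scoring with feedback in between, and the second function imitates one such round in a very simplified way.

```python
from statistics import median

# Hypothetical scores (1 = weak fit to the criteria, 5 = strong fit)
# assigned by four reviewers to three candidate case studies.
scores = {
    "Study A": [5, 4, 5, 4],
    "Study B": [2, 3, 2, 5],
    "Study C": [4, 4, 3, 4],
}


def rank_candidates(scores_by_candidate):
    """Rank candidates by median score, highest first; ties fall back to name order."""
    medians = {name: median(vals) for name, vals in scores_by_candidate.items()}
    return sorted(medians, key=lambda name: (-medians[name], name))


def revise_toward_consensus(vals, weight=0.5):
    """Simulate one feedback round: each reviewer moves part-way toward the group median."""
    group_median = median(vals)
    return [v + weight * (group_median - v) for v in vals]


ranking = rank_candidates(scores)
second_round = {name: revise_toward_consensus(vals) for name, vals in scores.items()}
print(ranking)
print(rank_candidates(second_round))
```

The median is used here because it is less sensitive than the mean to a single outlying opinion, such as the lone high score for the hypothetical Study B.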
On the basis of these criteria the committee selected the following six case studies for detailed investigation:
Impact Assessment Project for Drought Early Warning in the Sahel. This project, conducted from 1979 to 1986, was designed to detect and monitor drought and to use modeling to assess the crop conditions and yield potential in the countries within Africa's Sahel and Horn regions. It was led by the U.S. National Oceanic and Atmospheric Administration (NOAA) in support of a U.S. Agency for International Development (USAID) initiative. The study area extended over millions of hectares of largely inaccessible arid and semiarid land. Some of the many types of data that were interfaced in this project included remote sensing data from several different spacecraft, ground-based point data of varying reliability, a vegetation index proxy for crop growth and yield, interpolated soil properties, and integration of the data by use of a model. Of the six cases that the committee examined, this was the only international study. It involved coordinating data from 10 developing countries, which generally lacked the technology and resources to adequately support the NOAA/USAID efforts.
The National Acid Precipitation Assessment Program (NAPAP). This comprehensive research and assessment program was established by federal law in 1980 to, among other purposes, "evaluate the environmental, social, and economic effects of acid precipitation." It involved the cooperation of many different federal agencies and federally funded laboratories. The committee focused its review on the data management and interfacing activities related to the Aquatic Processes and Effects portion of the total program. The types of data collected included water chemistry and biology, wet deposition of acidic air pollution compounds, meteorology, hydrology, and episodic response of water bodies to acid deposition. These data were used to generate predictive models.
The H.J. Andrews Experimental Forest Long-Term Ecological Research Site. Funded by the National Science Foundation, the Long-Term Ecological Research (LTER) Program consists of 18 independent sites in the United States, of which the H.J. Andrews Experimental Forest in Oregon is one. The LTER Program studies are designed to carry out long-term research on diverse natural ecosystems. Their objectives are to study the following major features: pattern and control of primary production; spatial and temporal distribution of populations selected to represent trophic structure; pattern and control of organic matter accumulation in surface layers and sediments; pattern of inorganic inputs and movements of nutrients through soils, groundwater, and surface waters; and pattern and frequency of disturbance to the site. Research at the H.J. Andrews Experimental Forest, which was designated as an LTER site in 1980, has focused on several areas, including the disturbance regime, vegetation succession, long-term site productivity, and decomposition processes.
The Carbon Dioxide Information Analysis Center (CDIAC). This center, located at the Oak Ridge National Laboratory (ORNL) and funded by the Department of Energy, provides high-quality data sets to the climate change research community. Its data management program exemplifies the kinds of data gathering, cleanup, documentation, and dissemination activities that are a necessary part of many data interfacing exercises. The types of data available include worldwide energy production statistics, population estimates, biological carbon dioxide sources and sinks, measured concentration of carbon dioxide, extensive related metadata, and numerous models.
The First ISLSCP (International Satellite Land Surface Climatology Project) Field Experiment (FIFE). The long-term goal of ISLSCP is to improve our understanding of satellite measurements relating particularly to the fluxes of momentum, heat, water vapor, and carbon dioxide from land surfaces. The research goals of FIFE, which was conducted by NASA and several other agencies at a 3,400-hectare site in Kansas in 1987 and 1989, were to determine whether our understanding of biological processes on small geographic scales can be integrated over much larger scales to describe interactions appropriate for climate models, and to determine whether selected biological processes or associated states can be quantified over appropriate scales for climate models. The operational goals of FIFE included the simultaneous acquisition of satellite, atmospheric, and surface data; multiscale observations of biophysical parameters and processes controlling energy and mass exchange at the surface to determine how these are manifested in satellite radiometric data; and provision of integrated analyses through a central data system.
The California Cooperative Oceanic Fisheries Investigation (CalCOFI) Program. This is an example of a long-term, broad-scale interdisciplinary research and monitoring program, the major goal of which has been to describe and understand the relationship between biological patterns and physical oceanographic/climate processes. The CalCOFI program has been under way since 1948 and is supported by the National Marine Fisheries Service, the California Department of Fish and Game, and the Scripps Institution of Oceanography. It exemplifies several scientific and organizational features that are important to the success of interfacing diverse data in interdisciplinary research.
The committee formed a separate subcommittee to evaluate each case study and established a list of evaluation criteria to help guide the fact finding. These criteria are presented in Appendix A. The subcommittees obtained briefings from the researchers and data managers, reviewed key background documents, and made site visits in all but the FIFE and Sahel drought early warning case studies. The subcommittees then reported back to the full committee both orally and in writing. The resulting case study reports were the product of the entire committee's deliberations.
ORGANIZATION OF THE REPORT
Chapters 2 through 7 present the results of the six case studies described above and an analysis of the major data interfacing issues that were identified through the case studies and related research. These chapters are all similarly structured, with sections containing the relevant background, the variables measured and the sources of data, the major data management and interfacing elements and issues, and the lessons learned. Some of the more complex case studies, notably the Impact Assessment Project for Drought Early Warning in the Sahel and the National Acid Precipitation Assessment Program, have a number of additional sections.
The final chapter provides a thorough overview of the issues and requirements related to the interfacing of diverse environmental data. It identifies the problem and its context, describes the barriers to effective data interfacing, and presents Ten Keys to Success for data interfacing activities. Supporting examples from the case studies are provided throughout.
REFERENCES
National Research Council (NRC). 1990. Research Strategies for the U.S. Global Change Research Program. National Academy Press, Washington, D.C.
National Research Council (NRC). 1992. Combining Information: Statistical Issues and Opportunities for Research. National Academy Press, Washington, D.C.
National Science and Technology Council (NSTC). 1994. Our Changing Planet: The FY 1995 U.S. Global Change Research Program. Government Printing Office, Washington, D.C.