The complexity of the earth-ocean-atmosphere system and the nonlinear processes governing its dynamics pose formidable challenges to our scientific understanding and to our capabilities to extend the range of useful prediction. The governing dynamical equations for this nonlinear geophysical fluid system are evolutionary, rather than equilibrium, equations. For the foreseeable future, observations of this system will be irregularly distributed in space and time, will be obtained using a great variety of instruments and methods, and will continue to be inhomogeneous in accuracy. In addition, the observations are frequently too sparsely distributed to resolve adequately the complex spectrum of scales and processes involved. These limitations and the need to describe the state of the evolving geophysical system as accurately as possible have led to the development of geophysical model data assimilation as a rational method to infer the state of the system within which dynamical, physical, chemical, and biological processes can be coupled interactively.
With funding from several federal agencies, the National Research Council's Board on Atmospheric Sciences and Climate established the Panel on Model-Assimilated Data Sets for Atmospheric and Oceanic Research. This study is the report of the panel. The panel's charge was to prepare a concise report that defines a nationally focused effort to routinely generate model-assimilated, research-quality data sets for the needs of the 1990s, to briefly survey the current usefulness of such data sets as a starting point for application to the full range of anticipated enhanced geophysical data streams, and to outline a basic strategy for a national archive system that in the future will effectively serve a broad range of research and operational programs.
CONCEPT OF GEOPHYSICAL MODEL DATA ASSIMILATION
The basis for geophysical model data assimilation is the four-dimensional representation of a unified, dynamically evolving geophysical system by a mathematical model. This model can predict the dynamic changes occurring in the system, accept the insertion of new observational data distributed heterogeneously in time and space, and blend earlier and current information objectively under rigorous quality control. The resulting estimate of the state of the system is more complete and more accurate, and thus of higher value, than can be achieved from direct analysis of a single set of observations taken at a particular time. The model extrapolates information forward in time in accordance with basic physical and chemical laws, moving from the "recent past" to the "present state" of the system. This predicted present state constitutes the "background" field against which each new set of observations is quality controlled, interpreted, and synthesized with the extrapolation of all previous information. In other words, the background field, gridded or spectral, is adjusted using the information assimilated from the current observations, yielding the "updated" model state. In this process the current observations are filtered and interpolated in space and time in a manner that is as consistent as possible with the dynamical model and the presumed true state of the system. Starting from the updated state, the model is then integrated forward to provide a forecast and, concurrently, a background field for the ingestion of the next set of observations. The cycle then repeats, so that the state of the system and its evolution are continually described.
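The forecast, background, update, and forecast cycle described above can be illustrated with a deliberately tiny example. The sketch below is not the method of any particular center; it assumes a single scalar state, a linear model that multiplies the state by a factor `a` each step, a model-error variance `q`, an observation-error variance `r`, and a Kalman-filter-style update that weights background and observation by their error variances. The names `forecast`, `update`, `assimilate`, and the rejection threshold `reject_sigma` are hypothetical, chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    mean: float      # estimate of the (scalar) state
    variance: float  # estimated error variance of that estimate


def forecast(est: Estimate, a: float, q: float) -> Estimate:
    """Extrapolate the updated state one step forward; the result is the background."""
    return Estimate(a * est.mean, a * a * est.variance + q)


def update(background: Estimate, obs: float, r: float) -> Estimate:
    """Blend background and observation, weighting each by its error variance."""
    gain = background.variance / (background.variance + r)
    innovation = obs - background.mean  # observation minus background
    return Estimate(background.mean + gain * innovation,
                    (1.0 - gain) * background.variance)


def assimilate(first_guess: Estimate, observations, a: float, q: float,
               r: float, reject_sigma: float = 4.0):
    """Cycle forecast -> quality control -> update -> forecast over the observations."""
    est = first_guess
    history = []
    for obs in observations:
        background = forecast(est, a, q)
        # Background check: reject observations implausibly far from the background.
        if abs(obs - background.mean) > reject_sigma * math.sqrt(background.variance + r):
            est = background  # observation rejected; the background is carried forward
        else:
            est = update(background, obs, r)
        history.append((background, est))
    return history
```

For example, `assimilate(Estimate(0.0, 10.0), [1.0, 1.1, 50.0, 0.95], a=1.0, q=0.01, r=0.25)` rejects the third value at the background check, and each accepted observation yields an analysis variance smaller than both the background variance and `r`, which is the sense in which the blended estimate is more accurate than either source alone.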
Data assimilation synthesizes our theoretical and a priori knowledge with all available observational information as completely and accurately as possible; the result is the best estimate of the state of the system and its evolution that can be obtained from the available data. The resulting synthesis is called a model-assimilated data set (MADS), consisting of gridpoint values or spectral coefficients valid at specified times.
For data assimilation to be successful, the assimilating model must have useful predictive skill, because it is the mechanism that filters and interpolates the observations. If the model has no skill in extrapolating earlier information forward in time, that earlier information cannot influence the current estimate. The ability of a skillful model to transport information from data-rich areas into data-sparse areas is particularly valuable, although even a very accurate model cannot fully compensate for bad or incomplete data.
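The point that a skillful model carries information forward and into data-sparse areas can be sketched on a periodic one-dimensional grid. In this illustration (all assumptions, not taken from the report), the true field is a pattern that moves one grid cell per step, perfect observations exist only at the first four of eight points, and each observed point is nudged halfway toward its observation. A model that advects the field correctly (`advect`) is compared with a hypothetical model that has no predictive skill and always forecasts zero (`no_skill`).

```python
import math


def advect(state):
    """Skillful model: move every value one grid cell downstream on a periodic domain."""
    return state[-1:] + state[:-1]


def no_skill(state):
    """Model with no predictive skill: always forecasts a fixed climatology of zero."""
    return [0.0] * len(state)


def truth_at(pattern, step):
    """True state after `step` steps: the initial pattern advected `step` cells."""
    n = len(pattern)
    return [pattern[(i - step) % n] for i in range(n)]


def cycle(model, pattern, observed, steps, weight=0.5):
    """Assimilate perfect observations at the `observed` grid points only."""
    state = [0.0] * len(pattern)  # uninformed first guess
    for step in range(1, steps + 1):
        truth = truth_at(pattern, step)
        state = model(state)      # background field for this step
        for i in observed:        # update only where data exist
            state[i] += weight * (truth[i] - state[i])
    return state, truth_at(pattern, steps)


def rms_error(state, truth, points):
    """Root-mean-square difference between state and truth over the given points."""
    points = list(points)
    return math.sqrt(sum((state[i] - truth[i]) ** 2 for i in points) / len(points))
```

With `pattern = [1.0, 2.0, ..., 8.0]`, `observed = range(4)`, and 32 steps, the rms error over the unobserved points 4 to 7 becomes very small for `advect`, because each part of the field has passed through the data-rich region several times, while `no_skill` leaves those points as wrong as its climatology.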
ASSESSMENT OF DATA ASSIMILATION IN ATMOSPHERIC SCIENCES
Successful prediction for a dynamical system depends critically on accurate knowledge of the system's current state. In this regard, data assimilation has proven to be a powerful method for inferring the current state of the atmosphere. The approach has been exploited in operational meteorology with unusual success and has contributed to remarkable gains in forecast skill over the last decade (Figure 1).
Dynamically consistent fields of model-assimilated data have also proven to be extremely valuable for scientific studies of the circulation within the earth-atmosphere system, in terms of both the system's structure and the processes that maintain the circulation. Assimilation models are unique in their ability to produce fields of complex nonlinear processes with a physical and dynamical consistency that is unattainable by any other method of analysis. As value-added sources of information, these model-assimilated data sets constitute a priceless resource for diagnostic and predictive studies of atmospheric and oceanic circulation ranging from mesoscales (as small as 2 km) through planetary scales (as large as 10,000 km).
DATA ASSIMILATION VIEWED AS PART OF A SYSTEMATIC LEARNING PROCESS
The data assimilation process is one of continual confrontation between theoretical and observational knowledge, as expressed by the current observations from nature and by the predicted model state, which is a function of the previous observations from nature. This confrontation presents a rich opportunity for a structured, iterative, and open-ended learning process about the behavior of atmospheric, oceanic, and other geophysical systems; the quality of the observations; the interpretation of observational evidence; and the accuracy of the assimilating model.
In the last decade, observational research has used model-assimilated data sets as a primary source for the study of a wide range of atmospheric phenomena and processes. There are many reasons for this development. Model-assimilated data sets constitute an internally consistent time series of global three-dimensional fields whose accuracy can be estimated. Because the gridded fields are regular, the data are in a very convenient form for scientific analysis: dynamical and physical processes involving differentiated and integrated quantities can be readily calculated. Because of the internal consistency, important but often inadequately observed dynamical and physical processes (e.g., atmospheric mass divergence) can be estimated, and theories relating to these processes can be developed and tested. Model-assimilated data sets have also been validated in many studies against independent observational data that were not used in the assimilation. In such studies the validation is frequently a two-way process: the independent data test the quality of the model-assimilated data sets, and the data sets in turn aid interpretation of the observations. Thus, model-assimilated data sets have contributed substantially to both our theoretical and diagnostic understanding of the atmosphere.
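As an illustration of how regular gridded fields make derived quantities straightforward to compute, the sketch below estimates horizontal divergence, du/dx + dv/dy, by centered finite differences and then takes a simple area mean. It assumes a Cartesian grid with uniform spacing and equal-area cells; on the latitude-longitude or spectral representations used for actual model-assimilated data sets, spherical metric terms would be required, and mass divergence would additionally need density weighting and vertical integration. The function names are illustrative.

```python
def horizontal_divergence(u, v, dx, dy):
    """Return du/dx + dv/dy at interior grid points using centered differences.

    u[j][i] and v[j][i] are wind components on a regular grid, with j indexing
    y (rows) and i indexing x (columns); dx and dy are grid spacings in meters.
    The result has two fewer rows and two fewer columns than the input.
    """
    ny, nx = len(u), len(u[0])
    div = []
    for j in range(1, ny - 1):
        row = []
        for i in range(1, nx - 1):
            dudx = (u[j][i + 1] - u[j][i - 1]) / (2.0 * dx)
            dvdy = (v[j + 1][i] - v[j - 1][i]) / (2.0 * dy)
            row.append(dudx + dvdy)
        div.append(row)
    return div


def area_mean(field):
    """Average over a regular grid, assuming equal-area cells."""
    values = [x for row in field for x in row]
    return sum(values) / len(values)
```

For a linear test field with u = 2e-5 x and v = 3e-5 y, the centered differences reproduce the exact divergence of 5e-5 per second at every interior point.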
The knowledge gained from the theoretical and diagnostic studies noted above is the first of four important aspects of the learning process associated with the development and application of data assimilation. The second aspect, and one of the most striking results to date, has been the development of powerful methods to monitor data and to identify important systematic errors in the measurements of many different in situ and remote meteorological observing systems. For example, the ability to identify, in routine operations, even modest errors at the most isolated observing locations has furnished dramatic proof of the power of data assimilation to synthesize a predictive model's knowledge of atmospheric behavior with available data into a coherent and accurate picture of that behavior.
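The data-monitoring idea above rests on comparing each observation with the background field. A minimal sketch, under the assumption that observation-minus-background departures have already been collected per station, flags stations whose mean departure lies many standard errors away from zero. The function name, the threshold of three standard errors, and the ten-report minimum are hypothetical choices; operational monitoring uses far richer statistics.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def flag_biased_stations(departures, threshold=3.0, min_count=10):
    """Flag stations whose mean observation-minus-background departure is systematic.

    `departures` is an iterable of (station_id, obs_minus_background) pairs.
    A station is flagged when its mean departure differs from zero by more than
    `threshold` standard errors; stations with fewer than `min_count` reports
    (min_count must be at least 2) are skipped.
    Returns {station_id: mean departure} for the flagged stations.
    """
    by_station = defaultdict(list)
    for station, departure in departures:
        by_station[station].append(departure)

    flagged = {}
    for station, values in by_station.items():
        if len(values) < min_count:
            continue  # too few reports for reliable statistics
        bias = mean(values)
        standard_error = stdev(values) / math.sqrt(len(values))
        if abs(bias) > threshold * standard_error:
            flagged[station] = bias
    return flagged
```

A station whose departures scatter around zero is not flagged, whereas one with a persistent offset is; stations with too few reports are skipped rather than judged.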
The third significant element in the learning process from data assimilation has been more effective use of remotely sensed data. For example, experience from the Global Weather Experiment (GWE, also called the First GARP Global Experiment [FGGE]) revealed that data assimilation with research models benefited substantially from the evaluation and quality control of remotely sensed observational data streams by operational centers. Hence, for the full potential of remote sensing to be realized in the future, it is essential that data from new systems be provided in a timely fashion to operational centers capable of assimilating them. Similarly, studies of the quality of wind and temperature retrievals from remotely sensed data, through comparisons with the background fields produced in data assimilation, have exposed many serious limitations in current retrieval procedures. Many of these difficulties arise because the measured quantity, usually a radiance, is an integrated, nonlinear function of the variable being inferred, such as a vertical profile of atmospheric temperature. Many shortcomings of the current procedures can be removed through variational procedures that explicitly recognize the integrated and nonlinear nature of the remotely sensed data. These procedures use information from the background field to mitigate the difficulties arising from both the integrated nature of the measurements and the nonlinearity of the radiative transfer equation. This new approach exemplifies how experience with data assimilation stimulates more accurate methods of data interpretation.
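The variational approach described above can be sketched as a one-dimensional variational (1D-Var) retrieval. Everything specific here is an assumption made for illustration: a two-layer temperature profile, a single radiance-like measurement modeled as a weighted sum of (T/T0)^4 over the layers with hypothetical weights, uncorrelated background errors, and Gauss-Newton minimization of the cost function J(T) = 1/2 sum (T - Tb)^2 / B + 1/2 (y - H(T))^2 / R. Because one integrated measurement cannot separate the two layers, the background term supplies the vertical structure, and relinearizing H about each iterate handles the nonlinearity.

```python
WEIGHTS = (0.6, 0.4)  # hypothetical weighting-function values for two layers
T0 = 250.0            # reference temperature (K) used for scaling


def forward(t):
    """Hypothetical radiance-like observation operator: an integrated, nonlinear H(T)."""
    return sum(w * (ti / T0) ** 4 for w, ti in zip(WEIGHTS, t))


def jacobian(t):
    """Partial derivatives dH/dT for each layer, evaluated at t."""
    return [4.0 * w * ti ** 3 / T0 ** 4 for w, ti in zip(WEIGHTS, t)]


def cost(t, tb, b_var, y, r_var):
    """J = 1/2 sum (T - Tb)^2 / B + 1/2 (y - H(T))^2 / R, with diagonal B."""
    jb = sum((ti - bi) ** 2 / bv for ti, bi, bv in zip(t, tb, b_var))
    return 0.5 * jb + 0.5 * (y - forward(t)) ** 2 / r_var


def gradient(t, tb, b_var, y, r_var):
    """dJ/dT for the cost function above."""
    residual = y - forward(t)
    return [(ti - bi) / bv - hi * residual / r_var
            for ti, bi, bv, hi in zip(t, tb, b_var, jacobian(t))]


def retrieve(tb, b_var, y, r_var, iterations=10):
    """Minimize J by Gauss-Newton, relinearizing H about each new iterate."""
    t = list(tb)  # start from the background profile
    for _ in range(iterations):
        h = jacobian(t)
        hbh = sum(hi * hi * bv for hi, bv in zip(h, b_var))  # H B H^T (a scalar here)
        # Innovation corrected for the difference between the iterate and the background.
        d = y - forward(t) - sum(hi * (bi - ti) for hi, bi, ti in zip(h, tb, t))
        t = [bi + bv * hi * d / (hbh + r_var) for bi, bv, hi in zip(tb, b_var, h)]
    return t
```

For instance, with a background of (260, 230) K, background variances of 4 K^2, and an observation generated from (263, 231) K, the retrieval lowers J substantially relative to the background and drives its gradient to near zero. It need not recover the generating profile exactly, because a single measurement constrains only one combination of the layer temperatures.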
A fourth and vital aspect of the learning process associated with data assimilation is the stimulus it provides for improving the models used in assimilation and prediction. Such models are by no means perfect. Systematic studies of the errors of the background forecast in the assimilation process, and of longer-range forecasts, have been a powerful aid in detecting model defects. Missing processes have been identified, known processes have been specified more accurately, and the treatment of poorly represented interactions between different processes has advanced. As the assimilation and interpretation of the available observations improve and the range of observations used widens, the learning processes inherent in data assimilation become increasingly effective. The success of this learning process indicates that data assimilation is the best means known to gain a comprehensive and internally consistent synthesis of all available geophysical data.
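Systematic background errors can reveal model defects, as described above. As a hypothetical illustration, the sketch below cycles a scalar model that adds a constant drift at each forecast step, assimilates perfect observations of a constant true value with a fixed gain, and averages the observation-minus-background departures once the cycle has settled. The function name, drift, gain, and other parameters are assumptions chosen for clarity.

```python
def mean_innovation(drift, gain=0.5, cycles=100, truth=10.0,
                    first_guess=11.0, tail=50):
    """Average observation-minus-background departure over the last `tail` cycles."""
    analysis = first_guess
    innovations = []
    for _ in range(cycles):
        background = analysis + drift    # forecast from a model with a systematic drift
        innovation = truth - background  # observation minus background (perfect obs assumed)
        innovations.append(innovation)
        analysis = background + gain * innovation
    recent = innovations[-tail:]
    return sum(recent) / len(recent)
```

With a drift of 0.2 per cycle and a gain of 0.5, the mean departure settles at -0.4, that is, -drift/gain: the background error reaches a steady state in which the drift added by each forecast balances the correction (gain times background error) applied by each update. An unbiased model's departures instead decay toward zero, so a persistent nonzero mean departure is one hint of a model defect.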
CURRENT STATUS OF DATA ASSIMILATION
Currently, several major operational forecast centers use model-based data assimilation to produce four-dimensional (space and time) analyses of atmospheric circulation as a routine starting point for numerical weather prediction. Remarkable advances in both global data assimilation and numerical weather prediction models have been made in the past 15 years. The period over which a given level of forecast accuracy is maintained has doubled, through advances in data assimilation that were largely stimulated by GWE research and development. Similar data assimilation procedures are applicable to oceanic data and are expected to be applied, for example, in the Tropical Ocean and Global Atmosphere (TOGA) program and the World Ocean Circulation Experiment (WOCE).
Despite the importance of model-based data assimilation for operational prediction, only limited resources have been specifically devoted to developing model-assimilated data sets for research, with the exception of those provided for the GWE and for a few isolated case studies. The assimilated data sets currently available for research are generated as a by-product of operational weather prediction, and not all operationally generated model-assimilated data sets are readily accessible to the larger scientific research community in the United States or abroad. Present practices for archiving and accessing observed and model-assimilated data are inadequate to meet the future needs of the scientific community, particularly for data sets that consistently include all available geophysical data.
APPLICABILITY OF DATA ASSIMILATION TO THE EARTH SYSTEM SCIENCES
The need is emerging for research-quality model-assimilated data sets across a broad spectrum of interactive endeavors known as the earth system sciences (Hollingsworth, 1989). Significant endeavors are beginning in climate and global change, long-range weather prediction, mesoscale weather, hydrology, atmospheric chemistry, physical and biogeochemical oceanography, and land surface processes. Crucial to our understanding of the earth-ocean-atmosphere system is knowledge of the present state and the continuing changes of fundamental properties and trace constituents. Thus, a nationally focused effort is needed during the coming decades to routinely provide research-quality assimilated data sets and to ensure their ready accessibility to the larger scientific community. Through such a national focus, priorities must be established for scientific exchange among the several communities engaged in data assimilation, so that the full potential of technological advances in global and regional observing systems and computational capabilities can be realized. There is an immediate need to assimilate all available atmospheric and oceanic data with a state-of-the-art assimilation model to produce a best-to-date interpretation of the available data record, particularly for the study of climate and global change. Such analysis of long records from the past will require marshaling financial and personnel resources at the national and even international levels.
The immediate goals of this report are to review the current status of data assimilation and the application of model-assimilated data sets for both operational prediction and scientific research. The panel's recommendations are aimed at ensuring the availability of assimilated data sets for broad national needs in the coming decades. As a starting point for action, the report emphasizes the need for an integrated national effort for the generation, archiving, and service-oriented publication of model-assimilated data sets that will serve a broad range of operational and research programs in atmospheric, oceanographic, and earth sciences. To ensure that the needs of the coming decades are met, the panel recommends that an integrated national program be developed to focus and implement the full range of effort required.