**Suggested Citation:**"2 Data Assimilation Development." National Research Council. 1991.

*Four-Dimensional Model Assimilation of Data: A Strategy for the Earth System Sciences*. Washington, DC: The National Academies Press. doi: 10.17226/1830.

**2**

**Data Assimilation Development**

**INTRODUCTION**

The purpose of model assimilation of data from various observing systems, with an irregular distribution in space and time, and with varying and incompletely known error properties, is to produce a four-dimensional depiction of the atmosphere and oceans, with regular distribution of field variables with known error properties in space and time. For such a "movie" depiction, the numerical model used for assimilation must demonstrate a physical consistency with the geophysical system being described in that the associated errors should be as small and well known as possible. The degree to which this goal is achieved depends, for a given set of observing systems, on the data assimilation methodology used.

The existing methodology for atmospheric data and models is briefly reviewed in the next section. It gives reasonably satisfactory results for the mass field and the horizontal winds but leaves much to be desired as to the vertical velocities and the humidity field. In oceanography, data assimilation is now being explored aggressively, but no satisfactory global model-assimilated data set (MADS) of the mass and horizontal velocity fields has emerged as yet (Haidvogel and Robinson, 1989).

Scientific requirements are formulated in this report for improved atmospheric data assimilation; for oceanographic model-assimilated data sets of a quality at least comparable to present atmospheric model-assimilated data sets; and for complementary model-assimilated data sets on surface hydrology, cryospheric components of the climate system, and the relevant chemical and biogeochemical components. These scientific requirements clearly suggest the need for continued development of improved and highly flexible data assimilation methods.

**PRINCIPLES AND METHODS**

In many current operational systems the data are grouped into convenient (6-hour) groups, and a data assimilation process is carried out, using three modules: analysis, initialization, and forecast. The forecast model is usually a state-of-the-art numerical weather prediction (NWP) model. The forecast field from the earlier data provides the background field (or first guess) for the analysis of the new data. A statistical approach is widely used in the analysis module in current operational systems and is known variously as optimum interpolation (OI) or statistical interpolation (Gandin, 1963; McPherson et al., 1979; Lorenc, 1981).
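As a minimal illustration of the statistical analysis step, the sketch below blends a single background (first-guess) value with one collocated observation, weighting each by its assumed error variance. The function name, the temperature values, and the variances are illustrative assumptions rather than values from any operational system; an operational OI analysis solves the multivariate, many-observation counterpart of this formula.

```python
def oi_scalar_analysis(background, obs, var_b, var_o):
    """Blend one background value and one observation by their error variances."""
    # Weight given to the observation increment; lies between 0 and 1.
    weight = var_b / (var_b + var_o)
    analysis = background + weight * (obs - background)
    # The analysis error variance is smaller than either input variance.
    var_a = (1.0 - weight) * var_b
    return analysis, var_a, weight


# Hypothetical example: a 6-hour forecast of 280 K and an observation of 282 K,
# each assumed to have an error variance of 1 K^2.
analysis, var_a, weight = oi_scalar_analysis(280.0, 282.0, var_b=1.0, var_o=1.0)
print(analysis, var_a, weight)
```

With equal error variances the analysis falls halfway between the background and the observation; a more accurate observation (smaller `var_o`) would pull the analysis closer to the observed value.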

Operational implementation of the OI approach requires resolution of a number of practical issues. Because it is difficult to invert the matrix corresponding to a global data set, the analysis has so far been carried out as a series of local calculations, which involve compromises on data selection, continuity between adjacent analysis volumes, multivariate relationships, and so on. Three-dimensional methods to eliminate these compromises are in an advanced state of development. Four-dimensional assimilation systems based on the Kalman filter and on variational analysis, currently under development, will relax these limitations and exploit a combination of dynamics and statistics more fully.

An important aspect of the OI approach is that it is a multivariate algorithm that can exploit linear multivariate relations between different variables. The final analysis is a linear combination of the "structure functions" of the forecast error correlations. Thus, any linear constraint imposed on these structure functions is satisfied within its domain of validity by the analyzed fields. Current operational systems impose constraints on the analysis increments, which provide for approximate hydrostatic balance everywhere and approximate nondivergent balance in the extratropics. There is ample empirical justification for these constraints.
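The statement that the analysis is a linear combination of structure functions can be made concrete with a small univariate sketch. It assumes a one-dimensional grid, a Gaussian forecast error covariance, and two observations located at grid points; the function names, the covariance shape, and all numerical values are assumptions chosen for illustration. The analysis increment is built explicitly as a weighted sum of the two covariance columns (the structure functions) associated with the observation locations.

```python
import math


def gaussian_cov(n, var_b, length):
    """Assumed forecast error covariance on a 1-D grid (Gaussian in grid distance)."""
    return [[var_b * math.exp(-((i - j) ** 2) / (2.0 * length ** 2))
             for j in range(n)] for i in range(n)]


def oi_two_obs(xb, B, obs_idx, obs_val, var_o):
    """Univariate OI with two observations located at grid points p and q."""
    p, q = obs_idx
    d0 = obs_val[0] - xb[p]          # observation-minus-background increments
    d1 = obs_val[1] - xb[q]
    # Covariance of the increments: forecast error plus observation error.
    s00, s01, s11 = B[p][p] + var_o, B[p][q], B[q][q] + var_o
    det = s00 * s11 - s01 * s01
    # Solve the 2x2 system for the weight attached to each structure function.
    w0 = (s11 * d0 - s01 * d1) / det
    w1 = (s00 * d1 - s01 * d0) / det
    # The analysis increment is a linear combination of columns p and q of B.
    xa = [xb[i] + w0 * B[i][p] + w1 * B[i][q] for i in range(len(xb))]
    return xa, (w0, w1)


B = gaussian_cov(n=11, var_b=1.0, length=2.0)
xb = [0.0] * 11
xa, weights = oi_two_obs(xb, B, obs_idx=(3, 7), obs_val=(1.0, 0.5), var_o=0.25)
print([round(v, 3) for v in xa])
```

Because every grid point receives the same two weights, any linear relation built into the columns of `B` carries over to the analysis increment, which is the property the multivariate constraints rely on.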

The OI algorithm can only use data that are linearly related to the model variables. Data that are nonlinearly related to the model variables (e.g., satellite radiances) must be transformed to variables such as temperature or humidity before use in OI by a retrieval process. The transformation process can introduce many errors. For this reason there is considerable interest in developing more general methods that can use data that are nonlinearly related to model variables.
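One way to use a nonlinearly related datum directly, without a separate retrieval step, is to minimize a cost function that measures the misfit to the background and to the observation, relinearizing the observation relation at each iteration. The scalar sketch below does this with Gauss-Newton iterations. The observation relation used here (a blackbody flux proportional to the fourth power of temperature) is only a toy stand-in for a radiative transfer calculation, and the function names and numerical values are assumptions made for the example.

```python
SIGMA = 5.670e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def h(temp):
    """Toy nonlinear observation relation: blackbody flux for a temperature."""
    return SIGMA * temp ** 4


def h_prime(temp):
    """Derivative of the observation relation with respect to temperature."""
    return 4.0 * SIGMA * temp ** 3


def analyse_nonlinear(xb, var_b, y, var_o, iterations=20):
    """Gauss-Newton minimization of the scalar background-plus-observation cost."""
    x = xb
    for _ in range(iterations):
        jac = h_prime(x)  # relinearize about the current estimate
        gain = var_b * jac / (jac * jac * var_b + var_o)
        x = xb + gain * (y - h(x) + jac * (x - xb))
    return x


def cost_gradient(x, xb, var_b, y, var_o):
    """Gradient of the cost function; it vanishes at the minimum."""
    return (x - xb) / var_b - h_prime(x) * (y - h(x)) / var_o


# Hypothetical case: background 280 K, observed flux consistent with 283 K.
y_obs = h(283.0)
xa = analyse_nonlinear(xb=280.0, var_b=4.0, y=y_obs, var_o=1.0)
print(round(xa, 3))
```

The analysis lands between the background and the temperature implied by the observed flux, with the balance set by the two assumed error variances.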

Quality control of observational data is a critical consideration for accurate analysis because of the unstable nature of atmospheric predictions contaminated by erroneous observations. Operational centers use modified OI routines to perform multivariate cross-checks of each datum against all other data in the vicinity as well as against the forecast field used as a background field.
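The check against the background field can be sketched in its simplest form: an observation is flagged when its departure from the background exceeds a chosen multiple of the expected standard deviation of that departure. The tolerance of three standard deviations, the error variances, and the sample values below are illustrative assumptions; the cross-check against neighboring data described above is not included in this sketch.

```python
import math


def background_check(obs, background, var_o, var_b, tolerance=3.0):
    """Return one flag per observation: True to keep it, False to reject it."""
    # Expected standard deviation of (observation - background).
    limit = tolerance * math.sqrt(var_o + var_b)
    return [abs(y - xb) <= limit for y, xb in zip(obs, background)]


# Hypothetical temperatures (K); the second observation is grossly in error.
obs = [281.2, 295.0, 279.5]
background = [280.0, 280.5, 280.1]
flags = background_check(obs, background, var_o=1.0, var_b=1.0)
print(flags)
```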

The meteorological "primitive" equations describe traveling atmospheric disturbances with two inherent time scales: (1) a fast time scale (phase speed ~300 m/s) associated with gravity wave motions and (2) a slow meteorological time scale (phase speed ~50 m/s) associated with large-scale Rossby wave disturbances. An initialization module ensures a smooth start to a forecast by controlling the amplitude of rapidly propagating gravity wave "noise," which is introduced in the course of each analysis step. This control on the "noise" is critical for the quality control procedures employed at the next analysis time since it ensures that the next background field is reasonably accurate and noise free. In the future the fully compressible equations will be employed for data assimilation involving nonhydrostatic phenomena.

**CONTINUOUS DATA INSERTION**

In the past, the National Aeronautics and Space Administration (NASA) pioneered the continuous data insertion method (Charney et al., 1969; Ghil et al., 1979). Two groups are currently engaged in the operation and development of a continuous data insertion scheme using a general circulation model (GCM): the United Kingdom Meteorological Office (UKMO) and the Geophysical Fluid Dynamics Laboratory (GFDL) of the National Oceanic and Atmospheric Administration (NOAA). In the future it is envisaged that several different types of continuous methods may emerge.

In the continuous data assimilation of the U.K. system, the observed data are repeatedly inserted at each time step of the forward integration, using the relaxation technique for injection of data (Lorenc, 1976; Lyne et al., 1982). Observations are first interpolated vertically to the model levels, then horizontally to model gridpoints, and the increments of observed values are used to correct the model values. For the assimilation the Newtonian nudging technique is adopted (Davies and Turner, 1977; Hoke and Anthes, 1976).
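The idea behind Newtonian nudging can be sketched for a single model variable: at every time step an extra tendency, proportional to the difference between the observed and model values, is added to the model's own tendency. The function name, the relaxation time `tau`, the forward Euler time stepping, and the trivial model tendency used in the example are assumptions made for illustration and do not describe the U.K. system itself.

```python
def nudge(x0, tendency, obs, tau, dt, nsteps):
    """Integrate dx/dt = tendency(x) + (obs - x) / tau with forward Euler steps."""
    x = x0
    history = [x]
    for _ in range(nsteps):
        # Model tendency plus the relaxation (nudging) term toward the observation.
        x = x + dt * (tendency(x) + (obs - x) / tau)
        history.append(x)
    return history


# Hypothetical case: a model with no tendency of its own, nudged toward obs = 1.
history = nudge(x0=0.0, tendency=lambda x: 0.0, obs=1.0, tau=1.0, dt=0.1, nsteps=50)
print(round(history[-1], 4))
```

A short relaxation time forces the model quickly toward the data at the risk of exciting imbalance, while a long one leaves the model dynamics more freedom to adjust.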

The recent GFDL system (Stern et al., 1985; Stern and Ploshay, 1991) is an extension of one employed for the original First GARP Global Experiment (FGGE) analysis at the GFDL (Miyakoda et al., 1976), but it has been modified considerably. Model-assimilated data sets are produced by a GCM, which may be either a spectral or grid model. The observational data are grouped beforehand into 2-hour windows.

The values at the insertion points of model grids (Gaussian grid in the case of a spectral GCM) are found by interpolating the data increments at observation points three-dimensionally with OI. Univariate, rather than multivariate, OI is used because the GCM is expected to provide internal consistency among variables in the continuous scheme.

The insertion data consist of the current solution of the assimilation model (unaffected by the initialization), plus slower gravity modes and all Rossby modes associated with OI-determined incremental changes to the background values. The GCM assimilates these data immediately and produces a continuous stream of model-assimilated data sets.

**RESEARCH AND DEVELOPMENT IN DATA ASSIMILATION: THE KALMAN FILTER AND ADJOINT METHODS**

At present, the two most promising areas of research and development in data assimilation methods are the extended, nonlinear *Kalman filter* and the *adjoint method*, each with a number of possible simplifications and variations. Research on the Kalman filter was pioneered in the United States (Kalman, 1960; Kalman and Bucy, 1961; Jazwinski, 1970; Gelb, 1974; Bierman, 1977; Bucy and Joseph, 1987). Meteorological applications (Ghil et al., 1981; Parrish and Cohn, 1985; Ghil, 1989) and oceanographic applications (Budgell, 1986; Miller, 1986; Bennett and Budgell, 1987; Miller and Cane, 1989) are being carried out at the Institute of Ocean Sciences of the University of British Columbia, New York University, Oregon State University, Scripps Institution of Oceanography, the University of California at Los Angeles, the University of Rhode Island, and a number of Soviet research institutes. Operational implementation is contemplated at NASA's Goddard Laboratory for Atmospheres.

The adjoint method was pioneered in the Soviet Union and France (Marchuk, 1974; Penenko and Obraztsov, 1976; Le Dimet and Talagrand, 1986). Meteorological applications (Lewis and Derber, 1985; Talagrand and Courtier, 1987; Derber, 1989) and oceanographic applications (Bennett and McIntosh, 1982) are being pursued at the GFDL, the Laboratoire de Météorologie Dynamique in Paris, the Massachusetts Institute of Technology, NOAA's Miami laboratories, and the University of Oklahoma, among others. Operational implementation is being considered by the European Centre for Medium Range Weather Forecasts (ECMWF) and the National Meteorological Center (NMC).

In principle, both methods try to minimize the distance in phase space between a system trajectory, constrained by model dynamics, and the existing data over a given time interval (Ghil and Malanotte-Rizzoli, 1991). In the adjoint method the constraint is "strong," that is, the model is supposed to be nearly exact; in the Kalman filter approach, model errors are explicitly incorporated. This explicit modeling of errors, while desirable in principle, imposes an additional computational burden. On the other hand, the Kalman filter only needs to handle data at a given instant in time rather than over the entire time interval of interest, as does the adjoint method.
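The sequential character of the Kalman filter, and its explicit treatment of model error, can be seen in a scalar sketch. It assumes a linear model that multiplies the state by a constant `a` at each step, a model error variance `q`, and an observation error variance `r`; these names and the numerical values are illustrative assumptions. For such a linear scalar model the extended filter reduces to the standard Kalman filter.

```python
def kalman_step(xa, pa, y, a, q, r):
    """One forecast-plus-analysis cycle of a scalar Kalman filter."""
    # Forecast: advance the state and its error variance, adding model error q.
    xf = a * xa
    pf = a * a * pa + q
    # Analysis: only the observation valid at this instant is needed.
    gain = pf / (pf + r)
    xa_new = xf + gain * (y - xf)
    pa_new = (1.0 - gain) * pf
    return xa_new, pa_new


# Hypothetical run: persistence model (a = 1), repeated observations equal to 1.
x, p = 0.0, 10.0
for y in [1.0] * 50:
    x, p = kalman_step(x, p, y, a=1.0, q=1.0, r=1.0)
print(round(x, 6), round(p, 6))
```

The error variance is carried forward with the state, which is the source of the additional computational burden: in a model with many variables the scalar `pa` becomes a full covariance matrix.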

Both methods require linearization of the variational problem they attempt to solve. For the extended Kalman filter this linearization occurs at successive moments in time about a given state of the system. For the adjoint method it occurs in phase space about a given trajectory. Combinations of the two approaches are possible; one might use the Kalman filter only at large intervals (e.g., every week or month) in order to compute the covariance matrices of observational and model errors. The inverses of these matrices can then be used as relative weights for current observations and model background fields in the adjoint method. It should be noted that the adjoint method uses future as well as past data and is therefore well suited for delayed-mode (as opposed to real-time) applications. A Kalman "smoother" also uses future as well as past data.
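The adjoint method can be sketched for the same kind of scalar model, treated as a strong constraint. The control variable is the initial state; the cost function sums the weighted misfit to a background value and to observations spread over the whole interval, with inverse error variances as weights. A single backward (adjoint) sweep over the trajectory returns the gradient of the cost with respect to the initial state. The function names, the model coefficient, and the synthetic observations are assumptions made for the example, and because this toy model is linear no relinearization about the trajectory is needed.

```python
def forward(x0, a, nsteps):
    """Strong-constraint model trajectory: x[k+1] = a * x[k]."""
    traj = [x0]
    for _ in range(nsteps):
        traj.append(a * traj[-1])
    return traj


def cost(x0, xb, var_b, obs, var_o, a):
    """Misfit to the background and to all observations over the interval."""
    traj = forward(x0, a, len(obs) - 1)
    jb = 0.5 * (x0 - xb) ** 2 / var_b
    jo = sum(0.5 * (x - y) ** 2 / var_o for x, y in zip(traj, obs))
    return jb + jo


def gradient_by_adjoint(x0, xb, var_b, obs, var_o, a):
    """Gradient of the cost with respect to x0 from one backward sweep."""
    traj = forward(x0, a, len(obs) - 1)
    lam = 0.0
    # Run backward in time, accumulating the weighted misfits (future data first).
    for x, y in zip(reversed(traj), reversed(obs)):
        lam = a * lam + (x - y) / var_o
    return (x0 - xb) / var_b + lam


# Hypothetical case: observations generated from a "true" initial state of 2.
truth = forward(2.0, 0.9, 5)
args = dict(xb=1.0, var_b=1.0, obs=truth, var_o=0.1, a=0.9)
g0 = gradient_by_adjoint(1.0, **args)
# The cost is quadratic here, so the gradient is linear in x0 and one Newton
# step from the background reaches the minimum.
hessian = gradient_by_adjoint(2.0, **args) - g0
x0_opt = 1.0 - g0 / hessian
print(round(x0_opt, 4))
```

In a realistic model the gradient would be passed to an iterative descent algorithm rather than a single Newton step, and every observation in the interval, including those later than the analysis time, contributes to it.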

For operational applications of one or both of these techniques in the future, additional computing power will be needed at the major operational centers. This need is part of the nationally coordinated program presented in the president's fiscal year 1992 budget to Congress in a report by the Committee on Physical, Mathematical, and Engineering Sciences of the Federal Coordinating Council for Science, Engineering, and Technology of the Office of Science and Technology Policy: *Grand Challenges: High Performance Computing and Communications*. As the report's summary states: "The HPCC program is driven by the recognition that unprecedented computational power and capability [are] needed to investigate and understand a wide range of scientific and engineering 'grand challenge' problems. These are problems whose solution is critical to national needs. Progress toward solution of these problems is essential to fulfilling many of the missions of the participating agencies. Examples of grand challenges addressed include: prediction of weather, climate and global change. . . ."