Future Needs for Model-Assimilated Data Sets
Current research problems that will require the use of model-assimilated data sets in the future are summarized in this chapter. The concise summary includes a discussion of the temporal and spatial scales encountered and the implied resolution of the data fields that need to be produced and archived.
A thorough understanding of the structure and dynamics of synoptic weather systems is essential to numerical weather prediction (NWP). While the study of baroclinic instability and wave cyclones has been a central theme of dynamic meteorology for over 40 years, the generation, maintenance, and decay of these synoptic weather systems are still far from completely understood, and new field programs for their study are being organized. The horizontal and vertical structure of the synoptic systems, as well as the relative importance of dynamic and thermodynamic mechanisms in their life cycle, are topics of current active research. The preferential location of storm tracks and their variability connects the cyclonic scale to spatially and temporally longer scales, which are discussed in the next two sections.
The characteristic length scale of these phenomena is L = 10^3 km, although frontal structures with cross-front L = 10^2 km play an important role during periods of maximum amplification. Their characteristic time scale is T = 3 to 10 days, with the first 2 to 3 days constituting the growth, or amplification, stage.
Planetary waves are crucial to medium- and extended-range forecasting (MRF, ERF). In midlatitudes, Rossby and inertia-gravity (Poincaré) waves play a major role, while the tropics give rise to Kelvin and mixed Rossby-gravity (Yanai) waves. Aside from the troposphere, these waves also appear, with various degrees of importance, in the middle atmosphere and oceans. Their horizontal and vertical propagation and their interaction with the mean flow are major topics of interest.
Spatial scales of planetary waves in the atmosphere are L = 10^3 to 10^4 km, and temporal scales are T = 3 to 30 days, except for Poincaré waves, which are characterized by T = 1 to 12 hr. In the ocean the corresponding spatial scales are shorter and the temporal scales are longer.
The predictive counterpart of low-frequency variability (LFV) is long-range forecasting (LRF). Important research issues concerning LFV include barotropization, that is, the increasingly barotropic character of atmospheric motion as the characteristic period increases; the existence and explanation of multiple flow regimes for the same external conditions; and the emergence and life cycle of localized coherent eddy structures (Ghil and Childress, 1987). Various intraseasonal oscillations, with dominant periods of 16, 25, and 40 to 50 days, have been observed in the tropical atmosphere as well as in the northern and southern hemisphere extratropics. These oscillations need to be explained and related to tropical-extratropical interactions and to energy and momentum transfer by wave dispersion. The length scale of phenomena in this category is L = 10^3 to 10^4 km, and the corresponding time scale is T = 30 to 300 days.
A brief list of important research problems in this category includes the tropical 40- to 50-day propagating wave disturbances and their role in El Niño-Southern Oscillation (ENSO) events, the genesis and life cycle of sea surface temperature (SST) anomalies, explosive marine cyclogenesis, and the dependence of planktonic biota on upwelling and hence on wind stress. These phenomena span the time scales of the previous three categories, with L = 10^2 to 10^4 km and T = 3 to 300 days.
Some relevant problems in this area are persistent droughts and severe winters. Changes in surface heat and momentum fluxes associated with these events are linked, in the latter case, to increased snow and ice cover and hence enhanced albedo, and, in the former, to reduced evapotranspiration from land vegetation. Typical spatial and temporal scales of interest here are L = 100 to 1000 km and T = 10 to 300 days.
Rather than occurring at a two-dimensional interface, as for the two preceding types of interaction, dynamics-chemistry-radiation interactions occur throughout the depth of the lower and middle atmospheres. The vertical scale can vary from hundreds of meters to tens of kilometers, while the range of horizontal and temporal scales covers the entire spectrum discussed so far.
The development of comprehensive and consistent model-assimilated data sets for the global hydrological cycle requires a coupled treatment of the various subsystems. This requires a capability for assimilating the current highly nonhomogeneous hydrological information as well as the remotely sensed information planned for the future. Such a system seems to be best conceptualized in terms of a real-time operational component and a delayed research-quality climatic database component.
The future space-based system for precipitation estimation will rely on a variety of indirect measurements (e.g., visible, infrared, and passive microwave radiances), along with direct estimates from precipitation radar (Tropical Rainfall Measuring Mission Science Steering Group, 1988). Integration of this information with information from in situ measurements and model output to obtain optimum global precipitation fields will be a difficult task.
The advantages of a comprehensive analysis system for the hydrological cycle are well illustrated by considering the common exchange quantity of evaporation minus precipitation (E-P), that is, the net transfer of water substance from the earth's surface to the atmosphere. Global atmospheric forecast models, through the data assimilation process, should ultimately provide accurate estimates of the temporally and areally averaged vertically integrated vapor flux divergence (E-P) over most areas of the world. Accurate evaluation of the flux divergence is most difficult over areas of sparse data and regions of significant relief, where smoothed model terrain may bias estimates. These estimates can then be compared with the difference
between parameterized evapotranspiration and "measured" precipitation, and any inconsistencies can be resolved. The reconciled data set can then be viewed in the context of the evolution of the ocean thermal fields to resolve apparent inconsistencies in these two subsystem analyses. Finally, the atmospheric estimate of E-P over basins where streamflow is also measured can be used to derive the month-to-month changes in surface and subsurface moisture storage, quantities that at present are only roughly known and are of extreme importance in the modeling of climate change (Rasmusson, 1968). Variations in E-P over ocean areas are important in generating systematic and white noise components of forcing on the ocean circulation. Thus, we need to know the synoptic distribution of E-P over the ocean as well as over land.
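The moisture budget described above can be sketched numerically. Assuming specific humidity and winds are available on a regular latitude-longitude grid with pressure levels (all array names and grid conventions here are illustrative, not those of any particular analysis system), the vertically integrated vapor flux and its divergence yield a time-averaged estimate of E-P:

```python
import numpy as np

G = 9.81           # gravitational acceleration (m s^-2)
R_EARTH = 6.371e6  # mean earth radius (m)

def vertical_integral(field, p):
    """Trapezoidal integral over pressure, divided by g."""
    dp = np.diff(p)[:, None, None]
    return ((field[:-1] + field[1:]) * 0.5 * dp).sum(axis=0) / G

def e_minus_p(q, u, v, p, lat, lon):
    """Estimate time-averaged E - P (kg m^-2 s^-1) as the divergence of the
    vertically integrated water vapor flux on a regular lat-lon grid.

    q, u, v : (nlev, nlat, nlon) specific humidity (kg/kg) and winds (m/s)
    p       : pressure levels (Pa), increasing downward
    lat,lon : coordinates in degrees
    """
    # Vertically integrated vapor flux components, Q = (1/g) * integral(q*V dp)
    Qu = vertical_integral(q * u, p)
    Qv = vertical_integral(q * v, p)

    lat_r, lon_r = np.deg2rad(lat), np.deg2rad(lon)
    coslat = np.cos(lat_r)[:, None]

    # Divergence of (Qu, Qv) in spherical coordinates
    dQu_dlon = np.gradient(Qu, lon_r, axis=1)
    dQvcos_dlat = np.gradient(Qv * coslat, lat_r, axis=0)

    # With storage changes negligible in the time mean, div(Q) ~ E - P
    return (dQu_dlon + dQvcos_dlat) / (R_EARTH * coslat)
```

In practice the divergence would be evaluated in spectral space or on the model's own grid; the finite differences above only illustrate the balance being diagnosed.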
DATA SET NEEDS BY TIME SCALES
The partial differential equations governing atmospheric and oceanic motions are nonlinear. Therefore, different temporal and spatial scales interact with each other, and no particular frequency or wavenumber band can be completely understood without consideration of the adjacent bands or even without the help of much shorter or much longer scales. Still, for convenience, the discussion in this section is subdivided into frequency bands.
0 to 3 Days
The characteristics of the meteorological data systems of the 1990s will require the use of analysis procedures that are more complex than those currently in use. For example, the continuous nature of some of the data sources will require a higher-resolution analysis system. Also, the varied error characteristics of the data will need to be objectively accounted for. Regardless of spatial or temporal scales, the use of models in data assimilation is motivated by the need to impose dynamic consistency on the data sets, to interpolate the data in some optimal way to a grid system, and to provide reasonable estimates of meteorological structures where there are large spatial or temporal voids in the data. An additional motivation for using models for mesoscale data assimilation is that the resolution of conventional operational data has often been insufficient to define the detailed characteristics of mesoscale meteorological processes. Use of fine-grid models, with appropriate mesoscale physics, is necessary in order to generate a data set for subsequent analysis. In this case, the model is used to simulate mesoscale data as well as to integrate observed data. While this particular contrast with the objectives of large-scale data assimilation is not likely to be completely eliminated in the future, new data acquisition systems such as the WSR-88D Doppler radar and the wind profilers will allow
much better routine identification of mesoscale structures than is now possible. However, since many variables will still be poorly defined, research-oriented data assimilation systems that are tailored to specific scales, geographic regions, or physical processes will continue to be required.
3 to 10 Days
Although objective analysis and numerical prediction originated from the demand for synoptic weather forecasts of 1 to 2 days, the scope of forecasts has gradually extended to the medium-range time scale of 3 to 10 days. At the same time, analysis practice has come to recognize the importance of using numerically predicted fields as a background field for analysis, and of requiring dynamical consistency among variables, resulting in the continuing development of data assimilation methodology. The models used as assimilators have mostly been global general circulation models (GCMs), as opposed to the limited-domain models used for short-range forecasts. Assimilation techniques for medium-range forecasts have now reached a fairly mature stage of development, but needs for further development can readily be identified.
These needs may be grouped into two categories. One is the correction of current known deficiencies, and the other is general improvement of data assimilation systems for the purpose of achieving further advancement in medium-range forecasting, particularly beyond about 5 to 6 days (Hollingsworth, 1987).
The deficiencies vary depending on the assimilation system. One of the outstanding deficiencies often mentioned with respect to the medium-range forecasts of operational centers is the "spin-up" problem. There are two aspects to this problem: one is of a pathological nature resulting from lack of adequate model initialization, and the other is due to discrepancies, often noted as "systematic errors," between a model solution and observations.
Development of the nonlinear normal-mode initialization technique has substantially improved initialization; as a result, pathological problems in the extratropics have been dramatically reduced. However, tropical initialization is still imperfect in its treatment of diabatic heating and therefore in its handling of condensation and evaporation rates. For example, it takes 2 to 3 days after the beginning of a weather forecast for the rates of precipitation and evaporation to reach equilibrium. Some scientists believe that this drawback can be eliminated only by both better parameterization and "genuine" four-dimensional analysis that employs the adjoint method or the Kalman filter (see Chapter 2). On the other hand, other scientists consider that appropriate subgrid-scale parameterization by itself can alleviate this problem. Still others consider that continuous data assimilation, including the adjoint method and the Kalman filter, can solve the spin-up deficiency.
Rectification of systematic errors in the assimilating model will also reduce errors in the model-assimilated data sets, particularly for data-void areas. However, reduction of a GCM's systematic errors may require a long, careful effort. Systematic errors, such as a global cooling tendency, a meridional shift of westerly jets, and an excessive intensity of easterlies at the tropical tropopause level, have long been noted, but clear-cut remedies have not yet emerged. In fact, such biases in models may be most efficiently corrected by interactive use of data assimilation and model predictions, as discussed in Chapter 1.
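One widely used diagnostic for such interactive correction is the time mean of the analysis increments (analysis minus background forecast) accumulated over many assimilation cycles: a persistent nonzero mean points to systematic model bias rather than random error. A minimal sketch, with illustrative names only:

```python
import numpy as np

def mean_analysis_increment(analyses, backgrounds):
    """Time-mean analysis increment over many assimilation cycles.

    analyses    : sequence of analysis fields (arrays of equal shape)
    backgrounds : corresponding short-range background forecasts

    A mean increment significantly different from zero in some region
    indicates that the assimilating model is systematically drifting
    there and that observations repeatedly pull it back.
    """
    increments = [a - b for a, b in zip(analyses, backgrounds)]
    return np.mean(increments, axis=0)
```

Maps of this quantity can then be compared with known model biases, such as the global cooling tendency or jet displacements mentioned above.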
The needs in the second category, general improvement of data assimilation systems, include better assimilation algorithms, more comprehensive observational networks, more reliable observational platforms, and improved quality control of observational data. While some observational data problems are instrumental, the discussion here is concerned with problems in which improvement can be achieved using observed data interactively in an assimilation system. Such tasks include (1) utilization of outgoing long-wave radiation (OLR) data; (2) better use of improved satellite temperature retrievals, cloud winds, and precipitable water estimates; (3) use of future scatterometer data; and (4) improvement of equatorial surface wind analyses.
Use of OLR data has been suggested for producing better initial tropical wind divergence fields (Julian, 1984; Krishnamurti and Low-Nam, 1986; Kasahara et al., 1988). The next step is to investigate whether such usage improves model rainfall forecasts and whether enhanced or rectified cumulus convection can be maintained by data assimilation without need for a spin-up by the model.
Satellite data have been used for temperature retrievals since 1968, for cloud winds since 1970, and for moisture since 1983. The benefits of these data for improving forecasts have been well demonstrated in several numerical experiments (Uppala et al., 1985; Kalnay et al., 1985; Illari, 1989). Some of these retrieved data have been used routinely in operational data assimilation, but questionable data are still occasionally reported. Raw radiance data, rather than retrieved temperatures, have been proposed for direct assimilation since radiances are what satellites actually measure. Cloud wind data continue to be plagued by a nagging problem of proper height determination. Improvement in locating cloud levels is highly desired.
Scatterometer data are expected to be available in the near future, and the inclusion of such data in data assimilation systems may contribute significantly to better representation of surface wind stress. Based on an assimilation experiment using a limited data set from the Seasat-A scatterometer, Atlas et al. (1987) recommend that the directional accuracy of surface winds from future scatterometers be substantially improved.
The quality of equatorial wind analyses has been studied intensively by comparing analyses from various operational centers with the 1987 surface wind data of moored buoys (Reynolds et al., 1989). The results are disappointing and somewhat surprising, since the operational wind analyses had been thought to be acceptable. The quality of these data is important for modeling long-term events like El Niño as well as for day-to-day tropical variations.
10 to 100 Days
It appears appropriate to subdivide this time range into two ranges (i.e., 10 to 30 days and 30 to 100 days) from the standpoint of atmospheric as well as oceanic forecasts. Forecasts in the first range, 10 to 30 days, are often referred to as 30-day or 1-month forecasts; those in the second range, 30 to 100 days, are referred to as seasonal forecasts. The former have been the object of intensive study (e.g., dynamic extended-range forecasts [DERF]) at a number of operational centers around the world (National Research Council, 1991). For forecasts in this category, reliable data assimilation of large-scale atmospheric circulation features is required to produce appropriate initial conditions. In addition, accurate observed sea surface temperatures must be specified at the initial times.
A fundamental consideration is that 30-day forecasts are not deterministic, because the limit of predictability has been exceeded, and that only probabilistic forecasts are possible. This implies that some members of an ensemble of forecasts will be good while others will not, even in the case of a perfect forecasting model. A crucial problem is how to generate ensemble forecasts based on multiple initial conditions. Model-assimilated data sets are quite relevant to this issue. For example, Hoffman and Kalnay (1983) proposed an approach referred to as lagged average forecasting. Another well-known approach is to generate various initial conditions by adding random numbers to a basic initial condition; this is called the Monte Carlo method (Leith, 1974). Current ideas under investigation include finding the fastest-growing mode of the symmetric eigenvalue problem.
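The two ensemble-generation approaches just mentioned can be sketched schematically. In the Monte Carlo method, random perturbations scaled to the analysis error are added to a single analysis; in lagged average forecasting, forecasts started from successive earlier analyses are run to a common valid time and combined. The toy "model" below is purely illustrative and stands in for a full GCM:

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, dt=0.01):
    """One step of a toy nonlinear 'model' (illustrative only)."""
    return state + dt * (state - state ** 3)

def forecast(state, nsteps):
    """Integrate the toy model forward nsteps steps."""
    for _ in range(nsteps):
        state = step(state)
    return state

def monte_carlo_ensemble(analysis, n_members, noise_std):
    """Leith-style ensemble: perturb one analysis with random noise
    whose amplitude mimics the analysis error."""
    return [analysis + rng.normal(0.0, noise_std, analysis.shape)
            for _ in range(n_members)]

def lagged_average_forecast(analyses, lead_steps, lag_steps):
    """Hoffman-Kalnay-style ensemble: run forecasts from successive
    earlier analyses (analyses[k] is k*lag_steps old) to a common
    valid time, then average the members."""
    members = [forecast(a, lead_steps + k * lag_steps)
               for k, a in enumerate(analyses)]
    return np.mean(members, axis=0)
```

In either case the spread among members gives a rough measure of forecast uncertainty, which is the essential product of a probabilistic 30-day forecast.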
The forecasts in the second category (30 to 100 days) require coupled air-sea models for prediction, and the initial conditions should be produced by data assimilation systems based on the coupled model. The current state of ocean data assimilation is discussed in Chapter 3. There have been several studies on global ocean data assimilation, but no attempt at coupled data assimilation has yet been conducted. Two essential elements in coupled assimilation are that the ocean heat content should be properly included and the surface winds accurately represented. These are very important because El Niño forecasts critically depend on the distributions of these variables in the initial condition.
The data required for this purpose are, in addition to the conventional atmospheric data, the surface winds, the surface ocean temperature, and the subsurface temperature. Furthermore, it is envisaged that altimeter data and scatterometer data from satellites could considerably enhance more conventional observational systems. The Tropical Ocean and Global Atmosphere (TOGA) program is testing the utility of drifting buoys, which provide surface ocean current data. TOGA also is experimenting with moored buoys, which provide ocean temperature, salinity, and subsurface current observations. These data are potentially valuable and important.
100 to 1000 Days
On seasonal-to-interannual scales, the value of atmospheric model-assimilated data sets has already been demonstrated (see Wallace, 1987, for a review and a rather complete set of references). Wallace and his collaborators have used 15 years of 12-hourly National Meteorological Center (NMC) model-assimilated analyses to study low-frequency variability in the northern hemisphere atmosphere poleward of 20°N. Resulting discoveries include Pacific-North America (PNA), North Atlantic, and North Pacific oscillations (Wallace and Gutzler, 1981), as well as the correlation between the PNA oscillation and the Southern Oscillation index (Horel and Wallace, 1981). Numerous other studies of variability on slow time scales have been performed using the NMC model-assimilated data set—studies that could not have been done using the rawinsonde station data directly.
While current operational analyses of atmospheric data have flaws, omissions, and (largely undocumented) discontinuities in data assimilation procedures, it is recognized that data assimilation techniques have improved markedly, especially in the tropics, and that these model-assimilated data sets have proven extraordinarily valuable in examining the low-frequency variability of the global atmosphere. It has been suggested that model-assimilated data sets be periodically reanalyzed by NMC and other operational or research centers to include new or omitted observations as improved data assimilation techniques and more accurate assimilation models are developed. The advantage of such reanalysis would be to provide the highest-quality global atmospheric data sets possible with which to perform additional studies of low-frequency variability.
In the ocean the primary need for model-assimilated data sets is to initialize the ocean for coupled atmosphere-ocean model predictions and for integration of the results of oceanographic field programs. In order to make predictions on time scales of months to years, the evolution of sea surface temperature (SST) must also be included, and the only way to do this is to make forecasts using coupled atmosphere-ocean models. The work of Cane et al. (1986), using a simplified SST anomaly coupled model, has indicated
that the onset of the ENSO phenomenon in the Pacific is potentially predictable as much as a year in advance and that the major source of forecast error is specification of the initial state of the ocean. The model currently obtains an initial ocean state for the forecast by forcing the ocean to equilibrium with the observed winds and allowing the model to freely evolve afterwards. No ocean data are used to initialize the model; therefore, errors in the observed winds can lead to errors in the initial specification of the ocean state. A more comprehensive routine effort at ENSO prediction using coupled GCMs would require the assimilation of ocean data and would produce ocean model-assimilated data sets as a by-product.
Note that once the ocean and atmosphere are coupled completely, correct simulation of the annual cycle is not guaranteed. Accurate simulation of the annual cycle thus becomes a crucial test of coupled models.
With the exception of the work in the Atlantic and Pacific cited previously, production of oceanographic model-assimilated data sets is almost nonexistent. Oceanographic field programs, such as TOGA, the Seasonal Equatorial Atlantic Experiment/Français Ocean et Climat dans l'Atlantique Equatorial (SEQUAL/FOCAL), and Tropic Heat, that extend over many years have been performed in the tropical oceans; similar programs, such as the World Ocean Circulation Experiment (WOCE) and the Joint Global Ocean Flux Study (JGOFS), are being started for the extratropics. Because the observed data in these programs were of different forms, what has emerged is a collection of different data streams (e.g., drifter, current meter, expendable bathythermograph (XBT), satellite SST, satellite altimeter), all describing aspects of the ocean but not dynamically connected and therefore not mutually reinforcing. Assimilating all these data into an ocean GCM would (1) provide a general description of the interannual variability of the ocean; (2) provide a data set for the initialization of coupled ocean-atmosphere predictions; and (3) provide consistent dynamical quantities, such as vorticity and vertical velocity, that cannot be measured directly but that can be computed in the process of assimilating the data.
At time scales beyond a few years, small effects acting over long periods of time can have major climatic impacts. Simulating and predicting large-scale climate change on these time scales involves many physical, biological, and chemical aspects of the atmosphere, ocean, cryosphere, and land system. In order to simulate and predict climate changes on long time scales, a model with all these elements and their interactions needs to be constructed. While much effort has gone into part of this problem, a full-blown model has not yet been developed.
The instrumental observational record is limited to little more than a hundred years at the surface and less than 40 years at upper levels of the atmosphere. Within the ocean the record is uneven and sparse; in some regions of the ocean, deep measurements have never been taken.
The problem on long time scales is therefore twofold: there is a data problem of combining individual records into global fields spanning long time intervals, and a modeling problem of devising models accurate and comprehensive enough to simulate and predict the global climate over decades and longer.
In the atmosphere an upper-air network has been used as the starting point for routine global weather forecasts since the mid-1950s. The analyses of upper-air data constitute a long record of global atmospheric model-assimilated data sets but one that is inadequate for climate purposes because it is so spatially and temporally inconsistent. Several changes in the data collection system and the analysis techniques mask long-term climate variability and trends that exist in the data. As pointed out previously, the existing record of weather analyses is good enough to show shorter-term variability but is totally inadequate for changes on scales of a few years to decades.
A remedy, simple in concept but difficult and expensive to execute, is to analyze the existing 40 or so years of original data in a uniform manner using the best available assimilation models. This would produce the best possible long-term model-assimilated data set of the global atmosphere, one that would then be useful for analysis of long-term climate fluctuations. Since, in the future, this type of data set would be the most useful long-term global atmospheric data set in existence, systematic and consistent additions should continue the archive into the future in order that reanalysis of the data set can be accomplished later with advanced assimilation models. In any reanalysis, data that were not available at the time of the previous analysis could be profitably included.
The ocean situation is quite different in that data do not exist to define the evolution of the ocean system over periods of years and beyond. It is important, however, to have the mean state defined, since many of the changes that the ocean will undergo can be calculated by moving heat, momentum, and constituents around by the mean circulation of the ocean. To the extent that the ocean is in steady equilibrium (i.e., the eddies are statistically stationary and the annual cycle is regular), the best way to define this state would be to assimilate the existing data into a seasonally varying model of the global ocean.
Making predictions on time scales of a few years to beyond a hundred years requires careful initialization of the ocean state. There is evidence that the ocean SST affects climate on decadal time scales (e.g., Palmer, 1986) and that part of this SST variability is related to long-term changes in the deeper parts of the northern oceans (e.g., Lazier, 1988; Levitus, 1989).
Unless the initial state can be specified, the future evolution of the ocean cannot be predicted. The time range over which specification of the initial ocean state affects details of the final climatic state is undetermined at present. On time scales of thousands of years, the coupled climate state presumably loses all memory of the initial ocean state. In order to determine the influence of the initial ocean state, climate runs of coupled global models starting from slightly different initial conditions in the ocean must be done. Once the range of deterministic predictability has been estimated, the requirements for initializing predictions may be better defined.
REQUIREMENTS FOR DATA AND COVERAGE
To study a phenomenon of given length scale L and temporal scale T, the required resolution is roughly L/10 and T/10. The horizontal extent of data coverage for all the phenomena discussed in the preceding section should be global. For some of the phenomena, regional, hemispheric, or tropical-belt coverage might be satisfactory for some applications, but global coverage is needed for others. Regional data sets with higher resolution can be very useful complements to global sets that satisfy minimal resolution requirements. Such high-resolution data sets should cover at least two separate regions for comparison purposes. Horizontal extent of regional data sets should generally be at least 10L.
The vertical coverage should be from the land and ocean surface to at least the mesopause. For some of the phenomena discussed, the coverage should extend downward well below the ocean surface and somewhat below the land surface. The temporal coverage should extend over 10T to 100T in order to achieve statistical significance and permit analysis of slow changes in the phenomena of interest.
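The rules of thumb above (resolution of roughly L/10 and T/10, temporal coverage of 10T to 100T, regional extent of at least 10L) translate directly into data set specifications. A minimal sketch, with illustrative names:

```python
def dataset_specs(length_scale_km, time_scale_days):
    """Rough data set requirements for a phenomenon of spatial scale L
    and temporal scale T: resolution ~ L/10 and T/10, temporal coverage
    of 10T to 100T, regional extent of at least 10L."""
    return {
        "grid_spacing_km": length_scale_km / 10.0,
        "output_interval_days": time_scale_days / 10.0,
        "min_record_days": 10 * time_scale_days,
        "max_record_days": 100 * time_scale_days,
        "min_regional_extent_km": 10 * length_scale_km,
    }

# Example: synoptic systems with L = 1000 km, T = 10 days imply roughly
# 100-km grid spacing, daily output, and a record of 100 to 1000 days.
specs = dataset_specs(1000.0, 10.0)
```

Applying the same arithmetic to the low-frequency phenomena of the preceding sections (T = 300 days) reproduces the multidecadal record lengths called for below.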
Horizontal and vertical resolution, as well as temporal resolution, will vary among global-scale, regional-scale, mesoscale, and local-scale data sets. In general, it appears at this time that the following general specifications represent an achievable goal.
Most data sets assimilated and archived should have global coverage, with a horizontal resolution of 100 x 100 km or better. The vertical resolution should be 30 levels judiciously distributed, that is, with greater resolution in the subsurface, the planetary boundary layer, and the lower stratosphere.
Selectively increased horizontal and vertical resolution in certain regions should be employed in the coming decade as mesoscale data become routinely available from the modernization program of the National Weather Service (NWS). Increased resolution is also necessary for the purposes of designing or verifying the results of field programs. A recommended resolution for such selected regions is 10 x 10 km in the horizontal, with 50 vertical levels.
A data set of barely satisfactory length for the study of low-frequency variability and atmosphere-ocean interactions, atmosphere-land interactions, and dynamics-chemistry-radiation interactions is 30 to 40 years. Such model-assimilated data sets need to be as temporally consistent as possible.