4 Model Evaluation
Pages 104-169



From page 104...
... 1994) that complex computational models can never be truly validated, only "invalidated." The contemporary phrase for what one seeks to achieve in resolving model performance with observation is "evaluation" (Oreskes 1998)
From page 105...
... The terms "validation" and "assurance" prejudice expectations of the outcome of the procedure toward only the positive -- the model is valid or its quality is assured -- whereas evaluation is neutral in what might be expected of the outcome. Because awareness of environmental regulatory models has become so widespread in a more scientifically aware audience of stakeholders and the public, words used within the scientific enterprise can have meanings that are misleading in contexts outside the confines of the laboratory world.
From page 106...
... This is not merely a question of form, however. In this chapter, where the committee describes the process of model evaluation, it adopts the perspective, discussed in Chapter 1 of this report, that a model is a "tool" designed to fulfill a task -- providing scientific and technical support in the regulatory decision-making process -- not a "truth-generating machine" (Janssen and Rotmans 1995; Beck et al.
From page 107...
... The MOBILE model for estimating atmospheric vehicle emissions, the UAM (urban airshed model) air quality model, and the QUAL2 water quality models are examples of models that have had multiple versions and major scientific modifications and extensions in over two decades of their existence (Scheffe and Morris 1993; Barnwell et al.
From page 108...
... Responses to such questions will emerge and develop at various stages of model development and application, from the task description through the construction of the conceptual and computational models and eventually to the applications. The committee believes that answering these questions requires careful assessment of information obtained at each stage of a model's life cycle.
From page 109...
... It instead demands that a model capture all essential processes for the system under consideration -- but no more. It requires that models meet the difficult goal of being accurate representations of the system of interest while being reproducible, transparent, and useful for the regulatory decision at hand.
From page 110...
... 110 Models in Environmental Regulatory Decision Making

BOX 4-1 Attributes That Foster Accuracy, Precision, Parsimony, and Transparency in Models

Gets the Correct Result
– Model behavior closely approximates behavior of real system
  • High predictive power on a case-by-case basis
  • High predictive power on a statistical basis
– Model results insensitive to factors that should not affect them

Gets the Correct Result for the Right Reason
– Model accurately represents the real system (comprehensive)
  • Variables
    o Inputs, outputs
    o Exogenous, endogenous
  • Relationships
    o Functional
    o Cause-effect
    o Statistical
  • Circumstances
    o Input changes
    o Assumption relaxation
  • Resolutions
    o Temporal
    o Spatial
– Model is based on good science
  • Accepted principles, theory, results
    o From peer-reviewed sources
    o Prestige of developer or lab
  • Up-to-date
    o Concepts and theory
    o Algorithms, computational methods
    o Empirical findings
– Appropriate data are available or feasible to acquire
  • Estimates for model parameters
  • Data for model calibration

Transparency
– Suits specific regulatory context or decisions
  • Addresses the specific concern
  • Usable by
    o Decision makers
    o Implementers
  • Understandable by
    o Decision makers
    o Stakeholders
    o Implementers
From page 111...
... These three goals can result in competing objectives in model development and application. For example, if the primary task was to use a model as a repository of knowledge, its design might place priority on getting sufficient detail to ensure that the result is correct for the correct reasons.
From page 112...
... The reasons for the increasing complexity are varied, but one regulatory modeler mentioned that it is not only modelers who strive to build a more complex model but also stakeholders who wish to ensure that their issues or concerns are represented in the model, even if addressing such concerns does not have an impact on model results (A. Gilliland, Model Evaluation and Applications Branch, Office of Research and Development, EPA, personal commun., May 19, 2006)
From page 113...
... MODEL EVALUATION AT THE PROBLEM IDENTIFICATION STAGE

There are many reasons why regulatory activities can be supported by environmental modeling. At the problem identification stage, decision makers, together with model developers and other analysts, must consider the regulatory decision at hand, the type of input the decision needs, and whether and how modeling can contribute to the decision-making process.
From page 114...
... Benchmarking against other models – Comparison of model results with other similar models.

Sensitivity and uncertainty analysis – Investigation of what parameters or processes are driving model results, as well as the effects of lack of knowledge and other potential sources of error in the model.
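As a purely illustrative companion to the definition above, a screening-level, one-at-a-time sensitivity analysis can be sketched in a few lines; the box model, parameter names, and nominal values below are hypothetical:

```python
# Illustrative one-at-a-time sensitivity screening for a toy model.
# The model and nominal inputs are hypothetical, chosen only to show
# the mechanics of ranking parameters by their effect on the output.

def toy_model(emission_rate, decay_rate, mixing_height):
    # Steady-state concentration of a pollutant in a well-mixed box.
    return emission_rate / (decay_rate * mixing_height)

nominal = {"emission_rate": 100.0, "decay_rate": 0.5, "mixing_height": 1000.0}

def one_at_a_time_sensitivity(model, nominal, perturbation=0.10):
    """Perturb each input by +/-10% and record the relative change in output."""
    base = model(**nominal)
    effects = {}
    for name, value in nominal.items():
        hi = dict(nominal, **{name: value * (1 + perturbation)})
        lo = dict(nominal, **{name: value * (1 - perturbation)})
        effects[name] = (model(**hi) - model(**lo)) / base
    return effects

effects = one_at_a_time_sensitivity(toy_model, nominal)
# Parameters ranked by magnitude of their effect on the output:
ranked = sorted(effects, key=lambda k: abs(effects[k]), reverse=True)
```

A screening like this only probes one input at a time around a single nominal point; it says nothing about interactions between inputs, which is one reason the chapter distinguishes it from full uncertainty analysis.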
From page 115...
... Finally, data must be assessed at this point to ensure the availability of data for model development, input parameters, and evaluation. The result of this process is the selection of a computational modeling approach that addresses problem identification, data availability, and transparency requirements.
From page 116...
... It is one of the major issues in the use of environmental models, and it has multiple aspects:

• Data used as inputs to the model, including data used to develop the model.
• Data used to estimate values of model parameters (internal variables)
From page 117...
... Modelers should be building ongoing collaborations with experimentalists and those responsible for collection of additional data to determine how such new data can guide model development and how the resulting models can guide the collection of additional data.

EVALUATION AT THE COMPUTATIONAL MODEL STAGE

In moving from the identification of the problem, the assessment of required resolving power and data needs, and the decision concerning the basic qualitative modeling approach to a constructed computational model, a number of practical considerations arise.
From page 118...
... However, these formal model evaluation activities must be cognizant of and build on earlier evaluation activities during the problem identification and model conceptualization stages.

Scientific Basis, Computational Infrastructure, and Assumptions

The scientific basis, the computational infrastructure, and the major assumptions used within a computational model are some of the first elements typically addressed during model evaluation.
From page 119...
... . Where models are linked, as in linking emissions models to fate and transport models as discussed in Chapter 2, additional checks and audits are required to ensure the streams of data passing back and forth have strictly identical meanings and units in the partnered codes engaging in these electronic transactions.
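The kind of interface audit described above can be sketched as a defensive check on the data stream passed between two linked models; all field names and units here are hypothetical:

```python
# A minimal sketch of a units/metadata audit at the interface between
# two linked models (for example, an emissions model feeding a fate and
# transport model). All field names and units below are hypothetical.

EXPECTED_SCHEMA = {
    "no2_emission": "kg/h",      # units the receiving model assumes
    "stack_height": "m",
    "exit_temperature": "K",
}

def audit_exchange(record):
    """Raise ValueError if a field is missing or its declared units differ."""
    problems = []
    for field, units in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif record[field]["units"] != units:
            problems.append(
                f"{field}: got {record[field]['units']}, expected {units}")
    if problems:
        raise ValueError("; ".join(problems))
    return True

upstream_output = {
    "no2_emission": {"value": 12.5, "units": "kg/h"},
    "stack_height": {"value": 35.0, "units": "m"},
    "exit_temperature": {"value": 410.0, "units": "K"},
}
```

Carrying explicit unit metadata with every exchanged field, rather than relying on conventions, is what lets the receiving model reject a mismatched stream instead of silently computing with it.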
From page 120...
... Stress testing (of complex models): Stress testing ensures that the maximum load (for example, real-time data acquisition and control systems)
From page 121...
... Acceptance testing: Certain contractually required testing may be needed before the new model or model application is accepted by the client. Specific procedures and the criteria for passing the acceptance test are listed before the testing is conducted.
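A minimal sketch of such an acceptance test, with criteria fixed in advance as the text requires, might look as follows; the criteria, benchmark data, and runtime figure are hypothetical placeholders:

```python
# Sketch of an acceptance test whose pass/fail criteria are specified
# before testing is conducted. The criteria, benchmark values, and
# runtime figure are hypothetical placeholders.

ACCEPTANCE_CRITERIA = {
    "max_relative_error": 0.15,    # agreed with the client in advance
    "max_runtime_seconds": 600.0,
}

def run_acceptance_test(predictions, observations, runtime_seconds):
    """Return (passed, report) against the pre-specified criteria."""
    rel_errors = [abs(p - o) / abs(o) for p, o in zip(predictions, observations)]
    worst = max(rel_errors)
    passed = (worst <= ACCEPTANCE_CRITERIA["max_relative_error"]
              and runtime_seconds <= ACCEPTANCE_CRITERIA["max_runtime_seconds"])
    report = {"worst_relative_error": worst, "runtime_seconds": runtime_seconds}
    return passed, report

passed, report = run_acceptance_test(
    predictions=[0.9, 1.1, 2.1], observations=[1.0, 1.0, 2.0],
    runtime_seconds=120.0)
```

The point of fixing the thresholds before running the test is that neither party can move the goalposts after seeing the results.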
From page 122...
... As discussed in Box 1-1 in Chapter 1, the evaluation of the ozone models in the 1980s and early 1990s showed that estimates of ozone concentrations from air quality models were good when compared with observations for any choice of statistical methods, but only because the errors in the models tended to cancel out. Statistics has value for conceptualizing, visualizing, and quantifying variation and dependence rather than for serving as a source of "rigorous" or "objective" standards for model evaluation.
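The compensating-error pitfall can be illustrated with synthetic numbers: the mean bias looks excellent even though every individual prediction is off by 20 units, while the root-mean-square error exposes the problem.

```python
# Illustration (synthetic numbers) of how compensating errors can make
# an aggregate statistic look good: mean bias is exactly zero even
# though every prediction is off by 20 units. RMSE reveals the problem.

observations = [40.0, 60.0, 80.0, 100.0]
predictions  = [60.0, 40.0, 100.0, 80.0]   # each off by 20; signs alternate

errors = [p - o for p, o in zip(predictions, observations)]
mean_bias = sum(errors) / len(errors)                      # 0.0 -- looks good
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5   # 20.0 -- it is not
```

This is why the committee treats statistics as a tool for visualizing variation rather than as a single "objective" pass/fail standard.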
From page 123...
... The issue of model calibration can be contentious. The calibration tradition is ingrained in the water resources field by groundwater, stream-flow, and water-quality modelers, whereas the practice is shunned by air-quality modelers.
From page 124...
... Experience of model calibration and the stances taken on it differ from one discipline to another. In hydrology and water quality modeling, it is unsurprising that the wider interpretation and greater use of calibration have become established practice.
From page 125...
... . Air quality models also rely on a wide range of parameters used in the description of processes simulated by the models (such as turbulent dispersion coefficients for atmospheric mixing, parameters for the dry and wet removal of pollutants, kinetic coefficients for gas and aqueous-phase chemistry, mass transfer rate constants, and thermodynamic data for the partitioning of pollutants among the different phases present in the atmosphere)
From page 126...
... If air quality models were calibrated to observations, as is done with water quality models, there would be less need to use the model in a relative sense.
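To make the calibration tradition concrete, a minimal sketch in the water quality style might fit an uncertain first-order decay coefficient to downstream observations; the model and data below are synthetic:

```python
# A minimal sketch of model calibration in the water quality tradition:
# adjust an uncertain parameter until model output best matches
# observations. The first-order decay model and the data are synthetic.

import math

def concentration(c0, k, t):
    # First-order decay of a pollutant downstream (t = travel time).
    return c0 * math.exp(-k * t)

# Observed concentrations at several travel times (synthetic data
# generated with k = 0.3 and no noise, for clarity).
times = [0.0, 1.0, 2.0, 4.0]
observed = [concentration(10.0, 0.3, t) for t in times]

def sse(k):
    """Sum of squared errors between model and observations for a given k."""
    return sum((concentration(10.0, k, t) - o) ** 2
               for t, o in zip(times, observed))

# Crude grid search over the plausible range of the decay coefficient.
candidates = [i / 100 for i in range(1, 101)]
k_best = min(candidates, key=sse)
```

With real (noisy) data the fitted coefficient would absorb both genuine process information and measurement error, which is exactly the property that makes calibration contentious across disciplines.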
From page 127...
... One problem is that data and model output are generally averages over different temporal and spatial scales. For example, air pollution monitors produce an observation at a point, whereas output from the regional-scale air quality models discussed earlier in the report represents at best an average over the grid cells used in the numerical solution of the governing partial differential equations.
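The scale mismatch can be sketched as follows: a point monitor is paired with the average of the grid cell that contains it. The grid resolution, field values, and monitor data are hypothetical:

```python
# Sketch of the point-versus-grid-average mismatch described above.
# The grid resolution, model field, and monitor data are hypothetical.

CELL_SIZE_KM = 12.0  # horizontal resolution of the (hypothetical) grid

def cell_index(x_km, y_km):
    """Map a monitor location to the indices of the containing grid cell."""
    return int(x_km // CELL_SIZE_KM), int(y_km // CELL_SIZE_KM)

# Model output: average concentration per grid cell (ug/m3).
model_field = {(0, 0): 41.0, (1, 0): 55.0, (0, 1): 38.0, (1, 1): 62.0}

monitor_location = (15.0, 3.0)   # km coordinates of a point monitor
monitor_value = 71.0             # point observation (ug/m3)

i, j = cell_index(*monitor_location)
paired_model_value = model_field[(i, j)]  # cell average paired with the point
difference = monitor_value - paired_model_value
```

Part of this difference is genuine model error and part is the incommensurability of a point value with a 12-km cell average; the pairing step cannot separate the two by itself.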
From page 128...
... In this manner, a model could be "stressed" to produce, for example, diurnal profiles of the pollutant. Again, however, the availability of monitoring data is a limiting factor.
From page 129...
... This study used epidemiological modeling to look at the reduction in hospital admissions for pneumonia, bronchitis, and asthma that occurred in the Utah Valley when a major source of pollution, the local steel mill, was closed for 13 months. The observation of a statistically significant reduction in hospital visits correlated with reductions in ambient PM concentrations helped to initiate a reassessment of ambient air quality standards for this pollutant.
From page 130...
... As such, uncertainty analysis and the related sensitivity analysis are critical aspects of model evaluation during the model development and model application stages. The use of formal qualitative and quantitative uncertainty analysis in environmental regulatory modeling is growing in response to improvements in methods and computational abilities.
From page 131...
... . As these complex models are linked to other models, such as those in the state implementation planning process discussed in Chapter 2, the difficulties in performing quantitative uncertainty analysis greatly increase.
From page 132...
... The concept of sensitivity analysis has value in the model development phase to establish model goals and examine the advantages and limitations of alternative algorithms. For example, the definition of sensitivity analysis developed by EPA's Council on Regulatory Environmental Models (CREM)
From page 133...
... . In a broader perspective, uncertainty analysis examines a wide range of quantitative and qualitative factors that might cause a model's output values to vary.
From page 134...
... EVALUATION AT THE MODEL APPLICATION STAGE

A new set of practical considerations applies in moving from the development of a computational model to the application of the model to a regulatory problem, including the need for specifying boundary and initial conditions, developing input data for the specific setting, and generally getting the model running correctly. These issues do not detract from the fundamental questions and trade-offs involved in model evaluation.
From page 135...
... Probability provides a useful framework for summarizing uncertainties and should be used as a matter of course to quantify the uncertainty in model outputs used to support regulatory decisions. A probabilistic uncertainty analysis may entail the basic task of propagating uncertainties in inputs to uncertainties in outputs (which would commonly, although perhaps ambiguously, be
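That basic propagation task is commonly carried out by Monte Carlo simulation: sample the uncertain inputs, run the model for each sample, and summarize the resulting output distribution. A sketch with a placeholder model and hypothetical input distributions:

```python
# Sketch of Monte Carlo propagation of input uncertainty to output
# uncertainty. The "model" and the input distributions are hypothetical
# placeholders for a real computational model and its inputs.

import random
import statistics

random.seed(0)  # reproducible illustration

def deposition_model(emission, washout):
    return emission * washout  # placeholder for a real computational model

samples = []
for _ in range(10_000):
    emission = random.lognormvariate(mu=0.0, sigma=0.3)  # uncertain input
    washout = random.uniform(0.1, 0.3)                   # uncertain input
    samples.append(deposition_model(emission, washout))

samples.sort()
median = statistics.median(samples)
p05, p95 = samples[500], samples[9499]  # approximate 5th/95th percentiles
```

The output is then reported as a distribution (or selected percentiles) rather than a single number, which is the framework the committee recommends for regulatory model outputs.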
From page 136...
... . Bayesian analysis, in which one or more sources of information are explicitly used to update prior uncertainties through the use of Bayes' theorem, is another approach for uncertainty analysis and is better, in principle, because it attempts to make use of all available information in a coherent fashion when computing the uncertainties of any model output.
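A toy sketch of Bayesian updating on a discrete grid of candidate parameter values may help fix ideas; the prior, likelihood model, and observation are illustrative only:

```python
# A toy sketch of Bayesian updating via Bayes' theorem on a discrete
# grid of candidate parameter values. The prior, likelihood model, and
# observation are illustrative, not drawn from any real regulatory model.

import math

# Candidate values for an uncertain removal-rate parameter, uniform prior.
candidates = [0.1, 0.2, 0.3, 0.4]
prior = {k: 1.0 / len(candidates) for k in candidates}

def likelihood(observation, k, sigma=1.0):
    """Gaussian measurement error around the model prediction 10 * k."""
    predicted = 10.0 * k
    return math.exp(-0.5 * ((observation - predicted) / sigma) ** 2)

observation = 2.2  # a single synthetic measurement

# Bayes' theorem: posterior proportional to prior times likelihood.
unnormalized = {k: prior[k] * likelihood(observation, k) for k in candidates}
total = sum(unnormalized.values())
posterior = {k: v / total for k, v in unnormalized.items()}
k_map = max(posterior, key=posterior.get)  # most probable value a posteriori
```

The coherence the text mentions comes from this mechanism: every new data source updates the same posterior, rather than being weighed informally against the others.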
From page 137...
... However, even using multiple scenarios ranging from highly optimistic to highly pessimistic will not necessarily ensure that such scenarios will bracket the true value. In thinking about the use of probability in uncertainty analysis, it is not necessary or even desirable to consider only the extremes of representing all uncertainties by using probability or by not using probability at all.
From page 138...
... poorly characterized uncertainty are fixed at various plausible levels and then probabilities are used to describe all other sources of uncertainty. To illustrate how conditional probability distributions can be used to describe the uncertainty in a cost-benefit analysis, consider the following highly idealized problem.
From page 139...
... FIGURE 4-3 (a) Hypothetical distribution representing uncertainty in number of lives saved by a policy; (b)
From page 140...
... This conclusion is highly sensitive to the difficulty of quantifying the value of a human life. Instead of averaging over the distribution in Figure 4-3(b)
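The conditional approach described above, fixing the poorly characterized value judgment at several plausible levels while treating lives saved probabilistically, can be sketched as follows; every number is hypothetical:

```python
# Sketch of the conditional treatment of a poorly characterized quantity:
# the monetary value assigned to a life is fixed at several plausible
# levels, while the better-characterized uncertainty (lives saved) is
# described probabilistically. All numbers are hypothetical.

import random
import statistics

random.seed(1)

POLICY_COST = 4.0e9  # dollars (hypothetical)

def sample_lives_saved():
    # Uncertainty in lives saved, e.g., from an epidemiological model.
    return max(0.0, random.gauss(mu=800.0, sigma=250.0))

lives_samples = [sample_lives_saved() for _ in range(20_000)]

# Condition on fixed, plausible values per life instead of averaging
# over a single speculative distribution for that value judgment.
value_per_life_levels = [2.0e6, 5.0e6, 10.0e6]

conditional_net_benefit = {
    v: statistics.mean(n * v - POLICY_COST for n in lives_samples)
    for v in value_per_life_levels
}
```

Reporting the expected net benefit at each fixed level makes the sensitivity to the value judgment explicit, instead of burying it inside a single averaged answer.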
From page 141...
... Rather than attempt, via a Bayesian calculation, to average over a host of models that all fit the data about equally well but result in different conclusions about low-dose human toxicity, it may be better to give several possible conclusions under varying assumptions on how to extrapolate across species and doses. In providing this guidance, it is not the committee's intent to dismiss the considerable amount of work done on monetization of value judgments, nor the work on Bayesian model averaging (Hoeting et al.
From page 142...
... It might be argued that providing multiple summaries that include a combination of conditional distributions, typical sample points, and distributions of intermediate outputs will be too much information for policy makers. However, interviews conducted with former EPA decision makers on the use of uncertainty analysis in regulatory decision making do not support this pessimistic assessment of the quantitative literacy of environmental policy makers (Krupnick et al.
From page 143...
... Missing data are an inevitable challenge in even the most well-run epidemiological study, so it is important to assess the impact of missing data, such as loss to follow-up, to ensure that the analysis is not subject to bias. From a more technical perspective, it is important to ask whether the modeling assumptions are appropriate and whether the chosen model fits the observed data reasonably well.
From page 144...
... MANAGEMENT OF THE EVALUATION PROCESS

This section addresses practices for managing the evaluation process. The life cycle of a model can be complex for any single model and immensely difficult when the full range of EPA regulatory models is considered.
From page 145...
... How the above factors are addressed in the model evaluation plan will vary among different model types. The committee envisions that the acceptability and applicability criteria be presented either within the model evaluation plan or in a separate document on the
From page 146...
... This information could determine acceptability for specific classes of pollutants. Another example is a one-box pond model that is applicable and acceptable for representing a small surface-water body but might not be applicable and acceptable for representing one of the Great Lakes, where there are potentially distinct subregions within the water body.
From page 147...
... What type of sensitivity analysis was used to make this determination? There is also substantial literature in related fields that bears on the issue of how much precision or accuracy is needed to inform regulatory decisions.
From page 148...
... key inputs required by the model are found to be incorrect or out of date -- for example, demographic data that are 30 years old and no longer updated. An example of a systematic approach to scientific and technical acceptability criteria for scientific assessments, including those based on environmental modeling, is shown in the activities of the RIVM Environmental Assessment Agency (RIVM/MNP)
From page 149...
... Thus, there is typically no consideration of how long-term model evaluation will occur throughout a model's life stages. Under the heading "Model Evaluation" in the CREM database, most models present individual statements, such as

• "Currently undergoing beta-testing and model evaluation…."
• "Code verification, sensitivity analysis, and qualitative and quantitative uncertainty analysis have been performed.
From page 150...
... CMAQ, the community multiscale air quality modeling system, which is discussed in previous chapters, has been designed to approach air quality in an integrated fashion by including state-of-the-science capabilities for modeling multiple air quality issues, including tropospheric ozone, fine particles, toxics, acid deposition, and visibility degradation. TRIM.FaTE is a spatially explicit, compartmental mass-balance model that describes the movement and transformation of pollutants over time through a user-defined, bounded system that includes biotic and abiotic compartments (EPA 2003g)
From page 151...
... This group published a guidance document on the use of multimedia models for estimating environmental persistence and long-range transport. From 2003 to 2004, the expert group performed an extensive comparison of nine available multimedia fate and transport models to compare and assess their performance (Fenner et al.
From page 152...
... and explain its intended uses.

• Use a thematic structure or diagram to summarize all the elements of the evaluation plan -- in particular, the elements that will be used in different stages of model development and application (elements such as the conceptual model, data, model testing, and model application)
From page 153...
... It also could be helpful for models with large regulatory impacts or complex scientific issues to have a periodic peer review or peer advisory process in which the peers interact with the model developers and users throughout the model's life. As noted in EPA's most recent version of its peer review guidance, the agency is beginning to appreciate that obtaining peer review earlier in the development of scientific products might be desirable (EPA 2006a)
From page 154...
... To obtain such an in-depth peer review, the committee sees the need for support in the form of compensation and perhaps in running the model for conditions that the reviewers specify. The committee considers such peer review to be part of the cost of building and using models, especially models with a large impact on regulatory activities.
From page 155...
... The groups involved in the environmental regulatory process can be risk taking, risk averse, and risk managing, to name but three classes of perspective (Thompson 1989)
From page 156...
... given the local impacts of these model-based regulatory decisions, the general public can invest considerable resources in overseeing the quality of EPA's cleanup models and can even obtain grants to hire technical experts to review EPA's technical assessments (40 CFR, Part 35, Subpart M [Technical Assistance Grants]
From page 157...
... However, even with the widespread use of models at EPA, there has been little attempt to generalize prior experiences with models and classes of models into systematic improvements for the future. One reason may be the agency's reluctance to disclose errors, criticisms, and shortcomings in the adversarial and legally constrained settings in which environmental regulatory modeling activities often occur.
From page 158...
... As noted, Bredehoeft's work suggests that groundwater models are subject to surprises that show their underlying conceptual models to be invalid. Bredehoeft reported that one suggestion arising from that observation is to carry alternative conceptual models

BOX 4-6 Retrospective Analysis of Model Predictions

Retrospective analysis of environmental regulatory models often occurs when particular model predictions are later compared to measurements or results from other models.
From page 159...
... . These papers described alternative conceptualizations of the health risks that are incompatible with each other but that, at the time of the analyses, were supported by some data.
From page 160...
... With a long-term perspective, there will be cases in which it is possible to compare model results with data that were not available when the models were built. A key benefit of retrospective evaluations of individual models and of model classes is the identification of priorities for improving models.
From page 161...
... For all models used in the regulatory process, the agency should begin by developing a life-cycle model evaluation plan commensurate with the regulatory application of the model (for example, the scientific complexity, the precedent-setting potential of the modeling approach or application, the extent to which previous evaluations are still applicable, and the projected impacts of the associated regulatory decision)
From page 162...
... However, given the importance of modeling activities in the regulatory process, such investments are critical to enable environmental regulatory modeling to meet challenges now and in the future.

Peer Review

Peer review is an important tool for improving the quality of scientific products and is basic to all stages of model evaluation.
From page 163...
... However, in assessing environmental regulatory issues, these analyses generally would be quite complicated to carry out convincingly, especially when some of the uncertainties in critical parameters have broad ranges or when the parameter uncertainties are difficult to quantify. Thus, although probabilistic uncer
From page 164...
... Such problems are compounded when models are linked into a highly complex system, for example, when emissions and meteorological model results are used as inputs into an air quality model. At the other extreme, scenario assessment and/or sensitivity analysis could be used.
From page 165...
... Communicating Uncertainty

Probabilistic uncertainty analysis should not be viewed as a means to turn uncertain model outputs into policy recommendations that can be made with certitude. Whether or not a complete probabilistic uncertainty analysis has been done, the committee recommends that various approaches be used to communicate the results of the analysis.
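One such approach is to report the same uncertain output in several complementary forms, for example a central estimate, an interval, and an exceedance probability. A sketch with stand-in samples and a hypothetical threshold:

```python
# Sketch of presenting one uncertain model output several complementary
# ways rather than as a single number. The output samples and the
# regulatory threshold are hypothetical stand-ins.

import statistics

output_samples = sorted(50 + 2 * i for i in range(101))  # stand-in model output
threshold = 200.0                                         # hypothetical standard

summary = {
    "median": statistics.median(output_samples),
    "90% interval": (output_samples[5], output_samples[95]),
    "P(exceeds threshold)": sum(x > threshold for x in output_samples)
                            / len(output_samples),
}
```

A decision maker who sees only the median would miss that a nontrivial fraction of the distribution exceeds the threshold; the three summaries together convey what any single one hides.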
From page 166...
... Retrospective Analysis of Models

EPA has been involved in the development and application of computational models for environmental regulatory purposes for as long as the agency has been in existence. Its reliance on models has only increased over time.
From page 167...
... Requirements such as those in the Information Quality Act may increase the susceptibility of models to challenges because outside parties may file a correction request for information disseminated by agencies. When a model that informs a regulatory decision has undergone the multilayered review and comment processes, the model tends to remain in place for some time.
From page 168...
... Finally, without adequate documentation, EPA might be limited in its ability to justify decisions that were critical to model design, development, or model selection.

Recommendations

As part of the evaluation plan, a documented history of important events regarding the models should be maintained, especially after public release.
From page 169...
... One possible way to implement the recommendation for developing and maintaining the model history may be to expand CREM's efforts in this direction. The EPA Science Advisory Board review of CREM contains additional recommendations with regard to specific improvements in CREM's database.

