D Prognosis and Clinical Predictive Models for Critically Ill Patients
George E. Thibault, M.D.*
The development of clinical predictive models in intensive care units has represented a significant advancement for clinicians, clinical investigators, intensive care unit (ICU) directors, and quality assurance managers. Early classification systems, such as the Killip Class for acute myocardial infarction, the Glasgow Coma Score, and other disease-specific scoring systems, were clinically based, simple, and widely embraced. Over the past decade, more complex models have been developed with the aid of computerized databases and statistical modeling. Knaus' APACHE model, now in its third version, has been the best known of these systems and has been disseminated on a worldwide basis. A similar model has been developed by Pollack for use in pediatric ICUs (PRISM). Both the APACHE and PRISM models use a consensus-derived scoring system that has subsequently been validated in a variety of settings. Lemeshow and Teres have developed a mortality prediction model (MPM) based on empirical data.
More complex than their predecessors, these systems demonstrate greater precision in classifying patients into risk groups for likelihood of survival. The wide acceptance of these models and their demonstrated ability to place patients accurately in risk strata have led to the hope that they will be useful to the clinician in making individual patient decisions. Specifically, the hope is expressed that these models may enable us to make more appropriate and timely decisions regarding the withdrawal or termination of care in critically ill ICU patients. Although it is attractive to use these models to place such difficult decisions on a more rational basis informed by patient risk, we should be cautious in embracing them for this purpose. This essay will briefly touch on several problems related to using the models in this way.
*Chief of Medicine, Brockton-West Roxbury Veterans Affairs Medical Center, West Roxbury, Massachusetts. Prepared for the Institute of Medicine Committee for the Feasibility Study on Care at the End of Life. (See Appendix A of this report.)
There are statistical limitations.
The models produce probability estimates. They are developed and validated on large patient populations, and the data and models must be valid and reliable for aggregated groups of patients to satisfy statistical and methodological requirements. They have not been validated for individual patient decisions. For example, a model that predicts 50 percent mortality for 100 ICU patients may prove to be 100 percent accurate when the next 100 ICU patients are viewed as a group; but if the model is used to identify the 50 individual patients who will live and the 50 who will die, it is theoretically possible to misclassify every patient and still be 100 percent accurate in aggregate.
Among patients with a very high probability (>90 percent) of death, different problems exist. The calibration of the models is most suspect at the extremes of probability; the models invariably perform best in the midrange of probabilities and are most useful in that range when applied to aggregated patients. Models lack statistical power among very high-risk patients because of the small number of cases in the very high-risk strata. The confidence limits around predictions of a very high risk of death are therefore likely to include probabilities that would make both clinicians and family members cautious about discontinuing therapy. In other words, the ability to predict death with sufficient certainty is unlikely to be achievable on statistical grounds alone: the size of the database that would be required for statistical certainty among very high-risk patients is unlikely ever to be assembled within realistic cost and time constraints.
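A minimal numerical sketch (not from the essay, and using hypothetical numbers) can make both statistical points concrete: perfect aggregate calibration can coexist with complete individual misclassification, and a small very-high-risk stratum yields wide confidence limits.

```python
import math

# --- Aggregate accuracy vs. individual accuracy -------------------------
# Hypothetical cohort: patients 0-49 die (1), patients 50-99 survive (0).
actual = [1] * 50 + [0] * 50
# The model also predicts 50 deaths overall, but picks exactly the wrong half.
predicted = [0] * 50 + [1] * 50

aggregate_rate = sum(predicted) / len(predicted)  # 0.5, matching the observed rate
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(actual)  # 0.0

# --- Confidence limits in a small very-high-risk stratum ----------------
# Suppose 19 of 20 patients in a ">90 percent risk" stratum die. The 95%
# Wilson score interval around the observed rate still dips well below 90%.
deaths, n, z = 19, 20, 1.96
p_hat = deaths / n
center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
lower, upper = center - half, center + half

print(f"aggregate rate {aggregate_rate:.0%}, individual accuracy {accuracy:.0%}")
print(f"high-risk stratum 95% CI: ({lower:.2f}, {upper:.2f})")
```

The lower confidence bound in the 20-patient stratum falls well under 90 percent, illustrating why a small high-risk subgroup cannot, by itself, justify a confident prediction of death for any individual patient.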
The models are inherently imperfect.
The models are in need of constant revision; hence APACHE III. This should give us pause about viewing the most current version of a model as definitive enough to make life-or-death decisions on the model alone. These models have a number of potential limitations. Data elements used in the model may be missing or inaccurate. The time at which the variables are captured and their relationships to interventions are also potential sources of bias (see below). The model must be proven to give the same predictive accuracy when applied in different settings and over time (it must be "robust"). An ideal model would also apply to a patient population different from the one upon which it was generated, or at least the patients for whom it is not applicable should be known. Models lack generalizability if they are "overfitted" to a particular population and hence disproportionately reflect outliers or idiosyncratic values in that data set. Models may also fail to distinguish between data elements that represent the process of care and those that more truly represent the patient's condition, or between the effects of chronic disease and those of acute physiological derangement.
The model may not adequately account for disease specificity or the effect of intervention.
When APACHE was first developed, it was claimed that the predictive accuracy of this physiological scoring system was independent of disease process. In its third iteration, it is acknowledged that the intercepts differ for different diseases, very likely in part because of the variable effectiveness of interventions in different diseases. One could therefore predict that these relationships will change over time as new interventions become available or as understanding of basic pathophysiology improves. A given "score" might thus imply a different prognosis depending on the cause of the physiological derangement (e.g., severe metabolic acidosis due to diabetic ketoacidosis compared with cardiogenic shock).
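The effect of a disease-specific intercept can be sketched with a generic logistic model. The coefficients below are purely illustrative assumptions (the actual APACHE III coefficients are proprietary); the point is only that the same physiology score can imply very different mortality depending on which disease's intercept applies.

```python
import math

def predicted_mortality(score, intercept, slope=0.05):
    """Generic logistic model: probability of death from a physiology score.

    The disease-specific intercept shifts the whole risk curve up or down.
    Intercept and slope values here are hypothetical, chosen only to
    illustrate the idea; they are not actual APACHE III coefficients.
    """
    return 1 / (1 + math.exp(-(intercept + slope * score)))

# Hypothetical intercepts: the same physiological derangement carries
# different risk depending on its cause (e.g., diabetic ketoacidosis is
# readily reversible with insulin and fluids; cardiogenic shock often is not).
intercepts = {"diabetic ketoacidosis": -5.0, "cardiogenic shock": -1.5}

score = 40  # identical physiology score for both patients
for disease, b0 in intercepts.items():
    print(f"{disease}: predicted mortality {predicted_mortality(score, b0):.0%}")
```

Under these assumed coefficients, the identical score yields single-digit predicted mortality for the ketoacidosis patient but majority predicted mortality for the patient in cardiogenic shock, which is why a score alone, without its disease-specific calibration, is not a prognosis.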
Death is not the only outcome of interest.
The models are used almost exclusively to predict hospital-related death, and this is not the only outcome of interest to clinicians, patients, and families. Different models are likely to be needed to predict functional status, duration and quality of life following an ICU stay, and other outcomes that may be of importance to patients if we are going to use the models to decide when additional care is likely to be warranted or beneficial. Death is obviously the easiest end point to model, but if the models are going to be useful in ICU decisionmaking, we will need other end points to be modeled as well.
ICU illness is a dynamic process.
Models that use a single time point, frequently either entry to an intensive care unit or 24 hours later, fail to account for clinical changes over time that are an important part of clinical decisionmaking. Changes in a patient's clinical status add valuable information to predictive modeling, and several recent model development efforts incorporate this dynamic process (see Chang, 1989, and Lemeshow, 1988). Continued development of time-dependent models may further aid clinicians in making accurate prognostications.
The complexity of the models may make them less credible to clinicians and families.
The less accessible the models, the more they become a "black box." The β coefficients of the current APACHE system are not available in the literature because they are deemed proprietary information by the developers and marketers of the model. Other models are usable only with computer software. These observations do not necessarily make the models less accurate, but they do deny them face validity and credibility with the clinicians and families who will be the principal users in making end-of-life decisions.
There is a problem in determining the perspective to be taken by the modeler or the user of the model.
Are the models to be judged from the perspective of the individual patient or the perspective of society? If it is society's perspective, then some of the objections about the lack of power and calibration at the extremes may be less of a concern if, overall, the models have good predictive validity. For the physician or family trying to make the best decision possible for an individual patient, however, the performance characteristics of the model for that specific patient are of paramount importance.
All of these objections should not make us nihilistic about the use of models. I believe that the use of these models has furthered our understanding of the relationships between complex illness and outcomes in the ICU. They have refined our language and made our discussions and writings more precise. The models have certainly proven extremely useful in making comparisons from one ICU to another and in generating hypotheses about why care may be better and outcomes different in one unit versus another. They have been helpful in clinical research by ensuring that patients are grouped by severity of illness, a necessary step in understanding whether observed differences in outcomes are related to structure or process of care as opposed to differences in patient risk. I believe, however, that we are far from achieving the goal of using individual patient-level predictors to make the difficult and painful decisions regarding which critically ill patients may no longer benefit from intensive care. These clinical predictors are one more piece of information, like any other diagnostic test, to be used in the context of the full clinical picture informed by patient and family preferences. Knowledge of the results of these predictive
models and their methodological limits is useful to clinicians in framing the questions about end-of-life decisionmaking for themselves, other caregivers, patients, and families. The more accurate the predictive models and the narrower the confidence limits, the more useful they will be. The choice of a cutoff point for deciding that no more should be done will remain, I believe, an individual decision informed by, but not made exclusively by, these increasingly accurate predictive models.
I am indebted to my colleague, Dr. Jennifer Daley, for her contributions to the ideas in this essay.
1. Chang R.W.S. Individual outcome prediction models for intensive care units. Lancet 2(8655):143–146, 1989.
2. Daley J., Jencks S., Draper D., et al. Predicting hospital-associated mortality for Medicare patients. Journal of the American Medical Association 260(24):3617–3624, 1988.
3. Diamond G.A. Future imperfect: the limitations of clinical prediction models and the limits of clinical prediction. Journal of the American College of Cardiology 14(3):12A–22A, 1989.
4. Iezzoni L.I., Moskowitz M.A. A clinical assessment of MediSgroups. Journal of the American Medical Association 260(21):3159–3163, 1988.
5. Iezzoni L.I. "Black box" medical information systems: a technology needing assessment. Journal of the American Medical Association 265(22):3006–3007, 1991.
6. Jencks S.F., Daley J., Draper D., et al. Interpreting hospital mortality data: the role of clinical risk adjustment. Journal of the American Medical Association 260(24):3611–3616, 1988.
7. Knaus W.A., Draper E.A., Wagner D.P., et al. An evaluation of outcome from intensive care in major medical centers. Annals of Internal Medicine 104(3):410–418, 1986.
8. Knaus W.A., Wagner D.P., Draper E.A., Zimmerman J.E., Bergner M., Bastos P.G., et al. The APACHE III prognostic system: risk prediction of hospital mortality for critically ill hospitalized adults. Chest 100:1619–1636, 1991.
9. Knaus W.A., Wagner D.P., Zimmerman J.E., Draper E.A. Variations in mortality and length of stay in intensive care units. Annals of Internal Medicine 118:753–761, 1993.
10. Lemeshow S., Teres D., Avrunin J.S., Pastides H. Predicting the outcome of intensive care patients. Journal of the American Statistical Association 83:348–356, 1988.
11. Pollack M.M., Ruttimann U.E., Getson P.R. Accurate prediction of the outcome of pediatric intensive care. New England Journal of Medicine 316:134–139, 1987.
12. Pollack M.M., Getson P.R., Ruttimann U.E., Steinhart C.M., Kanter R.K., Katz R.W., et al. Efficiency of intensive care. A comparative analysis of eight pediatric intensive care units. Journal of the American Medical Association 258:1481–1486, 1987.
13. Selker H.P. Systems for comparing actual and predicted mortality rates: characteristics to promote cooperation in improving hospital care. Annals of Internal Medicine 118:820–822, 1993.
14. Teres D., Lemeshow S., Avrunin J.S., Pastides H. Validation of the mortality prediction model for ICU patients. Critical Care Medicine 15:208–213, 1987.
15. Zimmerman J.E., Shortell S.M., Rousseau D.M., Duffy J., et al. Improving intensive care: observations based on organizational case studies in nine intensive care units. A prospective, multicenter study. Critical Care Medicine 21(10):1443–1451, 1993.