# Analysis of Cancer Risks in Populations Near Nuclear Facilities: Phase 1 (2012)

## Chapter: Appendix J: Modeling Incidence and Mortality Data in an Ecologic Study

Page 389
Suggested Citation:"Appendix J: Modeling Incidence and Mortality Data in an Ecologic Study." National Research Council. 2012. Analysis of Cancer Risks in Populations Near Nuclear Facilities: Phase 1. Washington, DC: The National Academies Press. doi: 10.17226/13388.


A starting point for ecologic modeling of cancer rates is Poisson regression for rates and counts. In classic Poisson regression, a count $N_i$ of some data item (e.g., a count of childhood leukemias) is modeled as a Poisson random variable with probability mass function

$$
\Pr(N_i = n) = \frac{\mu_i^{\,n}\, e^{-\mu_i}}{n!}, \qquad n = 0, 1, 2, \ldots \tag{1}
$$

Here $\mu_i$ is the expected value of $N_i$, i.e., the expected number of incident cancer cases or cancer deaths in a particular geographic unit, typically cross-classified by other variables such as age, gender, and race/ethnicity, with $i$ indexing the cells of this cross-classification. In Poisson regression the mean $\mu_i$ is unknown but is assumed to be a function of known covariates. For example, in generalized linear models (McCullagh and Nelder, 1989), the model for the mean involves a covariate vector $X_i = (X_{i1}, X_{i2}, \ldots, X_{ip})^T$ observed for each $i$. The components of $X_i$ may be continuous variables, such as dose, or indicator variables for the levels of categorical variables. The generalized linear model for $\mu_i$ has the form

$$
g(\mu_i) = X_i^T \alpha = \sum_{j=1}^{p} X_{ij}\, \alpha_j. \tag{2}
$$

Here $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$, where $\alpha_1$ is the regression coefficient relating covariate value $X_{i1}$ to the mean $\mu_i$, $\alpha_2$ relates $X_{i2}$ to $\mu_i$, and so on, and $g$ is a link function. For example, when (as is often the case) $g$ is the log function, the model is equivalent to

$$
\mu_i = \exp\left(X_i^T \alpha\right). \tag{3}
$$
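As a small illustration, the Python sketch below evaluates the Poisson probability function on the log scale and the mean under a log link for a single cell. The function names and the covariate values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import math

def poisson_log_pmf(n: int, mu: float) -> float:
    """log Pr(N = n) for a Poisson count with mean mu."""
    # math.lgamma(n + 1) equals log(n!) and avoids overflow for large n
    return n * math.log(mu) - mu - math.lgamma(n + 1)

def log_link_mean(x: list[float], alpha: list[float]) -> float:
    """Mean under the log link: mu_i = exp(X_i^T alpha)."""
    return math.exp(sum(xj * aj for xj, aj in zip(x, alpha)))

# Hypothetical cell: intercept term plus one indicator covariate
mu = log_link_mean([1.0, 1.0], [math.log(2.0), math.log(1.5)])
print(round(mu, 6))                                # 3.0
print(round(math.exp(poisson_log_pmf(0, mu)), 6))  # 0.049787, i.e. exp(-3)
```

Under the log link each coefficient acts multiplicatively: here the indicator covariate multiplies the baseline mean of 2 by a rate ratio of 1.5.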

When $N_i$ counts the number of events observed over a period of $t_i$ years among a known number of individuals, $k_i$, the person-years of observation, $py_i = t_i k_i$, are made part of the model as

$$
\mu_i = py_i \exp\left(X_i^T \alpha\right), \tag{4}
$$

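so that the expected count equals the person-years of observation multiplied by a rate that depends on the covariates. Equivalently, $\log \mu_i = \log py_i + X_i^T \alpha$, so $\log py_i$ enters the linear predictor as an offset whose coefficient is fixed at 1.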
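The offset formulation can be sketched in Python with a hand-written Newton-Raphson fit of $\log \mu_i = \log py_i + a_0 + a_1 x_i$, that is, one covariate plus an intercept. This is an illustrative sketch under that simplifying assumption, not the software used in any particular study; in practice a GLM routine that accepts an offset would be used. The function name and the data are hypothetical.

```python
import math

def fit_poisson_offset(counts, x, py, iters=50, tol=1e-10):
    """Fit log(mu_i) = log(py_i) + a0 + a1 * x_i by Newton-Raphson.

    log(py_i) is an offset (coefficient fixed at 1). Illustrative only:
    a single covariate plus an intercept.
    """
    a0, a1 = math.log(sum(counts) / sum(py)), 0.0  # start at the crude rate
    for _ in range(iters):
        mu = [p * math.exp(a0 + a1 * xi) for p, xi in zip(py, x)]
        # Score vector U = X^T (N - mu)
        u0 = sum(n - m for n, m in zip(counts, mu))
        u1 = sum((n - m) * xi for n, m, xi in zip(counts, mu, x))
        # Information matrix X^T diag(mu) X (2 x 2, symmetric)
        i00 = sum(mu)
        i01 = sum(m * xi for m, xi in zip(mu, x))
        i11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = i00 * i11 - i01 * i01
        # Newton step = I^{-1} U, written out for the 2 x 2 case
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        a0, a1 = a0 + d0, a1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return a0, a1

# Hypothetical cells: person-years, a covariate (e.g., a dose category), counts
py = [10000.0, 8000.0, 6000.0, 4000.0]
x = [0.0, 1.0, 2.0, 3.0]
counts = [12, 14, 9, 11]
a0, a1 = fit_poisson_offset(counts, x, py)
print(f"baseline rate per 1,000 person-years: {1000 * math.exp(a0):.2f}")
print(f"rate ratio per unit of x: {math.exp(a1):.2f}")
```

Because the intercept is in the model, the fitted counts sum to the observed total; cells with large person-years but few events pull the rate down exactly as the offset intends.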

In the setting described here, $N_i$ would correspond to a single entry in a cross-tabulation of events (deaths due to, or incident cases of, a particular cancer) by geographical unit and by gender, race, age, calendar time, and any other relevant variable known about the cases (from the cancer registry). For each cell in the table, the number of events and the person-years at risk, $py_i$, must be calculated (see discussion below). In addition, the variable of interest, dose $D_i$, and other covariates available for each geographical unit (e.g., indices of socioeconomic status) are required for each table entry $i$.
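A minimal sketch of assembling such a table in Python follows. It assumes a hypothetical registry extract in which each case is already coded to a cell key (tract, sex, age group, period; race is omitted for brevity) and a hypothetical dictionary of person-years for every cell. For simplicity the dose surrogate is held constant per tract here, although in practice it would also vary by period.

```python
from collections import Counter

def build_cells(cases, person_years, tract_covariates):
    """Assemble one record per table cell i with N_i, py_i and tract covariates.

    cases: iterable of cell keys (tract, sex, age_group, period), one per case.
    person_years: dict mapping every cell key to its person-years at risk.
    tract_covariates: dict mapping tract -> dict of tract-level covariates.
    """
    counts = Counter(cases)
    unknown = set(counts) - set(person_years)
    if unknown:
        # A case whose cell has no population denominator signals a data problem
        raise ValueError(f"cases in cells without person-years: {sorted(unknown)}")
    rows = []
    for key, py in person_years.items():
        # Cells with no cases must be kept: N_i = 0 is informative
        rows.append({"cell": key, "N": counts.get(key, 0), "py": py,
                     **tract_covariates[key[0]]})
    return rows

# Hypothetical registry extract and denominators
cases = [("T01", "F", "0-4", "1990-94"), ("T01", "F", "0-4", "1990-94"),
         ("T02", "M", "5-9", "1995-99")]
person_years = {("T01", "F", "0-4", "1990-94"): 5200.0,
                ("T01", "M", "0-4", "1990-94"): 5400.0,
                ("T02", "M", "5-9", "1995-99"): 3100.0}
tract_covariates = {"T01": {"dose": 0.0, "ses": 0.4},
                    "T02": {"dose": 1.2, "ses": -0.1}}
for row in build_cells(cases, person_years, tract_covariates):
    print(row)
```

Driving the loop from the person-year table rather than from the cases ensures that zero-count cells, which carry information about low rates, are not silently dropped.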

A variation on this model, known as the linear excess relative risk (ERR) model, is commonly used in radiation epidemiology. The linear ERR model incorporates dose into the model for $\mu_i$ as

$$
\mu_i = py_i \exp\left(X_i^T \alpha\right)\left(1 + \beta D_i\right). \tag{5}
$$

Here $py_i \exp(X_i^T \alpha)$ is the background rate of disease (the rate for unexposed cells) multiplied by the person-years at risk, and the ERR parameter $\beta$ is the excess relative risk per unit of dose or dose surrogate $D_i$. Much more complex models can be considered, and software for generalized Poisson regression is available (e.g., Epicure, Hirosoft Software, Seattle, Washington). The background rate of disease is allowed to vary by race, gender, age, and calendar time (for example, so that disease rates can differ by age and age-specific rates can vary by calendar year). Covariates in ecologic models are not individual-level covariates but summaries obtained for each geographical unit, and these summaries can also vary in time; for example, some socioeconomic variables may be available at the census-tract level and may change over the period of interest. Such variables are incorporated by including (categories of) calendar time as a cross-classification variable.
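A minimal Python sketch of model (5) is given below. To keep it short it assumes the background expected counts $py_i \exp(X_i^T \alpha)$ are already known and estimates only $\beta$ by Newton-Raphson; a real analysis (e.g., in Epicure) would estimate $\alpha$ and $\beta$ jointly. Function names and data are hypothetical.

```python
import math

def err_mean(py, x, alpha, dose, beta):
    """Model (5): mu_i = py_i * exp(X_i^T alpha) * (1 + beta * D_i)."""
    return py * math.exp(sum(xj * aj for xj, aj in zip(x, alpha))) * (1.0 + beta * dose)

def fit_err_beta(counts, background, doses, beta=0.0, iters=100, tol=1e-12):
    """Newton-Raphson MLE of beta with background counts b_i held fixed.

    background[i] plays the role of py_i * exp(X_i^T alpha). Holding it
    fixed is a simplification. Returns (beta_hat, Poisson-based SE).
    """
    def info_at(b):
        # Observed information: sum N_i D_i^2 / (1 + b D_i)^2
        return sum(n * d * d / (1.0 + b * d) ** 2 for n, d in zip(counts, doses))

    for _ in range(iters):
        # Score of the Poisson log-likelihood with respect to beta
        score = sum(n * d / (1.0 + beta * d) - bg * d
                    for n, bg, d in zip(counts, background, doses))
        step = score / info_at(beta)
        # Halve the step if needed so that 1 + beta * D_i stays positive
        while any(1.0 + (beta + step) * d <= 0.0 for d in doses):
            step /= 2.0
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / math.sqrt(info_at(beta))

# Hypothetical cells: expected background counts, doses, observed counts
background = [120.0, 80.0, 40.0, 20.0]
doses = [0.0, 0.5, 1.0, 2.0]
counts = [118, 86, 47, 26]
beta_hat, se = fit_err_beta(counts, background, doses)
print(f"ERR per unit dose: {beta_hat:.3f} (Poisson SE {se:.3f})")
```

The standard error returned here assumes purely Poisson variation; the overdispersion discussion in Section J.3 explains why it is likely to be too small.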


### J.1 DOSE AND DOSE SURROGATES

The presumed effect of the dose or dose surrogate variable $D_i$ on risk in model (5) is much simpler (involving only the ERR parameter $\beta$) than the model for the background risk (involving the many additional parameters in $\alpha$); however, $D_i$ will also vary in time. For example, if $D_i$ is the cumulative dose from a particular nearby plant to representative individuals, then $D_i$ for all census tracts near that plant would be zero until the plant began operating and would accumulate over time during operation. Even much simpler dose surrogates (e.g., exposed or not exposed according to distance) should reflect the startup time of each plant or facility.
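The time dependence can be sketched in Python as follows. The constant annual dose, the 30 km radius, and the convention of counting the startup year as a full year of operation are all hypothetical choices for illustration.

```python
def cumulative_dose(year, startup_year, annual_dose):
    """Cumulative dose surrogate D for a tract in a given calendar year.

    Zero before the plant starts operating; afterwards it accumulates one
    year of a (hypothetical, constant) annual dose per year of operation,
    counting the startup year as a full year.
    """
    return annual_dose * max(0, year - startup_year + 1)

def exposed(year, startup_year, distance_km, radius_km):
    """Distance-based surrogate that still respects plant startup:
    1 only if the tract is within radius_km AND the plant is operating."""
    return int(distance_km <= radius_km and year >= startup_year)

# Hypothetical plant starting in 1973; tract 12 km away, 0.5 dose units per year
for year in range(1970, 1977):
    print(year, cumulative_dose(year, 1973, 0.5), exposed(year, 1973, 12.0, 30.0))
```

Each cell's $D_i$ is then evaluated at the cell's calendar period, so the same tract contributes unexposed person-years before startup and exposed person-years afterwards.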

Other factors may also need to be considered in the calculation of $D_i$. For example, if the population around a particular plant or facility is known to have been highly mobile over the period of exposure, it would be desirable to incorporate that mobility into the calculation of $D_i$ in order to approximate the average cumulative dose to the individuals in each census tract for each time period considered. If distance is used as a dose surrogate, time-weighted distance could also be considered.
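Two simple versions of these ideas are sketched below in Python. Both are hypothetical: the residence-history format is assumed, and the mobility adjustment is deliberately crude, ignoring dose received elsewhere and dose to people who have moved away.

```python
def time_weighted_distance(history):
    """history: (years_at_address, distance_km) pairs for one person
    (hypothetical format). Returns sum(t_k * d_k) / sum(t_k)."""
    total = sum(t for t, _ in history)
    if total <= 0:
        raise ValueError("residence history must cover a positive duration")
    return sum(t * d for t, d in history) / total

def mobility_adjusted_mean_dose(move_in_years, year, startup_year, annual_dose):
    """Mean cumulative dose among a tract's current residents in `year`,
    where each resident accrues annual_dose only for years lived in the
    tract since plant startup (a crude, hypothetical adjustment)."""
    doses = [annual_dose * max(0, year - max(m, startup_year) + 1)
             for m in move_in_years]
    return sum(doses) / len(doses)

print(time_weighted_distance([(5, 2.0), (15, 10.0)]))             # 8.0
print(mobility_adjusted_mean_dose([1960, 1980], 1985, 1975, 1.0))  # 8.5
```

In the second example a resident present since before the 1975 startup would have accumulated 11 dose units by 1985, but averaging over the two residents gives 8.5, illustrating how in-migration lowers the average cumulative dose in a tract.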

### J.2 PERSON-YEAR CALCULATIONS

Another key issue in Poisson modeling is to adequately approximate the person-years at risk, $py_i$, as well as to count the number of events $N_i$. For each cell in the tabulation of events cross-classified by geographical unit, race, age, and calendar time, census data are required to determine the population size; i.e., the whole population must be classified according to the same variables. Data from each decennial census must be interpolated to the non-census years. The accuracy of the person-year approximations affects the modeling of $N_i$ by Poisson regression, and inaccuracy in the estimated person-years is one reason (among many) to expect that the Poisson model may not adequately capture the variability of the observed counts $N_i$.
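One simple way to carry out the interpolation is sketched below in Python: linear interpolation between censuses for a single cell, constant values before the first census, and linear extrapolation from the last two censuses afterwards. These rules, and treating each year's estimated population as that year's person-years, are illustrative assumptions rather than a prescribed method; the census values are hypothetical.

```python
def interpolate_population(census, year):
    """Population of one cell in `year` from decennial census counts.

    census: dict {census_year: population}. Linear interpolation between
    censuses; constant before the first census; linear extrapolation from
    the last two censuses afterwards (floored at zero).
    """
    years = sorted(census)
    if len(years) == 1 or year <= years[0]:
        return float(census[years[0]])
    for lo, hi in zip(years, years[1:]):
        if year <= hi:
            w = (year - lo) / (hi - lo)
            return (1.0 - w) * census[lo] + w * census[hi]
    lo, hi = years[-2], years[-1]
    slope = (census[hi] - census[lo]) / (hi - lo)
    return max(0.0, census[hi] + slope * (year - hi))

def person_years(census, first_year, last_year):
    """Approximate person-years as the sum of annual population estimates."""
    return sum(interpolate_population(census, y)
               for y in range(first_year, last_year + 1))

census = {1990: 1000, 2000: 1200, 2010: 1100}  # hypothetical cell populations
print(interpolate_population(census, 1995))    # 1100.0
print(person_years(census, 1990, 1999))
```

Years after the last census rely on extrapolation, which is exactly where the person-year denominators are least certain.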

### J.3 OVERDISPERSION

It is likely that the observed counts $N_i$ will depart from the Poisson distribution in ways that must be adequately accommodated when fitting regression models such as (5). If $N_i$ follows a Poisson distribution, its variance equals its mean, $\mu_i$. However, there are good reasons to expect the actual variability of $N_i$ to be greater than the Poisson distribution predicts. For example, as mentioned above, the population size, and hence the person-years, will not be known exactly, at least for non-census years. More importantly, other known and unknown risk factors that influence disease occurrence are not accounted for by the variables used in the ecologic regression. Even if those risk factors are completely independent of distance or dose from a plant or facility, they will still increase the dispersion of $N_i$ while leaving the model for the mean unaffected. Ignoring overdispersion will lead to underestimation of the standard errors of the regression parameter estimates, including those of most interest (i.e., $\beta$).

The treatment of overdispersion in Poisson regression models has been considered by a number of authors (Liu and Pierce, 1993; McCullagh and Nelder, 1989; Moore, 1986). A simple and usually effective approach (McCullagh and Nelder, 1989) is to fit the mean model by Poisson regression and then to estimate an overdispersion parameter $\sigma^2 \geq 1$, so that the variance of $N_i$ is estimated to be $\sigma^2 \mu_i$. Inference about the significance of the parameters of interest (i.e., $\beta$) is then performed after adjusting the usual standard error estimates (which assume the Poisson model); under this variance model, the Poisson-based standard errors are multiplied by $\hat{\sigma}$. A method-of-moments approach to fitting this and similar models is described by Moore (1986). More generally, the “sandwich estimator” of Zeger and Liang (1986) can be used to compute variances of the parameter estimates that adequately reflect the variability of the counts.

The overall approach described above relates observed disease rates to distance or other dose surrogates in a systemic way, i.e., it addresses the question of whether disease risk appears to be associated with proximity to a nuclear facility, or with other dose surrogates, averaged over all the facilities. For some common cancers it will be possible to conduct site-specific analyses, i.e., to ask whether proximity to a specific facility or plant is associated with risk. Such analyses are subject to concerns about multiple comparisons (as described in the main text) and may also be particularly sensitive to the problem of overdispersion described above.
If one uses an uncorrected test, i.e., a test based on the assumption that the Poisson distribution holds exactly, then it is very likely that, for some cancers at some sites, proximity will appear “significantly” associated with risk, even though the inference would differ greatly if purely Poisson variation of the counts were not assumed. Estimating overdispersion terms $\sigma^2 \geq 1$ (or otherwise treating overdispersion, as in a random-effects analysis) is crucial to avoid overinterpreting random fluctuations that are simply greater in magnitude (owing to unmeasured characteristics affecting disease risk) than expected under the Poisson model. These problems arise in many different settings and have been described by a number of authors (e.g., Efron, 1992). Modeling both the mean (as in equation (5)) and the variance of the counts will be essential to avoid unrealistic inference from these models; this is true both for the overall analysis of risk in relation to plant proximity and especially for site-specific analyses.
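To illustrate the point about uncorrected tests, the Python sketch below simulates sites with no true proximity effect but with extra-Poisson variation from a hypothetical unmeasured site-level factor. The gamma-Poisson (negative binomial) model, the expected count of 50 per site, and $\sigma^2 = 3$ are assumptions for illustration, and each site's expected count is treated as known. The sketch estimates $\sigma^2$ as the Pearson chi-square divided by its degrees of freedom, a moment estimator of the kind described by McCullagh and Nelder (1989), floors it at 1, and compares how often a pure-Poisson z test and a $\hat{\sigma}$-adjusted test flag a site at the nominal 5% level. Dividing z by $\hat{\sigma}$ is equivalent to multiplying the Poisson standard error by $\hat{\sigma}$.

```python
import math
import random

def pearson_dispersion(counts, fitted, n_params):
    """Moment estimate of sigma^2: Pearson chi-square / residual df."""
    df = len(counts) - n_params
    if df <= 0:
        raise ValueError("need more cells than fitted parameters")
    return sum((n - m) ** 2 / m for n, m in zip(counts, fitted)) / df

def poisson_draw(rng, lam):
    """Knuth's multiplication method; fine for the moderate means used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_sites(n_sites=2000, expected=50.0, sigma2=3.0, seed=1):
    """Site counts with NO proximity effect but extra-Poisson noise.

    A gamma-distributed site factor with mean 1 multiplies each site's
    expected count, so E[N] = expected and Var[N] = sigma2 * expected.
    """
    if sigma2 <= 1.0:
        raise ValueError("sigma2 must exceed 1 for this gamma-Poisson model")
    rng = random.Random(seed)
    shape = expected / (sigma2 - 1.0)  # from mu + mu^2 / shape = sigma2 * mu
    return [poisson_draw(rng, rng.gammavariate(shape, expected / shape))
            for _ in range(n_sites)]

def flag_rates(counts, expected):
    """Fraction of sites with |z| > 1.96, without and with the sigma correction."""
    # Expected counts are treated as known, so no parameters are fitted (df = n)
    s2 = max(pearson_dispersion(counts, [expected] * len(counts), 0), 1.0)
    z = [(n - expected) / math.sqrt(expected) for n in counts]
    naive = sum(abs(v) > 1.96 for v in z) / len(z)
    adjusted = sum(abs(v) / math.sqrt(s2) > 1.96 for v in z) / len(z)
    return s2, naive, adjusted

counts = simulate_sites()
s2, naive, adjusted = flag_rates(counts, 50.0)
print(f"sigma^2 estimate {s2:.2f}; flagged: Poisson test {naive:.1%}, adjusted {adjusted:.1%}")
```

With these settings one should expect the uncorrected test to flag roughly a quarter of the sites, even though none has a true effect, against about 5% after adjustment; the exact fractions depend on the random seed.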


### REFERENCES

Efron, B. (1992). Poisson overdispersion estimates based on the method of asymmetric maximum likelihood. Journal of the American Statistical Association 87.

Liu, Q., and D. A. Pierce (1993). Heterogeneity in Mantel-Haenszel-type models. Biometrika 80(3):543-556.

McCullagh, P., and J. Nelder (1989). Generalized linear models, 2nd edition. Boca Raton, FL: CRC Press.

Moore, D. F. (1986). Asymptotic properties of moment estimates for overdispersed counts and proportions. Biometrika 73(3):583-588.

Zeger, S., and K. Liang (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42:121-130.


In the late 1980s, the National Cancer Institute initiated an investigation of cancer risks in populations near 52 commercial nuclear power plants and 10 Department of Energy nuclear facilities (including research and nuclear weapons production facilities and one reprocessing plant) in the United States. The results of the NCI investigation were used as a primary resource for communicating with the public about the cancer risks near the nuclear facilities. However, this study is now over 20 years old. The U.S. Nuclear Regulatory Commission (USNRC) requested that the National Academy of Sciences provide an updated assessment of cancer risks in populations near USNRC-licensed nuclear facilities that utilize or process uranium for the production of electricity.

Analysis of Cancer Risks in Populations near Nuclear Facilities: Phase 1 focuses on identifying scientifically sound approaches for carrying out an assessment of cancer risks associated with living near a nuclear facility, and on judging the strengths and weaknesses of the various approaches in terms of statistical power, ability to assess potential confounding factors, possible biases, and required effort. The results from this Phase 1 study will be used to inform the design of the cancer risk assessment, which will be carried out in Phase 2. This report is beneficial for the general public, communities near nuclear facilities, stakeholders, healthcare providers, policy makers, state and local officials, community leaders, and the media.
