A
Statistical Details
MIXTURE MODELS
One can naturally model the unobserved population heterogeneity or extrapopulation heterogeneity via mixture models. The simplest and most natural occurrence of the mixture model arises when one samples from a super population g which is a mixture of a finite number, say m, of populations (g_{1},…, g_{m}), called the components of the population. Suppose a sample from a super population g is recorded as data (Y_{i}, J _{i}) for i = 1, …, n, where Y_{i} = y_{i} is a measurement on the i^{th} sampled unit and J_{i} = j_{i} indicates the index number of the component to which the unit belongs. If sampling is done from the j^{th} component, an appropriate probability model for the sampling distribution is given by
P(Y_{i} = y_{i}J_{i} = j) = f_{j}(y_{i}; θ_{j})
The function f_{j}(y_{i}; θ_{j}) represents a density function, called the component density for the i^{th} observation y_{i} and the parameter θ_{j} Θ_{j} ^{d}_{j} called the component parameter, that describes the specific attributes of the j^{th} component population, g_{j}.
We treat the J_{i} as missing data and define the latent random variable Φ= (Ф_{1},…, Ф_{n}) to be the values of the parameter θ {θ_{1}, …,θ_{m}} corresponding to the sampled components J_{1}, …, J_{n}; i.e., if the i^{th} observation came from the j^{th} component, then define Ф_{i} = θ_{j}. Then the Ф_{i} are a random sample from the discrete probability measure Q that assigns a positive probability π_{j} to the support point θ_{j} for j = 1, …, m. That is, the latent
random variable Φ has a distribution Q, which is called the mixing distribution, defined by P(Φ =θ_{j}) = Q({θ_{j} }) = π_{j}. Thus the mixing distribution
puts a mass (or weight) of π_{j} on each parameter θ_{j}. We thus arrive at the mixture distribution , or equivalently
One can find the posterior probability that the i^{th} observation comes from the j^{th} component (i.e, y_{i} G_{j}) using
In turn, one can assign each y_{i} to the population to which it has the highest estimated posterior probability of belonging; i.e., to g_{j} if τ_{ij} > τ_{it} for t = 1, …, m and t ≠ j.
Estimation of the Parameters and the Mixture Components
One has to estimate the number of subpopulations or number of mixture components m as well as the parameters in the mixture distribution g…). This can be achieved via parametric or nonparametric maximum likelihood estimation (Laird, 1978; Böhning et al. 1992, Pilla and Lindsay, 2001). Laird (1978) considered nonparametric likelihood estimation of a mixing distribution and Lindsay (1983) linked its general theory, developed from a study of the geometry of the mixture likelihood, to the problem of estimating a discrete mixing distribution. One may also use an alternative approach to maximum likelihood for which menu driven software is provided by Böhning et al. (1998).
For a given m, one can estimate the parameters in the mixture distribution via the EM algorithm (Dempster et al. 1977) and several improvements of it which are discussed in McLachlan and Krishnan (1997). Let = ( , ) be the vector of estimated parameters. Once has been obtained, estimates of the posterior probability of population membership _{ij} can be formed for each y_{i} to give a probabilistic clustering.
When m, the number of mixtures, is large, the EM algorithm can be excruciatingly slow. In that case, one can use alternative EM methods proposed by Pilla (1997) and Pilla and Lindsay (2001) in the nonparamet
ric mixture setting. Their approach assumes that the θ’s are known and the goal is to estimate the π’s only. However, by setting up the problem in a nonparametric framework, one may find the maximum likelihood estimator (MLE) quite easily in a highdimensional problem via Pilla and Lindsay’s method. These methods are especially useful because θ_{1} < θ_{2} < …, i.e., the support points are ordered and hence the corresponding nearby densities can be paired to take advantage of the correlated densities.
Mixture Setting for Detecting Spatial Occurrence of Suicide
Suppose data are available on the spatial occurrence of suicide in N counties in a state or for the entire country over a five year period. Assume that the counties are a mixture of two groups g_{1} and g_{2}, corresponding to normal and high risk with respect to suicide, in proportions of π_{1} and π_{2} respectively. The number of events y_{i} in the i^{th} county is assumed to have a Poisson distribution with a rate of μ_{j} in g_{j} for j = 1, 2; i= 1, …, n. Let = (μ_{1}, μ_{2}, π_{1}, π_{2})^{T} and θ_{i} = (μ_{1}N_{i}, μ_{2}N_{i})^{T}, where N_{i} = M_{i} t_{i} is the mass of population at risk in the i^{th} county assuming that there are M_{i} individuals at risk over time t_{i}. In this scenario, the log likelihood becomes
where f_{j}(y_{i}; θ_{i}) = exp(–μ_{j}N_{i}) (μ_{j}N_{i})^{y}_{i}/y_{i}!. One can use the EM algorithm to find the MLE of the parameters. The complete data log likelihood function becomes
where z_{ij} = 1 if y_{i} g_{j} and 0 otherwise. The MLE of the parameters are found iteratively via
and
where which is the estimated posterior probability that y_{i} g_{j}.
Testing for the Number of Components in a Mixture
Testing for the number of components in a mixture is a very hard problem. Consider the problem of testing the hypothesis H_{0}: number of components = m against H_{0}: number of components = m + 1. The likelihood ratio test (LRT) statistic defined as 2 log L_{n} = 2 [l(_{m}) – l(_{m}_{+1})], where l(_{m}) is the log likelihood log evaluated at _{m}, the nonparametric MLE under H_{0} and similarly l(_{m}_{+1}) is the nonparametric MLE under H_{1}. The LRT statistic generally has an asymptotic χ^{2} distribution with d degrees of freedom, where d equals the difference between the number of parameters under the alternative and null hypothesis. However, this theory is known to fail for the mixture problem (Titterington et al., 1985:154). The reason for this is that the null hypothesis does not lie in the interior of the parameter space. The problem is not yet solved except for some special cases. However, the simulations or bootstrap can shed some light on the distributional properties of the LRT. See Lindsay (1995) and Böhning (1999) (including the references therein) for recent developments in theory and methods to test for the number of components in a mixture. There are other techniques for identifying the number of components of a mixing distribution including a variety of graphical techniques (Lindsay and Roeder, 1992).
The Choice of Initial Values and Multiple Mode Problem for the EM Algorithm
The continuous EM algorithm that finds the MLE of parameter estimators, for a given m, requires the specification of initial values for the number and location of the support points. Pilla and Lindsay (2001) show that if the number of support points is fixed in advance, then one could have multiple modes whereas the nonparametric ML approach corresponds to a unimodal likelihood. Böhning (1999) illustrates through an example where the choice of initial values is critical; for example, in testing for the number of components in the mixture. Through simulations, he demonstrates the effectiveness of several approaches that combat this problem. His recommendation is to choose the support points as values of certain order statistics of the data (see Böhning, 1999:68–70). In his simulation studies, the gold standard (the grid approach given below) seems to outperform the other methods. The optimization problem solved by the gridbased EM (where the support points are assumed known) proposed by Pilla and Lindsay (2001) has a unique optimum which in turn generates an estimated number of components. However, the continuous EM can converge to the suboptimal mode.
In light of the above characteristics, Pilla and Lindsay (2001) suggest the following: to obtain a continuous solution with a flexible number of support points, first start with one of the fast EMbased algorithms proposed by Pilla and Lindsay on a grid and next use the resulting solution as starting values for the continuous EM. This will protect against using an incorrect number of points or finding suboptimal solutions.
Convergence Criteria
Finding a natural stopping rule for iterative algorithms in the mixture problem when the final log likelihood is unknown is an important problem. In a potentially slow algorithm like EM, a far weaker convergence criterion based on the changes in the log likelihood value or on changes in the parameter estimates between iterations can be very misleading (Titterington et al. 1985:90). See Lindsay (1995:62–63) for further details. The following gradientbased stopping criterion proposed by Lindsay (1995:131–132) can be used to detect the convergence of the EM algorithm: we stop when sup D_{Q}(θ) ≤ ε, with ε = 0.005, where the gradient function D_{Q}(θ) is defined as
then we automatically satisfy the ‘ideal stopping criterion’: log L_{obs}(; y) – log L_{obs}(^{p}; y) ≤ ε, where ^{p} is the estimate at the p^{th} iteration. The gradient function therefore creates a natural stopping rule for iterative algorithms such as the EM when the final log likelihood is unknown, although it is more stringent than the ideal stopping rule.
Illustration of Poisson Mixture Model
To illustrate use of the Poisson mixture model we analyzed countylevel suicide data for the US for the time period of 1996–1998. The three year period was used to obtain more stable estimates of the parameters. Application of the Poisson mixture model identified a mixture of four component distributions with proportions of .1125, .4523, .3119, and .1234, and means of 7.5, 10.9, 14.9, and 20.6 suicides per 100,000. Results of classifying counties into component distributions are displayed in Figure 1. Inspection of the map in Figure 1 reveals that those counties classified in the highest component distribution (20.6 suicides per 100,000), are predominantly in the western states with lowest population density, although there are exceptions throughout the country (see Figure A1). Counties classified in the lowest component distribution (7.5 suicides per 100,000)
appear to be clustered in the central United States. Finally, counties classified into the two intermediate groups are largely in the eastern portion of the United States. Unlike the earlier attempt at fitting a mixture distribution to temporal suicide data collected in Cook county (Gibbons et al., 1990) that was designed to identify “suicide epidemics” and failed to do so, the results of this analysis has focused on the spatial distribution of suicide and has identified qualitatively distinct geographic groupings of suicide rates across the United States. Note that this mixture distribution does not adjust for demographic disparity across the counties (e.g., age, sex, race, population density, etc.) that may have produced the mixture distribution in the first place.
POISSON REGRESSION MODELS
In the following sections, we present statistical details of both fixedeffects and mixedeffects Poisson regression models.
FixedEffects Poisson Regression Model
In Poisson regression modeling, the data are modeled as Poisson counts whose means are expressed as a function of covariates. Let y_{i} be a response variable and x_{i} = (1, x_{i}_{1},…, x_{ip})^{T} be a [(p+1) × 1] vector of covariates for the i^{th} individual. Then the Poisson regression model y_{i} is
(2)
where λ _{i} = exp(β^{T} x_{i}) and β = (β_{0}, β_{1}, …, β_{p})^{T} is a (p+1)dimensional vector of unknown parameters corresponding to x_{i}. An important property of the Poisson distribution is the equality of the mean and variance:
E(y_{i}) = Var(y_{i}) = λ_{i}
(3)
From equation (2) one can write the likelihood and the log likelihood functions for the n independent observations as
and
respectively. The first and second partial derivatives with respect to the unknown parameters β are given by
and
respectively. Since the above equations are not linear, iterative procedures such as the NewtonRaphson algorithm or Fisher’s scoring algorithm are used to obtain the MLE of β, denoted by . A consistent estimator of the variancecovariance matrix of , V() is the inverse of the information matrix, I() — which is the negative of inverse of the matrix of second partial derivatives evaluated at the MLE. Inferences on the regression parameter β can be made using V().
MixedEffects Poisson Regression Model
When we have a mixture of fixed and random effects, the more general mixedeffects Poisson regression model is used. Suppose that there are n = Σ_{i} n_{i} nonnegative observations y_{ij} for i = 1, …, n_{c} clusters and j = 1, …, n_{i} observations and a (p + 1) dimensional unknown parameter vector, β, associated with a covariate vector x_{ij} = (1, x_{ij}_{1}, …, x_{ijp})^{T}. Further, let the random effect υ_{i} be normallydistributed with mean 0 and variance σ^{2} and independent of the covariate vector x_{ij}. Then the conditional density function, f(y_{i}; θ_{i}), of the n_{i} suicide rates in cluster i is written as:
(4)
with
λ_{ij} =exp(β^{T}x_{ij} + υ_{i} )
=exp(β^{T}x_{ij} + σθ_{i} ),
where θ_{i} = υ_{i}/σ such that θ_{i} ~N (0,1). Thus the loglikelihood function corresponding to equation 4 becomes
The first and second derivatives of log L with respect to β and σ are
and
The marginal density of y_{i} can be written as
where g(θ) represents the multivariate standard normal density. One can approximate the above integral via the Gaussian quadrature as
(5)
where λ_{ijq} = exp(β^{T} x_{ij} + σB_{q}), B_{q} is the quadrature node and A(B_{q}) is the corresponding quadrature weight. From equation (5), one can obtain the first derivatives of the approximate log likelihood as
and
Solution of the above likelihood equations can be obtained iteratively using the NewtonRaphson algorithm. On the i^{th} iteration, estimates for a parameter vector Θ are given by
(6)
where is the matrix of second derivatives of the loglikelihood, evaluated at Θ_{i}. Alternatively, one can use the Fisher scoring method. This method replaces the matrix of second derivatives in (6) by their expected values:
which is equal to the negative of the information matrix. The scoring solution is often used in cases where the matrix of second derivatives is difficult to obtain. Note that when the expected value and the actual value of the Hessian matrix coincide, the Fisher scoring method and the NewtonRaphson method reduce to the same algorithm.
The convergence of the iterative procedure depends on the initial values of the parameters. Good starting values reduce the number of iterations needed to reach the final estimates. It is sometimes possible that a poor choice of starting values may reach some local maximum instead of the global maximum. For this reason, it is often desirable to reestimate the parameters under a variety of starting values, or to use the EM algorithm to obtain starting values that are reasonably close to the final solution (Hedeker and Gibbons, 1994).
Estimation of Random Effects
Often, it is of interest to estimate values of the random effects θ_{i} within a sample. For this purpose the expected “a posteriori” (EAP) or empirical Bayes estimator _{i} has been suggested by Bock and Aitkin (1981). The estimator _{i} given y_{i} for cluster i can be obtained as:
Similarly, the variance of the posterior distribution of _{i}, which may be used to make inferences regarding the EAP estimator (i.e., the posterior variance is an estimate of precision of the estimate of _{i}), is given by
Generalized Estimating Equations  GEE
An alternative approach to the analysis of clustered suicide rate data is based on Generalized estimating equations (GEEs) models, which were introduced by Liang and Zeger (1986) and Zeger and Liang (1986).
Let y_{ij} be the j^{th} response for the i^{th} unit (e.g., county ) for i = 1, …, n_{c} and j =1, …, n_{i} observations and x_{ij} = (1, x_{ij}_{1}, …, x_{ijp} ) be a vector of covariates for the i^{th} unit. Let y_{i} be the vector of the responses for the i^{th} unit with corresponding mean vector μ_{i} and let V_{i} be an estimate of the covariance matrix of y_{i}. Then the GEE marginal model for estimating β is given by
(7)
and
g(μ_{ij}) = g(E(y)) =β'x_{ij}
(8)
where g is the link function. Common choice for the link function might be the identity link for continuous data, log link for count data, and logit link for binary data. For example, the link functions for the Poisson and logistic regression models are g(a) = log(a) and g(a) = log(a/(1 – a)), respectively.
In addition to the marginal model, the covariance structure of the correlated observations for a given unit of y_{i} is modeled as
(9)
where A_{i} is a diagonal matrix of variance functions and R(a) is the working correlation matrix of y_{i} specified by the vector of parameters a. Various types of working correlation structures such as exchangeable or autoregressive can be used in the model.
The maximum likelihood estimator can be obtained by solving the above estimating equations iteratively:
(10)
Illustration of Poisson Regression Model
Returning to the national suicide data from the previous section, we now illustrate how Poisson regression models can be used to estimate the effects of age, race, and sex on clustered (i.e., within counties) suicide rate data. For the purpose of illustration, we considered the effects of age divided into five categories (514, 1524, 2544, 4564, and 65 and older), sex, and race (African American versus Other) in the prediction of suicide rates across the U.S. for the period of 19961998. These categories were used so that there would be sufficient sample sizes available to compare observed and expected annual suicide rates for both GEE and mixedeffects (maximum marginal likelihood  MMLE) Poisson regression models. To this end, we fit a Poisson regression model with all main effects and twoway interactions using both GEE and a full likelihood mixedeffects model. Given the large sample sizes almost all terms in the model were statistically significant although of widely varying effect sizes. Table 2 displays a comparison of parameter estimates and standard errors for the GEE and mixedeffects models. In general, the GEE and MMLE parameter estimates were remarkably similar. The only nonsignificant terms were two terms in the race by age interaction. The comparison of rates for ages 514 versus 1524 and 514 versus 2544, did not depend on race.
Based on the parameter estimates in Table A1, we estimated marginal suicide rates for both GEE and mixedeffects models. Table A2
TABLE A1 Comparison of Maximum Marginal Likeihood (MMLE) and Generalized Estimating Equations (GEE) for the Clustered Poisson Regression Model

MMLE 
GEE 

Effect 
Estimate 
SE 
Prob 
Estimate 
SE 
Prob 
Intercept 
–4.331 
0.040 
<0.0001 
–4.408000 
0.048000 
<0.0001 
Female 
–1.009 
0.076 
<0.0001 
–1.009000 
0.073000 
<0.0001 
Black vs Other 
–0.292 
0.103 
0.0046 
–0.279000 
0.112000 
0.0127 
15–24 vs 05–14 
2.788 
0.041 
<0.0001 
2.787000 
0.043000 
<0.0001 
25–44 vs 05–14 
3.030 
0.040 
<0.0001 
3.021000 
0.043000 
<0.0001 
45–64 vs 05–14 
2.971 
0.041 
<0.0001 
2.975000 
0.046000 
<0.0001 
65+ vs 05–14 
3.371 
0.041 
<0.0001 
3.390000 
0.047000 
<0.0001 
Female x Black 
–0.345 
0.036 
<0.0001 
–0.345000 
0.041000 
<0.0001 
Female x 15–24 
–0.664 
0.080 
<0.0001 
–0.663000 
0.077000 
<0.0001 
Female x 25–44 
–0.355 
0.077 
<0.0001 
–0.357000 
0.074000 
<0.0001 
Female x 45–64 
–0.223 
0.077 
0.0038 
–0.223000 
0.075000 
0.0029 
Female x 65+ 
–0.938 
0.078 
<0.0001 
–0.945000 
0.078000 
<0.0001 
Black x 15–24 
0.065 
0.106 
0.5434 
0.068000 
0.107000 
0.5297 
Black x 25–44 
–0.139 
0.105 
0.1829 
–0.141000 
0.107000 
0.1877 
Black x 45–64 
–0.473 
0.107 
<0.0001 
–0.486000 
0.111000 
<0.0001 
Black x 65+ 
–0.740 
0.112 
<0.0001 
–0.748000 
0.117000 
<0.0001 
County Variance 
0.280 
0.003 
<0.0001 

displays observed and expected annual suicide rates for both methods of estimation, broken down by age, sex, and race. Inspection of Table A2 reveals several interesting results. In general, suicide increases with age, is higher in males, and is lower in African Americans. Black females have the lowest suicide rates across the age range. In white males, the suicide rate is increasing with age whereas in all other groups, the suicide rate either is constant or decreases after age 65. Comparison of the expected frequencies for the GEE and mixedeffects models reveal that they are quite similar and the GEE does a slightly better job of recovering the observed marginal rates.
A special feature of the mixedeffects model is the ability of estimating countyspecific rates using empirical Bayes estimates of the random effects as described in the previous section. Using these estimates, we can accomplish two goals. First, we can estimate countyspecific expected suicide rates, which directly incorporate the effects race, sex, and age of that county. Table A3 provides a comparison of observed and expected numbers of suicides (19961998) for 100 randomly selected counties. In
TABLE A2 Observed and Expected Suicide Rates by Age, Race, and Sex. Data: CMF 1996–1998
spection of Table A3 reveals remarkably close agreement between observed and expected numbers of suicides.
Second, we can use the Bayes estimates directly to obtain countylevel suicide rates adjusted for the effects of race, sex, and age. In the case of a Poisson model, the Bayes estimate for a given county is a multiple of the national suicide rate adjusted for the case mix in that county (i.e., race, sex and age). For example, a Bayes estimate of 1.0 represents an adjusted rate that is equal to the national rate. By contrast, a Bayes estimate of 2.0 represents a doubling of the national rate, and a Bayes estimate of 0.5 represents onehalf of the national rate. Figure A2 displays the Bayes estimates by county across the U.S. Inspection of Figure A2 reveals that even after accounting for these important demographic variables, considerable spatial variability remains. Again, the highest adjusted rates are typically found in the less densely populated areas of the western U.S.
The map in Figure A2 also provides a useful tool for qualitative research into the etiology of suicide. A natural approach is to examine the
TABLE A3 Observed and Expected Number of Suicides for 100 Randomly Selected Counties
State 
County 
Observed # of Deaths 
Expected # of Deaths 
State 
County 
Observed # of Deaths 
Expected# of Deaths 
56 
7 
16 
10.3 
51 
75 
5 
5.9 
53 
63 
169 
170.4 
20 
159 
6 
4.5 
5 
91 
14 
13.6 
21 
103 
14 
8.7 
47 
111 
13 
9.4 
40 
45 
1 
1.6 
54 
93 
1 
2.8 
31 
61 
1 
1.5 
28 
5 
9 
5.5 
31 
91 
0 
0.3 
27 
141 
22 
21.7 
40 
73 
9 
6.3 
38 
47 
0 
1.0 
20 
203 
1 
1.0 
21 
59 
25 
27.5 
1 
5 
8 
8.2 
48 
451 
54 
50.2 
38 
51 
0 
1.4 
48 
87 
3 
1.4 
38 
39 
2 
1.3 
18 
97 
391 
385.3 
21 
95 
11 
12.0 
35 
7 
8 
6.2 
48 
73 
32 
25.8 
48 
383 
2 
1.5 
23 
11 
38 
39.6 
31 
41 
3 
4.3 
55 
101 
58 
59.5 
18 
7 
2 
3.4 
17 
107 
9 
10.8 
45 
61 
6 
5.9 
47 
127 
1 
1.9 
19 
1 
4 
3.4 
55 
55 
28 
28.1 
8 
119 
13 
9.9 
21 
223 
5 
3.4 
36 
3 
22 
20.9 
30 
97 
1 
1.4 
19 
93 
2 
2.9 
19 
51 
2 
3.0 
12 
95 
303 
314.1 
47 
129 
4 
6.4 
49 
49 
89 
91.1 
48 
9 
2 
3.0 
28 
45 
30 
24.1 
19 
5 
9 
6.7 
18 
107 
19 
16.9 
28 
69 
3 
3.0 
5 
1 
6 
6.8 
46 
19 
6 
4.1 
17 
167 
64 
64.8 
27 
171 
31 
30.8 
39 
89 
43 
44.7 
28 
53 
1 
2.4 
28 
1 
9 
9.5 
20 
31 
2 
3.1 
53 
23 
0 
0.9 
51 
47 
18 
15.0 
48 
213 
30 
28.8 
46 
123 
3 
2.7 
55 
107 
9 
7.1 
12 
81 
135 
129.6 
31 
177 
6 
6.7 
2 
122 
22 
20.4 
17 
49 
12 
12.2 
21 
61 
0 
3.4 
55 
75 
18 
17.5 
21 
25 
7 
6.3 
8 
55 
3 
2.7 
26 
13 
6 
4.0 
29 
121 
4 
5.4 
18 
157 
42 
44.1 
40 
125 
34 
30.2 
21 
165 
2 
2.2 
20 
73 
3 
3.2 
48 
447 
2 
0.8 
21 
109 
7 
5.5 
13 
265 
3 
0.6 
29 
197 
2 
1.8 
2 
100 
2 
1.0 
51 
133 
7 
5.0 
55 
11 
4 
5.2 
29 
105 
8 
10.0 
48 
81 
0 
1.3 
47 
181 
14 
9.7 
21 
229 
6 
4.6 
51 
103 
9 
5.5 
55 
35 
32 
32.3 
16 
15 
3 
2.2 
39 
151 
112 
113.5 
38 
55 
2 
3.5 
28 
89 
17 
18.4 
39 
17 
91 
93.6 
19 
167 
4 
8.2 
55 
51 
3 
2.8 
23 
27 
17 
15.7 
19 
119 
3 
4.2 
38 
1 
0 
1.1 
spatial distribution of Bayes estimates in Figure A2 for outliers. For example, in the western U.S. and Alaska where suicide rates are typically high there are a few counties that have Bayes estimates that are consistent with the national average. Similarly, in the central U.S. where there is a high concentration of counties with the lowest suicide rates, there are a few counties that exhibit the highest suicide rates. As an example, Table A4 displays the Bayes estimates, observed and expected numbers of suicides and suicide rates for all counties in Alaska and New Mexico and the 8 counties with estimated adjusted suicide rates less than or equal to half of the national average (national average is 12.33 suicides per 100,000 during the period of 1996–1998). Table A4 reveals that although there are generally elevated suicide rates in Alaska and New Mexico relative to the national average, there are several counties with low rates, similar to the national average. What are the protective factors that have produced these spatial anomalies? Are these spatial anomalies simply due to reporting bias or some other unmeasured characteristic that has produced the outlying Bayes estimate. Examining these spatial anomalies in greater detail is certainly a fruitful area for further research.
TABLE A4 Observed and Expected rates per 100,000 for Alaska, New Mexico, and Counties with BE <= 0.50
State 
County Name 
Observed # of Suicides 
Expected # of Suicides 
ALASKA 
ALEUTIANS EAST 
1 
0.84 
ALASKA 
ALEUTIANS WEST 
2 
1.92 
ALASKA 
ANCHORAGE 
118 
115.66 
ALASKA 
BETHEL 
22 
10.17 
ALASKA 
BRISTOL BAY 
1 
0.58 
ALASKA 
DILLINGHAM 
4 
1.46 
ALASKA 
FAIRBANKS NORTH STAR 
52 
46.63 
ALASKA 
HAINES 
2 
0.96 
ALASKA 
JUNEAU 
13 
11.95 
ALASKA 
KENAI PENINSULA 
22 
20.38 
ALASKA 
KETCHIKAN GATEWAY 
4 
4.91 
ALASKA 
KODIAK ISLAND 
5 
5.17 
ALASKA 
LAKE AND PENINSULA 
4 
0.63 
ALASKA 
MATANUSKASUSITNA 
29 
25.80 
ALASKA 
NOME 
14 
5.02 
ALASKA 
NORTH SLOPE 
10 
3.35 
ALASKA 
NORTHWEST ARCTIC 
15 
3.87 
ALASKA 
PRINCE OF WALESOUTER 
3 
2.64 
ALASKA 
SITKA 
6 
3.71 
ALASKA 
SKAGWAYHOONAHANGOON 
1 
1.34 
ALASKA 
SOUTHEAST FAIRBANKS 
6 
2.69 
ALASKA 
VALDEZCORDOVA 
1 
3.31 
ALASKA 
WADE HAMPTON 
18 
4.37 
ALASKA 
WRANGELLPETERSBURG 
8 
3.73 
ALASKA 
YAKUTAT 
0 
0.29 
ALASKA 
YUKONKOYUKUK 
16 
5.91 
ILLINOIS 
MC LEAN 
17 
25.62 
NEW JERSEY 
HUNTERDON 
13 
22.23 
NEW JERSEY 
MORRIS 
80 
88.20 
NEW JERSEY 
UNION 
80 
88.66 
NEW MEXICO 
BERNALILLO 
303 
295.93 
NEW MEXICO 
CATRON 
1 
1.18 
NEW MEXICO 
CHAVES 
24 
23.46 
NEW MEXICO 
CIBOLA 
12 
9.84 
NEW MEXICO 
COLFAX 
8 
6.16 
NEW MEXICO 
CURRY 
23 
20.28 
NEW MEXICO 
DE BACA 
2 
1.05 
NEW MEXICO 
DONA ANA 
63 
62.45 
NEW MEXICO 
EDDY 
30 
26.30 
NEW MEXICO 
GRANT 
23 
17.83 
NEW MEXICO 
GUADALUPE 
2 
1.64 
NEW MEXICO 
HARDING 
1 
0.40 
NEW MEXICO 
HIDALGO 
1 
2.14 
NEW MEXICO 
LEA 
15 
16.90 
Observed Rate/100,000 
Expected Rate/100,000 
Population 
Bayes Estimate 
15.66 
13.08 
2,128 
1.01 
16.27 
15.61 
4,098 
1.00 
16.90 
16.56 
232,752 
1.24 
52.75 
24.37 
13,902 
2.58 
26.75 
15.50 
1,246 
1.03 
34.00 
12.44 
3,921 
1.22 
22.51 
20.19 
76,998 
1.50 
32.20 
15.52 
2,070 
1.08 
15.56 
14.30 
27,850 
1.07 
16.65 
15.43 
44,035 
1.11 
10.28 
12.62 
12,967 
0.92 
12.42 
12.85 
13,415 
0.97 
87.45 
13.69 
1,524 
1.31 
19.23 
17.10 
50,276 
1.26 
58.89 
21.11 
7,924 
2.05 
53.11 
17.80 
6,276 
1.70 
87.41 
22.54 
5,720 
2.45 
15.29 
13.45 
6,541 
1.02 
25.55 
15.80 
7,827 
1.19 
9.36 
12.58 
3,561 
0.97 
38.23 
17.13 
5,232 
1.30 
3.47 
11.51 
9,597 
0.82 
105.40 
25.59 
5,692 
3.00 
41.81 
19.47 
6,377 
1.40 
0.00 
12.45 
764 
0.97 
73.37 
27.08 
7,269 
2.24 
4.31 
6.49 
131,572 
0.48 
3.86 
6.59 
112,392 
0.46 
6.27 
6.91 
425,476 
0.49 
5.73 
6.35 
465,455 
0.49 
20.70 
20.22 
487,809 
1.48 
12.76 
15.02 
2,613 
0.98 
13.88 
13.57 
57,623 
1.02 
16.81 
13.78 
23,799 
1.17 
20.83 
16.03 
12,799 
1.15 
17.94 
15.82 
42,733 
1.22 
29.93 
15.64 
2,227 
1.08 
13.81 
13.69 
152,009 
1.02 
20.25 
17.76 
49,371 
1.31 
26.42 
20.48 
29,022 
1.49 
17.59 
14.42 
3,790 
1.02 
38.83 
15.70 
858 
1.05 
5.77 
12.35 
5,775 
0.90 
9.68 
10.91 
51,633 
0.84 
State 
County Name 
Observed # of Suicides 
Expected # of Suicides 
NEW MEXICO 
LINCOLN 
17 
10.72 
NEW MEXICO 
LOS ALAMOS 
8 
7.68 
NEW MEXICO 
LUNA 
22 
15.23 
NEW MEXICO 
MC KINLEY 
30 
25.16 
NEW MEXICO 
MORA 
3 
2.05 
NEW MEXICO 
OTERO 
16 
17.58 
NEW MEXICO 
QUAY 
3 
3.72 
NEW MEXICO 
RIO ARRIBA 
37 
27.71 
NEW MEXICO 
ROOSEVELT 
7 
6.83 
NEW MEXICO 
SANDOVAL 
30 
29.60 
NEW MEXICO 
SAN JUAN 
44 
40.91 
NEW MEXICO 
SAN MIGUEL 
19 
14.81 
NEW MEXICO 
SANTA FE 
81 
75.18 
NEW MEXICO 
SIERRA 
22 
11.37 
NEW MEXICO 
SOCORRO 
14 
8.99 
NEW MEXICO 
TAOS 
12 
10.79 
NEW MEXICO 
TORRANCE 
12 
7.78 
NEW MEXICO 
UNION 
1 
1.53 
NEW MEXICO 
VALENCIA 
29 
26.99 
NEW YORK 
NASSAU 
217 
225.92 
NEW YORK 
ROCKLAND 
30 
40.87 
NEW YORK 
WESTCHESTER 
163 
167.11 
TEXAS 
HIDALGO 
77 
85.03 
Observed and expected number of suicides : sum of 1996  1998 Population = Annual population 
MIXEDEFFECTS ORDINAL REGRESSION MODELS
In motivating probit and logistic regression models, it is often assumed that there is an unobservable latent variable (y) which is related to the actual response through the “threshold concept.” For a binary response, one threshold value is assumed, while for an ordinal response, a series of threshold values γ_{1}, γ _{2}, …, γ _{K}_{–1}, where K equals the number of ordered categories, γ_{0} = –∞, and γ _{K} = ∞. Here, a response occurs in category k (Y = k) if the latent response process y exceeds the threshold value γ _{k}_{–1}, but does not exceed the threshold value γ _{k}.
The traditional ordinal regression model is parameterized such that regression coefficients α do not depend on k, i.e., the model asumes that the relationship between the explanatory variables and the cumulative logits does not depend on k. McCullagh (1980) calls this assumption of identical odds ratios across the K – 1 cutoffs, the proportional odds assump
Observed Rate/100,000 
Expected Rate/100,000 
Population 
Bayes Estimate 
37.89 
23.88 
14,955 
1.64 
15.39 
14.77 
17,330 
1.01 
33.63 
23.28 
21,807 
1.70 
16.60 
13.92 
60,256 
1.44 
22.35 
15.27 
4,474 
1.07 
10.55 
11.59 
50,533 
0.86 
10.51 
13.03 
9,516 
0.93 
36.07 
27.01 
34,197 
2.07 
13.65 
13.31 
17,099 
1.00 
12.85 
12.68 
77,840 
1.00 
15.47 
14.38 
94,802 
1.25 
23.81 
18.57 
26,596 
1.38 
23.93 
22.20 
112,852 
1.58 
70.75 
36.57 
10,364 
2.33 
31.08 
19.96 
15,014 
1.48 
16.35 
14.70 
24,469 
1.08 
29.56 
19.15 
13,533 
1.39 
8.86 
13.55 
3,762 
0.95 
16.84 
15.67 
57,397 
1.15 
5.92 
6.17 
1,221,351 
0.44 
3.86 
5.25 
259,392 
0.40 
6.51 
6.67 
834,764 
0.50 
5.66 
6.25 
453,250 
0.50 
tion. This assumption implies that the effect of a regressor variable is the same across the cumulative logits of the model, or proportional across the cumulative odds. In the present context, this implies that the effect of an intervention is the same on the shift from no ideation to suicidal ideation, as it is from suicidal ideation to a suicidal attempt, an assumption that is questionable at best. As noted by Peterson and Harrell (1990), hovever, examples of nonproportional odds are not difficult to find. Recently Hedeker and Mermelstein (1998) have described an extension of the random effects proportional odds model to allow for nonproportional odds for a set of explanatory variables. For example, it sems reasonable to assume that the effects of the model covariates on suicidal ideation and suicidal attempts may not be the same. The resulting mixedeffects partial proportional odds model follows Peterson and Harrell’s (1990) extension of the fixedeffects proportional odds model.
Proportional Odds Model
Assuming that there are i = 1, …, N level2 units and j = 1, …, n_{i} level1 units nested within the i^{th} level2 unit. The cumulative probabilities for the k ordered categories (k = 1, … K) are defined for the ordinal outcomes Y as:
P_{ijk} = Pr(Y ≤ k  β_{i} ,γ_{k ,}α ),
where the mixedeffects logistic regression model for these cumulative probabilities is given as
with K – 1 strictly increasing model intercepts γ_{k} (i.e., γ_{1} > … > γ_{K}_{–1}). As before, w_{ij} is the p × 1 covariate vector and x_{ij} is the design vector for the r random effects, both vectors being for the j^{th} level1 unit nested within level2 unit i. Also, α is the p × 1 vector of unknown fixed regression parameters, and β_{i} is the r × 1 vector of unknown random effects for the level2 unit i. Since the regression coefficients ,α do not depend on k, the model assumes that the relationship between the explanatory variables and the cumulative logits does not depend on k.
As in most mixedeffects models, it is convenient to orthogonally transform the response model. Letting β = Λt, where ΛΛ^{T}=Σ_{β} is the Cholesky decomposition of Σ_{β}, the reparameterized model is then written as log
where t_{i} are distributed according to a multivariate standard normal distribution.
PartialProportional Odds Model
To allow for a partial proportional odds model the intercepts γ_{k} are denoted instead as γ_{0}_{k}, and the following terms are added to the model:
or absorbing γ_{0k} and into γ_{k}_{′}
where, u_{ij} is a [(h+1) × 1] vector containing (in addition to a 1 for γ_{0}_{k}) the values of observation ij on the subset of h covariates for which proportional odds are not assumed (i.e., u_{ij} is a subset of w_{ij}). In this model, γ_{k} is a (h+1) × 1 vector of regression coefficients associated with the h variables (plus the intercept) in u_{ij}. Notice that the effects of these h covariates vary across the (K – 1) cumulative logits. This extension of the model follows similar extensions of the ordinary fixedeffects ordinal logistic regression model discussed by Peterson and Harrell (1990) and Cox (1995). Terza (1985) discusses a similar extension for the ordinal probit regression model.
There is a caveat in this extension of the proportional odds model. For the explanatory variables without proportional odds, the effects on the cumulative log odds, namely result in (K – 1) nonparallel regression lines. These regression lines inevitably cross for some values of u*, leading to negative fitted values for the response probabilities. For u* variables contrasting two levels of an explanatory variable (e.g., gender coded as 0 or 1), this crossing of regression lines occurs outside the range of admissible values (i.e., < 0 or > 1). However, if the explanatory variable is continuous, this crossing can occur within the range of the data, and so, allowing for nonproportional odds is problematic. For continuous explanatory variables, other than requiring proportional odds, a solution to this dilemma is sometimes possible if the variable u has, say, m levels with a reasonable number of observations at each of these m levels. In this case (m – 1) dummycoded variables can be created and substituted into the model in place of the continuous variable u.
With the above mixedeffects regression model, the probability of a response in category k for the ith level2 unit, conditional on γ_{k}, α, and t is given by:
where, under the logistic response function,
with Note that P_{j}_{0} and P_{jK} = 1.
Estimation of the p covariate coefficients α, the population variancecovariance parameters Λ (with r(r+1)/2 elements), and the (h + 1)(K –1) parameters in γ_{k} (k = 1, …, K – 1) is described in detail by Hedeker and Gibbons (1994).
Illustration
To illustrate application of the mixedeffects ordinal logistic regression model, we reanalyzed longitudinal data from Rudd et al. (1996) on suicidal ideation and attempts in a sample of 300 suicidal young adults (personal communication, M.D. Rudd, Baylor University, October 2001). 180 subjects were assigned to an outpatient intervention group therapy condition and 120 subjects received treatment as usual. In this analysis, we used an ordinal outcome measure of 0=low suicidal ideation, 1=clinically significant suicidal ideation, and 3=suicide attempt. Suicidal ideation was defined as a score of 11 or more on the Modified Scale for Suicide Ideation (MSSI; Miller et al., 1986). Model specification included main effects of month (0, 1, and 6) and treatment (0=control, 1=intervention), and the treatment by month interaction. Although data at 12, 18 and 24 months were also available, the dropout rates at these later months were too large for a meaningful analysis. In addition, to illustrate the flexibility of the model, depression as measured by the Beck Depression Inventory (BDI; Beck and Steer, 1987) and anxiety as measured by the Millon Clinical Multiaxial Inventory (MCMIA; Millon, 1983) were treated as timevarying covariates in the model, to relate fluctuations in depressed mood and anxiety to shifts in suicidality.
Table A5 displays the observed proportions in each of the three ordinal suicide categories as a function of month and treatment. Table A5 reveals that, if anything, the observed rate of attempts is as high or higher in the intervention group than the control group. Note that this is also true at baseline, therefore, these difference may be an artifact of randomization.
Table A6 displays the model parameter estimates, standard errors, and associated probability estimates. A model with both random intercept and slope (i.e., month) effects failed to converge because the variance component for the intercept was close to zero. In light of this, the model was specified with a single random effect (i.e., a random slope model). Table A6 reveals that both of the timevarying covariates, depression and anxiety, were significantly associated with suicidality. By contrast, the treatment by time interaction was not significant, indicating that the intervention did not significantly affect the rates suicide ideation or at
TABLE A5 Suicide as a Function of Treatment and Time, Observed Frequencies
Month 
None 
Ideation 
SuicideAttempt 
Total 

0 
Control 
11 
43 
66 
120 

9.2% 
35.8% 
55.0% 



Active 
11 
56 
113 
180 

6.1% 
31.1% 
62.8% 


1 
Control 
70 
18 
2 
90 

77.8% 
20.0% 
2.2% 



Active 
96 
32 
7 
135 

71.1% 
23.7% 
5.2% 


6 
Control 
41 
8 
1 
50 

82.0% 
16.0% 
2.0% 



Active 
61 
11 
3 
75 

81.3% 
14.7% 
4.0% 

TABLE A6 Mixedeffects Ordinal Regression Model, Suicide Data (Rudd et. al., 1996)
tempts over time. The random month effect was significant, indicating appreciable interindividual variability in the rates of change over time.
It might be argued, that by including the effects of depression and anxiety in the model, the effect of treatment and the treatment by time interaction may be masked. For example, the intervention may work by modifying depression and anxiety which are in turn related to suicide ideation and attempts. In this case, including depression and anxiety in the model may mask the effect of the intervention. Reanalysis excluding depression and anxiety, however, also yielded nonsignificant treatment related effects, suggesting that this is not the case. Finally, a nonproportional odds model, in which the treatment by time interaction was allowed to be categoryspecific, also failed to identify any positive treatment related effects on suicide ideation or attempts.
INTERVAL ESTIMATION
Poisson Confidence Limits
A (1 –α)100% confidence interval for the true suicide rate λ can be approximated as
(11)
where = y/n, y is the total number of suicides over n time intervals (e.g., months) or geographic areas (e.g. counties), and z is an upper percentage point of the normal distribution. In general, the approximation will work extremely well for y ≥ 20. When y < 20 tabled values are available in Hahn and Meeker (1991).
As an example, consider the case in which after 61 months of observation for a particular county, 123 suicides are recorded; therefore, y = 123 and n = 61. The 95% confidence limit for the population suicide rate λ is therefore
(12)
In light of this, we can have 95% confidence that the true suicide rate is between 1.659 and 2.373 suicides per month.
Poisson Prediction Limits
Cox and Hinkley (1974) consider the case in which y has a Poisson distribution with mean µ. Having observed y their goal is to predict y*, which has a Poisson distribution with mean cµ, where c is a known constant. In the present context, y is the number of suicides observed in n previous timeperiods, and y* is the number of suicides observed in a single future timeperiod, therefore, c = 1/n. Alternatively, for a given time period, n may represent a number of counties, and our inference is to the predicted suicide rate in a single future county for the same length of time.
Following Cox and Hinkley (1974), Gibbons (1987) derived a prediction limit using as an approximation the fact that the left side of:
(13)
is approximately a standard normal deviate, therefore, the 100 (1 –a)% prediction limit for y* is formed from percentage points of the normal distribution. Upon solving for y* the upper limit value is found as the positive root of the quadratic equation:
(14)
Returning to the previous example in which after 61 months of observation for a particular county, 123 suicides are recorded, we have y = 123 and c = 1/61. The 95% confidence Poisson prediction limit is given by
The prediction limit reveals that we can have 95% confidence that no more than four suicides will occur in the next month.
Poisson Tolerance Limits
The uniformly most accurate upper tolerance limit for the Poisson distribution is given by Zacks (1970). In terms of a future time interval, we can construct an upper tolerance limit for the Poisson distribution that will cover P(100)% of the population of future measurements with (1 – a)100% confidence. The derivation begins by obtaining the cumulative probability that a or more suicides will be observed in the next time period:
(15)
which can be approximated by:
Pr(a,μ) = Pr( χ^{2} [2a+2] ≥ 2μ
(16)
where χ^{2}[f] designates a chisquare random variable with f degrees of freedom. This relationship between the Poisson and chisquare distribution was first described by Hartley and Pearson (1950). Given n independent and identically distributed Poisson random variables the sum
(17)
also has a Poisson distribution. Substituting T_{n} for μ, we can find the value for which the cumulative probability is 1 – α, that is,
(18)
The P(100)% upper tolerance limit is therefore Pr^{–1}[P; K_{1–α}(T_{n})] = smallest j (≥ 0) such that
(19)
The required probability points of the chisquare distribution can be most easily obtained using the Peizer and Pratt approximation described by Maindonald (1984:294).
Returning to the previous example, where after 61 months of observation for a particular county, 123 suicides are recorded, we have, T_{n} = 123 and n = 61. Furthermore, we require the resulting tolerance limit to have 99% coverage and 95% confidence. The cumulative 95% probability point is
The 99% upper tolerance limit is obtained by finding the smallest nonnegative integer j such that:
Inspection of standard chisquare tables (extracted below for j = 5 to 7) reveals that the value of j that satisfies this equation is 7.
6 
4.6604 
7 
5.8122 
Therefore, the P^{–1} [.99 ; K_{.95}(123)] upper tolerance limit is 7 suicides per month in the example posed.
REFERENCES
Beck AT, Steer RA. 1987. Beck Depression Inventory: Manual. San Antonio, TX: Psychological Corporation.
Bock RD, Aitkin M. 1981. Marginal maximum likelihood estimation of item parameters: an application of the EM algorithm. Psychometrika, 46: 443–459.
Böhning D, Schlattmann P, Lindsay BG. 1992. Computerassisted analysis of mixtures (C.A.MAN): Statistical algorithms. Biometrics, 48: 283–303.
Böhning D, Dietz E, Schlattmann P. 1998. Recent developments in computerassisted analysis of mixtures. Biometrics, 54: 525–536.
Böhning D. 1999. ComputerAssisted Analysis of Mixtures and Applications. MetaAnalysis, Disease Mapping and others. Chapman and Hall CRC: Boca Raton.
Cox C. 1995. Locationscale cumulative odds models for ordinal data: A generalized nonlinear model approach. Statistics in Medicine, 14: 1191–1203.
Cox DR, Hinkley DV. 1974. Theoretical Statistics. Chapman and Hall: London.
Dempster AP, Laird NM, Rubin DB. 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39: 1–38.
Gibbons RD. 1987. Statistical models for the analysis of volatile organic compounds in waste disposal sites. Ground Water, 25: 572–580.
Gibbons RD, Clark DC, Fawcet J. 1990. A statistical method for evaluating suicide clusters and implementing cluster surveillance. American Journal of Epidemiology, 132 (Suppl 1): S183–S191.
Hahn GJ, Meeker WQ. 1991. Statistical Intervals: A Guide for Practitioners. Wiley: New York.
Hartley HO, Pearson ES. 1950. Tables of the chisquared integral and of the cumulative Poisson distribution. Biometrika, 37: 313–325.
Hedeker D, Gibbons RD. 1994. A random effects ordinal regression model for multilevel analysis. Biometrics, 50: 933–944.
Hedeker D, Mermelstein RJ. 1998. A multilevel thresholds of change model for analysis of stages of change data. Multivariate Behavioral Research, 33: 427–455.
Laird N. 1978. Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association, 73: 805–811.
Liang KY, Zeger SL. 1986. Longitudinal data analysis using generalized linear models. Biometrika, 73: 13–22.
Lindsay BG. 1983. The geometry of mixture likelihoods, part I: A general approach. Annals of Statistics, 11: 783–792.
Lindsay BG. 1995. Mixture Models: Theory, Geometry and Applications. NSFCBMS Regional Conference Series in Probability and Statistics, Vol. 5. Hayward, CA: Institute of Mathematical Statistics.
Lindsay BG, Roeder K. 1992. Residual diagnostics for mixture models. Journal of the American Statistical Association, 87: 785–794.
Maindonald JH. 1984. Statistical Computation. Wiley: New York.
McCullagh P. 1980. Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B, 42: 109–142.
McLachlan GJ, Krishnan T. 1997. The EM Algorithm and Extensions. New York: Wiley.
Miller IW, Norman WH, Bishop SB, Dow MG. 1986. The Modified Scale for Suicidal Ideation: Reliablity and validity. Journal of Consulting and Clinical Psychology, 54(5):724– 725.
Millon T. 1983. Millon Clinical Multiaxial Inventory Manual. 2nd ed. Minneapolis, MN: National Computer Systems.
Peterson B, Harrell FE. 1990. Partial proportional odds models for ordinal response variables. Applied Statistics, 39: 205–217.
Pilla RS. 1997. Improving the Rate of Convergence of EM in HighDimensional Finite Mixtures. Ph.D. dissertation, The Pennsylvania State University.
Pilla RS, Lindsay BG. 2001. Alternative EM methods for nonparametric finite mixture models. Biometrika, 88: 535–550.
Rudd MD, Rajab MH, Orman DT, Joiner T, Stulman DA, Dixon W. 1996. Effectiveness of an outpatient problemsolving intervention targeting suicidal young adults: Preliminary results. Journal of Consulting and Clinical Psychology, 64(1):179–180.
Terza JV. 1985. Ordinal probit: A generalization. Communications in Statistical Theory and Methods, 14: 1–11.
Titterington DM, Smith AFM, Makov UE. 1985. Statistical Analysis of Finite Mixture Distributions. New York: Wiley.
Zacks S. 1970. Uniformly most accurate upper tolerance limits for monotone likelihood ratio families of discrete distributions. Journal of the American Statistical Association, 65: 307–316.
Zeger SL, Liang KY. 1986. Longintudinal data analysis for discrete and continuous outcomes. Biometrics, 42: 121–130.