GENERAL APPROACHES TO CAPTURE-RECAPTURE SAMPLING
Capture-recapture sampling (CRC) has a history reaching back at least to the 19th century (Bohning, 2008; Goudie and Goudie, 2007). It is often used to estimate the total number of individuals in a population. In its simplest form, an initial sample is obtained from the population and the individuals in the sample are “marked” in such a way that one can subsequently observe whether an individual was in that sample. A second sample is obtained independently, and the number of individuals in it that were marked in the first sample is recorded. Under simplifying assumptions about the representativeness of marked individuals in both samples, the total number of individuals in the population can be estimated (Thompson, 2002). In the case of more than one recapture sample, the names “multiple-recapture methods,” “multiple-system methods,” or “multiple-list methods” are often used.
CRC methods have a long history in the estimation of the abundance of biological populations, such as fish, birds, and mammals. More recently, they have been used to estimate the abundance of hard-to-reach human populations such as the homeless (Hopper et al., 2008; Laska and Meisner, 1993; Sudman et al., 1988) and to adjust for census undercounts of minorities (Darroch et al., 1993). For human populations, CRC methods are referred to as “dual-system methods” or “dual-list methods.”
Let N be the population size, n and m be the initial and second sample sizes, and X be the number of marked individuals in the second sample. Intuitively, if the second sample is representative of the population as a whole, then the proportion of marked individuals in it (X/m) will be close to the proportion of marked individuals in the population (n/N). Thus, the size of the population can be estimated by equating these two proportions and solving for it: N = nm/X. This is the so-called Petersen estimator (Seber, 2002).
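As a small worked illustration of this estimator, the sketch below equates the marked proportion in the second sample, X/m, with the marked proportion in the population, n/N, and solves for N. The function name and the sample numbers are hypothetical and chosen only for illustration; they do not come from any of the cited studies.

```python
def petersen_estimate(n: int, m: int, x: int) -> float:
    """Petersen estimate of the population size N.

    n: size of the first (marked) sample
    m: size of the second sample
    x: number of marked individuals found in the second sample
    """
    if x <= 0:
        raise ValueError("at least one marked individual must be recaptured")
    if x > min(n, m):
        raise ValueError("recaptures cannot exceed either sample size")
    # Equate the marked proportion in the second sample (x / m) with the
    # marked proportion in the population (n / N), then solve for N.
    return n * m / x


# Hypothetical numbers, for illustration only.
estimate = petersen_estimate(n=200, m=150, x=30)
print(estimate)  # 1000.0
```

With 200 individuals marked, 150 in the second sample, and 30 of those marked, one fifth of the second sample is marked, so the 200 marked individuals are taken to be about one fifth of the population.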
The International Working Group for Disease Monitoring and Forecasting (1995a, 1995b) provides an excellent discussion of classical capture-recapture ideas. Other good discussions are given by Seber (2002) and Thompson (2002:Chapter 18). In a special issue of an academic journal focusing on recent developments in CRC, an editorial by Bohning (2008) also succinctly describes the state of CRC research.
Log-linear models are important in demography and are very useful in analyzing CRC data (Bishop et al., 1975). Such models have been proposed to allow for departures from homogeneity of the capture probabilities between individuals and/or associations between the two sampling processes (Fienberg, 1972). The capture history of an individual can be classified into four categories based on observation or non-observation in the first and second sample. This can be represented by a four-cell multinomial model. If the capture probabilities of the individuals are homogeneous within each of the samples, then the maximum likelihood estimate of N is the integer part of the Petersen estimator. If the captures and recaptures are treated as separate factors, then the number of capture histories falling into the various categories can be modeled as Poisson or multinomial counts. Different estimators can be derived under different assumptions about the population and sampling processes. More importantly, log-linear models allow for (positive or negative) dependencies between the captures to be modeled, especially if there are multiple recaptures (Bishop et al., 1975). A good application of this approach when two recaptures are made is given by Darroch and colleagues (1993). Pledger (2000) developed a unified linear-logistic framework for fitting many of these models. Baillargeon and Rivest (2007) present an R package to estimate many capture-recapture models, focusing on those that can be expressed in log-linear form.
Other approaches tend to model the heterogeneity in specific forms, typically by incorporating random effects for it. Darroch and colleagues (1993) developed Rasch-type models for CRC in the context of human censuses and supplementary demographic surveys. They also developed log-linear quasi-symmetry models. Other extensions include finite mixture methods that partition the population into two or more groups with relatively homogeneous capture probabilities. Examples of these are the logistic-normal generalized linear mixed model and log-linear latent class models with homogeneity within the classes (Agresti, 2002:Sections 12.3.6, 13.1.3, 13.2.6).
Fienberg and colleagues (1999) integrate many of the above approaches for multiple-recapture or multiple-list data in developing a mixed effects approach (fixed effects for the lists and random effects for the individuals). This approach allows the modeling of the dependence between lists and the incorporation of covariates. They develop Bayesian inference for their specification. Manrique-Vallier and Fienberg (2008) expand on this approach, modeling individual-level heterogeneity using a Grade of Membership model wherein individuals are postulated as mixtures of latent homogeneous but extreme “ideal” types.
Many populations, including that of unauthorized crossers, are open in the sense that the population experiences change during or between the sampling periods (e.g., births, deaths). Many of the models reviewed above implicitly presume the population is closed (i.e., has a fixed and unchanging membership). For open populations, interest typically has focused on the case where the population is closed during the period of each capture and experiences immigration and mortality between the capture periods. Cormack (1989) reviews many of the classical models for this case. Pledger and colleagues (2003) extend these to allow for individual heterogeneity in survival and capture rates using a finite mixture formulation. These models continue to be developed (see the review by Royle and Dorazio [2010]).
CAPTURE-RECAPTURE APPLICATIONS TO UNAUTHORIZED BORDER CROSSINGS
The most direct expression of capture-recapture ideas as applied to unauthorized border crossings is the work of Espenshade (1990, 1995b) and Singer and Massey (1998). They develop simple CRC models in the context of apprehension (“capture”) and re-apprehension (“recapture”) of unauthorized crossers. Specifically, Espenshade (1995b) models the number of crossings an individual attempts until a successful crossing as a geometric distribution. Under the assumptions that individuals continue to attempt crossings until they succeed and that the probability of success is the same for each attempt, along with other strong assumptions, he derives the equivalent of the Petersen estimator for the number of unauthorized crossers. He does not develop measures of uncertainty for this estimate, nor does he tie the work into the broader CRC literature. This approach is similar in spirit to that of the “frequency of apprehension frequencies” discussed in Chapter 5. Chang and colleagues (2006) extend these methods to treat “discouragement” due to prior apprehension and “return and reentry” due to unobserved exit and reentry into the United States. However, the panel did not have access to their paper and therefore could not review it; the only available description was by Morral and colleagues (2011).
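The geometric assumptions described above can be illustrated with a small simulation. This is a minimal sketch and not Espenshade's own derivation or notation: it assumes only that every crosser keeps attempting until a success and that each attempt is apprehended independently with the same probability. Under those assumptions the share of repeat apprehensions among all apprehensions estimates the apprehension probability, and the number of crossers is estimated by a Petersen-like ratio. All function names, the seed, and the numbers are hypothetical.

```python
import random


def simulate_apprehensions(num_crossers: int, p_apprehend: float, seed: int = 0):
    """Simulate crossers who retry until success; return (total, repeat) apprehensions."""
    rng = random.Random(seed)
    total = 0  # all apprehensions
    first = 0  # individuals apprehended at least once
    for _ in range(num_crossers):
        k = 0
        # Each attempt is apprehended with probability p_apprehend;
        # the individual stops at the first successful crossing.
        while rng.random() < p_apprehend:
            k += 1
        total += k
        if k > 0:
            first += 1
    return total, total - first


def estimate_from_apprehensions(total: int, repeat: int):
    """Estimate the apprehension probability and the number of crossers."""
    p_hat = repeat / total          # repeat share estimates p
    first = total - repeat          # first-time apprehensions
    n_hat = first * total / repeat  # Petersen-like ratio estimate
    return p_hat, n_hat


# Hypothetical scenario, for illustration only.
total, repeat = simulate_apprehensions(num_crossers=20_000, p_apprehend=0.3, seed=1)
p_hat, n_hat = estimate_from_apprehensions(total, repeat)
print(round(p_hat, 3), round(n_hat))
```

The sketch produces point estimates only. It does not address the measures of uncertainty that the text notes are missing, and both estimates break down if the apprehension probability varies across individuals or attempts, or if some individuals stop attempting after being apprehended.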
A variant of CRC is “red teaming,” in which individuals are recruited to attempt to cross so as to get an estimate of the probability of apprehension. This is referred to as plant-capture in the ecological literature (Goudie et al., 2007).