This chapter discusses the usefulness and limitations of the surveys described in Chapter 3 for estimating the number of foreign nationals who attempt illegal entry across the U.S.–Mexico land border. The major criteria for evaluation are the nature of the target population and related issues of sample size and survey design, the frequency with which surveys are conducted and the speed with which data are made publicly available, and the types of questions that are asked about migration.
A probability survey is critical to drawing inferences to a population much larger than the number of individuals actually observed (i.e., questioned). The American Community Survey (ACS), Current Population Survey (CPS), Mexican Census (10 percent sample for the long form), Mexican National Survey of Occupation and Employment (ENOE), National Survey of Population Dynamics (ENADID), Mexican Family Life Survey (MxFLS), and Survey of Migration at the Northern Border of Mexico (EMIF-N) were designed as probability surveys. However, for the purposes of this study, each suffers from various limitations.
The ACS and CPS are U.S.-based surveys of U.S. residents rather than border crossers, and they cannot be used to directly estimate flows of unauthorized immigrants across the U.S.–Mexico border. Instead, inferences about the flows of unauthorized immigrants have to be made based on changes in the estimated stock of unauthorized residents in the United States. The estimates produced by these methods are necessarily imprecise (see Box 3-1 in Chapter 3). Given their focus on issues of residence and intention to live in the United States, the ACS and CPS may also have problems covering people who have been in the United States for a short time (less than 1 year). They also appear to omit most of the seasonal workers, who usually live in Mexico and cross into the United States to work for a few months each year (a group that may account for a significant share, or even a majority, of unauthorized border crossers). Like all surveys tied to the decennial census, the ACS and CPS suffer from undercount. The undercount rate for unauthorized immigrants appears to be larger than for the rest of the population; estimates of the total unauthorized immigrant population based on the ACS (Hoefer et al., 2012) and the CPS (Passel and Cohn, 2011) make an adjustment for undercount in the range of 10 to 15 percent. Moreover, year-to-year comparisons have been complicated by the introduction of new population weighting methods in 2007 and 2008, redesigned questionnaires in 2008 (ACS only), and the switch to the 2010 Census as the base for weighting adjustments (in 2010 for the ACS and 2012 for the CPS).
The Mexican Census, ENOE, ENADID, and the MxFLS focus on Mexican households, a target population that is more relevant to this study than the U.S. households that are targeted by the ACS and CPS. One limitation of the Mexican Census, ENOE, and ENADID is that they miss entire households that have migrated1, thereby potentially underestimating flows from Mexico to the United States. ENOE will also miss whole households returning to Mexico because the migration information is only based on the second through fifth interviews; data from the 2010 Mexican Census suggest that about half of returning migrants return to households that did not exist prior to the return (Passel et al., 2012). The MxFLS, which tries to follow people when they move from Mexico to the United States, is less likely to miss the migration of entire households. However, only the MxFLS baseline sample (selected in 2002) reflects the national population in Mexico, and that sample is tied to the population at that time. Subsequent survey waves do not refresh the sample with new households and, hence, do not reflect the Mexican population at the time of data collection.
1 In the ENOE 2010, for example, an average of 3.5 percent of households were declared hogares mudados, or households who moved out between rounds. The reasons may be residential change in the same locality, internal migration, or international migration; we do not have precise information on the nature of the geographic mobility associated with the hogares mudados.
A more fundamental and general concern, however, has to do with the sample size of these traditional national household surveys. International migration is a relatively rare event, and it is important for a general survey to have a sample that is sufficiently large to obtain reliable information on migration. Specifically, sample sizes should be sufficiently large so as to accurately detect relatively small changes in flow rates (i.e., by a few percentage points) associated with changes in enforcement policies, market forces, and other factors. Some simple calculations by the panel using information from ENOE illustrate the challenges at hand, both for existing surveys as well as any new ones that may be put in place to specifically address the migration question. Table 4-1 shows the total number of households across Mexico that would need to be sampled in any given time period (be it quarterly or yearly) in order to obtain a target number of sampled households with “migration experience” (i.e., having crossed, or intending to cross, the U.S.–Mexico land border). The panel made two assumptions. The first assumption is that roughly 1.5 percent of households in Mexico each year have an individual who crosses the border. This assumption is based on a recent per person out-migration rate in ENOE of 3.78 per 1,000 (0.378 percent) (Instituto Nacional de Estadística y Geografía, 2012), with average household size being around four people. The second assumption, based on documentation material for ENOE (Instituto Nacional de Estadística y Geografía, 2007:48), is that the survey response rate is approximately 85 percent. The number of households that would need to be interviewed is equal to the target sample size divided by the product of the response rate and the household out-migration rate.2 It is possible, in principle, to reduce sample sizes by oversampling in traditional Mexican “sending regions” or by otherwise using stratification or clustering based on what is known about the migration process to date. However, as discussed in Chapter 2, the sampling design would have to be adaptive to changing patterns of population migration, and strategies for oversampling could all too easily become out of date.
[TABLE 4-1 column headings: Number of Mexican Households in Survey Sample with Migration Experience | Total Number of Mexican Households That Would Need to Be Sampled by the Survey]
2 For example, 1,000/(0.015 × 0.85) = 78,431.
Total sample sizes would have to be even larger if one wanted precise flow estimates by, for example, each of the nine geographic sectors into which the U.S. Border Patrol divides the southwest U.S. border. Assuming that the survey had information about crossing location (which ENOE currently does not), the survey design would have to capture flows from points of origin throughout Mexico to each of the geographic areas of interest at the U.S.–Mexico border. Pilot studies would very likely be needed to inform such a complex design, and the complexity of the design would make it all the more vulnerable to being rendered obsolete by changes in migration patterns.
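The household-count calculation described above can be sketched in a few lines of Python. This is only an illustration under the panel's two stated assumptions (a household out-migration rate of roughly 1.5 percent and a response rate of roughly 85 percent); the function and variable names are hypothetical and are not part of any survey's documentation.

```python
def households_to_sample(target_with_migration,
                         household_rate=0.015,
                         response_rate=0.85):
    """Total households to sample so that, in expectation, the stated
    number of responding households has migration experience.

    household_rate and response_rate default to the panel's assumptions.
    """
    return target_with_migration / (household_rate * response_rate)


# Rough origin of the 1.5 percent household assumption:
# 3.78 out-migrants per 1,000 persons, about four persons per household.
person_rate = 3.78 / 1000
household_rate_approx = person_rate * 4  # about 0.015

# Target of 1,000 households with migration experience.
total_needed = households_to_sample(1000)
print(round(total_needed))
```

For a target of 1,000 households with migration experience, the result is the same figure of roughly 78,000 households that the panel reports for this case. Multiplying the per-person rate by household size treats migration events as spread across distinct households, which is a simplification adopted here for illustration.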
There are different approaches to determining the benchmark sample size (represented in the left-hand column of Table 4-1). One approach consists in first deciding the magnitude of the change in migration flows that the U.S. Department of Homeland Security (DHS) would like to detect with a given probability. Another approach is based on the uncertainty that DHS is willing to accept surrounding estimates of, for example, the number of attempts by undocumented migrants by sector and by quarter.
To illustrate the type of calculations that could be carried out to estimate a benchmark for the number of sampled Mexican migrants, suppose that p is the true proportion of Mexican households with migration experience. The width (w) of the confidence interval around that proportion will be $w = 2z\sqrt{p(1-p)/n}$, from which the number of sampled households with migration experience (n) can be calculated for any given width and for any given level of confidence. Specifically, solving for n gives $n = 4z^{2}p(1-p)/w^{2}$. Narrower intervals correspond to a more precise estimate. Supposing that p = 0.015, and that one wishes to detect a 5 percent change in p with 95 percent probability (so that z = 1.96), w = 0.0015 (calculated as 0.015 × 0.05 × 2),3 so the sample would require approximately 105,000 households with migration experience in any given time period, be it quarterly or yearly (see Table 4-2). According to Table 4-1, this would require a nationwide sample of about 8 million households. Such a survey would be on a scale much larger than the ACS, which in 2009 had a $197 million appropriation for a final interview sample of approximately 1.9 million housing units (U.S. Census Bureau, 2008, 2012). As indicated by Table 4-2, wider confidence intervals (which imply less precise estimates for DHS’s evaluation and operational purposes) require that fewer Mexican households with migration experience be sampled by the survey. Even so, the sample sizes would not be trivial: detecting a 15 percent change in p would require a sample size of approximately 12,000 households with migration experience, which in turn (according to Table 4-1) would require a nationwide sample of approximately 915,000 households.
3 Supposing p = 0.015 and wanting to be able to detect a change (plus or minus) of 5 percent, one would want to see whether p goes down to 0.01425 or up to 0.01575. Therefore, w = 0.01575 – 0.01425 = 0.0015.
TABLE 4-2 The Number of Mexican Households with Migration Experience That Needs to Be Sampled in Order to Detect (with 95 percent confidence) Changes in the True Proportion of Mexican Households with Migration Experience
NOTE: n = the number of Mexican households with migration experience that needs to be sampled; p = the true proportion of Mexican households with migration experience; w = width of the confidence interval around p; z = 1.96.
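The confidence-interval calculation behind Table 4-2 can be reproduced approximately with the short Python sketch below. It assumes the normal-approximation interval for a proportion given in the text, with w taken as twice the relative change times p; the function names are hypothetical. With z = 1.96 the formula yields roughly 101,000 households for a 5 percent change, slightly below the approximately 105,000 quoted in the text; the quoted figures are matched more closely when z is rounded to 2, which is an inference of this sketch rather than something the panel states.

```python
def ci_width(p, n, z=1.96):
    """Full width of the normal-approximation confidence interval for p."""
    return 2 * z * (p * (1 - p) / n) ** 0.5


def required_n(p, w, z=1.96):
    """Sample size n giving a confidence interval of full width w around p."""
    return 4 * z ** 2 * p * (1 - p) / w ** 2


def width_for_relative_change(p, relative_change):
    """Interval width spanning p plus or minus the given relative change."""
    return 2 * p * relative_change


p = 0.015  # assumed proportion of households with migration experience
for relative_change in (0.05, 0.15):
    w = width_for_relative_change(p, relative_change)
    n_exact_z = required_n(p, w, z=1.96)
    n_rounded_z = required_n(p, w, z=2.0)  # assumption: z rounded to 2
    print(relative_change, round(n_exact_z), round(n_rounded_z))
```

Dividing either result by the product of the assumed household out-migration rate and response rate (0.015 × 0.85) converts it into the nationwide number of households to be sampled, on the order of 8 million for the 5 percent case.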
There are also other approaches to thinking about how to compute the benchmark sample size. In the preceding example, the problem could have been formulated in terms of “tolerance intervals” rather than confidence intervals (see, e.g., Krishnamoorthy and Mathew, 2009), which would have resulted in an even higher n (as tolerance intervals are similar to prediction intervals). Or, one could propose that two consecutive surveys be carried out to estimate the observed change in the number of households with migration experience in the intervening period. Regardless of how the problem is formulated, however, designing a Mexico-based survey of this size to interview households about their entries (and intended entries) across the U.S.–Mexico border would be complex, and the costs of administering and conducting such a survey would be very high. Response rates associated with a survey sponsored by a foreign government (such as the United States) and its immigration enforcement agency would likely be far lower than those currently associated with national surveys such as ENOE; the probability of erroneous response would also be higher. As discussed in Chapter 2, moreover, the survey design would need to be adaptive to changing patterns of population migration and to enforcement changes that influence non-apprehension rates. Although the actual details might vary somewhat by changing the assumptions, the order of magnitude and the complexity would not.
In contrast to the ACS, CPS, Mexican Census, ENOE, ENADID, and MxFLS, EMIF-N has a target population that is directly relevant to the estimation of unauthorized flows and that also includes significantly larger sample sizes of migrants compared to other annually collected data sources on both sides of the U.S.–Mexico border (Rendall et al., 2009:36). However, the panel notes that there are uncertainties surrounding the weighting methodology that is meant to ensure that the collected data reflect the entire Mexican population. Unlike traditional survey sampling designs, the sampling design of EMIF-N is dynamic and adaptive. Since 1993, units in the sampling frame (i.e., cities, zones, and points) have been added and removed in response to perceived changes in the geographical and temporal distributions of migratory flows. Not every locality at Mexico’s northern border is in the sampling frame of localities at every EMIF-N administration (survey wave), and the weighting assumes that all flows are through the cities in the sampling frame (and through no other cities) at the time of that survey wave.
The accuracy of the sample weights depends on the quality of the adaptability of the sampling frame, and this is difficult to quantify. Specifically, localities and transportation modes (e.g., private cars) not in the sampling frame at a given point in time are presumed to have zero flows. The survey will have coverage error if significant flows are missing in the sampling frame. This coverage error can be reduced by expanding the covered localities and modes of transportation. Although EMIF-N investigators believe the coverage to be between 90 and 95 percent of the flows, the size of the coverage error needs to be quantified, or at least bounded. In addition, the weights for a time-location design like EMIF-N are estimated from quantities collected during the survey and require careful adherence to the sampling protocol. In particular, it requires an accurate count of the total number of people passing through sampling points during the application of the survey. Another concern relates to possible deviations from the random selection of people at sampling points (due, for example, to traveling groups). This can be resolved by refining the sampling design. In addition, new statistical methodology could be developed to adjust for uncertainty in the weights due to deviations from the desired design.
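To illustrate what bounding the coverage error might look like in the simplest case, the Python sketch below converts a flow estimate for the covered localities into a range for the total flow, using the 90 to 95 percent coverage range cited by the EMIF-N investigators. The covered-flow figure of 100,000 is purely hypothetical, the function name is illustrative, and the sketch assumes that coverage is expressed as the share of all flows passing through sampled localities and modes; it is not a description of EMIF-N's actual weighting procedure.

```python
def total_flow_bounds(covered_flow, coverage_low=0.90, coverage_high=0.95):
    """Bounds on the total flow implied by an estimate for covered
    localities, given an assumed range for the coverage rate."""
    if not 0 < coverage_low <= coverage_high <= 1:
        raise ValueError("coverage rates must satisfy 0 < low <= high <= 1")
    # Higher coverage implies less missing flow, hence the lower bound.
    lower = covered_flow / coverage_high
    upper = covered_flow / coverage_low
    return lower, upper


# Hypothetical estimate of flows through localities in the sampling frame.
lower, upper = total_flow_bounds(100_000)
print(round(lower), round(upper))
```

Under these assumptions the implied total flow lies a few percent to roughly a tenth above the covered estimate. The bounds are only as good as the assumed coverage range, which is why the panel emphasizes that the coverage error itself needs to be quantified.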
Unlike the ACS, CPS, Mexican Census, ENOE, ENADID, and EMIF-N, the Mexican Migration Project (MMP) cannot be regarded as a probability survey for two reasons. First, although households are randomly selected within communities, the communities themselves are not randomly selected. Second, the additional companion sample collected in the United States introduces additional selection bias, as it is unclear who volunteers their relatives for the U.S.-based survey and how they might differ from other Mexican emigrants. Therefore, inferences cannot be drawn from the MMP results to the larger population of communities. Even though MMP data on migrant characteristics are similar to those from ENADID (Massey and Zenteno, 2000), the MMP “is not a technique for aggregate statistical estimation” (Massey et al., 1987:12-13). Rather, it is best used in causal models, such as modeling the determinants of migration in a multivariate setting. Similarly, the Mexican Migration Field Research Program (MMFRP) seeks to explain changes in migration and settlement behavior; its data are not meant to be representative of any larger groups (especially since survey samples and questionnaires change from year to year4).
Estimating annual flows in a timely fashion using survey data is a great challenge, and doing so on a quarterly and border sector/subregion basis is an even greater challenge. For border flow estimates to have practical value, survey data would need to be collected, analyzed, and released in a timely fashion.
There are two components that need to be considered when discussing the timeliness of surveys: the frequency with which estimates are reported and the turnaround time from data collection to release of the data. ENOE is conducted on a quarterly basis and can be used to look at short-term population movements occurring within the 15 months when its five interviews are conducted, thereby providing a snapshot of seasonality and yearly changes in migration patterns. EMIF-N (along with the CPS) also contains information that is granular at the monthly level or smaller intervals. EMIF-N has historically been released in 12-month waves, although the investigators at COLEF have recently released data for the first quarter of 2012. Surveys such as ENADID and the Mexican Census, in contrast, provide an accumulated picture of migration during a 5-year period. Moreover, ENADID—like the MxFLS—is characterized by an irregular periodicity that makes it difficult to use for planning purposes. And, because of the challenges in tracking participants across communities and international borders, the periods during which survey personnel are in the field have continued to increase between MxFLS-1 and MxFLS-3.
Granularity aside, survey data will be most useful when they are made publicly available as quickly as possible. Monthly microdata from the CPS are released very quickly (within less than a month); the Annual Social and Economic Supplement of the CPS (formerly known as the “March supplement”) is usually released in August or September. Data from ENOE are also made available relatively promptly, within a year of collection. Until recently, EMIF-N had at least a 2-year delay for the public release of data. However, the turnaround time for the most recent EMIF-N data released has been reduced to less than 12 months (as well as being released on a quarterly rather than annual basis).
4 In some years, surveyors interview all 15 to 65-year-olds, while in other years they only interview those with migration experience.
EMIF-N, the MMP, and the MMFRP are specifically designed to focus on migration and, not surprisingly, have the richest array of information on border crossings and the migration process more generally. While ENOE, the Mexican Census, ENADID, and the MxFLS do not contain as much information about migration as EMIF-N, the MMP, and the MMFRP, they do contain a number of items that are relevant to this study. Specifically, ENADID and the MxFLS ask about the documentation status of migrants, the MxFLS asks about crossing locations and (in the U.S. sample) the number of crossing attempts, and ENOE (for those not currently working) and the MxFLS ask about intentions to cross the border. (Unauthorized crossings are not illegal in Mexico, and there is substantial experience collecting data in Mexico on documentation status and mode of crossing; see Table 3-1.)
Although each of these household surveys contains useful information, they also have various gaps and limitations in terms of the questions asked. If one wanted to make substantive improvements to those survey instruments, one of the best options would be to add questions to the high-frequency and timely ENOE study. For international migration, the survey could specify the country to which individuals migrated. Questions could also be added that ask, for example, about the documentation status of migrants crossing to the United States, the number of attempts that were made, and crossing location. Furthermore, questions about intentions to move to or look for a job in the United States could be asked of all household members, not just of those currently not working. It may not be appropriate to go much beyond this, however, since the dynamic nature of the migration process (discussed in Chapter 2) could very well render questions about the migration process that are salient today less so in the near future. It is also important to ensure the “reliability” of survey instruments, which has to do with the degree to which a survey instrument elicits similar responses from different individuals under similar conditions.5
5 Instrument reliability is typically tested in pilot studies that precede the survey but are then rarely discussed. The panel did not find documentation associated with reliability in the descriptions of the various surveys carried out in Mexico.
Such improvements could prove useful to researchers and others, and they would be welcomed by the panel, as would, for example, improvements in the timeliness of EMIF-N. But from the perspective of estimating flows on an annual or quarterly basis, such survey modifications would still take place against the backdrop, as discussed above, of larger limitations and complexities relating to sample size and survey design. The panel also notes the challenges that arise from the fact that ENOE falls under the jurisdiction of the Government of Mexico. The addition of appropriate survey questions on the Mexican side would be challenging enough (to say nothing of potential problems arising from low and erroneous response rates), and the coordinated collection of complementary data from U.S. samples (which could provide useful information on undocumented migrants who successfully crossed the border) would be even more challenging. The panel does not discourage DHS from engaging with entities in Mexico that collect survey data relevant to the analysis of unauthorized migration, and in fact we believe that engagement would be beneficial. However, the difficulties of doing so, especially on such a politically sensitive issue, should also be acknowledged.
DHS appears to have multiple goals associated with obtaining information on unauthorized migration flows across the U.S.–Mexico border. Annual estimates of flows and apprehension probabilities would allow DHS to better evaluate (and report on) the effectiveness of its enforcement efforts, and estimates obtained on a quarterly basis and by specific geographic region (e.g., U.S. Border Patrol sector) might inform operational decisions regarding, for example, the allocation of enforcement resources along the border. More generally, this information would allow DHS to provide a more complete report to the public on the state of illegal immigration. With these DHS needs and interests in mind, the panel evaluated a range of surveys according to the following criteria: the nature of the target population and related issues of sample size and survey design; the frequency with which surveys are conducted and the speed with which data are made publicly available; and the types of questions that are asked about migration. The criteria chosen by the panel were based in large part on the standards of federal statistical agencies (National Research Council, 2009).
For border flow estimates to have practical value, survey data need to be collected, analyzed, and released in a timely fashion. Since ENOE is conducted on a quarterly basis and its data are released relatively promptly, it does the best job of meeting DHS’s need for timeliness. The significant reduction in the turnaround time for public release of EMIF-N data, which took place during the drafting and revision of this study, has also increased the usefulness of EMIF-N to DHS.
Although ENOE has historically fared better than EMIF-N in terms of timeliness, EMIF-N collects a much broader range of information about border crossings than does ENOE (see Table 3-1 in Chapter 3). This is unsurprising considering that EMIF-N is a specialty migration survey whereas ENOE is a general labor force survey. Even so, both ENADID and the MxFLS, neither of which is a specialty migration survey, ask questions about border crossings that ENOE does not. In order for ENOE to be useful to DHS, basic questions about the legal status of migrants, the number of crossing attempts, crossing location, and so on would have to be added to the survey instrument. Such modifications would entail a number of administrative challenges, given that ENOE falls under the jurisdiction of the Government of Mexico.
A more important concern of the panel regarding both ENOE and EMIF-N is the nature of the target populations of those surveys. Since the focus of DHS is a specific population group (unauthorized migrants) in a particular geographic area (the U.S.–Mexico border), a survey with a relatively narrow target population—such as EMIF-N—would appear to be of greatest use to DHS. The accuracy of the sampling weights in such a survey with an EMIF-N design is difficult to quantify, however, and the panel’s concerns about its coverage error are significant. These issues are especially important given the adaptive and dynamic nature of the migration process. The panel’s concerns about coverage error and the accuracy and transparency6 of sampling weights are less pronounced for national-level household surveys such as ENOE. The issue there, rather, is that existing sample sizes are generally inadequate for detecting changes in flow rates. (Several possibilities for the magnitude of change that DHS may want to detect with 95 percent confidence are presented in Table 4-2.) Sample sizes would have to be even larger to obtain estimates for geographic subregions, and the design of the survey would be all the more complex—and, therefore, all the more vulnerable to obsolescence because of changing patterns of migration and enforcement along the border.
The financial costs that DHS would incur in establishing a new household survey in Mexico—or in adding a host of border crossing-related questions to ENOE and dramatically expanding its sample size (which would be tantamount to creating a new survey)—would be very high. The challenges associated with Mexican-side implementation and coordination would also be formidable (much more so than with adding new questions to ENOE while keeping the sample size the same), and the involvement of DHS could create additional problems relating to low and erroneous response rates. The financial costs and administrative complexity associated with any such survey would be multiplied by several factors if DHS wanted to obtain estimates by geographic subregion.
6 It was not until several panel members made a site visit to Tijuana, Mexico, that the panel was able to understand EMIF-N sampling weights and procedures.
A survey such as EMIF-N that uses a time-location design and focuses directly on migrant populations holds greater promise (setting aside the panel’s concerns regarding coverage error and accuracy of sampling weights) for estimating unauthorized flows across the U.S.–Mexico border. Nevertheless, extensive and direct involvement by DHS, be it in instituting a new survey similar to EMIF-N or working to improve the existing EMIF-N, would still raise concerns regarding Mexican-side implementation and low and erroneous response rates, similar to those that would be raised if DHS were to be involved in a Mexican-side household survey.7
• Recommendation 4.1: For the purpose of estimating unauthorized migration flows across the U.S.–Mexico border on an annual or quarterly basis, DHS should not invest substantial resources in making major changes to existing surveys or in implementing a new survey.
• Conclusion 4.1: Existing surveys are subject to a variety of limitations having to do with target populations and associated issues of sample size and survey design, the frequency with which surveys are conducted and the speed with which data are made publicly available, and the types of questions that are asked about migration. Therefore, although survey data are critical for understanding patterns and general trends in unauthorized migration, they will not be sufficient by themselves to meet the needs of DHS for estimating unauthorized migration flows across the U.S.–Mexico border.
• Conclusion 4.2: Implementing a new household survey that meets the needs of DHS would require an investment at least comparable to that associated with the American Community Survey in the United States; any such survey would also have to fall within the purview of the Government of Mexico. A survey that uses a time-location design and focuses directly on migrant populations (e.g., EMIF-N) would be more promising, but such a non-traditional design would necessitate careful adherence to the sampling protocol and, in particular, would require that concerns about coverage error be addressed. Mexican-side implementation would also be an issue. Substantial modifications of existing general household or specialty migration surveys to meet the needs of DHS would encounter similar challenges. These challenges are only magnified by the complex and dynamic nature of the underlying migration process.
The next chapter explores the usefulness and limitations of another approach: the use of DHS administrative data as they relate to the estimation of unauthorized migration flows across the U.S.–Mexico border.
7 Sample sizes and financial costs would also be significantly smaller for an EMIF-N-like design than they would for a nationwide Mexican household survey.