Nonresponse Research in Federal Statistical Agencies
Although the panel considered the issue of nonresponse in surveys in both the private and the public sector and in both the United States and abroad, we placed more emphasis on U.S.-public-sector–sponsored surveys primarily because, with a few important exceptions, the largest, most consistent, and most costly survey operations in social science fields are conducted by and for the U.S. federal government.
In its two workshops, the panel heard from survey methodologists from five U.S. federal statistical agencies who summarized the state of nonresponse research in their agencies. These presentations are summarized in this appendix.
BUREAU OF LABOR STATISTICS
In a presentation to the panel, John Dixon of the Bureau of Labor Statistics (BLS) stated that the response rates in surveys sponsored by BLS range from a high of about 92 percent in the Current Population Survey (CPS) (labor force and demographics) to about 55 percent in the Telephone Point of Purchase Survey (TPOPS) (commodity and services purchasing behavior). The response trends for most BLS surveys are stable. The Consumer Price Index Housing Survey had a problem at the end of 2009 due to budgetary constraints, but has recovered. TPOPS response declined over the last decade but has stabilized. Response to the American Time Use Survey (ATUS) has been low but stable. TPOPS is a random digit dialing (RDD) survey, and ATUS is a telephone survey of specific members of CPS households. Reporting on bias studies, Dixon said that a CPS-Census match yielded propensity scores that indicated little bias in labor force statistics; the time-use survey studies have also found little bias except for “volunteering” (see Dixon, 2012). The Consumer Expenditure Survey studies have found very little bias in expenditures (Goldenberg et al., 2009).
In conducting these surveys, BLS tends to use six methods to evaluate nonresponse: linkage to administrative data; propensity scores and process data; the results of experiments with alternative practices and designs; comparisons to other surveys; benchmark data; and the R-index. When linking survey to administrative data, BLS has found that the estimate of bias due to refusals based on the last 5 percent is similar to the estimate based on linkage to the Census 2000 long-form sample. However, these studies have shortcomings in that rarely are all the records linked successfully. Consequently, the linked measure may be defined differently from the survey estimate, and it may have error.
The R-index uses a propensity score model for nonresponse and relates that to other variables (usually frame variables, such as urbanicity, poverty, etc.). The BLS studies used 95 percent confidence intervals for the R-index, somewhat flatter than the response rate. Since one of the major flaws in nonresponse studies lies in what is not known, the use of confidence intervals that account for the estimation of both the measure of interest and the model of nonresponse would be helpful.
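As a rough illustration of the R-index idea described above, the sketch below estimates response propensities as the response rate within cells of a single frame variable and then computes R = 1 - 2S, where S is the standard deviation of the estimated propensities (the usual definition of the R-indicator). The function names, the cell-based propensity model, the unweighted population standard deviation, and the toy data are all assumptions made for illustration; the presentation did not specify the model BLS actually uses.

```python
from collections import defaultdict
from statistics import pstdev


def cell_propensities(units):
    """Estimate each unit's response propensity as the response rate of its cell.

    `units` is a list of (cell, responded) pairs, where `cell` is a frame
    variable category (hypothetical here, e.g. "urban"/"rural") and
    `responded` is True or False.
    """
    totals = defaultdict(int)
    respondents = defaultdict(int)
    for cell, responded in units:
        totals[cell] += 1
        if responded:
            respondents[cell] += 1
    return [respondents[cell] / totals[cell] for cell, _ in units]


def r_indicator(propensities):
    """R = 1 - 2 * S(propensities); R = 1 means equal propensities for all units.

    Assumption: unweighted population standard deviation, ignoring design weights.
    """
    return 1.0 - 2.0 * pstdev(propensities)


# Toy sample (invented): 10 urban units with 5 respondents,
# 10 rural units with 9 respondents.
sample = (
    [("urban", True)] * 5 + [("urban", False)] * 5
    + [("rural", True)] * 9 + [("rural", False)] * 1
)
rho = cell_propensities(sample)
print(round(r_indicator(rho), 3))
```

In this toy sample the overall response rate is 70 percent, yet the propensities differ sharply between the two cells, so the indicator falls well below 1. This is the kind of imbalance that a response rate alone would not reveal.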
CENSUS BUREAU
Panel member Nancy Bates from the Census Bureau reported that Census Bureau nonresponse research studies have covered the gamut. Topics have included causes of nonresponse, techniques for reducing nonresponse, nonresponse adjustments, nonresponse metrics and measurement, consequences of nonresponse (bias, costs), nonresponse bias studies, responsive designs and survey operations, the use of administrative records and auxiliary data and paradata, level of effort studies, and panel or longitudinal survey nonresponse. During her presentation, Bates offered different examples of research, including mid-decade decennial census tests to target bilingual Spanish language questionnaires; a test adding a response “message deadline” to mail materials; the addition of an Internet response option; and varying the timing of the mail implementation strategy (e.g., the timing of advance letters, replacement questionnaires, and reminder postcards). Nonresponse research in conjunction with the 2010 Census included an experiment that tested different confidentiality and privacy messages and another that increased the amount of media spending in matched-pair geographic areas. Additionally, the Census Bureau sponsored three ethnographic studies to better understand nonresponse among hard-to-count populations.
Bates also discussed nonresponse research associated with the American Community Survey (ACS), including a questionnaire format test (grid versus sequential layout), a test of sending additional mailing pieces to households without a phone number, and a test of adding an Internet option as a response mode. For other Census Bureau demographic surveys, Bates mentioned nonresponse tests involving incentives (debit cards) to refusals in the Survey of Income and Program Participation and in the National Survey of College Graduates. Other examples included nonresponse bias studies, including studies considering the use of propensity models in lieu of traditional post-adjustment nonresponse weights. She concluded with a discussion of administrative records and how they hold great potential for understanding non-ignorable nonresponse. Currently, most Census Bureau studies using administrative records are more focused on assessing survey data quality, such as underreporting or misreporting, and less focused on nonresponse.
Many Census Bureau nonresponse research projects are tied to a particular mode, namely mail, since both the decennial census and the ACS use this mode. Bates observed that many Census Bureau research projects are big tests with large samples and several test panels. The majority of tests try out techniques designed to reduce nonresponse, while only a few are focused on understanding the causes of nonresponse.
Bates concluded with the following recommendations:
• Leverage the survey-to-administrative-record match data housed in the new Center for Administrative Records Research and Applications. This could have great potential for studying nonresponse bias in current surveys.
• Make use of the ACS methods panel for future nonresponse studies. Its multimode design makes it highly desirable.
• Leverage decennial listing operations to collect paradata that could be used across surveys to examine nonresponse and bias.
• Select a current survey that produces leading economic indicators and do a “360-degree” nonresponse bias case study. (This ties into a recent Office of Management and Budget request on federal agency applications of bias studies.)
• Going forward, think about small-scale nonresponse projects that fill research gaps and can be quickly implemented (as opposed to the traditionally large-scale ones undertaken by the Census Bureau).
• Expand the collection and application of paradata to move current surveys toward responsive design (including multimode data collection across surveys).
NATIONAL AGRICULTURAL STATISTICS SERVICE
The National Agricultural Statistics Service (NASS) surveys farms, which are both establishments and, in surveys such as the Agricultural Resource Management Survey, households. Jaki McCarthy of NASS reported at the panel’s workshop that NASS has conducted studies of its respondents and nonrespondents in an effort to test whether knowledge of and attitudes toward NASS as a survey sponsor had an effect on response. The agency found that cooperators have more knowledge and better opinions of NASS statistics. Other studies of the relationship between burden and response found no consistent relationship between nonresponse and burden as measured by the number and complexity of questions. In fact, the highest burden sample units tend to be more cooperative than low-burden units.
Other NASS studies looking at the impact of incentives on survey response have found that $20 ATM cards increased mail response, although not in-person interview responses, and that they were cost-effective and did not increase bias. Calibration-weighting studies found that calibration weighting decreased bias in many key survey statistics.
NASS is currently exploring use of data mining to help predict survey nonrespondents and determine if current patterns can be used to help provide explanatory power or if, instead, they are most useful for non-theoretical predictive power. Preliminary findings suggest that in large datasets many variables are significantly different among cooperators, refusals, and non-contacts, but although the differences are significant, they are usually small in practical terms. Many variables are correlated, and using these variables alone is not useful in predicting individual nonresponse or managing data collection.
A breakthrough procedure is to use classification trees in which the dataset is split using simple rules and all variables and all possible breakpoints are examined. In this procedure the variable maximizing the difference between subgroups is selected, and a rule is generated that splits the dataset at the optimum breakpoint. This process is repeated for each resulting subgroup. The classification trees are used to manage data collection and, in the process, allow an indication of nonresponse bias. By this means it is possible to identify likely nonrespondent groups that will bias estimates.
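The splitting procedure described above can be sketched in a few lines. The example below is a minimal illustration, not the NASS implementation: it examines every variable and every candidate breakpoint, keeps the split that most reduces Gini impurity (an assumed measure of the "difference between subgroups"), and repeats the process on each resulting subgroup. The variable names (`acres`, `prior_refusals`), the stopping parameters, and the data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = nonrespondent)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows, labels):
    """Search all variables and breakpoints; return (var, threshold, gain) or None."""
    n = len(rows)
    parent = gini(labels)
    best = None
    for var in rows[0]:
        values = sorted({r[var] for r in rows})
        # Candidate breakpoints are midpoints between adjacent observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[var] <= t]
            right = [y for r, y in zip(rows, labels) if r[var] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / n
            gain = parent - child
            if best is None or gain > best[2]:
                best = (var, t, gain)
    return best


def grow(rows, labels, depth=0, max_depth=2, min_split=4):
    """Recursively split; leaves report size and observed nonresponse rate."""
    split = None
    if depth < max_depth and len(rows) >= min_split:
        split = best_split(rows, labels)
    if split is None or split[2] <= 0:
        return {"n": len(rows), "nonresponse_rate": sum(labels) / len(labels)}
    var, t, _ = split
    left = [i for i, r in enumerate(rows) if r[var] <= t]
    right = [i for i, r in enumerate(rows) if r[var] > t]

    def sub(idx):
        return grow([rows[i] for i in idx], [labels[i] for i in idx],
                    depth + 1, max_depth, min_split)

    return {"var": var, "threshold": t, "left": sub(left), "right": sub(right)}


# Invented toy data: eight sample units with two hypothetical frame variables.
rows = [
    {"acres": 100, "prior_refusals": 0}, {"acres": 200, "prior_refusals": 0},
    {"acres": 300, "prior_refusals": 0}, {"acres": 400, "prior_refusals": 0},
    {"acres": 150, "prior_refusals": 2}, {"acres": 250, "prior_refusals": 3},
    {"acres": 350, "prior_refusals": 2}, {"acres": 450, "prior_refusals": 1},
]
labels = [0, 0, 0, 0, 1, 1, 1, 0]  # 1 = nonrespondent
tree = grow(rows, labels)
print(tree)
```

Each leaf of the resulting tree is a subgroup defined by simple rules, with its own observed nonresponse rate. Leaves with high rates are the likely nonrespondent groups that could be targeted during data collection.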
Despite this research, there are still a number of important and foundational “unknowns,” which McCarthy summarized as follows: Is nonresponse affecting estimates? Is there bias after nonresponse adjustment? What are the important predictors of nonresponse? Can these be used to increase response? Who are the “important” nonrespondents?
NATIONAL CENTER FOR HEALTH STATISTICS
National Center for Health Statistics (NCHS) research supports a very active survey management activity designed to reduce nonresponse. As reported by Jennifer Madans of NCHS at the panel’s workshop, the National Health Interview Survey (NHIS) research focuses on issues of nonresponse, with much of the research making use of paradata collected as part of the survey. NCHS uses a so-called contact history instrument, audit trails of items and interview times using the Blaise survey management platform, and analysis of the front and back sections of the survey instrument. The issues NCHS has been investigating include differences arising from reducing the length of the field period and the effort that the interviewer makes and the trade-offs between response rates and data quality. The research has found that the loss of high-effort households had minor impacts on estimates. The research also found that respondent reluctance at the first contact negatively impacts data quality. Interviewer studies have found that pressure to obtain high response rates can be counterproductive in that the pressure often leads to shortcuts and violations of procedures. These investigations have helped to develop new indicators to track interview performance in terms of time, item nonresponse, and mode.
The National Survey of Family Growth has focused on paradata-driven survey management. The survey collects paradata on what is happening with each individual case. These paradata are transmitted every night, analyzed the following day, and used to manage the survey. The paradata measures include interviewer productivity, costs, and response rates by subgroup. They emphasize sample nonrespondents, the use of different procedures (including increased incentives), and identification of cases to work for the remainder of the field period.
To measure content effects, the National Immunization Survey (NIS) has run several controlled experiments, along several lines of inquiry. In one experiment, NIS used such tools as an advance letter, screener introduction, answering machine messages, and caller ID (known name versus 800 number). Other experiments involved scheduling of call attempts by type of respondent and nonrespondent; incentives (prepay plus promised) to refusals and partials; propensity modeling for weighting adjustments; dual frame sampling (landline plus cell phone RDD samples) and oversampling using targeted lists; and benchmarking results against the NHIS. Findings thus far include that the response rate showed differences when the content and wording of the screener introduction were varied; advance letters, which were improved for content, readability, contact and callback information, and Website information, improved participation; a legitimate institutional caller ID improved callbacks and participation versus an 800 number; optimized call scheduling improved participation; an optimized number of call attempts by disposition type reduced costs and improved participation; and having call centers in different time zones led to improved contact and call scheduling.
NATIONAL CENTER FOR SCIENCE AND ENGINEERING STATISTICS
Work by the National Center for Science and Engineering Statistics (NCSES) centers on research to minimize nonresponse, handle nonresponse statistically, and evaluate nonresponse bias. Future research, according to Steven Cohen of the NCSES at the panel’s workshop, will focus on responsive designs, increased use of paradata, and nonresponse bias analysis on the National Survey of College Graduates by making comparisons to the American Community Survey.