An Active Research Program
Statistical agencies need active research programs that are closely tied to their mission of producing relevant and high-quality statistics. Research is not an “optional” or “extra” activity that can be deferred whenever resources are tight. It produces the innovation that refreshes relevance. The underfunding of statistical agencies’ research has threatened the data infrastructure that provides vital information needed by governments, businesses, organizations, and individuals.18
To maintain relevance for public and policy purposes, federal statistical agencies must identify emerging needs and look for ways to develop new information sources. To improve the quality and timeliness of their data products, they must keep abreast of methodological and technological advances and be prepared to implement new procedures in a timely manner (see Practice 3). They must also continually seek ways to make their operations more efficient (see Practice 6).
An effective statistical agency’s research program includes research on the substantive issues for which the agency’s data are compiled as well as methodological research to improve statistical methods and operational procedures. A key and growing research concern for statistical agencies in recent years is the use of administrative records and alternative data sources to enhance or potentially replace some of the information currently obtained through surveys. Current research questions arising from this concern include how closely statistics from these sources correspond to existing measured concepts, what additional information they may offer, and what methodological issues arise in evaluating quality and integrating data sources.
18 E.g., see https://www.linkedin.com/pulse/federal-statistical-agencies-struggle-maintain-vital-role-citro/?trackingId=hWmaUxpC4ao5VxtMmWioyg%3D%3D. [February 2021]
Substantive Research and Analysis
A statistical agency should include staff with responsibility for conducting objective substantive analyses of the data that the agency compiles, such as analyses that assess trends over time or compare population groups. Substantive analyses provided by an agency should be kept relevant to policy by addressing topics of public interest and concern; however, such analyses should not espouse policy positions or be designed to reflect any particular policy agenda (see Martin, 1981; Norwood, 1975; Triplett, 1991). The existence and output of an analytical staff can contribute not only to the knowledge base in the applicable subject areas, but also to the credibility, relevance, accuracy, timeliness, and cost-effectiveness of the agency’s data collection programs. Benefits that a strong subject-matter staff brings to a statistical agency include the following:
- Agency analysts understand the need for the data from a statistical program and how the data will be used, and they can communicate more effectively with data users (see Practice 9).
- Agency analysts have access to the complete microdata and so are better able than outside analysts to understand and describe the limitations of the data for analytic purposes and to identify errors or shortcomings in the data that can lead to subsequent improvements (see Practice 10).
- Substantive research maintains the relevance of an agency’s data program, suggesting changes in priorities, concepts, and needs for new data or discontinuance of outmoded or little-used series.
An agency’s subject-matter analysts should be encouraged and have ample opportunity to build networks with analysts in other agencies, academia, the private sector, other countries, and relevant international organizations and to present their work at relevant conferences and in working papers and refereed journal articles (see Practice 4).
Research on Methodology and Operations
Statistical agencies should be innovative in the methods they use for data collection, processing, estimation, analysis, and dissemination, with the goals of improving data accuracy, timeliness, and operational efficiency and of reducing respondent burden. Careful evaluation of new methods is required to assess their benefits and costs in comparison with current methods and to determine effective implementation strategies, including the development of methods for bridging time series before and after a change in procedures.
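As a rough illustration of what bridging a time series can involve, the sketch below uses simple ratio linking. It assumes that the old and new procedures were both run for one overlap period, and it rescales the old-method values to the level of the new method. The function name, the data, and the choice of ratio linking are all hypothetical and are not drawn from any agency's practice; actual bridging methods are often considerably more elaborate.

```python
def bridge_series(old_series, new_series, overlap_period):
    """Rescale old-method values to the new method's level (ratio linking).

    old_series and new_series map a period (e.g., a year) to an estimate.
    Both must contain overlap_period, the period measured under both methods.
    """
    # Ratio of new-method to old-method estimate in the overlap period.
    factor = new_series[overlap_period] / old_series[overlap_period]

    # Apply the ratio to every old-method period before the overlap.
    bridged = {
        period: value * factor
        for period, value in old_series.items()
        if period < overlap_period
    }
    # From the overlap period onward, use the new-method estimates as published.
    bridged.update(new_series)
    return bridged, factor


# Hypothetical data: the old method ran through 2019, the new one starts in 2019.
old_method = {2017: 100.0, 2018: 104.0, 2019: 110.0}
new_method = {2019: 121.0, 2020: 125.0}

bridged, factor = bridge_series(old_method, new_method, overlap_period=2019)
for year in sorted(bridged):
    print(year, round(bridged[year], 1))
```

A single overlap period gives only one ratio; with several overlap periods, an agency could instead examine whether the ratio is stable before relying on it.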
Research on methodology and operations must be ongoing and geared to both current and future needs. Some current research topics include
- developing methods for producing rapid statistics to respond to high-priority situations or emergencies, such as the COVID-19 pandemic;19
- evaluating administrative records for use to replace or enhance existing surveys;
- assessing uncertainty when combining data from a variety of data sources;
- developing models for improved forecasting in subnational areas (e.g., Young, 2019);
- improving the accuracy of survey estimates in the presence of nonresponse;
- using adaptive designs for maintaining and improving the quality and the cost-effectiveness of surveys;
- understanding and minimizing mode effects on data quality; and
- developing and evaluating new methods of confidentiality protection.
Surveys will likely remain an important component of federal statistical agencies’ portfolios because (1) some information is best (or only) obtained by asking questions; and (2) surveys can collect information on many characteristics at the same time, thereby permitting rich multivariate analysis. But declining survey response rates are making it increasingly difficult to maintain high data quality while controlling data collection costs (see NRC, 2013c; NASEM, 2017d). Many of the large federal surveys are designed to produce annual nationwide estimates and do not produce the rapid and granular estimates needed by some data users. It is thus essential to consider how administrative records and alternative data sources can bolster the completeness, quality, and utility of statistical estimates while containing costs and reducing respondent burden (see NASEM, 2016b, 2017b).
19 See, e.g., https://www.census.gov/newsroom/press-kits/2020/pulse-surveys.html [February 2021]; https://www.cdc.gov/nchs/covid19/index.htm. [February 2021]
Expanding the Statistical Use of Administrative Records
Administrative records include records of federal, state, and local government agencies that are used to administer a government program. Examples include U.S. Social Security Administration records of payroll taxes collected from workers and benefits paid out to beneficiaries; state agency records provided by applicants for assistance programs and payments to applicants deemed eligible; and property tax records of local governments and federal agencies. Administrative records have been used for statistical purposes for many years to generate up-to-date population estimates by age, gender, race, and ethnicity. In turn, these estimates are used to adjust population survey weights for coverage errors and for many other purposes (see, e.g., NRC, 2004a, 2007b).
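The weight adjustment mentioned above can be sketched as a basic poststratification step, in which each respondent's weight is scaled so that weighted totals within a group match an independent population estimate. This is a minimal sketch under stated assumptions: the group labels, weights, and population totals are invented for illustration, and production weighting systems typically involve additional steps (for example, nonresponse adjustments or raking over several dimensions).

```python
def poststratify(respondents, population_totals):
    """Scale survey weights so each group's weighted count matches a control total.

    respondents: list of dicts with keys "group" and "weight".
    population_totals: dict mapping group -> independent population estimate.
    """
    # Sum the current weights within each group.
    weighted_counts = {}
    for r in respondents:
        weighted_counts[r["group"]] = weighted_counts.get(r["group"], 0.0) + r["weight"]

    # Multiply each weight by (control total / current weighted count) for its group.
    adjusted = []
    for r in respondents:
        factor = population_totals[r["group"]] / weighted_counts[r["group"]]
        adjusted.append({**r, "weight": r["weight"] * factor})
    return adjusted


# Hypothetical respondents and control totals (not real data).
sample = [
    {"id": 1, "group": "under_65", "weight": 100.0},
    {"id": 2, "group": "under_65", "weight": 150.0},
    {"id": 3, "group": "65_plus", "weight": 80.0},
    {"id": 4, "group": "65_plus", "weight": 120.0},
]
controls = {"under_65": 300.0, "65_plus": 150.0}

for row in poststratify(sample, controls):
    print(row["id"], row["group"], round(row["weight"], 1))
```

In this invented example the sample undercovers the under-65 group and overcovers the 65-plus group, so weights in the first group are scaled up and weights in the second are scaled down.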
Some of the many examples of statistical agencies’ use of administrative data include the Census Bureau using tax records for the economic censuses for small and nonemployer businesses,20 the National Center for Health Statistics’ National Vital Statistics System drawing upon birth and death records from the states,21 and the National Center for Education Statistics’ National Postsecondary Student Aid Study drawing upon federal and institutional administrative data to analyze student financial aid.22 Research is being conducted to assess whether tax records can replace income items in the American Community Survey (see NASEM, 2019a). Administrative records are also frequently used with survey data to produce model-based estimates with improved accuracy for small geographic areas or population groups (see, e.g., NRC, 2000c, 2000d).
There are many other potential statistical uses for administrative records from program agencies, and expanding the use of these records could improve the cost-effectiveness and quality of some statistical programs. Potential uses include substituting administrative records for specific survey questions and adding richness to a combined dataset by appending administrative records variables to matched survey records (e.g., Commission on Evidence-Based Policymaking, 2017; NASEM, 2018d, 2019b; NRC, 1997a, 2009e; NRC and Institute of Medicine, 2012). Administrative records from multiple federal agencies are also being used in the decennial census to verify vacant units and, when good information exists, to fill in data if an initial nonresponse follow-up visit is not successful in locating a respondent.23
20 Nonemployer businesses include just the sole proprietor with no other employees.
21 See https://www.cdc.gov/nchs/nvss/index.htm. [February 2021]
22 See https://nces.ed.gov/surveys/npsas/index.asp. [February 2021]
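The two potential uses named above, substituting an administrative value for a survey item and appending administrative variables to matched survey records, can be sketched as a simple exact-match join. Everything in the sketch is hypothetical: the field names (`person_id`, `reported_earnings`, `earnings`), the records, and the assumption that a clean shared identifier exists. Linkage in practice generally relies on protected identifiers and often on probabilistic matching, which this sketch does not attempt.

```python
def append_admin_variables(survey_records, admin_records, key="person_id"):
    """Append an administrative earnings variable to matched survey records.

    Where a match exists, the administrative value also substitutes for the
    survey-reported item; unmatched records keep the survey response.
    """
    # Index administrative records by the (hypothetical) linkage key.
    admin_by_key = {rec[key]: rec for rec in admin_records}

    linked = []
    for rec in survey_records:
        match = admin_by_key.get(rec[key])
        merged = dict(rec)
        merged["matched"] = match is not None
        # Appended variable: present only for matched records.
        merged["admin_earnings"] = match["earnings"] if match else None
        # Substitution: prefer the administrative value when one is available.
        merged["earnings_final"] = (
            match["earnings"] if match else rec["reported_earnings"]
        )
        linked.append(merged)
    return linked


# Invented records for illustration only.
survey = [
    {"person_id": 1, "reported_earnings": 41000},
    {"person_id": 2, "reported_earnings": 38000},
    {"person_id": 3, "reported_earnings": 55000},
]
admin = [
    {"person_id": 1, "earnings": 42350},
    {"person_id": 3, "earnings": 54100},
]

linked = append_admin_variables(survey, admin)
match_rate = sum(r["matched"] for r in linked) / len(linked)
print(f"match rate: {match_rate:.2f}")
```

Keeping both the survey response and the administrative value on the linked record lets analysts compare the two sources before deciding whether substitution is appropriate.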
Evaluating and Using Alternative Data Sources
This data-rich age has a multitude of data sources beyond administrative records, including data gleaned or “scraped” from Internet websites (e.g., price quotes or social media postings), data extracted from sensors (e.g., from traffic cameras), and data obtained from the private sector (e.g., credit card transactions or scanner data on retail purchases). Often, these sources generate large volumes of data that require computationally intensive techniques for extracting useful information for statistics (see NASEM, 2017b, 2017d; NRC, 2008a). However, to make use of most nontraditional data sources, it is necessary for statistical agencies to first evaluate the accuracy and error properties of the data.
In an era when data users expect timeliness and when budgets are constrained, researchers in statistical agencies should explore how nontraditional data sources can contribute to their programs (see NASEM, 2017b, 2017d). Procedures could include (1) augmenting information obtained from traditional sources; (2) replacing information elements previously obtained from traditional sources; (3) providing preliminary estimates that are later benchmarked with traditional sources; and (4) analyzing information streams to identify needed changes (e.g., in types of jobs, education majors) in statistical classifications and survey questions.
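Procedure (3), preliminary estimates that are later benchmarked, can be illustrated with pro rata benchmarking, in which preliminary values from a timely source are scaled so that they sum to a total later obtained from a traditional source. This is only one simple option, chosen here for brevity; the numbers and function name are hypothetical, and agencies may use more refined benchmarking methods that better preserve period-to-period movements.

```python
def prorate_to_benchmark(preliminary, benchmark_total):
    """Scale preliminary estimates so that they sum to a benchmark total."""
    preliminary_total = sum(preliminary)
    if preliminary_total == 0:
        raise ValueError("preliminary estimates sum to zero; cannot prorate")
    # One common scaling factor is applied to every period.
    factor = benchmark_total / preliminary_total
    return [value * factor for value in preliminary]


# Hypothetical quarterly preliminary estimates from an alternative data source,
# and an annual total later obtained from a traditional survey.
preliminary_quarters = [95.0, 100.0, 105.0, 100.0]
annual_benchmark = 440.0

revised = prorate_to_benchmark(preliminary_quarters, annual_benchmark)
print([round(v, 1) for v in revised])
```

Because every period is multiplied by the same factor, the relative pattern across quarters in the preliminary source is preserved while the level is aligned with the benchmark.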
A major challenge for statistical agencies has been the difficulty of identifying, locating, and accessing administrative records that could be useful for their programs. As the Foundations for Evidence-Based Policymaking Act of 2018 (Evidence Act) is implemented, it is hoped that the data inventories and practices of the program agencies will make these resources more transparent and make processes for obtaining these datasets for statistical purposes more streamlined (also see Practice 8).
23 See https://www2.census.gov/programs-surveys/decennial/2020/program-management/planning-docs/administrative-data-use-2020-census.pdf. [February 2021]
In considering their strategies, statistical agencies should adopt broad quality frameworks that capture user needs, including aspects such as relevance, accuracy, timeliness, comparability (over time and with other data sources), transparency, accessibility, privacy, protection from outside manipulation, and interpretability. They should examine the tradeoffs between different quality aspects, such as trading precision for timeliness and granularity (see NASEM, 2017b, and Appendix C). An agency’s own research staff can assist in examining these tradeoffs, and the Federal Committee on Statistical Methodology (2019, 2020) also has been pursuing work in this area to assist agencies.
Value of an Active Research Program
Supporting federal agencies’ in-house research staffs is critical given the challenges and opportunities posed by the increasing availability of alternative data sources. Many current practices in statistical agencies were developed through research they conducted or obtained from other agencies. Federal statistical agencies, frequently in partnership with academic researchers, pioneered the use of statistical probability sampling, the national economic accounts, input-output models, and other analytic methods. The U.S. Census Bureau pioneered the use of computers for processing the census. Several statistical agencies use academic principles of cognitive psychology—a research strand dating back to the early 1980s (see NRC, 1984)—to improve the design of questionnaires, the clarity of data presentation, and the ease of use of electronic data collection and dissemination tools. History has repeatedly shown that research conducted within federal statistical agencies on subject areas, methods, and operations can lead to large productivity gains in statistical activities at relatively low cost (see, e.g., Citro, 2016; NRC, 2010c).
An effective statistical agency also actively partners with the academic community for methodological research. It seeks out academic and industry expertise for improving data collection, processing, and dissemination operations. For example, a statistical agency can learn techniques and best practices for improving software development processes from computer scientists (see NRC, 2003c, 2004d). An effective agency also learns from and contributes to methodological research of statistical agencies in other countries and relevant international organizations (see Practice 7). Thus, it is important for agency staff to seek to publish their work in the leading peer-reviewed journals, which enables broader dissemination and adds credibility to the changes the agency makes.
Preparing for the future requires that agencies periodically assess the scope of existing data series, alter data series as required, and innovate to improve their programs. Because of the decentralized nature of the federal statistical system, innovation often requires and benefits from cross-agency collaboration (see Practice 7) and a willingness to implement different kinds of data collection efforts to answer different needs, while being mindful of the need for historical trend data and comparability across different levels of geography.