Definition of a Federal Statistical Agency
A federal statistical agency is a component of the federal government whose principal function is the compilation and analysis of data and the dissemination of information for statistical purposes.6
The component is recognized as a distinct entity. It may be located within a cabinet-level department or an independent agency, or it may be an independent agency. It may be labeled a bureau, center, division, office, or other entity so long as it is recognized as a distinct entity.
Compilation may include direct collection of data from individuals, organizations, or establishments through surveys. It may also include the acquisition of data from other sources, such as administrative records maintained by government agencies to operate a program, datasets available from the private sector, or data gleaned from sensors or selected Internet websites.7
6 The Confidential Information Protection and Statistical Efficiency Act of 2002 (P.L. 107-347, Section 502(8)) provides a similar definition of a statistical agency: “An agency or organizational unit of the executive branch whose activities are predominantly the collection, compilation, processing, or analysis of information for statistical purposes.” There are 13 principal statistical agencies (the focus of this document), several “recognized statistical units” (components of nonstatistical agencies), and statistical programs, such as a survey or time series, which can be in any agency. The designation of a principal statistical agency and recognized statistical unit is not entirely consistent in legislation and guidance (see “Brief History of the U.S. Federal Statistical System,” below, and Appendix B).
7 “Data,” “information,” and “statistics” do not have clear definitional boundaries: this document generally uses “data” to refer to individual responses or items in a dataset and “statistics” or “information” to refer to data that have been organized, modified as necessary (e.g., to weight survey responses to national population controls), and otherwise processed for use.
Analysis may take various forms. It includes methodological research to improve the quality, usefulness, and usability of data; the use of data gathered in real time about a source (paradata) to make compilation as efficient and error-free as possible; and modeling to combine data from more than one source into useful statistics. It also includes substantive analysis, such as developing indicators from one or more data series, making projections, interpreting data, and explaining differences among statistics obtained by different methods, such as surveys and administrative records. An analysis by a statistical agency does not advocate policies or take partisan positions.
Dissemination means making information available to the public, to the executive branch, and to Congress in easily accessible and usable forms with appropriate documentation to facilitate informed use. Dissemination also means taking care to curate information so that its value and usability are maintained over time.8
Statistical purposes include description, evaluation, analysis, and inference for groups of individuals or other units; they do not include interest in or identification of an individual person or economic unit. A statistical agency collects data directly from providers or from other sources for statistical purposes. It does not use these data for nonstatistical purposes, such as regulation or law enforcement. It also protects the confidentiality of responses collected under a confidentiality pledge and safeguards individual records against unauthorized access.9
This definition of a federal statistical agency does not include many statistical activities of the federal government, such as statistics compiled by an operating agency for administrative purposes (e.g., U.S. Office of Personnel Management statistics on new hires and retirements). Nor does it include agencies whose primary functions are the conduct or support of problem-oriented research, although their research may be based on information gathered by statistical methods, and they may sponsor important surveys: examples include the National Institutes of Health, the Agency for Healthcare Research and Quality, and other agencies in the U.S. Department of Health and Human Services.
8 Data curation includes and is more than data archiving; it encompasses an explicit data preservation policy, validation of the data and documentation to be archived, procedures for migration of the data to new media, and other policies and procedures to ensure that authorized researchers and others can access, understand, and use the data over time; see Practice 5.
9 The Federal Cybersecurity Enhancement Act of 2015 (part of P.L. 114-113) upgraded the protection of individual records against cyberattacks. Concomitantly, to respect the autonomy of respondents, statistical agencies revised their confidentiality pledges to note that respondents’ records are screened to protect against cybersecurity risks; see Principle 3 and Practice 8 for a fuller discussion; see also Appendix A.
This definition of a statistical agency also does not usually include agencies whose primary function is policy analysis and planning (e.g., the Office of Tax Analysis in the U.S. Department of the Treasury, the Office of the Assistant Secretary for Planning and Evaluation in the U.S. Department of Health and Human Services). Such agencies may collect and analyze statistical information, and statistical agencies, in turn, may perform some policy-related analysis (e.g., produce reports on trends in after-tax income or child care arrangements of families). However, to maintain credibility as an objective source of accurate, useful information, statistical agencies must be separate from units that are involved in developing policy and assessing policy alternatives.
As noted above, statistical agencies typically collect information under a pledge of confidentiality to a person or organization. They may, however, collect information that identifies individual government agencies when the data are already public information, as in the Census Bureau’s program of statistics for state and local governments (see National Research Council, 2007a) and the National Center for Science and Engineering Statistics’ program to collect information on research and development spending from federal agencies (see National Research Council, 2010a).
Occasionally, statistical agencies are charged to collect information for both statistical and nonstatistical purposes. Three examples are:
- The Bureau of Transportation Statistics (BTS) maintains the Airline On-Time Statistics Program (originated by the former Civil Aeronautics Board), which identifies individual airlines.10 However, BTS does not itself use the data for administrative or regulatory purposes—those functions are carried out by the Federal Aviation Administration—and the data are not collected under a pledge of confidentiality (see National Research Council, 1997b).
- Higher education institutions that participate in federal student aid programs are required by law (20 USC 1094(a)(17)) to respond to the Integrated Postsecondary Education Data System (IPEDS) from the National Center for Education Statistics (NCES). The data provided to IPEDS on enrollments, graduation rates, faculty and staff, finances, institutional prices, and student financial aid are not collected under a pledge of confidentiality, and NCES makes information on individual institutions available to parents and students to help them in choosing a college, as well as to researchers and others.11
- The U.S. Census Bureau participates with a consortium of agencies, led by U.S. Customs and Border Protection, in the Automated Export System (AES), by which exporters electronically file information about shipments. The Census Bureau and other statistical agencies use information in the AES for statistical purposes and do not make the data available to others; that is, they maintain confidentiality following their own data acquisition. Independently, export enforcement agencies use data they acquire from the system to administer U.S. export laws.12
10 See https://www.transtats.bts.gov/ONTIME/Index.aspx [April 2017].
11 See http://nces.ed.gov/ipeds [April 2017].
Statistical agencies should carefully consider the advantages and disadvantages of undertaking a program with both statistical and nonstatistical purposes. A potential advantage is that there may be improved consistency, quality, and cost-effectiveness when a statistical agency collects information for its own use and that of other agencies. A potential disadvantage is that the program may compromise the public perception of the agency as objective and separate from government administrative, regulatory, and enforcement functions.
When an agency decides to carry out a program that has both statistical and nonstatistical uses, it must clearly describe the program on such dimensions as the extent of confidentiality protection, if any (e.g., some but not all of the data may be collected under a pledge of confidentiality); the statutory basis for the program and the public purposes it serves, including benefits to respondents from having comparative information available of uniform quality; and the role of the agency (e.g., providing information to the public, working with respondents to improve reporting).13 Should an agency decide that the nature of a program is such that no amount of description or explanation is likely to make it possible for the agency to maintain its credibility as a statistical agency, it should decline to carry out the activity.
12 See https://www.cbp.gov/trade/aes [April 2017].
13 U.S. Office of Management and Budget (2007:Sec. V) spells out the requirements for statistical agencies to inform the public and respondents regarding any programs with nonstatistical uses.