Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.

7

Best Practices for Federal Statistical Agencies

BEST PRACTICES FOR DOCUMENTATION, RETENTION, RELEASE, AND ARCHIVING OF DATA

The following tables identify the information whose retention or public release, depending on the type of information, would fully support transparency in the production of a given set of official statistics. The tables therefore provide a roadmap for transparency in methods, operations, and data quality. For some types of information, this means internal retention on an agency’s computer system in a permanent and internally accessible manner. For other types of information, this means public release: on an agency Website, on request, or in a secure manner, for example through a federal statistical research data center.

We understand that there can be specific legal prohibitions, the need to protect proprietary information, contractual obligations, memoranda of understanding, or other constraints that could make it impossible to publicly release some information or even to keep some data internally for a period of time. On those occasions, the agencies are obligated to state publicly what has not been retained or released and why.

These tables and the accompanying recommendations are meant to be applicable to all of the principal federal statistical agencies. The Office of Management and Budget (OMB) or Interagency Council on Statistical Policy should consider monitoring how closely the principal U.S. federal statistical agencies follow these tables, acknowledging those agencies that come close to complete adherence to them.

The tables are organized as follows:

Table 7-1 Documenting Basic Elements of a Statistical Program

Table 7-2 Documenting Statistical Programs Using Survey Data

Table 7-3 Documenting Statistical Programs Using Administrative Records and/or Digital Trace Data

Table 7-4 Documenting Data Integration Issues

Table 7-5 Documenting Paradata from Statistical Programs

Table 7-6 Archiving of Data

In the tables, the leftmost column, “Information to retain,” identifies the informational components. The middle and rightmost columns identify the documentation of methods and the archiving of input data and official statistics, along with associated metadata, that should be retained within a federal statistical agency (middle column), and those that should be made available to the public (right column). All of these actions support the various benefits of transparency previously discussed (see Box S-1). The metadata standards discussed and presented in Chapter 5 address a substantial percentage of the contents of these tables. For the areas that the current group of standards does not address, there is an opportunity to join the development efforts to improve the scope of the standards.

A number of lists with similar elements have been compiled, including OMB’s Standards and Guidelines for Statistical Surveys, the (unpublished) Census Bureau’s Statistical Quality Standards, the American Association for Public Opinion Research Code of Ethics and Practices,1 Federal Committee on Statistical Methodology Statistical Policy Working Paper #31, and the Committee on National Statistics’ Principles and Practices for a Federal Statistical Agency. The panel wanted to create this new list as an easy reference source of the elements from many of these standard documents with respect to surveys. In addition, for issues such as what to retain regarding administrative record data sources, use of digital trace data, and model-based estimates, we believe that the guidelines we submit are a reasonable start for documenting research in still developing areas.

___________________

1 https://www.aapor.org/Standards-Ethics/AAPOR-Code-of-Ethics.aspx.

TABLE 7-1 Documenting Basic Elements of a Statistical Program

Information to retain or archive (note a) | To be available internally, to program staff | To be available externally, to the public
The Estimation Problem: The concept or concepts this official statistical program is measuring (e.g., the percentage of U.S. households in poverty at the county level in April 2020, where poverty is defined using … and households are defined as … ). This should include
✔ the precise definition of the key concepts
✔ the relevant population
✔ the levels of aggregation at which the estimates are provided
✔ the relevant time period covered by the estimates
✔ the nature of the products, e.g., tabulations, confidential microdata, or public-use files.
Internal: Description should be updated regularly, versioned, and curated (note b) for easy access. When concepts or definitions change, the data documentation for specific data products should be able to precisely connect to the version of the description that applies.
Public: Description should be on the appropriate Web page for access by the public, updated regularly, versioned, and curated. When concepts or definitions change, the data documentation for specific data products should be able to precisely connect to the version of the description that applies. In addition, the relationship between the old and new versions should be explained for the benefit of the public.
Justification for the statistical program and input data relied on: Information required includes
✔ product sponsorship and legal authority for the data collection or program
✔ specific input datasets collected using surveys or otherwise acquired to support this estimation effort
✔ an overview of the techniques used to collect these input datasets
✔ an overview of how these datasets are used in support of the program, including weighted aggregation or any use of models, statistical or otherwise, based on these data
✔ a description of any revisions to an ongoing program, including changes to key datasets, models, methods, or procedures
Internal: Description should be updated regularly, versioned, and curated for easy access.
Public: Information should be updated regularly, versioned, curated, and broadly publicized (e.g., on agency Website, accompaniment to estimates release, press releases, social media).
Point of contact for information requests
Internal: Information on directing queries to the appropriate staff members should be available. It should be clear when requests for information should be directed to the agency’s FOIA office and when not.
Public: Prominent display of point-of-contact information on agency Website.

NOTES:

a Archiving is a more permanent and monitored process than simple saving or retention, requiring various activities to ensure continued reusability. We have not gone into each element of this series of tables to indicate whether archiving or retention is the more appropriate action; but in general, archiving would be more relevant for data and estimates and retention would be more appropriate for methodological details.

b One definition of curation: “The process of ‘caring’ for data, including organizing, describing, cleaning, enhancing and preserving data for public use. Through curation, the ICPSR provides meaningful and enduring access to data.” This definition could include metadata.
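
The first row of Table 7-1 asks that the documentation for a data product connect precisely to the version of the description that applies. One way to picture that lookup is the minimal Python sketch below. The `DescriptionVersion` schema, its field names, and the sample entries are hypothetical illustrations and are not prescribed by the panel.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DescriptionVersion:
    """One version of an estimation-problem description (hypothetical schema)."""
    version: str
    effective_from: date
    key_concepts: str
    relation_to_previous: Optional[str] = None  # how this version differs from the prior one


def version_in_effect(history: List[DescriptionVersion], reference: date) -> DescriptionVersion:
    """Return the latest version whose effective date is on or before `reference`."""
    candidates = [v for v in history if v.effective_from <= reference]
    if not candidates:
        raise LookupError(f"no description version in effect on {reference}")
    return max(candidates, key=lambda v: v.effective_from)


# Placeholder history; a real program would load this from its curated documentation.
history = [
    DescriptionVersion("1.0", date(2018, 1, 1), "placeholder definition A"),
    DescriptionVersion(
        "2.0",
        date(2020, 4, 1),
        "placeholder definition B",
        relation_to_previous="placeholder note on how 2.0 differs from 1.0",
    ),
]

print(version_in_effect(history, date(2019, 6, 30)).version)  # 1.0
print(version_in_effect(history, date(2020, 4, 1)).version)   # 2.0
```

Keeping `relation_to_previous` alongside each version is one possible way to record the relationship between old and new definitions that the public column of the table calls for.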

Type of Data Collection

For each separate input dataset, whether survey (Table 7-2), administrative records, or digital trace (Table 7-3), the following information should be saved, made public, or both. Information needed for transparency is generally the same for each data product. All documentation should be versioned and curated, and all technical reports should be made permanently available. If the reports are public facing, they should have a published DOI (Digital Object Identifier).

TABLE 7-2 Documenting Statistical Programs Using Survey Data

Information to retain or archive | To be available internally, to program staff | To be available externally, to the public
Sample design
Target population
Internal: All details should be curated for easy access.
Public: All releasable details should be made publicly available when estimates are released. Details that cannot be made public should be identified as such.
Sampling frame and coverage
Internal: All details should be curated for easy access.
Public: All details should be made publicly available when estimates are released. Details that cannot be made public or are too detailed should be identified as such and made available on request.
Sampling methods, including:
✔ probability type, stratification, stages, clustering
✔ use of any optimization rules in sample design (e.g., Neyman allocation)
Internal: All details should be curated for easy access.
Public: All details should be made publicly available when estimates are released. Details that cannot be made public or are too detailed should be identified as such and made available on request.
Sample size
Internal: All details should be curated for easy access.
Public: All details should be made publicly available upon estimates’ release.
Data collection
✔ questionnaire employed (exact wording and skip patterns)
✔ self-administration instructions
✔ interviewer instructions
✔ languages offered
✔ self/proxy rates of a collection; identification of respondent
Internal: All details should be curated for easy access.
Public: All details should be made publicly available upon estimates release. Details that are too voluminous should be identified as such and made available on request.
Modes of data collection, by percent:
✔ at the unit level
✔ including multi-mode sequence, if appropriate
Internal: All details should be curated for easy access.
Public: All details should be made publicly available upon estimates release. Details that cannot be made public or are too voluminous should be identified as such and made available on request.
Data collection agency and dates of data collection
Internal: All details should be curated for easy access.
Public: All details should be made publicly available upon estimates release. Details that cannot be made public or are too voluminous should be identified as such and made available upon request.
Field operation details:
✔ number of contacts per case
✔ use of incentives, including sequence if in stages
✔ final case dispositions (e.g., completed interviews, proxy interviews, imputation, refusals, noncontacts)
✔ for surveys employing adaptive design, additional information on the field operations is required (note a)
Internal: All details should be curated for easy access.
Public: All details should be made publicly available upon request. Details that cannot be made public or are too detailed should be identified as such and made available on request.
Data quality measures
Response rate and formula employed (refer to AAPOR Standard Definitions; note b):
✔ summary statistics of case disposition by major domain
Internal: All details should be curated for easy access.
Public: All details should be made publicly available upon estimates release. Details that cannot be made public or are too voluminous should be made available on request.
Coverage error:
✔ undercoverage, overcoverage, duplications by key domains
Internal: Detailed technical reports should be prepared or updated for each data release, versioned, and curated for easy access.
Public: Releasable technical reports should be prepared or updated for each data release, versioned, curated, and made permanently and publicly available on the agency Website with a DOI.
Total unit nonresponse rates, and by key domains:
✔ unit nonresponse bias reports
Internal: Technical reports should be curated for easy access.
Public: Technical reports should be made publicly available upon release of estimates.
Item nonresponse rates by question:
✔ assessment of imputation methods
✔ analysis of item nonresponse rate by question
Internal: Details should be curated for easy access.
Public: Releasable details should be made available upon release of estimates, as part of technical reports.
Percentage of failed edits:
✔ assessment of editing procedures
Internal: Details should be curated for easy access.
Public: Releasable details should be made publicly available upon release of estimates, as part of technical reports.
Pretesting methods reports, including:
✔ pilot reports
✔ testing reports
✔ experiments
✔ cognitive interviews reports
Internal: Technical reports should be curated for easy access.
Public: Technical reports should be made publicly available on agency Website.
Changes
Changes made in survey design, survey instrument, field directions since last administration:
✔ maintain list of survey versions
Internal: A list accessible to staff should be maintained.
Public: The list of such changes should be readily accessible by the public, as part of public technical reports, and where appropriate, on public Websites in accessible formats.
Description of data processing with commented code (note c)
Treatments for failed edits/edit specification
Internal: The code for the methodology used for treating failed edits should be retained and curated for easy access.
Public: The general approach taken for treatments applied to address failed edits should be described and made available for the public on request. Further, the code should be commented to be readable by others and made available on request.
Treatment for unit nonresponse:
✔ any adjustments derived for unit nonresponse
✔ hot deck imputation, administrative records substitution
Internal: The code for the methodology used for treating unit nonresponse should be retained and curated for easy access.
Public: The general description of the methodology used for treating unit nonresponse should be made available to the public. Further, the code should be commented to be readable by others and made available on request.
Treatments for item nonresponse
Internal: The code for the methodology used for treating item nonresponse should be retained and curated for easy access.
Public: The general description of the methodology used for treating item nonresponse should be made available to the public.
Other post-survey adjustments:
✔ base weights
✔ undercoverage weights
✔ nonresponse weights
✔ other weight adjustments
✔ rounding, etc.
Internal: The codes for the methodology(ies) used for post-survey adjustments should be retained and curated for easy access.
Public: The general reason for and the description of various post-survey adjustments should be made available to the public.
Transformations of variables (e.g., creation of new variables for analysis through recoding or combining multiple items)
Internal: The code(s) for the various transformations used should be retained and curated for easy access.
Public: The description of the various transformations used and the reasons for their use should be made available for the public.
The methodology used for disclosure protection:
✔ methodology should be preserved from collection until input into final methodology used for estimation
✔ entire workflow history must be retained
Internal: The commented code to carry out disclosure protection should be retained and curated for easy access.
Public: A high-level summary description of what is done to preserve confidentiality should be made available to the public; the disclosure avoidance methods should be released if differential privacy is employed.
Methodology used to produce the official estimates
Internal: The commented code to implement the methodology used to produce the official estimates should be retained. In addition, a detailed technical description of the methodology used should be written up as a technical report suitable for publication in a technical journal and should be retained.
Public: The summary description of the methodology used to produce the official estimates should be made available to the public. The commented code used to implement this methodology should be made available upon request.
Methods used for variance estimation
Internal: Details and commented code should be curated for easy access.
Public: Details and commented code should be made publicly available on request.
Variability of the official estimates
Internal: A technical report providing details of the estimation of the variability of the official estimates, taking into consideration the effects of nonresponse on the input datasets used, should be retained.
Public: A high-level report providing an outline of the estimation of the variability of the official estimates should be made available to the public.
When official estimates are the result of a model-based estimation methodology:
✔ assessment of quality of inputs used in the model
Internal: Information on what is known about the variability of the inputs should be retained.
Public: Information on what is known about the variability of the inputs should be made available to the public.
✔ model form and related information
Internal: The form of the model, the associated parameters and how they are estimated, and assessments of the variability of the parameter estimates should be retained.
Public: The form of the model, the associated parameter estimates and how they are estimated, and assessments of the variability of the parameter estimates should be made available to the public.
Relevant literature of the application of the model for this purpose
Internal: Any descriptions of the application of this type of model to analogous problems should be retained.
Public: Any descriptions of the application of this type of model to analogous problems should be made available to the public.
Assessments of model fit, plots and tests on residuals
Internal: Any information on the fit of the model through summary statistics, residual tests, and residual plots should be retained.
Public: Any information on the fit of the model through summary statistics, residual tests, and residual plots should be made available to the public.
Efforts to validate the model
Internal: Any efforts to apply the model to historical data, using simulated data or through use of cross-validation, should be retained.
Public: Any efforts to apply the model to historical data, using simulated data or through use of cross-validation, should be made available to the public.
Methodology reports
Internal: Any methodology reports not included in any of the previous cells should be finalized and retained.
Public: Any methodology reports not included in any of the previous cells should be finalized and made public on the agency Website.
Changes in methodology since last implementation
Internal: Changes in the methodology used from the previous implementation to the next should be described in a technical report and retained.
Public: Changes in the methodology used from the previous implementation to the next should be described in a technical report and this should be made available to the public.

NOTES:

a See U.S. Census Bureau Statistical Quality Standards (Requirement A1-3.1) for details: https://www.census.gov/about/policies/quality/standards/standarda1.html.

b https://www.aapor.org/Education-Resources/For-Researchers/Poll-Survey-FAQ/ResponseRates-An-Overview.aspx.

c All code should default to being publicly available. Deviations (confidential parameters) should be justified, and generically identified. This applies to all rows that follow in this table.
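
Table 7-2 asks agencies to release the response rate together with the formula employed. As an illustration only, the Python sketch below computes a rate in the general form of AAPOR's Response Rate 1: completed interviews divided by all eligible cases plus cases of unknown eligibility. The class name, field names, and counts are hypothetical; the authoritative disposition categories are those in the AAPOR Standard Definitions referenced in note b.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispositions:
    """Final case dispositions for one domain (hypothetical field names)."""
    complete: int             # I: completed interviews
    partial: int              # P: partial interviews
    refusal: int              # R: refusals and break-offs
    noncontact: int           # NC: non-contacts
    other: int                # O: other eligible non-interviews
    unknown_eligibility: int  # UH + UO: cases of unknown eligibility


def response_rate_1(d: Dispositions) -> float:
    """RR1-style rate: I / ((I + P) + (R + NC + O) + (UH + UO))."""
    denominator = (
        d.complete + d.partial
        + d.refusal + d.noncontact + d.other
        + d.unknown_eligibility
    )
    if denominator == 0:
        raise ValueError("no cases in the denominator")
    return d.complete / denominator


# Made-up counts, for illustration only.
example = Dispositions(
    complete=600, partial=50, refusal=200,
    noncontact=100, other=20, unknown_eligibility=30,
)
print(response_rate_1(example))  # 0.6
```

Publishing a function like this next to the disposition counts by major domain would let readers reproduce the reported rate from the released case dispositions.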

TABLE 7-3 Documenting Statistical Programs Using Administrative Records and/or Digital Trace Data

Information to retain or archive | To be available only internally, to program staff | To be available externally, to the public
ADMINISTRATIVE RECORDS
Target population:
✔ description of any conceptual differences between survey (if any) and administrative record source
✔ coverage of the administrative data records
Internal: All details, including how they differ from the information needed in support of the official statistical product, should be retained.
Public: All details should be made permanently available to the public, versioned and curated, as part of technical reports with DOI.
Source of records and time period covered
Internal: Information should be curated for easy access.
Public: The information should be made available to the public.
Data treatments administered
Internal: All details on how raw responses were treated prior to use should be retained.
Public: Summary of how raw responses were treated prior to use should be made available on agency Website.
Changes made in data collection since last implementation
Internal: Any changes made to the form or the nature of the data collection since the previous implementation should be retained and any relevant technical reports on how the nature of the data elements might have changed due to changes in the external environment should be retained.
Public: Any changes made to the form or the nature of the data collection since the previous implementation should be made available to the public, as should any relevant technical reports on how the nature of the data elements might have changed due to changes in the external environment.
Changes in the nature of the program or how people respond to the program that would impact the continuity of data from one time period to the next
Internal: All such changes should be retained.
Public: All such changes should be made available to the public.
DIGITAL TRACE DATA
Data disposition:
✔ source of data
✔ description of data elements
✔ conceptual link between data and information needed for statistical product, including justification for use
Internal: Descriptions of data elements and how they compare with the information needed in support of the official statistical product should be retained.
Public: Descriptions of data elements and how they compare with the information needed in support of the official statistical product should be made available to the public.
How were data identified?
✔ type of data: e.g., social media, utility monitor
✔ search procedure employed (if any), e.g., Website “scraping,” Internet search, cellphone records, social media sampling
Internal: The data elements and how they were found and collected should be described and retained.
Public: Descriptions of data elements and how they compare with the information needed in support of the official statistical product should be made available to the public.
Data treatment and characteristics:
✔ cleaning and transformations, when and by whom
✔ reliability and validity of cleaned data
Internal: All data treatment techniques and results should be retained.
Public: Descriptions of all data treatment techniques and results should be made available to the public, versioned and curated, as part of technical reports with DOI.
Changes made in data collection since last implementation
Internal: Any changes made to the form or the nature of the data collection since the previous implementation should be retained and any relevant technical reports on how the nature of the data elements might have changed due to changes in the external environment should be retained.
Public: Any changes made to the form or the nature of the data collection since the previous implementation should be made available to the public, as should any relevant technical reports on how the nature of the data elements might have changed due to changes in the external environment.
Changes in the nature of the program or how people respond to the program that would impact the continuity of data from one time period to the next
Internal: All such changes should be retained.
Public: All such changes should be made available to the public.
FOR BOTH ADMINISTRATIVE RECORDS AND DIGITAL TRACE DATA
Transformations of variables (e.g., creation of new variables for analysis through recoding or combining multiple items)
Internal: The code(s) for the various transformations used should be retained and curated for easy access.
Public: The description of the various transformations used and the reasons for their use should be made available to the public.
The methodology used for disclosure protection:
✔ should be preserved from collection until input into final methodology used for estimation
✔ entire workflow history must be retained
Internal: The commented code to carry out disclosure protection should be retained and curated for easy access.
Public: A high-level summary description of what is done to preserve confidentiality should be made available to the public; the disclosure avoidance methods should be released if differential privacy is employed.
Methodology used to produce the official estimates
Internal: The commented code to implement the methodology used to produce the official estimates should be retained. In addition, a detailed technical description of the methodology used should be written up as a technical report suitable for publication in a technical journal and should be retained.
Public: The summary description of the methodology used to produce the official estimates should be made available to the public. The commented code used to implement this methodology should be made available upon request.
Methods used for variance estimation
Internal: Details and commented code should be curated for easy access.
Public: Details and commented code should be made publicly available on request.
Variability of the official estimates
Internal: A technical report providing details of the estimation of the variability of the official estimates, taking into consideration the effects of nonresponse on the input data sets used, should be retained.
Public: A high-level report providing an outline of the estimation of the variability of the official estimates should be made available to the public.
When official estimates are the result of a model-based estimation methodology:
✔ assessment of quality of inputs used in the model
Internal: Information on what is known about the variability of the inputs should be retained.
Public: Information on what is known about the variability of the inputs should be made available to the public.
✔ model form and related information The form of the model, the associated parameters and how they are estimated, and assessments of the variability of the parameter estimates should be retained. The form of the model, the associated parameter estimates and how they are estimated, and assessments of the variability of the parameter estimates should be made available to the public.
Relevant literature of the application of the model for this purpose Any descriptions of the application of this type of model to analogous problems should be retained. Any descriptions of the application of this type of model to analogous problems should be made available to the public.
Assessments of model fit, plots and tests on residuals Any information on the fit of the model through summary statistics, residual tests, and residual plots should be retained. Any information on the fit of the model through summary statistics, residual tests, and residual plots should be made available to the public.
Efforts to validate the model Any efforts to apply the model to historical data, using simulated data, or through use of cross-validation, should be retained. Any efforts to apply the model to historical data, using simulated data, or through use of cross-validation, should be made available to the public.
Methodology reports Any methodology reports not included in any of the previous cells should be finalized and retained. Any methodology reports not included in any of the previous cells should be finalized and made public on the agency Website.
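
As an illustration of the kind of model-fit and validation evidence the rows above ask agencies to retain, the following is a minimal sketch in Python. It assumes a simple straight-line model and made-up data; the function names (`fit_line`, `residuals`, `loo_cv_rmse`) are hypothetical and do not describe any agency's actual estimation system.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def residuals(xs, ys):
    """Observed minus fitted values, the input to residual plots and tests."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def loo_cv_rmse(xs, ys):
    """Leave-one-out cross-validation: refit without each point, then predict it."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        squared_errors.append((ys[i] - (intercept + slope * xs[i])) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5


# Made-up illustrative data, not real official statistics.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

print("parameters:", fit_line(xs, ys))
print("residuals:", [round(r, 3) for r in residuals(xs, ys)])
print("leave-one-out RMSE:", round(loo_cv_rmse(xs, ys), 3))
```

Retaining the script, its inputs, and its printed output together is one way the parameter estimates, residual diagnostics, and cross-validation results could later be reproduced.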

Data Integration Issues

As mentioned in Chapter 4, Czajka and Stange (2018) have detailed a new paradigm “characterized by the use of administrative data and other forms of Big Data as alternatives to survey data… [that] necessitates new quality standards that address integrated data” (p. ix). The authors noted that many groups around the world have been studying the issue but that no consensus exists for assessing the ultimate quality of an integrated data product. Rather, current research is focused on determining the quality of the individual components (survey, administrative, or digital trace data). As a result, Table 7-4 focuses on record linkage or matching, a technique that is employed at most, if not all, statistical agencies and is one of the primary integration techniques currently in use.

TABLE 7-4 Documenting Data Integration Issues

Information to retain or archive | To be available only internally, to program staff | To be available externally, to the public
Data files that were linked: ✔ identification of files ✔ description of files | A description, including the metadata, of the specific data files that were matched should be retained. | A description of the specific data files that were matched should be provided routinely as part of the technical reports or versioned data documentation.
Matching methods used: ✔ details of matching procedures | Study-specific information and technical reports should be retained. | Study-specific information and technical reports should be made available to the public.
Methods used for record linkage: ✔ methods of linking ✔ processes used to select variable sources when multiple source data sets have the same variable | The code used to carry out record linkage, along with a description of the techniques used, the variables used to match on, and a description of how the matching algorithm is implemented, including how uncertain matches are treated (for example, whether they are sent to clerical review), should be retained. (Note that linkage is often probabilistic, so uncertainty is built into the matching method.) | A description of the techniques used for record linkage, the variables used to match on, and a description of how the matching algorithm is implemented, including how uncertain matches are treated, should be made available to the public as part of technical reports on data quality.
Evaluation of linkage success and sensitivity analysis, if used | Any information on the quality of such a match should be retained. | If available, the estimated error rates for the record linkage routine in this environment should be provided, and if not available, any information on the quality of such a match should be provided instead.
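
To make concrete what "how uncertain matches are treated" might look like in documented linkage code, here is a minimal sketch in Python of probabilistic matching in the style of the Fellegi-Sunter approach. That approach is an assumption for illustration and is not named in the table. The matching variables, the m- and u-probabilities, and the two thresholds are all hypothetical values, and a production routine would also need blocking, string comparators, and estimated rather than assumed probabilities.

```python
from math import log2

# Hypothetical agreement probabilities for each matching variable:
#   m = P(values agree | the two records are a true match)
#   u = P(values agree | the two records are not a match)
FIELD_PROBS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.98, 0.05),
    "zip_code": (0.90, 0.10),
}

# Assumed decision thresholds on the total weight (illustrative only).
UPPER = 8.0  # at or above: treat the pair as a link
LOWER = 0.0  # at or below: treat the pair as a non-link


def match_weight(rec_a, rec_b):
    """Sum log2 likelihood-ratio weights over the matching variables."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # missing value: contributes no evidence either way
        if a == b:
            total += log2(m / u)              # agreement weight (positive)
        else:
            total += log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total


def classify(weight):
    """Route a candidate pair to link, non-link, or clerical review."""
    if weight >= UPPER:
        return "link"
    if weight <= LOWER:
        return "non-link"
    return "clerical review"


# Made-up records standing in for a survey file and an administrative file.
survey = {"surname": "LEE", "birth_year": 1980, "zip_code": "20001"}
admin_same = {"surname": "LEE", "birth_year": 1980, "zip_code": "20001"}
admin_moved = {"surname": "LEE", "birth_year": 1980, "zip_code": "22030"}
admin_other = {"surname": "KIM", "birth_year": 1975, "zip_code": "90210"}

for candidate in (admin_same, admin_moved, admin_other):
    w = match_weight(survey, candidate)
    print(f"{w:6.2f}  {classify(w)}")
```

Publishing the variables, weights, thresholds, and the share of pairs routed to clerical review would cover most of the items listed in the "Methods used for record linkage" row.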

Documentation of Paradata

While paradata are typically considered for survey data, we see no reason why paradata should not be available for administrative data as well, with analogous measures (see Table 7-5). The person entering the data is analogous to the survey taker. There may be differences in availability and completeness of such measures, but they could be used for the same purposes. Note also that paradata arise from three different aspects of the data collection process: (1) interview contact attempts, (2) interview observations, and (3) respondent behavior.


TABLE 7-5 Documenting Paradata from Statistical Programs

Information to retain or archive | To be available only internally, to program staff | To be available externally, to the public
PARADATA FROM INTERVIEW CONTACTS
Information on difficulty in obtaining an interview, including contact history. Include contact history instruments employed to record results of each contact attempt. | Measures based on notes or debriefs of the number of attempts needed, who the ultimate respondent was, and quality of information at each stage of data collection should be retained for as long as the information has research interest. | Information should be made available to the public on request, subject to privacy considerations. For adaptive design, retention and availability of paradata are needed to justify data collection decisions that relied on them.
Information on which cases belong to which (anonymized) interviewer in order to check whether there are interviewer effects. | Information should be curated for easy access. | Information should be made available to the public on request, subject to privacy considerations.
PARADATA FROM INTERVIEW OBSERVATIONS
For each question, computer-generated information on: ✔ the frequency of any delays, asking for assistance, visual discomfort, etc. | Measures based on notes or debriefs of interviewers on problematic questions should be retained for three years. | Information should be made available to the public on request, subject to privacy considerations.
PARADATA FROM RESPONDENT BEHAVIOR
For each question, measurements on the respondents’ degree of difficulty in responding, including ✔ click sequences for Web surveys ✔ use of various types of assistance for difficulties in responding ✔ time taken to respond ✔ degree of backtracking, etc. | Any information on measures relevant to difficulty individuals had in responding to individual survey questions should be retained. | Information should be made available to the public on request, subject to privacy considerations. Some agencies may wish to make paradata for some surveys available on a special-request basis in secure environments, like a federal statistical research data center.
Response paradata reports: ✔ response latency ✔ key stroke studies | Technical reports should be curated for easy access. | Technical reports should be made publicly available on the agency Website.
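
The respondent-behavior measures in Table 7-5 (time taken to respond, degree of backtracking) can be derived from a Web survey event log. The following Python sketch shows one possible derivation. The event-log format, the event names `shown` and `answered`, and the function `summarize_paradata` are assumptions made for illustration, not a description of any agency's instrument.

```python
from collections import defaultdict
from statistics import median

# Hypothetical event log: (respondent_id, question_id, event, seconds_since_start).
EVENTS = [
    ("r1", "Q1", "shown", 0), ("r1", "Q1", "answered", 4),
    ("r1", "Q2", "shown", 4), ("r1", "Q2", "answered", 10),
    ("r1", "Q1", "shown", 10), ("r1", "Q1", "answered", 12),  # r1 went back to Q1
    ("r2", "Q1", "shown", 0), ("r2", "Q1", "answered", 8),
    ("r2", "Q2", "shown", 8), ("r2", "Q2", "answered", 11),
]


def summarize_paradata(events):
    """Per question: median response latency and number of backtracks."""
    last_shown = {}               # (respondent, question) -> time last displayed
    seen = set()                  # (respondent, question) pairs displayed before
    latencies = defaultdict(list)
    backtracks = defaultdict(int)

    # Process each respondent's events in time order.
    for resp, question, event, t in sorted(events, key=lambda e: (e[0], e[3])):
        key = (resp, question)
        if event == "shown":
            if key in seen:
                backtracks[question] += 1  # question redisplayed to same respondent
            seen.add(key)
            last_shown[key] = t
        elif event == "answered" and key in last_shown:
            latencies[question].append(t - last_shown.pop(key))

    questions = sorted(set(latencies) | set(backtracks))
    return {
        q: {
            "median_latency": median(latencies[q]) if latencies[q] else None,
            "backtracks": backtracks[q],
        }
        for q in questions
    }


summary = summarize_paradata(EVENTS)
for question, stats in summary.items():
    print(question, stats)
```

Because the summary contains only question-level aggregates, it is the sort of product that could feed a public technical report, while the respondent-level log would remain subject to the privacy considerations noted in the table.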

Documentation for Archiving of Data²

All input datasets used to produce a set of official estimates and the official estimates themselves should have a records schedule and/or data management plan that indicates a permanent location for such files, how long they are to be retained, how they can be accessed, what metadata standards will be used to access them, and that these must be made public. All data products are subject to a records schedule (see Table 7-6).

TABLE 7-6 Archiving of Data

Information to retain or archive | To be available only internally, to program staff | To be available externally, to the public
Archiving of treated input data and metadata (i.e., modified to account for failed edits, nonresponse, etc.) | The input datasets used to produce the official estimates (i.e., the collected data after various treatments have been applied) should be retained along with metadata that provide the record layout. (They do not need to be retained at all agencies making use of them; only the first agency producing them needs to archive.) The metadata should be machine actionable. | The input datasets used to produce the official estimates (i.e., the collected data after various treatments have been applied) should be made accessible at a secure repository, such as a federal statistical research data center, along with metadata that provide the record layout. The metadata should be machine actionable.
Archiving of untreated input data and metadata | The untreated input datasets used to produce the official estimates should also be retained along with metadata in order to support research on the treatments applied. The metadata should be machine actionable. | The untreated input datasets should be accessible at a secure repository, such as a federal statistical research data center, along with metadata that provide the record layout. The metadata should be machine actionable.
Archiving of official estimates and metadata | The official estimates should be retained along with metadata that provide the record layout. The metadata should be machine actionable. The official estimates should be stored using persistent identifiers. | The official estimates should be available online for the public for a substantial time. They will be archived as per their records schedule, along with metadata that provide the record layout. The metadata should be machine actionable. The official estimates should be stored using persistent identifiers.

___________________

2 Note that changes to the software used or to the media the data are stored on will likely make the data unreadable and, therefore, procedures also are needed to ensure that proper conversions are carried out when necessary.
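
As a rough illustration of what "machine actionable" metadata with a record layout and a persistent identifier could mean in practice, the Python sketch below writes a JSON sidecar for a small fixed-width file and then reads the file back using only that sidecar. The field names, the layout, the `doi:10.0000/...` identifier, and the helper functions are hypothetical placeholders. Real programs would follow an established standard such as those of the Data Documentation Initiative rather than this ad hoc structure.

```python
import hashlib
import json

# Hypothetical record layout for a fixed-width file of estimates.
LAYOUT = [
    {"name": "state", "start": 0, "width": 2, "type": "str"},
    {"name": "year", "start": 2, "width": 4, "type": "int"},
    {"name": "estimate", "start": 6, "width": 8, "type": "float"},
]

CASTS = {"str": str.strip, "int": int, "float": float}


def build_metadata(data, identifier, layout, retention):
    """Describe an archived file so that software can locate, verify, and read it."""
    return {
        "identifier": identifier,                    # persistent identifier (placeholder)
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity check for the archived bytes
        "retention": retention,                      # taken from the records schedule
        "record_layout": layout,
    }


def parse_record(line, layout):
    """Read one fixed-width record using only the layout from the metadata."""
    record = {}
    for field in layout:
        raw = line[field["start"]: field["start"] + field["width"]]
        record[field["name"]] = CASTS[field["type"]](raw)
    return record


# Made-up data standing in for a file of official estimates.
DATA = b"VA2021  1234.5\nMD2021   987.0\n"
metadata = build_metadata(DATA, "doi:10.0000/example-estimates-v1", LAYOUT, "permanent")
sidecar = json.dumps(metadata, indent=2)

# A later user holding only the data file and the sidecar can recover the records.
layout_from_sidecar = json.loads(sidecar)["record_layout"]
rows = [parse_record(line, layout_from_sidecar)
        for line in DATA.decode("ascii").splitlines()]
print(rows)
```

The checksum also gives a simple way to confirm, after any software or media conversion of the kind footnote 2 anticipates, that the archived bytes are the ones originally described.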

Recommendation 7.1: The National Center for Science and Engineering Statistics and all agencies that produce federal statistics should, to the fullest extent feasible, document their data collection methods, their data treatments, their estimation methodologies, and assessments of the quality of their official estimates, and they should archive their input datasets and their official estimates to support reproducibility and later reuse, as specified in the tables developed by the panel. To the extent possible, they should make as much of this information as possible available to their external user communities; for data treatments and estimation methodologies, they may do so through methodological overviews. They should provide reasons, such as legal or contractual constraints, for omitting items in the tables.

DEALING WITH ERRATA IN OFFICIAL STATISTICS

In discussions of the transparency of official statistics, one topic that arises is what information to provide concerning errata, which we define as procedural, computational, conceptual, or other kinds of errors that are discovered after release of a set of official statistics. The panel distinguishes between errors of different magnitudes. Some errors are relatively modest and are unlikely to change policy inferences, because the estimates have essentially the same general structure whether the error is retained or removed. Every series of official statistics has regular improvements of various types, and if the errors are relatively minor, it is reasonable to include with these improvements such additional “corrections” that have also been made since the last release, mentioning that the improvements also include the correction of a number of minor errors. Such improvements could be included in the release schedule for a set of official statistics, with an accompanying notification of the updates and their sources, which could include new data sources, methodological improvements, conceptual improvements, and, finally, corrections of small errors of various kinds.

In addition, there are on occasion (hopefully rare) more substantial errors that could result in different patterns in the estimates, which in turn could easily affect policy inferences. In such instances, we believe that it is important for transparency to call out such errors, describe their cause and nature, and release a corrected set of estimates as soon as possible rather than waiting for the regular release schedule described above.

A VISION OF FEDERAL STATISTICS IN THE FUTURE

The panel envisions a not-too-distant future federal statistical system characterized by the implementation of more transparent methods and data, resulting in greater care in the documentation of methods, the use of uniform processes for archiving of input data and all official statistics, and the greater use of metadata standards. This will result in more sharing and reuse of input data and official statistical estimates and the methods used to produce them with the accompanying knowledge transfers within and among federal statistical agencies and with national statistical offices around the world. In this envisioned future, there will be greater interaction with the public, because today’s user also wishes to make use of official statistics for nonstandard tabulations and as input to their own statistical models. As a result, agencies will have done much more in support of these alternative uses of their estimates.

Further, members of statistical programs’ user communities will be more fully understood, due to increased focus on their needs (as noted at the end of Chapter 4), and they will more regularly serve as beta testers for proposed user interfaces and tools intended to provide better access to official estimates. In addition, greater use of data for research purposes will be facilitated through greater use of the federal statistical research data centers.

Internally, continuing this envisioned future, archived and documented materials will be retained in permanent Web locations, and code will be fully commented and available across agencies in indicated (possibly secure) locations online. Identical machine-readable metadata standards will be used by all statistical programs, which will make sharing of methods and data easier among the statistical community. This greater sharing of data and information could extend to a variety of activities. Each program that produces official statistics will contribute to standardization and generalization by facilitating the sharing of questionnaire items, methods of coding of administrative records, applications of paradata used for common purposes (e.g., investigating nonresponse bias), or methods of combining survey and nonsurvey data.

In addition, standardized transparency will facilitate interagency collaboration on research. For instance, if one wishes to interview college graduates today, the target population is drawn from a combination of lists, but there are other surveys with similar goals, including surveys from the National Center for Science and Engineering Statistics, the Census Bureau, and the National Center for Education Statistics, which all provide some statistics on college graduates. This overlap could, after a period of adjustment, be addressed by sharing data across agencies.³

Further, because the input datasets used to produce official statistics are less likely to be survey data for many programs, and will instead use combinations of survey data, administrative data, and digital trace data, there will be the need to use sophisticated (and currently novel or unknown) models or matching techniques in producing future sets of official statistics. There is also the obvious need for more computer science expertise, both with respect to current employees and also as consultants to the agencies. This has many implications. First, the manner in which business processes operate in federal statistical agencies will be based on how other agencies have produced similar estimates, which will be straightforward given the documentation and archiving of those methods and data, respectively. Second, adding survey expertise will have a somewhat lower priority in comparison to the higher priority of providing additional expertise in statistical modeling and computer science techniques useful in documentation, archiving, and code development. There will be a need for increased interaction among the staffs of all federal statistical agencies, their user communities, and with international statistical agencies. Given the need for research into the nature and fitness for use of these novel data sources, a great deal more effort will be given to validation activities. This is also likely to require additional methodological resources.

An approach to federal statistics in which transparency and reproducibility play a larger role will be instrumental in raising the level of trust in official statistics. This would be particularly important in circumstances where normal survey operations have been disrupted.

Much of the above vision is conditional on securing additional resources, and it is also predicated on the assumption that a number of legal issues get resolved. Both are discussed below.

___________________

3 See https://www2.census.gov/ces/wp/2021/CES-WP-21-19.pdf.


RESOURCE NEEDS TO PROCEED

This study was partly motivated by recent legislation, particularly the Foundations for Evidence-Based Policymaking Act, but the vision it aims for has been described in a number of earlier National Academies of Sciences, Engineering, and Medicine reports, including Innovations in Federal Statistics: Combining Data Sources While Protecting Privacy (2017). This vision will require additional resources, at least in the near term. Given that need, and for other reasons including various disruptions to the status quo, bringing it to fruition will require support from senior management, pilot testing, new hires, and new consulting arrangements. It is also likely that this will require new legislation, such as facilitating the collection and use of administrative data for the production of official statistics and changes to sections of the U.S. Code that prohibit specific data sharing across—or even within—federal agencies or with the public. In addition, it is important that the statistical agencies engage in the further development of statistical metadata standards, especially among all the agencies in concert and through international cooperation, such as with the United Nations Economic Commission for Europe, the Data Documentation Initiative Alliance, and the Statistical Data and Metadata Exchange.

For that reason, we have two final recommendations:

Recommendation 7.2: Senior management at the agencies that produce federal statistics should provide resources and staff support to help transform their current processes to incorporate the use of data sharing and reuse through use of metadata tools and standards. This entails support for pilot projects, additional training of existing staff, enlisting of assistance from experts through support contracts, and reconfiguring of existing processes.

Recommendation 7.3: Agencies that produce federal statistics, in order to implement many of the recommended initiatives in this report, should be provided with additional funds to acquire the necessary training and information technology assistance, as well as cover any increased operational costs, to modify current processes to improve documentation and archiving in support of the greater transparency of official statistics.

This report recommends that the U.S. federal statistical agencies change the way they manage metadata. All change is difficult, especially for agencies that are used to the way they conduct business. Federal budgets for the statistical agencies are, at best, flat; users want to see content additions to the products the agencies already produce, and now this report urges the agencies to take on new activities that will, at least initially, require additional funds. However, the report also recommends an incremental approach that relies on achievable goals. In this way, the report contains some advice on how to manage the change being suggested.

Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×

This page intentionally left blank.

Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 147
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 148
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 149
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 150
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 151
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 152
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 153
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 154
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 155
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 156
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 157
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 158
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 159
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 160
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 161
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 162
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 163
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 164
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 165
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 166
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 167
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 168
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 169
Suggested Citation:"7 Best Practices for Federal Statistical Agencies." National Academies of Sciences, Engineering, and Medicine. 2022. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies. Washington, DC: The National Academies Press. doi: 10.17226/26360.
×
Page 170
Next: References »
Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies Get This Book
×
 Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies

Widely available, trustworthy government statistics are essential for policy makers and program administrators at all levels of government, for private sector decision makers, for researchers, and for the media and the public. In the United States, principal statistical agencies as well as units and programs in many other agencies produce various key statistics in areas ranging from the science and engineering enterprise to education and economic welfare. Official statistics are often the result of complex data collection, processing, and estimation methods. These methods can be challenging for agencies to document and for users to understand.

At the request of the National Center for Science and Engineering Statistics (NCSES), this report studies issues of documentation and archiving of NCSES statistical data products in order to enable NCSES to enhance the transparency and reproducibility of the agency's statistics and facilitate improvement of the statistical program workflow processes of the agency and its contractors. Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies also explores how NCSES could work with other federal statistical agencies to facilitate the adoption of currently available documentation and archiving standards and tools.
