Currently Skimming:

2 Current Practices for Documentation and Archiving in the Federal Statistical System
Pages 33-50

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 33...
... Whether the input data come from a survey, administrative records, or digital traces, each of these steps may require addressing various complex questions about how to collect the highest quality input data, or what methods will ...
Footnote 1: Unfortunately, there is no commonly accepted term for the new sources of data that technology has made possible, which include Internet transaction data, social media data, and sensor data. Other terms that have been suggested for (some of)
From page 34...
... Imputation is also used for item nonresponse. Many users of official statistics have an incomplete understanding not only of the complexity of the collection of the raw input data, but also of the measures taken to turn those input data into the final official estimates.
From page 35...
... Knowing how a set of official statistics is produced entails retaining details of the various data collection processes used, including the survey design, the survey instruments and field instructions, should survey data be collected, or retaining descriptions of any processes used to collect data from administrative sources and/or from digital traces. Then, whatever computations are carried out in converting the raw input data to the set of inputs fed into the estimation methodology also need to be documented.
From page 36...
... Innovation and Progress
Science marches on, and the science underlying federal statistics is no different. It is very likely that we are currently in a period in which substantial changes are either occurring or will soon occur as a result of the greater cost of collecting survey data and the greater use of other sources for input data.
From page 37...
... This is accomplished by making their official statistics, their data collection techniques, and the estimation methods used to produce them available to the public, to the extent possible. Further, making available the data collection techniques and computations, along with the relevant input datasets -- under secure arrangements to protect confidentiality -- permits the validation of a set of official statistics by demonstrating its computational reproducibility.
From page 38...
... These standards say much less about statistical programs that are based on administrative data or digital trace data. Similarly, OMB provides less guidance with respect to the documentation of methodologies used either for data treatment or for estimation associated with the official statistics, including whether the associated software code should be made known to the public.
From page 39...
... For example, at the 2017 workshop on transparency (NASEM, 2019a), representatives of the Longitudinal Employer-Household Dynamics program at the Census Bureau reported that they have the capability of recovering any input data set, the associated program code, and the resulting official estimates in a few minutes.
From page 40...
... We asked these individuals about their practices regarding the documentation of their data collections, the archiving of the resulting input datasets and resulting official statistics, and the documentation of statistical methods used to treat the data and to produce the indicated official statistics.
RESPONSES TO THE INFORMAL QUESTIONNAIRE
By requesting that program chiefs or other informed staff respond to these informal questionnaires, the committee was able to get a rough sense of what the current practice is, both internally and externally, regarding the documentation of the data collection methods, the data treatments used, ...
BOX 2-1 Programs That Responded to Informal Panel Questionnaire
From page 41...
... The input data files retained were generally the edited files used as input to produce the associated official estimates, they said, but some programs retained the raw data files and other intermediate files prior to the production of the final input data. There often were internal guidelines for what and how to save but, respondents said, there were typically no guidelines on the use of metadata standards.4 Some agencies pointed out that there was no repository on their own (internal)
From page 42...
... Otherwise, the adoption of these standards or other metadata standards seems not to be typical of most federal statistical agencies. One agency expressed concern that to adopt these standards would obligate the agency to do so for the entire historical series to ensure backward comparability.
From page 43...
... The wording of some questions often made implicit assumptions about the data collection or the methods that made the questions difficult for respondents to interpret in ...
Footnote 5: The federal statistical agencies have written a number of excellent technical reports and handbooks for various official estimates that are based on the data collected from surveys. A key example of this is U.S.
From page 44...
... Some agencies make available public-use microdata samples, and some metadata are saved with the input data, generally in the form of codebooks and other agency-specific procedures. Regarding information on data collection methods, data treatments, and estimation methodologies provided to external users, there is often substantial information on the survey design used and the survey instrument.
From page 45...
... The practice of archiving input datasets and official estimates varies greatly across agencies, and
From page 46...
... Further, access to input datasets using secure avenues varies substantially across agencies.
CHALLENGES THAT ARISE IN IMPLEMENTING TRANSPARENCY AND REPRODUCIBILITY
There are challenges and costs associated with the use of increased transparency.
From page 47...
... In the case of survey input data, often the only way to make such input datasets available for analysis to members of the external research community is to do so in secure environments that provide comprehensive protection against disclosures. Such datasets are therefore only provided through use of federal statistical research data centers and comparable constructs, they are anonymized, and additional techniques, such as differential privacy, are applied to them to reduce any remaining risk of disclosure.
From page 48...
... The statistical agencies do not receive raw responses if the data are collected under the Census Bureau's Title 13 regulation. Instead, the Census Bureau provides the official estimates computed from the raw data.
From page 49...
... Commercial entities also can provide, at a price, datasets that might be used to construct target populations for sampling. In such cases, contractors and commercial vendors may provide their algorithms or data to federal statistical agencies for a specific use for a limited period of time during which they are not to be shared.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.