3 Changes in Archiving Practices to Improve Transparency
Pages 51-72

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 51...
... That implies both selecting which objects will be preserved and which will not and taking actions to maintain the possibility of access to those objects that are preserved. Storage alone is not necessarily preservation, and this is especially true in an age in which many objects are "born digital." The National Archives and Records Administration (NARA)
From page 52...
... Transparency is thus also critical to public participation in and provision of information to the federal statistical system, improving data quality and reducing the costs of data collection. To provide this transparency, it is necessary to archive -- to preserve and make accessible -- the full data life cycle, including questionnaires, metadata about the data collection process, metadata about the transformation of raw data into data products, and the data products themselves.
From page 53...
... principles provide further guidance. The FAIR principles focus on making data products accessible and useful, but do not specifically address questions of preservation or providing the metadata necessary to evaluate data quality ... Footnote 8: https://www.whitehouse.gov/wp-content/uploads/2019/06/M-19-21.pdf.
From page 54...
... Because digital data can be altered in ways that are not transparent, it is now often not possible to find out precisely what was published at a particular point in time. More generally, there is no systematic preservation of the public-use digital data products of the federal statistical system.
From page 55...
... Robert Fogel and Douglass North won the Nobel Memorial Prize in economics for their work analyzing data from federal statistical agencies that had been archived and that they digitized, transforming our understanding of American history and economic development. Various research teams have used data originally preserved by the National Archives, but subsequently (partially)
From page 56...
... BLS also provided a hypothetical alternative unemployment rate, the rate that would have resulted if all records had been correctly classified ("the overall unemployment rate would have been about 3 percentage points higher than reported"), and the agency let researchers, as always, access the released anonymized microdata or the confidential data, via the federal statistical research data centers (FSRDCs)
From page 57...
... Footnote 17: U.S. Census Bureau microdata from its economic censuses and surveys that are available for research access are described at https://www.icpsr.umich.edu/web/pages/appfed/index.html.
From page 58...
... These surveys create novel and sometimes unresolved challenges to an agency's ability to archive and preserve its raw input data, processes, and methods.
CURRENT PRACTICES WITH RECORD SCHEDULES AND DATA MANAGEMENT PLANS
As mentioned previously in this report, and as is well known, currently the great majority of the input datasets for official statistics in the United States are either survey-based or administrative-records-based (often from tax data, which are extremely sensitive)
From page 59...
... the management and sharing of scientific data generated from NIH-funded or conducted research." [23] Guidance on creating DMPs is provided by numerous entities, [24] and various online tools exist to assist researchers in crafting DMPs. [25]
Records Schedules
All U.S. government agencies are required to maintain "records schedules." All federal records, including those created or maintained for the government by a contractor, must be covered by a NARA-approved agency disposition authority SF 115, Request for Records Disposition Authority, or the NARA General Records Schedules (36 CFR § 1225.10)
From page 60...
... Records schedules can persist for a long time and are not frequently updated, though recent executive branch memos and efforts at the National Archives may lead to updates and modernizations across U.S.
From page 61...
... The Census Bureau has a record control schedule specifically addressing surveys it conducts for other agencies (DAA-0029-2013-0002a)
From page 62...
... The DCAT metadata standard is used by a number of other national governments, including those countries following the EU-managed DCAT-AP standard, and it serves as the basis for the Schema.org Dataset schema used by Google Dataset Search and others. Since 2013, most major federal agencies have implemented comprehensive dataset inventories following the metadata standard using metadata management platforms provided by their ... Footnote 31: Data.Gov, while not an archive, does contribute to the harmonization and preservation of metadata and improves the discovery of federal data resources.
From page 63...
... The metadata that accompany such data should also be preserved using broadly accepted metadata standards appropriate to the data at hand. The records schedules, which describe the plans for retaining, preserving, and making accessible microdata and associated metadata, should be easily accessible on each statistical agency website so that users know when and where microdata and associated metadata will be made available, and when they are scheduled to be destroyed.
From page 64...
... In general. -- In consultation with the Director and in accordance with the guidance established under paragraph (2), the head of each agency shall, to the maximum extent practicable, develop and maintain a comprehensive data inventory that accounts for all data assets created by, collected by, under the control or direction of, or maintained by the agency.
From page 65...
... In general. -- The Administrator of General Services shall maintain a single public interface online as a point of entry dedicated to sharing agency data assets with the public, which shall be known as the "Federal data catalogue". The Administrator and the Director shall ensure that agencies can submit public data assets, or links to public data assets, for publication and public availability on the interface.
From page 66...
... We include a short discussion of them here, because these data are collected as part of the production process and can inform survey data improvements and assessments of survey data quality, most often internal to a statistical agency. As a result, maintaining and understanding these data can be viewed as a component of an agency's transparency about the information it makes available.
From page 67...
... However, given that most survey-data production processes are aided by digital devices, over the last decade paradata have been collected and used not only to guide process decisions but also to evaluate the quality of the data afterwards.
Case Studies
In this last section we review three types of paradata whose capture and preservation may prove highly useful: data concerning interviewers, call record data, and data derived from measurements of respondent survey behavior such as keystrokes.
From page 68...
... would allow external researchers [33] not only to evaluate the quality of the final survey product, but also to make suggestions on how the data collection process could be improved. Currently it is an open question how large the user base for paradata from call record data is, and this needs to be weighed against the effort to make such data available.
From page 69...
... In the American Community Survey, for example, while one unit of observation can represent upwards of 20 contact attempts, a full interview likely has hundreds of keystrokes submitted, whether in a computer-aided interview setting or by the respondents in self-administered Web settings. Few examples exist where log data or audit trails are available as public-use data.
From page 70...
... In Chapter 5, we will argue that metadata standards provide the means for organizing data for archival purposes through conforming to a metadata specification, and this could be useful in archiving paradata as well. Continued research on the use of paradata is important to improve the quality of survey data in ongoing data programs.
From page 71...
... The federal statistical agencies should retain, preserve, and make accessible machine- and human-readable metadata -- including survey instruments and the provenance of any administrative data -- used in the production of official statistics. In addition, because paradata help to provide a better understanding of the quality of survey data, the federal statistical agencies should retain, preserve, and make accessible both machine- and human-readable paradata necessary for evaluating data quality.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.