Pages 55-70

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 55...
... Robert Fogel and Douglass North won the Nobel Memorial Prize in economics for their work analyzing data from federal statistical agencies that had been archived and that they digitized, transforming our understanding of American history and economic development. Various research teams have used data originally preserved by the National Archives, but subsequently (partially)
From page 56...
... BLS also provided a hypothetical alternative unemployment rate, the rate that would have resulted if all records had been correctly classified ("the overall unemployment rate would have been about 3 percentage points higher than reported"). As always, the agency also let researchers access the released anonymized microdata or, via the federal statistical research data centers (FSRDCs), the confidential data.
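The alternative rate BLS describes comes from simple reclassification arithmetic. Below is a minimal sketch. It assumes the misclassified people stay in the labor force and only move from "employed" to "unemployed". The counts are hypothetical and were chosen to produce a 3-point gap. They are not BLS figures.

```python
def unemployment_rate(unemployed: float, employed: float) -> float:
    """Unemployment rate in percent: unemployed / (employed + unemployed)."""
    labor_force = employed + unemployed
    return 100.0 * unemployed / labor_force


def reclassified_rate(unemployed: float, employed: float, misclassified: float) -> float:
    """Rate after moving `misclassified` people from employed to unemployed.

    Assumption: the labor force total is unchanged; only the split moves.
    """
    return unemployment_rate(unemployed + misclassified, employed - misclassified)


# Hypothetical counts in millions, for illustration only (not BLS data).
reported = unemployment_rate(unemployed=16.0, employed=144.0)
adjusted = reclassified_rate(unemployed=16.0, employed=144.0, misclassified=4.8)
print(f"reported {reported:.1f}%, adjusted {adjusted:.1f}%, gap {adjusted - reported:.1f} points")
# → reported 10.0%, adjusted 13.0%, gap 3.0 points
```

Because the labor force is the denominator in both rates, every misclassified person adds 1/labor_force to the rate. That is why publishing the count of affected records is enough for users to reproduce the alternative estimate.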
From page 57...
... Archiving would mean preserving the survey data, the administrative data, and the processes that were used to integrate the two sources of data to create the public estimates. Even in cases where the underlying microdata could not be made publicly available, the legitimacy of the resulting public estimates would be increased if the process by which they were produced and the way multiple data sources were integrated were publicly available.
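One way to tie the two input sources to the integration process is to record a checksum manifest. This lets an archive later show that a published estimate came from specific inputs and specific code, even when the microdata themselves stay confidential. The sketch below is a hypothetical illustration and not a procedure from the report. The file names and contents are placeholders.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(survey: Path, admin: Path, integration_code: Path) -> dict:
    """Fingerprint both input sources and the step that integrates them."""
    return {
        "inputs": {
            "survey": {"file": survey.name, "sha256": sha256_of(survey)},
            "administrative": {"file": admin.name, "sha256": sha256_of(admin)},
        },
        "process": {"file": integration_code.name, "sha256": sha256_of(integration_code)},
    }


def demo_manifest() -> dict:
    """Build a manifest from three small placeholder files in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        files = {
            "survey.csv": b"case_id,employed\n1,yes\n",
            "admin.csv": b"case_id,earnings\n1,52000\n",
            "integrate.py": b"# placeholder for the record-linkage step\n",
        }
        for name, content in files.items():
            (root / name).write_bytes(content)
        return build_manifest(root / "survey.csv", root / "admin.csv", root / "integrate.py")


if __name__ == "__main__":
    print(json.dumps(demo_manifest(), indent=2))
```

The manifest holds only fingerprints. It can therefore be published alongside the estimates while the files it describes remain inside the agency.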
From page 58...
... These surveys create novel and sometimes unresolved challenges to an agency's ability to archive and preserve its raw input data, processes, and methods.

CURRENT PRACTICES WITH RECORD SCHEDULES AND DATA MANAGEMENT PLANS

As mentioned previously in this report, and as is well known, the great majority of the input datasets for official statistics in the United States are currently either survey-based or administrative-records-based (often from tax data, which are extremely sensitive)
From page 59...
... In 2019, NSF encouraged, but did not mandate, the use of DMPs.21 Discipline-specific criteria can be more tightly enforced. The Interdisciplinary Earth Data Alliance offers a tool that provides proof of compliance with NSF Data Policies.
From page 60...
... Records schedules can persist for a long time and are not frequently updated, though recent executive branch memos and efforts at the National Archives may lead to updates and modernizations across U.S.
From page 61...
... The Census Bureau has a record control schedule specifically addressing surveys it conducts for other agencies (DAA-0029-2013-0002a)
From page 62...
... The DCAT metadata standard is used by a number of other national governments, including those countries following the EU-managed DCAT-AP standard, and it serves as the basis for the Schema.org Dataset schema used by Google Dataset Search and others. Since 2013, most major federal agencies have implemented comprehensive dataset inventories following the metadata standard using metadata management platforms provided by their ...

31 Data.gov, while not an archive, does contribute to the harmonization and preservation of metadata and improves the discovery of federal data resources.
From page 63...
... The basic DCAT-based metadata captured in the data inventories following M-13-13 and the current Evidence Act requirements are unlikely to capture the rich detail needed for adequate transparency for statistical products. However, since these metadata records are required for all data products, and since most agencies have implemented the metadata management systems and processes to support them, they present a useful starting point.
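To show what the "basic DCAT-based metadata" in these inventories looks like, here is a sketch of one inventory entry with a required-field check. It assumes the field names of the Project Open Data (DCAT-US) v1.1 schema that followed M-13-13. The required-field list is taken as an assumption to verify against the current schema, and every value in the record is a placeholder.

```python
# Assumed required fields: the "always required" DCAT-US v1.1 fields plus the
# two required of federal agencies (bureauCode, programCode). Verify against
# the current schema; this is not an authoritative validator.
REQUIRED_FIELDS = (
    "title", "description", "keyword", "modified", "publisher",
    "contactPoint", "identifier", "accessLevel", "bureauCode", "programCode",
)
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def missing_fields(record: dict) -> list[str]:
    """Return required fields that are absent or empty in a dataset record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def check_record(record: dict) -> list[str]:
    """List problems: missing required fields and an unrecognized accessLevel."""
    problems = [f"missing: {name}" for name in missing_fields(record)]
    level = record.get("accessLevel")
    if level and level not in ACCESS_LEVELS:
        problems.append(f"bad accessLevel: {level}")
    return problems


# Hypothetical inventory entry; names, codes, and email are placeholders.
record = {
    "@type": "dcat:Dataset",
    "title": "Example Household Survey, Public-Use Microdata",
    "description": "Placeholder description of a statistical data product.",
    "keyword": ["survey", "households"],
    "modified": "2020-01-01",
    "publisher": {"@type": "org:Organization", "name": "Example Statistical Agency"},
    "contactPoint": {"@type": "vcard:Contact", "fn": "Data Team",
                     "hasEmail": "mailto:data@example.gov"},
    "identifier": "example-agency-hhs-puf-2020",
    "accessLevel": "public",
    "bureauCode": ["000:00"],
    "programCode": ["000:000"],
}

print(check_record(record))          # → []
print(check_record({"title": "x"}))  # lists the nine other missing fields
```

None of these fields describe how the estimates were produced. The entry only identifies the product. This is consistent with the point above that such records are a starting point rather than sufficient documentation for transparency.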
From page 64...
... In general. -- In consultation with the Director and in accordance with the guidance established under paragraph (2), the head of each agency shall, to the maximum extent practicable, develop and maintain a comprehensive data inventory that accounts for all data assets created by, collected by, under the control or direction of, or maintained by the agency.
From page 65...
... the comprehensive data inventory developed pursuant to subparagraph (C), including any real-time updates to such inventory, and data assets made available in accordance with subparagraph (E)
From page 66...
... having information about the actual data-generating process -- the paradata -- will increase transparency. In situations where the data-generating process is in the hands of statistical agencies or other well-defined entities and is designed for a specific purpose (e.g., a survey)
From page 67...
...

Interviewer IDs

Many products produced by statistical agencies still rest on interviewer-administered surveys. In these surveys, interviewers greatly influence the collection and processing of the collected information, so capturing paradata about the interviewers and their activities is necessary to ensure transparency.
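Releasing interviewer paradata requires replacing internal staff IDs with identifiers that still group records by interviewer but cannot be traced back to a person. One common technique is keyed-hash pseudonymization with HMAC-SHA256. It is shown here as an assumption, not as the report's or any agency's method. The ID formats, the "INT-" prefix, and the key are hypothetical.

```python
import hashlib
import hmac


def pseudonymize_interviewer(interviewer_id: str, secret_key: bytes) -> str:
    """Map an internal interviewer ID to a stable release ID.

    Under one key, the same ID always yields the same output, so analysts can
    still group cases by interviewer. Unlike a plain hash, the mapping cannot
    be rebuilt by hashing a list of candidate IDs unless the key is known.
    The digest is truncated to 12 hex characters for readability.
    """
    tag = hmac.new(secret_key, interviewer_id.encode("utf-8"), hashlib.sha256)
    return "INT-" + tag.hexdigest()[:12]


key = b"agency-held secret; never released"  # hypothetical key
a = pseudonymize_interviewer("FR-0412", key)
b = pseudonymize_interviewer("FR-0412", key)
c = pseudonymize_interviewer("FR-0977", key)
print(a == b, a == c)  # → True False
```

Strictly speaking, this is pseudonymization. Whoever holds the key can re-link the IDs, so the key must stay inside the agency. Using a new key for each release also prevents users from linking interviewers across releases, if the agency wants that.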
From page 68...
... would allow external researchers33 not only to evaluate the quality of the final survey product but also to suggest how the data collection process could be improved. It is currently an open question how large the user base for call-record paradata is, and this needs to be weighed against the effort of making such data available.
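To make "call record data" concrete, the sketch below computes two summaries that an external researcher might derive from released call records: contact attempts per sampled case and the share of attempts that reached someone. The record layout and outcome codes are hypothetical. Actual call-record files differ by survey.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CallAttempt:
    case_id: str
    attempt_no: int
    outcome: str  # hypothetical codes: "no_contact", "refusal", "complete"


def attempts_per_case(calls: list[CallAttempt]) -> dict[str, int]:
    """Count contact attempts recorded for each sampled case."""
    return dict(Counter(call.case_id for call in calls))


def contact_rate(calls: list[CallAttempt]) -> float:
    """Share of attempts that reached someone (any outcome except no_contact)."""
    if not calls:
        return 0.0
    reached = sum(1 for call in calls if call.outcome != "no_contact")
    return reached / len(calls)


calls = [
    CallAttempt("A", 1, "no_contact"),
    CallAttempt("A", 2, "complete"),
    CallAttempt("B", 1, "no_contact"),
    CallAttempt("B", 2, "no_contact"),
    CallAttempt("B", 3, "refusal"),
]
print(attempts_per_case(calls))  # → {'A': 2, 'B': 3}
print(contact_rate(calls))       # → 0.4
```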
From page 69...
... As an example, Box 3-4 outlines a small, common-denominator set of paradata availability guidelines that seems feasible. The case studies above illustrate uses of paradata and suggest that making paradata available, with appropriate documentation, can help survey researchers understand the survey data collection process.
From page 70...
... For responding cases:
• Edit and imputation flags where appropriate
• Total duration of the survey administration in minutes
• If interviewer-administered, an anonymized interviewer ID
• Any interviewer observations or notes collected as well

... collection cycle can help improve the data collected in a future data cycle. The use of paradata, mostly motivated by survey methodologists, has grown over paradata's brief history.
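The items listed for responding cases map onto a simple per-case record. The sketch below is one hypothetical layout. The field names, the flag representation, and the release-row format are assumptions and do not come from Box 3-4. Duration is derived from start and end timestamps instead of being keyed in.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class RespondentParadata:
    """Paradata for one responding case (hypothetical layout)."""
    case_id: str
    started: datetime
    finished: datetime
    edit_flags: dict[str, bool] = field(default_factory=dict)        # item -> edited?
    imputation_flags: dict[str, bool] = field(default_factory=dict)  # item -> imputed?
    interviewer_id: Optional[str] = None  # anonymized; None if self-administered
    interviewer_notes: list[str] = field(default_factory=list)

    @property
    def duration_minutes(self) -> float:
        """Total duration of the survey administration in minutes."""
        return (self.finished - self.started).total_seconds() / 60.0

    def to_release_row(self) -> dict:
        """Flatten to one row, summarizing flags as counts of flagged items."""
        return {
            "case_id": self.case_id,
            "duration_min": round(self.duration_minutes, 1),
            "n_edited": sum(self.edit_flags.values()),
            "n_imputed": sum(self.imputation_flags.values()),
            "interviewer_id": self.interviewer_id,
            "notes": list(self.interviewer_notes),
        }


case = RespondentParadata(
    case_id="C-001",
    started=datetime(2021, 3, 4, 10, 0),
    finished=datetime(2021, 3, 4, 10, 37, 30),
    imputation_flags={"income": True, "age": False},
    interviewer_id="INT-3f9a1c2b7d4e",  # placeholder pseudonymous ID
    interviewer_notes=["respondent consulted records"],
)
print(case.to_release_row()["duration_min"])  # → 37.5
```

Whether flags are released item by item or as counts is a disclosure-review choice. The counts shown here are only one option.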


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.