5 Protecting Privacy and Confidentiality While Providing Access to Data for Research Use
Pages 73-96

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 73...
... . It is also well recognized that having external users analyze statistical data is key to improving the quality of federal statistical agency processes (Abowd et al., 2004)
From page 74...
... We note some inadequacies of these laws. We summarize a variety of approaches used by federal statistical agencies for providing access to confidential data for statistical purposes, as well as access models from other countries, focusing on those that include combining multiple data sources.
From page 75...
... . Data provided by federal statistical agencies…are the factual base needed for informed public discussion about the direction and implementation of those policies.
From page 76...
... . Federal statistical agencies typically make pledges to survey respondents that they will keep their information confidential and use it only for statistical purposes.
From page 77...
... CONCLUSION 5-2 Combining multiple data sources increases risks to the public from data breaches and identity theft. The proliferation of publicly accessible data, outside of the statistical agencies, has dramatically increased the risks inherent in releasing microdata because these other data sources can be used to re-identify putatively anonymized data.
From page 78...
... 100) considered the general problem of how to enable researchers' use of medical data while safeguarding privacy: In recent years, a number of techniques have been proposed for modifying or transforming data in such a way so as to preserve privacy while statistically analyzing the data (reviewed in Aggarwal and Yu, 2008; NRC, 2000, 2005, 2007b,c)
From page 79...
... In addition to its provisions for the basic protection of individuals' records, the Privacy Act includes provisions that pertain to the use of records for statistical purposes. The law sought to enable the continued use of data generated by federal agencies while safeguarding privacy (Allen and Rotenberg, 2016)
From page 80...
... When federal statistical agencies collect survey data from respondents, they usually pledge to keep the information they collect confidential and to use it only for statistical purposes. Statistical agencies are able to make this pledge to respondents because of authority in their authorizing statutes (e.g., the Census Bureau's Title 13) or through the Confidential Information Protection and Statistical Efficiency Act of 2002 (CIPSEA)
From page 81...
... Agencies are also required to annually train and certify that all employees and agents with access to data covered by CIPSEA have completed CIPSEA training and that all statistical products have been reviewed to ensure that there is no disclosure of identifiable information. CIPSEA permits recognized federal statistical agencies or units (see footnote 7) to designate external researchers to obtain access to confidential statistical data for exclusively statistical purposes by giving these agencies the ... [Footnote 7: For a list of OMB-recognized statistical agencies and units, see https://obamawhitehouse.
From page 82...
... The publication of statistics covering various groups and subgroups requires careful consideration of how to safely release statistical products and of the potential privacy losses that might occur. In this section, we discuss several different approaches to protecting the privacy of data, including minimizing the personal data that are collected, minimizing disclosure risk by restricting the data that are released, controlling access to and use of the data, encrypting data, and using differential privacy techniques to measure and control cumulative privacy loss.
From page 83...
... Restricted Data: Restricting data includes removing explicit identifiers and applying a variety of statistical disclosure limitation methods to the dataset (see Federal Committee on Statistical Methodology, 2006) to reduce the risk of disclosure.
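The restriction steps described above can be illustrated with a minimal sketch. The excerpt does not name specific methods, so the three shown here (dropping explicit identifiers, top-coding and coarsening values, and suppressing small cells) are assumed as common examples of statistical disclosure limitation. All field names, thresholds, and records are hypothetical.

```python
from collections import Counter

# Hypothetical microdata; every field name and value here is made up.
records = [
    {"name": "A", "ssn": "000-00-0001", "age": 34, "county": "X", "income": 52_000},
    {"name": "B", "ssn": "000-00-0002", "age": 37, "county": "X", "income": 61_000},
    {"name": "C", "ssn": "000-00-0003", "age": 31, "county": "X", "income": 48_000},
    {"name": "D", "ssn": "000-00-0004", "age": 62, "county": "Y", "income": 900_000},
    {"name": "E", "ssn": "000-00-0005", "age": 45, "county": "X", "income": 75_000},
]

EXPLICIT_IDENTIFIERS = {"name", "ssn"}  # assumed list of direct identifiers
INCOME_TOP_CODE = 150_000               # assumed top-coding threshold
MIN_CELL_SIZE = 3                       # assumed minimum records per released cell


def restrict(record):
    """Drop explicit identifiers, top-code income, and coarsen age to 10-year bands."""
    out = {k: v for k, v in record.items() if k not in EXPLICIT_IDENTIFIERS}
    out["income"] = min(out["income"], INCOME_TOP_CODE)
    low = (out["age"] // 10) * 10
    out["age"] = f"{low}-{low + 9}"
    return out


def suppress_small_cells(rows, keys, min_size):
    """Keep only rows whose combination of `keys` occurs at least `min_size` times."""
    counts = Counter(tuple(r[k] for k in keys) for r in rows)
    return [r for r in rows if counts[tuple(r[k] for k in keys)] >= min_size]


restricted = [restrict(r) for r in records]
released = suppress_small_cells(restricted, ("age", "county"), MIN_CELL_SIZE)

for row in released:
    print(row)
```

In this toy dataset the two records that are unique on age band and county are withheld, which shows the usual trade-off: each restriction lowers disclosure risk and also removes information that analysts might want.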
From page 84...
... Such synthetic datasets use statistical models to create microdata records that are plausible predictions of an individual record. In total, the synthetic dataset can reproduce many of the statistical conclusions available from the actual dataset.
From page 85...
... Federal statistical agencies have also used a number of different modes for researchers to access and analyze "restricted use" datasets. These methods include licensing agreements, remote access, and online data analysis systems (see Federal Committee on Statistical Methodology, 2006)
From page 86...
... Federal Statistical Research Data Centers: Another approach for providing access to data for researchers is through federal statistical agency research data centers. Such centers have been used to provide more stringent controls on who has access to data and the conditions under which they have access.
From page 87...
... In a further expansion of the role of FSRDCs, administrative data from other federal agencies are also being made more accessible to researchers through them. Nongovernment Data Enclaves: NORC at the University of Chicago has created a data enclave that provides various data services, including archiving, curating, and indexing the ... [Footnote 14: For example, the National Agricultural Statistics Service provides external researchers access to CIPSEA-protected microdata for statistical purposes at its data lab in headquarters or at data labs in its 12 regional field offices.]
From page 88...
... Some universities are creating their own data enclaves, which can also house federal statistical data. The Center for Urban Science Progress at New York University is developing a data facility as a secure research setting with datasets, tools, and expert staff to provide research support services to students, faculty, and government employees.
From page 89...
... In addition to acquiring datasets, the ADRN is able to link and de-identify datasets for researchers. Two examples of linkage that the ADRN has done are linking benefits and earning data with health data to learn more about the impact of poverty on health and linking education data with crime data to understand how education affects criminality (Administrative Data Research Network, 2015)
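The excerpt does not say how the ADRN performs linkage, so the following is only a generic sketch of one way to link two datasets and then strip the direct identifier. It assumes a shared identifier field and a keyed hash (HMAC-SHA-256) held by the linking unit. The key, field names, and records are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the unit that performs the linkage.
LINKAGE_KEY = b"hypothetical-secret-held-by-the-linking-unit"


def pseudonym(identifier: str) -> str:
    """Return a keyed hash of the identifier; the same input always maps to the same token."""
    return hmac.new(LINKAGE_KEY, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def link_and_deidentify(left, right, id_field):
    """Join two lists of dicts on `id_field`, then replace the identifier with a pseudonym."""
    right_by_token = {pseudonym(r[id_field]): r for r in right}
    linked = []
    for row in left:
        token = pseudonym(row[id_field])
        match = right_by_token.get(token)
        if match is None:
            continue  # no counterpart in the other dataset
        merged = {k: v for k, v in row.items() if k != id_field}
        merged.update({k: v for k, v in match.items() if k != id_field})
        merged["link_id"] = token
        linked.append(merged)
    return linked


# Made-up records standing in for benefits data and health data.
benefits = [
    {"person_id": "p1", "benefit_amount": 120},
    {"person_id": "p2", "benefit_amount": 0},
]
health = [
    {"person_id": "p1", "visits": 3},
    {"person_id": "p3", "visits": 1},
]

linked = link_and_deidentify(benefits, health, "person_id")
print(linked)
```

Removing the direct identifier is only a first step. As Conclusion 5-2 notes, other data sources can be used to re-identify putatively anonymized records, so the remaining fields still need disclosure review.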
From page 90...
... . Turn and Ware note that privacy and security issues emerged separately in the 1960s until the "privacy cause célèbre," which was the proposal for a National Data Center, intended to be a centralized databank of all personal information collected by federal agencies for statistical purposes.
From page 91...
... Data could be encrypted using state-of-the-art technology both in transit and at its destination to provide protection against harm in the case of data breaches or inappropriate data access. This can be done using mature technology.
From page 92...
... As statistical information is extracted from a dataset, there is increasing risk of disclosure of individuals in the dataset. This cumulative privacy loss can be conceptualized as a "privacy loss budget": when a specified level of cumulative risk has been attained, the privacy loss budget would have been fully expended.
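The budget idea can be sketched as a small accountant that refuses further analyses once the cap is reached. This is a minimal illustration, not an agency's actual procedure. It assumes that privacy loss is measured in differential-privacy epsilon units and that losses simply add across analyses (basic sequential composition). The class and exception names are hypothetical.

```python
class BudgetExhausted(Exception):
    """Raised when an analysis would push cumulative privacy loss past the cap."""


class PrivacyLossBudget:
    """Tracks cumulative privacy loss under the assumption that losses add up."""

    def __init__(self, total_epsilon: float):
        if total_epsilon <= 0:
            raise ValueError("total_epsilon must be positive")
        self.total_epsilon = total_epsilon
        self.spent = 0.0

    @property
    def remaining(self) -> float:
        return max(0.0, self.total_epsilon - self.spent)

    def charge(self, epsilon: float) -> None:
        """Record the privacy loss of one analysis, or refuse if the cap would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance so floating-point rounding does not reject an exact fit.
        if self.spent + epsilon > self.total_epsilon + 1e-12:
            raise BudgetExhausted(
                f"requested {epsilon}, but only {self.remaining} remains"
            )
        self.spent += epsilon


budget = PrivacyLossBudget(total_epsilon=1.0)
budget.charge(0.4)  # first analysis
budget.charge(0.4)  # second analysis
try:
    budget.charge(0.4)  # third analysis would exceed the cap
except BudgetExhausted as err:
    print("refused:", err)
```

A refused request leaves the recorded spend unchanged, so a smaller analysis that still fits within the remaining budget could be run instead.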
From page 93...
... At their best, differentially private algorithms can make confidential data widely available for useful data analysis, without resorting to data clean rooms, data usage agreements, data protection plans, or restricted use enclaves. Differential privacy permits the measurement and control of privacy loss that accumulates over multiple analyses.
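As one concrete instance of a differentially private algorithm, the sketch below applies the Laplace mechanism to a counting query. The excerpt does not name a mechanism, so this choice, the function names, and the sample data are assumptions made for illustration. The code is not production-grade: real deployments need careful handling of floating-point arithmetic and randomness.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1, because adding or removing one person
    changes the count by at most 1, so Laplace noise with scale 1/epsilon
    gives epsilon-differential privacy for this single release.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up records; the field name is hypothetical.
people = [
    {"employed": True},
    {"employed": False},
    {"employed": True},
    {"employed": True},
]


def is_employed(person) -> bool:
    return person["employed"]


rng = random.Random()  # unseeded, so each release gets fresh noise
print(private_count(people, is_employed, epsilon=0.5, rng=rng))
```

A smaller epsilon means more noise and less privacy loss per release. Under basic composition, the epsilon values of repeated releases add up, which is what makes cumulative privacy loss measurable.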
From page 94...
... The U.S. Census Bureau uses synthetic data generated with a variant of differential privacy in the agency's OnTheMap tool, which provides aggregate information about where people work and where workers live (see Machanavajjhala et al., 2008)
From page 95...
... CONCLUSION 5-6 As federal statistical agencies move forward with linking multiple datasets, they must simultaneously address quantifying and controlling the risk of privacy loss. CONCLUSION 5-7 Privacy-enhancing techniques and privacy-preserving statistical data analysis can potentially enable the use of private-sector data sources for federal statistics.
From page 96...
... As noted above, the fundamental law of information recovery has ramifications for statistical agencies' disclosure limitation activities. Statistical agencies are accustomed to protecting data from individual inappropriate uses and reviewing each statistical product for disclosure risks; they are not accustomed to limiting statistical analysis or prioritizing analyses based on considerations of cumulative privacy loss or of using a privacy loss budget (Abowd, 2016b; Abowd and Schmutte, 2016)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.