
2 The Data Access, Confidentiality Tradeoff
Pages 5-17

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 5...
... The panel that produced the report Private Lives and Public Policies recognized this variability and did not advocate trying to identify such a tradeoff (National Research Council and Social Science Research Council, 1993). However, workshop participants expressed general optimism about the possibilities for developing tools that would enhance, on a case-by-case basis, the ability to increase data access without compromising data protection or, conversely, to increase confidentiality without compromising data access.
From page 6...
... Microdata sets, such as those from the Health and Retirement Study (HRS), offer the most promising means of answering specific questions such as how social security interacts with pensions and savings in household efforts to finance retirement, how social security age eligibility requirements affect retirement rates and timing, and how changes in out-of-pocket medical expenses affect utilization of federal programs.
From page 7...
... Likewise, they were able to estimate the behavioral impact of expected benefit level, expected earnings, gender effects, and policy variables. Linking to administrative records can improve data accuracy as well as data scope by giving researchers access to information that individuals may not be able to recall or estimate accurately in a survey context.
From page 8...
... By having access to a National Longitudinal Survey of Youth (NLS-Y) file with detailed geographic codes for survey respondents, Gordon was able to add contextual data, such as the availability of child care, for the neighborhoods in which respondents lived.
From page 9...
... Research Impact of Data Alteration Versus Access Restriction Alternative methods for reducing statistical disclosure risks were discussed at length. At the most general level, the options fall into two categories: data alteration and access restriction.
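To make the data-alteration category concrete, the following is a minimal sketch of two widely used alteration techniques, top-coding a numeric field and suppressing rare categorical values. The records, field names, and thresholds are hypothetical, not drawn from the workshop.

```python
# Illustrative data-alteration sketch (hypothetical records and thresholds):
# top-coding caps extreme values; suppression hides rare categories.

def top_code(records, field, cap):
    """Replace values above `cap` with `cap` to mask identifiable outliers."""
    return [{**r, field: min(r[field], cap)} for r in records]

def suppress_rare(records, field, min_count=3):
    """Replace values that occur fewer than `min_count` times with None."""
    counts = {}
    for r in records:
        counts[r[field]] = counts.get(r[field], 0) + 1
    return [{**r, field: (r[field] if counts[r[field]] >= min_count else None)}
            for r in records]

people = [
    {"age": 34, "income": 48_000, "occupation": "teacher"},
    {"age": 41, "income": 52_000, "occupation": "teacher"},
    {"age": 39, "income": 51_000, "occupation": "teacher"},
    {"age": 72, "income": 950_000, "occupation": "astronaut"},  # outlier
]

masked = suppress_rare(top_code(people, "income", 150_000), "occupation")
```

Access restriction, by contrast, leaves the data unaltered and instead controls who may analyze it and where, as the data-center discussion below illustrates.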
From page 10...
... Several researchers cautioned that, although they would rather live with the burden of limited access than deal with synthetic data, data centers and other limited access arrangements do impose costs on research. To the extent that data centers favor large-budget research projects, less well-funded disciplines can be prevented from accessing important data resources.
From page 11...
... Participants acknowledged the need to assess disclosure risks; they were less certain about how best to quantify harm, the true cost that results from disclosure. This question requires additional attention.
From page 12...
... CNSTAT member Thomas Louis of the University of Minnesota suggested recasting the debate: instead of comparing risks with probability zero, one might consider how the probability of disclosure changes as a result of a specific data release or linkage, or from adding (or masking) fields in a data set.
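Louis's reframing can be sketched with a toy computation: rather than asking whether disclosure risk is zero, compare a simple risk proxy before and after a field is added. The uniqueness measure and the records below are illustrative assumptions, not a method endorsed at the workshop.

```python
# Sketch of "change in disclosure probability" from adding a field.
# Risk proxy: fraction of records unique on a quasi-identifier combination.
from collections import Counter

def reidentification_risk(records, quasi_ids):
    """Fraction of records that are unique on the given quasi-identifiers."""
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    unique = sum(1 for r in records
                 if counts[tuple(r[q] for q in quasi_ids)] == 1)
    return unique / len(records)

# Hypothetical microdata
data = [
    {"sex": "F", "age_band": "30-39", "zip3": "554"},
    {"sex": "F", "age_band": "30-39", "zip3": "555"},
    {"sex": "M", "age_band": "30-39", "zip3": "554"},
    {"sex": "M", "age_band": "60-69", "zip3": "554"},
]

before = reidentification_risk(data, ["sex", "age_band"])
after = reidentification_risk(data, ["sex", "age_band", "zip3"])
```

Here adding the coarse geographic field raises the uniqueness proxy from 0.5 to 1.0, which is the kind of marginal change Louis suggested measuring.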
From page 13...
... Effort expended to protect the security of data at the source versus at the linking phase should be proportional to the relative disclosure risks posed at each point. Unfortunately, such assessments of relative risk have typically not yet been made.
From page 14...
... Sweeney's work suggests that it is technically possible to quickly link records of some individuals from two large files; at the same time, as Dean argued, building an accurate, comprehensive linked data set from the two sources may require many identifiers and a high degree of sophistication. Similar arguments were advanced to support the view that regulating the behavior of data users is more efficient than altering the data to allow broader access.
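The linkage pattern at issue can be sketched as an exact join on shared quasi-identifiers between a de-identified file and a public one. The field names and the two single-record files below are hypothetical stand-ins, not Sweeney's actual data or code.

```python
# Sketch of quasi-identifier linkage between a "de-identified" file
# and a public file (hypothetical fields and records).

def link_on(quasi_ids, file_a, file_b):
    """Return pairs of records that agree on all quasi-identifiers;
    a unique match is a candidate re-identification."""
    index = {}
    for rec in file_b:
        index.setdefault(tuple(rec[q] for q in quasi_ids), []).append(rec)
    matches = []
    for rec in file_a:
        candidates = index.get(tuple(rec[q] for q in quasi_ids), [])
        if len(candidates) == 1:  # unambiguous link
            matches.append((rec, candidates[0]))
    return matches

medical = [{"zip": "02138", "birth": "1945-07-29", "sex": "F",
            "diagnosis": "dx-code"}]
voters = [{"zip": "02138", "birth": "1945-07-29", "sex": "F",
           "name": "J. Doe"}]

linked = link_on(["zip", "birth", "sex"], medical, voters)
```

Dean's counterpoint is visible in the same sketch: as either file grows, keys collide, matches become ambiguous, and an accurate comprehensive linkage demands more identifiers than this toy join uses.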
From page 15...
... Statistical disclosure risks are defined by the ability of snoopers to draw inferences from de-identified data; linking is generally part of this process. The authors' framework, represented in the diagram below, essentially attempts to maximize data utility by minimizing necessary distortion, subject to a maximum tolerable risk constraint.
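The framework described above can be summarized as a constrained optimization; the notation here is an illustrative paraphrase, not the authors' own symbols:

```latex
\max_{M}\; U\bigl(M(D)\bigr)
\quad \text{subject to} \quad
R\bigl(M(D)\bigr) \le \rho_{\max}
```

where $D$ is the original data, $M$ a masking or distortion procedure, $U$ a measure of data utility, $R$ the disclosure risk, and $\rho_{\max}$ the maximum tolerable risk.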
From page 16...
... Sweeney's computational approaches to assessing risk involve constructing functions that relate hierarchical data aggregation to data security. Working at the cell level, she has developed algorithms designed to balance the tension between the usefulness of data, as measured by the degree of aggregation (i.e., number of variables collapsed)
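A simplified sketch of this kind of algorithm follows: greedily coarsen one field (here, a ZIP code generalized digit by digit) until every quasi-identifier group reaches a minimum size, trading specificity for protection. This is a loose paraphrase of the DataFly idea with hypothetical data, not Sweeney's implementation.

```python
# Simplified generalization loop (DataFly-style idea, hypothetical data):
# coarsen the ZIP field until every quasi-identifier group has >= k records.
from collections import Counter

def generalize_zip(records, level):
    """Mask the last `level` digits of each ZIP code with '*'."""
    out = []
    for r in records:
        z = r["zip"]
        out.append({**r, "zip": z[: len(z) - level] + "*" * level})
    return out

def min_group_size(records, quasi_ids):
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(counts.values())

def generalize_until_k(records, quasi_ids, k):
    """Raise the generalization level until the smallest group has k records."""
    level = 0
    current = records
    while min_group_size(current, quasi_ids) < k and level < 5:
        level += 1
        current = generalize_zip(records, level)
    return current, level

data = [{"zip": z} for z in ["55410", "55413", "55422", "55427"]]
result, level = generalize_until_k(data, ["zip"], k=2)
```

The returned level measures how much specificity had to be sacrificed, which is the aggregation-versus-security function the text describes.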
From page 17...
... The key is finding optimal levels of distortion across data set fields and exploiting tradeoffs between specificity and anonymity protection. Applying her techniques to μ-ARGUS and DataFly, Sweeney concluded that the former tends to underprotect the data, while the latter tends to overdistort.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.