2 Disclosure Avoidance in the 2020 Census
Pages 7-28

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 7...
... In the interest of time, Leclerc did not delve too deeply into the details of how the Census Bureau's algorithm is executed, with some of that material being covered in brief in later remarks by David Van Riper and Seth Spielman (Section 3.1) or deferred to background documents accompanying the release of the 2010 Demonstration Data Products (DDP)
From page 8...
... described the development of what is formally known as the 2010 Decennial Census TopDown Disclosure Limitation Algorithm (TDA for short), fleshing out some details of the Census Bureau's simulated database reconstruction attack against itself that was alluded to in Chapter 1.
From page 9...
... as being correct on name, census block, sex, age, race, and ethnicity. Though "it is heartening that this 38 percent is not 100 percent," Leclerc noted that it is "extremely disconcerting that this rate is several orders of magnitude larger than anything we have seen in the past in similar studies of this kind." The magnitude of the bottom-line 17 percent reidentification risk reinforced the notion that the data landscape has fundamentally changed in recent decades and that the database reconstruction attack "does potentially represent a significant and meaningful threat to traditional disclosure limitation algorithms." Leclerc said that the magnitude of the potential risk is the reason why the Census Bureau turned to formal privacy, or differential privacy, techniques: because the mathematical proofs about the privacy guarantees that they impart provide a strong measure of future-proofing against the range of data, computational power, and algorithms that attackers in the future (or today)
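The formal guarantee Leclerc invoked can be made concrete with the textbook Laplace mechanism for a single count. This is a minimal sketch of the general technique, not the Bureau's TDA (which uses discrete noise distributions and extensive post-processing); the function name and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise calibrated to epsilon.

    Adding or removing one person changes a simple count by at most
    `sensitivity`, so noise drawn with scale sensitivity/epsilon gives
    epsilon-differential privacy for this single query. A textbook
    sketch only; not the Census Bureau's actual mechanism.
    """
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Smaller epsilon means a tighter privacy guarantee but noisier output.
noisy = laplace_count(120, epsilon=0.5)
```

The key property is that the noise scale depends only on ε and the query's sensitivity, never on the data themselves, which is what makes the guarantee hold against arbitrary future attackers.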
From page 10...
... The basic premise of the Census Bureau's planned TDA routine is to start with the CEF, the true underlying data collected in the census after any quality edits or imputations for missing information are made. CEF data are in microdata form: individual records for every person and housing unit.
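As a toy illustration of that starting point, person-level microdata can be rolled up into the histogram of counts that the algorithm actually manipulates. The records and attribute values below are invented; the real CEF schema is far larger.

```python
from collections import Counter

# Hypothetical microdata records: (block, sex, age_group, race_ethnicity).
cef = [
    ("B001", "F", "18-24", "White"),
    ("B001", "M", "18-24", "Black"),
    ("B001", "F", "25-44", "White"),
    ("B002", "M", "65+", "Asian"),
]

# TDA-style starting point: collapse person-level records into a
# histogram of counts over the cross of characteristics.
histogram = Counter(cef)
```

Every person contributes exactly one unit to one histogram cell, so total population is preserved by construction at this stage.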
From page 11...
... House of Representatives is such that those figures are held immune from change in the disclosure avoidance process. More subtly, the Census Bureau included the count of housing units and group quarters units at the block level as invariants in the 2010 DDP to prevent the algorithm from, for example, establishing a count of college-age dormitory students in census blocks that do not include such dormitories.
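One way to see what "invariant" means operationally: after noisy counts are produced, post-processing forces designated totals back to their enumerated values. The sketch below uses simple largest-remainder rounding purely as an illustration; it is not the Bureau's actual constrained-optimization step.

```python
import numpy as np

def enforce_total_invariant(noisy_counts, invariant_total):
    """Post-process noisy block counts into nonnegative integers that
    sum exactly to an invariant total (e.g., a state population held
    immune from change). Largest-remainder rounding sketch only."""
    counts = np.clip(np.asarray(noisy_counts, dtype=float), 0, None)
    if counts.sum() == 0:
        counts = np.ones_like(counts)
    scaled = counts * invariant_total / counts.sum()
    floors = np.floor(scaled).astype(int)
    shortfall = invariant_total - floors.sum()
    # Hand remaining units to the cells with the largest fractional parts.
    order = np.argsort(-(scaled - floors))
    floors[order[:shortfall]] += 1
    return floors

# Noisy block counts (one even negative) forced to a fixed total of 18.
adjusted = enforce_total_invariant([10.7, -1.2, 5.5], 18)
```

Because the invariant is imposed after noise injection, the published totals for those geographies match the enumeration exactly while the interior cells still carry noise.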
From page 12...
... NOTES: Entities in the dashed rectangle are the "spine" or "on-spine" geographic levels, in the parlance that developed around the Census TopDown Algorithm; those not in the rectangle are "off-spine." Counts associated with geographic levels are tallies from the 2010 Census (see https://www.census.gov/geographies/reference-files/time-series/geo/tallies.html), with the number of inhabited census blocks from Philip Leclerc's presentation (Section 2.1)
From page 13...
... [Table fragment: fractions of the privacy-loss budget, e.g., 0.5 and 0.1, with the Detailed Tabulations listed as a fully saturated contingency table] allocated to specific noisy-measurement queries and geographic levels, separately for persons and housing units.
From page 14...
... Census Bureau in support of the workshop, the key text of which was also included in the technical documentation associated with the 2010 Demonstration Data Products; allocation by levels and tables was referenced in a workshop presentation by Philip Leclerc.
From page 15...
... Even though the basic histogram schema Geography × Ethnicity × Race × Age × Sex × HHGQ is, at its finest-grained detail (geography at the census-block level), massive: roughly 10 million blocks crossed by 1.25 million combinations of the
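Leclerc's figures imply a back-of-envelope cell count for the fully detailed histogram:

```python
# Back-of-envelope size of the fully saturated histogram, using the
# round numbers quoted in the excerpt above.
blocks = 10_000_000        # roughly 10 million census blocks
combinations = 1_250_000   # ~1.25 million attribute combinations
cells = blocks * combinations
print(f"{cells:.2e} cells")  # prints "1.25e+13 cells"
```

At over ten trillion cells, almost all of them zero, the histogram cannot be stored densely, which is part of why the algorithm works level by level down the geographic spine.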
From page 16...
... The consequence, though, would be "much larger error at higher geographic levels." He said that the key benefit of the TDA routine is that the error due to disclosure limitation and noise injection does not increase with the number of census blocks contained within a particular area.
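A small simulation makes the contrast concrete: summing independently noised block counts accumulates error that grows with the number of blocks, while a directly noised aggregate does not. The noise scale, block count, and trial count below are invented for illustration.

```python
import numpy as np

# Illustrative simulation (not the TDA itself): compare the error of a
# county total built by summing independently noised blocks against a
# county total that receives its own single noise draw, as in top-down
# allocation.
rng = np.random.default_rng(1)
n_blocks, scale, trials = 10_000, 2.0, 200

# Bottom-up: noise from every block accumulates in the county total,
# so its standard deviation grows like sqrt(n_blocks).
summed_totals = rng.laplace(0.0, scale, size=(trials, n_blocks)).sum(axis=1)
bottom_up_sd = summed_totals.std()

# Top-down: the county-level query is noised directly, once, so its
# error is independent of how many blocks the county contains.
direct_totals = rng.laplace(0.0, scale, size=trials)
top_down_sd = direct_totals.std()
```

With 10,000 blocks the bottom-up error is roughly 100 times the top-down error, which is the benefit Leclerc described.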
From page 17...
... These discussions are challenging because one does not want to bound legitimate scientific inference using the resulting privatized data, but only inference that erodes privacy. The second major thrust in the arguments presented about choosing ε involves what Leclerc called "optimistic empirical analyses": in essence, repeating the simulated database reconstruction attack that motivated the Census Bureau to adopt differential privacy in the first place, using TDA output at various levels of ε to get a sense of the practical change in privacy protection.
From page 18...
... faced by the Census Bureau in its disclosure efforts, but it was still deemed crucial to assess how well the TDA performs at addressing the simulated database reconstruction attack. Leclerc said that the Census Bureau carried out new experiments and simulated attacks, the only difference this time being that the TDA-generated microdata were used as the input rather than the published 2010 Census tables.
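The bookkeeping behind a "confirmed reidentification rate" can be sketched as exact matching of reconstructed records against a reference file. The records and schema below are toy inventions, not the Bureau's actual attack data.

```python
# Toy version of the match-rate bookkeeping in a simulated attack:
# a reconstructed record counts as a match if it agrees with some
# reference record on every quasi-identifier (schema invented here).
reconstructed = [("B001", "F", 34, "White"), ("B001", "M", 51, "Black")]
reference = {("B001", "F", 34, "White"), ("B002", "M", 51, "Black")}

matches = sum(rec in reference for rec in reconstructed)
match_rate = matches / len(reconstructed)  # 0.5 in this toy example
```

Re-running this computation with TDA-generated microdata as the attacker's input, rather than published tables, is what lets the Bureau quantify how much the noise degrades the attack.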
From page 19...
... 2.2 SETTING THE PRIVACY-LOSS BUDGET FOR THE 2010 DEMONSTRATION DATA PRODUCTS
Matthew Spence (U.S. Census Bureau)
From page 20...
... Presentation also included ε values of 0.1, 0.25, 0.5, 2, 8, and 16. SOURCE: Matthew Spence workshop presentation.
From page 21...
... Figure 2.4 replicates some of the boxplots for the absolute difference in population between the ε-keyed MDFs and the raw 2010 Census data, while Spence also presented boxplots showing the percentage difference in population for the counties, in both cases grouping data by these total-population size categories.
From page 22...
... Presentation also included ε values of 0.1, 0.25, 0.5, 2, 8, and 16. SOURCE: Matthew Spence workshop presentation.
From page 23...
... Presentation also included ε values of 0.1, 0.25, 0.5, 2, 8, and 16. SOURCE: Matthew Spence workshop presentation.
From page 24...
... Consistent with the previous plots in Spence's series, the lowest values of ε show some major discrepancies, such as a county with 0 percent of its true 2010 Census population identifying as one of these major race groups but registering as 40 percent in the MDF. As in the preceding graphs, Spence noted that the scattering of points above the 45-degree line for small percentages in the 2010 Census data, and below the line for large percentages, is consistent with the previous finding that more populous race groups for particular counties tend to lose population in TDA processing and less populous groups tend to gain.
From page 25...
... David Van Riper (Minnesota Population Center) asked Leclerc for his thoughts about the confirmed reidentification rates when reprising the simulated database reconstruction attack (Figure 2.2)
From page 26...
... On a whole range of outcomes, including but certainly not limited to health, census data are of interest in comparing "who's doing better" among different groups in the country relative to "who's doing worse." Hence, the impacts of disclosure avoidance on being able to understand the magnitude of inequities within the general population also deserve consideration. Leclerc said that this was a very useful insight and that the Census Bureau would appreciate feedback on measuring equity and the effect on dimensions of equity in a more rigorous manner.
From page 27...
... He illustrated the point by saying that, based on analyses done since the release of the DDP, workshop presentations were inevitably going to mention the unusual results those products show for housing unit vacancy rates. Fixing the housing unit counts by block, while having to accommodate changes in total population counts, requires either effectively placing people in known-vacant units (unnaturally driving down the vacancy rate)
From page 28...
... As another example, Cai mentioned that DDP information for the small city of Emporia suggests that the city's teen pregnancy rate would shift from 5 percent to 66 percent, which has definite implications. Spence replied that he understood the importance of the point and the role of census data as denominators for other measures.
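Cai's point is ultimately about denominators: a rate computed over a small area can swing wildly when disclosure avoidance moves the denominator. The numbers below are hypothetical, chosen only to show how a swing from 5 percent to roughly 66 percent can arise arithmetically.

```python
def rate(events, denominator):
    """Rate per 100 of the denominator population."""
    return 100.0 * events / denominator

# Hypothetical numbers illustrating the mechanism, not Emporia's data:
# the event count is unchanged, but noise shrinks the denominator.
events = 20                        # e.g., events in a small city
true_denom, noisy_denom = 400, 30  # denominator before / after noise
print(rate(events, true_denom))    # prints 5.0
print(rate(events, noisy_denom))   # prints ~66.7
```

The event count never changes; the apparent rate explodes solely because the privatized denominator did, which is why small-area denominators deserve special scrutiny in accuracy assessments.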

