2 Disclosure Avoidance in the 2020 Census
Pages 7-28

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 7...
... In the interest of time, Leclerc did not delve too deeply into the details of how the Census Bureau's algorithm is executed, with some of that material being covered in brief in later remarks by David Van Riper and Seth Spielman (Section 3.1) or deferred to background documents accompanying the release of the 2010 Demonstration Data Products (DDP)
From page 8...
... described the development of what is formally known as the 2010 Decennial Census TopDown Disclosure Limitation Algorithm (TDA for short), fleshing out some details of the Census Bureau's simulated database reconstruction attack against itself that was alluded to in Chapter 1.
From page 9...
... as being correct on name, census block, sex, age, race, and ethnicity. Though "it is heartening that this 38 percent is not 100 percent," Leclerc noted that it is "extremely disconcerting that this rate is several orders of magnitude larger than anything we have seen in the past in similar studies of this kind." The magnitude of the bottom-line 17 percent reidentification risk reinforced the notion that the data landscape has fundamentally changed in recent decades and that the database reconstruction attack "does potentially represent a significant and meaningful threat to traditional disclosure limitation algorithms." Leclerc said that the magnitude of the potential risk is the reason why the Census Bureau turned to formal privacy, or differential privacy, techniques: because the mathematical proofs about the privacy guarantees that they impart provide a strong measure of future-proofing against the range of data, computational power, and algorithms that attackers in the future (or today)
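The formal guarantee Leclerc invoked can be made concrete with the textbook Laplace mechanism for a single count. This is a minimal sketch of the general technique, not the Bureau's TDA (which uses discrete noise distributions and extensive post-processing); the function name and parameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with Laplace noise calibrated to epsilon.

    Adding or removing one person changes a simple count by at most
    `sensitivity`, so noise drawn with scale sensitivity/epsilon gives
    epsilon-differential privacy for this single query. A textbook
    sketch only; not the Census Bureau's actual mechanism.
    """
    return true_count + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

# Smaller epsilon means a tighter privacy guarantee but noisier output.
noisy = laplace_count(120, epsilon=0.5)
```

The key property is that the noise scale depends only on ε and the query's sensitivity, never on the data themselves, which is what makes the guarantee hold against arbitrary future attackers.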
From page 10...
... The basic premise of the Census Bureau's planned TDA routine is to start with the CEF, the true underlying data collected in the census after any quality edits or imputations for missing information are made. CEF data are in microdata form: individual records for every person and housing unit.
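As a toy illustration of that starting point, person-level microdata can be rolled up into the histogram of counts that the algorithm actually manipulates. The records and attribute values below are invented; the real CEF schema is far larger.

```python
from collections import Counter

# Hypothetical microdata records: (block, sex, age_group, race_ethnicity).
cef = [
    ("B001", "F", "18-24", "White"),
    ("B001", "M", "18-24", "Black"),
    ("B001", "F", "25-44", "White"),
    ("B002", "M", "65+", "Asian"),
]

# TDA-style starting point: collapse person-level records into a
# histogram of counts over the cross of characteristics.
histogram = Counter(cef)
```

Every person contributes exactly one unit to one histogram cell, so total population is preserved by construction at this stage.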
From page 11...
... House of Representatives is such that those figures are held immune from change in the disclosure avoidance process. More subtly, the Census Bureau included the count of housing units and group quarters units at the block level as invariants in the 2010 DDP to prevent the algorithm from, for example, establishing a count of college-age dormitory students in census blocks that do not include such dormitories.
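One way to see what "invariant" means operationally: after noisy counts are produced, post-processing forces designated totals back to their enumerated values. The sketch below uses simple largest-remainder rounding purely as an illustration; it is not the Bureau's actual constrained-optimization step.

```python
import numpy as np

def enforce_total_invariant(noisy_counts, invariant_total):
    """Post-process noisy block counts into nonnegative integers that
    sum exactly to an invariant total (e.g., a state population held
    immune from change). Largest-remainder rounding sketch only."""
    counts = np.clip(np.asarray(noisy_counts, dtype=float), 0, None)
    if counts.sum() == 0:
        counts = np.ones_like(counts)
    scaled = counts * invariant_total / counts.sum()
    floors = np.floor(scaled).astype(int)
    shortfall = invariant_total - floors.sum()
    # Hand remaining units to the cells with the largest fractional parts.
    order = np.argsort(-(scaled - floors))
    floors[order[:shortfall]] += 1
    return floors

# Noisy block counts (one even negative) forced to a fixed total of 18.
adjusted = enforce_total_invariant([10.7, -1.2, 5.5], 18)
```

Because the invariant is imposed after noise injection, the published totals for those geographies match the enumeration exactly while the interior cells still carry noise.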
From page 12...
... NOTES: Entities in the dashed rectangle are the "spine" or "on-spine" geographic levels, in the parlance that developed around the Census TopDown Algorithm; those not in the rectangle are "off-spine." Counts associated with geographic levels are tallies from the 2010 Census (see https://www.census.gov/geographies/reference-files/time-series/geo/tallies.html), with the number of inhabited census blocks from Philip Leclerc's presentation (Section 2.1)
From page 13...
... [Table fragment: fractions of the privacy-loss budget, e.g., 0.5 and 0.1, with the Detailed Tabulations listed as a fully saturated contingency table] allocated to specific noisy-measurement queries and geographic levels, separately for persons and housing units.
From page 14...
... Census Bureau in support of the workshop, the key text of which was also included in the technical documentation associated with the 2010 Demonstration Data Products; allocation by levels and tables was referenced in a workshop presentation by Philip Leclerc.
From page 15...
... Even though the basic histogram schema Geography × Ethnicity × Race × Age × Sex × HHGQ is, at its finest-grained detail (geography at the census-block level), massive: roughly 10 million blocks crossed by 1.25 million combinations of the
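Leclerc's figures imply a back-of-envelope cell count for the fully detailed histogram:

```python
# Back-of-envelope size of the fully saturated histogram, using the
# round numbers quoted in the excerpt above.
blocks = 10_000_000        # roughly 10 million census blocks
combinations = 1_250_000   # ~1.25 million attribute combinations
cells = blocks * combinations
print(f"{cells:.2e} cells")  # prints "1.25e+13 cells"
```

At over ten trillion cells, almost all of them zero, the histogram cannot be stored densely, which is part of why the algorithm works level by level down the geographic spine.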
From page 16...
... The consequence, though, would be "much larger error at higher geographic levels." He said that the key benefit of the TDA routine is that the error due to disclosure limitation and noise injection does not increase with the number of census blocks contained within a particular area.
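A small simulation makes the contrast concrete: summing independently noised block counts accumulates error that grows with the number of blocks, while a directly noised aggregate does not. The noise scale, block count, and trial count below are invented for illustration.

```python
import numpy as np

# Illustrative simulation (not the TDA itself): compare the error of a
# county total built by summing independently noised blocks against a
# county total that receives its own single noise draw, as in top-down
# allocation.
rng = np.random.default_rng(1)
n_blocks, scale, trials = 10_000, 2.0, 200

# Bottom-up: noise from every block accumulates in the county total,
# so its standard deviation grows like sqrt(n_blocks).
summed_totals = rng.laplace(0.0, scale, size=(trials, n_blocks)).sum(axis=1)
bottom_up_sd = summed_totals.std()

# Top-down: the county-level query is noised directly, once, so its
# error is independent of how many blocks the county contains.
direct_totals = rng.laplace(0.0, scale, size=trials)
top_down_sd = direct_totals.std()
```

With 10,000 blocks the bottom-up error is roughly 100 times the top-down error, which is the benefit Leclerc described.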
From page 17...
... These discussions are challenging because one does not want to bound legitimate scientific inference using the resulting privatized data, but only inference that erodes privacy. The second major thrust in the arguments presented about choosing ε involves what Leclerc called "optimistic empirical analyses": in essence, repeating the simulated database reconstruction attack that motivated the Census Bureau to adopt differential privacy in the first place, using TDA output at various levels of ε to get a sense of the practical change in privacy protection.
From page 18...
... faced by the Census Bureau in its disclosure efforts, but it was still deemed crucial to assess how well the TDA performs at addressing the simulated database reconstruction attack. Leclerc said that the Census Bureau carried out new experiments and simulated attacks, the only difference this time being that the TDA-generated microdata were used as the input rather than the published 2010 Census tables.
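The bookkeeping behind a "confirmed reidentification rate" can be sketched as exact matching of reconstructed records against a reference file. The records and schema below are toy inventions, not the Bureau's actual attack data.

```python
# Toy version of the match-rate bookkeeping in a simulated attack:
# a reconstructed record counts as a match if it agrees with some
# reference record on every quasi-identifier (schema invented here).
reconstructed = [("B001", "F", 34, "White"), ("B001", "M", 51, "Black")]
reference = {("B001", "F", 34, "White"), ("B002", "M", 51, "Black")}

matches = sum(rec in reference for rec in reconstructed)
match_rate = matches / len(reconstructed)  # 0.5 in this toy example
```

Re-running this computation with TDA-generated microdata as the attacker's input, rather than published tables, is what lets the Bureau quantify how much the noise degrades the attack.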
From page 19...
... 2.2 SETTING THE PRIVACY-LOSS BUDGET FOR THE 2010 DEMONSTRATION DATA PRODUCTS
Matthew Spence (U.S. Census Bureau)
From page 20...
... Presentation also included ε values of 0.1, 0.25, 0.5, 2, 8, and 16. SOURCE: Matthew Spence workshop presentation.
From page 21...
... Figure 2.4 replicates some of the boxplots for the absolute difference in population between the ε-keyed MDFs and the raw 2010 Census data, while Spence also presented boxplots showing the percentage difference in population for the counties, in both cases grouping data by these total-population size categories.
From page 22...
... Presentation also included ε values of 0.1, 0.25, 0.5, 2, 8, and 16. SOURCE: Matthew Spence workshop presentation.
From page 23...
... Presentation also included ε values of 0.1, 0.25, 0.5, 2, 8, and 16. SOURCE: Matthew Spence workshop presentation.
From page 24...
... Consistent with the previous plots in Spence's series, the lowest values of ε show some major discrepancies, such as a county with 0 percent of its true 2010 Census population identifying as one of these major race groups but registering as 40 percent in the MDF. As in the preceding graphs, Spence noted that the scattering of points above the 45-degree line for small percentages in the 2010 Census data, and below the line for large percentages, is consistent with the previous finding that more populous race groups for particular counties tend to lose population in TDA processing and less populous groups tend to gain.
From page 25...
... David Van Riper (Minnesota Population Center) asked Leclerc for his thoughts about the confirmed reidentification rates when reprising the simulated database reconstruction attack (Figure 2.2)
From page 26...
... On a whole range of outcomes, including but certainly not limited to health, census data are of interest in comparing "who's doing better" among different groups in the country relative to "who's doing worse." Hence, the impacts of disclosure avoidance on being able to understand the magnitude of inequities within the general population also deserve consideration. Leclerc said that this was a very useful insight and that the Census Bureau would appreciate feedback on measuring equity and the effect on dimensions of equity in a more rigorous manner.
From page 27...
... He illustrated the point by saying that, based on analyses done since the release of the DDP, workshop presentations were inevitably going to mention the unusual results those products show for housing unit vacancy rates. Fixing the housing unit counts by block, while having to accommodate changes in total population counts, requires either effectively placing people in known-vacant units (unnaturally driving down the vacancy rate)
From page 28...
... As another example, Cai mentioned that DDP information for the small city of Emporia suggests that the city's teen pregnancy rate would shift from 5 percent to 66 percent, which has definite implications. Spence replied that he understood the importance of the point and the role of census data as denominators for other measures.
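Cai's point is ultimately about denominators: a rate computed over a small area can swing wildly when disclosure avoidance moves the denominator. The numbers below are hypothetical, chosen only to show how a swing from 5 percent to roughly 66 percent can arise arithmetically.

```python
def rate(events, denominator):
    """Rate per 100 of the denominator population."""
    return 100.0 * events / denominator

# Hypothetical numbers illustrating the mechanism, not Emporia's data:
# the event count is unchanged, but noise shrinks the denominator.
events = 20                        # e.g., events in a small city
true_denom, noisy_denom = 400, 30  # denominator before / after noise
print(rate(events, true_denom))    # prints 5.0
print(rate(events, noisy_denom))   # prints ~66.7
```

The event count never changes; the apparent rate explodes solely because the privatized denominator did, which is why small-area denominators deserve special scrutiny in accuracy assessments.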

