Suggested Citation:"8 Privacy Concerns." National Academies of Sciences, Engineering, and Medicine. 2023. 2020 Census Data Products: Demographic and Housing Characteristics File: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: 10.17226/26727.

8

Privacy Concerns

As explained previously in this proceedings, the TopDown Algorithm (TDA), which is based on differential privacy methods, is the disclosure avoidance system used by the Census Bureau for its 2020 Census data products. This chapter first presents a use case demonstrating the risk of re-identification, followed by a panel discussion on privacy concerns.

RISK OF RE-IDENTIFICATION

Abraham Flaxman (University of Washington) began by commenting on a 2006 quote by Professor Stephen Fienberg that resonated with his research:

Sharing data is a matter of ethics and the U.S. Census Bureau’s data are a public good. So the need to provide greater access to Bureau data seems obvious to me. But I also see the possibility that some data can be misused, either by government officials or by others who access them. There is an ethical obligation not to aid and abet that abuse.

Flaxman stated that his presentation investigates some of the risks of abuse that arise from sharing census data. His analysis uses computer simulation to examine how linked census data might disclose sensitive gender identity information. He posited that this work is salient given the heightened scrutiny of transgender people, particularly transgender children, and cited the recent example of the Texas governor directing the state Department of Family and Protective Services to investigate the parents of any transgender child who receives gender-affirming care. The study used data from the Behavioral Risk Factor Surveillance System and the 2010 American Community Survey, and simulated reported sex in the 2020 decennial census based on gender, to investigate the risk of disclosing a child’s transgender status through discordant reporting of binary gender in successive censuses. The analysis considered four scenarios.

The first scenario counted all simulants with different values recorded for sex in 2010 and 2020 to estimate the number of transgender youth whose gender identity would be revealed if census microdata, including names, were released or re-identified. Flaxman and his colleague Os Keyes (University of Washington) found that gender identity would be disclosed for more than 6,000 simulants (38% of all transgender children in the simulated Texas data).
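The counting step in the first scenario fits in a few lines of Python. This is a minimal illustration, not the researchers’ code; their replication archive is linked in footnote 1. The sketch assumes that each simulant has already been linked across the two censuses by name. The `LinkedSimulant` record type and the `count_discordant` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkedSimulant:
    """Hypothetical record for one simulant linked across both censuses."""
    simulant_id: int
    sex_2010: str  # reported sex in 2010, "M" or "F"
    sex_2020: str  # reported sex in 2020, "M" or "F"


def count_discordant(records):
    """Scenario 1 sketch: with perfect linkage (names available), every
    simulant whose reported sex differs between censuses is disclosed."""
    return sum(1 for r in records if r.sex_2010 != r.sex_2020)


records = [
    LinkedSimulant(1, "F", "F"),
    LinkedSimulant(2, "M", "F"),
    LinkedSimulant(3, "F", "M"),
    LinkedSimulant(4, "M", "M"),
]
print(count_discordant(records))  # 2
```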

The second scenario assumed no disclosure avoidance and applied a reconstruction-abetted linkage attack, without the re-identification step. The attack targets simulants aged seven and younger in 2010 who had a unique combination of age, race, and ethnicity in their census block. Transgender children who moved between the 2010 and 2020 censuses were likely not revealed. Transgender children who did not move might also escape disclosure if in-migration meant they no longer had a unique combination of attributes in 2020.
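A reconstruction-abetted linkage attack of this kind can be sketched as follows. The sketch assumes the reconstruction step has already produced record-level tables (block, age, race, ethnicity, sex) for both censuses. The `Record` type, the `linkage_attack` function, and the toy data are hypothetical. The matching works like this:

- A 2010 child who is unique on block, age, race, and ethnicity is matched to a 2020 record.
- That 2020 record must be unique on the same block, race, and ethnicity, with age advanced by 10 years.
- The pair is flagged if the reported sex differs.

Flagged records can include false positives, for example when the unique 2020 match is actually a different child.

```python
from collections import Counter
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical reconstructed census record (no names)."""
    block: str
    age: int
    race: str
    ethnicity: str
    sex: str


def quasi_id(r, age_shift=0):
    # Linkage key: census block plus age, race, and ethnicity.
    return (r.block, r.age + age_shift, r.race, r.ethnicity)


def linkage_attack(recs_2010, recs_2020, max_age_2010=7, years=10):
    """Return 2020 records flagged as possible sex-discordant matches."""
    counts_2010 = Counter(quasi_id(r) for r in recs_2010)
    counts_2020 = Counter(quasi_id(r) for r in recs_2020)
    by_key_2020 = {quasi_id(r): r for r in recs_2020}

    flagged = []
    for r in recs_2010:
        if r.age > max_age_2010 or counts_2010[quasi_id(r)] != 1:
            continue  # not a target: too old, or not unique in 2010
        key_2020 = quasi_id(r, age_shift=years)
        if counts_2020.get(key_2020) != 1:
            continue  # no unique 2020 match (moved away, or in-migration)
        match = by_key_2020[key_2020]
        if match.sex != r.sex:
            flagged.append(match)
    return flagged


recs_2010 = [
    Record("B1", 5, "White", "Non-Hispanic", "M"),
    Record("B1", 6, "Asian", "Non-Hispanic", "F"),
    Record("B2", 3, "Black", "Hispanic", "F"),
]
recs_2020 = [
    Record("B1", 15, "White", "Non-Hispanic", "F"),  # sex differs: flagged
    Record("B1", 16, "Asian", "Non-Hispanic", "F"),  # same sex: not flagged
    Record("B2", 13, "Black", "Hispanic", "F"),
    Record("B2", 13, "Black", "Hispanic", "M"),      # in-migrant breaks uniqueness
]
print(len(linkage_attack(recs_2010, recs_2020)))  # 1
```

The toy data show two outcomes described above. The first child is flagged. The third child’s match is lost because an in-migrant shares the same attributes in 2020.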

The third scenario tested the swapping method for disclosure avoidance. Instead of using each simulant’s geography directly in the reconstruction-abetted linkage attack, the researchers had a random subset of households report a location other than their true one. Each household was selected for swapping independently at random with a probability of five percent, and its reported location was chosen uniformly from all simulated households in Texas.
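The swapping rule described above can be sketched as follows. The function name and its parameters are hypothetical. The sketch follows only the simplified rule used in the simulation, namely independent selection with probability 0.05 and a uniformly chosen replacement household. It is not the Census Bureau’s production swapping procedure. Excluding the selected household itself from the draw is an assumption added here, and the replacement household may still happen to lie in the same block.

```python
import random


def swap_locations(household_blocks, swap_prob=0.05, seed=None):
    """Simplified swap: each household is selected independently with
    probability swap_prob; a selected household reports the block of another
    household drawn uniformly at random from all other households."""
    rng = random.Random(seed)
    n = len(household_blocks)
    reported = list(household_blocks)
    for i in range(n):
        if n > 1 and rng.random() < swap_prob:
            j = rng.randrange(n - 1)
            if j >= i:
                j += 1  # skip the selected household itself
            reported[i] = household_blocks[j]
    return reported


# With every household selected and only two households, they trade blocks.
print(swap_locations(["A", "B"], swap_prob=1.0))  # ['B', 'A']
```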

The fourth and final scenario used the TDA for disclosure avoidance. Instead of simulating forward from 2010 to 2020, the researchers initialized simulants in 2020 and simulated time backward to 2010. This design allowed them to use the Demographic and Housing Characteristics (DHC) demonstration file, rather than swapping, to quantify the impact of the TDA on the reconstruction-abetted linkage attack. The central question was: How many fewer transgender kids are identified by the reconstruction-abetted linkage attack against the TDA than against swapping?
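The TDA is far more elaborate than any short example. It adds noise to counts at each level of the geographic hierarchy and then post-processes the results into consistent, nonnegative integer tables. The sketch below illustrates only the underlying idea. It adds Laplace noise with scale sensitivity/ε to a single count, which is the textbook Laplace mechanism from differential privacy, not the TDA’s noise distribution or workflow. The function `laplace_noisy_count` is hypothetical. Noise of this kind blurs small block-level counts, which is one reason it can weaken the uniqueness that the linkage attack relies on.

```python
import random


def laplace_noisy_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Return true_count plus Laplace(0, sensitivity / epsilon) noise.

    Uses the fact that the difference of two independent exponential
    draws, each with mean b, is Laplace-distributed with scale b."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rate = epsilon / sensitivity  # expovariate takes the rate 1 / b
    return true_count + rng.expovariate(rate) - rng.expovariate(rate)


# Smaller epsilon (stronger privacy) means more noise: the mean absolute
# error should come out near sensitivity / epsilon, i.e. about 4, 1, 0.25.
rng = random.Random(2020)
for eps in (0.25, 1.0, 4.0):
    errors = [abs(laplace_noisy_count(3, eps, rng=rng) - 3) for _ in range(10_000)]
    print(f"epsilon={eps}: mean absolute error {sum(errors) / len(errors):.2f}")
```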

Flaxman presented the results in Table 8-1, concluding that linked data from the decennial censuses contain sensitive gender identity information and that the TDA improves on swapping for protecting sensitive information. He stated that a limitation of this work is that components of the model are perhaps overly simplistic in the following ways:

  1. migration data that lack heterogeneity;
  2. the mechanism by which gender maps to reported sex;
  3. the assumption that race and ethnicity are reported identically in 2010 and 2020; and
  4. the treatment of census block boundaries, which changed between 2010 and 2020.

TABLE 8-1 Results: Number of Transgender Kids Identified

| Scenario | Transgender kids disclosed | False positives | Positive predictive value |
|---|---|---|---|
| Scenario 1: Extreme disclosure | 6,200 | 0 | 100.00% |
| Scenario 2: No disclosure avoidance | 657 | 69,527 | 0.94% |
| Scenario 3: Swapping for disclosure avoidance | 605 | 77,426 | 0.78% |
| Scenario 4: TopDown Algorithm for disclosure avoidance | 170 | 36,267 | 0.47% |

SOURCE: Adapted from Abraham Flaxman workshop presentation, June 21, 2022.
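The positive predictive values in Table 8-1 follow from the other two columns as disclosed / (disclosed + false positives). The short Python check below uses only the counts from the table. It reproduces the percentages and computes how many fewer children the attack identified under the TDA than under swapping, which was the central question of the fourth scenario.

```python
# Counts from Table 8-1: (transgender kids disclosed, false positives)
TABLE_8_1 = {
    "Scenario 1: Extreme disclosure": (6200, 0),
    "Scenario 2: No disclosure avoidance": (657, 69527),
    "Scenario 3: Swapping for disclosure avoidance": (605, 77426),
    "Scenario 4: TopDown Algorithm for disclosure avoidance": (170, 36267),
}


def ppv(disclosed, false_positives):
    """Positive predictive value: share of flagged children who are
    actually transgender, disclosed / (disclosed + false positives)."""
    return disclosed / (disclosed + false_positives)


for scenario, (disclosed, false_pos) in TABLE_8_1.items():
    print(f"{scenario}: {ppv(disclosed, false_pos):.2%}")
# Prints 100.00%, 0.94%, 0.78%, and 0.47%, matching the table.

# How many fewer children are identified under the TDA than under swapping?
swapping_disclosed = TABLE_8_1["Scenario 3: Swapping for disclosure avoidance"][0]
tda_disclosed = TABLE_8_1["Scenario 4: TopDown Algorithm for disclosure avoidance"][0]
fewer = swapping_disclosed - tda_disclosed
print(fewer)  # 435
```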

However, the Census Bureau could investigate how the results of the simulation compare with results using real (but restricted) data. Flaxman offered a link to the full replication archive and draft report for this work.1

PERSPECTIVES FROM A PRIVACY PANEL

The workshop featured a panel discussion on privacy concerns moderated by danah boyd (Microsoft Research), with panelists John Davisson (senior counsel, Electronic Privacy Information Center [EPIC]), Margaret Hu (professor of law, William and Mary Law School), and Sharita Gruberg (vice president of economic justice at the National Partnership for Women & Families). The panelists offered several examples of why privacy protection is so important, showing how inadequate protection can leave certain populations vulnerable to re-identification and its adverse impacts.

Gruberg stated that the binary sex variable on the DHC File is more complicated than it may initially appear. She explained that, among members of the LGBTQ+ community,2 sex assigned at birth does not always align with how a person presents in the world, so this variable can reveal a lot of information. Moreover, Gruberg noted that there is considerable “pushback” from Generation Z, which thinks about gender very differently from other generations and is more likely to identify as LGBTQ+. According to Gallup, which recently raised its estimate, around 20 percent of Generation Z identifies as LGBTQ+. Looking more closely, Gruberg explained, one in four LGBTQ+ people identifies as nonbinary. She asserted that someone who has difficulty with the sex question on the census may not answer it, but the other information they provide might be enough to identify them as nonbinary.

___________________

1 https://github.com/aflaxman/linked_census_disclosure

2 LGBTQ = lesbian, gay, bisexual, transgender, and queer.

In a discussion of some individuals’ fear of filling out anything at all, boyd noted that confidentiality is important amid ongoing conversations about measuring sexual orientation and gender identity. Gruberg stated that she pushes every day for sexual orientation and gender identity questions to be added to the census and other surveys because these data are so important. She noted that she had worked on these issues at the Center for American Progress and that it is important to reassure respondents that the information they provide will not be identifiable. Gruberg stated that more than 300 bills were proposed in states in 2022 attacking the ability of LGBTQ+ people to survive, from bills withholding medical care to “Don’t Say Gay” bills.

Gruberg mentioned a study with NORC that found LGBTQ+ people take “avoidance measures” to escape discrimination based on sexual orientation. These measures can extend to hiding their personal relationships and can shape decisions about where to work and live. As a result, if people are not confident that their responses to these federal surveys will remain confidential, they may answer untruthfully or not respond at all. boyd summarized this tension: “We want high-quality data that also relies on people being willing to participate and really willing to engage.”

Hu responded to questions about citizenship and her work on immigration, and boyd referenced the citizenship question that the Trump administration sought to add to the 2020 Census before it was blocked by the Supreme Court. Hu explained that these are equity issues and that surveillance of communities of color is a concern for immigrants who feel they might be targeted. She explained in her workshop remarks that, as there is no master file of all U.S. citizens,

it is not surprising that in a post-9/11 environment that there’s a real push to be able to commandeer all government databases to create what is considered a comprehensive list of individuals who in a national security or domestic security sense would be considered ripe for targeting for an investigation.

Hu continued that the risks of re-identification and loss of anonymity are tremendous given the artificial intelligence-driven capabilities available to private corporations, or data brokers. These entities can collate aggregated data into something that enables further targeting or otherwise heightens these risks. Hu also noted that the sensitivity of data is a concern not only for national security but also because the data could be used by foreign adversaries to, for example, interfere with U.S. elections.

Davisson discussed risks arising from household relationship data, such as gay heads of household being outed or people in violation of tenancy rules being identified. boyd offered the example of companies in New York City that try to identify people in violation of Section 8 housing rules by starting with census data and pointing out buildings where more people live than the law allows. Davisson said that comprehensive data protection legislation governing the use of personal data by commercial entities and by law enforcement is essential. He asked rhetorically, “Why is that not sufficient for privacy protection when we are talking about census data?” Davisson asserted that it is much easier to deny companies data in the first place than to restrict the use of that information once it is in their data lakes. He stated that the United States has failed for more than 20 years to pass a comprehensive federal data protection law. An additional complication is that even the strongest U.S. data protection law would not stop companies beyond U.S. jurisdiction from deriving and using data from census data products.

Davisson also discussed other data sources that have been used to identify people, such as Internet search queries and social media usage. He stated that the Census Bureau must take every reasonable step in its power to avoid contributing to this ecosystem of data brokers, both for the protection of privacy as a civil right and for the accuracy of census data products. In response, boyd invited Davisson to reflect on the common claim that protection is unnecessary if the data are “already in data broker land anyhow.” Davisson argued that, while some data elements are available through data brokers, the scale and accuracy of census data make them especially rich targets for corroborating and enriching other data sets.

Hu discussed trade-offs between what data users want as a public good and the potential for moral harms. For example, there is a desire to collect more detailed race information so that a community can see itself and benefit from greater visibility, especially in civil rights enforcement and federal funding contexts. But there are potential harms: after 9/11, the U.S. Department of Homeland Security asked the Census Bureau to use published data to produce a special tabulation of Arab Americans. Hu noted the long history in the United States of invoking national security to justify targeting communities on the basis of race, from Chinese exclusion to Japanese internment. She emphasized that understanding vulnerability and past precedent is critical to contextualizing these risks. In the absence of law, she said, guarding against them is an ethical obligation.

boyd asked for ideas on how the Census Bureau and its stakeholders can bridge the communication gap in this next phase. Davisson suggested that quantifying privacy risks is essential to the process. Gruberg explained that stakeholder engagement also means partnering with trusted community groups. She stated that the Leadership Conference on Civil and Human Rights has a census data task force composed of trusted national and state organizations. Hu emphasized the importance of ensuring a baseline understanding that privacy rights are civil rights, and that these civil rights are democracy rights. She stated, “I think we have never been more aware about these concerns of data and democracy.”

DISCUSSION

Hawes asked the panel for their perspectives on the question, How protected is protected enough? Davisson responded that he was not advocating a particular epsilon value, but that EPIC’s objective is to make certain that confidentiality protection is on an equal footing with the statistical obligations of the Census Bureau. John Abowd (U.S. Census Bureau) stated that the Census Bureau did not necessarily need a recommendation about the privacy-loss budget, but that revealing the precise geolocation of the front door of an individual’s house would be a privacy violation. He said that the question “How noisy does an inference about the location you responded to the census from need to be relative to giving the latitude and longitude to six decimal places of accuracy?” is the kind of thing the Census Bureau has to think about.
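Abowd’s comparison point can be made concrete with a rough calculation. Assume that one degree of latitude spans about 111 km (an approximation, since the true length varies slightly with latitude) and that a degree of longitude shrinks with the cosine of the latitude. Under those assumptions, a coordinate given to six decimal places locates a point to within roughly 0.1 m. The helper `precision_m` below is hypothetical.

```python
import math

# Assumption: one degree of latitude is about 111.32 km; the true
# length varies slightly with latitude, so results are approximate.
METERS_PER_DEGREE = 111_320


def precision_m(decimal_places, latitude_deg=0.0):
    """Approximate ground resolution, in meters, of a coordinate rounded to
    `decimal_places` decimal degrees: (north-south, east-west) at a latitude."""
    step_deg = 10 ** -decimal_places
    north_south = step_deg * METERS_PER_DEGREE
    # A degree of longitude shrinks with the cosine of the latitude.
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


for places in (2, 4, 6):
    ns, ew = precision_m(places, latitude_deg=31.0)  # roughly central Texas
    print(f"{places} decimal places: ~{ns:.2f} m N-S, ~{ew:.2f} m E-W")
# At 6 decimal places the north-south resolution is about 0.11 m.
```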

Davisson stated that he had not come prepared to discuss technical standards but that draft privacy legislation in Congress defines geolocation data as including location information more precise than a zip code. Hu responded that the better way to think about the question is in terms of frameworks and principles that allow for dialogue. She referred to her 10 years of service in the Civil Rights Division of the U.S. Department of Justice and her realization that the Civil Rights Act of 1964 would not capture all forms of discrimination.

Mays asked whether there could be a version of “differential protection” for cases where the push for more granular data increases vulnerability. She also asked whether legal protections exist that govern how data privacy should be considered. Hu responded that the European Union’s proposed Artificial Intelligence (AI) Act may provide some guidance.

Jan Vink (Cornell University) commented on the balance between use and privacy. He asked how much accuracy is sufficient for same-sex couples, and at what level of geography. Gruberg responded that it is important to count same-sex couples, as distinct from same-sex heads of household, and how many are raising children. She added that this measurement is at an early stage, so national or even state-level counts are valuable at this point. Webinar attendee Jeremy Seeman (Pennsylvania State University) commented on current recommendations for the National Institutes of Health (NIH) on measuring sexual orientation and gender identity and asked what they imply for the Census Bureau. Gruberg responded that the Fenway Institute has been involved in these discussions. She noted that health data have a higher level of protection, so privacy considerations for data collected by NIH and subject to HIPAA may differ from those for census data.

Serving as moderator for the privacy panel, boyd raised several concerns germane to privacy and the census. One was the question of what it means to produce statistics as a public good, and she connected the origins of statistics to political arithmetic. Another was the danger of creating lists from data that may be compiled under various guises but can nevertheless threaten people if confidentiality is breached, outing personal characteristics that could be used in harmful ways. boyd closed the Privacy Concerns panel by urging the audience not to forget the role that the proposed citizenship question played as the 2020 Census approached.

This proceedings summarizes the presentations and discussions at the Workshop on the 2020 Census Demographic and Housing Characteristics File, held June 21-22, 2022. The workshop was convened by the Committee on National Statistics of the National Academies of Sciences, Engineering, and Medicine to assist the U.S. Census Bureau with its new disclosure avoidance system for 2020 Census data products, which implements algorithms providing differential privacy. The workshop focused specifically on the Demographic and Housing Characteristics File, a major source of data for local governments, particularly those with small populations, and many other data users in the federal, state, academic, and business sectors. The intent was to garner feedback from users on the usability of the privacy-protected data by evaluating DHC demonstration files produced with the proposed TopDown Algorithm on 2010 Census data.
