– 12 –
Summary of Breakout Discussion Sessions
To close the workshop, the planning committee sought a way to tap feedback and impressions from a broader share of the in-person audience than either a designated capstone speaker or an unstructured, large-group floor discussion would provide. To accomplish this, the workshop attendees were divided into smaller groups for breakout sessions, one using part of the auditorium where the workshop presentations took place and the rest in other meeting rooms in the National Academy of Sciences building. The breakout groups were structured to include a mix of data users and Census Bureau staff in each room. A member of the planning committee or a staff member was asked to jot down bullet points from the discussion in each breakout. The workshop then resumed with a brief plenary session, allowing the breakout reporters to recap what they heard in the discussions.
Joe Hotz (Duke University) charged each of the breakout discussion groups to consider three questions:
- What are the key findings over the last two days of the workshop, and what did we learn about common features or challenges across the various use cases for census data?
- What are the key priorities for the Census Bureau (and the follow-up expert group meetings) to address and resolve, in the next several months?
- What trade-offs are possible in terms of data product content, geographic specificity, and quality standards, from the data user perspective?
With those directions, the breakout sessions ran for roughly one hour.
Constance Citro (Committee on National Statistics), with input from Joe Hotz, offered the following summary points from the group that convened in the Board Room:
- This workshop and the public availability of the 2010 Demonstration Data Products (DDP) were good, necessary first steps, but major communication work remains to be done. This includes questions such as what the census data quality operations of Count Review and Count Question Resolution mean in the context of a fully synthetic Microdata Detail File (MDF). What about challenges when numbers just don’t look very credible? The example of the American Community Survey (ACS) was raised, in which a series of pamphlets and guidebooks were prepared for different classes of data users to make them aware of the properties of ACS estimates. The same might be done for data under differential privacy.
- There remain major challenges on communicating the disclosure avoidance methodology and its effects to stakeholders and advocacy groups, in ways that they can understand.
- It would be very useful for the Census Bureau to make more widely public the timeline within which user feedback can still be useful in shaping the final disclosure avoidance approach.
- Discussions of the application of differential privacy tend to be totally off-putting to the people who are actively working to boost participation in the 2020 Census and mobilize groups to get as complete a count as possible. They seem to be asking why they should bother putting in all this effort if the end data are going to be so noisy.
- All those communication challenges noted, the breakout discussants also indicated genuine interest among the user community in working with the Census Bureau on solutions. It would be useful to generate a list of “talking points” on how disclosure avoidance really works.
- Data users are still not on the same page as the Census Bureau about the urgency of executing a complete overhaul of disclosure avoidance methodology for the 2020 Census. The breakout discussion group understood that the Census Bureau is committed to the approach, but there is still a major gulf there that needs to be understood.
- The workshop did not include representation from population segments that really do not want their data made available under any circumstances, one extreme end of the privacy argument, and that perspective warrants attention.
- Key fitness-for-use problems include that, by design, the privatized estimates for small areas and small groups are noisier than one would like. Particularly because the basic counts matter so much, the leaders and planners of these off-spine geographic levels that represent functioning governments aren’t going to readily accept being aggregated or averaged with their neighboring towns or villages: they need their numbers, alone.
- The issue of equity for demographic groups also came up in the breakout discussion, and this relates back to the broader communication issue because many of these small, impacted communities may well lack the resources to figure out how to function in this new world.
- The breakout discussion briefly touched on the notion of making total population invariant by block, despite the awareness that it may make the DAS work more computationally difficult. It would be important to study how much the things that aren’t yet settled (that is, the demographic characteristics) might be hurt if the total count was made invariant.
- The breakout group talked briefly about trade-offs, though it quickly ran into the problem that everything that one participant proposed as a sacrifice was shot down by someone else. There is an ongoing need for some information at the block level, not only for redistricting but also for building off-spine geographies like Traffic Analysis Zones. But participants did suggest that maybe they don’t need as much detail at the block level. Age might be grouped more coarsely, and participants wondered whether the detailed race combinations need to be made available everywhere at the block level.
- A point was raised in the breakout group that it would have been nice to have held a workshop like this four years ago, leaving sufficient time to integrate user input into DAS development. The group briefly discussed that this user engagement should begin early as the Census Bureau moves to revised disclosure avoidance strategies for the ACS, with its greater subject matter content than the decennial census.
- The Census Bureau needs to address the time series properties of the DAS-protected data, particularly in light of the workshop presentations on the sensitivity of denominators for calculating important rates. In essence, the problem is helping communities determine whether population drops (and resulting impacts on rates) are noise or whether they are real.
- Acknowledging that there are state and local legal mandates regarding the use of census data, attention needs to be paid to explaining changes resulting from different disclosure avoidance to those local legislators and policymakers, if nothing else to inform changes that might be necessary to those legal mandates on the books.
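The fitness-for-use and time series points in this summary can be made concrete with a small numerical sketch. It assumes textbook Laplace noise with scale 1/ϵ added independently to each count, which is a simplification and not the Census Bureau's production mechanism, and it ignores post-processing entirely. The function names and any ϵ values passed to them are hypothetical and chosen only for illustration.

```python
import math


def laplace_sd(epsilon: float, sensitivity: float = 1.0) -> float:
    """Standard deviation of Laplace noise with scale sensitivity / epsilon.

    Assumption: a plain Laplace mechanism, used here only as a stand-in.
    """
    return math.sqrt(2.0) * sensitivity / epsilon


def relative_error(count: float, epsilon: float) -> float:
    """Noise standard deviation expressed as a fraction of the true count."""
    return laplace_sd(epsilon) / count


def change_in_noise_sds(count_before: float, count_after: float,
                        epsilon: float) -> float:
    """Observed change divided by the noise SD of a difference.

    The difference of two independently noised counts has variance equal
    to the sum of the two variances, so its SD is sqrt(2) * laplace_sd.
    """
    sd_of_difference = math.sqrt(2.0) * laplace_sd(epsilon)
    return (count_after - count_before) / sd_of_difference
```

Under these assumptions the same absolute noise is a far larger share of a village of 50 people than of a city of 50,000, which is the sense in which small-area counts are "noisier." A published drop that is only one or two noise standard deviations in size is hard to distinguish from noise, whereas a drop many standard deviations in size more plausibly reflects real change.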
Joe Salvo (New York City Department of City Planning) summarized the impressions from the breakout group that remained in part of the auditorium:
- There was a concern about how the Census Bureau is going to communicate, educate, and direct the resources (personnel and funds) that are going to be necessary to do this. Communication and education about the new approach is going to require a considerable investment in explaining and re-educating people about why their population numbers are different this time than previously.
- Understanding the properties of the disclosure avoidance process is definitely an iterative process, but it is also one that has an N of 1 right now: the single glimpse at how the process works for a single setting of parameters and constraints in the 2010 DDP. The suggestion was made that there need to be ways to extend that knowledge base, including providing the data user community with more information about measures of uncertainty and with additional runs of the privatized data.
- Further work on uncertainty metrics and additional data runs and simulations would enable the user community to continue to provide the Census Bureau with feedback in ways that really can’t be done so long as N = 1.
- A major topic for this breakout group was how the privacy-loss budget ϵ will be allocated by geography, and which geographies will be covered by that privacy budget. The off-spine geographies do not currently get a direct allocation of the privacy-loss budget, but are of sufficient importance that some remedy to improve the precision of their counts must be found, if only to ward off every mayor of a small place calling up the Census Bureau and complaining about what happened to their numbers.
- The process of how privacy-protected census data will factor into the intercensal population estimates program, and be used to drive the Census Bureau’s other major surveys, is a major concern.
- The breakout group briefly discussed ongoing user engagement as Ron Jarmin had suggested (Section 11.3) and suggested that this direct input continue.
- The breakout group also briefly discussed the notion of holding total population invariant at the block level, and not just the counts of housing units and group quarters units, and it noted the same need for practical information on the feasibility of that step.
- The breakout group did not delve much into trade-offs, but it did reach the conclusion that perhaps some of the content available for the finest levels of geography (blocks) might be pared back, if it would increase the amount of privacy-loss budget that would be available at higher levels.
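The concern about how the privacy-loss budget ϵ is allocated by geography, and about off-spine areas that receive no direct allocation, can be sketched numerically. The sketch below assumes independent Laplace noise with scale 1/ϵ at each geographic level and no post-processing, which is a simplification rather than a description of the Census Bureau's system. The level names, the shares, and the total ϵ in the usage note are hypothetical.

```python
import math


def allocate_budget(total_epsilon: float,
                    shares: dict[str, float]) -> dict[str, float]:
    """Split a total privacy-loss budget across levels in proportion to shares."""
    total_share = sum(shares.values())
    return {level: total_epsilon * share / total_share
            for level, share in shares.items()}


def noise_sd(epsilon: float) -> float:
    """SD of Laplace noise with scale 1 / epsilon (sensitivity assumed to be 1)."""
    return math.sqrt(2.0) / epsilon


def aggregated_sd(block_epsilon: float, n_blocks: int) -> float:
    """SD of a count built by summing n_blocks independently noised block counts.

    Variances of independent noise terms add, so the SD grows with the
    square root of the number of blocks summed.
    """
    return math.sqrt(n_blocks) * noise_sd(block_epsilon)
```

For example, with a hypothetical total ϵ of 1.0 split equally across five levels, each level receives 0.2. A place assembled from 25 blocks would then carry five times the noise standard deviation of an area measured directly with the same 0.2, which illustrates why areas without a direct allocation tend to have less precise counts. Shifting share away from the block level, as the group suggested, leaves more budget for the other levels in this simplified setting.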
The discussions in the third breakout session in NAS 250 were reported out by Eddie Hunsinger (California Department of Finance):
- Inconsistencies between the major components in the DAS process, in particular between the separate person and housing unit files, were raised as an issue. This is presumably something to be addressed in revising the post-processing routines.
- Post-processing in general is not yet well-explained or understood, and it is something that the data user community needs to learn more about.
- As had been suggested in presentations by Nicholas Nagle (Section 3.2) and others, getting a better handle on the effects of disclosure avoidance routines on state funding allocations is important.
- In terms of solutions going forward, it will be important to engage more people in the discussion of differential privacy and disclosure avoidance through our professional networks. But it also has to be done carefully: we shouldn’t be seeking to alarm people, but to make sure that they are well-informed and aware of the potential impacts on their communities.
- The question was raised as to whether more rounds of DDPs would be made available, especially as (hopefully) improvements are made to the methodology and post-processing.
- In terms of trade-offs, it would be useful to examine which combinations of race and ethnicity are actually needed and relevant in the P.L. 94-171 redistricting data, as that seems to be a potential spot to improve overall accuracy by coarsening the categories.
- Identifying acceptance criteria for the data resulting from the disclosure avoidance process is of keen interest, and it would be useful to have inputs from the data user and subject matter communities on developing such criteria.
In the remaining moments, Citro asked the audience whether there were any other points that participants wanted to get on the table as reactions and direct feedback to the Census Bureau teams. Mike Ratcliffe (Geography Division, U.S. Census Bureau) commented that when you look at the big picture, and not necessarily specific to implementations in the 2020 Census, there are really two types of geography. There are the legal, political, and administrative areas that the Census Bureau has no control over, which have existing and evolving boundaries that the Census Bureau is obliged to accept and work with. There are also the statistical areas over which the Census Bureau (based on specifying criteria and working with partners) does have control. An issue going forward is how best to work with the geographic layers and concepts in which there is flexibility and latitude for change. There should be discussion of changes that need to be made to the statistical geographic concepts to better meet user needs and facilitate work in a differential privacy-based disclosure avoidance system. Hunsinger added to the point from his breakout-session summary on engaging more people through professional networks, particularly encouraging work with the Census Bureau’s State Data Center Program. With that, Hotz offered another round of thanks to workshop participants and presenters, and the workshop adjourned.