Suggested Citation:"Front Matter." National Academies of Sciences, Engineering, and Medicine. 2024. A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation. Washington, DC: The National Academies Press. doi: 10.17226/27169.

Consensus Study Report

NATIONAL ACADEMIES PRESS, 500 Fifth Street, NW, Washington, DC 20001

This activity was supported by a contract between the National Academy of Sciences and the U.S. Census Bureau (#1333LB21D0000003/1333LB21F00000248). Support of the work of the Committee on National Statistics is provided by a consortium of federal agencies through a grant from the National Science Foundation (No. 1560294) and several individual contracts. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.

International Standard Book Number-13: 978-0-309-70710-7
International Standard Book Number-10: 0-309-70710-2
Digital Object Identifier: https://doi.org/10.17226/27169
Library of Congress Control Number: 2023952292

This publication is available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.

Copyright 2024 by the National Academy of Sciences. National Academies of Sciences, Engineering, and Medicine and National Academies Press and the graphical logos for each are all trademarks of the National Academy of Sciences. All rights reserved.

Printed in the United States of America.

Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2024. A Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation. Washington, DC: The National Academies Press. https://doi.org/10.17226/27169.

The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.

The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.

The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.

The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.

Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.

Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process and it represents the position of the National Academies on the statement of task.

Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.

Rapid Expert Consultations published by the National Academies of Sciences, Engineering, and Medicine are authored by subject-matter experts on narrowly focused topics that can be supported by a body of evidence. The discussions contained in rapid expert consultations are considered those of the authors and do not contain policy recommendations. Rapid expert consultations are reviewed by the institution before release.

For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.

PANEL TO CREATE A ROADMAP FOR DISCLOSURE AVOIDANCE IN THE SURVEY OF INCOME AND PROGRAM PARTICIPATION

TRIVELLORE RAGHUNATHAN (Chair), University of Michigan

SCOTT H. HOLAN, University of Missouri

V. JOSEPH HOTZ, Duke University

THOMAS KRENZKE, Westat

FANG LIU, University of Notre Dame

ROBERT A. MOFFITT, Johns Hopkins University

AMY PIENTA, Inter-university Consortium for Political and Social Research

NATALIE SHLOMO, University of Manchester

ALEKSANDRA (SEŠA) SLAVKOVIĆ, The Pennsylvania State University

HEEJU SOHN, Emory University

SALIL VADHAN, Harvard School of Engineering and Applied Sciences

JENNIFER VAN HOOK, The Pennsylvania State University

Staff

BRADFORD CHANEY, Study Director

DAVID JOHNSON, Senior Program Officer

NANCY KIRKENDALL, Senior Program Officer

MADELEINE GOEDICKE, Senior Program Assistant

JOSHUA LANG, Senior Program Assistant

COMMITTEE ON NATIONAL STATISTICS

KATHARINE ABRAHAM (Chair), Department of Economics, University of Maryland, College Park

MICK P. COUPER, Institute for Social Research, University of Michigan

DIANA FARRELL, JPMorgan Chase Institute, Washington, DC

ROBERT GOERGE, Chapin Hall at the University of Chicago

ERICA L. GROSHEN, School of Industrial and Labor Relations, Cornell University

DANIEL E. HO, Stanford Law School, Stanford University

HILARY HOYNES, Goldman School of Public Policy, University of California, Berkeley

DANIEL KIFER, Department of Computer Science and Engineering, The Pennsylvania State University

SHARON LOHR, School of Mathematical and Statistical Sciences, Arizona State University, Emerita

NELA RICHARDSON, ADP Research Institute, Roseland, NJ

C. MATTHEW SNIPP, School of the Humanities and Sciences, Stanford University

ELIZABETH A. STUART, Department of Mental Health, Johns Hopkins Bloomberg School of Public Health

Staff

MELISSA CHIU, Director

BRIAN HARRIS-KOJETIN, Senior Scholar

CONSTANCE F. CITRO, Senior Scholar

Reviewers

This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.

We thank the following individuals for their review of this report:

Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by JOHN L. CZAJKA, Independent Consultant, and WILLIAM W. STEAD, Vanderbilt University Medical Center. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring committee and the National Academies.

Acknowledgments

This Consensus Study Report reflects the invaluable contributions of many colleagues, whom the panel thanks for their generous time, effort, and expert guidance. On behalf of the panel, I extend my deepest appreciation to the sponsor of this work: the Census Bureau within the U.S. Department of Commerce. Without the Census Bureau’s support, including through briefings and responses to the panel’s information requests, this study would not have been completed. In particular, the panel thanks David Waddington, Division Chief of the Social, Economic, and Housing Statistics Division; Jason Fields, Senior Researcher for Demographic Programs and Survey of Income and Program Participation (SIPP); and Holly Fee of the Social, Economic, and Housing Statistics Division. The panel also thanks all of those who provided briefings on key issues to the panel. These include Gary Benedetto, Steve Clark, Aref Dajani, Holly Fee, Jason Fields, Benjamin Gurrentz, Adriana Hernández-Viver, Yerís H. Mayol-García, Robert Munk, Rolando Rodriguez, Rachel Shattuck, Phyllis Singer, Jordan Stanley, Sam Szelepka, Evan Totty, and Ashley Westra, all of the Census Bureau; Jerry Reiter, Duke University; danah boyd, Microsoft Research and Georgetown University; and Lars Vilhuber, Cornell University.

The panel also extends its gratitude to members of the staff of the National Academies of Sciences, Engineering, and Medicine for their significant contributions to this report. Kirsten Sampson Snyder and Bea Porter masterfully shepherded the report through the review and production process, and Marc DeFrancis provided useful editorial advice that streamlined the report. Joshua Lang and Madeleine Goedicke provided administrative and logistical support for numerous panel meetings.

Brian Harris-Kojetin, senior scholar and former director of the Committee on National Statistics, and Melissa Chiu, current director of the Committee on National Statistics, had key roles in the original study design and selection and recruitment of the study panel, along with ongoing support of the panel and the preparation of the report. Bradford Chaney, study director and senior program officer, assisted in leading the panel and acquiring needed resources. Nancy Kirkendall and David Johnson, both senior program officers, provided valuable assistance based on their past experience with SIPP and the Census Bureau.

To my colleagues on the panel, I appreciate your diligence and expertise in examining the difficult issues raised in this study, and your spirit of cooperation in coming together to reach a consensus. Your shared wisdom from across a wide range of expertise areas, team spirit, and generosity of time brought innovative ideas to the discussions and produced this report. It was a great pleasure to work with you all. Thank you.

Trivellore Raghunathan, Chair
Panel to Create a Roadmap for Disclosure Avoidance in the Survey of Income and Program Participation

Boxes, Figures, and Tables

BOXES

S-1 Statement of Task

S-2 Methods of Adjusting the Data to Protect Confidentiality

1-1 Statement of Task

2-1 Relationship Categories Used in SIPP

2-2 SIPP Disclosure Avoidance Procedures

4-1 Sample Data Usage Agreement

9-1 Illustration of Feasibility: Descriptive Analysis Using Unique SIPP Content

9-2 Illustration of Feasibility: Longitudinal Analysis with Household Relational Data

FIGURES

S-1a Stages of disclosure avoidance

S-1b Disclosure avoidance approaches and tiers of access

6-1 Fully synthetic data

6-2 Selected variables are synthetic

6-3 Variables are synthesized for selected respondents

6-4 How a validation server works

9-1 Uses of SIPP data in the most cited and recent studies (percentage)

9-2 Unique SIPP content used in the most cited and recent studies (percentage)

9-3 Number of respondents reporting use of various SIPP modules

9-4 Share of first authors at institutions or in cities with Federal Statistical Research Data Centers (FSRDCs)

9-5 Determinants and barriers to accessing restricted Census Bureau data through the current FSRDC system

E-1 Characteristics of respondents to call for information

E-2 Number of different SIPP data sources used by respondents to the call for information

E-3 Types of respondents to the call for information that used each format of SIPP data file

E-4 Number of respondents to the call for information who used each module or topic area within SIPP

E-5 Number of modules or topic areas used by respondents to the call for information

E-6 Types of analysis performed by respondents to the call for information

E-7 Difficulties experienced by respondents to the call for information when using the public-use files

E-8 Difficulties experienced by respondents to the call for information when using the synthetic data

E-9 Impact of encountering difficulties with accessing SIPP data

E-10 How the results from SIPP data were used

E-11 Fields in which SIPP data findings were published

E-12 Degree to which SIPP findings could be met by standardized tables

TABLES

1-1 List of Briefings Provided to the Panel

2-1 Data Collected in SIPP 2020, by Broad Category

2-2 SIPP Bibliographic References, 2000–2014, by Topic

3-1 Percentage of SIPP Households That Are Unique in Wave 1, Based on the Number of Types of Information Included

3-2 Percentage of SIPP Households That Are Unique Across Waves, Based on the Number of Types of Information Included

3-3 Percentage of SIPP Households That Are Unique, Based on the Number of Types of Information Included, and Replacing Occupation with Highest Level of Education

3-4 Key Areas in Which Three Commercial Databases Have Data That Correspond to SIPP Data

9-1 Matrix for Evaluating Feasibility with the Context of Various Modes of Access

9-2 Example of Evaluation of Accessibility by Mode of Access and User Type (1 = low to 4 = high)

C-1 Examples of Two Discrete Laplace Perturbation Vectors for ε = 1.5, δ = 0.00002 and ε = 0.5, δ = 0.008

Acronyms

ACS	American Community Survey
ASA/SRM	American Statistical Association’s Survey Research Methods
BLS	Bureau of Labor Statistics
CNSTAT	Committee on National Statistics
CPS	Current Population Survey
DHHS	U.S. Department of Health and Human Services
FISMA	Federal Information Security Modernization Act of 2014
FSRDC	Federal Statistical Research Data Center
GAN	generative adversarial network
ICAR	intrinsic conditional autoregressive
id	identifier
IRB	Institutional Review Board
IRS	Internal Revenue Service
LBD	Longitudinal Business Databases
MINT	Modeling Income in the Near Term
NF	normalizing flows
NSDS	National Secure Data Service
PIK	Protected Identification Key
PSID	Panel Study of Income Dynamics
QIDs	quasi-identifiers
RAP(s)	Remote Analysis Platform(s)
RDC	Restricted Data Center
SAE	small area estimation
SCHIP	State Children’s Health Insurance Program
SDL	statistical disclosure limitation
SIPP	Survey of Income and Program Participation
SNAP	Supplemental Nutrition Assistance Program
SODA	secure online data access
SSA	Social Security Administration
SSB	SIPP Synthetic Beta
SSI	Supplemental Security Income
SUDA	Special Uniques Detection Algorithm
TANF	Temporary Assistance for Needy Families
USDA	U.S. Department of Agriculture
VAE	variational autoencoders
VDE	virtual data enclave
WIC	Special Supplemental Nutrition Program for Women, Infants, and Children

Glossary

Added noise for privacy protection The altering of survey responses (e.g., by adding or subtracting some amount), which may vary across responses and may be randomized both in terms of which data are altered and in how much the data are altered.

Bottom-coding Setting a minimum value that may be released; for example, all values at or below $1,000 are set to $1,000.

Data perturbation Changes to the data to protect confidentiality; these include adding noise and data swapping.

Data suppression Reducing the amount of data that are released, such as by completely eliminating some measures, modifying the measures to make them less specific (e.g., top-coding, bottom-coding, and collapsing a continuous variable to become a categorical variable), and modifying what data are released (e.g., suppressed table cells based on fewer than three observations).

Data swapping Data items are swapped between two or more comparable respondents in order to protect confidentiality and to provide deniability if someone claims to have identified a respondent—for example, swapping the state of residence for two respondents. The swapping may be directed (i.e., designed to address a particular disclosure risk for a respondent) or random. The purpose is to maintain the same overall totals (and hopefully similar statistical relationships) while protecting the confidentiality of who gave what response.
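As a minimal illustration of the random form of swapping described above, the following Python sketch pairs records at random and exchanges one attribute between each pair. The function name `swap_attribute`, the sample records, and the pairing rule are assumptions made for illustration; they do not describe the Census Bureau's procedure, and a directed swap would instead choose partners that are comparable on other characteristics.

```python
import random

def swap_attribute(records, key, rng):
    """Randomly pair records and exchange the value stored under `key`.

    Hypothetical helper: pairs are formed at random, not matched on
    comparable characteristics as a production procedure would do.
    """
    swapped = [dict(r) for r in records]  # copy so the input is not modified
    order = list(range(len(swapped)))
    rng.shuffle(order)
    # Walk the shuffled order two at a time; an odd leftover record is unchanged.
    for i, j in zip(order[0::2], order[1::2]):
        swapped[i][key], swapped[j][key] = swapped[j][key], swapped[i][key]
    return swapped

# Made-up records for illustration only.
records = [
    {"id": 1, "state": "MI", "income": 42000},
    {"id": 2, "state": "NC", "income": 58000},
    {"id": 3, "state": "PA", "income": 31000},
    {"id": 4, "state": "MO", "income": 76000},
]
swapped = swap_attribute(records, "state", random.Random(0))
```

The overall count of respondents per state is preserved, while the link between a given respondent and a given state no longer holds with certainty.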

Data Use Agreement Specifies limitations on how data may be used and publicly released; for example, the data must be used only for statistical purposes and not to identify individuals.

Differential privacy Differential privacy is the leading form of formal privacy used by government agencies and by researchers on privacy methods. It is a framework for both quantifying the level of disclosure risk (under several related metrics) and developing disclosure limitation methods that control the risk of single and multiple releases under those metrics.

Disclosure Review Board A committee that sets limits on what data may be released—for example, by limiting what variables may be included in a public-use file or restricting what tables may be published.

Federal Statistical Research Data Center (FSRDC) These are created through partnerships between federal statistical agencies and research leader institutions. They provide secure access to restricted data, either on-site or virtually. There is an approval process for allowing access, both for individuals seeking access and for the project to be performed. There are also financial costs involved.

Formal privacy Formal privacy refers to any rigorous and unambiguous framework for quantifying disclosure risk in an internally consistent statistical framework that bounds the success probability of a wide class of potential attacks on privacy.

Gold Standard File A file containing original (nonsynthesized) data created by the Census Bureau as a step toward producing the synthetic data. It is also used to verify whether statistics based on the synthetic data are consistent with those using the original data. It is not a master file of all Survey of Income and Program Participation (SIPP) original data but rather was created specifically for SIPP synthetic data.

Institutional Review Board (IRB) A committee that reviews potential research studies on human subjects and monitors them to ensure that they comply with applicable regulations, meet commonly accepted ethical standards, follow institutional policies, and adequately protect research participants.

Microdata Data at the level of individual persons or respondents (as differentiated from summary statistical data).

National Secure Data Service (NSDS) The creation of an NSDS has been proposed by the Commission on Evidence-Based Policymaking to support statistical evidence building through data sharing and linking, providing a pathway for those desiring data access and expertise. Currently the National Science Foundation is carrying out a demonstration project to inform whether and how an NSDS will be established in the future.¹

Privacy budget A limit on the total amount of data that may be released within the context of formal privacy; each published statistic draws from this budget, and at some point either no additional statistics may be released or the privacy budget must be changed.
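The idea of drawing down a budget can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Census Bureau's implementation: it assumes basic sequential composition (the ε values of successive releases simply add), counting queries with sensitivity 1, and continuous Laplace noise. The class `PrivacyBudget` and the functions `laplace_noise` and `noisy_count` are hypothetical names.

```python
import random

class PrivacyBudget:
    """Tracks cumulative epsilon under basic sequential composition (an assumption)."""

    def __init__(self, total_epsilon):
        self.total = total_epsilon
        self.spent = 0.0

    def spend(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance guards against floating-point round-off.
        if self.spent + epsilon > self.total + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, budget, rng):
    """Release a count with Laplace noise, charging `epsilon` to the budget."""
    budget.spend(epsilon)
    # A count changes by at most 1 when one person is added or removed,
    # so the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)
budget = PrivacyBudget(total_epsilon=1.0)
first = noisy_count(120, 0.5, budget, rng)
second = noisy_count(45, 0.5, budget, rng)
# A third release would raise RuntimeError because the budget is spent.
```

Smaller values of ε per statistic allow more releases from the same budget but add more noise to each one.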

Public-use file A file containing microdata that can be downloaded by anyone and may be analyzed and reported on without limitations.

Quasi-identifier A data value that doesn’t directly identify a person but that might be used to identify a person. For example, while a name or address would identify a person directly and be an identifier, a zip code would provide highly specific information that might help to identify a person and would therefore be a quasi-identifier.
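One way to see why quasi-identifiers matter is to count how many records become unique as more of them are combined. The Python sketch below is illustrative only; the function `share_unique`, the field names, and the records are made up and are not drawn from SIPP.

```python
from collections import Counter

def share_unique(records, quasi_identifiers):
    """Return the fraction of records that are unique on the given fields."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys)

# Made-up records for illustration only.
records = [
    {"zip": "48104", "age": 34, "occupation": "nurse"},
    {"zip": "48104", "age": 34, "occupation": "teacher"},
    {"zip": "48104", "age": 51, "occupation": "nurse"},
    {"zip": "27708", "age": 34, "occupation": "nurse"},
]

by_zip = share_unique(records, ["zip"])                          # 0.25
by_zip_age = share_unique(records, ["zip", "age"])               # 0.5
by_all = share_unique(records, ["zip", "age", "occupation"])     # 1.0
```

In this toy example each added field raises the share of unique records, which is the pattern a disclosure risk assessment looks for.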

Recoding Often used in disclosure avoidance to reduce the number of discrete values that appear. For example, a continuous variable such as household income might be converted into a categorical measure with only a few categories, or a categorical variable such as the state of residence might be converted to a measure of geographic region. Recoding is also used to make two different databases more consistent with each other.
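The two recodes mentioned above can be sketched in Python as follows. The income cut points, the function names, and the partial state-to-region lookup are hypothetical choices for illustration, not the categories used in any SIPP file.

```python
def recode_income(income):
    """Collapse a continuous income into three broad categories (hypothetical cut points)."""
    if income < 25000:
        return "under $25,000"
    if income < 75000:
        return "$25,000 to $74,999"
    return "$75,000 or more"

# Partial lookup covering only the states used in this example.
REGION_BY_STATE = {
    "MI": "Midwest",
    "MO": "Midwest",
    "NC": "South",
    "PA": "Northeast",
}

def recode_record(record):
    """Return a copy of the record with coarser income and geography."""
    return {
        "income_group": recode_income(record["income"]),
        "region": REGION_BY_STATE[record["state"]],
    }

recoded = recode_record({"income": 42000, "state": "MI"})
# recoded == {"income_group": "$25,000 to $74,999", "region": "Midwest"}
```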

Restricted-use file A file in which there are limitations in how the data may be analyzed and reported on. The restrictions may range from clicking on a user agreement concerning how the data will be used (with the data remaining available to anyone consenting to the user agreement) to a file in which an application process is designed to control who is allowed access and in which strong controls may be in place on what data can be accessed and what can be reported.

Secure online data access (SODA) A mechanism through which data may be accessed virtually (online) with controls to protect respondents’ privacy, such as a process to gain permission to work with the data, controls on which data are accessible, and controls on what data may be publicly released. Also called a virtual data enclave, the term is used here to differentiate it from virtual data access through FSRDCs, with less stringent controls on access and potentially less complete access than would be available through an FSRDC.

___________________

¹https://ncses.nsf.gov/about/national-secure-data-service-demo

SIPP Synthetic Beta (SSB) A synthetic data product created by the Census Bureau that combines selected data from SIPP with administrative tax and benefit data. Early versions were partially synthetic; the latest version is fully synthetic.

Synthetic data Data that are created through a statistical modeling process to have the same statistical properties as the original data. The intention is to allow researchers to perform the kinds of statistical calculations and get results that are similar to what would be produced from the original data without allowing access to the original data. The data may be fully synthetic (all of the records are generated from the model) or partially synthetic (some records or variables are generated from the model, while others are identical to the original data).

Top-coding Setting a maximum value that may be released—for example, by setting all values at or above $10,000 to $10,000.
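Top-coding and bottom-coding amount to clamping values at a ceiling or floor. The short Python sketch below uses the $10,000 and $1,000 thresholds from the glossary examples; the function names and the sample values are illustrative assumptions.

```python
def top_code(values, ceiling):
    """Replace every value at or above `ceiling` with `ceiling`."""
    return [min(v, ceiling) for v in values]

def bottom_code(values, floor):
    """Replace every value at or below `floor` with `floor`."""
    return [max(v, floor) for v in values]

# Made-up amounts for illustration only.
amounts = [500, 1000, 4200, 10000, 250000]
protected = top_code(bottom_code(amounts, 1000), 10000)
# protected == [1000, 1000, 4200, 10000, 10000]
```

Values inside the range pass through unchanged, so only the extremes, which are the most likely to identify a respondent, lose detail.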

Verification and validation system A system designed to measure whether the results using altered data (due to disclosure avoidance procedures, particularly as applied to synthetic data) are comparable to those from the unaltered data and that provides researchers with validated results that may include added noise to protect confidentiality.
