Currently Skimming:

3 Using Multiple Data Sources to Enhance Data Equity
Pages 55-82

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 55...
... This report discusses the promise of using multiple data sources to augment information collected in federal surveys with data from government administrative records, private-sector data, and other data sources. But, as Giest & Samuels (2020, p.
From page 56...
... Sections 3.2–3.7 examine ways that using multiple data sources can promote data equity: by increasing the representation of population groups that, historically, have been underrepresented in the data record (Sections 3.2 and 3.3); by producing model-based estimates for small populations (Section 3.4)
From page 57...
... 2. Feature equity focuses on the availability of variables needed to identify population subgroups or measure characteristics of interest.
From page 58...
... The Federal Data Strategy lays out guidelines for making data produced by federal government agencies accessible to users: "Promote equitable and appropriate access to data in open, machine-readable form and through multiple mechanisms, including through both federal and nonfederal providers, to meet stakeholder needs while protecting privacy, confidentiality, and proprietary interests" (OMB, 2019a, p.
From page 59...
... Combining multiple datasets to obtain better representation can help with the last problem. [a] For example, Angwin et al.
From page 60...
... This chapter concentrates on representation equity and feature equity because combining data sources can improve coverage, augment sample sizes, and add variables. As will be discussed in future reports, combining information from multiple sources raises new concerns about protecting privacy, which may affect decisions about data access.
From page 61...
... ; one component of such an analysis involves comparing estimates calculated from survey respondents to known characteristics of the population obtained from an external source, such as administrative records or the decennial census. Box 3-2 describes the use of an independent data source (the postenumeration survey)
From page 62...
... [b] The U.S. Census Bureau also uses demographic analysis to assess census coverage, comparing census counts with those obtained from estimates produced using birth and death records, data on international migration, and other administrative records.
From page 63...
... They found that only about eight percent of the local food farms in the webscraped sample were missing from the NASS list frame, but that those were more likely to be small operations.[2] Hyman, Sartore, & Young (2022) linked the records to assess the coverage of the two lists and increase the coverage of the combined samples, but multiple-frame surveys can also be used to improve coverage without explicitly linking data, as long as there is some way to identify entities that could appear in more than one of the samples (for example, telephone surveys that select landline and cell phone samples ask respondents about their landline and cell phone usage, thereby identifying the respondents who are in both frames)
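As a rough illustration of the coverage check described in this excerpt, the sketch below computes the share of one list's entities that do not appear on another list. The function name, the farm identifiers, and the counts are hypothetical and were chosen only to make the arithmetic easy to follow; they are not the NASS or web-scraped data. Real list comparisons generally depend on record linkage rather than exact identifier matches.

```python
def share_missing_from_frame(sample_ids, frame_ids):
    """Return (share, missing_ids): the fraction of distinct sample entities
    that are absent from the frame, and the set of those entities.

    Assumes both lists carry a common identifier, which is a simplification.
    """
    sample = set(sample_ids)
    if not sample:
        raise ValueError("sample_ids must contain at least one entity")
    missing = sample - set(frame_ids)
    return len(missing) / len(sample), missing


# Hypothetical identifiers: 25 entities in the sample, 2 of them not on the frame.
sample_ids = [f"farm{i}" for i in range(25)]
frame_ids = [f"farm{i}" for i in range(2, 40)]

share, missing = share_missing_from_frame(sample_ids, frame_ids)
print(f"{share:.0%} of sampled entities are missing from the frame")
```

The set of missing identifiers can then be examined for shared characteristics, such as size of operation, to see which groups the frame undercovers.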
From page 64...
... One focus is on using multiple data sources to measure progress toward the United Nations Sustainable Development Goals, work that is closely related to data equity.[3] Key to the efforts for obtaining information at finer levels of aggregation is "[u]sing a common list of administrative units across censuses and surveys, and including identical census questions in subsequent household surveys" (United Nations Inter-Secretariat Working Group on Household Surveys, 2022, p.
From page 65...
... Some programs, such as the Canadian Housing Statistics Program, have relied almost exclusively on administrative data. This program links data from existing administrative sources and the Canadian Census of Population to provide a comprehensive portrait of Canada's housing market, with the goal of including all residential properties in Canada and their owners (Arora, 2022b, p.
From page 66...
... ENHANCING SURVEY PROGRAMS BY USING MULTIPLE DATA SOURCES. FIGURE 3-1: Statistics Canada Disaggregated Data Action Plan. SOURCE: Statistics Canada, Catalogue no.
From page 67...
... For example, the Bureau of Transportation Statistics used cell phone location data to study travel patterns early in the COVID-19 pandemic.[6] CONCLUSION 3-1: Many data sources include or represent only part of the population of interest. Multiple data sources can be used to assess and improve the coverage of underrepresented groups, and to enable the production of disaggregated statistics.
From page 68...
... In general, though, having predictor variables in administrative data that are highly correlated with outcome variables will produce small area estimates that are more accurate, on average, than estimates calculated using the survey data alone.
3.5 ASSESS AND REDUCE MEASUREMENT ERROR
Record linkage can provide a cross-check on measurements of the same concept across data sources.
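The cross-check that record linkage makes possible can be sketched as follows: link two sources on a shared identifier and compute how often they record the same value for the same concept. This is a minimal illustration under strong assumptions. The person identifiers, the category labels, and the function name are hypothetical, and an exact-match identifier is assumed where real linkage is often probabilistic.

```python
def agreement_rate(source_a, source_b):
    """Return (rate, n_linked) for two dicts mapping record ID -> recorded value.

    Only records present in both sources are compared. Disagreement flags
    possible measurement error in at least one source; it does not show
    which source is correct.
    """
    linked = source_a.keys() & source_b.keys()
    if not linked:
        raise ValueError("no records could be linked")
    matches = sum(source_a[key] == source_b[key] for key in linked)
    return matches / len(linked), len(linked)


# Hypothetical records: the same characteristic as recorded in two sources.
survey = {"p1": "A", "p2": "B", "p3": "A", "p4": "C"}
admin = {"p1": "A", "p2": "A", "p3": "A", "p5": "C"}

rate, n_linked = agreement_rate(survey, admin)
print(f"{n_linked} linked records, agreement rate {rate:.2f}")
```

Computing the rate separately for each population subgroup would show whether disagreement is concentrated in particular groups.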
From page 69...
... Before the original version of Directive 15 was issued in 1977, each federal agency could use its own categories for race and ethnicity, making it difficult to compare statistics across datasets. For example, death rates for subpopulations are calculated by dividing the number of deaths from administrative records by the subpopulation size from the decennial census or intercensal population estimates; the categories must be defined the same way for these rates to be meaningful.[b] Uniform standards allow statistics to be compared and combined across datasets.
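The death rate calculation in this excerpt can be sketched in a few lines. The group labels and counts below are hypothetical, and expressing the rate per 100,000 is an assumed convention rather than something stated in the excerpt. The sketch refuses to compute rates when the two sources do not use the same categories, which is the situation uniform standards are meant to prevent.

```python
def death_rates_per_100k(deaths, population):
    """Divide death counts (numerator source) by population counts
    (denominator source) for each category, scaled to deaths per 100,000.

    Raises ValueError if the two sources use different category sets.
    """
    mismatched = set(deaths) ^ set(population)
    if mismatched:
        raise ValueError(f"categories differ between sources: {sorted(mismatched)}")
    return {group: 100_000 * deaths[group] / population[group] for group in deaths}


# Hypothetical counts with identically defined categories in both sources.
deaths = {"Group A": 120, "Group B": 45}
population = {"Group A": 60_000, "Group B": 15_000}

rates = death_rates_per_100k(deaths, population)
print(rates)
```

If the numerator source combined two groups that the denominator source reports separately, the category check would fail instead of silently producing a rate that mixes definitions.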
From page 70...
... For example, participants in surveys and the decennial census are asked to select the race and ethnicity categories that best describe them.
From page 71...
... But, over the three decades studied, only 51–55 percent of decedents who self-identified as American Indian or Alaska Native (AIAN)
From page 72...
... Census Bureau to study income inequality. Income tax data contain a great deal of information on various types of income, as well as adjustments used in calculating adjusted gross income, but the individual income tax form (Form 1040)
From page 73...
... Data linkage at the area level can attach state-, county-, or neighborhood-level variables to a dataset. The Urban Institute's Spatial Equity Data Tool (Narayanan, Stern, & Macdonald, 2021; Urban Institute, 2021b; Brown, 2022)
From page 74...
... They linked records from a probability sample of live births in 2009 with administrative data sources such as death records, records from child protective services agencies, and records from the Anchorage Police Department. Estimates of the incidence of child maltreatment (defined as having at least one report of maltreatment from the multiple sources in the six-year follow-up period)
From page 75...
... Federal statistical agencies typically acquire data under a pledge of confidentiality, promising survey respondents that the information they provide will be used for statistical purposes only and that the information will not be disclosed in identifiable form without respondents' consent (see Box 3-5; Appendix A of NASEM, 2021b describes federal laws protecting privacy and confidentiality of information)
From page 76...
... grows substantially." Combining data sources, and particularly combining data through record linkage, can add information to datasets that could potentially be used to identify individuals in the data even if the original data are anonymized. Even if individuals are not identified, Randall, Stern, & Su (2021, p.
From page 77...
... Interventions to advance equity may reveal inequities or new challenges, requiring continual efforts to improve data equity involving researchers and the communities themselves. CONCLUSION 3-2: Record linkage can merge information from separate data sources and add variables that are needed to produce disaggregated statistics.
From page 78...
... to aid in the analysis of racial and ethnic disparities and in the development of targeted quality improvement strategies, recognizing the probabilistic and fallible nature of such indirectly estimated identifications." Administrative data sources can lack information that can be used to distinguish group membership. For example, one impediment to producing disaggregated data for Medicare beneficiaries has been the quality of race and ethnicity information in the administrative files, derived primarily from SSA records.
From page 79...
... As discussed in Chapter 5, many respondents leave income questions blank, and the missing income data may be imputed from model predictions, from another record in the data with similar characteristics, or from a separate data source. However, respondents
[13] Other researchers have found similar patterns of accuracy, validating the Bayesian Improved Surname Geocoding method (e.g., LeRoy et al., 2013)
From page 80...
... BOX 3-5 Informed Consent and Data Ownership
Increased use of multiple data sources raises ethical and legal questions about data ownership and consent for linking data, with implications for data equity. A key question is whether informed consent should be obtained from participants before linking data from separate sources.
From page 81...
... How is this right upheld when information comes from multiple data sources? In summary, data equity and informed consent need to be balanced.
From page 82...
... Documentation of equity aspects, including a discussion of the decisions to include or exclude population subgroup information and an evaluation of data quality for subpopulations of interest, will promote transparency. Development of standards for data equity, and procedures for regularly reviewing equity implications of statistical programs, would enhance efforts to improve data equity across the federal statistical system.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.