Toward a 21st Century National Data Infrastructure: Enhancing Survey Programs by Using Multiple Data Sources (2023)

Chapter: 9 Combining Data Sources for National Statistics: Next Steps

Suggested Citation:"9 Combining Data Sources for National Statistics: Next Steps." National Academies of Sciences, Engineering, and Medicine. 2023. Toward a 21st Century National Data Infrastructure: Enhancing Survey Programs by Using Multiple Data Sources. Washington, DC: The National Academies Press. doi: 10.17226/26804.

9

Combining Data Sources for National Statistics: Next Steps

In this series of reports, the Committee on National Statistics (CNSTAT) is laying out a vision for a reimagined data infrastructure—one that relies on multiple data sources in addition to probability surveys—for generating official statistics in the United States. The first report, Toward a 21st Century National Data Infrastructure: Mobilizing Information for the Common Good (NASEM, 2023), articulated key attributes of the envisioned infrastructure (see Box 1-2).

This report explored implications of using multiple data sources for expanding or replacing information currently collected in major survey programs. The panel examined recent activities in building frames at the U.S. Census Bureau, and explored aspects of current and potential future practices for combining data in four areas: income, health, crime, and agriculture. These areas were chosen to illustrate diverse methods, challenges, and uses of data combination.

Household surveys have been a fundamental means of data collection about both income and health, and these topics have also been the subject of detailed administrative data collections. This report discusses how record linkage has been used to improve measurement and to increase the number of data attributes associated with respondents to income and health surveys. Linking income survey responses with Internal Revenue Service (IRS) tax data and transfer program benefits data has provided valuable insights about the accuracy of survey responses and alternative perspectives on key measures such as poverty and income distribution. Linking health survey records with the National Death Index has allowed researchers to evaluate mortality risks associated with health conditions. Record linkage and imputation have also enabled researchers to make use of large administrative databases such as Medicare claims data to produce statistics that are disaggregated by race and ethnicity. In these usages, the administrative data contain records for everyone participating in the program, but often do not contain accurate (or any) information about race, ethnicity, or other characteristics for which disaggregated statistics are desired. Linking with a source such as the decennial census or the American Community Survey attaches that information to the administrative records, and can identify characteristics of people who are eligible for the program but do not participate.

Crime statistics present additional challenges for integrating data. One of the major data sources, the National Crime Victimization Survey, is a household survey. The other major source, the Uniform Crime Reporting Program, compiles crime statistics from data submitted by states and individual law enforcement agencies. This program faces challenges similar to those of other programs that compile administrative records from states: missing data from states and agencies that do not make submissions, the need to assess and improve quality of data that are supplied, and the need to resolve measurement differences among data suppliers.

Obtaining accurate and timely statistics for agriculture exemplifies some of the challenges faced by establishment surveys. The nature of agricultural statistics also opens opportunities to rely more heavily on data from satellites and sensors in addition to administrative records. There is also potential to make use of the detailed data that many farm operators and agribusinesses collect in their precision agriculture programs. This report focuses on the use of small area models to combine data from various sources for producing county-level crop estimates.

Of course, statistical agencies in these and other areas have done a great deal of additional work on combining data sources. This report does not explore other topics or survey programs, but the examples in the report illustrate challenges and opportunities for other subject areas as well.

9.1 THEMES FOR COMBINING DATA

Each example studied in this report presents unique challenges and opportunities, but the examples share some common themes.

Multiple Data Sources Can Add Value for Official Statistics and Research

Some information, such as opinions and personal experiences, can be collected only through surveys. But there are growing demands for more timely, more granular, and more accurate data on an ever-increasing number of topics. Administrative records and other data sources, either combined with or in place of surveys, can help meet those demands.

Administrative records offer four main benefits for contributing to official statistics. The first benefit is the sheer size of many administrative datasets. Income tax records contain information about every tax filer; a state’s dataset for the Supplemental Nutrition Assistance Program contains records for all residents participating in the program; the National Death Index contains information on almost all deaths occurring after 1979. Second, some administrative datasets may have information on population members not represented in surveys, such as persons in nursing homes or survey nonrespondents. Third, administrative data are already being collected for other purposes, so the only costs for their use involve acquiring them, studying and documenting their properties, and repurposing them for producing statistics. Fourth, administrative data provide alternative perspectives on concepts measured in surveys, and thus can contribute to improved understanding of the measures in both data sources.

The previous National Academies of Sciences, Engineering, and Medicine report in this series (NASEM, 2023) explored the potential of using private-sector data to produce official statistics. Challenges of using private-sector data for official statistics are greater than the challenges of using government-collected administrative records, in part because of the limited history of public-private data cooperation. However, private-sector data such as those collected through precision agriculture programs or private health insurance companies could potentially improve federal statistics and create new data resources for social and economic research—if these data can be shown to be reliably available, accurate, and cost-effective sources of information.

There are multiple ways to take advantage of alternative data sources (see Chapter 2). When data sources contain high-quality information for identifying individual entities, data records can be linked. Data linkage is not the only way to combine data sources, however. Statistical models can be used to combine data for individuals, or to combine statistics for geographic areas or population subgroups. Small area models (discussed in Chapters 2, 7, and 8) can be a cost-effective way of providing useful estimates for small geographic areas because such models can “borrow strength” from similar areas and make use of correlated data from administrative records and other sources. For all these methods, however, the quality of estimates depends on the quality of the individual input data sources and the statistical properties of methods used to combine them.
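The “borrow strength” idea can be illustrated with a minimal sketch of a composite (shrinkage) estimator, in the spirit of area-level small area models. The county labels, numbers, and the function name `composite_estimate` are hypothetical, and the between-area model variance is simply assumed here, whereas in practice it would be estimated from the data.

```python
def composite_estimate(direct, sampling_var, synthetic, model_var):
    """Blend a direct survey estimate with a model-based prediction.

    direct:       survey estimate for the area
    sampling_var: sampling variance of the direct estimate
    synthetic:    prediction from a model using auxiliary (e.g., administrative) data
    model_var:    assumed between-area variance of the model error
    """
    # Weight on the direct estimate: close to 1 when the survey estimate
    # is precise, close to 0 when the area's sample is small and noisy.
    gamma = model_var / (model_var + sampling_var)
    return gamma * direct + (1.0 - gamma) * synthetic


# Hypothetical areas: (direct estimate, sampling variance, model prediction)
counties = {
    "A": (50.0, 1.0, 40.0),    # large sample, precise direct estimate
    "B": (50.0, 100.0, 40.0),  # small sample, noisy direct estimate
}
MODEL_VAR = 4.0  # assumed, not estimated

estimates = {
    name: composite_estimate(direct, var, synthetic, MODEL_VAR)
    for name, (direct, var, synthetic) in counties.items()
}
```

In this sketch, county A keeps an estimate close to its own survey value, while county B is pulled most of the way toward the model prediction. The result for B is only as good as the model and its inputs, which is the dependence on input quality noted above.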

Probability surveys have important strengths that in many cases cannot be entirely replaced by administrative records or information from other existing sources. These strengths range from the probabilistic design itself to the collection of information that can only be obtained by asking a sample member directly. Additional research is needed to identify specific ways that data from other sources can add value to probability surveys, for example through providing information to improve survey design and measurement, augmenting survey information through data linkage, or reducing respondent burden.

Quality of Integrated Data and Statistics

Multiple data sources show great promise for improving official statistics and enhancing research, but using multiple sources is more complicated than producing statistics from a single source. It is challenging to evaluate the quality of data from a single source and to assess how various sources of uncertainty affect statistics. Evaluating the quality of statistics produced from combined data sources is even more challenging.

The Federal Committee on Statistical Methodology (2020) discussed factors that affect the accuracy of integrated data. These factors include the contributors to error from each source: sampling error (for surveys), undercoverage, missing data, measurement error, and processing error. Standard procedures exist for reporting sampling error for surveys, but assessing bias from undercoverage and nonresponse is much more challenging. Statistics from administrative records are usually reported without measures of uncertainty (as with Uniform Crime Reports through 2020), but they are affected by undercoverage, missing data, and measurement and processing error. For example, tax records from the IRS do not include everyone in the population, and certain types of income, such as self-employment income, may be underreported (see Chapter 5).

Additional factors affect accuracy of statistics computed from combined data sources. Linkage error can result, for example, in appending the wrong person’s race to a data record, or in coding a person as living when in fact that person is in the National Death Index but the link was missed. Harmonization error, which arises when sources have different units or definitions for data elements (e.g., pixels in satellite data might not match up with farms or fields in another dataset; data sources might report information for nonsynchronous time periods, or sources may use different definitions for seemingly identical concepts) can lead to bias in estimates. Modeling error, which can occur when an imputation or small area estimation model is a poor fit for part of the population, can cause model predictions to be inaccurate.
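The missed-link case can be made concrete with a small sketch. It assumes a simple deterministic linkage on normalized name and date of birth; all records, field names, and the `link_key` function are invented for illustration and do not describe the matching procedure of any actual program.

```python
def link_key(record):
    """Deterministic linkage key: normalized name plus date of birth."""
    return (record["name"].strip().lower(), record["dob"])


# Hypothetical survey respondents.
survey = [
    {"id": 1, "name": "Ana Diaz", "dob": "1950-03-02"},
    {"id": 2, "name": "Jon Smith", "dob": "1948-11-19"},
    {"id": 3, "name": "Mei Chen", "dob": "1955-07-30"},
]

# Hypothetical death records. The second entry is respondent 2,
# but the first name is spelled differently than in the survey.
death_index = [
    {"name": "ana diaz", "dob": "1950-03-02"},
    {"name": "John Smith", "dob": "1948-11-19"},
]

# Ground truth, known here only because the data are invented.
true_deceased_ids = {1, 2}

death_keys = {link_key(r) for r in death_index}
linked_deceased = {r["id"] for r in survey if link_key(r) in death_keys}

# Respondents who are deceased but would be coded as living.
missed_links = true_deceased_ids - linked_deceased

observed_rate = len(linked_deceased) / len(survey)
true_rate = len(true_deceased_ids) / len(survey)
```

Here respondent 2 is treated as living because of a spelling difference, so the mortality rate computed from linked records understates the true rate. Probabilistic linkage methods can reduce such misses, usually at the cost of some false matches.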

Combining data sources also affects other dimensions of data quality (see Figure 1-1). An administrative data source might have information that is more granular than survey data, but the data might not be available to the statistical agency soon enough to produce timely statistics. Many survey programs produce public-use datasets that can be downloaded from the internet, but administrative records are often available only to approved researchers in restricted settings (if available at all). To assess the overall “fitness for use” of statistics computed from combined sources, one must understand the purposes for which each dataset was created, the populations covered, the quality and limitations of each data element, and the properties of the data-combination method used. Some of the elements in a data source may have limitations that make them unsuitable for use as outcome variables, but could be deemed useful for other purposes, such as nonresponse adjustment or small area estimation.

A previous National Academies’ report on combining data sources (NASEM, 2017c, p. 2) recommended: “Federal statistical agencies should systematically review their statistical portfolios and evaluate the potential benefits and risks of using administrative data.” As part of ongoing quality-improvement programs, such systematic reviews could also consider procedures to take advantage of new data sources as well as changes in existing data sources.¹

The U.S. Office of Management and Budget (OMB, 2002, 2019b) provided guidance to federal agencies for implementing the Information Quality Act.² During pre-dissemination reviews of data products, “each agency should consider the appropriate level of quality for each of the products that it disseminates based on the likely use of that information” (OMB, 2019b, p. 2). Additionally, agencies should “provide the public with sufficient documentation about each dataset released to allow data users to determine the fitness of the data for the purpose for which third parties may consider using it” (OMB, 2019b, p. 4). As work on combining data sources progresses, it is important to continue to invest in improving the individual data sources (probability surveys, administrative records, and other data) that feed into a new data infrastructure.

In some cases, metrics and standards used to evaluate survey data may be adapted to apply to other data sources, but new methods and standards are needed to evaluate the quality of statistics produced from multiple sources. The first report in this series provided examples of standards that would be useful for a new data infrastructure (NASEM, 2023, Appendix 3B). The large volume of data from alternative sources could be further mined to build analytics that may provide additional insights into data-quality issues and that could be used to guide data collections that are consistent, reliable, and aligned with relevant fitness-of-use criteria.

___________________

1 Principles and Practices for a Federal Statistical Agency (NASEM, 2021b, p. 43) listed continual improvement and innovation as one of the five principles for federal statistical agencies: “Federal statistical agencies must continually seek to improve and innovate their processes, methods, and statistical products to better measure an ever changing world.”

2 Treasury and General Government Appropriations Act of 2001, P. L. 106-554, § 515(a) (2000) (as codified at 44 U.S.C. § 3516, note).

CONCLUSION 9-1: The quality of statistics produced from multiple data sources depends on properties of the individual sources as well as the methods used to combine them. A new framework of quality standards and guidelines is needed for evaluating such data sources’ fitness for use.

Transparency and Documentation

CNSTAT’s Principles and Practices for a Federal Statistical Agency emphasized the importance of transparency and documentation of data products:

Federal statistical agencies must have credibility with those who use their data and information. The value of a statistical agency rests fundamentally on the accuracy and credibility of its data products. Because few data users have the resources to verify the accuracy of statistical information, users rely on an agency’s reputation to disseminate high quality, objective, and useful statistics in an impartial manner (NASEM, 2021b, p. 31).

A statistical agency must be transparent about how it acquires data and produces statistics and be open about the strengths and limitations of its data…. Openness requires that statistical releases from an agency include a full description of the purpose of the program; the methods and assumptions used for data collection, processing, and estimation; information about the quality and relevance of the data; analysis methods used; and the results of research on the methods and data (NASEM, 2021b, p. 95).

Chapter 7 of the National Academies’ report Transparency in Statistical Information for the National Center for Science and Engineering Statistics and All Federal Statistical Agencies (NASEM, 2022e) described best practices for documenting, retaining, releasing, and archiving data. Tables 7-1 through 7-6 in that report outlined that panel’s recommendations regarding the information that a statistical program should retain or archive, the documentation that should be available internally for program staff, and the documentation that should be made available to the public. Table 7-4 of that same report focused on information that should, in that panel’s opinion, be made available to the public when record linkage is used:

A description of the specific data files that were matched should be provided routinely as part of the technical reports or versioned data documentation. Study-specific information and technical reports should be made available to the public. A description of the techniques used for record linkage, the variables used to match on, and a description of how the matching algorithm is implemented, including how uncertain matches are treated, should be made available to the public as part of technical reports on data quality. If available, the estimated error rates for the record linkage routine in this environment should be provided, and if not available, any information on the quality of such a match should be provided instead (NASEM, 2022e, p. 161).

When preparing the current report, this panel found that the amount of detail provided in documentation varies across data collections. Documentation about datasets and data integration from the U.S. Census Bureau, the U.S. Bureau of Labor Statistics, and the U.S. National Center for Health Statistics was relatively easy to find on agency websites. Methodology reports for surveys provided well-organized and detailed descriptions of data collection, processing, and estimation procedures. The panel found similar high-quality documentation for administrative records systems coordinated by these agencies, such as the National Vital Statistics System. Methodology reports from these agencies could serve as models for other agencies that are developing documentation for current programs, and such reports could be a starting point for developing documentation guidelines to assess fitness for use and to address data-equity concerns.

CONCLUSION 9-2: Transparency and documentation of component datasets and of methods used to combine datasets are essential for producing trust in information created from multiple data sources, particularly as new types of data are used.

Data Equity

The use of multiple data sources to advance data equity is a major theme of this report. As Leary (2022) concluded from the workshop presentations: “It is clear that the future in many ways is equitable data science, and that equitable access to government programs and services requires, as we’ve heard, data that are timely, appropriately granular, and as [Robert] Santos and his colleagues … put it, ‘good enough and fit for purpose.’”

The introduction of probability surveys in the 1930s and 1940s was motivated in part by equity considerations, even though the term “equity” was not featured in the writings of the time. Hansen, Hurwitz, and Madow (1953, p. 9) wrote: “When the determination of the individuals to be included in a sample involves personal judgment, one cannot have an objective measure of the reliability of the sample results, because the various individuals may have differing and unknown chances of being drawn.” When the sampling frame is complete and there is no nonresponse, every population member has a known probability of being in the probability sample. This guarantees that the sample is representative of all subpopulations and thus promotes representation equity. The survey designer can “[d]ecide what information is really needed,” define the population of interest, and lay plans “for eliciting clear, intelligible information” (Deming, 1950, p. 5). The survey instrument can measure categories for which disaggregated statistics are desired, promoting feature equity. Even when there is nonresponse, a probability sample has the advantage that the initial sample is selected randomly and is thus not subject to sources of bias that can affect other sample-selection methods.
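Why known inclusion probabilities matter can be shown with a minimal sketch of inverse-probability (Horvitz-Thompson) weighting. The three-unit population, its values, and the inclusion probabilities are invented, and the sketch assumes each unit is sampled independently (Poisson sampling) so that every possible sample can be enumerated.

```python
from itertools import product


def horvitz_thompson_total(values, probs, included):
    """Estimate a population total by weighting each sampled unit by 1/probability."""
    return sum(y / p for y, p, inc in zip(values, probs, included) if inc)


# Hypothetical population of three units with unequal, known inclusion probabilities.
values = [10.0, 20.0, 30.0]
probs = [0.5, 0.5, 0.1]

# Enumerate every possible sample and average the estimator over the
# sampling design to obtain its exact expectation.
expected = 0.0
for included in product([False, True], repeat=len(values)):
    prob_of_sample = 1.0
    for p, inc in zip(probs, included):
        prob_of_sample *= p if inc else (1.0 - p)
    expected += prob_of_sample * horvitz_thompson_total(values, probs, included)

population_total = sum(values)
```

Averaged over all possible samples, the weighted estimator equals the population total even though the third unit is rarely selected. This property relies on the probabilities being known and nonzero, and it weakens under nonresponse or an incomplete frame, as the paragraph above notes.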

In the panel’s view, probability samples will continue to have an important role for producing equitable and representative data in a new data infrastructure. Alternative data sources, however, can enhance data-equity aspects of survey programs. In some situations, the information currently collected in a survey can be obtained more efficiently, and with more granularity, from another source. Chen (2022, slide 13) highlighted the potential of “[s]caling up the use of objective measurement technologies” for promoting data equity. Administrative data can contribute to statistics about small geographic areas or small demographic groups, and may make it possible to produce statistics for previously unstudied populations by increasing their representation. Some populations, such as persons in nursing homes or prisons, are excluded from many surveys but included in some administrative records. Better representation is also important when data are used to develop algorithms to make decisions about hiring, creditworthiness, criminal justice, or medicine (see Box 3-1). Beyond that, using multiple data sources can help identify areas in which subpopulations are underrepresented or mismeasured in surveys or administrative data sources. Record linkage can add variables needed for producing statistics that are disaggregated by race, ethnicity, or other characteristics measured in a linked data source.

While combining data sources can enhance knowledge about subpopulations, there is also the potential that combining data will increase bias. Records with less information available for linkage are more likely to have linkage errors, and linkage rates vary by participants’ age, gender, race, ethnicity, and health and socioeconomic status (see Chapters 2 and 3; Bohensky et al., 2010). Small area and imputation models may also be less accurate for certain population subgroups and geographic areas.

Groves (2022) emphasized that concern about data equity has to be explicitly addressed in day-to-day practice. This includes equity of measurement, applicability of concepts, and coverage across diverse subgroups. In the panel’s judgment, data-equity considerations should be a key component of data-collection planning and of regular program reviews. Methodology reports for surveys and other data sources typically include assessment of the quality of the data for producing national statistics. As stated in Chapter 3, several of these quality dimensions map to equity aspects. Addressing equity issues in data documentation will promote transparency and enhance data equity across the federal statistical system (see Conclusion 3-3).

Improving data equity across the federal statistical system will be challenging and will require a broad-based approach that integrates perspectives of federal statistical agencies, other data producers, data users, and community members. Possible short-term activities include

  • Developing standards for equitable data, such as revising U.S. Office of Management and Budget (1997) standards for collecting data on race and ethnicity (currently under way; see Box 3-3) and implementing best practices for measuring sexual orientation and gender identity (NASEM, 2022c);
  • Adding standardized items to surveys and administrative data collections to measure characteristics for which disaggregated statistics are desired and to facilitate linkage;
  • Increasing subpopulation sample sizes in selected surveys;
  • Facilitating increased federal-state-local data sharing; and
  • Researching equity impacts of data-collection and record-linkage methods (including investing in training necessary for equity assessment).

In the longer term, new statistical methods may need to be developed to promote data equity when combining data. Specifically, research is needed on methods for producing disaggregated statistics for small population groups while protecting confidentiality, and for ensuring that informed consent to data collection includes all possible uses of the data (see Box 3-5).

Wardell (2022) noted that equitable data is still a new concept and that it encompasses much more than just adding a new variable to a data collection. Executive Order 13985 (2021) charges agencies to understand disparities in the programs they administer and to identify roadblocks for accessing federal services. Data are essential for understanding programs’ impact and reach, and can be used to establish “feedback loops” where data inform changes to programs and services at federal, state, and local levels. Wardell (2022) also stressed the importance of building capacity for robust equity assessment—bringing in people with appropriate training and skill sets to work with federal agencies and also building capacity and infrastructure at the local level.

9.2 FUTURE CHALLENGES AND OPPORTUNITIES

This report, building on the framework for a vision of a new data infrastructure in the previous report in this series (NASEM, 2023), concentrates on methods and examples in which using multiple data sources can improve statistics currently collected through major survey programs. Box 9-1 provides a list of all conclusions from this report. There is much work to be done, and future reports in this series will address other aspects of a new data infrastructure: governance and information technology structure, protecting confidentiality, and allowing public use of blended-data products.

For most of the examples in this report, agencies and researchers have already obtained access to the data sources, and the challenges involve how to use them. But one of the primary impediments to using multiple data sources is the difficulty of acquiring or accessing the data. The Uniform Crime Reporting Program exhibits some of the challenges of computing representative statistics when a nonrepresentative part of the population contributes data (see Chapter 7). The previous report in this series (NASEM, 2023) discussed legal issues and incentives for sharing data to produce official statistics, arguing that organizations holding data will be more likely to share those data if they directly benefit from doing so.

Even when data sources are acquired, however, there is no guarantee that data elements collected now will continue to be collected in the future, or that agencies or private-sector organizations that are willing to contribute data now will keep sharing their data (or, if data are purchased, that the price will remain affordable). If administrative records or private-sector data are used for programs in which it is important to compare statistics over time, continued availability and consistency of information from the data sources are crucial. Federal statistical agencies can play an important role in increasing coordination in this area, both in terms of facilitating access and promoting standard definitions and protocols for measurement (Advisory Committee on Data for Evidence Building, 2022).

Creating useful statistics and data products from combined data sources requires skills in addition to those needed to produce estimates from probability surveys. A new data infrastructure requires investment not only in data sources but also in the people who can work with those data. Section 2.3 lists some of the technical challenges for combining data, and some of the areas of expertise needed to address them. Beyond the technical challenges, there are challenges for promoting data equity and public trust in data, and these areas require additional resources and expertise. Statistical agencies will need investments in personnel, training, and computer infrastructure to take advantage of new data resources.

CONCLUSION 9-3: Use of multiple data sources is expected to play a major role in the future production of statistical information in the United States, but additional technical expertise and resources are needed to address the challenges involved in producing and assessing the quality of integrated data and statistics.

Groves (2022) noted that the workshop presentations (Appendix A) were the work of pioneers, and that one component of CNSTAT’s vision of a redesigned national data infrastructure concerns how to “make this type of work, now done by pioneers, routine rather than cutting-edge.”

Today’s data world contains amounts of digital information that were inconceivable when the theory of probability sampling was developed in the 1930s. Arora (2022b, p. 24) argued that the ability to use data to answer societal questions is “the real value proposition of a national statistical office. It’s not just about putting more data out there. It’s trying to make sense of what’s happening in society and showing how different parts of it are intricately connected. If we don’t do that, someone else will.” Groves (2022) concluded the workshop with a vision of the new data infrastructure:

We are in an unprecedented moment in history—in the history of digital data and information derived from those data…. We seek a vision, in sum, that will protect the privacy of Americans while simultaneously producing for them, and their common good, better statistical information.

Much of the statistical information currently produced by federal statistical agencies - information about economic, social, and physical well-being that is essential for the functioning of modern society - comes from sample surveys. In recent years, there has been a proliferation of data from other sources, including data collected by government agencies while administering programs, satellite and sensor data, private-sector data such as electronic health records and credit card transaction data, and massive amounts of data available on the internet. How can these data sources be used to enhance the information currently collected on surveys, and to provide new frontiers for producing information and statistics to benefit American society?

Toward a 21st Century National Data Infrastructure: Enhancing Survey Programs by Using Multiple Data Sources, the second report in a series funded by the National Science Foundation, discusses how use of multiple data sources can improve the quality of national and subnational statistics while promoting data equity. This report explores implications of combining survey data with other data sources through examples relating to the areas of income, health, crime, and agriculture.
