Currently Skimming:

4 Blended Data: Implications for a New National Data Infrastructure and Its Organization
Pages 75-108

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 75...
... This chapter describes diverse data assets that can be combined for statistical purposes; the criteria that govern data acquisition, access, and use; and the implications of blended data on the components and capabilities of a 21st century national data infrastructure, as well as the associated privacy and ethical challenges. The chapter ends with a consideration of various organizational structures that may facilitate cross-sector data access and use.
From page 76...
... Principal Federal Statistical Agencies and Units In the panel's vision of a new data infrastructure, the existing and future data assets of designated statistical agencies and units (as shown previously in Box 3-3) should be available for blending, subject to strong privacy protections and ethical considerations, with other data.
From page 77...
... These data are often termed administrative data. Blending administrative data with data collected by statistical agencies is an active area of innovation in federal statistical agencies and in the research community.
From page 78...
... requires agencies to look for alternative data sources before conducting a new survey.3 However, despite the spirit of the Evidence Act's directive to expand use of existing data for statistical purposes, the Act does not override current statutory prohibitions regarding sharing. For example, the Internal Revenue Code 6103(j)
From page 79...
... Paycheck Protection Program5 added. The Evidence Act addressed a major barrier to data access by providing statistical agencies with a broader statutory basis for accessing and using data assets of other federal agencies (U.S.
From page 80...
... For example, local governments and cities are using data to make smarter, more informed decisions.6 In the panel's vision, a new data infrastructure should include such state, tribal, territory, and local government data assets, creating blended statistics of greater value. Provision of funding to states, tribal lands, local governments, and territories could incentivize such sharing by helping these data holders to use information, establishing two-way data sharing, and thus adding value for local decisionmaking (Moyer, 2021)
From page 81...
... The panel concludes, like CEP, the Markle Foundation, and earlier National Academies' Committee on National Statistics reports, that a new data infrastructure should include state, tribal, territory, and local government data assets, creating blended statistics of greater value. 9 The directory is compiled by the Office of Child Support Enforcement in the U.S.
From page 82...
... In an earlier report, the National Academies recommended that "Federal statistical agencies should systematically review their statistical portfolios and evaluate the potential benefits of using private sector data source" (the National Academies, 2017a, p.
From page 83...
... For all the promise of commercial data, private sector data are not without limitations. Like administrative data, private sector data are collected for a purpose different from that of data for use in a national data infrastructure.
From page 84...
... Some data brokers, like CoreLogic, blend diverse data sources to develop innovative products. CoreLogic blends collected data from 5.5 billion property records -- more than a billion visual records including aerial photos, home tours, and interactive floor plans -- and several hundred analytical models that extrapolate raw data into an entire portfolio of products that CoreLogic sells to companies and government agencies, including statistical agencies.13 Experian, TransUnion, and Equifax assemble data from consumers' credit-related actions and provide reports to individuals, as well as to other businesses for advertising and marketing purposes (Irby, 2022)
From page 85...
... emphasized the importance of statistical agencies maintaining good relationships with third-party data holders, to fulfill the legal obligations of the statistical agency. Data-broker data assets and the many issues they raise warrant careful evaluation before inclusion in a new data infrastructure.
From page 86...
... . Crowdsourced or Citizen-Science Data Holders The use of crowdsourced data or volunteered data purposefully collected and assembled by the public to support information assets has emerged as an increasingly significant source of data that can also be used to guide official decisionmaking.
From page 87...
... Box 4-2 lists the data holders whose data should be available for possible inclusion in a new data infrastructure. The Evidence Act, once fully enacted, will make the federal statistical agency and federal program and administrative data assets available to the data infrastructure, when not prohibited by law.
From page 88...
... (Conclusion 4-1) In the panel's ideal vision, easily accessible, comprehensive catalogs of data assets would be a key feature of a new data infrastructure.
From page 89...
... . Which of the key data-holding groups should be part of a new data infrastructure, and which of their held data should be prioritized in a new infrastructure?
From page 90...
... established a Working Group on Transparent Quality Reporting in the Integration of Multiple Data Sources, to identify best practices associated with data-quality measurement and reporting for blended data products. This work was motivated by the increasing use of alternative and blended data by statistical agencies and by the Committee on National Statistics' Panel on Improving Federal Statistics for Policy and Social Science Research Using Multiple Data Sources and State-of-the-Art Estimation, described in Chapter 2.
From page 91...
... Data Minimized to Satisfy Pre-Specified Purposes In the panel's view, a new data infrastructure should not result in the unbridled harvesting of all digital data that exists in the country. Instead, the data-acquisition request -- the records, data elements, data granularity, and frequency -- should be limited to the information needed to satisfy the proposed statistical purpose.
From page 92...
... Reliability, a related concept, characterizes the consistency of results when the same phenomenon is measured or estimated more than once under similar conditions. Coherence Coherence is the ability of the data products to maintain common definitions, classifications, and methodological processes, to align with external statistical standards, and to maintain consistency and comparability with other relevant data.
From page 93...
... A new data infrastructure should actively engage data holders to develop a range of possible approaches that could help ensure responsible data exchange. Prioritize Easily Acquired Data That Provide Tangible Benefits While the most important criteria for inclusion of data in a new data infrastructure involve utility to the country's informational needs, some data access may require unusually complicated logistical challenges.
From page 94...
... For potentially valuable data assets lacking usable metadata, the metadata need to be developed and available to possible data users. A data infrastructure entity may collaborate with the data holder to develop the necessary documentation.
From page 95...
... understanding of the meaning of data items and the limitations of each data asset. Box 4-3 summarizes the criteria of data that, according to the panel's vision, should be included in a 21st century national data infrastructure.
From page 96...
... . The report discusses 28 An important consideration for a new data infrastructure is the attitudes of data subjects regarding linkage of their data, as presented in Fobia et al.
From page 97...
... ; and • Encourage federal agencies to develop partnerships with academia and encourage external research organizations to develop methods needed for design and analysis using multiple data sources. BLENDED DATA REQUIRE NEW STATISTICAL DESIGNS As a new data infrastructure evolves, in the panel's view no single data-sourcing strategy will be optimal for all informational needs.
From page 98...
... Box 4-4 lists work by the United Nations' Economic Commission on Europe's High-Level Data Group for Modernization of Statistical Production and Services related to a Common Statistical Data Architecture (CSDA), an initiative aimed at consistently describing the data aspects of statistical production.29 The group identified high-level capabilities required by a new data infrastructure to realize the promise of blending multiple data sources.
From page 99...
... For example, full engagement of the academic sector can provide critical capacities like analytical expertise, upskilling existing organizational skills, and educating the future workforce, so that agencies can operate nimbly and dynamically in a new data infrastructure. BLENDED DATA POSE NEW PRIVACY AND ETHICAL CHALLENGES A 21st century national data infrastructure cannot succeed without ensuring ethical exchange of data; trust in institutions involved in data
From page 100...
... While these values must be fundamental to a data infrastructure, laws provide another framework for addressing the range of social, cultural, and reputational issues at play. In the 20th century, lawmakers passed numerous bills restricting government data collection and use.
From page 101...
... Meanwhile, however, the conversation has evolved. Federal government agencies and academia are speaking of trustworthy AI,32 data ethics,33 and data equity.34 Practitioners in the industry use similar language, cognizant of how "trust" is dependent on being seen as "responsible." Data holders are engaged in robust conversations about "data governance," while representatives of data subjects are asking to be included in governing mechanisms.
From page 102...
... In the panel's opinion, CEP's recommendations are consistent with the necessary attributes of a new data infrastructure but are insufficient to form the foundation of this infrastructure. CEP recommended broader access to federal administrative data for statistical purposes and sharing of statistical data resources among the federal statistical agencies, and also recommended that the National Secure Data Service (NSDS)
From page 103...
... CEP and ACDEB have focused on using federal, state, and local data for evidence building, proposing the establishment of NSDS to bring these data assets together. In the panel's opinion, a comprehensive vision of a new data infrastructure is incomplete without addressing how the blending of private sector data with other data assets might improve the country's understanding of its current situation and prospects.
From page 104...
... Organizational Models to Facilitate Cross-Sector Data Access and Use The panel's vision of a new data infrastructure should tap assets as necessary, from all sectors of society that produce digital data about the state of the country. Such an infrastructure was not anticipated by the organizational structure of the current federal statistical system.
From page 105...
... Option 1: NSDS Coordinates Access to All Data Sources In this vision, NSDS (a combined, comprehensive new entity within the data infrastructure, established with the guidance of ACDEB, with rulemaking mandated by the Evidence Act) would have authority over data access from all sectors.
From page 106...
... Other locations are possible if the FFRDC has a federal sponsor and is delegated the authority to provide all NSDS-like services to federal, state, tribal, territory, and local government data holders, as well as to private sector and other data holders. In the panel's view, if this approach is pursued, the FFRDC should have all the rights and responsibilities of a federal statistical agency, coverage under CIPSEA, and access to all data holdings.
From page 107...
... . SUMMARY This chapter discussed the data assets of a 21st century national data infrastructure, including how those assets are sourced and evaluated.
From page 108...
... How? • Statistical agencies now have the authority to retain data assets used for statistical purposes -- will establishment of the entity change statistical agencies' ability to retain newly acquired data?


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.