Currently Skimming:

4 Using Private-Sector Data for Federal Statistics
Pages 55-72

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 55...
... Indeed, a whole set of new enterprises are using large digital data resources as the basis of their business models (e.g., Uber, AirBnB, LinkedIn)
From page 56...
... In this chapter we first review the different kinds of private-sector data that are available and how the characteristics of these data affect their potential utility and usability for federal statistics. Next we briefly review efforts by national statistical offices around the world to examine and experiment with using these data sources to produce official statistics.
From page 57...
... surveys • Marketing research surveys • Media use
o Commercial transactions o Banking and stock records o Credit card
• Mobile phone location sensors • Global Positioning System sensors
• Data from computer systems o Logs o Web logs • Mobile phone
• Internet blogs and comments • Documents • Pictures (Instagram, Flickr, Picasa, etc.) • Videos (YouTube, etc.)
From page 58...
... release of the employment situation each month.3 The other three categories for private-sector data sources, shown in the last three columns of Table 4-1, vary in the structure of the data and how difficult they are to clean and transform into usable numeric form to produce statistics. By structured data we mean numeric data, often ordered into rectangular or fixed relational formats.
From page 59...
... Hence, there is a need to blend these new data resources with traditional survey data in new statistical analyses if they are to be used to improve any existing official statistics. Although blending data sources holds the potential to improve federal statistics, there is no guarantee that it will do so; thus, careful evaluation of data sources is necessary (see below)
From page 60...
... The national statistical offices of countries similar to the United States5 were most interested in using big data for "faster, more timely statistics" (88%), "reducing response burden" (75%)
From page 61...
... These estimates can be produced extremely quickly if needed. In early 2016, the Netherlands experienced glazed frost, and Statistics Netherlands was able to produce estimates of how the glazed frost had affected traffic within 2 days.7 Another example of using high-dimensional data for national statistics comes from a partnership with private-sector mobile phone service pro
7 See http://nos.nl/artikel/2079372-helft-minder-verkeer-door-ijzel.html [November 2016]
From page 62...
... Marchetti and colleagues (2015) created estimates of poverty for small areas by blending mobile phone data with other data from the national statistical office in Italy.
From page 63...
... Finally, about 10 coders read through these remaining articles and identify and extract information relevant to the program, including date, personal information, and location. This information is then checked -- as noted above -- by conducting a survey with both law enforcement agencies and medical examiners to confirm the case is in fact an arrest-related death.
From page 64...
... RECOMMENDATION 4-1 Federal statistical agencies should systematically review their statistical portfolios and evaluate the potential benefits of using private-sector data sources.
CHALLENGES TO USING PRIVATE-SECTOR DATA SOURCES FOR FEDERAL STATISTICS
Given the many different data types shown in Table 4-1 (above)
From page 65...
... Thus, by looking at the difference in mobility between the hypothetical model without health alerts and actual mobility with the health alerts, Telefónica was able to gather information about the effectiveness in reduction of infectious diseases due to health alerts, which it subsequently shared with public agencies. In the second approach listed above, transfer of datasets is a sharing agreement that involves the physical transfer of databases to the statistical agency under a strict protocol that clearly specifies the terms and conditions and includes each party's responsibilities and penalties for not following the agreement.
From page 66...
... Public-private partnerships offer a number of potential benefits to statistical agencies in that they permit access to private data sources, but there are also important risks and challenges in using those sources. Most of the private data provided in some form to statistical offices from public-private partnerships contain important business data about a firm's customers and strategy that could have negative effects for the data provider if accidentally released or breached.
From page 67...
... The statistical office would likely be unable to compensate the private firm sufficiently to keep it from also selling the index to other companies in the private sector.9 The second possibility is for a company to sell its raw credit card data to the statistical agency to analyze and combine with the agency's other information. In this approach, the company and the statistical agency could then each develop their own separate indexes, and the company could sell its index to others without necessarily revealing the same information the statistical agency would publish.
From page 68...
... Data Quality
We began this chapter noting a wide range of domains in which alternative data sources have the potential to contribute to federal statistics, but these sources are not typically simple substitutes for federal surveys, and careful evaluations of quality are needed. Google Flu Trends was designed to predict influenza incidence reports from the Centers for Disease Control and Prevention (CDC)
From page 69...
... Even seemingly objective and straightforward scanner data can be fraught with measurement issues (see Box 4-4)
From page 70...
... Federal statistical agencies should provide annual public reports of these activities. We provide some additional discussion of data quality issues for alternative data sources in Chapter 6, and the panel will address this issue more deeply in its second report.
From page 71...
... • If such access is sustained, how can federal statistical agencies detect changes in the data created by the data holders, which may affect statistical estimates?


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.