Suggested Citation:"5 Data Dissemination." National Research Council. 2015. Realizing the Potential of the American Community Survey: Challenges, Tradeoffs, and Opportunities. Washington, DC: The National Academies Press. doi: 10.17226/21653.

5

Data Dissemination

This chapter looks at several aspects of the dissemination of American Community Survey (ACS) data. The first section describes the ACS data products and dissemination methods. The second section looks in depth at the dissemination challenges facing the ACS and includes the panel’s recommendations.

ACS DATA PRODUCTS AND DISSEMINATION METHODS

Based on the ACS, the Census Bureau publishes annual 1-year ACS estimates for geographic entities with populations of at least 65,000, 3-year estimates for geographic entities with populations of at least 20,000, and 5-year estimates for all statistical and legal entities, including areas as small as census block groups. Table 5-1 provides an overview of the data that have been released between 2006 and 2013 by type of estimate and population threshold.

After the data are edited and any necessary imputation and weighting procedures are completed, they are reviewed by the Census Bureau’s Disclosure Review Board (DRB) to ensure that any data products released will maintain the confidentiality of individual responses. The DRB reviews the data product specifications of which characteristics will be included at which level of geography; it may require revisions to the specifications if the sample size or population size in a geographic area is small and could lead to the disclosure of individual respondents’ identities. The 1- and 3-year data are also reviewed for precision, and tables are only produced if the sample size is sufficiently large to support statistically precise estimates.


TABLE 5-1 ACS Data Availability by Type of Estimate

Data Product       Population Threshold   Year of Release
                                          2006   2007   2008        2009        2010        2011        2012        2013
1-Year Estimates   65,000+                2005   2006   2007        2008        2009        2010        2011        2012
3-Year Estimates   20,000+                –      –      2005-2007   2006-2008   2007-2009   2008-2010   2009-2011   2010-2012
5-Year Estimates   All areas              –      –      –           –           2005-2009   2006-2010   2007-2011   2008-2012

SOURCE: American Community Survey Design and Methodology Report; available at https://www.census.gov/acs/www/methodology/methodology_main/ [September 2014].


Estimates based on the 5-year data are released for all geographic areas, regardless of sample size, as long as they pass the DRB review with regard to confidentiality disclosure.

The geographic areas for which data products are available are defined with the goal of meeting the most important data user needs. They include legal, administrative, and statistical areas, such as states, American Indian and Alaska Native areas, counties, minor civil divisions, incorporated places, congressional districts, block groups, census tracts, and census designated places. The Census Bureau works with state and local governments to define the boundaries of geographic areas. The Census Bureau’s Geography Division updates the boundaries of legal areas (e.g., incorporated places) to reflect such changes as annexations, detachments, or mergers with other areas. The annual ACS estimates are produced on the basis of the geographic boundaries as of January 1 of the sample year, while the multiyear estimates reflect the boundaries as of January 1 of the final year of data collection.

The initial ACS data products were designed to be comparable to the census long-form data products, and they have undergone only relatively minor revisions based on feedback provided by data users. In recent years, as part of a comprehensive program review, the Census Bureau sponsored several data user workshops, with both federal and nonfederal data users, including a workshop of nonfederal data users (National Research Council, 2013). In 2013 a new, externally managed ACS Data User Group (ACS DUG) was formed with the goal of providing a platform for information exchange related to the data. The ACS DUG also held a data user workshop in 2014, and it is expected to provide further input to the Census Bureau on data user needs. These efforts have already enriched understanding of the many uses of the survey and pinpointed a few areas for improvement. However, a systematic evaluation of the use of the various data products has never been conducted.

Main Data Products

A large volume of data products is available based on the ACS, ranging from tables targeted at users who just need a quick estimate for a geographic area to the Public Use Microdata Sample (PUMS) files for more advanced users who want to create their own estimates. The range of products is modeled primarily on what was available from the decennial census long-form survey. It is important to note that not all releases include all of these products. For example, the 5-year data release does not include comparison profiles, state ranking tables, or selected population profiles.

Some of the key products are summarized in Table 5-2 and described below.


TABLE 5-2 Key American Community Survey (ACS) Data Products

Data Product Description
Data Profiles Broad social, economic, housing, and demographic profiles
Narrative Profiles Summary of the information in the data profiles using concise, nontechnical text
Selected Population Profiles Broad social, economic, and housing profiles for a large number of race, ethnic, ancestry, and country or region of birth groups
Ranking Tables State rankings of estimates across 86 key variables
Subject Tables Similar to data profiles (above) but include more detailed ACS data, classified by subject
Detailed Tables The most detailed tabular ACS data and cross-tabulations of ACS variables
Geographic Comparison Tables Comparison of geographic areas other than states (e.g., counties or congressional districts) for key variables
Thematic Maps Interactive, online maps that can be used to display ACS data
Custom Tables Rows of data from the ACS detailed tables that can be specified and extracted by users
Summary Files Detailed tables that are accessed through a series of comma-delimited text files on the Census Bureau’s file transfer protocol site
Public Use Microdata Sample Files ACS microdata that can be accessed by data users with SAS and SPSS software experience

SOURCE: U.S. Census Bureau Data Product Descriptions, available at http://www.census.gov/acs/www/data_documentation/product_descriptions/ [September 2014].

  • Data profiles are high-level reports of demographic, social, economic, and housing characteristics for a given geographic area. The Census Bureau publishes a comparison profile that compares the sample year’s estimates with estimates from the 4 previous years. The profiles also include the margins of error of the estimates.
  • Narrative profiles are descriptive reports based on the data profiles. They summarize information using nontechnical language and graphics on 15 topics for a geographic area.
  • Selected population profiles provide some of the characteristics from the data profiles for specific population groups. These products are provided for 1- and 3-year estimates.
  • Ranking tables provide state rankings for approximately 90 estimates. These tables are produced based only on the 1-year data.
  • Subject tables are similar to data profiles, but they include more detailed information on frequently requested topics, such as educational attainment by race and age. Approximately 70 subject tables are produced each year.
  • Detailed tables provide distributions and cross-tabulations of demographic, social, economic, and housing characteristics, and they are the foundation for other data products. The tables display the estimates, along with the associated margins of error. There are more than 1,470 detailed tables based on the 2012 1-year data alone.
  • Geographic comparison tables contain the same estimates as the ranking tables, as well as an additional 100 demographic measures, for states and some substate geographies. These tables are produced based on both the 1-year and the multiyear datasets.
  • Thematic maps show mapped values for geographic areas.
  • Custom tables are tables produced by the Census Bureau on a cost-reimbursable basis, to meet data user needs that are not met with the existing products.
  • Summary files are comma-delimited text files that provide access to all detailed tables based on 1-, 3-, and 5-year estimates. These can be viewed using a spreadsheet or statistical software.
  • PUMS files contain samples of individual records, with identifying information removed.

Microdata Access

Microdata access to individual records is provided through the PUMS files. PUMS files are extracts from the microdata file that have undergone disclosure avoidance review and enable researchers to create custom tables that are not otherwise available. The extracts contain all characteristics data available in the full microdata file, but the only geographic information is region, division, state, and Public Use Microdata Area (PUMA). PUMAs are nonoverlapping areas that partition a state and contain populations of 100,000 or more. PUMS files are available based on each of the 1-, 3-, and 5-year datasets; the multiyear PUMS files consist of the combined annual PUMS files. The main limitation of PUMS files is that the level of geographic detail is not refined enough for many data applications. Moreover, PUMAs often do not coincide with geographies of interest for many data users.
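A custom estimate from a PUMS extract amounts to a weighted tabulation. The sketch below uses the column names PWGTP (person weight) and AGEP (age), which follow ACS PUMS conventions; the inline records themselves are fabricated for illustration.

```python
import csv
import io

# Fabricated stand-in for a PUMS person-level extract.
pums_csv = io.StringIO(
    "AGEP,PWGTP\n34,120\n67,95\n12,110\n70,80\n"
)

def weighted_share(rows, predicate):
    """Weighted proportion of records satisfying the predicate."""
    num = den = 0
    for row in rows:
        w = int(row["PWGTP"])
        den += w
        if predicate(row):
            num += w
    return num / den

rows = list(csv.DictReader(pums_csv))
share_65plus = weighted_share(rows, lambda r: int(r["AGEP"]) >= 65)
```

A real analysis would read the full state-level PUMS file and restrict to the PUMAs of interest, subject to the geographic limitations noted above.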

Data Dissemination Methods

The primary dissemination mechanism for tables and maps is the American FactFinder (AFF). Summary files and PUMS files are available through the Census Bureau’s File Transfer Protocol (FTP) site, which allows users to download data as Excel, PDF, or text files, and DataFerrett, which is an analysis tool that also offers recoding capabilities. In recent years, the Census Bureau has added a series of new dissemination methods, focused on new technologies. They include the Application Programming Interface (API), which allows web users and developers to design new ways to access and present data. Easy Stats is one such app based on the API.1

TABLE 5-3 Current Data Dissemination Methods

Methods Description Data Products Available
American FactFinder Web access tool for American Community Survey data products Detailed tables, data profiles, selected population profiles, subject tables, geographic comparison tables, 1-year ranking tables, 1-year comparison profiles
Summary Files and Public Use Microdata Sample (PUMS) Files Web links for direct access to data files Summary files and PUMS files
DataFerrett Data analysis and extraction tool with recoding capabilities Summary files and PUMS files
File Transfer Protocol Site Site that allows users to download data for analysis Summary files and PUMS files
Application Programming Interface Interface that lets developers create custom web and mobile apps 5-year summary files, 1-year data profiles for congressional districts
Easy Stats Interactive tool that lets users search for select statistics by geography 5-year summary files, 1-year data profiles for congressional districts
QuickFacts Summary profiles showing frequently requested data items for the nation, states, counties, and places 5-year data profiles
Dwellr Mobile application that allows users to find places based on preferences they specify 5-year summary files
POP Quiz App that tests statistical literacy 5-year summary files

Table 5-3 shows the data products available through the main current dissemination methods. Block-group level data are available from the FTP site, DataFerrett, and the API. Tract-level data are available from AFF, the FTP site, DataFerrett, and the API.
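As an illustration of API access, a query can be assembled from a year, a dataset name, variable codes, and geography clauses. The URL pattern and the variable code B01003_001E (total population) below follow published Census API conventions, but they are assumptions here; current endpoints and variable names should be checked against the Bureau's developer documentation.

```python
def build_acs_url(year, dataset, variables, geo_for, geo_in=None):
    """Assemble a Census API query URL (pattern assumed, not verified)."""
    query = f"get={','.join(variables)}&for={geo_for}"
    if geo_in:
        query += f"&in={geo_in}"
    return f"https://api.census.gov/data/{year}/{dataset}?{query}"

# Hypothetical request: total population for every county in California.
url = build_acs_url(2012, "acs5", ["NAME", "B01003_001E"],
                    "county:*", "state:06")
```

Fetching the URL returns a JSON array whose first row is a header and whose remaining rows are data.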

______________

1A list of additional APIs is available at http://www.census.gov/data/developers/data-sets.html.


DATA PRODUCT AND DISSEMINATION CHALLENGES

Rates with Zeros and Small Numbers

As outlined above, a major portion of ACS data releases take the form of tables of rates or proportions, typically expressed as percentages. Because these rates are based on samples, they are subject to sampling error, whose likely magnitude is indicated by a published margin of error: the half-width of a 90 percent confidence interval. This method works well for some estimates, but for many others, especially those with small proportions, it results in a confusing and uninformative presentation. In this section, we describe this problem and suggest some directions for solutions.
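Because the published margin of error (MoE) is the half-width of a 90 percent interval, a user can recover the underlying standard error, or re-express the MoE at another confidence level, by a simple rescaling. A minimal sketch, with hypothetical values:

```python
# The 1.645 factor is the 90 percent normal critical value used for
# published ACS margins of error.
Z90 = 1.645

def moe_to_se(moe, z=Z90):
    """Recover the standard error from a published margin of error."""
    return moe / z

def rescale_moe(moe, z_new, z_old=Z90):
    """Re-express a 90-percent-level MoE at another confidence level."""
    return moe * z_new / z_old

moe_90 = 2.3                        # hypothetical published MoE (percentage points)
se = moe_to_se(moe_90)              # underlying standard error
moe_95 = rescale_moe(moe_90, 1.960) # same estimate at the 95 percent level
```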

Computing standard errors and confidence intervals for a rate or proportion, p, is one of the oldest problems in statistics and has been a subject of ongoing research (for a review, see Brown et al., 2001). Unfortunately, no one solution handles every case of this complex problem. In particular, the coverage properties differ depending on the sample size n and whether p is near to or far from the boundary values of 0 or 1. Approaches for the ACS are further complicated because the sample under consideration does not constitute a simple random sample.

In the context of a simple random sample, the standard (maximum likelihood and unbiased) estimate for a binomial proportion p is the sample proportion of successes, p̂ = X/n, where X denotes the total number of successes and n is the sample size. The standard error (SE) for p̂ is estimated by √[p̂(1 − p̂)/n], and its square is an unbiased estimator of variance (ignoring finite population corrections). For large sample sizes, a symmetric 100(1 − α) percent confidence interval can be expressed as

p̂ ± z_{α/2} √[p̂(1 − p̂)/n],

where z_{α/2} denotes the 100(1 − α/2) percentile of the standard normal distribution. This interval is typically justified by the normal approximation (central limit theorem) to the binomial distribution.
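This standard large-sample (Wald) construction can be sketched directly; the example also previews its weakness for small proportions, since the lower bound can fall below zero:

```python
import math

def wald_interval(x, n, z=1.645):
    """Point estimate and symmetric (Wald) interval for a proportion
    under simple random sampling; z=1.645 gives the 90 percent level."""
    p_hat = x / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z * se, p_hat + z * se

# A small observed proportion: 2 successes out of 400.
p_hat, lo, hi = wald_interval(2, 400)   # lower bound is negative
```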

This estimated SE for p̂ under simple random sampling is not directly applicable to ACS estimates, as the ACS sample is collected under a complex design and further adjusted with calibration weights. Instead, the direct variance estimates are computed using the successive difference replication method (Wolter, 1984; Judkins, 1990; Fay and Train, 1995), as described and summarized in U.S. Census Bureau (2009). To obtain intervals with coverage close to the nominal level, adjustments are needed to reflect the design-based variance estimates. One such adjustment adapts the standard intervals under simple random sampling by replacing the observed sample size n with the effective sample size, say n* (Gilary et al., 2012). Using n* in place of n attempts to account for the design effect, deff, where deff is defined to be the ratio of the variance of p̂ under the complex sampling design to that under simple random sampling, and n* = n/deff.
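Under this adjustment the interval calculation is unchanged except that n* replaces n. A sketch, with a hypothetical design effect:

```python
import math

def effective_wald_interval(p_hat, n, deff, z=1.645):
    """Wald-style interval using the effective sample size n* = n/deff
    in place of n, a rough design-effect adjustment (sketch only)."""
    n_star = n / deff
    se = math.sqrt(p_hat * (1 - p_hat) / n_star)
    return p_hat - z * se, p_hat + z * se

# With deff > 1, n* < n, so the interval widens relative to the
# simple-random-sampling interval.
lo, hi = effective_wald_interval(0.25, 900, deff=2.0)
```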

Current Practices and Deficiencies

Confidence intervals for the ACS are currently reported using symmetric intervals as described above, characterized by their half-widths or margins of error as described by the U.S. Census Bureau (2009, Ch. 12). This approach of relying on normal or Wald approximations is problematic for constructing confidence intervals for small or large proportions (those close to 0 or 1), for two reasons. (The cases of p ≈ 0 and p ≈ 1 are essentially equivalent since one can replace a characteristic such as poverty with its complement, nonpoverty.)

The most obvious problem is that these intervals may include values that are outside “logical” boundaries (negative values or values that exceed 100 percent). Although the U.S. Census Bureau (2009, Ch. 12) cautions users to consider logical boundaries when creating confidence intervals, the crude approach of truncating the interval at 0 or 1 is also unsatisfactory, as these intervals may include zero as a “plausible” value for a proportion even though respondents with the characteristic of interest were found in that area. A second, more subtle, problem is that due to the discreteness of rate estimates, which are ratios of counts, the coverage of these intervals is a discontinuous function of the population proportion. Brown et al. (2001) note this property for a variety of interval estimators with simple binomial data and suggest that approximating nominal levels for coverage averaged over a range of population proportions is a suitable criterion; however, this issue has not yet been well studied for data from complex survey designs.

In the extreme case where there are zero sample (observed) rates or counts, these standard approaches for constructing confidence intervals clearly break down. In such cases, computation of the estimated SE using √[p̂(1 − p̂)/n] will result in an estimated SE of zero, even if n* is used in place of n. Furthermore, any symmetric interval around p̂ = 0 will include negative values. Aside from estimated counts of zero, the approach currently used by the ACS makes use of only one form of the SE estimate, which is only valid for large samples and proportions close to 0.5. In the case of estimated counts of zero, the ACS uses a model-based approach (see U.S. Census Bureau, 2009, pp. 12-4, 12-5).

Another issue arises in the case of zero rates or counts from the current ACS practice of assigning a coefficient of variation (CV) of 100 percent to any p̂ estimated to be zero. If the median CV is greater than 61 percent for the estimates in a given table, then that table is not released (U.S. Census Bureau, 2009, Ch. 13). This rule ignores the fact that some zeros are more informative than others. An estimate of zero successes when the sample denominator is large might provide powerful evidence that the rate in question is small, although it cannot establish that it is exactly zero. Conversely, a sampling zero with a small sample denominator might be consistent with a wide range of population proportions. Nevertheless, in both cases p̂ = 0 and the same CV is assumed. The practice of filtering tables based on this rule removes information that is potentially useful for data users.
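The point that zeros differ in informativeness can be made concrete with an exact binomial bound under simple random sampling (a simplification of the ACS setting): when zero successes are observed, the one-sided upper confidence bound for p shrinks with the sample size.

```python
# Exact one-sided upper confidence bound for p when x = 0 successes are
# observed in n trials: 1 - alpha**(1/n). For alpha = 0.05 this is
# approximately 3/n, the well-known "rule of three."
def upper_bound_zero(n, alpha=0.05):
    return 1 - alpha ** (1 / n)

small_n = upper_bound_zero(10)    # zero in a small sample: weak evidence
large_n = upper_bound_zero(1000)  # zero in a large sample: strong evidence
```

Under the current CV rule, both cases would nonetheless be assigned the same 100 percent CV.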

Presentation of Uncertainty

The measures of error now included in ACS data products are actually used in two distinct ways. First, they are used to provide interval estimates of the form (point estimate) ± margin of error (MoE). As discussed above, these symmetrical intervals have poor properties for proportions that are close to zero or one. The MoE has another purpose for which it is more suited, however, namely, providing information for use in aggregation across areas. The squared MoE is proportional to an approximately unbiased estimate of variance. The variance of the estimated rate for an aggregation of independently sampled areas (such as a nonstandard combination of tracts of interest to local users) is a weighted combination of the variances of estimates for the individual areas. That combination might be estimated with adequate precision even if the estimates are not very accurate for some of the component areas with small counts. Therefore, as long as users are creating their own aggregations, they need access to estimates of sampling variances. Alternatively, the ACS could make available an online analysis system that can calculate point and interval estimates for the desired aggregates.
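For a sum of estimates for independently sampled areas, variances add, so the published MoEs combine as the square root of the sum of their squares. A sketch with hypothetical tract values (covariances between areas are ignored, as in the standard approximation):

```python
import math

def aggregate(estimates, moes):
    """Aggregate estimates and MoEs for independently sampled areas:
    the aggregate MoE is the root of the sum of squared MoEs."""
    total = sum(estimates)
    moe = math.sqrt(sum(m ** 2 for m in moes))
    return total, moe

tract_counts = [120, 85, 40]   # hypothetical tract-level counts
tract_moes = [60, 55, 35]      # their published margins of error
total, total_moe = aggregate(tract_counts, tract_moes)
```

Note that the aggregate MoE is smaller than the simple sum of the component MoEs, which is why aggregation can yield usable precision even when individual tract estimates are noisy.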

Many or perhaps most users of tables, however, are interested in the accuracy of the individual estimates in the tables. For this purpose, a number of methods are available to generate sensible intervals, although more research is needed before they can be applied to ACS data. Brown et al. (2001) describe several methods for calculating confidence intervals for a rate (proportion) under simple random sampling with desirable properties, including approximately nominal coverage, lying entirely within the logical range from 0 to 1, and including zero if and only if the observed sample rate is zero. Among the alternatives that are reasonably simple to implement are the score interval, believed to have been proposed by Wilson (1927), the interval of Agresti and Coull (1998), and the equal-tailed interval under a noninformative Jeffreys prior for a binomial proportion. For n ≤ 40, Brown et al. (2001) recommend using either the Wilson or Jeffreys interval; they indicate that the two intervals are similar in terms of absolute error. They recommend the Agresti-Coull interval for n ≥ 40 as the easiest to present. Liu and Kott (2009) compare several alternative methods for constructing such intervals.
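Under simple random sampling (again, a simplification relative to the ACS design), the Wilson and Agresti-Coull intervals are straightforward to compute. Note that the Wilson lower bound is exactly zero when the observed count is zero, one of the desirable properties listed above; the Agresti-Coull interval can still dip slightly below zero near the boundary.

```python
import math
from statistics import NormalDist

def wilson_interval(x, n, conf=0.90):
    """Wilson score interval: lies within [0, 1] and includes 0
    only when x == 0."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    p = x / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half

def agresti_coull_interval(x, n, conf=0.90):
    """Agresti-Coull interval: a Wald interval after adding z^2/2
    pseudo-successes and z^2/2 pseudo-failures."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    n_t = n + z * z
    p_t = (x + z * z / 2) / n_t
    half = z * math.sqrt(p_t * (1 - p_t) / n_t)
    return p_t - half, p_t + half

lo_w, hi_w = wilson_interval(0, 50)          # lower bound exactly 0
lo_ac, hi_ac = agresti_coull_interval(0, 50) # lower bound slightly negative
```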

An alternative to these generic methods is to tailor the interval calculation to incorporate information from neighboring areas, previous time periods, or both. For example, a prior distribution might be defined for each tract reflecting the distribution of the rate in question over some collection of nearby or otherwise a priori similar tracts; the posterior credible intervals given the tract’s data could be reported and would possess the desirable properties listed above. This approach is essentially an application of small area estimation (see further discussion above). Importantly, further gains may be possible by borrowing strength through spatial, time series, and spatio-temporal models that incorporate exogenous information from multiple data sources, including administrative records. Gilary et al. (2012) compare the performance of some intervals constructed in this way to those constructed without a small area estimation component. However, the methods compared do not consider formal spatial or spatio-temporal models.

Constructing confidence intervals for proportions when the data arise from a complex survey is another area of ongoing research. For example, Slud (2012) examines methods for creating upper confidence bounds from several small area estimation models. Janicki and Malec (2013) consider a design-adjusted approach, which incorporates a probability model for the finite population along with information regarding the survey design.

Implementing any changes to the way measures of uncertainty are presented to data users would require testing and a minor redesign of some of the user interfaces; consequently, it is not cost neutral. However, exploring options for these types of improvements would be worthwhile because they would increase the clarity and value of the information presented to users.

RECOMMENDATION 19: The Census Bureau should continue research into alternative approaches for constructing and presenting measures of uncertainty for the American Community Survey that are suitable for data from complex survey designs and with small proportions or samples, with the objective of rapidly adopting new methods without the defects apparent in current practice.

RECOMMENDATION 20: The data disseminated from the American Community Survey should include both interval estimates (confidence or credible intervals) and approximately unbiased variance estimates, although the latter become less important if a suitable system for aggregation of estimates is introduced. (See also Recommendation 24.)


Volume of Data Products and Access Options

Although the frequency of the data releases is one of the main benefits of the ACS, the volume of data products based on the three datasets (1-year, 3-year, and 5-year estimates) can be overwhelming to users who are not very familiar with the range of products. In addition, most of the data products can be accessed through a variety of different means, and in navigating the Census Bureau’s website it is not always obvious which method is the most efficient to use for a particular purpose, further increasing confusion. Some of the dissemination methods (such as DataFerrett and the API) are primarily targeted at advanced users and are challenging to use without training. More importantly, as discussed below, the production of a large volume and wide range of products places a significant burden on the ACS staff.

Data Suppression

Despite the apparent abundance of data products from the ACS, many of the actual estimates cannot be made available to data users due to the sample size limitations. As noted in Chapter 1, the 1- and 3-year ACS data releases are subject to population thresholds: 1-year estimates are only released for areas with populations of at least 65,000, and 3-year estimates are only released for areas with populations of at least 20,000. (There is no minimum population threshold for the 5-year estimates.) These thresholds were developed in the early ACS design stages, based on the assumptions available at the time about potential future sample sizes, and they have not been reexamined since then.

The population sizes used to apply the thresholds are determined based on the Population Estimates Program. This means that some areas could receive data based on the 1-year or 3-year threshold in a given year and not receive the same data products the next year. However, if data for an area were reported for a given year, then data are also published the following year, even if the population dropped below the threshold, as long as the drop was not more than 5 percent over the course of the year.

Data Quality Filtering

In addition to the population thresholds, the 1- and 3-year estimates are also subject to data quality filtering. The data quality filtering process identifies data products with the highest concentration of estimates that have low precision and prevents their publication. In the case of detailed tables, filtering is applied by calculating the median CV of all detailed lines in a table, excluding total and subtotal lines; a table is filtered out if the median CV is greater than 0.61. In a given table, only estimates at the lowest level of detail are included in the calculation of the median CV. A cell with an estimate of zero is considered to have a CV of 1. In many cases, if a detailed table does not meet the data quality criteria, then a collapsed version of the same table may be available.
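The median-CV rule for detailed tables can be sketched as follows; the detail lines below are hypothetical.

```python
from statistics import median

def table_filtered_out(estimates, ses, threshold=0.61):
    """Apply the median-CV filtering rule: each detail line's CV is
    SE/estimate, with zero estimates assigned a CV of 1; the table is
    filtered out when the median CV exceeds the threshold."""
    cvs = [1.0 if est == 0 else se / est for est, se in zip(estimates, ses)]
    return median(cvs) > threshold

# Hypothetical detail lines (totals and subtotals excluded):
ests = [150, 40, 0, 25, 10]
ses = [30, 28, 0, 20, 9]
filtered = table_filtered_out(ests, ses)   # median CV is 0.8, so filtered
```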

The impact of the filtering that is applied to the detailed tables is carried over to the products based on these tables. Ratio tables are filtered out if the numerator or denominator estimates are filtered out. Cells in data profiles, subject tables, ranking tables, and geographic comparison tables are filtered out if the data used as the source of the cells are filtered out, although tables with means can have some lines filtered out and some not filtered out. Subject tables featuring specific population groups (available above certain population thresholds) and “iterated” selected population profiles (population profiles reproduced for selected population groups), which are generated directly from microdata, are filtered the same way as detailed tables, except that filtering is applied to the subpopulation groups rather than the whole table. However, if half or more of the lines are filtered out in a selected population profile, then the whole table is filtered out. For derived measures, such as medians, aggregates, ratios, and rates, if the standard error is zero, then the cell is suppressed if the estimated weighted total of the universe is less than 3,000. A table containing multiple derived measures may be made available in part.

Table 5-4 shows the filtering rates by population size for the 2012 1-year data. Overall, 29 percent of the tables and 39 percent of the estimates were filtered out. A higher proportion of tables are filtered out for smaller geographic areas than larger geographic areas: the smallest areas that received 1-year data had 52 percent of their estimates filtered out; the rate was 9 percent for the largest areas.

Table 5-5 shows the filtering rates by population size for the 2010-2012 3-year data. The average filtering rate across all geographies that receive data is similar to the filtering rate for the 1-year data (29 percent of the tables and 38 percent of the associated estimates were filtered out). Again, the filtering rates decrease as population sizes increase. As one would expect, areas that are large enough to also receive (heavily filtered) 1-year data are somewhat less affected by filtering in the 3-year release, although the filtering rates are still high for most areas, and they are particularly high for the smallest areas.

Appendixes C and D contain further detail about the filtering rates in the 1-year data for 2012 and the 3-year data for 2010-2012. In both the 1-year and 3-year data, the estimate types that are most likely to be filtered out are population and household counts. In the 1-year data, the topics that are most affected by filtering include ancestry (77 percent), earnings (59 percent), citizenship (57 percent), occupation/industry (54 percent), income (53 percent), Hispanic origin (53 percent), and grandparents as


TABLE 5-4 Filtering Rates by Population Size, 1-Year Data for 2012

Population Size (thousands) Total Tables Tables Published Tables Filtered (%) Total Estimates Estimates Published Estimates Filtered (%)
65-100 1,751,234 1,067,379 39.0 39,909,829 19,291,764 51.7
100-125 1,966,257 1,287,854 34.5 44,809,734 24,043,292 46.3
125-150 1,208,010 825,828 31.6 27,530,081 15,719,855 42.9
150-200 1,152,557 824,418 28.5 26,266,161 16,039,225 38.9
200-250 375,531 279,605 25.5 8,558,133 5,578,457 34.8
250-500 621,152 496,951 20.0 14,155,726 10,318,896 27.1
500-1,000 889,384 763,853 14.1 20,267,929 16,472,240 18.7
1,000+ 372,576 345,598 7.2 8,491,263 7,734,926 8.9
Total 8,336,701 5,891,486 29.3 189,988,856 115,198,655 39.4

SOURCE: Table prepared by the Census Bureau at the panel’s request.

Suggested Citation:"5 Data Dissemination." National Research Council. 2015. Realizing the Potential of the American Community Survey: Challenges, Tradeoffs, and Opportunities. Washington, DC: The National Academies Press. doi: 10.17226/21653.
×

TABLE 5-5 Filtering Results by Population Size, 3-Year Data for 2010-2012

Population Size (thousands) Total Tables Tables Published Tables Filtered (%) Total Estimates Estimates Published Estimates Filtered (%)
20-25 2,292,125 1,270,957 44.6 52,258,481 22,483,960 57.0
25-30 1,629,040 956,564 41.3 37,140,976 17,350,205 53.3
30-25 1,136,568 691,950 39.1 25,912,856 12,783,515 50.7
25-40 932,920 591,489 36.6 21,269,888 11,062,808 48.0
40-45 774,237 502,187 35.1 17,652,173 9,492,242 46.2
45-50 614,451 409,548 33.3 14,009,021 7,834,622 44.1
50-55 532,773 361,871 32.1 12,146,781 6,946,337 42.8
55-60 449,893 309,545 31.2 10,257,201 5,997,933 41.5
60-65 409,664 289,336 29.4 9,339,924 5,671,709 39.3
65-100 1,695,308 1,245,080 26.6 38,651,948 24,918,062 35.5
100-125 2,036,363 1,569,758 22.9 46,427,431 32,423,184 30.2
125-150 1,177,927 932,211 20.9 26,856,089 19,559,930 27.2
150-200 1,148,381 933,777 18.7 26,182,277 19,800,966 24.4
200-250 367,013 307,743 16.1 8,367,621 6,585,390 21.3
250-500 619,178 541,800 12.5 14,116,824 11,765,706 16.7
500-1,000 892,792 814,946 8.7 20,353,956 18,087,234 11.1
1,000+ 367,318 351,181 4.4 8,373,107 7,947,806 5.1
Total 17,075,951 12,079,943 29.3 389,316,554 240,711,609 38.2

SOURCE: Table prepared by the Census Bureau at the panel’s request.

Suggested Citation:"5 Data Dissemination." National Research Council. 2015. Realizing the Potential of the American Community Survey: Challenges, Tradeoffs, and Opportunities. Washington, DC: The National Academies Press. doi: 10.17226/21653.
×

caregivers (52 percent). In the 3-year data, the topics most often filtered out are ancestry (75 percent), race (74 percent), earnings (58 percent), citizenship (57 percent), income (52 percent), and Hispanic origin (52 percent).

Estimates in tables by race and Hispanic origin (“iterated” tables) are more than twice as likely to be filtered out as estimates in tables for the full population. Filtering rates are extremely high for race iteration groups with the smallest populations, such as American Indian and Alaska Native and Native Hawaiian and Other Pacific Islander, with nearly all iterated count table estimates filtered out in both the 1- and the 3-year data.

For many of the tables, the Census Bureau produces “collapsed” versions that contain fewer details and therefore are less likely to be filtered out. The filtering affects approximately one in five of the uncollapsed tables for which no collapsed table exists, close to one-half of the uncollapsed tables for which a collapsed table exists, and a little over one in four of the collapsed tables.

Suppression for Confidentiality Reasons

In addition to data quality filtering, some of the ACS data are suppressed because of confidentiality rules. The Census Bureau’s DRB reviews the data to ensure that the identity of an individual respondent could not be ascertained on the basis of the responses. The main DRB rules for the ACS data products are summarized below (U.S. Census Bureau, 2013):

  • For selected population profiles, there must be at least 50 cases in the geographic area. If not, the DRB requires complementary suppression on the other columns: in other words, the suppression of data in other columns to prevent users from deriving sensitive data from the nonsensitive data that would otherwise not be suppressed. In practice the ACS Office suppresses whole tables instead of performing complementary suppression.
  • Tables involving geographic areas other than current residence crossed with characteristics other than current residence must have at least 40 cases in the geographic area.
  • For means and aggregates, the estimate must be based on either zero cases or three or more cases in a geographic area. Again, the DRB requires complementary suppression if this requirement is not met, but in practice the ACS Office suppresses the whole table.
  • Tables with more than 100 independent lines cannot be released for block groups. In addition, some tables with sensitive topics cannot be released for block groups, even if they contain fewer than 100 lines (e.g., tables containing characteristics of people living in group quarters).
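To make the interplay of these thresholds concrete, the summarized rules can be sketched as a single pass/fail check. Everything below — the `Table` structure, its field names, and the `passes_drb` function — is a hypothetical illustration of the summary above, not the Census Bureau's actual implementation:

```python
# Hypothetical sketch of the summarized DRB rules; the data structure
# and names are illustrative, not the Census Bureau's.
from dataclasses import dataclass

@dataclass
class Table:
    unweighted_cases: int           # sample cases in the geographic area
    independent_lines: int          # number of independent table lines
    is_selected_population_profile: bool = False
    crosses_noncurrent_geography: bool = False
    is_mean_or_aggregate: bool = False
    is_block_group: bool = False
    sensitive_topic: bool = False

def passes_drb(t: Table) -> bool:
    """Return True if the table clears the summarized DRB thresholds."""
    # Selected population profiles need at least 50 cases.
    if t.is_selected_population_profile and t.unweighted_cases < 50:
        return False
    # Geography other than current residence crossed with other
    # characteristics needs at least 40 cases.
    if t.crosses_noncurrent_geography and t.unweighted_cases < 40:
        return False
    # Means and aggregates must rest on zero cases or three or more.
    if t.is_mean_or_aggregate and t.unweighted_cases in (1, 2):
        return False
    # Block groups: no tables over 100 independent lines, and no
    # sensitive-topic tables regardless of length.
    if t.is_block_group and (t.independent_lines > 100 or t.sensitive_topic):
        return False
    return True
```

In practice, as noted above, a failed check leads the ACS Office to suppress the whole table rather than perform cell-level complementary suppression.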

The filtering rules are applied at the table level rather than at the cell level, so for a given geographic area either the whole table is published or the whole table is filtered out. As a result, more data are filtered out than necessary on either data quality or confidentiality grounds. This practice simplifies the production process. It is also possible that the ACS Office assumes the outcome is more convenient for users, who might otherwise encounter suppressed cells scattered across many of the tables they are attempting to use. However, a systematic evaluation of how the data products are used has not been conducted, so it is unclear whether that assumption holds for all users or even for a majority of them.

It is likely that the current suppression practices driven by concerns about the precision of the estimates are unduly limiting the analyses that can be conducted and therefore the usefulness of the data. Although the precision concerns are certainly valid, if users were provided with adequate information about data quality, they could decide for themselves whether the data are suitable for their specific analytic needs. Indeed, as discussed above, in many cases the coefficient of variation is a poor yardstick for the usefulness of the estimates. Even if in many cases users would conclude that otherwise-suppressed data are not suitable, there are likely to be situations in which these data would be useful. In addition, access to currently suppressed data may enable and encourage users to develop new methodologies that result in more accurate 1- and 3-year estimates, for example, by combining estimates from these files with information from other data sources. Although making additional data available to users would require a small redesign of some of the dissemination systems and processes, the change would increase the usefulness of the survey to data users.
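To make the precision discussion concrete, the coefficient of variation (CV) can be derived directly from a published margin of error. The snippet below is an illustrative helper, not a Census Bureau tool; it assumes the standard ACS convention that margins of error are published at the 90 percent confidence level (z ≈ 1.645):

```python
# Illustrative only: derive the standard error and coefficient of
# variation from a published ACS estimate and its margin of error.
# ACS MOEs are published at the 90 percent confidence level.
def cv_from_moe(estimate: float, moe_90: float) -> float:
    standard_error = moe_90 / 1.645
    return standard_error / estimate

# An estimate of 1,000 with a +/-500 margin of error has a CV of
# about 0.304 -- a standard error roughly 30 percent of the estimate,
# large, yet possibly still usable for some analytic purposes.
print(round(cv_from_moe(1000, 500), 3))  # 0.304
```

Publishing such estimates alongside their CVs, rather than suppressing them, would let users make exactly this kind of suitability judgment themselves.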

RECOMMENDATION 21: The Census Bureau should revise the suppression practices for the American Community Survey: rather than suppressing data due to concerns about lack of precision, users should be provided with access to all data that pass confidentiality review. The Census Bureau will have to be proactive about user education and provide adequate information about the precision of the data to enable users to decide whether the data are suitable for use to meet their specific analytic needs.

As discussed above, the data releases are also subject to population thresholds that were developed when the ACS was first conceptualized (65,000 or more for 1-year estimates and 20,000 or more for 3-year estimates). Reexamining these thresholds with current sample sizes and data needs could also point to potential ways of increasing the usefulness and reach of the data.


RECOMMENDATION 22: The Census Bureau should evaluate whether the data release population thresholds of 65,000 or more for 1-year estimates and 20,000 or more for 3-year estimates are still optimal for the American Community Survey. This question should be revisited periodically.

Production Burden

It is important to note that although the range of data products, access, and analysis options does not necessarily meet all data user needs, producing the literally billions of “cells” is a very large undertaking. As detailed above, the ACS data products were initially designed to facilitate comparison with data products from the census long-form survey. After the first 1-year release, the products for the 3- and 5-year releases were added based on the same general model. With each new dataset, the volume of data products and the associated resource-intensive production activities grew.

Table 5-6 shows the timeline of the main production and dissemination activities over the course of a typical year. The overlapping activities naturally stretch the Census Bureau’s capacity, which raises concerns about the potential for errors to be introduced.

TABLE 5-6 Data Production Timeline and Activities

September: release 1-year data; conduct 3-year data AFF-UAT; prepare 3-year data release; conduct reasonableness review for 5-year data products; produce 3-year PUMS files and start verification.
October: release 3-year data; release 1-year PUMS files; conduct 5-year AFF-UAT.
November: prepare 5-year data release; produce 5-year PUMS files and start verification.
December: release 5-year data; release 3-year PUMS files; submit product changes for next data year.
January: release 5-year PUMS files; release 1-, 3-, and 5-year PRCS in Spanish.

NOTES: AFF-UAT, American FactFinder User Acceptance Testing; PRCS, Puerto Rico Community Survey; PUMS, Public Use Microdata Sample.

Because a thorough evaluation of how the data products are used has never been conducted, there is little information about which products are most useful and which may not be used at all. It is possible that the approach of publishing a very large number of tables for all possible combinations of variables that could be of interest is becoming increasingly obsolete for many users. Now that all of the ACS data products have been published for at least a few consecutive years, a formal evaluation of how the data products are used would provide information about which products are most useful, what might be missing, and whether there are any that could be dropped in order to reallocate resources to meet unfulfilled data user needs.

A mechanism for ongoing feedback from a data user group and subject-matter specialists is essential because what is useful could change as the ACS evolves and as policy needs change. Information about usage patterns on the Census Bureau’s website and download statistics could also be useful in determining whether there are tables that are rarely used. Secondary distributors of the data (such as the Inter-university Consortium for Political and Social Research, the Population Reference Bureau, and the Minnesota Population Center) can also provide insight into what products are used most frequently.

RECOMMENDATION 23: The Census Bureau should evaluate whether the current range of tables produced provides optimal value to data users and whether the table production could be limited to a core set in order to allocate resources for other projects.

User-Defined Estimates

As discussed in Chapter 4, the margins of error (MoEs) associated with many of the estimates for small areas and groups can be very large, which often makes the data unusable at the local level, even after 5 years of aggregation. The Census Bureau’s general guidance on this matter is to combine estimates across geographic areas or population subgroups to improve precision. However, the Census Bureau does not provide a tool to facilitate data aggregation, and even experienced data users struggle with calculating the MoEs for the aggregated estimates. While the instructions made available for performing the calculations are useful (U.S. Census Bureau, 2009), they are not always straightforward to implement, and the process is tedious and prone to error.
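The Census Bureau’s published approximation for the MoE of a sum of estimates is the square root of the sum of the squared MoEs. A minimal sketch of that calculation — the function name and example numbers are illustrative — shows both why the arithmetic is tedious by hand and why it is only approximate:

```python
import math

def aggregate(estimates, moes):
    """Combine ACS estimates across areas or subgroups.

    Uses the Census Bureau's approximation for the margin of error of
    a sum: sqrt of the sum of squared component MOEs. This ignores
    covariance between components, so it can misstate the true MOE.
    """
    total = sum(estimates)
    moe = math.sqrt(sum(m * m for m in moes))
    return total, moe

# Hypothetical example: three small areas with estimates 120, 250, 90
# and MOEs 30, 40, 50 combine to 460 with an approximate MOE of ~70.7.
total, moe = aggregate([120, 250, 90], [30, 40, 50])
```

Because the approximation degrades when many overlapping estimates are combined, a tool that computed standard errors directly from replicate weights would be more accurate than this formula as well as less error prone.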

On the basis of information from data users, a web-based tool is needed to facilitate the construction of ACS tables for specified nonstandard combinations of geographic areas, possibly collapsing levels of variables in nonstandard ways, and, most importantly, to facilitate the calculation of the standard errors for the new estimates. Enabling data users to perform geographic aggregations or collapse categories on the 5-year data quickly and efficiently would greatly improve the ability of the ACS to meet the small-area data needs of many users.

One option the Census Bureau could pursue for implementing this type of system would be to use the existing 5-year disclosure-reviewed tables as building blocks and perhaps integrate them with the existing dissemination modes, such as American FactFinder. Another option would be to design a more advanced system that relies on the microdata. Given that a replication method is used for variance estimation, the system would have to work with data that preserved replicate information to make the variance estimation possible.
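The replicate approach the ACS uses (successive differences replication with 80 replicate weights) yields a direct variance formula: (4/80) times the sum over replicates of the squared difference between each replicate estimate and the full-sample estimate. The sketch below applies that documented formula to synthetic numbers with only four replicates, purely for illustration:

```python
def replicate_variance(full_estimate, replicate_estimates):
    """Direct variance estimate from replicate estimates.

    The ACS publishes 80 replicate weights (successive differences
    replication); the variance of an estimate theta is
    (4 / 80) * sum over replicates r of (theta_r - theta)^2.
    The synthetic example below uses 4 replicates instead of 80.
    """
    k = len(replicate_estimates)  # 80 for actual ACS data
    return (4.0 / k) * sum((r - full_estimate) ** 2
                           for r in replicate_estimates)

# Synthetic example: full-sample estimate 100, four replicate
# estimates 99, 101, 98, 102 give a variance of (4/4)*(1+1+4+4) = 10.
variance = replicate_variance(100.0, [99.0, 101.0, 98.0, 102.0])
```

This is why a microdata-backed system would need to carry the replicate information through to the user-defined estimate: without the replicates, only the approximate square-root-of-sum-of-squares shortcut is available.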

As noted above, microdata access is currently available primarily in the form of PUMS files, which allow researchers to conduct custom analyses, but only for predefined Public Use Microdata Areas (PUMAs) with populations of at least 100,000 or for larger geographic areas, such as states or the nation. In recent years, a small group of researchers at the Census Bureau has been working on a new dissemination tool that would give data users the ability to conduct analyses on the microdata without access to the underlying data files. This Microdata Analysis System (MAS) is in its early design stages, but its basic features are based on the Advanced Query System (AQS), which was part of the American FactFinder for a limited time after the 2000 census. The AQS was not widely advertised because of concerns that it would overload the Census Bureau’s servers at the time. When a Census Bureau contract with IBM ended, the AQS was also terminated.

According to the current plans, the MAS would be integrated with the existing DataFerrett tool. The first iteration would enable users to generate special tabulations for nonstandard geographies or for phenomena that occur with low frequencies, possibly at the subcounty level, as long as the data have passed confidentiality disclosure review. The second iteration of the MAS would enable Census Bureau staff to run statistical models, and the third could enable external users to run some statistical models as well. It is unclear what the parameters for this type of analysis would be, but as currently envisioned the plans appear fairly modest: it is assumed that researchers might still need to apply for access to a Research Data Center2 after perhaps testing their models using the MAS.

______________

2Research Data Centers are secure Census Bureau facilities where qualified researchers with approved projects receive restricted access to selected nonpublic Census Bureau data files.


The development of a robust query tool for nonstandard user-defined analysis deserves serious consideration as part of the Census Bureau’s approach to data dissemination. Enabling data users to perform geographic aggregations or to collapse categories on the 5-year data would greatly improve the utility of the ACS, and it is a priority from a data user perspective. A next step would be to investigate the possibility of integrating the 1- and 3-year data into a MAS-type system.

As discussed, the current approach to data products generates an astounding volume of tables, yet for the 1- and 3-year data, approximately 30 percent of these tables and close to 40 percent of the estimates are suppressed. This approach to dissemination is neither particularly efficient from the Census Bureau’s perspective nor satisfying from a user perspective. If the Census Bureau did not suppress data because of precision concerns (see Recommendation 21, above), then one option for making all of the data that pass confidentiality review available to users would be a query system.

In the long run, data users would benefit most from a query system that had more flexibility for performing analyses based on the underlying continuous data. One option for increasing the flexibility of the aggregations would be to examine the possibility of adding new, higher-level geographies, which could be larger than tracts but smaller than PUMAs, to address the need for generating estimates with higher levels of precision for reasonably small geographic areas. The Census Bureau could also consider adding additional features to a robust microdata analysis tool, such as regression analysis capability.

Data user needs would have to be systematically evaluated, but it appears that those who have limited experience with ACS products have difficulty navigating the many options to determine which data products best meet their needs, while more advanced users feel constrained by the limited flexibility and features associated with the current tables. The overarching goal of the query system would be to make typical computations with ACS data easy for users, whether these analyses are aggregated counts or regression models. A long-term strategy could involve limiting the production of tables to a core set of the most useful tables (see Recommendation 23, above) and relying on the query system to meet the needs of more sophisticated users.

The Census Bureau’s current efforts to develop the MAS are being spearheaded by the Center for Disclosure Avoidance Research and Data Web and Applications staff. To maximize the value of such a system to ACS data users, the ACS staff would need to take the lead in developing the query system, working in collaboration with other Census Bureau offices, including subject-matter experts. The active involvement of a data users’ working group, State Data Centers,3 and experts in user interface design would also be essential in developing the specifications for the system.

The panel acknowledges that the development of these tools would require a substantial investment of resources, but for data users, these types of features appear to be the most valuable. These dissemination tools can also have long-term payoff for the Census Bureau, not only in terms of stakeholder satisfaction, but possibly also in the form of increased use of the data.

RECOMMENDATION 24: As a priority, the Census Bureau should develop a tool that enables data users to aggregate geographies and collapse categories, as well as to calculate the standard errors for the new estimates. To support a greater range of analyses, a microdata access system with additional capabilities should also be considered. The American Community Survey (ACS) Office should take the lead in developing these tools, working in collaboration with other Census Bureau offices. The Census Bureau should also involve a working group of ACS data users, State Data Centers, and user interface experts from the early stages of the process.

______________

3State Data Centers are partnerships between states and the Census Bureau created to make data available locally to the public through a network of state agencies, universities, libraries, and regional and local governments.
