Appendix F: Methods for Treating Missing Data
Pages 433-454

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 433...
... item nonresponse, where a household provides responses for some but not all items on the questionnaire, and
From page 434...
... It is generally the case that the data that one receives from respondents differ distributionally from the data that would have been provided by nonrespondents. This is why so-called complete-case analysis is problematic, since the restriction of the analysis to those cases that have a complete response fails to adequately represent the contribution from those that have missing data.
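To illustrate the point in this excerpt, the following minimal Python sketch (the variable names and the response model are illustrative assumptions, not taken from the report) simulates nonresponse that depends on an observed covariate and shows the complete-case mean drifting away from the full-data mean:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000
    x = rng.normal(0.0, 1.0, n)                   # covariate observed for all
    y = 2.0 + 1.5 * x + rng.normal(0.0, 1.0, n)   # item subject to nonresponse

    # Nonresponse depends on x: high-x households respond less often,
    # so respondents and nonrespondents differ distributionally in y.
    respond = rng.random(n) < 1.0 / (1.0 + np.exp(x))

    print("full-data mean of y:    ", y.mean())           # about 2.0
    print("complete-case mean of y:", y[respond].mean())  # noticeably lower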
From page 435...
... low. In other words, improper treatment of missing data in the estimation of variances will result in statistics that are represented
From page 436...
... into three categories: (1) missing data are missing completely at random; in this case the indicator variable for whether or not a response is provided
From page 437...
... not result in a statistical bias for assessment of average commuting distance. On the other hand, if these missing values are missing at random but are not missing completely at random (in particular, that conditional on income, nonresponse for commuting distance is independent of other items on the census questionnaire), then using a hot-deck procedure to substitute random commuting distances from other respondents with the same income level would
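The within-class hot deck described here can be sketched as follows. This is a minimal illustration of a random hot deck, assuming income has already been grouped into classes; the data and class labels are hypothetical, not the Census Bureau's:

    import numpy as np

    rng = np.random.default_rng(1)

    income_class = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
    distance = np.array([5.0, 7.0, np.nan, 12.0, np.nan, 15.0,
                         30.0, 28.0, np.nan, 33.0])

    imputed = distance.copy()
    for c in np.unique(income_class):
        in_class = income_class == c
        donors = distance[in_class & ~np.isnan(distance)]  # responding cases
        holes = in_class & np.isnan(distance)              # missing cases
        # Substitute a randomly drawn donor value from the same income
        # class for each missing commuting distance.
        imputed[holes] = rng.choice(donors, size=holes.sum())

    print(imputed)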
From page 438...
... In addition, failure to correctly model the nonresponse, say by using the assumption that the missing data are missing completely at random when they are only missing at random, or by conditioning on the wrong variables when assuming that missing values are missing at random, not only causes bias in the estimates, but also causes bias in estimates of the variance.

F.1.b Implementation Considerations

A key constraint concerning treatment of missing values in the decennial census is that the Census Bureau needs to provide data products (population counts, cross-tabulations, averages, and public use microdata samples)
From page 439...
... In addition to the multivariate aspect of missing data, especially in the long-form sample, there are computational considerations that have historically helped to select the missing data treatments.
From page 440...
... to weight long-form-sample population counts to complete population counts as a variance reduction technique that also treats unit nonresponse. With respect to item nonresponse, the Census Bureau makes use of sequential hot-deck imputation.
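The report does not reproduce the Bureau's code; the following is a minimal sketch of the general idea of a sequential hot deck, in which records are processed in file order, each respondent's value updates a running "deck" for its imputation class, and a nonrespondent receives the current deck value for its class (the cold-deck starting values and class labels here are hypothetical):

    # Illustrative sketch, not the Census Bureau's implementation.
    def sequential_hot_deck(records, cold_deck):
        """records: list of (imputation_class, value-or-None) in file order.
        cold_deck: starting value per class, used until a donor appears."""
        deck = dict(cold_deck)
        out = []
        for cls, value in records:
            if value is None:
                value = deck[cls]        # impute most recent donor value
            else:
                deck[cls] = value        # respondent becomes the new donor
            out.append((cls, value))
        return out

    records = [("low", 5.0), ("low", None), ("high", None),
               ("high", 31.0), ("low", 6.5), ("high", None)]
    print(sequential_hot_deck(records, cold_deck={"low": 4.0, "high": 25.0}))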
From page 441...
... The U.S. Census Bureau's sequential hot-deck imputation methodology is somewhat related
From page 442...
... other summary statistics, such as means, appropriately represent the contribution to variance from unit nonresponse, assuming the missing-at-random assumption is reasonable. However, no data products from the decennial census long-form sample include an estimate of the contribution to variance from item nonresponse.
From page 443...
... for multivariate dependence through use of the matching variables and the various imputation rules.
From page 444...
... Finally, the most important deficiency with the current method that the Census Bureau uses to address item nonresponse is the failure to represent this nonresponse in its estimates of the variance of its data products. This failure is probably negligible for much complete-count-based statistical output, but will not be negligible for many long-form-sample-based statistics nationally, and for many other long-form-sample statistics for local areas or subpopulations.
From page 445...
... be relatively easy to implement is that of fractionally weighted imputation, proposed by Fay (1996), in which for some small integer m, m matching donors provide imputations, which are averaged, each receiving weight 1/m.
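As a rough illustration of this idea, the sketch below implements fractionally weighted imputation under the simplifying assumption that the m donors for a missing case are the m nearest respondents on a single matching variable; it is a schematic version of the scheme, not Fay's exact estimator:

    import numpy as np

    def fractional_impute(match_var, y, m=3):
        miss = np.isnan(y)
        donors_x, donors_y = match_var[~miss], y[~miss]
        out = y.copy()
        for i in np.where(miss)[0]:
            # The m closest respondents on the matching variable act as
            # donors; each donor value receives weight 1/m in the average.
            nearest = np.argsort(np.abs(donors_x - match_var[i]))[:m]
            out[i] = donors_y[nearest].mean()
        return out

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
    y = np.array([10.0, np.nan, 14.0, np.nan, 18.0, 20.0])
    print(fractional_impute(x, y, m=2))   # [10. 12. 14. 16. 18. 20.]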
From page 446...
... With a proximate matched donor, the imputation is likely to have minimal bias (unless the matching variables are poorly chosen), but it can have a substantial variance. The reason for the opportunity for additional bias with the model-based approach is that a model can provide poor imputations not only if the covariates are chosen poorly (in the same way as if matching variables are chosen poorly)
From page 447...
... Thibaudeau's framework also provides immediate variance estimation through use of the posterior predictive distribution from his model, which also provides a basis for evaluating his procedure in comparison with sequential hot-deck imputation. The entire fitting process took 12 hours on a computer that can currently be emulated by many standard desktop computers; however, the population size of the dress rehearsal was only one-two thousandths that of the United
From page 448...
... Then the EM algorithm can be informally described, for well-behaved data distributions, as an iterative application of the following two steps, continuing until convergence:

E-Step: Fill in missing portions of the sufficient statistics due to the missing data with their expectation given the observed data and the current estimated values for the parameters of the data distribution.

M-Step: Using these estimated sufficient statistics, carry out a standard maximum-likelihood calculation to update the estimates of the parameters of the data distribution.

So, very crudely, the parameters help one to identify the contribution of missing data to the sufficient statistics, and then the parameters are reestimated given the updated estimated sufficient statistics.
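A minimal worked example of these two steps, for a bivariate normal distribution with one variable fully observed and the other missing at random given the first, might look as follows (the data, starting values, and fixed iteration count are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    n = 5000
    x = rng.normal(0.0, 1.0, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
    miss = rng.random(n) < 1.0 / (1.0 + np.exp(-x))  # MAR: depends on observed x
    y[miss] = np.nan
    obs = ~miss

    mu_x, var_x = x.mean(), x.var()                  # x is fully observed
    mu_y, var_y = y[obs].mean(), y[obs].var()        # crude (biased) start values
    cov_xy = np.cov(x[obs], y[obs])[0, 1]

    for _ in range(100):
        # E-step: expected sufficient statistics for the missing y's.
        beta = cov_xy / var_x                        # slope of y on x
        resid_var = var_y - beta * cov_xy            # conditional Var(y | x)
        ey = np.where(obs, y, mu_y + beta * (x - mu_x))       # E[y_i | x_i]
        ey2 = np.where(obs, y**2, ey**2 + resid_var)          # E[y_i^2 | x_i]
        exy = x * ey                                          # E[x_i y_i | x_i]
        # M-step: maximum-likelihood update from the filled-in statistics.
        mu_y = ey.mean()
        var_y = ey2.mean() - mu_y**2
        cov_xy = exy.mean() - mu_x * mu_y

    print(mu_y, var_y, cov_xy)   # approaches the true values 1.0, 5.0, 2.0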
From page 449...
... (2001) propose repeating the process of randomly drawing from the posterior predictive distribution for the missing values to form multiple imputations for variance estimation.
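The sketch below illustrates that idea in its generic form: draw each missing value repeatedly from an (approximate) predictive distribution, compute the estimate on each completed data set, and combine the results with Rubin's rules to obtain a variance estimate that reflects the missing data. It is an illustration of the general approach, not the cited authors' procedure; a fully proper implementation would also draw the imputation-model parameters from their posterior rather than fixing them at point estimates:

    import numpy as np

    rng = np.random.default_rng(3)
    n, m = 2000, 10
    x = rng.normal(0.0, 1.0, n)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)
    miss = rng.random(n) < 0.3
    x_obs, y_obs = x[~miss], y[~miss]

    # Fit a simple imputation model y = a + b*x + e on the respondents.
    b, a = np.polyfit(x_obs, y_obs, 1)
    sigma = np.std(y_obs - (a + b * x_obs))

    estimates, variances = [], []
    for _ in range(m):
        y_imp = y.copy()
        # Draw from the approximate predictive distribution rather than
        # plugging in the mean prediction, so imputations carry noise.
        y_imp[miss] = a + b * x[miss] + rng.normal(0.0, sigma, miss.sum())
        estimates.append(y_imp.mean())
        variances.append(y_imp.var(ddof=1) / n)   # within-imputation variance

    qbar = np.mean(estimates)                     # combined point estimate
    w = np.mean(variances)                        # average within variance
    bvar = np.var(estimates, ddof=1)              # between-imputation variance
    total = w + (1 + 1 / m) * bvar                # Rubin's total variance
    print(qbar, total)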
From page 450...
... the wide acceptance by users of acquiring data sets electronically rather than in paper version reduce the relevance of this criticism. Another objection to the use of multiple imputation was raised
From page 451...
... Given the degree of item nonresponse in the 2000 census long-form sample, the issue of variance estimation to incorporate the variance due to missing values is the key missing value problem facing the Census Bureau heading into either the implementation of a 2010 long-form sample or the ACS. As discussed
From page 452...
... examine the quality of the resulting imputations in comparison to the values omitted. Second, to see whether the mechanism for census long-form-item nonresponse is ignorable, use reinterview studies and
From page 453...
... PUMS for the same purpose. Finally, with respect to shifting long-form-type data collection from the decennial census to the ACS, it is useful to point out that the missing data problem is somewhat different for that survey.
From page 454...
... The Census Bureau distinguishes between "imputation" (or "allocation" in the Bureau's terminology)

