The National Academies Press

7 Statistical Analysis of Observational Data
Pages 118-161

From page 118... ... These decompositions are basically descriptive but are nevertheless an important tool for understanding what factors are related to observed differences as well as for measuring the magnitude of racial differences. In the next section, we continue with an outline of the fundamental issues that must be addressed to draw causal inferences about racial discrimination from statistical analyses of observational data.
From page 119... ... This mathematical presentation is necessary to make clear what statistical decompositions of racial differences measure. It is also needed for precision regarding the role of models as descriptions of the ways in which outcomes are determined in the presence of discrimination, the role of models and assumptions in drawing causal inferences regarding discrimination from observational data, the nature of the biases that arise when those assumptions are violated, and the ways in which alternative study designs can reduce those biases.
From page 120... ... Overall, the lack of credence given by courts to statistical evidence and the complexities of drawing inferences about racial discrimination from such data appear to be detrimental to plaintiffs. In both periods examined by Nelson and Bennett, plaintiffs lost to defendants by more than three to one, and it is becoming increasingly difficult for plaintiffs to convince courts that their claims are valid.
From page 121... ... is the contribution of group differences in the observed characteristics X to the race gap in Y. For example, studies of the wage gap
From page 122...
Race-Specific Regression Models
The above description covers a great deal of early research on discrimination that served as the basis for further work on measuring discrimination and explaining racial differences in a variety of social, political, and economic outcomes.¹ For the past 30 years, however, researchers have used a more general statistical model of such differences (or gaps)
From page 123... ... is sometimes referred to as the "share due to discrimination." This is misleading terminology, however, because if any important control variables are omitted, one or more of the coefficients, including the intercept, will be affected. The second component therefore captures both the
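As a minimal sketch of the two-component split described above, assuming a single covariate (years of schooling) and using group 1's coefficients as the reference structure, the gap in mean outcomes divides exactly into an explained part (the group difference in mean characteristics times group 1's slope) and a remainder made up of intercept and slope differences. The data and the function names `simple_ols` and `twofold_decomposition` are hypothetical illustrations, not the report's.

```python
from statistics import mean

def simple_ols(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    xbar, ybar = mean(x), mean(y)
    slope = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / sum(
        (a - xbar) ** 2 for a in x
    )
    return ybar - slope * xbar, slope

def twofold_decomposition(x1, y1, x2, y2):
    """Split mean(y1) - mean(y2) into explained and unexplained parts,
    using group 1's coefficients as the reference structure."""
    a1, b1 = simple_ols(x1, y1)
    a2, b2 = simple_ols(x2, y2)
    explained = (mean(x1) - mean(x2)) * b1          # differences in characteristics
    unexplained = (a1 - a2) + mean(x2) * (b1 - b2)  # intercept and slope differences
    return explained, unexplained

# Invented data: years of schooling and log wages for two groups.
x1, y1 = [12, 14, 16, 18], [2.5, 2.7, 3.0, 3.2]
x2, y2 = [10, 12, 14, 16], [2.2, 2.4, 2.5, 2.8]
explained, unexplained = twofold_decomposition(x1, y1, x2, y2)
print(round(explained, 3), round(unexplained, 3))
```

With these invented numbers the explained part is 0.24 and the unexplained part 0.135, summing to the 0.375 gap. As the text warns, the second number absorbs any omitted variables and should not be read as the share due to discrimination.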
From page 124... ... the effects of discrimination. On the other hand, omitted variables that are correlated with X will influence the coefficients , potentially caus
From page 125... ... That is, one cannot distinguish the contribution to the overall unexplained gap of racial differences in the coefficients on region of the country from the contribution of racial differences in the coefficients on city size. See Jones (1983)
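The identification problem noted above can be seen with a single categorical covariate. In a model containing only category dummies, each group's intercept equals its mean in the omitted (reference) category, so how the unexplained gap splits between "intercept" and "coefficients" depends on which category the researcher happens to omit, while the total does not. A minimal sketch, assuming three hypothetical regions with invented shares and mean outcomes; `dummy_decomposition` is an illustrative name.

```python
def dummy_decomposition(shares1, means1, shares2, means2, ref):
    """Saturated dummy-variable model for one categorical covariate.

    Intercept = group's mean in the reference category; each dummy coefficient =
    category mean minus reference mean. Returns (explained, intercept_part,
    coefficient_part), using group 1's coefficients as the reference structure.
    """
    a1, a2 = means1[ref], means2[ref]
    others = [k for k in shares1 if k != ref]
    b1 = {k: means1[k] - a1 for k in others}
    b2 = {k: means2[k] - a2 for k in others}
    explained = sum((shares1[k] - shares2[k]) * b1[k] for k in others)
    intercept_part = a1 - a2
    coefficient_part = sum(shares2[k] * (b1[k] - b2[k]) for k in others)
    return explained, intercept_part, coefficient_part

# Invented regional shares and mean outcomes for two groups.
shares1 = {"north": 0.5, "south": 0.3, "west": 0.2}
means1 = {"north": 3.0, "south": 2.8, "west": 2.9}
shares2 = {"north": 0.3, "south": 0.6, "west": 0.1}
means2 = {"north": 2.8, "south": 2.5, "west": 2.7}

gap = sum(shares1[k] * means1[k] for k in shares1) - sum(shares2[k] * means2[k] for k in shares2)
for ref in shares1:
    e, i, c = dummy_decomposition(shares1, means1, shares2, means2, ref)
    print(ref, round(e, 3), round(i, 3), round(c, 3), round(i + c, 3))
```

Here the intercept term is 0.2 when north is the omitted category and 0.3 when south is, yet the total unexplained gap is 0.26 under every choice.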
From page 126... ... The second term is the effect of changes over time in the coefficients for group 1, holding differences in observed characteristics fixed. Together, the first and second terms capture the change over time in the explained portion of the wage gap that would be expected given changes in the characteristics of the two groups and in the coefficients on those characteristics for whites in periods t and t.
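One way to write the over-time split sketched above: if the explained gap in a period is the group difference in mean characteristics times group 1's coefficients, its change equals a characteristics term (change in the differences, valued at later-period coefficients) plus a coefficients term (change in group 1's coefficients, with differences held at their earlier values). The ordering of periods in each term is one of several valid conventions and is an assumption here, as are all values and names.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def change_in_explained_gap(d0, d1, beta0, beta1):
    """Split dot(d1, beta1) - dot(d0, beta0) into two terms.

    d0, d1: group differences in mean characteristics, earlier/later period.
    beta0, beta1: group 1's coefficients, earlier/later period.
    """
    characteristics_term = dot([b - a for a, b in zip(d0, d1)], beta1)
    coefficients_term = dot(d0, [b - a for a, b in zip(beta0, beta1)])  # d held at d0
    return characteristics_term, coefficients_term

# Invented values for (schooling, experience).
d0, d1 = [1.5, 2.0], [1.0, 1.5]
beta0, beta1 = [0.06, 0.02], [0.09, 0.02]
chars, coefs = change_in_explained_gap(d0, d1, beta0, beta1)
print(round(chars, 3), round(coefs, 3))
```

In this invented case, narrowing characteristic gaps (-0.055) are largely offset by a rising return to schooling (+0.045), so the explained gap barely moves.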
From page 127... ... basic results indicates what one can learn from their type of analysis. Using data from the Current Population Survey, they find that, between 1979 and 1987, changes in levels of education and experience reduced the black-white wage gap (in logs)
From page 128... ... (2002) standardize for the effects of income by reweighting the sample of whites to have the same income distribution as the sample of blacks.²
Summary: Decomposition and Residual "Effects" as Racial Discrimination
The use of multivariate regression and related techniques to decompose racial differences in some outcome of interest into a portion due to differences in the distribution of observed characteristics and a portion not explained by those characteristics is an essential tool for describing racial differences.
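A minimal sketch of the reweighting idea mentioned above, assuming income has been grouped into discrete bins: each white observation receives a weight equal to its bin's share among blacks divided by its bin's share among whites, so the weighted white sample has the black income distribution. The cited study's exact procedure is not given in this excerpt; the bins, the binary outcome, and `reweight_to_target` are hypothetical.

```python
from collections import Counter

def reweight_to_target(source_bins, target_bins):
    """Weight each source observation by (target share / source share) of its bin,
    so the weighted source bin distribution matches the target's.
    Assumes every target bin also occurs in the source (common support)."""
    src, tgt = Counter(source_bins), Counter(target_bins)
    n_src, n_tgt = len(source_bins), len(target_bins)
    return [(tgt[b] / n_tgt) / (src[b] / n_src) for b in source_bins]

# Invented data: income bins and a binary outcome for whites; income bins for blacks.
white_income = ["low", "mid", "mid", "high", "high", "high"]
white_outcome = [0, 1, 0, 1, 1, 1]
black_income = ["low", "low", "mid", "high"]

w = reweight_to_target(white_income, black_income)
reweighted_white_mean = sum(wi * yi for wi, yi in zip(w, white_outcome)) / sum(w)
print(round(sum(white_outcome) / len(white_outcome), 3), round(reweighted_white_mean, 3))
```

The unweighted white mean (about 0.667) falls to 0.375 once whites are given the black income distribution; the remaining difference from the black mean is the income-standardized gap.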
From page 129... ... Finally, we discuss two of the most important sources of bias in observational studies of discrimination: omitted variables bias and sample selection bias.
Developing Statistical Models
According to Sir Ronald Fisher, as quoted by Cochran (1965:252)
From page 130... ... Below we discuss some of the specific issues that must be addressed, in such models and in their assumptions, to draw causal inferences.
Example: Hiring Decisions in the Labor Market
In this section, we lay out a generic framework that underlies many statistical approaches to measuring discrimination.
From page 131... ... has the virtue of specifying precisely what the decision criteria would be for a rational firm seeking to hire the most productive candidates. In particular, a rational firm will base its hiring decisions on the expected productivity of the applicants, given the information it has, X1 and X2.
From page 132... ... We now turn to the hiring decision itself. For a rational, nondiscriminating firm, the hire probability is an increasing function of E(P | X1, X2)
From page 133... ... , it will enter with a coefficient of zero, and the researcher will typically find that for a nondiscriminating firm there is no evidence of a difference in the hiring rates of members of different racial groups who have the same values of X1 and X2. (To focus on the key ideas, we assume throughout this section that samples are large enough that we can ignore sampling error in estimates.)
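The hiring framework above can be sketched as follows, assuming (hypothetically) that E(P | X1, X2) is linear in X1 and X2 and that the hire probability is a logistic, hence strictly increasing, function of it. A nondiscriminating firm uses no race penalty, so within every (X1, X2) cell the hire probabilities of the two groups coincide; a firm that subtracts a penalty for R = 1 produces a gap in every cell. The coefficients, threshold, and function names are illustrative, and the comparison works only because the "researcher" here sees exactly the X1 and X2 the firm uses.

```python
import math

def expected_productivity(x1, x2):
    # Hypothetical linear form for E(P | X1, X2); the report does not specify one.
    return 0.5 * x1 + 0.3 * x2

def hire_probability(x1, x2, r, penalty=0.0, threshold=1.0):
    """Logistic (strictly increasing) function of expected productivity.
    With penalty = 0 (a nondiscriminating firm), R drops out entirely."""
    index = expected_productivity(x1, x2) - penalty * r - threshold
    return 1.0 / (1.0 + math.exp(-index))

def within_cell_gaps(penalty):
    """Hire-probability gap (R = 0 minus R = 1) in each (X1, X2) cell."""
    return [
        hire_probability(x1, x2, 0, penalty) - hire_probability(x1, x2, 1, penalty)
        for x1 in range(4)
        for x2 in range(4)
    ]

print(max(abs(g) for g in within_cell_gaps(0.0)))  # nondiscriminating firm
print(min(within_cell_gaps(0.5)))                  # discriminating firm
```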
From page 134... ... First, the researcher must have a solid understanding of how the firm would behave in the absence of discrimination. The above model assumes that a nondiscriminating firm would hire on the basis of expected productivity in the firm.
From page 135... ... What would the sources of such knowledge be? To continue with the hiring example, in some relatively rare situations the researcher may have deep knowledge of how hiring decisions are made and have access to nearly the same information as the firm (see the example in Box 7-3)
From page 136... ... For example, the authors did not have access to data reflecting candidates' performance during interpersonal interactions with screeners, either on the telephone or in person. However, the interviews with the screening personnel and the detailed data collected from the application forms allow for a much closer match between the statistical models used and the process under study than is typical in observational studies.
From page 137...
PROBLEMS WITH MEASURING DISCRIMINATION BY FITTING STATISTICAL MODELS TO OBSERVATIONAL DATA
In addition to the concept of manipulability discussed in Chapter 5, any causal analysis that fits multiple regression models to observational data must address several issues.
From page 138... ... Aggregating the data across departments, however, the college-wide admission rates are 0.59 for individuals from racial group I and 0.67 for individuals from racial ...
Omitted variables bias poses a serious problem for the large share of studies of racial differences that rely on surveys (e.g., the Current Population Survey or the decennial census long-form sample) containing only a limited set of the characteristics that may reasonably factor into the processes under study.
From page 139... ... More generally, this example demonstrates the complexity and subtlety of analyses of the presence of discrimination and the need to carefully scrutinize statistical models used for this purpose. ... that difference into a portion that reflects discrimination and a portion that reflects the association between race and omitted variables that also affect the outcome.
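How aggregating across departments can yield a racial comparison that differs from, or even reverses, the department-level comparisons can be reproduced with a few counts. The department-level figures below are invented for illustration and are not the report's data: group I has the higher admission rate in each department but the lower rate overall, because it applies mostly to the more selective department.

```python
# Invented counts: (applicants, admitted) by department and group.
counts = {
    "easy": {"I": (20, 18), "II": (80, 68)},
    "hard": {"I": (80, 32), "II": (20, 6)},
}

dept_rates = {
    dept: {g: admitted / applied for g, (applied, admitted) in groups.items()}
    for dept, groups in counts.items()
}
aggregate = {}
for g in ("I", "II"):
    applied = sum(counts[d][g][0] for d in counts)
    admitted = sum(counts[d][g][1] for d in counts)
    aggregate[g] = admitted / applied

print(dept_rates)  # group I has the higher rate in each department
print(aggregate)   # yet the lower rate once departments are pooled
```

Department here plays the role of a variable that must be controlled: leaving it out mixes departmental selectivity into the racial comparison.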
From page 140... ... The discussion below provides a foundation for possible solutions to the problem of omitted variables bias. Omitted variables affect the estimation of as follows.⁷ The researcher attempts to estimate by regressing y on X1 and R
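A small simulation makes the omitted-variables mechanism concrete. Under an invented data-generating process in which a variable X2 affects y and is correlated with R, regressing y on X1 and R alone gives a coefficient on R close to the true race effect plus (effect of X2 on y) times (coefficient on R from an auxiliary regression of X2 on X1 and R), the standard omitted-variable formula. All coefficients are hypothetical, and `ols` is a plain normal-equations helper written for this sketch, not a library function.

```python
import random

def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(rows[0])
    a = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    for col in range(k):
        piv = max(range(col, k), key=lambda i: abs(a[i][col]))
        a[col], a[piv] = a[piv], a[col]
        for i in range(k):
            if i != col:
                f = a[i][col] / a[col][col]
                a[i] = [x - f * z for x, z in zip(a[i], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical process (all coefficients invented):
#   y  = 1.0 + 0.5*X1 - 0.3*R + 1.0*X2 + e   (true race effect: -0.3)
#   X2 = 0.5*X1 + 0.8*R + u                  (X2 is correlated with R)
random.seed(0)
X1, R, X2, Y = [], [], [], []
for _ in range(20000):
    r = 1.0 if random.random() < 0.5 else 0.0
    x1 = random.gauss(0, 1)
    x2 = 0.5 * x1 + 0.8 * r + random.gauss(0, 1)
    X1.append(x1)
    R.append(r)
    X2.append(x2)
    Y.append(1.0 + 0.5 * x1 - 0.3 * r + 1.0 * x2 + random.gauss(0, 1))

short = ols([(1.0, a, b) for a, b in zip(X1, R)], Y)  # X2 omitted
full = ols([(1.0, a, b, c) for a, b, c in zip(X1, R, X2)], Y)
# Formula: short coefficient on R should be near -0.3 + 1.0 * 0.8 = 0.5.
print(round(short[2], 2), round(full[2], 2))
```

With these invented values, omitting X2 does not merely shrink the estimated race effect; it flips its sign.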
From page 141... ... One would need to know the effect of discrimination at the hiring stage on the racial mix of applicants.
POSSIBLE SOLUTIONS TO PROBLEMS OF USING STATISTICAL MODELS TO INFER DISCRIMINATION
There are many situations in the social sciences in which the researcher is confronted with an omitted variables problem that is parallel to that discussed above.
From page 142... ... Given our inability to manipulate race in observational studies, what can be done about omitted variables and sample selection bias?
From page 143... ... Consequently, the firm has nothing to gain from resorting to statistical discrimination on the basis of R. (See Chapter 4 for a discussion of statistical discrimination.)
From page 144... ... In other circumstances, productivity data do not solve the omitted variables problem. If the firm does not statistically discriminate and the special condition of equation (7.15)
From page 145... ... We want to emphasize that, even here, strong assumptions about how the hiring process operates must be made to infer the effect of race on hiring decisions. In Annex 7-1, we consider how productivity data can be used to detect adverse impact discrimination, which we define as adopting hiring criteria in ways that are not justified by productivity considerations and that are harmful to a minority group.
From page 146... ... As compared with multiple regression, matching methods reduce the risk of imposing an inappropriate functional form on the relationship between the outcome y and the observed covariates. Multiple regression models use all the data.
From page 147... ... However, these methods do not help with the key problems of omitted variables bias or sample selection bias because matching is performed on the basis of observed variables only.⁸ In the same spirit as matching, stratification on relevant variable(s) can also be used to achieve some measure of control over nonracial factors.
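A minimal sketch of exact matching (equivalently, stratification on every observed covariate combination): compare group means only within cells that share identical observed covariates, then average the within-cell gaps weighted by each cell's count of group-1 members. No functional form is imposed, but, as the text notes, only observed variables are balanced. The records, covariate labels, and `exact_matching_gap` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def exact_matching_gap(records):
    """Mean outcome gap (group 0 minus group 1) within cells of identical observed
    covariates, weighted by each cell's group-1 count. Cells lacking either group
    are dropped, so only covariate combinations present in both groups count."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for covariates, group, outcome in records:
        cells[covariates][group].append(outcome)
    total, weight = 0.0, 0
    for cell in cells.values():
        if cell[0] and cell[1]:
            total += len(cell[1]) * (mean(cell[0]) - mean(cell[1]))
            weight += len(cell[1])
    return total / weight

# Invented records: ((schooling, region), group, outcome). Outcomes depend only on
# the observed covariates, so the matched gap is zero while the raw gap is not.
records = [
    (("hs", "south"), 0, 2.0), (("hs", "south"), 1, 2.0), (("hs", "south"), 1, 2.0),
    (("college", "north"), 0, 3.0), (("college", "north"), 0, 3.0), (("college", "north"), 1, 3.0),
]
raw_gap = mean(y for _, g, y in records if g == 0) - mean(y for _, g, y in records if g == 1)
print(round(raw_gap, 3), exact_matching_gap(records))
```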
From page 148...
Natural Experiments
Another approach to addressing the problem of omitted variables and limited understanding of how a nondiscriminating firm would make decisions is to exploit so-called natural experiments.
From page 149... ... Because there is some degree of control, the assumptions made for natural experiments to support a causal inference need not be as strong as those required for uncontrolled observational studies; however, natural experiments fall short of randomized controlled experiments. (For more detail on natural experimental designs, see Campbell and Stanley, 1963; Meyer, 1995; and Shadish et al., 2002.)
From page 150... ... Basically, Goldin and Rouse used regression models to estimate whether an individual advanced from one round of auditions to the next and whether an individual was hired in the final round as a function of three things: (1) type of audition (blind versus not blind)
From page 151... ... The authors used multivariate regression models to implement a triple-differencing strategy to distinguish the effect of deregulation from fixed characteristics of states and wage and employment trends at the state level that happen to be correlated with deregulation. The strategy amounts to taking the difference between the growth in wages of bank and nonbank employees in states that undergo deregulation at a certain point in time and comparing it with the corresponding difference in wage growth rates for bank and nonbank employees in states that did not undergo deregulation at that point in time.
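The triple-differencing strategy described above reduces, in its simplest form, to arithmetic on group means: wage growth over time (first difference), bank minus nonbank (second), deregulating states minus comparison states (third). The study itself embedded this logic in regression models; the sketch below uses raw means only, and every number, key, and the name `triple_difference` is invented.

```python
def triple_difference(wage):
    """wage[(states, sector, period)] -> mean log wage (hypothetical keys)."""
    def growth(states, sector):  # 1st difference: after minus before
        return wage[(states, sector, "after")] - wage[(states, sector, "before")]

    def bank_minus_nonbank(states):  # 2nd difference: across sectors
        return growth(states, "bank") - growth(states, "nonbank")

    # 3rd difference: deregulating states minus states not deregulating then
    return bank_minus_nonbank("dereg") - bank_minus_nonbank("no_dereg")

# Invented mean log wages.
wage = {
    ("dereg", "bank", "before"): 2.50, ("dereg", "bank", "after"): 2.40,
    ("dereg", "nonbank", "before"): 2.30, ("dereg", "nonbank", "after"): 2.35,
    ("no_dereg", "bank", "before"): 2.52, ("no_dereg", "bank", "after"): 2.56,
    ("no_dereg", "nonbank", "before"): 2.31, ("no_dereg", "nonbank", "after"): 2.36,
}
print(round(triple_difference(wage), 3))
```

State-level trends shared by both sectors cancel in the second difference, and bank-sector trends shared by all states cancel in the third; here the invented estimate is -0.14.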
From page 152... ... provide some examples of natural experiments in the education domain that can be used in research to examine the effect of racial differences in educational inputs on relative outcomes. One type of natural experiment in the education domain looks at discriminatory educational policies and practices and assesses their effects on education outcomes.
From page 153... ... Holzer and Ludwig conclude that natural experiments are valuable tools for determining whether observed racial differences in inputs constitute racial discrimination and for measuring the effects of such differences.
From page 154... ... (See Holzer and Ludwig, 2003, on the use of natural experiments to study discrimination; see Shadish et al., 2002, and Meyer, 1995, for a general discussion of the strengths and weaknesses of these designs.)
Summary of Possible Solutions to Problems of Using Statistical Models to Infer Discrimination
It should be obvious that more accurate and complete data collection efforts are critical to reducing the key problem of omitted variables bias.
From page 155... ... The reason is that past discrimination led disadvantaged racial groups to be underrepresented among the pool of potential referrers, thus reducing the chances of attracting members of disadvantaged racial groups through referrals. To measure the effect of both past and current discrimination on current outcomes in this dynamic context, the researcher must model the effect of past discrimination on current X variables.
From page 156... ... However, if there is racial discrimination in the educational domain, controlling for education will understate the total effect of all racial discrimination in analyses of labor market discrimination alone. Developing and validating statistical models of these broader processes is one way to gain insight into the presence or absence of discrimination in these other areas.
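Why controlling for education understates the total effect can be shown with an invented, noise-free population. Suppose (hypothetically) that members of group R = 1 receive 1.5 fewer years of schooling because of educational discrimination and also face a direct wage penalty of 0.5, with each year of schooling worth 1.0. The raw gap is the total effect, -0.5 + (-1.5 x 1.0) = -2.0, while the education-adjusted gap recovers only the direct labor market penalty. The adjusted gap uses the pooled within-group slope, which equals the education coefficient in a regression of wages on education and a group dummy; names and values are illustrative.

```python
from statistics import mean

# Invented population: group -> (years of schooling, wages).
groups = {
    0: ([12.0, 14.0, 16.0], [22.0, 24.0, 26.0]),
    1: ([10.5, 12.5, 14.5], [20.0, 22.0, 24.0]),
}

def within_slope(groups):
    """Pooled within-group slope of wage on education; equals the education
    coefficient in an OLS regression of wage on education and a group dummy."""
    num = den = 0.0
    for educ, wage in groups.values():
        eb, wb = mean(educ), mean(wage)
        num += sum((e - eb) * (w - wb) for e, w in zip(educ, wage))
        den += sum((e - eb) ** 2 for e in educ)
    return num / den

raw_gap = mean(groups[1][1]) - mean(groups[0][1])
b = within_slope(groups)
controlled_gap = raw_gap - b * (mean(groups[1][0]) - mean(groups[0][0]))
print(raw_gap, controlled_gap)  # total effect vs. direct (labor market only) effect
```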
From page 157... ... If the firm obeys the law, it will not use the interaction variable in making decisions about hiring, and the interaction variable will not enter significantly into hiring decisions. (The interaction will show up in a productivity regression.)
From page 158... ... However, such decompositions using data sets with limited numbers of explanatory variables, such as the Current Population Survey or the decennial census, do not accurately measure the portion of those differences that is due to current discrimination. Matching and related techniques provide a useful alternative to race gap decompositions based on multivariate regression in some circumstances.
From page 159... ... Data on performance relevant to a particular domain, such as productivity in the labor market context or academic success in the educational arena, are extremely valuable in dealing with the problem of omitted variables bias, in permitting the testing of key assumptions of a statistical model, and in studying adverse impact discrimination (see Annex 7-1 below)
From page 160... ... Conclusion: Despite limitations, natural experiments (in which a legal change or some other change forces a reduction in, or the complete elimination of, discrimination against some groups) can provide useful data for measuring discrimination prior to the change and for groups not affected by the change. Recommendation 7.1.
From page 161... ... Unfortunately, even a noisy indicator of productivity is unavailable in most of the data sets used to study racial differences.