
Reference Guide on Multiple Regression--Daniel L. Rubinfeld
Pages 303-358

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 303...
... 322; 1. What evidence exists that the explanatory variable causes changes in the dependent variable?
From page 304...
... Specifying the regression model, 337; 2. Regression line, 337; C. Interpreting Regression Results, 339; D.
From page 305...
... Salary would be the dependent variable to be explained; the years of experience would be the explanatory variable. Multiple regression analysis is sometimes well suited to the analysis of data about competing theories for which there are several possible explanations for the relationships among a number of explanatory variables.³ Multiple regression typically uses a single dependent variable and several explanatory variables to assess the statistical data pertinent to these theories.
From page 306...
... in forecasting what a particular effect would be, but for an intervening event. In a patent infringement case, for example, a multiple regression analysis could be used to determine (1)
From page 307...
... . 9. Multiple regression analysis was used in suits charging that at-large areawide voting was instituted to neutralize black voting strength, in violation of section 2 of the Voting Rights Act, 42 U.S.C.
From page 308...
... [M]ultiple regression analyses, designed to determine the effect of several independent variables on a dependent variable, which in this case is hiring, are an accepted and common method of proving disparate treatment claims."¹⁴ However, the court affirmed the district court's findings that the "E.E.O.C.'s regression analyses did not 'accurately reflect Sears' complex, nondiscriminatory decision-making processes'" and that the "'E.E.O.C.'s statistical analyses [were] so flawed that they lack[ed]
From page 309...
... Kaye & David A. Freedman, Reference Guide on Statistics, Section V.B.3, in this manual.
From page 310...
... One must also look for empirical evidence that there is a causal relationship. Conversely, the fact that two variables are correlated does not guarantee the existence of a causal relationship; it could be that the model -- a characterization of the underlying causal theory -- does not reflect the correct interplay among the explanatory variables.
From page 311...
... A typical regression model will include one or more dependent variables, each of which is believed to be causally related to a series of explanatory variables. Because we cannot be certain that the explanatory variables are themselves unaffected by, or independent of, the influence of the dependent variable (at least at the point of initial study)
From page 312...
... 1. Choosing the dependent variable. The variable to be explained, the dependent variable, should be the appropriate variable for analyzing the question at issue.²⁶ Suppose, for example, that pay dis... 25. In the literature on natural and quasi-experiments, the explanatory variables are characterized as "treatments" and the dependent variable as the "outcome." For a review of natural experiments in the criminal justice arena, see David P
From page 313...
... Choosing the additional explanatory variables. An attempt should be made to identify additional known or hypothesized explanatory variables, some of which are measurable and may support alternative substantive hypotheses that can be accounted for by the regression analysis. Thus, in a discrimination case, a measure of the skills of the workers may provide an alternative explanation -- lower salaries may have been the result of inadequate skills.²⁹
From page 314...
... . 32. Technically, the omission of explanatory variables that are correlated with the variable of interest can cause biased estimates of regression parameters.
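The footnote's point can be checked with a minimal Python sketch on hypothetical data. The data have no noise by assumption. They are generated by y = 1 + 2x + 3z, where z (think of a skill measure) tends to rise with x. Regressing y on x alone yields a slope that absorbs part of z's effect. For least squares, the short-regression slope equals 2 + 3δ exactly, where δ is the slope of z on x in the same sample. The helper name `ols_slope` is illustrative only.

```python
from statistics import mean

def ols_slope(x, y):
    """Least squares slope of y on x in a simple regression with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical data: z (e.g., a skill measure) tends to rise with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
z = [0, 1, 1, 2, 3, 3, 4, 5]
# Assumed noise-free "true" model: y = 1 + 2*x + 3*z
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]

short_slope = ols_slope(x, y)  # y regressed on x alone, with z omitted
delta = ols_slope(x, z)        # slope of the omitted z on x in this sample
print(round(short_slope, 3), round(2 + 3 * delta, 3))  # prints 4.036 4.036
```

The bias disappears in two cases: when δ is 0 (z uncorrelated with x), or when z has no effect on y.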
From page 315...
... Suppose also that a regression analysis of the wage rate of employees (the dependent variable) on years of experience and a variable reflecting the sex of each employee (the explanatory variable)
From page 316...
... Note, however, that the inclusion of explanatory variables that are irrelevant (i.e., not correlated with the dependent variable) reduces the precision of the regression results.
From page 317...
... 1021 (1990), the defendant relied on a regression model in which a dummy variable reflecting gender appeared as an explanatory variable.
From page 318...
... for allegedly discriminating against female instructional staff in the payment of salaries. One approach of the plaintiff's expert was to use multiple regression analysis.
From page 319...
... In either analysis, the expert may want to evaluate a specific hypothesis, usually relating to a question of liability or to the determination of whether there is a measurable impact of an alleged violation. Thus, in a sex discrimination case, an expert may want to evaluate a null hypothesis of no discrimination against the alternative hypothesis that discrimination takes a par... 862 F
From page 320...
... 46. The t-test is strictly valid only if a number of important assumptions hold. However, for many regression models, the test is approximately valid if the sample size is sufficiently large.
From page 321...
... 48. The use of 1%, 5%, and, sometimes, 10% levels for determining statistical significance remains a subject of debate. One might argue, for example, that when regression analysis is used in a price-fixing antitrust case to test a relatively specific alternative to the null hypothesis (e.g., price fixing)
From page 322...
... In the multiple regression framework, the expert often assumes that changes in explanatory variables affect the dependent variable, but changes in the dependent variable do not affect the explanatory variables -- that is, there is no feedback.⁵⁰ In making this assumption, the expert draws the conclusion that a correlation between a covariate and the dependent outcome variable results from the effect of the former on the latter and not vice versa. Were the causality reversed, so that the outcome variable affected the covariate and not vice versa, spurious correlation would be likely to lead the expert and the trier of fact to the wrong conclusion.
From page 323...
... [Figure: Feedback; diagram labels: Demand, Price, Cost, Advertising]
As a general rule, there are no basic direct statistical tests for determining the direction of causality; rather, the expert, when asked, should be prepared to defend his or her assumption based on an understanding of the underlying behavior of, and the evidence relating to, the businesses or individuals involved.⁵² Although there is no single approach that is entirely suitable for estimating models when the dependent variable affects one or more explanatory variables, one possibility is for the expert to drop the questionable variable from the regression to determine whether the variable's exclusion makes a difference.
From page 324...
... It is essential in multiple regression analysis that the explanatory variable of interest not be correlated perfectly with one or more of the other explanatory variables. If there were perfect correlation between two variables, the expert could not separate out the effect of the variable of interest on the dependent variable from the effect of the other variable.
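A minimal sketch of why perfect correlation is fatal. Suppose the model includes an intercept, experience in years, and the same experience in months. Then the matrix X'X in the least squares normal equations is singular, so no unique set of coefficients exists. The `ols` helper below is a hypothetical plain-Python solver written for illustration (Gauss-Jordan elimination with partial pivoting), not a library function. The salary figures are made up.

```python
def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    tol = 1e-10 * max(abs(v) for row in A for v in row[:k])
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        if abs(A[p][c]) < tol:
            raise ValueError("X'X is singular: some regressors are perfectly collinear")
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

years = [1, 2, 3, 4, 5, 6]
months = [12 * t for t in years]   # the same information in different units
salary = [40, 43, 45, 49, 50, 54]  # hypothetical, in $1,000s

print(ols([[1, t] for t in years], salary))  # intercept and slope are identified
try:
    ols([[1, t, m] for t, m in zip(years, months)], salary)
    collinear = False
except ValueError as err:
    collinear = True
    print(err)
```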
From page 325...
... It is useful to view the cumulative effect of all of these sources of modeling error as being represented by an additional variable, the error term, in the multiple regression model. An important assumption in multiple regression analysis is that the error term and each of the explanatory variables are independent of each other.
From page 326...
... To the extent that large firms advertise more than small firms, the regression errors would be large for the large firms and small for the small firms. A third possibility is that the dependent variable varies at the individual level, but the explanatory variable of interest varies only at the level of a group.
From page 327...
... One generally useful diagnostic technique is to determine to what extent the estimated parameter changes as each data point in the regression analysis is dropped from the sample. An influential data point -- a point that causes the estimated parameter to change substantially -- should be studied further to determine whether mistakes were made in the use of the data or whether important explanatory variables were omitted.⁵⁸ 5. ...
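The drop-one diagnostic described above can be sketched in a few lines of Python. The data are hypothetical. For simplicity the sketch tracks the slope of a one-variable regression; in a multiple regression, one would refit the full model each time and track the coefficient of interest.

```python
from statistics import mean

def slope(x, y):
    """Least squares slope of y on x (simple regression with an intercept)."""
    mx, my = mean(x), mean(y)
    return (sum((a - mx) * (b - my) for a, b in zip(x, y))
            / sum((a - mx) ** 2 for a in x))

def drop_one_changes(x, y):
    """Change in the estimated slope when each observation is dropped in turn."""
    full = slope(x, y)
    return [slope(x[:i] + x[i + 1:], y[:i] + y[i + 1:]) - full
            for i in range(len(x))]

# Hypothetical data; the last point has an extreme x value and an off-trend y.
x = [1, 2, 3, 4, 5, 6, 20]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 7.0, 9.0]
changes = drop_one_changes(x, y)
most_influential = max(range(len(x)), key=lambda i: abs(changes[i]))
print(most_influential, round(changes[most_influential], 3))
```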
From page 328...
... Any individual with substantial training in and experience with multiple regression and other statistical methods may be qualified as an expert.⁶¹ A doctoral degree in a discipline that teaches theoretical or applied statistics, such as economics, history, and psychology, usually signifies to other scientists that the proposed expert meets this preliminary test of the qualification process. The decision to qualify an expert in regression analysis rests with the court.
From page 329...
... It will also be advantageous to minimize any ex parte contact with... 62. Judge Posner notes in In re High Fructose Corn Syrup Antitrust Litig., 295 F.3d 651, 665 (7th Cir. 2002), "the judge and jury can repose a degree of confidence in his testimony that it could not repose in that of a party's witness.
From page 330...
... In evaluating the admissibility of statistical evidence, courts should consider the following issues: 1. Has the expert provided sufficient information to replicate the multiple regression analysis?
From page 331...
... 4. A party proposing to offer an expert's regression analysis at trial should ask the expert to fully disclose (a)
From page 332...
... 6. The expert should report investigations into errors associated with the choice of variables and assumptions underlying the regression model.
From page 333...
... Often, visual displays are used to describe the relationship between variables that are used in multiple regression analysis. Figure 2 is a scatterplot that relates scores on a job aptitude test (shown on the x-axis)
From page 334...
... [Figure: No Correlation; axes: Job Aptitude Test Score, Job Performance Rating]
Multiple regression analysis goes beyond the calculation of correlations; it is a method in which a regression line is used to relate the average of one variable -- the dependent variable -- to the values of other explanatory variables.
From page 335...
... , a is the intercept of the line with the y-axis when X equals 0, and b is the slope -- the change in the dependent variable associated with a 1-unit change in the explanatory variable. In Figure 4, for example, when the aptitude test score is 0, the predicted (average)
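For the simple line $Y = a + bX$, least squares sets $b = \sum_i (X_i - \bar{X})(Y_i - \bar{Y}) / \sum_i (X_i - \bar{X})^2$ and $a = \bar{Y} - b\bar{X}$. Below is a minimal Python sketch with hypothetical aptitude scores and ratings, not the data behind the chapter's figures. It confirms the two readings given in the text: the prediction at X = 0 is a, and a 1-unit change in X changes the prediction by b.

```python
from statistics import mean

def fit_line(x, y):
    """Least squares intercept a and slope b for the line Y = a + bX."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx  # the fitted line passes through the point of means
    return a, b

# Hypothetical aptitude test scores (X) and job performance ratings (Y)
scores = [40, 50, 55, 60, 70, 80]
ratings = [3.1, 3.6, 3.8, 4.2, 4.5, 5.1]
a, b = fit_line(scores, ratings)

def predict(score):
    """Predicted (average) rating for a given test score."""
    return a + b * score

print(f"a = {a:.3f} (prediction at X = 0), b = {b:.4f} per test point")
```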
From page 336...
... B. Linear Regression Model. When there is an arbitrary number of explanatory variables, the linear regression model takes the following form: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots$
From page 337...
... 1. Specifying the regression model. Suppose an expert wants to analyze the salaries of women and men at a large publishing house to discover whether a difference in salaries between employees with similar years of work experience provides evidence of discrimination.⁷³ To begin with the simplest case, $Y$, the salary in dollars per year, represents the dependent variable to be explained, and $X_1$ represents the explanatory variable -- the number of years of experience of the employee.
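Here is a minimal sketch of this salary model extended with a 0/1 indicator for sex. It uses hypothetical employees and a salary rule with no noise, chosen only so the output is easy to check. The `ols` helper is a hypothetical plain-Python least squares solver (normal equations, Gauss-Jordan elimination), not a library API. The coefficient on the indicator estimates the salary difference between women and men with the same years of experience.

```python
def ols(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pivot row
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical employees: years of experience and a 0/1 indicator for female
experience = [2, 5, 7, 10, 3, 6, 8, 12]
female = [0, 0, 0, 0, 1, 1, 1, 1]
# Assumed rule generating the toy salaries (no noise):
# 30,000 + 1,500 per year of experience - 4,000 if female
salary = [30000 + 1500 * e - 4000 * f for e, f in zip(experience, female)]

X = [[1, e, f] for e, f in zip(experience, female)]  # leading 1 = intercept
b0, b1, b2 = ols(X, salary)
print(f"intercept {b0:.0f}, per year {b1:.0f}, female {b2:.0f}")
```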
From page 338...
... Similarly, the slope of the line measures the (average) change in the dependent variable associated with a unit increase in an explanatory variable; the slope $\beta_1$ also is shown.
From page 339...
... b. Nonlinearities. Nonlinear models account for the possibility that the effect of an explanatory variable on the dependent variable may vary in magnitude as the level of the explanatory variable changes.
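Take a quadratic specification such as $Y = \beta_0 + \beta_1 X + \beta_2 X^2$. The effect of a small change in $X$ is $\beta_1 + 2\beta_2 X$, which depends on the level of $X$. The coefficients below are hypothetical values chosen for illustration, not estimates from the text.

```python
# Hypothetical coefficients for a quadratic salary-experience profile:
# salary = B0 + B1 * years + B2 * years**2
B0, B1, B2 = 30_000.0, 2_000.0, -40.0

def marginal_effect(years):
    """Slope of the quadratic at a given experience level: B1 + 2*B2*years."""
    return B1 + 2 * B2 * years

for years in (0, 10, 20):
    print(years, marginal_effect(years))
# prints:
# 0 2000.0
# 10 1200.0
# 20 400.0
```

With a negative $\beta_2$, each additional year adds less to predicted salary as experience accumulates. A linear model would force this effect to be the same at every level.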
From page 340...
... D. Determining the Precision of the Regression Results. Least squares regression provides not only parameter estimates that indicate the direction and magnitude of the effect of a change in the explanatory variable on the dependent variable, but also an estimate of the reliability of the parameter estimates and a measure of the overall goodness of fit of the regression model.
From page 341...
... On this basis, the expert also could calculate the overall mean price of gasoline to be $1.25 and the standard deviation to be $0.04. Least squares regression generalizes this result by calculating means whose values depend on one or more explanatory variables.
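The link between least squares and the mean can be checked directly. A regression with only a constant term chooses the constant that minimizes the sum of squared deviations, and that constant is the sample mean. The prices in this sketch are hypothetical, not the data behind the figures quoted in the text.

```python
from statistics import mean, stdev

prices = [1.21, 1.24, 1.25, 1.30, 1.22, 1.28]  # hypothetical prices, in dollars

def sse(c):
    """Sum of squared deviations of the prices from a candidate constant c."""
    return sum((p - c) ** 2 for p in prices)

m = mean(prices)
s = stdev(prices)  # sample standard deviation (n - 1 in the denominator)
# An intercept-only least squares fit picks the c that minimizes sse(c): the mean.
assert all(sse(m) <= sse(m + d) for d in (-0.01, -0.001, 0.001, 0.01))
print(f"mean {m:.3f}, standard deviation {s:.3f}")
```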
From page 342...
... For relatively large samples (often, thirty or more data points will be sufficient for regressions with a small number of explanatory variables), the probability that the estimate of a parameter lies within an interval of 2 standard errors around the true parameter is approximately .95, or 95%.
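Below is a small Monte Carlo sketch of the 95% statement. It rests on assumptions not in the text: a simple regression with normally distributed errors, 40 observations, and 2,000 simulated samples (all settings hypothetical). For each sample, the sketch checks whether the estimated slope falls within 2 of its standard errors of the true slope.

```python
import random
from math import sqrt
from statistics import mean

def slope_and_se(x, y):
    """OLS slope of y on x and its conventional standard error."""
    n, mx, my = len(x), mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))  # residual sum of squares
    return b, sqrt(ssr / (n - 2) / sxx)

random.seed(1)
TRUE_SLOPE, N, REPS = 0.5, 40, 2000  # hypothetical simulation settings
x = [i / 4 for i in range(N)]
hits = 0
for _ in range(REPS):
    # Simulated sample: y = 1 + 0.5x + standard normal error
    y = [1.0 + TRUE_SLOPE * xi + random.gauss(0, 1) for xi in x]
    b, se = slope_and_se(x, y)
    if abs(b - TRUE_SLOPE) <= 2 * se:
        hits += 1
coverage = hits / REPS
print(f"share of samples within 2 SE of the true slope: {coverage:.3f}")
```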
From page 343...
... If the null hypothesis b equals 0 is true, using a 95% confidence level will cause the expert to falsely reject the null hypothesis 5% of the time. Consequently, results often are said to be significant at the 5% level.⁸³ As an example, consider a more complete set of regression results associated with the salary regression described in equation (9)
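With large samples, the t-statistic (the estimate divided by its standard error) is approximately standard normal under the null hypothesis. A two-sided p-value can therefore be computed from the complementary error function. This minimal sketch uses only Python's `math` module. The estimate and standard error in the usage line are hypothetical. For small samples, the appropriate reference is the t distribution with the regression's degrees of freedom rather than the normal.

```python
from math import erfc, sqrt

def two_sided_p(t_stat):
    """Two-sided p-value under the large-sample normal approximation:
    P(|Z| >= |t|) = erfc(|t| / sqrt(2))."""
    return erfc(abs(t_stat) / sqrt(2))

def significant_at(estimate, std_error, level=0.05):
    """True if the null hypothesis 'coefficient = 0' is rejected at `level`."""
    return two_sided_p(estimate / std_error) < level

print(round(two_sided_p(1.96), 3))       # prints 0.05
print(significant_at(-2500.0, 1000.0))  # t = -2.5, so this prints True
```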
From page 344...
... 2. Goodness of fit. Reported regression results usually contain not only the point estimates of the parameters and their standard errors or t-statistics, but also other information that tells how closely the regression line fits the data.
From page 345...
... Its value ranges from 0 to 1. An $R^2$ of 0 means that the explanatory variables explain none of the variation of the dependent variable; an $R^2$ of 1 means that the explanatory variables explain all of the variation.
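$R^2$ can be computed as one minus the ratio of the residual sum of squares to the total sum of squares of the dependent variable around its mean. Reading it as the share of variation explained holds for least squares fits that include an intercept. A minimal sketch with hypothetical values that reproduces the two endpoints described in the text:

```python
from statistics import mean

def r_squared(y, fitted):
    """1 - (residual sum of squares) / (total sum of squares around the mean)."""
    my = mean(y)
    ss_total = sum((v - my) ** 2 for v in y)
    ss_resid = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1 - ss_resid / ss_total

y = [2, 4, 5, 4, 5]                      # hypothetical observations
print(r_squared(y, y))                   # perfect fit: prints 1.0
print(r_squared(y, [mean(y)] * len(y)))  # predicting only the mean: prints 0.0
```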
From page 346...
... Outliers associated with relatively extreme values of explanatory variables are likely to be especially influential. See, e.g., Fisher v.
From page 347...
... In the lower portion of Table 1, note that the parameter estimates, the standard errors, and the t-statistics match the values given in equation (12).⁸⁹ The variable "Intercept" refers to the constant term $b_0$ in the regression.
From page 348...
... are jointly equal to 0 -- that there is no linear association between the dependent variable and any of the explanatory variables. This is equivalent to the null hypothesis that $R^2$ is equal to 0. In this case, the F-ratio of 174.71 is sufficiently high that the expert can reject the null hypothesis with a very high degree of confidence (i.e., with a 1% level of significance)
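Consider a regression with an intercept, $k$ explanatory variables, and $n$ observations. Its F-ratio for the joint null hypothesis can be written as $F = \frac{R^2/k}{(1 - R^2)/(n - k - 1)}$. This is why testing that all slope coefficients are 0 amounts to testing $R^2 = 0$. The inputs in this minimal sketch are hypothetical.

```python
def f_from_r2(r2, n, k):
    """F-ratio for H0: all k slope coefficients equal 0, in a least squares
    regression with an intercept, k explanatory variables, and n observations."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

print(f_from_r2(0.5, n=23, k=2))  # hypothetical inputs; about 10
print(f_from_r2(0.0, n=23, k=2))  # R^2 = 0 gives F = 0
```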
From page 349...
... A more complete model with additional explanatory variables would result in a lower SEF and a smaller 95% interval for the prediction. A danger exists when using the SEF, which applies to the standard errors of the estimated coefficients as well.
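This excerpt does not give the SEF formula. For a simple regression, a standard textbook form is $\mathrm{SEF} = s\sqrt{1 + 1/n + (x_0 - \bar{x})^2 / \sum_i (x_i - \bar{x})^2}$, where $s$ is the standard error of the regression. The sketch below uses that form with hypothetical salary data and an approximate ±2 SEF interval. It shows the forecast error growing as $x_0$ moves away from the sample mean.

```python
from math import sqrt
from statistics import mean

def forecast_interval(x, y, x0):
    """Forecast at x0, its standard error (SEF), and an approximate 95%
    interval of +/- 2 SEF, for a simple regression of y on x."""
    n, mx, my = len(x), mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    s = sqrt(sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    sef = s * sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    y_hat = a + b * x0
    return y_hat, sef, (y_hat - 2 * sef, y_hat + 2 * sef)

# Hypothetical data: years of experience and salary in $1,000s
years = [1, 2, 3, 4, 5, 6, 7, 8]
salary = [41, 44, 44, 48, 51, 51, 55, 57]

near = forecast_interval(years, salary, 4.5)  # at the sample mean of years
far = forecast_interval(years, salary, 15)    # well outside the observed range
print(near)
print(far)
```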
From page 350...
... The defense expert then presented a regression analysis that added another explanatory variable (i.e., a covariate), the years of experience of each police officer (EXP)
From page 351...
... , but also in the reward they get as they accumulate more and more experience. The debate between the experts continued, focusing less on the statistical interpretation of any one particular regression model and more on the choice of model itself, and not simply on statistical significance but also on practical significance.
From page 352...
... A useful statistic in hypothesis testing. dependent variable.
From page 353...
... A prediction about the values of the dependent variable that go beyond the sample; consequently, the forecast must be based on predictions for the values of the explanatory variables in the regression model. explanatory variable.
From page 354...
... A regression model in which the effect of a change in each of the explanatory variables on the dependent variable is the same, no matter what the values of those explanatory variables. mean (sample)
From page 355...
... . A statistic that measures the percentage of the variation in the dependent variable that is accounted for by all of the explanatory variables in a regression model.
From page 356...
... statistical significance. A test used to evaluate the degree of association between a dependent variable and one or more explanatory variables.
From page 357...
... Campbell, Regression Analysis in Title VII Cases: Minimum Standards, Comparable Worth, and Other Issues Where Law and Statistics Meet, 36 Stan.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.