Currently Skimming:

Reference Guide on Statistics--David H. Kaye and David A. Freedman
Pages 211-302

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 211...
... Randomized controlled experiments, 220 3. Observational studies, 220 4.
From page 212...
... The regression model, 279 2. Standard errors, t-statistics, and statistical significance, 281 Glossary of Terms, 283 References on Statistics, 302
From page 213...
... This section covers estimation, standard errors and confidence intervals, p-values, and hypothesis tests. • Section V shows how associations can be described by scatter diagrams, correlation coefficients, and regression lines.
From page 214...
... The hearsay rule rarely is a serious barrier to the presentation of statistical studies, because such studies may be offered to explain the basis for an expert's opinion or may be admissible under the learned treatise exception to the hearsay rule.2 Because most statistical methods relied on in court are described in textbooks or journal articles and are capable of producing useful results when properly applied, these methods generally satisfy important aspects of the "scientific knowledge" requirement in Daubert v. Merrell Dow Pharmaceuticals, Inc.3 Of course, a particular study may use a method that is entirely appropriate but that is so poorly executed that it should be inadmissible under Federal Rules of Evidence 403 and 702.4 Or, the method may be inappropriate for the problem at hand and thus lack the "fit" spoken of in Daubert.5 Or the study might rest on data of the type not reasonably relied on by statisticians or substantive experts and hence run afoul of Federal Rule of Evidence 703.
From page 215...
... 9.  See The Evolving Role of Statistical Assessments as Evidence in the Courts, supra note 1, at 164 (recommending that the expert be free to consult with colleagues who have not been retained
From page 216...
... . The National Research Council also recommends that "if a party gives statistical data to different experts for competing analyses, that fact be disclosed to the testifying expert, if any." The Evolving Role of Statistical Assessments as Evidence in the Courts, supra note 1, at 167.
From page 217...
... ; Kaye et al., supra note 7, §§ 8.7.2 & 12.5.1. See also Matrixx Initiatives, supra, at 1322 (listing "a temporal relationship" in a single patient as one indication of "a reliable causal link")
From page 218...
... ; Zeisel & Kaye, supra note 1, at 66–67. There are problems in measuring exposure to electromagnetic fields, and results are inconsistent from one study to another.
From page 219...
... Synonyms for independent variables are risk factors, predictors, and explanatory variables. 17.  For example, a confounding variable may be correlated with the independent variable and act causally on the dependent variable.
From page 220...
... People in the treatment group would know they were subject to the death penalty for murder; the 19.  Randomization of subjects to treatment or control groups puts statistical tests of significance on a secure footing. Freedman et al., supra note 12, at 503–22, 545–63; see infra Section IV.
From page 221...
... • The association holds when effects of confounding variables are taken into account by appropriate methods, for example, comparing smaller groups that are relatively homogeneous with respect to the confounders.23 • There is a plausible explanation for the effect of the independent variable; alternative explanations in terms of confounding should be less plausible than the proposed causal link.24 21. A procedure often used to control for confounding in observational studies is regression analysis. The underlying logic is described infra Section V.D and in Daniel L
From page 222...
... , or did it depend on the judgment of the investigator? If the data came from an observational study or a nonrandomized controlled experiment, • How did the subjects come to be in treatment or in control groups?
From page 223...
... 28.  See Shari Seidman Diamond, Reference Guide on Survey Research, Sections III, IV, in this manual.
From page 224...
... In this hypothetical case, the fit between the sampling frame and the population would be excellent. In other situations, the sampling frame is more problematic.
From page 225...
... Lists that overrepresented the affluent had worked well in earlier elections, when rich and poor voted along similar lines, but the bias in the sampling frame proved fatal when the Great Depression made economics a salient consideration for voters. 33.  See Freedman et al., supra note 12, at 337–39.
From page 226...
... 1997) (discussed supra note 32)
From page 227...
... Reliability can be ascertained by measuring the same quantity several times; the measurements must be made independently to avoid bias. Given independence, the correlation coefficient (infra Section V.B)
From page 228...
... . 42. As the discussion of the correlation coefficient indicates, infra Section V.B, the closer the coefficient is to 1, the greater the validity.
From page 229...
... Freedman, Modeling Selection Effects, in Social Science Methodology 225 (Steven Turner & William Outhwaite eds., 2007) ; Howard Wainer & David Thissen, True Score Theory: The Traditional Method, in Test Scoring 23 (David Thissen & Howard Wainer eds., 2001)
From page 230...
... Randomness in the technical sense also justifies calculations of standard errors, confidence intervals, and p-values (infra Sections IV–V)
From page 231...
... The number of petty larcenies reported in Chicago more than doubled one year -- not because of an abrupt crime wave, but because a new police commissioner introduced an improved reporting system.48 For a time, police officials in Washington, D.C., "demonstrated" the success of a law-and-order campaign by valuing stolen goods at $49, just below the $50 threshold then used for inclusion in the Federal Bureau of Investigation's Uniform Crime Reports.49 Allegations of manipulation in the reporting of crime from one time period to another are legion.50 Changes in data collection procedures are by no means limited to crime statistics. Indeed, almost all series of numbers that cover many years are affected by changes in definitions and collection methods.
From page 232...
... . The reported 99.5% accuracy rate conceals a crucial fact -- the company had virtually no data with which to measure the rate of false positives.56 54.  Id.
From page 233...
... . 58.  For assistance in coping with percentages, see Zeisel, supra note 12, at 1–24.
From page 234...
... The analogous statistic used in epidemiology is called the relative risk. See Green et al., supra note 13, Section III.A.
From page 235...
... = 19/99. The odds ratio for rejection instead of acceptance is the same, except that the order is reversed.63 Although the odds ratio has desirable mathematical properties, its meaning may be less clear than that of the selection ratio or the simple difference.
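The three summary measures named here (simple difference, selection ratio, odds ratio) can be compared side by side. The sketch below is illustrative only: the pass rates of 95% and 99% are assumptions, chosen because they reproduce the 19/99 figure quoted in the excerpt, and the names `rate_a`, `rate_b`, and `odds` are hypothetical.

```python
from fractions import Fraction


def odds(rate):
    """Convert a rate (a probability) into odds: rate / (1 - rate)."""
    return rate / (1 - rate)


# Assumed acceptance rates for two groups (illustrative, not from the excerpt).
rate_a = Fraction(95, 100)
rate_b = Fraction(99, 100)

difference = rate_a - rate_b               # simple difference in rates
selection_ratio = rate_a / rate_b          # ratio of the two rates
odds_ratio = odds(rate_a) / odds(rate_b)   # ratio of the two odds

# Odds ratio for rejection instead of acceptance: the same numbers, inverted.
rejection_odds_ratio = odds(1 - rate_a) / odds(1 - rate_b)

print(difference, selection_ratio, odds_ratio, rejection_odds_ratio)
```

Exact fractions are used so that the odds ratio comes out as 19/99 rather than a rounded decimal, and so that the rejection odds ratio is visibly its reciprocal, 99/19.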
From page 236...
... [Figure 1 and Figure 2: images not reproduced] 2. How are distributions displayed?
From page 237...
... See Freedman et al., supra note 12, at 31–41. 68. As the width of the bins decreases, the graph becomes more detailed, but the appearance becomes more ragged until finally the graph is effectively a plot of each datum.
From page 238...
... These were some 10 times larger than the median awards described in briefs defending the system of punitive damages. Michael Rustad & Thomas Koenig, The Supreme Court and Junk Social Science: Selective Distortion in Amicus Briefs, 72 N.C.
From page 239...
... The median is the 50th percentile. 81. When the distribution follows the normal curve, about 68% of the data will be within 1 standard deviation of the mean, and about 95% will be within 2 standard deviations of the mean.
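The 68% and 95% figures can be checked numerically. This is a minimal sketch that assumes the data follow the normal curve exactly; the function name `fraction_within` is hypothetical.

```python
import math


def fraction_within(k):
    """Area under the normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2):
    print(f"within {k} SD: {fraction_within(k):.1%}")
# within 1 SD: 68.3%
# within 2 SD: 95.4%
```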
From page 240...
... Randomness and cognate terms have precise technical meanings; it is randomness in the technical sense that justifies the probability calculations behind standard errors, confidence intervals, and p-values (supra Section II.D, infra Sections IV.A–B)
From page 241...
... The question is, by how much? The precision of an estimate is usually reported in terms of the standard error and a confidence interval.
From page 242...
... The standard deviation (supra Section III.E) of the 500 sample values was $2200.
From page 243...
... The standard error gives the likely magnitude of this random error, with smaller standard errors indicating better estimates.87 In our example of the Nixon papers, the standard error for the sample average can be computed from (1) the size of the sample -- 500 boxes -- and (2)
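The excerpt is cut off before the second ingredient is named. Assuming it is the standard deviation of the sample ($2200, as stated earlier) and that the sample is a simple random sample from a much larger population, the usual formula is SE = SD / √n. A minimal sketch under those assumptions, with a hypothetical function name:

```python
import math


def standard_error_of_average(sample_sd, n):
    """SE of a sample average for a simple random sample of size n.

    Assumes the population is much larger than the sample, so no
    finite-population correction is applied.
    """
    return sample_sd / math.sqrt(n)


se = standard_error_of_average(sample_sd=2200, n=500)
print(round(se, 2))  # about 98.39 dollars per box
```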
From page 244...
... (We are assuming a large sample; the confidence levels correspond to areas under the normal curve and are approximations; the "population average" just means the average value of all the items in the population.89) In summary, • To get a 68% confidence interval, start at the sample average, then add and subtract 1 standard error.
From page 245...
... Thus, 1 SE is one standard error, 2 SE is twice the standard error, and so forth. With a large sample and an estimate like the sample average, a 68% confidence interval is the range estimate – 1 SE to estimate + 1 SE.
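The recipe "estimate plus or minus a multiple of the SE" can be written as a small helper. This is a sketch only: the estimate of 50 and the SE of 2 are made-up numbers, and the approximate confidence levels hold only for large samples and estimates that follow the normal curve.

```python
def confidence_interval(estimate, se, multiplier):
    """Return (low, high) = estimate -/+ multiplier * SE."""
    return (estimate - multiplier * se, estimate + multiplier * se)


estimate, se = 50.0, 2.0  # hypothetical values for illustration

print(confidence_interval(estimate, se, 1))  # roughly 68% level: (48.0, 52.0)
print(confidence_interval(estimate, se, 2))  # roughly 95% level: (46.0, 54.0)
```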
From page 246...
... If the standard error is small, random error probably has little effect. If the standard error is large, the estimate may be seriously wrong.
From page 247...
... The confidence level does not express the chance that repeated estimates would fall into the confidence interval.91 With the Nixon papers, the 95% confidence interval should not be interpreted as saying that 95% of all random samples will produce estimates in the range from $36 million to $44 million. Moreover, the confidence level does not give the probability that the unknown parameter lies within the confidence interval.92 For example, the 95% confidence level should not be translated to a 95% probability that the total value of the papers is in the range from $36 million to $44 million.
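What the confidence level does describe is the procedure: across many repeated samples, about 95% of the intervals computed this way cover the true value. The simulation below sketches that reading. Everything in it is an assumption made for illustration: a normal population with mean 50 and SD 10, samples of size 100, and intervals of plus or minus 2 SE.

```python
import random
import statistics


def coverage(true_mean=50.0, true_sd=10.0, n=100, trials=2000, seed=0):
    """Share of simulated intervals (average -/+ 2 SE) that cover true_mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
        avg = statistics.fmean(sample)
        se = statistics.stdev(sample) / n ** 0.5
        if avg - 2 * se <= true_mean <= avg + 2 * se:
            hits += 1
    return hits / trials


print(coverage())  # close to 0.95; the exact value depends on the seed
```

Each simulated interval either covers the true mean or it does not; the 95% figure is a property of the whole collection of intervals, not of any single one.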
From page 248...
... An interval that runs from $34 million to $44 million is one thing, but –$10 million to $90 million is something else entirely. Statements about confidence without mention of an interval are practically meaningless.94 Standard errors and confidence intervals are often derived from statistical models for the process that generated the data. The model usually has parameters -- numerical constants describing the population from which samples were drawn.
From page 249...
... When a model does not fit the data collection process, estimates and standard errors will not be probative. Standard errors and confidence intervals generally ignore systematic errors such as selection bias or nonresponse bias (supra Sections II.B.1–2)
From page 250...
... In a typical jury discrimination case, small p-values help a defendant appealing a conviction by showing that the jury panel is not like a random sample from the relevant population; large p-values hurt. In the usual employment context, small p-values help plaintiffs who complain of discrimination -- for example, by showing that a disparity in promotion rates is too large to be explained by chance; conversely, large p-values would be consistent with the defense argument that the disparity is just due to chance.
From page 251...
... Sometimes standard errors will be part of the analysis; other times they will not be. Sometimes a difference of two standard errors will imply a p-value of about 5%; other times it will not.
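One setting in which two standard errors does correspond to about 5% is a two-sided test whose test statistic follows the normal curve. The sketch below assumes exactly that; with a different null distribution or a one-sided test the number changes. The function name is hypothetical.

```python
import math


def two_sided_p_value(z):
    """Two-sided p-value for a test statistic z that follows the normal curve."""
    return math.erfc(abs(z) / math.sqrt(2))


print(round(two_sided_p_value(2.0), 4))   # 0.0455, i.e. about 5%
print(round(two_sided_p_value(1.96), 4))  # 0.05
```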
From page 252...
... Statisticians distinguish between statistical and practical significance to make the point. When practical significance is lacking -- when the size of a disparity is negligible -- there is no reason to worry about statistical significance.102 It is easy to mistake the p-value for the probability of the null hypothesis given the data (supra Section IV.B.1)
From page 253...
... By inquiring into the magnitude of an effect, courts can avoid being misled by p-values. To focus attention on more substantive concerns -- the size of the effect and the precision of the statistical analysis -- interval estimates (e.g., confidence intervals)
From page 254...
... Some commentators have claimed that the cutoff for significance should be chosen to equalize the chance of a false positive and a false negative, on the ground that this criterion corresponds to the more-probable-than-not burden of proof. The argument is fallacious, because α and β do not give the probabilities of the null and alternative hypotheses; see supra Sections IV.B.1–2; supra note 34.
From page 255...
... Likewise, p-values may be difficult to compute for hypotheses of interest.109 3. Small samples may be unreliable, with large standard errors, broad confidence intervals, and tests having low power.
From page 256...
... .  But see Freedman et al., supra note 12, at 547–50. One-tailed tests at the 5% level are viewed as weak evidence -- no weaker standard is commonly used in the technical literature.
From page 257...
... See authorities cited, supra note 21.
From page 258...
... 117.  Operating characteristics include the expected value and standard error of estimators, probabilities of error for statistical tests, and the like. 118.  In speaking of "frequentist statisticians" or "Bayesian statisticians," we do not mean to suggest that all statisticians fall on one side of the philosophical divide or the other.
From page 259...
... Such analyses have rarely been used in court, and the question of their forensic value has been aired primarily in the academic literature. Some statisticians favor Bayesian methods, and some commentators have proposed using these methods in some kinds of cases.122 The frequentist view of statistics is more conventional; subjective Bayesians are a well-established minority.123 121.  Here, confidence has the meaning ordinarily ascribed to it, rather than the technical interpretation applicable to a frequentist confidence interval.
From page 260...
... Such models have been offered in court to prove disparate impact in discrimination cases, to estimate damages in antitrust actions, and for many other purposes. Sections V.A, V.B, and V.C cover some preliminary material, showing how scatter diagrams, correlation coefficients, and regression lines can be used to summarize relationships between variables.124 Section V.D explains the ideas and some of the pitfalls.
From page 261...
... A correlation coefficient of 0 indicates no linear association between the variables. The maximum value for the coefficient is +1, indicating a perfect linear relationship: The dots in the scatter diagram fall on a straight line that slopes up.
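Both cases can be reproduced with a few made-up points. The sketch below computes the correlation coefficient from its textbook definition; the data are hypothetical. The second data set is a symmetric U shape, which shows that a coefficient of 0 rules out linear association only, not association of every kind.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient of paired values xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]
line_up = [2 * x + 1 for x in xs]      # dots on a straight line sloping up
u_shape = [(x - 3) ** 2 for x in xs]   # strong but nonlinear relationship

print(correlation(xs, line_up))  # 1.0
print(correlation(xs, u_shape))  # 0.0
```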
From page 262...
... The correlation coefficient has a number of limitations, to be considered in turn. The correlation coefficient is designed to measure linear association.
From page 263...
... In more realistic examples, the lurking variable is harder to identify.127 [Figure: image not reproduced] 127.  Green et al., supra note 13, Section IV.C, provides one such example.
From page 264...
... . 128.  See also Rubinfeld, supra note 21.
From page 265...
... The slope of the regression line has the same limitations as the correlation coefficient: (1) The slope may be misleading if the relationship is strongly nonlinear and (2)
From page 266...
... Cf. Green et al., supra note 13, Section II.B.4 (suggesting that ecological studies of exposure and disease are "far from conclusive" because of the lack of data on confounding variables (a much more general problem)
From page 267...
... The slope would be interpreted as the difference between the white turnout rate and the black turnout rate for the white candidate. Furthermore, the intercept would be interpreted as the black turnout rate for the white candidate.132 The validity of such estimates is contested in the statistical literature.133 the Voting Rights Act.
From page 268...
... A regression model attempts to combine the values of certain variables (the independent variables) to get expected values for another variable (the dependent variable)
From page 269...
... In either case, the actual length will differ from expected, by a random error ε. In standard statistical terminology, the ε's for different observations on the spring are assumed to be independent and identically distributed, with a mean of zero.
From page 270...
... See supra Section V.C.1; Freedman et al., supra note 12, at 208–10. The method of least squares was developed by Adrien-Marie Legendre (France, 1752–1833)
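For a single explanatory variable, the least squares line has a closed form: the slope is the sum of cross-deviations divided by the sum of squared deviations in x, and the intercept makes the line pass through the point of averages. The sketch below uses made-up data and a hypothetical function name.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical observations scattered around the line y = 1 + 2x.
xs = [0, 1, 2, 3]
ys = [1.1, 2.9, 5.2, 6.8]

intercept, slope = least_squares_line(xs, ys)
print(round(intercept, 2), round(slope, 2))  # 1.09 1.94
```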
From page 271...
... However, as noted earlier, such an interpretation is wrong: p merely represents the probability of getting a large test statistic, given that the model is correct and the true coefficient for gender is zero (see supra Section IV.B, infra Appendix, Section D.2)
From page 272...
... ; see supra note 21 for references to a range of academic opinion. More recently, some investigators have turned to graphical models.
From page 273...
... (A1) 145.  But see supra note 123 (on "objective Bayesianism")
From page 274...
... It yields the conditional probability of hypothesis H0 given that event A has occurred. For a stylized example in a criminal case, H0 is the hypothesis that blood found at the scene of a crime came from a person other than the defendant; H1 is the hypothesis that the blood came from the defendant; A is the event that blood from the crime scene and blood from the defendant are both type A
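With two competing hypotheses, Bayes' rule can be sketched in a few lines. All numbers below are assumptions for illustration: a prior probability of 0.5 for H0, a 40% chance that blood from someone other than the defendant matches by coincidence, and a certain match if the blood came from the defendant. None of these values come from the excerpt.

```python
def posterior_h0(prior_h0, p_a_given_h0, p_a_given_h1):
    """P(H0 | A) by Bayes' rule, when H0 and H1 are the only possibilities."""
    prior_h1 = 1 - prior_h0
    numerator = p_a_given_h0 * prior_h0
    return numerator / (numerator + p_a_given_h1 * prior_h1)


# Hypothetical inputs: prior for H0, P(A | H0), P(A | H1).
p = posterior_h0(prior_h0=0.5, p_a_given_h0=0.4, p_a_given_h1=1.0)
print(round(p, 4))  # 0.2857
```

The result depends directly on the prior: changing `prior_h0` changes the posterior, which is why the choice of prior is central to the argument between the two schools.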
From page 275...
... See supra note 123. 149. For problematic assumptions of independence in litigation, see, e.g., Wilson v.
From page 276...
... . The standard error is $\sqrt{n} \times \sqrt{p(1-p)}$
From page 277...
... Probability histogram for the number of women in a random sample  of 350 people drawn from a large population that is 40% female and 60% male. The normal curve is shown for comparison.
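The histogram described here is that of a binomial count, so its center and spread can be computed directly. The sketch below uses the figures given in the caption (350 draws, 40% female) and the standard binomial formulas, expected value np and standard error √n × √(p(1 − p)); treating the draws as independent is an assumption justified by the population being large.

```python
import math

n, p = 350, 0.40  # sample size and share of women, from the caption

expected = n * p                              # expected number of women: 140
se = math.sqrt(n) * math.sqrt(p * (1 - p))    # standard error, about 9.17


def binomial_pmf(k, n, p):
    """Chance of exactly k successes in n independent draws."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Exact chance that the count lands within 2 SE of its expected value.
low = math.ceil(expected - 2 * se)
high = math.floor(expected + 2 * se)
within_2_se = sum(binomial_pmf(k, n, p) for k in range(low, high + 1))

print(expected, round(se, 2), round(within_2_se, 3))
```

The exact chance comes out close to the 95% that the normal curve suggests, which is the point of drawing the curve over the histogram.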
From page 278...
... • The standard error for the sample average equals $\sqrt{\frac{N-n}{N-1}} \times \frac{\sigma}{\sqrt{n}}$ (A12)
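When a sample of size n is drawn without replacement from a population of size N with standard deviation σ, the standard formula multiplies σ/√n by the finite-population correction √((N − n)/(N − 1)). The sketch below shows how much the correction matters; the population sizes and σ are hypothetical.

```python
import math


def se_sample_average(sigma, n, N):
    """SE of the average of n draws made without replacement from N items."""
    correction = math.sqrt((N - n) / (N - 1))
    return correction * sigma / math.sqrt(n)


sigma, n = 2200.0, 500  # hypothetical values

for N in (1_000, 10_000, 1_000_000):
    print(N, round(se_sample_average(sigma, n, N), 2))
```

As N grows relative to n the correction factor approaches 1, and the result approaches the simpler σ/√n; when the sample is the whole population (n = N) the standard error is 0.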
From page 279...
... The histogram follows the normal curve. That is why confidence levels can be based on the standard error, with confidence levels read off the normal curve -- for estimators that are essentially unbiased, and obey the central limit theorem (supra Section IV.A.2, Appendix Section B)
From page 280...
... is the expected value for salary, given the explanatory variables (education, experience, gender)
From page 281...
... 2. Standard errors, t-statistics, and statistical significance Statistical proof of discrimination depends on the significance of the estimated coefficient for the gender variable.
From page 282...
... If the model is wrong, the standard error, t-statistic, and significance level are rather difficult to interpret. Even if the model is granted, there is a further issue.
From page 283...
... alternative hypothesis. A statistical hypothesis that is contrasted with the null hypothesis in a significance test. See statistical hypothesis; significance test.
From page 284...
... coefficient of variation. A statistic that measures spread relative to the mean: SD/mean, or SE/expected value. See expected value; mean; standard deviation; standard error.
From page 285...
... An association between the dependent and independent variables in an observational study may not be causal, but may instead be due to confounding. See controlled experiment; observational study.
From page 286...
... covariance.  A quantity that describes the statistical interrelationship of two variables. Compare correlation coefficient; standard error; variance.
From page 287...
... general linear model.  Expresses the dependent variable as a linear combination of the independent variables plus an error term whose components may be dependent and have differing variances. See error term; linear combination; variance.
From page 288...
... . In a regression model, the independent variables are used to predict the dependent variable.
From page 289...
... multicollinearity.  Also, collinearity. The existence of correlations among the independent variables in a regression model.
From page 290...
... normal distribution.  Also, Gaussian distribution. When the normal distribution has mean equal to 0 and standard error equal to 1, it is said to be "standard normal." The equation for the density is then $y = e^{-x^2/2}/\sqrt{2\pi}$, where e = 2.71828...
From page 291...
... Compare confounding variable; controlled experiment. observed significance level. A synonym for p-value.
From page 292...
... Compare confidence interval; interval estimate. Poisson distribution. A limiting case of the binomial distribution, when the number of trials is large and the common probability is small.
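The limiting relationship can be seen numerically by comparing the two probability formulas for a large number of trials and a small probability. The values n = 1000 and p = 0.002 below are arbitrary choices for illustration.

```python
import math


def binomial_pmf(k, n, p):
    """Chance of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mean):
    """Chance of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


n, p = 1000, 0.002  # many trials, small common probability (illustrative)

for k in range(5):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, n * p), 4))
```

The two columns agree to about three decimal places; the agreement improves as n grows and p shrinks with np held fixed.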
From page 293...
... See regression model. Compare multiple correlation coefficient; standard error of regression.
From page 294...
... See normal curve; standard error. randomization.  See controlled experiment; randomized controlled experiment.
From page 295...
... The predicted value comes typically from a regression equation, and is better called the fitted value, because there is no real prediction going on. See regression model; independent variable.
From page 296...
... More generally, estimate = true value + bias + sampling error Sampling error is also called chance error or random error. See standard error.
From page 297...
... significant.  See p-value; practical significance; significance test. simple random sample.  A random sample in which each unit in the sampling frame has the same chance of being sampled.
From page 298...
... Compare expected value; standard deviation. standard error of regression.  Indicates how actual values differ (in some average sense)
From page 299...
... t-statistic.  A test statistic, used to make the t-test. The t-statistic indicates how far away an estimate is from its expected value, relative to the standard error.
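In code, the definition is a single division. The sketch below uses made-up numbers; in a typical significance test the expected value is the one specified by the null hypothesis, often 0.

```python
def t_statistic(estimate, expected_value, se):
    """Distance of an estimate from its expected value, in standard errors."""
    return (estimate - expected_value) / se


# Hypothetical coefficient estimate of 5.0 with SE 2.0, tested against 0.
print(t_statistic(estimate=5.0, expected_value=0.0, se=2.0))  # 2.5
```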
From page 300...
... See statistical hypothesis; significance test; t-test. t-test.  A statistical test based on the t-statistic.
From page 301...
... Compare standard error; covariance. weights.  See stratified random sample.
From page 302...
... . National Research Council, The Evolving Role of Statistical Assessments as Evidence in the Courts (Stephen E


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.