3
DNA Typing: Statistical Basis for Interpretation
Can DNA typing uniquely identify the source of a sample? Because any two human genomes differ at about 3 million sites, no two persons (barring identical twins) have the same DNA sequence. Unique identification with DNA typing is therefore possible provided that enough sites of variation are examined.
However, the DNA typing systems used today examine only a few sites of variation and have only limited resolution for measuring the variability at each site. There is a chance that two persons might have DNA patterns (i.e., genetic types) that match at the small number of sites examined. Nonetheless, even with today's technology, which uses 3 to 5 loci, a match between two DNA patterns can be considered strong evidence that the two samples came from the same source.
Interpreting a DNA typing analysis requires a valid scientific method for estimating the probability that a random person might by chance have matched the forensic sample at the sites of DNA variation examined. A judge or jury could appropriately weigh the significance of a DNA match between a defendant and a forensic sample if told, for example, that "the pattern in the forensic sample occurs with a probability that is not known exactly, but is less than 1 in 1,000" (if the database that shows no match with the defendant's pattern is of size 1,000).
To say that two patterns match, without providing any scientifically valid estimate (or, at least, an upper bound) of the frequency with which such matches might occur by chance, is meaningless.
Substantial controversy has arisen concerning the methods for estimating the population frequencies of specific DNA typing patterns.^{1}^{,}^{2}^{,}^{3}^{,}^{4}^{,}^{5}^{,}^{6}^{,}^{7}^{,}^{8}^{,}^{9}^{,}^{10}^{,}^{11}^{,}^{12}^{,}^{13}^{,}^{14} Questions have been raised about the adequacy of the population databases on which frequency estimates are based and about the role of racial and ethnic origin in frequency estimation. Some methods based on simple counting produce modest frequencies, whereas some methods based on assumptions about population structure can produce extreme frequencies. The difference can be striking: In one Manhattan murder investigation, the reported frequency estimates ranged from 1 in 500 to 1 in 739 billion, depending on how the statistical calculations were performed. In fact, both estimates were based on extreme assumptions (the first on counting matches in the databases, the second on multiplying lower bounds of each allele frequency). The discrepancy not only is a question of the weight to accord the evidence (which is traditionally left to a jury), but bears on the scientific validity of the alternative methods used for rendering estimates of the weight (which is a threshold question for admissibility).
In this chapter, we review the issues of population genetics that underlie the controversy and propose an approach for making frequency estimates that are independent of race and ethnic origin. This approach addresses the central purpose of DNA typing as a tool for the identification of persons.
ESTIMATING THE POPULATION FREQUENCY OF A DNA PATTERN
DNA "exclusions" are easy to interpret: if technical artifacts can be excluded, a nonmatch is definitive proof that two samples had different origins. But DNA "inclusions" cannot be interpreted without knowledge of how often a match might be expected to occur in the general population. Because of that fundamental asymmetry, although each new DNA typing method or marker can be used for investigation and exclusion as soon as its technical basis is secure, it cannot be interpreted with regard to inclusion until the population frequencies of the patterns have been established. We discuss the issues involved in estimating the frequency of a DNA pattern, consisting of pairs of alleles at each of several loci.
Estimating Frequencies of DNA Patterns by Counting
A standard way to estimate frequency is to count occurrences in a random sample of the appropriate population and then use classical statistical formulas to place upper and lower confidence limits on the estimate. Because estimates used in forensic science should avoid placing undue weight on incriminating evidence, an upper confidence limit of the frequency should be used in court. This is especially appropriate for forensic DNA typing, because any loss of power can be offset by studying additional loci.
To estimate the frequency of a particular DNA pattern, one might count the number of occurrences of the pattern in an appropriate random population sample. If the pattern occurred in 1 of 100 samples, the estimated frequency would be 1%, with an upper confidence limit of 4.7%. If the pattern occurred in 0 of 100 samples, the estimated frequency would be 0%, with an upper confidence limit of 3%. (The upper bound cited is the traditional 95% confidence limit, whose use implies that the true value has only a 5% chance of exceeding the upper bound.) Such estimates produced by straightforward counting have the virtue that they do not depend on theoretical assumptions, but simply on the sample's having been randomly drawn from the appropriate population. However, such estimates do not take advantage of the full potential of the genetic approach.
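The upper limits quoted above can be reproduced approximately with an exact binomial bound. The text does not say which classical formula was used, so the choice of the one-sided Clopper-Pearson limit here is an assumption, as are the function names. The sketch below is offered only as a check on the arithmetic.

```python
from math import comb


def binomial_cdf(k, n, p):
    """Probability of observing k or fewer occurrences in n draws at true frequency p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_confidence_limit(k, n, alpha=0.05, tol=1e-10):
    """One-sided exact upper limit (assumed method): the frequency p at which
    seeing k or fewer occurrences in n draws has probability alpha."""
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    # binomial_cdf decreases as p grows, so bisection finds the crossing point.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binomial_cdf(k, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


limit_one_match = upper_confidence_limit(1, 100)
limit_no_match = upper_confidence_limit(0, 100)
print(f"1 of 100: {limit_one_match:.1%}")  # approximately 4.7%
print(f"0 of 100: {limit_no_match:.1%}")   # approximately 3.0%
```

Under this assumption, the limits agree with the 4.7% and 3% figures given in the text.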
Estimating Frequencies of DNA Patterns with the Multiplication Rule (Product Rule)
In contrast, population frequencies often quoted for DNA typing analyses are based not on actual counting, but on theoretical models based on the principles of population genetics. Each matching allele is assumed to provide statistically independent evidence, and the frequencies of the individual alleles are multiplied together to calculate a frequency of the complete DNA pattern. Although a databank might contain only 500 people, multiplying the frequencies of enough separate events might result in an estimated frequency of their all occurring in a given person of 1 in a billion. Of course, the scientific validity of the multiplication rule depends on whether the events (i.e., the matches at each allele) are actually statistically independent.
From a statistical standpoint, the situation is analogous to estimating the proportion of blond, blue-eyed, fair-skinned people in Europe by separately counting the frequencies of people with blond hair, people with blue eyes, and people with fair skin and calculating their proportions. If a population survey of Europe showed that 1 of 10 people had blond hair, 1 of 10 had blue eyes, and 1 of 10 had fair skin, one would be wrong to multiply these frequencies to conclude that the frequency of people with all three traits was 1 in 1,000. Those traits tend to co-occur in Nordics, so the actual frequency of the combined description is probably higher than 1 in 1,000. In other words, the multiplication rule can produce an underestimate in this case, because the traits are correlated owing to population substructure: the traits have different frequencies in different population groups. Correlations between those traits might also be due to selection or conceivably to the action of some genes on all three traits. In any case, the example illustrates that correlations within subgroups, whatever their origin, bear on the procedures for estimating frequencies.
Unlike many of the technical aspects of DNA typing that are validated by daily use in hundreds of laboratories, the extraordinary population-frequency estimates sometimes reported for DNA typing do not arise in research or medical applications that would provide useful validation of the frequency of any particular person's DNA profile. Because it is impossible or impractical to draw a large enough population to test calculated frequencies for any particular DNA profile much below 1 in 1,000, there is not a sufficient body of empirical data on which to base a claim that such frequency calculations are reliable or valid per se. The assumption of independence must be strictly scrutinized and estimation procedures appropriately adjusted if possible. (The rarity of all the genotypes represented in the databank can be demonstrated by pairwise comparisons. Thus, in a recently reported analysis of the FBI database, no exactly matching pairs of profiles were found in five-locus DNA profiles, and the closest match was a single three-locus match among 7.6 million pairwise comparisons.)^{13}
The multiplication rule has been routinely applied to blood-group frequencies in the forensic setting. However, that situation is substantially different: Because conventional genetic markers are only modestly polymorphic (with the exception of human leukocyte antigen, HLA, which usually cannot be typed in forensic specimens), the multilocus genotype frequencies are often about 1 in 100. Such estimates have been tested by simple empirical counting. Pairwise comparisons of allele frequencies have not revealed any correlation across loci. Hence, the multiplication rule does not appear to lead to the risk of extrapolating beyond the available data for conventional markers. In contrast, highly polymorphic DNA markers exceed the informative power of protein markers, so multiplication leads to estimates that are less than the reciprocal of the size of the databases.
Validity of Multiplication Rule and Population Substructure
The multiplication rule is based on the assumption that the population does not contain subpopulations with distinct allele frequencies—that each individual's alleles constitute statistically independent random selections from a common gene pool. Under this assumption, the procedure for calculating the population frequency of a genotype is straightforward:

Count the frequency of alleles. For each allele in the genotype, examine a random sample of the population and count the proportion of matching alleles—that is, alleles that would be declared to match according to the rule that is used for declaring matches in a forensic context. This step requires only the selection of a sample that is truly random with reference to the genetic type; it does not appeal to any theoretical models.

It is essential that the forensic matching rule be precise and objective—otherwise it would be impossible to apply it in calculating the proportion of individuals with matching alleles in the population databank. And it is essential that the same rule be applied to count frequencies in the population databank, because this is the only way to determine the proportion of random individuals that would have been declared to match in the forensic context. (In the context of forensic applications, an estimate of the probability of a match in DNA typing has been termed conservative if on the average it is larger than the actual one, so that any weight applied to the estimate would favor the suspect. Thus, some laboratories use a more conservative rule for counting population frequencies than for forensic matches—an acceptable approach, because it overestimates allele frequency. The converse would not be acceptable.)

Calculate the frequency of the genotype at each locus. The frequency of a homozygous genotype a1/a1 is calculated to be p_{a1}^{2}, where p_{a1} denotes the frequency of allele a1. The frequency of a heterozygous genotype a1/a2 is calculated to be 2p_{a1}p_{a2}, where p_{a1} and p_{a2} denote the frequencies of alleles a1 and a2. In both cases, the genotype frequency is calculated by simply multiplying the two allele frequencies, on the assumption that there is no statistical correlation between the allele inherited from one's father and the allele inherited from one's mother. The factor of 2 arises in the heterozygous case, because one must consider the case in which allele a1 was contributed by the father and allele a2 by the mother and vice versa: each of the two cases has probability p_{a1}p_{a2}. When there is no correlation between the two parental alleles, the locus is said to be in Hardy-Weinberg equilibrium. (We should note that in forensic DNA typing, a slight modification is used in the case of apparently homozygous genotypes. When one observes only a single allele in a sample, one cannot be certain that the individual is a homozygote; it is always possible that a second allele has been missed for technical reasons. To be conservative, most forensic laboratories do not calculate the probability that the sample has two copies of the allele (which is p_{a1}^{2}), but rather the probability that the sample has at least one copy (which is 2p_{a1}), leaving open the possibility of a second allele. We endorse this procedure.)

Calculate the frequency of the complete multilocus genotype. The frequency of a complete genotype is calculated by multiplying the genotype frequencies at all the loci. As in the previous step, this calculation assumes that there is no correlation between genotypes at different loci; the absence of such correlation is called linkage equilibrium. (Some authors prefer to reserve the term linkage equilibrium for loci on the same chromosome and to use the term gametic phase equilibrium for loci on different chromosomes.) Suppose, for example, that a person has genotype a1/a2, b1/b2, c1/c1. If a random sample of the appropriate population shows that the frequencies of a1, a2, b1, b2, and c1 are approximately 0.1, 0.2, 0.3, 0.1, and 0.2, respectively, then the population frequency of the genotype would be estimated to be [2(0.1)(0.2)][2(0.3)(0.1)][(0.2)(0.2)] = 0.000096, or about 1 in 10,417.
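The worked example can be checked with a minimal sketch of the multiplication rule. The function and variable names are illustrative assumptions. The homozygous locus is computed as the square of the allele frequency, as in the example, not with the more conservative 2p modification that forensic laboratories apply to apparent homozygotes.

```python
from math import prod


def locus_genotype_frequency(allele_1, allele_2, allele_freq):
    """Hardy-Weinberg genotype frequency at one locus: p^2 or 2pq."""
    p1, p2 = allele_freq[allele_1], allele_freq[allele_2]
    if allele_1 == allele_2:
        return p1 * p2       # homozygote
    return 2 * p1 * p2       # heterozygote: two parental orderings


def multilocus_frequency(genotype, allele_freq):
    """Multiply the per-locus frequencies, assuming independence across loci."""
    return prod(locus_genotype_frequency(a, b, allele_freq) for a, b in genotype)


# Allele frequencies and genotype taken from the example in the text.
allele_freq = {"a1": 0.1, "a2": 0.2, "b1": 0.3, "b2": 0.1, "c1": 0.2}
genotype = [("a1", "a2"), ("b1", "b2"), ("c1", "c1")]

frequency = multilocus_frequency(genotype, allele_freq)
print(f"{frequency:.6f}")             # 0.000096
print(f"1 in {round(1 / frequency):,}")  # 1 in 10,417
```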
Again, the validity of the multiplication rule depends on the absence of population substructure, because only in this special case are the different alleles statistically uncorrelated with one another.
In a population that contains groups with characteristic allele frequencies, knowledge of one allele in a person's genotype might carry some information about the group to which the person belongs, and this in turn alters the statistical expectation for the other alleles in the genotype. For example, a person who has one allele that is common among Italians is more likely to be of Italian descent and is thus more likely to carry additional alleles that are common among Italians. The true genotype frequency is thus higher than would be predicted by applying the multiplication rule and using the average frequency in the entire population.
To illustrate the problem with a hypothetical example, suppose that a particular allele at a VNTR locus has a 1% frequency in the general population, but a 20% frequency in a specific subgroup. The frequency of homozygotes for the allele would be calculated to be 1 in 10,000 according to the allele frequency determined by sampling the general population, but would actually be 1 in 25 for the subgroup. That is a hypothetical and extreme example, but illustrates the potential effect of demography on gene frequency estimation.
Basis of Concern About Population Substructure
The key question underlying the use of the multiplication rule is whether actual populations have significant substructure for the loci used for forensic typing. This has provoked considerable debate among population geneticists: some have expressed serious concern about the possibility of significant substructure,^{2}^{,}^{4}^{,}^{9}^{,}^{10} and others consider the likely degree of substructure not great enough to affect the calculations significantly.^{1}^{,}^{3}^{,}^{6}^{,}^{8}^{,}^{11}^{,}^{12}^{,}^{13}
The population geneticists who urge caution make three points:

Population genetic studies show some substructure within racial groups for genetic variants, including protein polymorphisms, genetic diseases, and DNA polymorphisms. Thus, North American Caucasians, blacks, Hispanics, Asians, and Native Americans are not homogeneous groups. Rather, each group is an admixture of subgroups with somewhat different allele frequencies. Allele frequencies have not yet been homogenized, because people tend to mate within these groups.

For any particular genetic marker, the degree of subpopulation differentiation cannot be predicted, but must be determined empirically.

For the loci used for forensic typing, there have been too few empirical investigations of subpopulation differentiation.
In short, those population geneticists believe that the absence of substructure cannot be assumed, but must be proved empirically (see Lewontin and Hartl^{10}). Other population geneticists, while recognizing the possibility or likelihood of population substructure, conclude that the evidence to date suggests that the effect on estimates of genotype frequencies is minimal (see Chakraborty and Kidd^{12}). Recent empirical studies concerning VNTR loci^{13}^{,}^{14} (Weir, personal communication, 1991) detected no deviation from independence within or across loci. Moreover, pairwise comparisons of all five-locus DNA profiles in the FBI database showed no exact matches; the closest match was a single three-locus match among 7.6 million pairwise comparisons.^{13} These studies are interpreted as indicating that multiplication of gene frequencies across loci does not lead to major inaccuracies in the calculation of genotype frequency, at least not for the specific polymorphic loci examined.
Although mindful of the controversy, the committee has chosen to assume for the sake of discussion that population substructure may exist and to provide a method for estimating population frequencies in a manner that adequately accounts for it. Our decision is based on several considerations:

It is possible to provide conservative estimates of population frequency, without giving up the inherent power of DNA typing.

It is appropriate to prefer somewhat conservative numbers for forensic DNA typing, especially because the statistical power lost in this way can often be recovered through typing of additional loci, where required.

It is important to have a general approach that is applicable to any loci used for forensic typing. Recent empirical studies pertain only to the population genetics of the VNTR loci in current use. However, we expect forensic DNA typing to undergo much change over the next decade—including the introduction of different types of DNA polymorphisms, some of which might have different properties from the standpoint of population genetics.

It is desirable to provide a method for calculating population frequencies that is independent of the ethnic group of the subject.
Assessing Population Substructure Requires Direct Sampling of Ethnic Groups
How can one address the possibility of population substructure? In principle, one might consider three approaches: (1) carry out population studies on a large mixed population, such as a racial group, and use statistical tests to detect the presence of substructure; (2) derive theoretical principles that place bounds on the possible degree of population substructure; and (3) directly sample different groups and compare the observed allele frequencies. The third offers the soundest foundation for assessing population substructure, both for existing loci and for many new types of polymorphisms under development.
In principle, population substructure can be studied with statistical tests to examine deviations from Hardy-Weinberg equilibrium and linkage equilibrium. Such tests are not very useful in practice, however, because their statistical power is extremely low: even large and significant differences between subgroups will produce only slight deviations from Hardy-Weinberg expectations. Thus, the absence of such deviations does not provide powerful evidence of the absence of substructure (although the presence of such deviations provides strong evidence of substructure).
The correct way to detect genetic differentiation among subgroups is to sample the subgroups directly and to compare the frequencies. The following example is extreme and has not been observed in any U.S. population, but it illustrates the difference in power. Suppose that a population consists of two groups with different allele frequencies at a diallelic locus:

            A      a
Group I     0.5    0.5
Group II    0.9    0.1
If there is random mating within the groups, Hardy-Weinberg equilibrium within the groups will produce these genotype frequencies:

            AA      Aa      aa
Group I     0.25    0.50    0.25
Group II    0.81    0.18    0.01
Suppose that Group I is 90% of the population and Group II is 10%. In the overall population, the observed genotype frequencies will be
AA = (0.9)(0.25) + (0.1)(0.81) = 0.306
Aa = (0.9)(0.50) + (0.1)(0.18) = 0.468
aa = (0.9)(0.25) + (0.1)(0.01) = 0.226
If we were unaware of the population substructure, what would we expect under Hardy-Weinberg equilibrium? The average allele frequencies will be
A = (0.9)(0.5) + (0.1)(0.9) = 0.54
a = (0.9)(0.5) + (0.1)(0.1) = 0.46
which would correspond to the Hardy-Weinberg proportions of
AA = (0.54)(0.54) = 0.2916
Aa = 2(0.54)(0.46) = 0.4968
aa = (0.46)(0.46) = 0.2116
Even though there is substantial population substructure, the proportions do not differ greatly from Hardy-Weinberg expectation. In fact, one can show that detecting the population differentiation with the Hardy-Weinberg test would require a sample of nearly 1,200, whereas detecting it by direct examination of the subgroups would require a sample of only 22. In other words, the Hardy-Weinberg test is very weak for testing substructure.
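The two-group example can be recomputed with a short sketch that compares the genotype proportions actually present in the mixed population with those expected if the pooled population were assumed to be in Hardy-Weinberg equilibrium. The names are illustrative assumptions, and the sketch does not attempt to reproduce the sample-size figures of 1,200 and 22.

```python
def hardy_weinberg(p_A):
    """Genotype proportions at a diallelic locus under random mating."""
    q = 1 - p_A
    return {"AA": p_A**2, "Aa": 2 * p_A * q, "aa": q**2}


# (share of the total population, frequency of allele A), as in the text.
groups = [(0.9, 0.5), (0.1, 0.9)]
genotypes = ("AA", "Aa", "aa")

# Proportions actually present: a weighted mixture of the within-group proportions.
observed = {
    g: sum(share * hardy_weinberg(p_A)[g] for share, p_A in groups)
    for g in genotypes
}

# Proportions expected if substructure is ignored: pool the alleles first.
pooled_p_A = sum(share * p_A for share, p_A in groups)
expected = hardy_weinberg(pooled_p_A)

for g in genotypes:
    print(f"{g}: observed {observed[g]:.4f}, expected {expected[g]:.4f}")
```

The largest discrepancy, for the heterozygote Aa, is under three percentage points, which illustrates why a test on the pooled sample has so little power.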
The lack of statistical power to detect population substructure makes it difficult to detect genetic differentiation in a heterogeneous population. Direct sampling of subgroups is required, rather than examining samples from a large mixed population.
Similarly, population substructure cannot be predicted with certainty from theoretical considerations. Studies of population substructure for protein polymorphisms cannot be used to draw quantitative inferences concerning population substructure for VNTRs, because loci are expected to show different degrees of population differentiation that depend on such factors as mutation rate and selective advantage. Differences between races cannot be used to provide a meaningful upper bound on the variation within races. Contrary to common belief based on difference in skin color and hair form, studies have shown that the genetic diversity between subgroups within races is greater than the genetic variation between races.^{15} Broadly, the results of the studies accord with the theory of genetic drift: the average allele frequency of a large population group (e.g., a racial group) is expected to drift more slowly than the allele frequencies of the smaller subpopulations that it comprises (e.g., ethnic subgroups).
In summary, population differentiation must be assessed through direct studies of allele frequencies in ethnic groups. Relatively few such studies have been published so far, but some are under way.^{16} Clearly, additional such studies are desirable.
The Ceiling Principle: Accounting for Population Substructure
We describe here a practical and sound approach for accounting for possible population substructure: the ceiling principle.^{9} It is based on the following observation: The multiplication rule will yield conservative estimates, even for a substructured population, provided that the allele frequencies used in the calculation exceed the allele frequencies in any of the population subgroups. Accordingly, applying the ceiling principle involves two steps: (1) For each allele at each locus, determine a ceiling frequency that is an upper bound for the allele frequency that is independent of the ethnic background of a subject; and (2) To calculate a genotype frequency, apply the multiplication rule, using the ceiling frequencies for the allele frequencies.
How should ceiling frequencies be determined? We must balance rigor and practicality. On the one hand, it is not enough to sample broad populations defined as "races" in the U.S. census (e.g., Hispanics), because of the possibility of substructure. On the other hand, it is not feasible or reasonable to sample every conceivable subpopulation in the world to obtain a guaranteed upper bound. The committee strongly recommends the following approach: Random samples of 100 persons should be drawn from each of 15 to 20 populations, each representing a group relatively homogeneous genetically; the largest frequency in any of these populations or 5%, whichever is larger, should be taken as the ceiling frequency. The reason for using 5% is discussed later.
We give a simplified example to illustrate the approach. Suppose that two loci have been studied in three population samples, with the following results:

              Population 1    Population 2    Population 3
Locus 1
  Allele a         1%              5%             11%
  Allele b         5%              8%             10%
Locus 2
  Allele c         3%              4%              4%
  Allele d         2%             15%              7%
For the genotype consisting of a/b at locus 1 and c/d at locus 2, the ceiling principle would assign ceiling values of 11% for allele a, 10% for allele b, 5% for allele c, and 15% for allele d and would apply the multiplication rule to yield a genotype frequency of [2(0.11)(0.10)][2(0.05)(0.15)] = 0.00033, or about 1 in 3,000. Note that the frequency used for allele c is 5%, rather than 4%, to reflect the recommended lower bound of 5% on allele frequencies. Because the calculation uses an upper bound for each allele frequency, it is believed to be conservative given the available data, even if there are correlations among alleles because of population substructure and even for persons of mixed or unknown ancestry. This is more conservative than, and preferable to, taking the highest frequency calculated for any of the three populations.
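The two steps of the ceiling principle can be sketched for the simplified example as follows. The data are the three population samples from the table; the function names and the dictionary layout are illustrative assumptions, and only heterozygous loci are handled, as in the example.

```python
from math import prod

FLOOR = 0.05  # recommended lower bound on any ceiling frequency

# Observed allele frequencies in Populations 1, 2, and 3 (from the example).
observed = {
    "a": [0.01, 0.05, 0.11],
    "b": [0.05, 0.08, 0.10],
    "c": [0.03, 0.04, 0.04],
    "d": [0.02, 0.15, 0.07],
}


def ceiling_frequency(frequencies, floor=FLOOR):
    """Step 1: largest observed frequency in any population, or the floor if larger."""
    return max(max(frequencies), floor)


def ceiling_genotype_frequency(genotype, ceilings):
    """Step 2: multiplication rule with ceiling frequencies (heterozygous loci only)."""
    return prod(2 * ceilings[x] * ceilings[y] for x, y in genotype)


ceilings = {allele: ceiling_frequency(f) for allele, f in observed.items()}
genotype = [("a", "b"), ("c", "d")]

frequency = ceiling_genotype_frequency(genotype, ceilings)
print(ceilings)             # allele c is raised from 4% to the 5% floor
print(f"{frequency:.5f}")   # 0.00033
```

The result, 0.00033, corresponds to roughly 1 in 3,030, which the text rounds to about 1 in 3,000.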
The ceiling principle reflects a number of important scientific and policy considerations:

The purpose of sampling various populations is to examine whether some alleles have considerably higher frequencies in particular subgroups than in the general population, presumably because of genetic drift. It is matches at such alleles that might be accorded too much evidentiary weight, if the general population frequency were used in calculating the probability of a match.

Determining whether an allele has especially high frequency does not require a very large sample. A collection of 100 randomly chosen people provides a sample of 200 alleles, which is quite adequate for estimating allele frequencies.

Genetically homogeneous populations from various regions of the world should be examined to determine the extent of variation in allele frequency. Ideally, the populations should span the range of ethnic groups that are represented in the United States, e.g., English, Germans, Italians, Russians, Navahos, Puerto Ricans, Chinese, Japanese, Vietnamese, and West Africans. Some populations will be easy to sample through arrangements with blood banks in the appropriate country; other populations might be studied by sampling recent immigrants to the United States. The choice and sampling of the 15 to 20 populations should be supervised by the National Committee on Forensic DNA Typing (NCFDT) described in Chapter 2.
We emphasize, however, that it is not necessary to be comprehensive. The goal is not to ensure that the ethnic background of every particular defendant is represented, but rather to define the likely range of allele frequency variation.

Because only a limited number of populations can be sampled, it is necessary to make some allowance for unexamined populations. As usual, the problem is rare alleles. Genetic drift has the greatest proportional effect on rare alleles and may cause substantial variation in their frequency. Even if one sees allele frequencies of 1% in several ethnic populations, it is not safe to conclude that the frequency might not be fivefold higher in some subgroups.
To overcome this problem, we recommend that ceiling frequencies be 5% or higher. We selected this threshold because we concluded that allele frequency estimates that were substantially lower would not provide sufficiently reliable predictors for other, unsampled subgroups. Our reasoning was based on population genetic theory and computational results, and we aimed at accounting for the effects of sampling error and for genetic drift. The latter consideration was especially important, because it scales inversely with effective population size (i.e., small populations have larger drift) and because it accumulates over generations. The use of such a ceiling frequency would correspond to a lower bound of 5% on allele frequencies. Even if one observed allele frequencies of about 1%, one would guard against the possibility that the frequency in a subpopulation had drifted higher by using the lower bound of 5%. Thus, the lowest frequency attributable to any single locus would be 1/400 (1/20 × 1/20). In any case, it seems reasonable not to attach much greater weight to any single locus.

The ceiling principle yields the same frequency for a genotype, regardless of the suspect's ethnic background, because the reported frequency represents a maximum for any possible ethnic heritage. Accordingly, the ethnic background of an individual suspect should be ignored in estimating the likelihood of a random match. The calculation is fair to suspects, because the estimated probabilities are likely to be conservative in their incriminating power.
Some legal commentators have pointed out that frequencies should properly be based on the population of possible perpetrators, rather than on the population to which a particular suspect belongs.^{17}^{,}^{18} Although that argument is formally correct, practicalities often preclude use of that approach. Furthermore, the ceiling principle eliminates the need for investigating the perpetrator population, because it yields an upper bound to the frequency that would be obtained by that approach.
Some have proposed a Bayesian approach^{19}^{,}^{20}^{,}^{21} to the presentation of DNA evidence. However, this approach, focusing on likelihood ratios, does not avoid the kinds of population genetic problems discussed in this chapter. The committee has not tried to assess the relative merits of Bayesian and frequentist approaches, because, outside the field of paternity testing, no forensic laboratory in this country has, to our knowledge, used Bayesian methods to interpret the implications of DNA matches in criminal cases.

Although the ceiling principle is a conservative approach, we feel that it is appropriate, because DNA typing is unique in that the forensic analyst has an essentially unlimited ability to adduce additional evidence. Whatever power is sacrificed by requiring conservative estimates can be regained by examining additional loci. (Although there could be cases in which the DNA sample is insufficient for typing additional loci with RFLPs, this limitation is likely to disappear with the eventual use of PCR.) A conservative approach imposes no fundamental limitation on the power of the technique.
DETERMINING ALLELE FREQUENCIES IN A POPULATION DATABANK
For forensic purposes, the frequency of an allele in a laboratory's databank should be calculated by counting the number of alleles that would be regarded as a match with the laboratory's forensic matching rule, which should be based on the empirical reproducibility of the system. This matching rule must account for both the quantitative reproducibility of forensic measurements in the testing laboratory and the quantitative reproducibility of the population measurements in the laboratory that generated the databank. In addition, the matching rule should reflect that one is making intergel comparisons, which are typically less precise than intragel comparisons.
The above approach is sometimes referred to as "floating bins," in that one counts the alleles that fall into a "bin" centered on the allele of interest. Most forensic laboratories in this country use the slightly different approach of "fixed bins":^{22} One first aggregates alleles into a predetermined set of bins. Given an allele in a forensic case, one must then compute its frequency by adding the frequencies of all the bins that contain any alleles that fall within the window specified by the laboratory's forensic matching rule. (All bin frequencies must be added; it is not enough to take the largest of the bin frequencies.) This fixed-bin approach is acceptable and might be more convenient in some settings, because examiners need only consult a short table of bin frequencies, rather than search an entire databank.
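The difference between the two counting schemes can be illustrated with a small sketch. Everything specific in it is hypothetical: the databank of fragment sizes, the bin boundaries, and the matching window of plus or minus 2.5% of fragment size are invented for illustration and are not taken from any laboratory's protocol. The sketch also assumes that the matching window lies inside the range covered by the bin boundaries.

```python
from bisect import bisect_right

WINDOW = 0.025  # hypothetical matching window: +/- 2.5% of fragment size


def floating_bin_frequency(databank, size, window=WINDOW):
    """Fraction of databank alleles inside a window centered on the allele of interest."""
    low, high = size * (1 - window), size * (1 + window)
    return sum(low <= s <= high for s in databank) / len(databank)


def fixed_bin_frequency(databank, boundaries, size, window=WINDOW):
    """Sum of the frequencies of every predetermined bin that the window touches.

    Bin i covers sizes from boundaries[i] up to, but not including, boundaries[i + 1].
    """
    low, high = size * (1 - window), size * (1 + window)
    first = bisect_right(boundaries, low) - 1   # bin containing the window's lower edge
    last = bisect_right(boundaries, high) - 1   # bin containing the window's upper edge
    start, stop = boundaries[first], boundaries[last + 1]
    # Adding all touched bins is the same as counting alleles across their union.
    return sum(start <= s < stop for s in databank) / len(databank)


# Hypothetical databank of fragment sizes and hypothetical bin boundaries.
databank = [1000, 1020, 1040, 1100, 1150, 1210, 1300, 1400, 1500, 1600]
boundaries = [900, 1050, 1200, 1350, 1700]

floating = floating_bin_frequency(databank, 1190)
fixed = fixed_bin_frequency(databank, boundaries, 1190)
print(floating, fixed)
```

In this invented case the window straddles a bin boundary, so two bins are added and the fixed-bin frequency is larger than the floating-bin frequency. Because the touched bins together always cover the window, the fixed-bin figure can never be the smaller of the two.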
IMPLICATIONS OF GENETIC CORRELATIONS AMONG RELATIVES
Because of the laws of Mendelian inheritance, the genotypes of biological relatives are much more similar than those of random individuals. Parent and child share exactly one identical allele at every locus, sibs share an average of one identical allele per locus, and grandparent and grandchild share an average of 0.5 identical allele per locus. (Here, identical refers to identity by descent from a common ancestor. Relatives can share additional alleles simply by chance.) These facts have important consequences for DNA typing:

The genetic correlation between relatives makes it possible to carry out parentage and grandparentage testing. Paternity testing with DNA typing is already an active industry in the United States, and grandmaternity testing (with mitochondrial DNA, as well as nuclear genes) has been used in Argentina to reunite families with children who were abducted during the military dictatorship in the 1970s.^{23}^{,}^{24} Relatedness testing involves a question analogous to that asked in identity testing: What is the chance that a randomly chosen person in the population would show the degree of relatedness expected of a relative? The same basic methods of population genetics apply, as discussed earlier.

The ability to recognize relatedness poses a novel privacy issue for DNA databanks. Many states are starting to compile databanks that record patterns of DNA from convicted criminals, but not from other citizens, with the hope of identifying recidivists. When a biological sample is found at the scene of a crime, its DNA pattern can be determined and compared with a databank. If the unidentified sample perfectly matches a sample in the convicted-criminal databank at enough loci, the probable perpetrator is likely to have been found. However, a different outcome could occur: the sample might match no entry perfectly, but match some entry at about one allele per locus. Depending on the number of loci studied, one could have a compelling case that the source of the sample was a first-degree relative (e.g., brother) of the convicted criminal whose entry was partially matched. (In practice, four loci would not suffice for this conclusion, but 10 might.) Such information could be sufficient to focus police attention on a few persons and might be enough to persuade a court to compel a blood sample that could be tested for exact match with the sample.
To put it succinctly, DNA databanks have the ability to point not just to individuals but to entire families—including relatives who have committed no crime. Clearly, this poses serious issues of privacy and fairness. As we discuss more fully later (Chapter 5), it is inappropriate, for reasons of privacy, to search databanks of DNA from convicted criminals in such a fashion. Such uses should be prevented both by limitations on the software for search and by statutory guarantees of privacy.

Finally, the genetic correlation among relatives warrants caution in the statistical interpretation of DNA typing results. Our discussion above focused on the probability that a forensic sample would by chance match a person randomly chosen from the population. However, the probability that the forensic sample would match a relative of the person who left it is considerably greater than the probability that it would match a random person. Indeed, two sibs will often have matching genotypes at a locus: they have a 25% chance of inheriting the same pair of alleles from their parents and a 50% chance of inheriting one allele in common (which will result in identical genotypes if their other alleles happen to match by chance). Roughly speaking, the probability that two sibs match at k loci will be approximately (0.25 + 0.5p + 2p^{2})^{k}, where p is the average chance that two alleles will match (i.e., the apparent homozygosity rate). Using p = 10% per locus for illustration, the probability that two sibs match at two loci would be about 10% and at four loci about 1%. Even for DNA profiles consisting entirely of very rare alleles (p∼0%), the probability that two sibs will match at two loci is about 6% and at four loci about 0.4%. In short, the probability that two relatives will have matching genotypes is much greater than for two randomly chosen persons. Whenever there is a possibility that a suspect is not the perpetrator but is related to the perpetrator, this issue should be pointed out to the court. Relatives of a suspect could be excluded, of course, by testing their genotypes directly, provided that their DNA could be obtained.
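The approximation for sibs can be evaluated directly. The sketch below simply codes the expression (0.25 + 0.5p + 2p^{2})^{k} quoted above; the function name is ours, and the formula is the text's rough approximation, not an exact kinship calculation.

```python
def sib_match_probability(p, k):
    """Approximate chance that two sibs match at k loci.

    p is the average chance that two alleles match (apparent homozygosity rate).
    Per locus: 0.25 (same pair inherited) + 0.5 * p (one shared allele, other
    matches by chance) + 2 * p**2 (no shared allele, both match by chance).
    """
    per_locus = 0.25 + 0.5 * p + 2 * p ** 2
    return per_locus ** k


for p in (0.10, 0.0):
    for k in (2, 4):
        print(f"p={p:.2f}, k={k}: {sib_match_probability(p, k):.4f}")
```

With p = 0.10 the per-locus value is 0.32, so two loci give roughly 0.10 and four loci roughly 0.01; with p = 0 the per-locus value is 0.25.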
IMPLICATIONS OF INCREASED POWER OF DNA TYPING COMPARED WITH CONVENTIONAL SEROLOGY
Questions about the population genetics of DNA markers remain open, but it is clear that the forensic scientist's discriminatory power has been substantially expanded with the advent of DNA markers. Indeed, forensic laboratories are routinely finding cases in which a suspect is included through conventional serology but later excluded through testing with DNA markers. The FBI reports, for example, that some 33% of suspects that match evidence samples according to conventional serology turn out to be excluded through DNA typing (J. W. Hicks, presentation to committee, 1990). Such outcomes represent a dramatic success of the new technology and often lead to the exoneration of innocent suspects.
LABORATORY ERROR RATES
Interpretation of DNA typing results depends not only on population genetics, but also on laboratory error. Two samples might show the same DNA pattern for two reasons: two persons have the same genotype at the loci studied, or the laboratory has made an error in sample handling, procedure, or interpretation. Coincidental identity and laboratory error are different phenomena, so the two cannot and should not be combined in a single estimate. However, both should be considered.
Early in the application of the DNA approach, results from nonblind proficiency studies suggested a high rate of false positives due to laboratory error. One commercial laboratory reported one false match in 50 samples in each of the first two proficiency tests conducted by the California Association of Crime Laboratory Directors (CACLD).^{25} The error was attributed to incorrect sample loading in the first test and to mixing of DNA samples (because of reagent contamination) in the second. Another commercial laboratory reported no false positives in the two CACLD tests, but is reported to have made errors related to sample mixup in actual casework in New York v. Neysmith^{26} and in the matter of a dead infant found in the Rock Creek area of Erie, Ill.^{27} A third commercial laboratory made one error in 50 samples in the first CACLD test, but none in later blind trial testing. Estimates of laboratory errors in more recent practice are not available because of the lack of standardized proficiency testing.
Proficiency testing has also revealed important instances of false negatives. In the second CACLD test, the second laboratory cited failed to detect that two samples were 1:1 mixtures from two donors. Similarly, the first laboratory cited failed to detect several 1:1 mixtures and, in one case, reported that a stain from one person was a mixture. Those results raised serious questions about the reliability of interpretation of mixed samples.
Especially for a technology with high discriminatory power, such as DNA typing, laboratory error rates must be continually estimated in blind proficiency testing and must be disclosed to juries. For example, suppose the chance of a match due to two persons' having the same pattern were 1 in 1,000,000, but the laboratory had made one error in 500 tests. The jury should be told both results; both facts are relevant to a jury's determination.
Laboratory errors happen, even in the best laboratories and even when the analyst is certain that every precaution against error was taken. It is important to recognize that laboratory errors on proficiency tests do not necessarily reflect permanent probabilities of false-positive or false-negative results. One purpose of regular proficiency testing under standard case conditions is to evaluate whether and how laboratories have taken corrective action to reduce error rates. Nevertheless, a high error rate should be a matter of concern to judges and juries.
Reported error rates should be based on proficiency tests that are truly representative of case materials (with respect to sample quality, accompanying description, etc.). Tests based on pure blood samples would probably underestimate an error rate, and tests based primarily on rare and extremely difficult samples (which might be useful for improving practice) would probably overestimate. Although the CACLD proficiency test was less than ideal (being open, rather than blind, and not requiring reporting of size measurements), the materials appear to have been representative of standard casework.
TOWARD A FIRM FOUNDATION FOR STATISTICAL INTERPRETATION
Statistical interpretation of DNA typing evidence has probably yielded the greatest confusion and concern for the courts in the application of DNA to forensic science. Some courts have accepted the multiplication rule based on the grounds of allelic independence, others have used various ad hoc corrections to account for nonindependence, and still others have rejected probabilities altogether. Some courts have ruled that it is unnecessary even to test allelic independence, and others have ruled that allelic independence cannot be assumed without proof. The confusion is not surprising, inasmuch as the courts have little expertise in population genetics or statistics.
In reaching a recommendation on statistical interpretation of population frequencies, the committee balanced the following considerations:

DNA typing should be able to provide virtually absolute individual identification (except in the case of identical twins), provided that enough loci are studied and that the population-genetics studies are developed with appropriate scientific care. The importance of this long-term goal justifies substantial investment in ensuring that the underlying population-genetics foundation is firm.

Statistical testimony should be based on sound theoretical principles and empirical studies. Specifically, the validity of the multiplication rule in any application depends on the empirical degree of population differentiation for the loci involved. Adequate empirical data must be collected, and appropriate adjustments must be made to reflect the remaining uncertainties.

It is feasible and important to estimate the degree of variability among populations to determine ceiling frequencies for forensic DNA markers and to evaluate the impact of population substructure on genotype frequencies estimated with the multiplication rule.

Careful population genetics is especially important for the development and use of databanks of convicted-offender DNA patterns. Whereas the comparison of an evidence sample to a single suspect involves testing only one hypothesis, the comparison of a sample to an entire databank involves testing many alternative hypotheses. Special attention must thus be paid to the possibility of coincidental matches.
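The effect of searching a whole databank can be illustrated with a small calculation. This is a sketch under a simplifying assumption that is ours, not the committee's: every databank entry is an unrelated person who matches independently with the same pattern frequency. The function name and the numbers are hypothetical.

```python
import math


def chance_of_any_coincidental_match(pattern_frequency, databank_size):
    """Probability that at least one of N unrelated entries matches by chance.

    Computes 1 - (1 - f)**N in a numerically stable way, assuming the
    N comparisons are independent (an illustrative assumption).
    """
    return -math.expm1(databank_size * math.log1p(-pattern_frequency))


# One comparison against a single suspect versus a search of 100,000 entries,
# for a hypothetical pattern frequency of 1 in 1,000,000.
print(chance_of_any_coincidental_match(1e-6, 1))
print(chance_of_any_coincidental_match(1e-6, 100_000))
```

Under these assumptions a pattern that is very unlikely to match a single suspect by chance has a far from negligible chance of matching some entry in a large databank.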
On the basis of those considerations, the committee reached conclusions, which now will be discussed.
Population Studies to Set Ceiling Frequencies
In view of the long-term importance of forensic DNA typing, the population-genetics foundation should be made as secure as possible. Accordingly, population studies should be promptly initiated to provide valid estimation of ceiling frequencies, as described above. Specifically, variation in allele frequencies should be examined in appropriately drawn random samples from various populations that are genetically relatively homogeneous. The selection, collection, and analysis of such samples should be overseen by the National Committee on Forensic DNA Typing (NCFDT) recommended in Chapter 2.
Given the effort involved in drawing appropriate population samples and the continuing need to type new markers as the technology evolves, the samples should be maintained as immortalized cell lines in a cell repository; that would make an unlimited supply of DNA available to all interested investigators. We note that preparation of immortalized cell lines through transformation of lymphoblasts with Epstein-Barr virus is routine and cost-effective. Transformation and storage can be handled as contract services offered by existing cell repositories, such as the NIH-supported repository in Camden, N.J.
Such a cell repository would be analogous to that of the international consortium Centre d'Etude du Polymorphisme Humain (CEPH)^{28} created in 1983. It holds some 1,000 samples from 60 reference families, which are used for genetic mapping of human chromosomes. The cell lines have played an essential role in the development of the human genetic-linkage map. The existence of a common resource has also promoted standardization and quality control through the ability to recheck samples. (We should note that the CEPH families themselves are not appropriate for studying population frequencies, because they represent closely related people in a small number of families.)
Substantial benefits will accrue to forensic DNA typing through the availability of a reference collection that can be maintained at an existing facility like the ones at the Coriell Institute of Medical Research and the American Type Culture Collection. Although there is an initial investment in collecting, transforming, and storing cells, the cost will be more than repaid in the broad and continued availability of well-chosen samples for population studies of newly developed DNA typing systems and the ability of investigators to confirm independently the DNA typing that was done in another laboratory.
Reporting of Statistical Results
Until ceiling frequencies can be estimated from appropriate population studies, we recommend that estimates of population frequencies be based on existing data by applying conservative adjustments:

First, the testing laboratory should check whether the observed multilocus genotype matches any sample in its population database. Assuming that it does not, it should report that the DNA pattern was compared to a database of N individuals from the population and no match was observed, indicating its rarity in the population. This simple statement based on the counting principle is readily understood by jurors and makes clear the size of the database being examined.

The testing laboratory should then calculate an estimated population frequency on the basis of a conservative modification of the ceiling principle, provided that population studies have been carried out in at least three major "races" (e.g., Caucasians, blacks, Hispanics, Asians, and Native Americans) and that statistical evaluation of Hardy-Weinberg equilibrium and linkage disequilibrium has been carried out (with methods that accurately incorporate the empirically determined reproducibility of band measurement) and no significant deviations were seen. The conservative calculation represents a reasonable effort to capture the actual power of DNA typing while reflecting the fact that the recommended population studies have not yet been undertaken. The calculation should be carried out as follows.
For each allele, a modified ceiling frequency should be determined by (1) calculating the 95% upper confidence limit for the allele frequency in each of the existing population samples and (2) using the largest of these values or 10%, whichever is larger. The use of the 95% upper confidence limit represents a pragmatic approach to recognize the uncertainties in current population sampling. The use of a lower bound of 10% (until data from ethnic population studies are available) is designed to address a remaining concern that populations might be substructured in unknown ways with unknown effect and the concern that the suspect might belong to a population not represented by existing databanks or a subpopulation within a heterogeneous group. We note that a 10% lower bound is recommended while awaiting the results of the population studies of ethnic groups, whereas a 5% lower bound will likely be appropriate afterwards. In the context of the discussion of the ceiling principle, the higher threshold reflects the greater uncertainty in using allele frequency estimates as predictors for unsampled subpopulations.
Once the ceiling for each allele is determined, the multiplication rule should be applied. The race of the suspect should be ignored in performing these calculations.
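The modified ceiling calculation can be sketched as follows. Several details are assumptions made only for illustration: the 95% upper confidence limit is computed with a normal approximation, p + 1.96 sqrt(p(1 - p)/n), where n is the number of alleles sampled; every locus is treated as heterozygous, so the multiplication rule contributes a factor 2pq per locus; and all names, counts, and ceiling values are hypothetical.

```python
import math


def upper_confidence_limit(count, n_alleles, z=1.96):
    """Normal-approximation upper limit for an allele frequency (assumed method)."""
    p = count / n_alleles
    return p + z * math.sqrt(p * (1 - p) / n_alleles)


def modified_ceiling(samples, floor=0.10, z=1.96):
    """Largest upper confidence limit across population samples, or the floor.

    samples is a list of (allele_count, alleles_sampled), one per population.
    """
    largest = max(upper_confidence_limit(c, n, z) for c, n in samples)
    return max(largest, floor)


def profile_frequency(loci):
    """Multiplication rule over heterozygous loci: product of 2 * p * q."""
    frequency = 1.0
    for ceiling_1, ceiling_2 in loci:
        frequency *= 2 * ceiling_1 * ceiling_2
    return frequency


# Hypothetical counts for one allele in three population samples of 200 alleles.
rare_allele = modified_ceiling([(2, 200), (4, 200), (6, 200)])   # floor applies
common_allele = modified_ceiling([(20, 200), (40, 200), (30, 200)])
print(rare_allele, round(common_allele, 4))

# Hypothetical per-allele ceilings for a two-locus heterozygous profile.
print(profile_frequency([(0.10, 0.15), (0.259, 0.10)]))
```

Note that the suspect's own population is never consulted: the same ceilings are used whatever the suspect's race, as the text requires.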
Regardless of the calculated frequency, an expert should, given the relatively small number of loci used and the available population data, avoid assertions in court that a particular genotype is unique in the population. Finally, we recommend that the testing laboratory point out that the reported population frequency, although it represents a reasonable scientific judgment based on available data, is an estimate derived from assumptions about the U.S. population that are being further investigated.
As an example, suppose that a suspect has genotype A1/A2, B1/B2 at loci A and B and that three U.S. populations have been sampled in the current "convenience sample" manner and typed for these loci. The likelihood of a match for this two-locus genotype would be estimated as follows:
A frequency of 0.001554 corresponds to about 1 in 644 persons. Addition of two loci with about the same information content would yield a four-locus genotype frequency of about 1 in 414,000 persons. Of course, if fewer than four loci were interpretable, as is common in forensic typing, the estimated genotype frequency would be much higher.
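The conversions quoted here are easy to check. The short sketch below starts from the stated two-locus frequency of 0.001554 and, following the text, treats "two loci with about the same information content" as squaring that frequency; the helper name is ours.

```python
def one_in(frequency):
    """Express a frequency as '1 in N' persons."""
    return 1 / frequency


two_locus = 0.001554          # frequency stated in the text
four_locus = two_locus ** 2   # two further loci of similar information content

print(f"two loci: about 1 in {one_in(two_locus):,.0f}")
print(f"four loci: about 1 in {one_in(four_locus):,.0f}")
```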
Significantly more statistical power for the same loci will be available when appropriate population studies have been carried out, because the availability of data based on a more rigorous sampling scheme will make it unnecessary to take an upper 95% confidence limit for each allele frequency or to put such a conservative lower bound (0.10) on each allele frequency. Assuming that the population studies do not reveal significant substructure, the 5% lower bound recommended earlier should be used.
Finally, once appropriate population studies have been conducted and ceiling frequencies estimated under the auspices of NCFDT, population frequency estimates can be based on the ceiling principle (rather than the modified ceiling principle discussed above). Such calculations can never be perfect, but we believe that such a foundation will be sufficient for calculating frequencies that are prudently cautious, i.e., for calculating an upper limit of the frequency of a DNA pattern in the general population. In addition, new scientific techniques (e.g., minisatellite repeat codings^{29}) are being and will be developed and might require reexamination by NCFDT of the statistical issues raised here.
Our recommendations represent an attempt to lay a firm foundation for DNA typing that will be able to support the increasing weight that will be placed on such evidence in the coming years. We recognize that a wide variety of methods for population genetics calculations have been used in previous cases—including some that are less conservative than the approach recommended here. We emphasize that our recommendations are not intended to question previous cases, but rather to chart the most prudent course for the future.
Openness of Population Databanks
Any population databank used to support forensic DNA typing should be openly available for reasonable scientific inspection. Presenting scientific conclusions in a criminal court is at least as serious as presenting scientific conclusions in an academic paper. According to longstanding and wise scientific tradition, the data underlying an important scientific conclusion must be freely available, so that others can evaluate the results and publish their own findings, whether in support or in disagreement. There is no excuse for secrecy concerning the raw data. Protective orders are inappropriate, except for those protecting individuals' names and other identifying information, even for data that have not yet been published or for data claimed to be proprietary. If scientific evidence is not yet ready for both scientific scrutiny and public reevaluation by others, it is not yet ready for court.
Reporting of Laboratory Error Rates
Laboratory error rates should be measured with appropriate proficiency tests and should play a role in the interpretation of results of forensic DNA typing. As discussed above, proficiency tests provide a measure of the false-positive and false-negative rates of a laboratory. Even in the best of laboratories, such rates are not zero.
A laboratory's overall rate of incorrect conclusions due to error should be reported with, but separately from, the probability of coincidental matches in the population. Both should be weighed in evaluating evidence.
SUMMARY OF RECOMMENDATIONS
Although mindful of the controversy concerning the population genetics of DNA markers, the committee has decided to assume that population substructure might exist for currently used DNA markers or for DNA markers that will be used in the future. The committee has sought to develop a recommendation on the statistical interpretation of DNA typing that is appropriately conservative, but at the same time takes advantage of the extraordinary power of individual identification provided by DNA typing. We have sought to develop a recommendation that is sufficiently robust, but is flexible enough to apply not only to markers now used, but also to markers that might be technically preferable in the future. We point out that in using conservative numbers in the interpretation of DNA typing results, any loss of statistical power is often offset through typing of additional loci. The committee seeks to eliminate the necessity to consider the ethnic background of a subject or of the group of potential perpetrators.

As a basis for the interpretation of the statistical significance of DNA typing results, the committee recommends that blood samples be obtained from 100 randomly selected persons in each of 15-20 relatively homogeneous populations; that the DNA in lymphocytes from these blood samples be used to determine the frequencies of alleles currently tested in forensic applications; and that the lymphocytes be "immortalized" and preserved as a reference standard for determination of allele frequencies in tests applied in different laboratories or developed in the future. The collection of samples and their study should be overseen by a National Committee on Forensic DNA Typing.

Sample collection and immortalization should be supported by federal funds, in view of the benefits for law enforcement in general and for the convicted-offender databanks in particular.

The ceiling principle should be used in applying the multiplication rule for estimating the frequency of particular DNA profiles. For each allele in a person's DNA pattern, the highest allele frequency found in any of the 15-20 populations or 5% (whichever is larger) should be used.

In the interval (which should be short) while the reference samples are being collected, the significance of the findings of multilocus DNA typing should be presented in two ways: 1) If no match is found with any sample in a total databank of N persons (as will usually be the case), that should be stated, thus indicating the rarity of a random match. 2) In applying the multiplication rule, the 95% upper confidence limit of the frequency of each allele should be calculated for separate U.S. "racial" groups and the highest of these values or 10% (whichever is the larger) should be used. Data on at least three major "races" (e.g., Caucasians, blacks, Hispanics, Asians, and Native Americans) should be analyzed.

Any population databank used to support DNA typing should be openly available for scientific inspection by parties to a legal case and by the scientific community.

Laboratory error rates should be measured with appropriate proficiency tests and should play a role in the interpretation of results of forensic DNA typing.
REFERENCES
1. Devlin B, Risch N, Roeder K. No excess of homozygosity at loci used for DNA fingerprinting. Science. 249:1416-1420, 1990.
2. Cohen JE, Lynch M, Taylor CE, Green P, Lander ES. Forensic DNA tests and Hardy-Weinberg equilibrium. (Comment on Devlin et al. Science. 249:1416-1420, 1990.) Science. 253:1037-1039, 1991.
3. Devlin B, Risch N, Roeder K. (Response to Cohen et al. Science. 253:1037-1039, 1991.) Science. 253:1039-1041, 1991.
4. Lander ES. Research on DNA typing catching up with courtroom application. (Invited Editorial.) Am J Hum Genet. 48:819-823, 1991.
5. Wooley JR. A response to Lander: The courtroom perspective. Am J Hum Genet. 49:892-893, 1991.
6. Caskey CT. Comments on DNA-based forensic analysis. (Response to Lander. Am J Hum Genet. 48:819, 1991.) Am J Hum Genet. 49:893-905, 1991.
7. Chakraborty R. Statistical interpretation of DNA-typing data. (Letter.) Am J Hum Genet. 49:895-897, 1991.
8. Daiger SP. DNA fingerprinting. (Letter.) Am J Hum Genet. 49:897, 1991.
9. Lander ES. Lander reply. (Letter.) Am J Hum Genet. 49:899-903, 1991.
10. Lewontin RC, Hartl DL. Population genetics in forensic DNA typing. Science. 254:1745-1750, 1991.
11. Chakraborty R, Daiger SP. Polymorphisms at VNTR loci suggest homogeneity of the white population of Utah. Hum Biol. 63:571-588, 1991.
12. Chakraborty R, Kidd K. The utility of DNA typing in forensic work. Science. 254:1735-1739, 1991.
13. Risch N, Devlin B. On the probability of matching DNA fingerprints. Science. 255:717-720, 1992.
14. Weir B. Independence of VNTR alleles defined as fixed bins. Genetics, in press.
15. Lewontin RC. The apportionment of human diversity. Evol Biol. 6:381-398, 1972.
16. Deka R, Chakraborty R, Ferrell RE. A population genetic study of six VNTR loci in three ethnically defined populations. Genomics. 11:83-92, 1991.
17. Cavalli-Sforza LL, Bodmer WF. The genetics of human populations. San Francisco: W.H. Freeman, 1971.
18. Lempert R. Some caveats concerning DNA as criminal identification evidence: with thanks to the Reverend Bayes. Cardozo Law Rev. 13:303-341, 1991.
19. Evett I, Werrett D, Pinchin R, Gill P. Bayesian analysis of single locus DNA profiles. Proceedings of the International Symposium on Human Identification 1989. Madison, Wisconsin: Promega Corp., 1991.
20. Berry DA. Inferences using DNA profiling in forensic identification and paternity cases. Stat Sci. 6:175-205, 1991.
21. Berry DA, Evett IW, Pinchin R. Statistical inferences in crime investigation using DNA profiling. J Royal Stat Soc. [Series C - Applied Statistics], in press.
22. Budowle B, Giusti AM, Waye JS, Baechtel FS, Fourney RM, Adams DE, Presley LA, Deadman HA, Monson KL. Fixed-bin analysis for statistical evaluation of continuous distributions of allelic data from VNTR loci, for use in forensic comparisons. Am J Hum Genet. 48:841-855, 1991.
23. DiLonardo AM, Darlu P, Baur M, Orrego C, King MC. Human genetics and human rights: identifying the families of kidnapped children. Am J Forensic Med Pathol. 5:339-347, 1984.
24. King MC. An application of DNA sequencing to a human rights problem. Pp. 117-132 in: Friedmann T, ed. Molecular Genetic Medicine. Vol. 1. New York: Academic Press, 1991.
25. California Association of Crime Laboratory Directors, DNA Committee, Reports to the Board of Directors: 1, August 25, 1987; 2, November 19, 1987; 3, March 28, 1988; 4, May 18, 1988; 5, October 1, 1988; 6, October 1, 1988.
26. Lander ES. DNA fingerprinting on trial. (Commentary.) Nature. 339:501-505, 1989.
27. The fallibility of forensic DNA testing: of proficiency in public and private laboratories. Part I. The private sphere. Sci Sleuth Rev. 14121:10, 1990.
28. Dausset J, Cann H, Cohen D, Lathrop M, Lalouel JM, White R. Centre d'Etude du Polymorphisme Humain (CEPH): collaborative genetic mapping of the human genome. Genomics. 6:575-577, 1990.
29. Jeffreys A, MacLeod A, Tamaki K, Neil D, Monckton D. Minisatellite repeat coding as a digital approach to DNA typing. Nature. 354:204-209, 1991.