National Academies Press: OpenBook

Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field (2023)

Chapter:1 Population Descriptors in Human Genetics Research: Genesis, Evolution, and Challenges

Suggested Citation:"1 Population Descriptors in Human Genetics Research: Genesis, Evolution, and Challenges." National Academies of Sciences, Engineering, and Medicine. 2023. Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field. Washington, DC: The National Academies Press. doi: 10.17226/26902.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

PREPUBLICATION COPY—Uncorrected Proofs

1 Population Descriptors in Human Genetics Research: Genesis, Evolution, and Challenges

THE STUDY OF HUMAN GENETIC VARIATION

Our social conceptions of race and ethnicity do not match the underlying biological and genetic variation within our species, and we should never confuse the things that were created for the purposes of oppressing people with the nature of that biological and genetic variation.
—Joseph Graves Jr., testimony to the committee in a public session on April 4, 2022

Genetics is the study of heredity, specifically the mechanisms by which traits or characteristics, known as phenotypes, are transmitted from one generation to the next (King et al., 2014). It is a long-standing observation that no two members of a species, except identical twins or clones, have identical features (Strickberger, 1985), spurring the development of a science that sought to understand how individual traits vary, how this variation is generated, and how it is transmitted to the next generation. This raises the question of how different members of a species can share individual traits, for example, a particular eye color. What is the biological basis of this sharing and its transmission, and is this biological basis the same or different across members of the same species? Since the rediscovery of gene transmission rules in 1900 there has remained a debate on whether such differences and commonalities are from genes, environments, or both, and when there is an effect of genes, whether it stems from one or many genes (Provine, 1971; Provine and Russell, 1986). In recent times, epigenetic and stochastic variation,¹ beyond genetic variation, have been elaborated upon as other causes of phenotypic variation (Panzeri and Pospisilik, 2018).

Human genetics, since its origin in 1900 with the discovery of interindividual differences in blood transfusions by Karl Landsteiner, has been exceptional among the genetic and genomic sciences in that it focuses on existing groups of individuals to examine heredity rather than only on the offspring of controlled crosses, as is possible in other species. Although the study of trait transmission in human families is widespread and has been successful for rare conditions that follow Mendelian inheritance patterns, family studies are uncommon for common continuous (metrical) phenotypes, whose inheritance patterns are complex or non-Mendelian (NIH, 2007). A more efficient and generalizable study paradigm has, therefore, been to compare and contrast groups of individuals with and without a specific trait feature, such as persons with hypertension versus persons with normal blood pressure. Specifically, what is compared are the frequency differences of a specific genetic variant, this variant being one of at least two forms (alleles) of a gene (Manolio et al., 2009).

Over the past two decades, technological advances have enabled the identification and comparison of genomic sites (base pairs) across the whole genome,² both within and outside genes. Regardless of where they are sampled in the world, two human genomes differ at approximately 1 in 1,000 genomic sites on average or a total of 3 million positions (Sachidanandam et al., 2001). While the vast majority of non-ancestral alleles are rare (e.g., found at frequencies of below 1% in population samples), most of the variants that differ between two genomes are common and often found in multiple regions of the world (Biddanda et al., 2020; Rosenberg, 2021).
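
The group comparison described above, in which the frequency of a variant is contrasted between people with and without a trait, can be illustrated with a small sketch. The Python below is not taken from the report: the function names, the toy genotype lists, and the choice of a 2x2 allelic chi-square statistic are assumptions made for illustration, and real association studies use far larger samples and additional adjustments.

```python
def allele_frequency(genotypes):
    """Frequency of the variant allele in a group.

    genotypes: list of variant-allele counts per person (0, 1, or 2),
    so each person contributes two allele copies.
    """
    return sum(genotypes) / (2 * len(genotypes))


def allelic_chi_square(cases, controls):
    """Chi-square statistic (1 degree of freedom, no continuity correction)
    for a 2x2 table of allele counts: variant vs. other allele,
    cases vs. controls."""
    a = sum(cases)                 # variant alleles in cases
    b = 2 * len(cases) - a         # other alleles in cases
    c = sum(controls)              # variant alleles in controls
    d = 2 * len(controls) - c      # other alleles in controls
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                 # a row or column is empty: no contrast
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical toy data: variant-allele counts for four cases and four controls.
cases = [2, 1, 1, 0]
controls = [0, 0, 1, 0]
print(allele_frequency(cases), allele_frequency(controls))
print(allelic_chi_square(cases, controls))
```

A large statistic indicates only that the variant's frequency differs between the two groups; as the surrounding text stresses, it does not by itself show that the variant is a biological cause of the trait.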
The frequency of a variant depends on when it arose, the demographic history of humans who carried it, and whether it affects fitness. Across the globe, geneticists have catalogued tens of millions of such variants (1000 Genomes Project Consortium, 2015). Most of the common genetic variants existing across human populations arose as early humans evolved within Africa and then migrated across Africa and the rest of the world (Chakravarti, 2014). This variation is a shared human legacy shaping, and in rare circumstances determining, human traits. Studying these variants in, say, hypertensives versus normotensives can identify variants that are correlated with this trait difference. It takes substantially greater effort to demonstrate whether the detected variants are themselves biological causes of that trait difference or are simply markers that are correlated with the shared history or environment of the individuals studied.

¹ Epigenetic variation arises from chemical modification of DNA in body cells (soma) that can modify the functions of genes; because these modifications are not permanent DNA changes, they are not transmitted to the next generation. Stochastic variation is alteration of gene function from random processes in cells that are neither genetic nor epigenetic (Angers et al., 2020).
² The totality of an individual's DNA is known as their genome.

The human population is very young in evolutionary time; when humans are grouped by geographic origin, between-group differences are substantially smaller than within-group differences. Two other historical aspects need to be considered. First, although humans have migrated into new ecologies ever since spreading within and beyond Africa, there has been extensive ancient and recent movement and mixing of peoples both within and across continents, which has affected global patterns of genetic variation (Cavalli-Sforza and Piazza, 1996; Chakravarti, 2014). Second, many humans also have residual ancestry from long-extinct hominids such as the Neanderthals and Denisovans, its extent varying across the globe (Pääbo, 2014).

Human genetic variation is the result of many forces, historical, social, and biological, and cannot be represented by any single variable. Additionally, science is not the only, and sometimes not even the major, source of human origin stories. Each human culture, adapting to its lands and environments over time, has developed its own narrative of its emergence, stories that are rich, powerful, and deeply meaningful to it. The question today is, with all of this knowledge within reach, how should genetics studies of human phenotypes be designed and conducted?

The existence of genetic variation across geographic space does not mean that it is clustered in the distinct groups that notions of race presume. To be sure, if group boundaries on humans are imposed across the globe, thus inventing 2 or 3 or 20 "races," average differences in allele frequencies between geographically distant groupings will be discerned.
The existence of such genetic differences in the aggregate, however, is no proof that the boundaries applied were natural, objective, or otherwise genetically meaningful in the first place. Too often, statistical findings of genetic differences between groups are misinterpreted as groupings determined by significant biological/genotypic characteristics as opposed to simply reflecting widespread social presumptions about who is similar to whom based on shared physical/phenotypic characteristics. So, how should individuals and populations be described in genetics and genomics studies? To answer this question, it is crucial to reflect on what such studies aim to accomplish in the first place.

WHAT IS A STUDY USING GENETIC INFORMATION TRYING TO ACCOMPLISH?

Genetic information is assessed directly or indirectly from the genome and can be defined narrowly or broadly. Narrowly defined genetic information is based on data from direct measurements of DNA, RNA, proteins, or epigenetic signatures such as DNA methylation, whereas broadly defined genetic information refers to phenotype information that includes indirect assessments of function (e.g., peripheral blood count) or form (e.g., observable traits such as eye or hair color) influenced by the genome. The Human Genome Project (HGP) made it possible to study all genes simultaneously using sequence variants across the entire genome. Genetics research typically studies the role of a variant, gene, or small number of genes in an outcome of interest, whereas approaches that interrogate the DNA sequence or epigenetic signatures across the entire genome are known as genomics studies. Both genetics and genomics studies are today common in biomedical research on humans.

Researchers use human genetic information to address a wide variety of questions about history and evolution; the development and function of cells, tissues, and organs; the biology of the human genome; and the risks and mechanisms underlying rare conditions,³ common and rare diseases,⁴ and heritable traits (e.g., height, blood glucose). Genetics and genomics studies are conducted by scientists from a broad range of disciplines (e.g., human and medical geneticists, physicians in various medical specialties, genetic epidemiologists, forensic scientists, evolutionary biologists, biostatisticians, demographers, anthropologists, other social scientists) with different experiences, expertise, and biases. Genetic information is increasingly easy and inexpensive to produce, and tools to analyze genetic information have become widely available and straightforward to use.

Expectations of researchers and the lay public about discoveries made by genetics studies have changed substantially over time.
For decades, the discovery that a condition or trait had a genetic basis, or more recently, the identification of the specific genetic basis of a condition or trait (e.g., the gene underlying a Mendelian condition such as cystic fibrosis) satisfied both the scientific community and the public. However, over the past 10 years, there has been a growing expectation that genetics studies deliver information that can be used for improving health (e.g., accurately estimating the risk of a common disease or accelerating the development of novel treatment approaches and therapeutics) or precisely answering questions about population origins, migration patterns, or the effect of past environmental factors as forces of natural selection. Moreover, information from genetics, genomics, and sociogenomics studies is being used in new ways for financial, political, or legal gain (Roberts, 2011; Bliss, 2020).

³ In the United States, the Orphan Drug Act defines a rare disease or condition as one that affects fewer than 200,000 people (21 C.F.R. §316.20(b)); many rare conditions are so-called Mendelian conditions, which means that changes in a single gene are necessary and sufficient to cause the condition (Chial, 2008).
⁴ Common, beyond frequency, refers to conditions that are variously called polygenic (many genes), multifactorial (many causes), or complex, the latter implying that both genes and environment are causal factors. Examples of such traits are cardiovascular disease, diabetes, and obesity.

Genetics has proven to be a powerful paradigm in medicine, from explaining individual differences in medical outcome (e.g., ABO blood types for blood transfusion and the human leukocyte antigen (HLA) types for organ transplant compatibility), to explaining disease pathogenesis (e.g., in persons with rare Mendelian conditions, such as Marfan syndrome), to identifying therapeutic targets from knowledge of the genes involved (e.g., PCSK9 inhibitors for reducing serum cholesterol). The field has also transformed researchers' knowledge of where and how modern humans arose and migrated across the globe.

Yet, genetics also has substantial limitations. Virtually all conditions and traits are the result of both genetic and environmental factors as well as stochastic or nondeterministic influences. The effect of these nongenetic factors varies across different conditions and traits, with some conditions strongly influenced (e.g., susceptibility to infectious disease, obesity, cardiovascular disease) and others only weakly so (e.g., achondroplasia, fragile-X syndrome, Huntington's disease). The effect of nongenetic factors falls between these extremes for most genetic conditions, and the degree to which nongenetic factors influence a condition or trait is itself influenced by the genetic architecture of the condition (e.g., the type, number, and strength of the genetic variants involved), risk genotype(s) (e.g., the variants in an individual's personal genome), and the effect of genetic modifiers (e.g., other genetic variants with indirect influence on the principal genes). Identifying nongenetic factors that influence a genetic condition or trait is challenging, and for most conditions they therefore remain unknown. Moreover, without careful study design, the effects of environmental and genetic factors can often be conflated.

It should be further noted that although genetic variation can be critical to identifying disease mechanisms and interindividual trait differences, human biological processes are universal. For example, everywhere in the world, the same ocular biology and neural pathways underlie human vision (Chakraborty et al., 2020). The ingestion of lead produces the same biochemical effects in human bodies whether they are in Alaska or Zambia (Fu and Xi, 2020). Vaccines for coronavirus disease 2019 (COVID-19) work via the same immunological mechanisms in Peru or Poland (Sadarangani et al., 2021). While environmental factors as well as inherited local genetic variants may influence these processes, their physiological mechanisms are essentially the same. In other words, genetic variation is used to identify fundamental mechanisms that are biologically universal among humans, and often relevant to other species, including ones used as model systems (e.g., mice), in order to understand human biology and medicine.

CLASSIFICATION OF GENOMICS STUDY TYPES

There is no one kind of genetics or genomics study: thus, it is helpful to consider the various classes of such studies, some of which have a long history of use while others are more recent. Such a categorization is also practical because each study type, with its different questions in mind, recruits study participants differently, and therefore may require tailored guidance to researchers on how to improve the use of population descriptors; in other words, there is no "one size fits all" solution. The committee considers seven such archetypal studies, which are by no means an exhaustive list but serve to illustrate the different usages of population descriptors and highlight some of the considerations that should come into play in choosing a classification scheme for a study:

1. Gene discovery for Mendelian traits: studies aimed at identifying the genetic basis (e.g., pathogenic variant) underlying Mendelian disorders or traits.
   • For a review, see Chong et al., 2015. Examples include the discovery of the cystic fibrosis gene (Kerem et al., 1989), mutations that cause Kabuki syndrome (Ng et al., 2010), or the genetic basis of a trait such as lactose intolerance (Enattah et al., 2002).
2. Prediction for Mendelian traits: approaches that rely on the presence of a specific genotype to predict risk for or incidence of a Mendelian disease or specific outcome, as done in research settings or the clinical context of prenatal or newborn screening or presymptomatic testing.
   • Examples are newborn screening for phenylketonuria (PKU), sickle cell disease, and others (Watson et al., 2022) or analysis of BRCA1/2 mutation-associated tumors (e.g., Shah et al., 2022).
3. Gene discovery for complex and polygenic traits: studies aiming to identify genetic variants associated with quantitative traits or complex disease risk, as done in genome-wide association studies (GWAS).
   • For examples, see case-control studies to identify genetic variants associated with disease risk for type 1 diabetes or Crohn's disease (e.g., Burton et al., 2007) or GWAS of height (Lettre et al., 2008).
4. Prediction for complex and polygenic traits: studies that aim to make probabilistic predictions about individual disease risk or traits based on genomic data.
   • Such studies often use "polygenic scores" (also called polygenic risk scores or polygenic indexes; e.g., Khera et al., 2018).

5. Elucidation of molecular, cellular, or physiological mechanisms: studies using related or unrelated participants or cell lines derived from their biological tissues to understand molecular, cellular, or physiological mechanisms.
   • Examples include the study of the genetic basis of Huntington's disease (e.g., Kremer et al., 1995) or the cellular mechanism of SARS-CoV-2 infection (e.g., Daniloski et al., 2021).
6. Studies of health disparities with genomic data: elucidation of the role of genetic and environmental effects in how social disadvantage leads to health disparities.
   • Examples include kidney disease risk in Hispanics/Latinos and biological aging in children (Kramer et al., 2017; Raffington et al., 2021; Raffington and Belsky et al., 2022; and West et al., 2017).
   • It should be noted that not all health disparities studies with genomic data require the use of descent-associated population descriptors. See, for example, a study of genetics and neighborhood effects on health outcomes (Belsky et al., 2019).
7. Studies of human evolutionary history: inferences about human evolutionary history using samples of related or unrelated participants.
   • One example is the study of genomic history of African populations (Fan et al., 2019). For another example, see Waldman et al., 2022.
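
To make study type 4 concrete, a polygenic score is commonly computed as a weighted sum of an individual's allele counts across many variants. The sketch below is a minimal illustration under that assumption and is not drawn from the report: the function name, the three example variants, and their weights are hypothetical, and real scores involve thousands to millions of variants with weights estimated from association studies.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages.

    dosages: number of copies (0, 1, or 2) of the scored allele
             carried at each variant.
    weights: per-variant effect-size estimates, in the same order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical example: three variants with made-up weights.
weights = [0.10, -0.20, 0.05]
person = [0, 1, 2]          # allele copies carried at each variant
print(polygenic_score(person, weights))
```

Because the weights are estimated in a particular study sample, the resulting score is a probabilistic summary rather than a deterministic prediction, which is one reason the description of the sample matters for this study type.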
A series of population descriptors that could be tailored to specific types of genetics studies will be examined in Chapter 2, and best practices for the use of population descriptors will be discussed in Chapter 5 for each of the seven study types.⁵

⁵ The discussion of other types of genetics and genomics studies, such as those in forensics and genealogy reconstructions, is not a part of this study (see statement of task in Box 1-1).

FEATURES OF HUMAN GENOME VARIATION

By 2001, when the draft sequence of the human genome was reported (Lander et al., 2001; Venter et al., 2001), the tools developed to sequence the human genome and the resulting data were already both transforming how genetics research could be done and enabling unprecedented characterization of patterns of human genetic variation (Aach et al., 2001; Birney et al., 2001; Lander et al., 2001). The sequence of the first reference genome was quickly followed by a number of efforts to characterize human genetic diversity such as the International Haplotype Map Project (HapMap), Genome Aggregation Database (gnomAD), and the 1000 Genomes Project (1000G). These efforts, and the subsequent debates over the sampling and applicability of a limited number of reference populations, led to grappling with the use of population descriptors, specifically race, ethnicity, and ancestry. These projects would confirm the high levels of genetic similarity among humans across the globe and the poor correspondence between racialized groups and the distribution of human genetic variation (Lewontin, 1972).

In brief, scientists' current understanding of the distribution of human genetic variation and its evolutionary origins is that:

• Anatomically modern humans arose somewhere in the African continent approximately 300,000 years ago (Hublin et al., 2017). Their descendants expanded across much of the rest of the world within the past 100,000 years, giving rise to all modern humans today (Mallick et al., 2016; Nielsen et al., 2017).
• Mating between members of human groups occurred repeatedly throughout evolution, from interbreeding that occurred with archaic forms of humans (e.g., Neanderthals and Denisovans) (Narasimhan et al., 2019; Pääbo, 2014), to gene flow between various human groups throughout the world (e.g., Gomez et al., 2014; Reich, 2018).
• Allele frequencies over time and space diverge gradually, owing to random fluctuations (known as genetic drift) and changes caused by natural selection, and are made more similar by gene flow (Novembre and Di Rienzo, 2009).

As a result of the relatively recent common origin of modern humans and the repeated mixing of groups, the alleles carried by people living all over the globe show little differentiation:

• Levels of genetic diversity in humans are low compared to those of many other species: pairs of chromosomes differ only at approximately 1 in 1,000 sites in humans (Leffler et al., 2012), in contrast to 1 in 100 sites in the fruit fly Drosophila melanogaster and 3 in 1,000 sites in the chimpanzee (Leffler et al., 2012).
• Alleles that are common in one population are typically shared across multiple populations, as they tend to be older. Variants that are rare in a population tend to be recent and are usually found much more locally; for example, if very rare, only among close relatives (Biddanda et al., 2020).
• Human allele frequencies tend to vary continuously with geographic distance (isolation by distance), with slightly larger differences seen across long-term inhibitors of migration such as oceans or mountains (Rosenberg, 2021). These geographic boundaries do not correspond to racial groupings.
• Even when differences at any given locus are subtle, information from many loci can be aggregated to make each human genome recognizably unique and to assess an individual’s genetic similarity to others (e.g., Figure 2, Novembre and Peter, 2016). This similarity measure is often paired with geographic or other labels from genetically similar individuals in order to assign the individual to a single grouping or to multiple groupings (e.g., a method might assign a single geographic population designation or model an individual as a mixture of different “ancestry clusters”; see Chapter 2).
• In some regions of the genome, allele frequencies also vary geographically because a variant contributes to adaptation to past or present local environments (Novembre and Di Rienzo, 2009). Where selection on an individual locus was strong and sustained over hundreds of generations, these allele frequency differences can be larger than is typical in the genome. In humans, there are very few cases where one allele is present at very high frequency across a broad-scale geographic region but not shared elsewhere in the world, apart from loci such as those that contribute to infectious disease susceptibility (e.g., the Duffy null allele at the Duffy gene) (Hamblin and Di Rienzo, 2000).

POPULATION CLASSIFICATION SCHEMES IN GENETICS AND GENOMICS RESEARCH

The Origins of Describing Individuals and Populations in Human Genetics

Human genetics research was propelled by the discovery of interindividual differences in blood transfusions by Karl Landsteiner in the early 1900s, and by his subsequent demonstration that human blood can be classified into what we now call the A, B, AB, and O groups (Landsteiner, 1961).
Importantly, as early as 1901–1903, he had also suggested that the characteristics that determine blood groups were inherited (Nobel Prize Outreach AB, 2022). Shortly after, in 1910–1911, Emil von Dungern and Ludwik Hirschfeld showed, using families of the teaching staff of Heidelberg University, that Landsteiner’s normal human serological features were inherited in a Mendelian pattern (von Dungern and Hirschfeld, 1962), thus making ABO the first known common human genetic trait (Bugert and Klüter, 2012).

In 1919, Ludwik and Hanka Hirschfeld recorded the ABO blood types in more than 8,000 soldiers and refugees on the Macedonian front during World War I (Hirschfeld and Hirschfeld, 1919). These studies were the first to demonstrate differences in the frequencies of blood group alleles (variants), in mostly unrelated individuals, across different populations, which the authors referred to as “races” (Hirschfeld and Hirschfeld, 1919). The population descriptors used were highly varied and included a mix of continental, geographic, and religious labels (e.g., Europeans, Indians, and Jews). This study on a very large sample of heterogeneous individuals, which also showed geographic patterns of east–west and north–south blood group allele frequency variation, became highly influential in anthropology and human genetics by suggesting widespread allele frequency differences in human populations (Hirschfeld and Hirschfeld, 1919). By 1977, Arthur E. Mourant, a British hematologist and geneticist, had updated his compilation titled The Distribution of the Human Blood Groups to include genotype and allele frequency data from hundreds of thousands of samples collected across the globe (Mourant, 1977). These samples were also identified by a dizzying array of terms meant to signify their origin.

The choices of population descriptors used by 20th-century scientists were consistent with a longstanding European and U.S. belief that human beings are naturally divided into biologically distinguishable races (Gossett, 1997; Hammonds and Herzig, 2009; Keel, 2018; Painter, 2010). The categorization of human beings into races was integral to settler colonialism and slavery, and simultaneously became foundational to scientific thinking in the United States (Fredrickson, 2002; Higginbotham, forthcoming; Roberts, 2011; Smedley and Smedley, 2012; TallBear, 2013).
For example, prominent nineteenth-century scientists such as Harvard biologist Louis Agassiz and Samuel Morton, president of the Academy of Natural Sciences in Philadelphia, promoted the white supremacist view that human beings were divided into unequal racial groups that descended from separate origins (Gould, 1996). These ideas continued to influence U.S. science after the Civil War and persisted through the eugenics era in the 20th century into the 21st century (Graves, 2001; Reardon, 2009; Roberts, 2011; Zuberi, 2003). Some of the harmful scientific and societal practices of the eugenics era are described in a recently released statement and report⁶ from the American Society of Human Genetics, acknowledging and apologizing for the involvement of some of its early leaders in the American eugenics movement.

⁶ See https://www.ashg.org/wp-content/uploads/2023/01/Facing_Our_History-Building_an_Equitable_Future_Final_Report_January_2023.pdf (accessed January 25, 2023).

Classifying people by race has been essential to institutional racism and tightly interwoven into political, economic, legal, scientific, and social practices in the United States. Race was “baked into” the very first instruments of governance in the United States, from its first census to its first law governing who could become a citizen (both in 1790). Under the Jim Crow regime, which extended from the end of Reconstruction to the Supreme Court’s 1954 Brown v. Board of Education decision and the passage of federal civil rights legislation, many states maintained rigid racial classification systems to help enforce de jure segregation (Dorr, 2008; Pascoe, 2009).

The civil rights movement of the 1950s through the 1970s also shaped the scientific use of race, ethnicity, and ancestry as descriptors. New federal legislation, including the Civil Rights Act of 1964, the Voting Rights Act of 1965, the Fair Housing Act of 1968, and the Home Mortgage Disclosure Act of 1975, required many federal agencies to monitor discrimination, and doing so meant classifying people, typically into racial and ethnic categories. In 1977, the OMB issued Statistical Directive 15, Race and Ethnic Standards for Federal Statistics and Administrative Reporting, to standardize federal agencies’ recordkeeping, collection, and presentation of data on race and ethnicity (equated with Hispanic origin), including its use on the census (OMB, 1977, 1997).

OMB Directive 15 has had widespread effects because its racial and ethnic categories have been used widely across government and the private sector, including by many scientific researchers in genetics and genomics (Kahn, 2006; Nobles, 2000). This is in part because the NIH and other federal research agencies require OMB-based racial and ethnic information collection in funding proposals and applications for purposes of inclusion (Epstein, 2007). Although OMB Directive 15 is clear that race is a social, and not a biological, classification, this category is frequently applied as if it were biological.
Thus, the institutional demand for biomedical research to become more inclusive has led to many U.S.-based genetics and genomics research projects collecting OMB ethnic and racial category-based information on study participants, including measurement of biological differences between these groups (Epstein, 2007).

Race and racism also continue to figure in genomics research because many scientists hold the view that race is a biological category or that race is a useful proxy for human biological variation. Scientists not only learn biological concepts of race in their professional training but also, like the rest of U.S. society, are exposed from the earliest ages to racial concepts and practices (Morning, 2011). Racial taxonomy becomes a familiar way of seeing and describing the world, one that is taken for granted and presumed to be “natural” and objective (Hirschfeld, 1996; Hirschfeld and Gelman, 1997; Obasogie, 2010; Van Ausdale and Feagin, 1996). This framework has made its way unnoticed into the design and execution of scientific research. For example, a study by Fujimura and Rajagopalan (2011) highlights how, despite the development of new technologies focused on genetic similarity that would preclude the need for pre-labeling populations, the use of terminology such as “ancestry” or “shared ancestry” in genetic analyses can lead to slippage toward racial concepts. In some cases, certain tools and practices currently used in genomics research can blur the differences between ancestry and race (Fullwiley, 2008). In their study of geneticists’ population labeling practices, Panofsky and Bliss (2017) found “persistent and indiscriminate blending of classification schemes” that has made the definition of population in genetics “more ambiguous rather than standardized over time” (p. 59). The outsized influence of U.S.-based research on scientific practice worldwide, moreover, means that Americans’ widespread exposure to racial thought, discourse, and institutions is transmitted to scientists around the globe.

A complete history of the use of population descriptors in human genetics and the early and persistent use of race in science is beyond the scope of this report and outside of the committee’s statement of task. The brief summary provided here is meant only to emphasize several important points. First, early studies, like those by the Hirschfelds, used population descriptors of many categories (racial, continental, ethnic, religious, and more) in ways that imply an interchangeableness among them when none may have existed. Second, the biological concept of race in humans was created to support settler colonialism and slavery, and has always been entangled with racist institutions, policies, and practices. The use of race as a population descriptor in scientific research therefore has caused incalculable confusion and harm. Third, the federal requirement to use OMB categories in many contexts perpetuates the institutional racism, confusion, and harm caused by false concepts of race as a biological grouping. Fourth, racist concepts of race that are deeply embedded in science and U.S. society more broadly continue to affect scientific thinking and research.
Scientists must critically examine the underlying assumptions about race, and about human commonality and difference, that shape their research studies. For a more complete history of population descriptors in genetics, and for a deeper understanding of the history of the race concept and the intersections of race, science, and society, see the list of references in Box 1-1.

Box 1-1
Race, Science, and Society: A Reference List

Anderson, Margo J. 1988. The American Census: A Social History. New Haven: Yale University Press.
Byron, Gay L. 2002. Symbolic Blackness and Ethnic Difference in Early Christian Literature. London: Routledge.
Carter, J. Kameron. 2008. Race: A Theological Account. Oxford University Press.
Dorr, Gregory M. 2008. Segregation’s Science: Eugenics and Society in Virginia. Charlottesville: University of Virginia Press.
Fredrickson, George M. 2002. Racism: A Short History. Princeton, NJ: Princeton University Press.
Goodman, Alan H., Moses, Yolanda T., Jones, Joseph L. 2019. Race: Are We So Different? Wiley-Blackwell.
Gossett, Thomas F. 1997. Race: The History of An Idea in America. New York: Oxford University Press.
Gould, Stephen J. 1996. The Mismeasure of Man. New York: W.W. Norton.
Graves, Joseph L., Jr. 2001. The Emperor’s Clothes: Biological Theories of Race at the Millennium. New Brunswick: Rutgers University Press.
Hammonds, Evelynn M. and Herzig, Rebecca M. 2009. The Nature of Difference: Sciences of Race in the United States from Jefferson to Genomics. Cambridge, MA: MIT Press.
Hannaford, Ivan. 1996. Race: The History of an Idea in the West. Washington, DC: The Woodrow Wilson Center Press.
Keel, Terence. 2018. Divine Variations: How Christian Thought Became Racial Science. Stanford, CA: Stanford University Press.
Marks, Jonathan. 2017. Is Science Racist? Polity.
Marx, Anthony W. 1998. Making Race and Nation: A Comparison of South Africa, the United States, and Brazil. Cambridge: Cambridge University Press.
Molina, Natalia. 2006. Fit to Be Citizens: Public Health and Race in Los Angeles, 1879-1939. Berkeley: University of California Press.
Morning, Ann. 2011. The Nature of Race: How Scientists Think and Teach About Human Difference. Berkeley: University of California Press.
Nobles, Melissa. 2000. Shades of Citizenship: Race and the Census in Modern Politics. Stanford, CA: Stanford University Press.
Painter, Nell I. 2010. The History of White People. New York: W.W. Norton & Company.
Pascoe, Peggy. 2009. What Comes Naturally: Miscegenation Law and the Making of Race in America. Oxford University Press.
Reardon, Jenny. 2005. Race to the Finish: Identity and Governance in an Age of Genomics. Princeton, NJ: Princeton University Press.
Roberts, Dorothy. 2011. Fatal Invention: How Science, Politics, and Big Business Re-create Race in the Twenty-first Century. New York: The New Press.
Sanders, Edith R. 1969. “The Hamitic Hypothesis; Its Origins and Functions in Time Perspective.” Journal of African History X:521-532.
Schor, Paul. 2017. Counting Americans: How the US Census Classified the Nation. Oxford: Oxford University Press.
Smedley, Audrey, and Brian Smedley. 2012. Race in North America: Origin and Evolution of a Worldview. Boulder, CO: Westview Press.
Snowden, Frank M., Jr. 1983. Before Color Prejudice: The Ancient View of Blacks. Cambridge, MA: Harvard University Press.
Stepan, Nancy. 1982. The Idea of Race in Science: Great Britain, 1800-1960. Palgrave MacMillan.
Stern, Alexandra M. 2015. Eugenic Nation: Faults and Frontiers of Better Breeding in Modern America. Berkeley: University of California Press.
TallBear, Kim. 2013. Native American DNA: Tribal Belonging and the False Promise of Genetic Science. Minneapolis: University of Minnesota Press.
Wilder, Craig Steven. 2013. Ebony and Ivy: Race, Slavery, and the Troubled History of America’s Universities. New York: Bloomsbury Press.
Yudell, Michael. 2014. Race Unmasked: Biology and Race in the Twentieth Century. New York: Columbia University Press.
Zuberi, Tukufu. 2003. Thicker Than Blood: How Racial Statistics Lie. Minneapolis: University of Minnesota Press.

Local and Global Contexts

The conceptualization of “American” as an equivalent to being from the United States has led to the use of derived terms such as African American, European American, and Native American. This terminology has been adopted by the genetics community and applied in many population genomics studies (Bryc et al., 2015; Kidd et al., 2012; Price et al., 2007; Ruiz-Linares et al., 2014; Williamson et al., 2007). However, outside the United States these terms do not have the same context and may imply different meanings, as the adjective American has a geographic reach across the North and South American continents, meaning the Americas, rather than a national one (the United States). It is important to move away from U.S.-centric definitions when working with global populations and to be aware of the historical use of alternative descriptors in order to come up with the best possible consensus to embrace diversity while making accurate descriptions of populations for scientific purposes.

Population group classifications are context-specific and vary globally. For example, consider the classifications used in ongoing studies from three different countries: the United Kingdom (UK) Biobank, the South African HAALSI study, and the Brazilian BIPMed study (Table 1-1). All population descriptors vary with each study and are not interchangeable. The descriptors are context specific for those regions and some involve language groups, country of origin, background, or geographic region.


The UK Biobank (UKB) is a biomedical database of genetic and health information from 500,000 participants living in the UK.⁷ The data in the UKB are globally accessible to approved researchers undertaking studies related to health and disease. Thus far, there are over 30,000 global registrations (80 percent from non-UK investigators) (UK Biobank, 2022) and over 5,000 scientific papers published (Conroy et al., 2022). The population descriptors used in the UKB include labels such as white, mixed, and so on, as outlined in Table 1-1.

The Health and Aging in Africa: A Longitudinal Study of an INDEPTH Community in South Africa (HAALSI) study includes a community-based cohort of 5,059 men and women 40 years old or older.⁸ Study data were collected around the following areas: cognition and dementia, cardiometabolic disease, human immunodeficiency virus (HIV) and treatment, public policies and health, and multimorbidity. While no population descriptor labels are used in the study, data on country of origin were collected (Gómez-Olivé et al., 2018), and in the second wave of the survey, questions related to the languages the participants spoke were included (Berkman, 2020).

The Brazilian Initiative on Precision Medicine (BIPMed) is an initiative of five research, innovation, and dissemination centers funded by the São Paulo Research Foundation (FAPESP) (Rocha et al., 2020). The five centers share data to create BIPMed, which provides genomic and phenotypic information to the global research community. BIPMed investigates the distribution of rare and common variants within two BIPMed data sets including the Brazilian population from the metropolitan area of São Paulo.
The Brazilian population structure derives from African, European, and Native American populations (Mychaleckyj et al., 2017; de Moura et al., 2015), but in the BIPMed study, the team decided to use the geographic regions where individuals were born as population descriptors because this was more relevant for their study; it was noted that two regions were not well represented in the data (Rocha et al., 2020).

⁷ For more information on the UK Biobank see https://www.ukbiobank.ac.uk (accessed November 3, 2022).
⁸ For more information on HAALSI see https://haalsi.org/data (accessed November 3, 2022).

TABLE 1-1 Comparison of Classification Schemes Used in Three Studies Using Genetics from Three Distinct Global Contexts (a)

UK Biobank
  White: British; Irish; Any other white background
  Mixed: White and black Caribbean; White and black African; White and Asian; Any other mixed background
  Asian or Asian British: Indian; Pakistani; Bangladeshi; Any other Asian background
  Black or black British: Caribbean; African; Any other black background
  Chinese
  Other ethnic group
  Do not know
  Prefer not to answer

HAALSI
  Native Language: Shangaan; English; Afrikaans; Zulu; Xhosa; Portuguese; Other

BIPMed
  Geographic Regions in Brazil where Participants Were Born: North; Northeast; Centre West; Southeast; South; Unknown

(a) A more extensive, yet still not exhaustive, list of international programs and the population descriptors they use can be found in Appendix C.
NOTE: BIPMed = Brazilian Initiative on Precision Medicine; HAALSI = Health and Aging in Africa: A Longitudinal Study of an INDEPTH Community in South Africa; UK = United Kingdom.
SOURCES: https://www.ukbiobank.ac.uk; Berkman, 2020; Rocha et al., 2020.

Challenges with Legacy Data and Harmonization

In an effort to establish uniformity in the use of population descriptors across the globe, several international organizations including the United Nations (UN) and the European Commission have issued recommendations for their member states’ census or other data collection efforts related to race and/or ethnicity (Farkas, 2017; UN, 2017). The UN, for example, includes guidance on data collection for ethnic and/or national groups, one element of which is to consult with groups that will be categorized. The guidance also notes the diversity of categories and terminology across countries and states that “no internationally accepted criteria are possible” as a result (UN, 2017).

Researchers have noted the challenges of harmonization across countries. A study of 138 national censuses conducted around the world in the 1995–2004 period found that 63 percent included some kind of descent-associated question, including those on “race,” “population,” “tribe,” and “caste” (Morning, 2008). In other words, ethnoracial items were far from universal on censuses worldwide. Even nations that did count their populations by ethnicity or race used a wide-ranging set of categories, such as “Kankanaey” in the Philippines or “Rotuman” in Fiji, that did not necessarily overlap with labels used elsewhere. In short, geographic variation in the descent-associated groups that are salient, as well as in the underlying classificatory concepts, practices, and norms that are valued, implies that a single, universal standardization is likely infeasible. In addition, any attempt to impose a standard global framework of population descriptors runs the risk of being detrimental or viewed unfavorably in many locales (Bourdieu and Wacquant, 1999; Onishi and Méheut, 2021; Wimmer, 2015).

Despite the global variation in these systems, in recent years, there has been a growing need in genomics to analyze multiple data sets across studies to increase statistical power and to make cross-study comparisons. However, heterogeneity among studies in their design, recruitment methods, population descriptors, and measurements makes it difficult to easily compare and combine the data and metadata from multiple studies. Challenges of data harmonization include how to deal with missing data or how to compare or aggregate data and metadata in which similar but nonidentical terms are used.

The goal of harmonizing population descriptors is to bring disparate classification systems into greater alignment for specific research goals. Even within a single country, many studies have different recruitment processes and reasons for their selection of population descriptors. Not only are there differences in the specific labels used but also in the underlying concepts represented. In addition, when harmonizing population descriptor data, it is challenging to address differences across studies in scale, resolution, or descriptors used, or to work with studies that use the same term but define it differently.
Existing legacy data often pose additional complications; for example, because some legacy data sets were collected before standards for data sharing were established, there may be uncertainty around whether these data meet current ethical or scientific standards.

As there is no universal system of descriptors, tools and strategies are needed to harmonize them (that is, to reduce heterogeneity) when looking at data across studies. Data harmonization strives to aggregate data from multiple cohorts and/or biobanks to a degree that is scientifically adequate yet acknowledges the heterogeneity among the data sets. There are two main harmonization methods: prospective and retrospective harmonization. Prospective approaches establish standard procedures prior to data collection, which makes aggregation and comparison considerably easier. One such approach is using common data elements, which are standardized pieces of information collected as part of a study. However, prospective methods are not always feasible, especially when using existing data sets. Thus, other investigators use retrospective methods to integrate data sets after collection.
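
The retrospective approach described above can be illustrated with a minimal Python sketch. It is not drawn from any of the studies cited in this chapter; the study identifiers, labels, and crosswalk entries are hypothetical and chosen only for illustration. The sketch keys each mapping on both the study and its original label (because the same term can be defined differently in different studies), keeps the original label alongside the harmonized one, and flags missing or unmapped values instead of guessing.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical crosswalk: (study, original label) -> shared label.
# Keying on the study as well as the label avoids assuming that two
# studies mean the same thing when they happen to use the same term.
CROSSWALK: Dict[Tuple[str, str], str] = {
    ("study_a", "Centre West"): "centre-west",
    ("study_b", "Center-West"): "centre-west",
    ("study_a", "Northeast"): "northeast",
    ("study_b", "North East"): "northeast",
}


@dataclass(frozen=True)
class HarmonizedRecord:
    study: str
    original: Optional[str]    # label exactly as the source study recorded it
    harmonized: Optional[str]  # shared label, or None when no mapping applies
    status: str                # "mapped", "unmapped", or "missing"


def harmonize(study: str, label: Optional[str]) -> HarmonizedRecord:
    """Map one study-specific label to the shared scheme, keeping the original."""
    if label is None or not label.strip():
        return HarmonizedRecord(study, label, None, "missing")
    key = (study, label.strip())
    if key in CROSSWALK:
        return HarmonizedRecord(study, label, CROSSWALK[key], "mapped")
    # No entry for this study/label pair: report it rather than guess.
    return HarmonizedRecord(study, label, None, "unmapped")


def summarize(records: Iterable[HarmonizedRecord]) -> Dict[str, int]:
    """Count records by status so gaps in the crosswalk stay visible."""
    counts: Dict[str, int] = {}
    for record in records:
        counts[record.status] = counts.get(record.status, 0) + 1
    return counts
```

For instance, `harmonize("study_b", "Centre West")` returns a record with status "unmapped", because that spelling is only registered for the hypothetical `study_a`. Keeping such cases explicit is one way to acknowledge the heterogeneity among data sets rather than hide it.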


ATTEMPTS TO ADDRESS THE USE OF RACE, ETHNICITY, AND ANCESTRY IN THE GENOMIC ERA

Advances in the measurement of human genetic variation, and subsequent debates over the sampling and applicability of reference populations, have led many in the research ecosystem to grapple with the use of population descriptors, especially race, ethnicity, and ancestry. For more than 20 years, numerous articles have been published and workshops held to discuss these implications, including calls for “a new vocabulary of human genetic variation” (Sankar and Cho, 2002) and the establishment of guidelines for using racial, ethnic, and ancestral categories in human genetics research (Bonham et al., 2018; Caulfield et al., 2009; Flanagin et al., 2021; Khan et al., 2022; NIMHD, 2017; Takezawa et al., 2014; Yudell et al., 2020). Yet, two decades later, use of these descent-associated population descriptors in genetics research remains largely unchanged and controversial.

One impetus for the urgency twenty years ago arose from the rapid technological advancements that made possible whole genome analyses of genetic variation on large numbers of samples. This raised the concern that without thoughtful guidance, classical and stereotypical views of race and ethnicity would be exacerbated by genome analyses. In 2002, Sankar and Cho published an article on the use of race as a research variable in the study of human genetic variation (Sankar and Cho, 2002). They argued that researchers need to be more thoughtful, deliberate, and precise when designing a study, analyzing the results, and reporting the findings. The authors close their article with an appeal to researchers:

“It is imperative for the research community to acknowledge that the maps used in research are not the only maps used to describe the terrain they study and that careful use of language is necessary to avoid misunderstanding” (Sankar and Cho, 2002, p. 1338).

Other studies have focused on why it is difficult to effect change. For example, Caulfield et al. (2009) underscore how researchers work within structures that have been defined by the complex history of race and institutional racism. The obstacles they highlighted were the requirement to use federal directives like the OMB Directive 15 categories of race and ethnicity for reporting; the media’s tendency to simplify scientific findings and use race and ethnicity as proxies without explaining how the social categories relate to the research design and results; and the qualities of race (its fluidity, ambiguity, and contingency), which make it difficult to define neatly (Caulfield et al., 2009).

The appropriate use of population descriptors in genomics research is a global issue, not one limited to the United States (Mir et al., 2013). Following a series of workshops held in 2011 and 2012 in Japan, attendees noted that continental labels, such as European, African, and Asian, are tremendously broad, and that among Japanese researchers at the workshop, there was no consensus on what populations should be called Asian. The authors also pointed out, as have others, that when samples are given continental labels but are drawn from limited and specific groups, and there is no attempt to account for the “significant diversity within each region,” then the findings may not generalize to the larger group (Takezawa et al., 2014). They closed with recommendations that echo many of those from other researchers around the world, among them:

• Respect cultural preferences in labeling processes, and use names that reflect ethnic or cultural backgrounds as much as possible.
• Use categories that are more specific to avoid misinterpretation of results as emphasizing “racial” categories.
• Underscore that genetic and trait differences among populations do not reflect discrete differences but rather frequency or probability.
• Develop a clear summary of research findings to aid journalists in reporting appropriate population descriptors.

In 2016, the NHGRI and National Institute on Minority Health and Health Disparities (NIMHD) hosted a workshop on the use of race and ethnicity data in biomedical and clinical research and how the data are and should be applied to research on minority health and health disparities (Bonham et al., 2018; NIMHD, 2017). A partial summary of the workshop’s themes and recommendations includes:

• Collect data across multiple dimensions, including self-identified race and ethnicity, race and ethnicity description by others, how individuals perceive others to view their race and ethnicity, self-identified ancestry, and genetic ancestry.
• Update OMB categories, including disaggregating South Asian from other Asian, adding categories to describe individuals from the Middle East/North Africa, adding a category for individuals native to the United States, including an option for multiracial description, adding parent and grandparent self-identified race and ethnicity, including variables to capture sociodemographic data, and updating questions that capture information related to historical racial narratives.
• Educate the public on the purpose of, and misconceptions about, data generated from race-associated biomedical genomics research and distinguish genetic ancestry data from sociopolitical or culturally based racial self-identification. Consider ways to improve clinician and medical student education in human population genetics.

• Work to improve the accessibility and comparability of race and ethnicity data via the standardization of analysis, tagging, and data reporting; harmonization of methods for data collection, analysis, and reporting; communication of community-based research incorporating race and ethnicity data with study participants; and collaborative efforts to standardize race and ethnicity descriptors in electronic health records (NIMHD, 2017).

In concluding their 2018 paper, Bonham et al. (2018) noted:

Genomic knowledge has not changed the need to move beyond the misuse of social categories of race and ethnicity as a proxy for genomic variation. The challenge that scientists and medical journal editors must address is how to report human genomic variation without inappropriately describing racial and ethnic groups as discrete population groups (p. 1,534).

The National Heart, Lung, and Blood Institute (NHLBI) Trans-Omics for Precision Medicine (TOPMed) program collects and analyzes whole-genome sequencing and other -omics data (e.g., RNA, proteins, metabolites) with a wide range of basic and clinical data on heart, lung, blood, and sleep disorders. The program has over 180,000 participants, of whom 60% are of non-European descent.9 TOPMed researchers have recently provided recommendations on using and reporting population descriptors for race, ethnicity, and ancestry in genomics research, including ones that acknowledge the expanding global nature of genomics research and the current focus in the United States on reckoning with racism (Khan et al., 2022):

• Avoid using U.S. racial categories to describe study participants not in the United States.
• Retain detailed population data, if possible, rather than lumping individuals in broader categories early on in the process.
• Understand the potential benefits and harms of analyzing populations before deciding whether to conduct or how to conduct the study.
• Recognize the interdisciplinary work already being done on health care disparities when using these as a justification for genomics research.
• Follow community preference and study-specific reporting guidelines when describing study populations.

9 For more information on TOPMed see https://topmed.nhlbi.nih.gov/ (accessed December 9, 2022).

Despite these and many other efforts, there has been little significant change in the confusing and damaging uses of race, ethnicity, and ancestry as population descriptors in genetics and genomics. In particular, scientists continue to debate whether race is a useful proxy for unmeasured biological differences in human beings, a debate that is fueled by deeply embedded, and often unexamined, biological concepts of race (Nelson et al., 2019; Wagner et al., 2017). Furthermore, scientists are part of a research enterprise whose members (e.g., journal editors, funders, research institutions) to date have failed to effectively coordinate their efforts in developing and implementing transformative policies and practices. Success requires a collective will to confront and resolve the inevitable challenges, change current ways of thinking and doing, and enrich science and society. The committee suggests a path forward in this report.

WHY IS THIS STUDY IMPORTANT? WHY ANOTHER STUDY? WHY NOW?

While this history of prior attempts to address population descriptors may create some skepticism about the usefulness of another report aiming to create best practices for this complex area, there are several reasons that this is a particularly opportune and important moment to offer concrete guidance to the research community.

Research using human genetic data has grown exponentially over the last decade. Once largely the province of geneticists, the use of genetic information is now widespread across biomedical research and requires new thinking by all researchers. In addition to a general appreciation of the importance of genetic variation in human disease and health, and the reduction in the cost of and widespread access to genomic technologies, this growth has occurred in part through major investments in large-scale studies, many of which have genomic sequence data available.
With this growth, genetics research is now conducted by a wide range of investigators, many of whom have a limited understanding of the rationale and use of population descriptors in human genetics, particularly its history. This both exacerbates the risk of misuse of such descriptors and creates an important opportunity to implement substantive changes. Projects such as NIH’s All of Us, the Million Veteran Program, and many others will further democratize access to genomic data for clinical research and accelerate this transformation. While some early genetics research included groups of individuals that have relatively high genetic and environmental similarity (e.g., inhabitants of Iceland, Amish residents of the United States) or conducted pedigree studies (Francomano et al., 2003), recent large-scale efforts are enrolling a more cosmopolitan and diverse set of populations (Morales et al., 2018; Zhou et al., 2022), raising more questions about how best to represent their diversity in the study data. Clear guidance about the use of population descriptors is therefore urgently needed before the mistakes of the past are baked into this new era of genetics research.

With this growth in genetics research has come the development of more advanced methods of understanding and describing population structure and variation, as well as a growing clarity about the contribution of such methods to elucidating the relationship between genetic variation and human traits and health outcomes. Methods to assess genetic similarity and infer genetic ancestry have been developed, as have nongenetic approaches, such as geospatial mapping of study participants to states/provinces, cities, and neighborhoods. These advances have been accompanied by the growing recognition of the importance of social and physical environmental factors in health generally, and in modifying the relationship between genotype and disease more specifically (All of Us Research Program Investigators, 2019; Davidson et al., 2022). The importance of these factors has led to new efforts to develop and implement environmental measures in many fields, including in genetics.

These advances have not been accompanied, however, by new approaches to the use of population descriptors in genetics and genomics research. In the absence of a strong and widely disseminated conceptual framework to guide the use of population descriptors, researchers often assume that the only issue is one of finding the “correct” nomenclature for the groups whose data they analyze. This report aims to break new ground by distinguishing, on the one hand, the fundamental conceptual decisions that genetics researchers must grapple with explicitly when they employ population descriptors from, on the other hand, the choices of terminology they face.
In other words, the committee emphasizes that scientists must get the descent-associated concepts right (that is, have a clear understanding of what these descriptors represent and a rigorous rationale for using them) before selecting the appropriate group categories and labels to work with. Without a deliberate, reasoned, and transparent deployment of population descriptors, human genetics and genomics studies are likely to fall into the same trap as in the past: unwarranted typological thinking that reinforces longstanding prejudices about the characteristics of descent-associated groups.

Since 2020, the U.S. scientific community has become more attentive to the urgency of addressing racism and the lack of diversity in science, and has acknowledged that little progress has been made in making science accessible and relevant to a more diverse citizenry (Yudell et al., 2020). Research universities embarked on efforts to address diversity, equity, and inclusion in their scientific and educational programs. The social construct of race, the role of intersectionality, and the fundamental effect of racism on all aspects of science and medicine have become parts of faculty trainings at many institutions (Dupree and Boykin, 2021; Holdren et al., 2022; Kossek et al., 2022).

Journal editors recognized the problems of using racial labels in research studies, with growing calls for eliminating the use of high-risk proxy measures (Flanagin et al., 2021; Nature Human Behavior, 2022). The call to remove race from clinical prediction models, such as equations for estimating glomerular filtration rate, spread rapidly because of the attention to the danger of false assumptions about innate racial differences and resulting harms to patients (Vyas et al., 2020). Recognition by the U.S. biomedical research community of the need to address the complex and important issue of population descriptors in genetics research has never been greater.

This Report’s Audience

Given the charge, the committee notes that the primary audience for the report is researchers who use genomic data. However, the committee recognizes that many of the recommendations and concepts presented in the report will be beneficial to the broader biomedical and social science research communities. One of the foundational tenets of the report is the need for all researchers to be intentional about which population descriptors they choose and how they use and describe descriptors in their research. Furthermore, research is increasingly multidisciplinary; thus, the recommendations in this report could be useful for investigators interested in using biological data that may not necessarily have a genetic component. The chapters of the report reflect the complexity of the task the committee was charged with and the report’s diverse audience.
Chapter 5 includes a somewhat technical discussion on how to select appropriate population descriptors for genetics research, and there, the primary audience is genetics and genomics researchers. Chapters 3 and 4 focus on guiding principles to support trustworthy research and requisites for change that could facilitate implementation of the recommendations in the report. The committee notes that these two chapters are intended for a more general audience. Finally, to achieve lasting change, the recommended actions in the report will need support from a broad and multidisciplinary group of relevant parties. Chapter 6 includes recommendations for implementation and highlights the roles that study participants; funders of genetics and genomics research; professional societies and research journals; journalists, media, and researchers; and research institutions can play in conjunction with researchers to operationalize the report’s recommendations.

WHAT IS THE GOAL OF THIS REPORT?

Given this background, the committee was asked by NIH to review and assess the existing methodologies, benefits, and challenges in using race, ethnicity, and other population descriptors in genomics research (see Box 1-2 for the full statement of task). Fourteen different institutes, programs, and offices within NIH sponsored and funded the study. The statement of task focuses on understanding the current use of population descriptors in genomics research; examining best practices in the use of race, ethnicity, and genetic ancestry as population descriptors; and identifying how best practices in the use of population descriptors could be widely adopted within the biomedical and scientific communities to strengthen genetics and genomics research. The statement of task identifies four areas that are beyond the scope of this consensus study: examining the use of race and ethnicity in clinical care; examining racism in science and genomics; examining the use of race and ethnicity in biomedical research generally (e.g., beyond genetics and genomics research); and providing policy recommendations to NIH and government agencies.

To accomplish the task, the National Academies convened a committee of 17 members representing diverse expertise areas including human genetics; clinical genetics; population genetics; statistical and computational genetics and genomics; historical, ethical, legal, and social implications research; sociology and anthropology; and demography and population statistics (see Appendix E for the committee biographical sketches).

During the committee’s first open meeting, NIH delivered the charge to the committee and clarified information related to the statement of task and the project scope (see Appendix A for the public session agendas).
NIH specified that while it would be outside the scope of the committee’s work to develop recommendations for the four areas listed in the statement of task as being beyond the scope of this study, discussion and awareness around these topics are necessary to formulate thoughtful recommendations. NIH also clarified that while examining the use of race and ethnicity in clinical care is outside the scope of the committee’s work, clinical research using genomic data would be within the scope of the report. Furthermore, representatives said that discussing issues such as the effects of systemic racism in the field of genomics could be a useful context for addressing study design recommendations. NIH also reiterated that the recommendations and best practices identified by the committee over the course of the study would be beneficial for the broader scientific and genomics research communities (as opposed to government agencies) and that the committee should have this audience in mind. NIH acknowledged that the consideration and use of population descriptors is quickly evolving in the scientific community and indicated that it would be useful to identify a framework and principles for considering race, ethnicity, and other population descriptors in genomics research.

The statement of task emphasizes the use of appropriate and valid population descriptors in genomics research. The potential benefits and harms of past and current population descriptors used in genomics research are discussed at length (see Box 2-1 for key definitions). The committee is mindful that the use of population descriptors including race, ethnicity, and genetic ancestry in genomics research is currently nonstandardized and is influenced by factors such as government categories and journal reporting guidelines. Categories of race and ethnicity, as constructs of social identity and culture, have had long-standing historical implications for individuals in the United States and globally, to the marginalization of some and benefit of others. Genomics research takes place within this context, and social identity from one research participant to the next may vary. The committee is also mindful that additional variation in use of population descriptors is occurring in research studies outside of the United States, and best practices for genomics research might need to be applied differently. See Appendix A for details of the study methods.

BOX 1-2
Statement of Task

An ad hoc committee under the auspices of the National Academies of Sciences, Engineering, and Medicine’s Health and Medicine Division will convene to review and assess the existing methodologies, benefits, and challenges in the use of race and ethnicity and other population descriptors in genomics research. The committee work will focus on, but not be limited to, the following tasks:

1. Document and evaluate the variety of population descriptors currently used in genomics research and the potential benefits and challenges of changing these descriptors.
2. Assess how race, ethnicity, and genetic ancestry are currently being used as population descriptors in health disparities research to study genetics and genomics.
3. Assess the appropriate use of race, ethnicity, and genetic ancestry as population descriptors in the determination of genetic risk scores and health risk.
4. Develop feasible and logical approaches to advance appropriate use of race and ethnicity and alternative population descriptors in published genomics research studies.
5. Examine the potential of new, culturally responsive methods and common data elements (CDEs) for advancing harmonization of population descriptors in large genomic studies in the United States and globally.
6. Assess when it is appropriate to use race and ethnicity as population descriptors in genetic and genomic research, and provide recommendations to scientists and researchers for future research.
7. Propose best practices for domestic and international harmonization of population group descriptors.
8. Assess the scientific knowledge of the relationships among race, ethnicity, and population genetic variation.
9. Identify and discuss potential obstacles to implementation of the new methods to describe populations.
10. Discuss potential implementation strategies to help enhance the adoption of best practices by the research community.
11. Identify obstacles and propose best practices in the use of population descriptors with legacy biological samples and associated data.

The final report should describe best practices on the use of race, ethnicity, and genetic ancestry and other population descriptors in genetics and genomics research, as formulated by the committee. Attention should be given to how these best practices could be used by biomedical and scientific communities to increase the robustness of study designs and methods for genetics and genomics research in the United States and globally.

The following elements are beyond the scope of this consensus study:

• Examining the use of race and ethnicity in clinical care
• Examining racism in science and genomics
• Examining the use of race and ethnicity in biomedical research generally (nongenetic and genomics research)
• Providing policy recommendations to NIH and government agencies

REFERENCES

1000 Genomes Project Consortium. 2015. A global reference for human genetic variation. Nature 526:68-74.
Aach, J., M. L. Bulyk, G. M. Church, J. Comander, A. Derti, and J. Shendure. 2001. Computational comparison of two draft sequences of the human genome. Nature 409(6822):856-859.
All of Us Research Program Investigators. 2019. The “All of Us” Research Program. New England Journal of Medicine 381(7):668-676.
Anderson, M. J. 1988. The American census: A social history. New Haven and London: Yale University Press.
Belsky, D. W., A. Caspi, L. Arseneault, D. L. Corcoran, B. W. Domingue, K. M. Harris, R. M. Houts, J. S. Mill, T. E. Moffitt, J. Prinz, K. Sugden, J. Wertz, B. Williams, and C. L. Odgers. 2019. Genetics and the geography of health, behaviour and attainment. Nature Human Behaviour 3(6):576-586.
Berkman, L. 2020. Health and Aging in Africa: A Longitudinal Study of an INDEPTH Community in South Africa [HAALSI]: Agincourt, South Africa, 2015-2019. Inter-university Consortium for Political and Social Research, November 11. https://doi.org/10.3886/ICPSR36633.v3.
Biddanda, A., D. P. Rice, and J. Novembre. 2020. A variant-centric perspective on geographic patterns of human allele frequency variation. eLife 9.
Birney, E., A. Bateman, M. E. Clamp, and T. J. Hubbard. 2001. Mining the draft human genome. Nature 409(6822):827-828.
Bliss, C. 2020. Conceptualizing race in the genomic age. The Hastings Center Report 50 Suppl 1:S15-S22.
Bonham, V. L., E. D. Green, and E. J. Pérez-Stable. 2018. Examining how race, ethnicity, and ancestry data are used in biomedical research. JAMA 320(15):1533-1534.
Bourdieu, P., and L. Wacquant. 1999. On the cunning of imperialist reason. Theory, Culture & Society 16(1):41-58.
Bryc, K., E. Y. Durand, J. M. Macpherson, D. Reich, and J. L. Mountain. 2015. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. American Journal of Human Genetics 96(1):37-53.
Bugert, P., and H. Klüter. 2012. 100 years after von Dungern & Hirschfeld: Kinship investigation from blood groups to SNPs. Transfusion Medicine and Hemotherapy 39(3):161-162.
Byron, G. L. 2002. Symbolic blackness and ethnic difference in early Christian literature. London: Routledge.
Carter, J. K. 2008. Race: A theological account. New York, NY: Oxford University Press.
Caulfield, T., S. M. Fullerton, S. E. Ali-Khan, L. Arbour, E. G. Burchard, R. S. Cooper, B.-J. Hardy, S. Harry, R. Hyde-Lay, J. Kahn, R. Kittles, B. A. Koenig, S. S.-J. Lee, M. Malinowski, V. Ravitsky, P. Sankar, S. W. Sherer, B. Séguin, D. Shickle, G. Suarez-Kurtz, and A. S. Daar. 2009. Race and ancestry in biomedical research: Exploring the challenges. Genome Medicine 1(1):8.
Cavalli-Sforza, L. L., P. Menozzi, and A. Piazza. 1996. The history and geography of human genes. Princeton, NJ: Princeton University Press.
Chakraborty, R., S. A. Read, and S. J. Vincent. 2020. Understanding myopia: Pathogenesis and mechanisms. In Updates on myopia: A clinical perspective, edited by M. Ang and T. Y. Wong. Singapore: Springer, Singapore. Pp. 65-94.
Chakravarti, A. 2014. Perspectives on human variation through the lens of diversity and race. Cold Spring Harbor Perspectives in Biology 7(a023358).
Chial, H. 2008. Mendelian genetics: Patterns of inheritance and single-gene disorders. Nature Education 1(1):63.

Chong, J. X., K. J. Buckingham, S. N. Jhangiani, C. Boehm, N. Sobreira, J. D. Smith, T. M. Harrell, M. J. McMillin, W. Wiszniewski, T. Gambin, Z. H. Coban Akdemir, K. Doheny, A. F. Scott, D. Avramopoulos, A. Chakravarti, J. Hoover-Fong, D. Mathews, P. D. Witmer, H. Ling, K. Hetrick, L. Watkins, K. E. Patterson, F. Reinier, E. Blue, D. Muzny, M. Kircher, K. Bilguvar, F. Lopez-Giraldez, V. R. Sutton, H. K. Tabor, S. M. Leal, M. Gunel, S. Mane, R. A. Gibbs, E. Boerwinkle, A. Hamosh, J. Shendure, J. R. Lupski, R. P. Lifton, D. Valle, D. A. Nickerson, Centers for Mendelian Genomics, and M. J. Bamshad. 2015. The genetic basis of Mendelian phenotypes: Discoveries, challenges, and opportunities. American Journal of Human Genetics 97(2):199-215.
Conroy, M. C., B. Lacey, J. Bešević, W. Omiyale, Q. Feng, M. Effingham, J. Sellers, S. Sheard, M. Pancholi, G. Gregory, J. Busby, R. Collins, and N. E. Allen. 2022. UK Biobank: A globally important resource for cancer research. British Journal of Cancer.
Daniloski, Z., T. X. Jordan, H.-H. Wessels, D. A. Hoagland, S. Kasela, M. Legut, S. Maniatis, E. P. Mimitou, L. Lu, E. Geller, O. Danziger, B. R. Rosenberg, H. Phatnani, P. Smibert, T. Lappalainen, B. R. Tenoever, and N. E. Sanjana. 2021. Identification of required host factors for SARS-CoV-2 infection in human cells. Cell 184(1):92-105.e116.
Davidson, J., R. Vashisht, and A. J. Butte. 2022. From genes to geography, from cells to community, from biomolecules to behaviors: The importance of social determinants of health. Biomolecules 12(10).
De Moura, R. R., A. V. C. Coelho, V. d. Q. Balbino, S. Crovella, and L. A. C. Brandão. 2015. Meta-analysis of Brazilian genetic admixture and comparison with other Latin America countries. American Journal of Human Biology 27(5):674-680.
Dorr, G. M. 2008. Segregation’s science: Eugenics and society in Virginia. Charlottesville, VA: University of Virginia Press.
Dupree, C. H., and C. M. Boykin. 2021. Racial inequality in academia: Systemic origins, modern challenges, and policy recommendations. Policy Insights from the Behavioral and Brain Sciences 8(1):11-18.
Enattah, N. S., T. Sahi, E. Savilahti, J. D. Terwilliger, L. Peltonen, and I. Järvelä. 2002. Identification of a variant associated with adult-type hypolactasia. Nature Genetics 30(2):233-237.
Epstein, S. 2007. Inclusion: The politics of difference in medical research. Chicago, IL: University of Chicago Press.
Fan, S., D. E. Kelly, M. H. Beltrame, M. E. B. Hansen, S. Mallick, A. Ranciaro, J. Hirbo, S. Thompson, W. Beggs, T. Nyambo, S. A. Omar, D. W. Meskel, G. Belay, A. Froment, N. Patterson, D. Reich, and S. A. Tishkoff. 2019. African evolutionary history inferred from whole genome sequence data of 44 indigenous African populations. Genome Biology 20(1).
Farkas, L. 2017. Data collection in the field of ethnicity. Brussels: European Commission.
Flanagin, A., T. Frey, and S. L. Christiansen. 2021. Updated guidance on the reporting of race and ethnicity in medical and science journals. JAMA 326(7):621.
Francomano, C. A., V. A. McKusick, and L. G. Biesecker. 2003. Medical genetic studies in the Amish: Historical perspective. American Journal of Medical Genetics 121C(1):1-4.
Frederickson, G. M. 2002. Racism: A short history. Princeton and Oxford: Princeton University Press.
Fu, Z., and S. Xi. 2020. The effects of heavy metals on human metabolism. Toxicology Mechanisms and Methods 30(3):167-176.
Fujimura, J. H., and R. Rajagopalan. 2011. Different differences: The use of ‘genetic ancestry’ versus race in biomedical human genetic research. Social Studies of Science 41(1):5-30.
Fullwiley, D. 2008. The biologistical construction of race: ‘admixture’ technology and the new genetic medicine. Social Studies of Science 38(5):695-735.
Gomez, F., J. Hirbo, and S. A. Tishkoff. 2014. Genetic variation and adaptation in Africa: Implications for human evolution and disease. Cold Spring Harbor Perspectives in Biology 6(7):a008524.

Gómez-Olivé, F. X., L. Montana, R. G. Wagner, C. W. Kabudula, J. K. Rohr, K. Kahn, T. Bärnighausen, M. Collinson, D. Canning, T. Gaziano, J. A. Salomon, C. F. Payne, A. Wade, S. M. Tollman, and L. Berkman. 2018. Cohort profile: Health and Ageing in Africa: A Longitudinal Study of an Indepth community in South Africa (HAALSI). International Journal of Epidemiology 47(3):689-690j.
Goodman, A. H., Y. T. Moses, and J. L. Jones. 2019. Race: Are we so different? UK: Wiley-Blackwell.
Gossett, T. F. 1997. Race: The history of an idea in America. New York: Oxford University Press.
Gould, S. J. 1996. The mismeasure of man. New York: W.W. Norton.
Graves Jr., J. L. 2001. The emperor’s new clothes: Biological theories of race at the millennium. New Brunswick, NJ: Rutgers University Press.
Hamblin, M. T., and A. Di Rienzo. 2000. Detection of the signature of natural selection in humans: Evidence from the Duffy blood group locus. American Journal of Human Genetics 66(5):1669-1679.
Hammonds, E. M., and R. M. Herzig (eds.). 2009. The nature of difference: Sciences of race in the United States from Jefferson to genomics. Cambridge, MA: MIT Press.
Hannaford, I. 1996. Race: The history of an idea in the West. Washington, DC: Woodrow Wilson Center Press.
Higginbotham, E., N. R. Powe, G. Barabino, E. Fuentes-Afflick, W. L. Harris, D. S. Massy, E. J. Perez-Stable, R. Pettigrew, P. Pierre, N. Risch, and C. Rotimi. Forthcoming. The Use of Race in Health, Science & Society: Origins, concepts, implications, alternatives and the path forward. NAM Perspectives. Discussion Paper, National Academy of Medicine, Washington, DC.
Hirschfeld, L., and H. Hirschfeld. 1919. Serological differences between the blood of different races: The result of researches on the Macedonian front. Lancet 194(5016):675-679.
Hirschfeld, L. A. 1996. Race in the making: Cognition, culture, and the child’s construction of human kinds. Cambridge, MA and London, England: MIT Press.
Hirschfeld, L. A., and S. A. Gelman. 1997. What young children think about the relationship between language variation and social difference. Cognitive Development 12(2):213-238.
Holdren, S., Y. Iwai, N. R. Lenze, A. B. Weil, and A. M. Randolph. 2022. A novel narrative medicine approach to DEI training for medical school faculty. Teaching and Learning in Medicine:1-10.
Hublin, J.-J., A. Ben-Ncer, S. E. Bailey, S. E. Freidline, S. Neubauer, M. M. Skinner, I. Bergmann, A. Le Cabec, S. Benazzi, K. Harvati, and P. Gunz. 2017. New fossils from Jebel Irhoud, Morocco, and the pan-African origin of Homo sapiens. Nature 546(7657):289-292.
Kahn, J. 2006. Genes, race, and population: Avoiding a collision of categories. American Journal of Public Health 96(11):1965-1970.
Keel, T. 2018. Divine variations: How Christian thought became racial science. 1st ed. Stanford, CA: Stanford University Press.
Kerem, B.-S., M. Rommens, J. A. Buchanan, D. Markiewicz, T. K. Cox, A. Chakravarti, M. Buchwald, and L.-C. Tsui. 1989. Identification of the cystic fibrosis gene: Genetic analysis. Science 245(4922):1073-1080.
Khan, A. T., S. M. Gogarten, C. P. McHugh, A. M. Stilp, T. Sofer, M. L. Bowers, Q. Wong, L. A. Cupples, B. Hidalgo, A. D. Johnson, M.-L. M. McDonald, S. T. McGarvey, M. R. G. Taylor, S. M. Fullerton, M. P. Conomos, and S. C. Nelson. 2022. Recommendations on the use and reporting of race, ethnicity, and ancestry in genetic research: Experiences from the NHLBI TOPMed program. Cell Genomics 2(8):100155.
Khera, A. V., M. Chaffin, K. G. Aragam, M. E. Haas, C. Roselli, S. H. Choi, P. Natarajan, E. S. Lander, S. A. Lubitz, P. T. Ellinor, and S. Kathiresan. 2018. Genome-wide polygenic scores for common diseases identify individuals with risk equivalent to monogenic mutations. Nature Genetics 50(9):1219-1224.

Kidd, J. M., S. Gravel, J. Byrnes, A. Moreno-Estrada, S. Musharoff, K. Bryc, J. D. Degenhardt, A. Brisbin, V. Sheth, R. Chen, S. F. McLaughlin, H. E. Peckham, L. Omberg, C. A. Bormann Chung, S. Stanley, K. Pearlstein, E. Levandowsky, S. Acevedo-Acevedo, A. Auton, A. Keinan, V. Acuña-Alonzo, R. Barquera-Lozano, S. Canizales-Quinteros, C. Eng, E. G. Burchard, A. Russell, A. Reynolds, A. G. Clark, M. G. Reese, S. E. Lincoln, A. J. Butte, F. M. De La Vega, and C. D. Bustamante. 2012. Population genetic inference from personal genome data: Impact of ancestry and admixture on human genomic variation. American Journal of Human Genetics 91(4):660-671.
King, R. C., W. D. Stansfield, and P. K. Mulligan. 2014. A dictionary of genetics. 8th ed. Oxford University Press.
Kossek, E. E., P. M. Buzzanell, B. J. Wright, C. Batz-Barbarich, A. C. Moors, C. Sullivan, K. Kokini, A. S. Hirsch, K. Maxey, and A. Nikalje. 2022. Implementing diversity training targeting faculty microaggressions and inclusion: Practical insights and initial findings. The Journal of Applied Behavioral Science:002188632211323.
Kramer, H. J., A. M. Stilp, C. C. Laurie, A. P. Reiner, J. Lash, M. L. Daviglus, S. E. Rosas, A. C. Ricardo, B. O. Tayo, M. F. Flessner, K. F. Kerr, C. Peralta, R. Durazo-Arvizu, M. Conomos, T. Thornton, J. Rotter, K. D. Taylor, J. Cai, J. Eckfeldt, H. Chen, G. Papanicolau, and N. Franceschini. 2017. African ancestry-specific alleles and kidney disease risk in Hispanics/Latinos. Journal of the American Society of Nephrology 28(3):915-922.
Kremer, B., E. Almqvist, J. Theilmann, N. Spence, H. Telenius, Y. P. Goldberg, and M. R. Hayden. 1995. Sex-dependent mechanisms for expansions and contractions of the CAG repeat on affected Huntington disease chromosomes. American Journal of Human Genetics 57(2):343-350.
Lander, E. S., L. M. Linton, B. Birren, C. Nusbaum, M. C. Zody, J. Baldwin, K. Devon, K. Dewar, M. Doyle, W. FitzHugh, R.
Funke, D. Gage, K. Harris, A. Heaford, J. Howland, L. Kann, J. Lehoczky, R. LeVine, P. McEwan, K. McKernan, J. Meldrim, J. P. Mesirov, C. Miranda, W. Morris, J. Naylor, C. Raymond, M. Rosetti, R. Santos, A. Sheridan, C. Sougnez, Y. Stange-Thomann, N. Stojanovic, A. Subramanian, D. Wyman, J. Rogers, J. Sulston, R. Ainscough, S. Beck, D. Bentley, J. Burton, C. Clee, N. Carter, A. Coulson, R. Deadman, P. Deloukas, A. Dunham, I. Dunham, R. Durbin, L. French, D. Grafham, S. Gregory, T. Hubbard, S. Humphray, A. Hunt, M. Jones, C. Lloyd, A. McMurray, L. Matthews, S. Mercer, S. Milne, J. C. Mullikin, A. Mungall, R. Plumb, M. Ross, R. Shownkeen, S. Sims, R. H. Waterston, R. K. Wilson, L. W. Hillier, J. D. McPherson, M. A. Marra, E. R. Mardis, L. A. Fulton, A. T. Chinwalla, K. H. Pepin, W. R. Gish, S. L. Chissoe, M. C. Wendl, K. D. Delehaunty, T. L. Miner, A. Delehaunty, J. B. Kramer, L. L. Cook, R. S. Fulton, D. L. Johnson, P. J. Minx, S. W. Clifton, T. Hawkins, E. Branscomb, P. Predki, P. Richardson, S. Wenning, T. Slezak, N. Doggett, J. F. Cheng, A. Olsen, S. Lucas, C. Elkin, E. Uberbacher, M. Frazier, R. A. Gibbs, D. M. Muzny, S. E. Scherer, J. B. Bouck, E. J. Sodergren, K. C. Worley, C. M. Rives, J. H. Gorrell, M. L. Metzker, S. L. Naylor, R. S. Kucherlapati, D. L. Nelson, G. M. Weinstock, Y. Sakaki, A. Fujiyama, M. Hattori, T. Yada, A. Toyoda, T. Itoh, C. Kawagoe, H. Watanabe, Y. Totoki, T. Taylor, J. Weissenbach, R. Heilig, W. Saurin, F. Artiguenave, P. Brottier, T. Bruls, E. Pelletier, C. Robert, P. Wincker, D. R. Smith, L. Doucette-Stamm, M. Rubenfield, K. Weinstock, H. M. Lee, J. Dubois, A. Rosenthal, M. Platzer, G. Nyakatura, S. Taudien, A. Rump, H. Yang, J. Yu, J. Wang, G. Huang, J. Gu, L. Hood, L. Rowen, A. Madan, S. Qin, R. W. Davis, N. A. Federspiel, A. P. Abola, M. J. Proctor, R. M. Myers, J. Schmutz, M. Dickson, J. Grimwood, D. R. Cox, M. V. Olson, R. Kaul, C. Raymond, N. Shimizu, K. Kawasaki, S. Minoshima, G. A. Evans, M. Athanasiou, R. Schultz, B. A.
Roe, F. Chen, H. Pan, J. Ramser, H. Lehrach, R. Reinhardt, W. R. McCombie, M. de la Bastide, N. Dedhia, H. Blocker, K. Hornischer, G. Nordsiek, R. Agarwala, L. Aravind, J. A. Bailey, A. Bateman, S. Batzoglou, E. Birney, P. Bork, D. G. Brown, C. B. Burge, L. Cerutti, H. C. Chen, D. Church, M. Clamp, R. R. Copley, T. Doerks, S. R. Eddy, E. E. Eichler, T. S. PREPUBLICATION COPY—Uncorrected Proofs

GENESIS, EVOLUTION, AND CHALLENGES 51 Furey, J. Galagan, J. G. Gilbert, C. Harmon, Y. Hayashizaki, D. Haussler, H. Hermjakob, K. Hokamp, W. Jang, L. S. Johnson, T. A. Jones, S. Kasif, A. Kaspryzk, S. Kennedy, W. J. Kent, P. Kitts, E. V. Koonin, I. Korf, D. Kulp, D. Lancet, T. M. Lowe, A. McLysaght, T. Mikkelsen, J. V. Moran, N. Mulder, V. J. Pollara, C. P. Ponting, G. Schuler, J. Schultz, G. Slater, A. F. Smit, E. Stupka, J. Szustakowki, D. Thierry-Mieg, J. Thierry-Mieg, L. Wagner, J. Wallis, R. Wheeler, A. Williams, Y. I. Wolf, K. H. Wolfe, S. P. Yang, R. F. Yeh, F. Collins, M. S. Guyer, J. Peterson, A. Felsenfeld, K. A. Wetterstrand, A. Patrinos, M. J. Morgan, P. de Jong, J. J. Catanese, K. Osoegawa, H. Shizuya, S. Choi, Y. J. Chen, and J. Szustakowki. 2001. Initial sequencing and analysis of the human genome. Nature 409(6822):860-921. Landsteiner, K. 1961. On agglutination of normal human blood. Transfusion 1(1):5-8. Leffler, E. M., K. Bullaughey, D. R. Matute, W. K. Meyer, L. Ségurel, A. Venkat, P. Andolfatto, and M. Przeworski. 2012. Revisiting an old riddle: What determines genetic diversity levels within species? PLOS Biology 10(9):e1001388. Lettre, G., A. U. Jackson, C. Gieger, F. R. Schumacher, S. I. Berndt, S. Sanna, S. Eyheramendy, B. F. Voight, J. L. Butler, C. Guiducci, T. Illig, R. Hackett, I. M. Heid, K. B. Jacobs, V. Lys- senko, M. Uda, M. Boehnke, S. J. Chanock, L. C. Groop, F. B. Hu, B. Isomaa, P. Kraft, L. Peltonen, V. Salomaa, D. Schlessinger, D. J. Hunter, R. B. Hayes, G. R. Abecasis, H. E. Wichmann, K. L. Mohlke, and J. N. Hirschhorn. 2008. Identification of ten loci associ- ated with height highlights new biological pathways in human growth. Nature Genetics 40(5):584-591. Lewontin, R. C. 1972. The apportionment of human diversity. In Evolutionary biology. Vol. 6, edited by T. Dobzhansky, M. K. Hecht and W. C. Steere. New York: Springer. Pp. 381-398. Mallick, S., H. Li, M. Lipson, I. Mathieson, M. Gymrek, F. Racimo, M. Zhao, N. Chennagiri, S. 
Nordenfelt, A. Tandon, P. Skoglund, I. Lazaridis, S. Sankararaman, Q. Fu, N. Rohland, G. Renaud, Y. Erlich, T. Willems, C. Gallo, J. P. Spence, Y. S. Song, G. Poletti, F. Balloux, G. van Driem, P. de Knijff, I. G. Romero, A. R. Jha, D. M. Behar, C. M. Bravi, C. Capelli, T. Hervig, A. Moreno-Estrada, O. L. Posukh, E. Balanovska, O. Balanovsky, S. Karachanak- Yankova, H. Sahakyan, D. Toncheva, L. Yepiskoposyan, C. Tyler-Smith, Y. Xue, M. S. Abdullah, A. Ruiz-Linares, C. M. Beall, A. Di Rienzo, C. Jeong, E. B. Starikovskaya, E. Metspalu, J. Parik, R. Villems, B. M. Henn, U. Hodoglugil, R. Mahley, A. Sajantila, G. Stamatoyannopoulos, J. T. Wee, R. Khusainova, E. Khusnutdinova, S. Litvinov, G. Ayodo, D. Comas, M. F. Hammer, T. Kivisild, W. Klitz, C. A. Winkler, D. Labuda, M. Bamshad, L. B. Jorde, S. A. Tishkoff, W. S. Watkins, M. Metspalu, S. Dryomov, R. Suke- rnik, L. Singh, K. Thangaraj, S. Pääbo, J. Kelso, N. Patterson, and D. Reich. 2016. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature 538(7624):201-206. Manolio, T. A., F. S. Collins, N. J. Cox, D. B. Goldstein, L. A. Hindorff, D. J. Hunter, M. I. McCarthy, E. M. Ramos, L. R. Cardon, A. Chakravarti, J. H. Cho, A. E. Guttmacher, A. Kong, L. Kruglyak, E. Mardis, C. N. Rotimi, M. Slatkin, D. Valle, A. S. Whittemore, M. Boehnke, A. G. Clark, E. E. Eichler, G. Gibson, J. L. Haines, T. F. C. Mackay, S. A. Mc- Carroll, and P. M. Visscher. 2009. Finding the missing heritability of complex diseases. Nature 461(7265):747-753. Marks, Jonathan. 2017. Is science racist? John Wiley & Sons. Marx, A. W. 1998. Making race and nation: A comparison of South Africa, the United States, and Brazil. Cambridge: Cambridge University Press. Mir, G., S. Salway, J. Kai, S. Karlsen, R. Bhopal, G. T. H. Ellison, and A. Sheikh. 2013. Prin- ciples for research on ethnicity and health: The Leeds consensus statement. European Journal of Public Health 23(3):504-510. Molina, N. 2006. 
Fit to be citizens: Public health and race in Los Angeles, 1879-1939. Berke- ley: University of California Press. PREPUBLICATION COPY—Uncorrected Proofs

Morales, J., D. Welter, E. H. Bowler, M. Cerezo, L. W. Harris, A. C. McMahon, P. Hall, H. A. Junkins, A. Milano, E. Hastings, C. Malangone, A. Buniello, T. Burdett, P. Flicek, H. Parkinson, F. Cunningham, L. A. Hindorff, and J. A. L. MacArthur. 2018. A standardized framework for representation of ancestry data in genomics studies, with application to the NHGRI-EBI GWAS Catalog. Genome Biology 19(1):21.

Morning, A. 2008. Ethnic classification in global perspective: A cross-national survey of the 2000 census round. Population Research and Policy Review 27(2):239-272.

Morning, A. 2011. The nature of race: How scientists think and teach about human difference. Oakland, CA: University of California Press.

Mourant, A. E. 1977. The distribution of the human blood groups. 2nd ed. Oxford, UK: Blackwell Scientific Publications.

Mychaleckyj, J. C., A. Havt, U. Nayak, R. Pinkerton, E. Farber, P. Concannon, A. A. Lima, and R. L. Guerrant. 2017. Genome-wide analysis in Brazilians reveals highly differentiated Native American genome regions. Molecular Biology and Evolution 34(3):559-574.

Narasimhan, V. M., N. Patterson, P. Moorjani, N. Rohland, R. Bernardos, S. Mallick, I. Lazaridis, N. Nakatsuka, I. Olalde, M. Lipson, A. M. Kim, L. M. Olivieri, A. Coppa, M. Vidale, J. Mallory, V. Moiseyev, E. Kitov, J. Monge, N. Adamski, N. Alex, N. Broomandkhoshbacht, F. Candilio, K. Callan, O. Cheronet, B. J. Culleton, M. Ferry, D. Fernandes, S. Freilich, B. Gamarra, D. Gaudio, M. Hajdinjak, É. Harney, T. K. Harper, D. Keating, A. M. Lawson, M. Mah, K. Mandl, M. Michel, M. Novak, J. Oppenheimer, N. Rai, K. Sirak, V. Slon, K. Stewardson, F. Zalzala, Z. Zhang, G. Akhatov, A. N. Bagashev, A. Bagnera, B. Baitanayev, J. Bendezu-Sarmiento, A. A. Bissembaev, G. L. Bonora, T. T. Chargynov, T. Chikisheva, P. K. Dashkovskiy, A. Derevianko, M. Dobeš, K. Douka, N. Dubova, M. N. Duisengali, D. Enshin, A. Epimakhov, A. V. Fribus, D. Fuller, A. Goryachev, A. Gromov, S. P. Grushin, B. Hanks, M. Judd, E. Kazizov, A. Khokhlov, A. P. Krygin, E. Kupriyanova, P. Kuznetsov, D. Luiselli, F. Maksudov, A. M. Mamedov, T. B. Mamirov, C. Meiklejohn, D. C. Merrett, R. Micheli, O. Mochalov, S. Mustafokulov, A. Nayak, D. Pettener, R. Potts, D. Razhev, M. Rykun, S. Sarno, T. M. Savenkova, K. Sikhymbaeva, S. M. Slepchenko, O. A. Soltobaev, N. Stepanova, S. Svyatko, K. Tabaldiev, M. Teschler-Nicola, A. A. Tishkin, V. V. Tkachev, S. Vasilyev, P. Velemínský, D. Voyakin, A. Yermolayeva, M. Zahir, V. S. Zubkov, A. Zubova, V. S. Shinde, C. Lalueza-Fox, M. Meyer, D. Anthony, N. Boivin, K. Thangaraj, D. J. Kennett, M. Frachetti, R. Pinhasi, and D. Reich. 2019. The formation of human populations in South and Central Asia. Science 365(6457):eaat7487.

Nature Human Behaviour. 2022. Science must respect the dignity and rights of all humans. Nature Human Behaviour 6(8):1029-1031.

Ng, S. B., A. W. Bigham, K. J. Buckingham, M. C. Hannibal, M. J. McMillin, H. I. Gildersleeve, A. E. Beck, H. K. Tabor, G. M. Cooper, H. C. Mefford, C. Lee, E. H. Turner, J. D. Smith, M. J. Rieder, K. Yoshiura, N. Matsumoto, T. Ohta, N. Niikawa, D. A. Nickerson, M. J. Bamshad, and J. Shendure. 2010. Exome sequencing identifies MLL2 mutations as a cause of Kabuki syndrome. Nature Genetics 42(9):790-793.

Nelson, S. C., J. H. Yu, J. K. Wagner, T. M. Harrell, C. D. Royal, and M. J. Bamshad. 2019. A content analysis of the views of genetics professionals on race, ancestry, and genetics. AJOB Empirical Bioethics 9(4):222-234.

Nielsen, R., J. M. Akey, M. Jakobsson, J. K. Pritchard, S. Tishkoff, and E. Willerslev. 2017. Tracing the peopling of the world through genomics. Nature 541(7637):302-310.

NIH. 2007. Biological sciences curriculum study. https://www.ncbi.nlm.nih.gov/books/NBK20363/ (accessed December 8, 2022).

NIMHD (National Institute on Minority Health and Health Disparities). 2017. Workshop examines the use of race and ethnicity in genomics and biomedical research. Bethesda, MD: NIMHD. https://www.nimhd.nih.gov/news-events/features/inside-nimhd/nimhd-nhgri-wrkshp.html (accessed December 2, 2022).

Nobel Prize Outreach AB. 2022. “Karl Landsteiner – facts.” NobelPrize.org. https://www.nobelprize.org/prizes/medicine/1930/landsteiner/facts/ (accessed November 17, 2022).

Nobles, M. 2000. Shades of citizenship: Race and the census in modern politics. Redwood City, CA: Stanford University Press.

Novembre, J., and A. Di Rienzo. 2009. Spatial patterns of variation due to natural selection in humans. Nature Reviews Genetics 10(11):745-755.

Novembre, J., and B. M. Peter. 2016. Recent advances in the study of fine-scale population structure in humans. Current Opinion in Genetics & Development 41:98-105.

Obasogie, O. K. 2010. Do blind people see race? Social, legal, and theoretical considerations. Law & Society Review 44(3-4):585-616.

OMB (U.S. Office of Management and Budget). 1977. Directive no. 15 race and ethnic standards for federal statistics and administrative reporting. https://transition.fcc.gov/Bureaus/OSEC/library/legislative_histories/1195.pdf (accessed December 12, 2022).

OMB (U.S. Office of Management and Budget). 1997. Revisions to the standards for the classification of federal data on race and ethnicity. https://www.whitehouse.gov/wp-content/uploads/2017/11/Revisions-to-the-Standards-for-the-Classification-of-Federal-Data-on-Race-and-Ethnicity-October30-1997.pdf (accessed December 5, 2022).

Onishi, N., and C. Méheut. 2021. Heating up culture wars, France to scour universities for ideas that “corrupt society.” New York Times, February 21, 2021.

Pääbo, S. 2014. The human condition—a molecular approach. Cell 157(1):216-226.

Painter, N. 2010. The history of white people. New York: W.W. Norton & Company.

Panofsky, A., and C. Bliss. 2017. Ambiguity and scientific authority: Population classification in genomic science. American Sociological Review 82(1):59-87.

Panzeri, I., and J. A. Pospisilik. 2018. Epigenetic control of variation and stochasticity in metabolic disease. Molecular Metabolism 14:26-38.
Pascoe, P. 2009. What comes naturally: Miscegenation law and the making of race in America. Oxford University Press.

Price, A. L., N. Patterson, F. Yu, D. R. Cox, A. Waliszewska, G. J. McDonald, A. Tandon, C. Schirmer, J. Neubauer, G. Bedoya, C. Duque, A. Villegas, M. C. Bortolini, F. M. Salzano, C. Gallo, G. Mazzotti, M. Tello-Ruiz, L. Riba, C. A. Aguilar-Salinas, S. Canizales-Quinteros, M. Menjivar, W. Klitz, B. Henderson, C. A. Haiman, C. Winkler, T. Tusie-Luna, A. Ruiz-Linares, and D. Reich. 2007. A genomewide admixture map for Latino populations. American Journal of Human Genetics 80(6):1024-1036.

Provine, W. B. 1971. The origins of theoretical population genetics. Chicago, IL: University of Chicago Press.

Provine, W. B., and E. S. Russell. 1986. Geneticists and race. American Zoologist 26(3):857-887.

Raffington, L., D. W. Belsky, M. Kothari, M. Malanchini, E. M. Tucker-Drob, and K. P. Harden. 2021. Socioeconomic disadvantage and the pace of biological aging in children. Pediatrics 147(6):e2020024406.

Raffington, L., and D. W. Belsky. 2022. Integrating DNA methylation measures of biological aging into social determinants of health research. Current Environmental Health Reports 9(2):196-210.

Reardon, J. 2009. Race to the finish: Identity and governance in an age of genomics. Princeton, NJ: Princeton University Press.

Reich, D. 2018. Who we are and how we got here: Ancient DNA and the new science of the human past. Oxford, United Kingdom: Oxford University Press.

Roberts, D. 2011. Fatal invention: How science, politics, and big business re-create race in the twenty-first century. New York: The New Press.

Rocha, C. S., R. Secolin, M. R. Rodrigues, B. S. Carvalho, and I. Lopes-Cendes. 2020. The Brazilian Initiative on Precision Medicine (BIPMed): Fostering genomic data-sharing of underrepresented populations. NPJ Genomic Medicine 5(1):42.

Rosenberg, N. A. 2021. A population-genetic perspective on the similarities and differences among worldwide human populations. Human Biology 92(3):135-152.

Ruiz-Linares, A., K. Adhikari, V. Acuña-Alonzo, M. Quinto-Sanchez, C. Jaramillo, W. Arias, M. Fuentes, M. Pizarro, P. Everardo, F. de Avila, J. Gómez-Valdés, P. León-Mimila, T. Hunemeier, V. Ramallo, C. C. Silva de Cerqueira, M. W. Burley, E. Konca, M. Z. de Oliveira, M. R. Veronez, M. Rubio-Codina, O. Attanasio, S. Gibbon, N. Ray, C. Gallo, G. Poletti, J. Rosique, L. Schuler-Faccini, F. M. Salzano, M. C. Bortolini, S. Canizales-Quinteros, F. Rothhammer, G. Bedoya, D. Balding, and R. Gonzalez-José. 2014. Admixture in Latin America: Geographic structure, phenotypic diversity and self-perception of ancestry based on 7,342 individuals. PLoS Genetics 10(9):e1004572.

Sachidanandam, R., D. Weissman, S. C. Schmidt, J. M. Kakol, L. D. Stein, G. Marth, S. Sherry, J. C. Mullikin, B. J. Mortimore, D. L. Willey, S. E. Hunt, C. G. Cole, P. C. Coggill, C. M. Rice, Z. Ning, J. Rogers, D. R. Bentley, P.-Y. Kwok, E. R. Mardis, R. T. Yeh, B. Schultz, L. Cook, R. Davenport, M. Dante, L. Fulton, L. Hillier, R. H. Waterston, J. D. McPherson, B. Gilman, S. Schaffner, W. J. Van Etten, D. Reich, J. Higgins, M. J. Daly, B. Blumenstiel, J. Baldwin, N. Stange-Thomann, M. C. Zody, L. Linton, E. S. Lander, and D. Altshuler. 2001. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409(6822):928-933.

Sadarangani, M., A. Marchant, and T. R. Kollmann. 2021. Immunological mechanisms of vaccine-induced protection against COVID-19 in humans. Nature Reviews Immunology 21(8):475-484.

Sanders, E. R. 1969. The Hamitic hypothesis; its origins and functions in time perspective. Journal of African History 10:521-532.

Sankar, P., and M. K. Cho. 2002. Toward a new vocabulary of human genetic variation. Science 298(5597):1337-1338.
Schor, P. 2017. Counting Americans: How the US census classified the nation. Oxford: Oxford University Press.

Smedley, A., and B. D. Smedley. 2012. Race in North America: Origin and evolution of a worldview. 4th ed. Boulder, CO: Westview Press.

Snowden, F. M., Jr. 1983. Before color prejudice: The ancient view of blacks. Cambridge, MA: Harvard University Press.

Stepan, N. 1982. The idea of race in science: Great Britain, 1800-1960. Palgrave MacMillan.

Stern, A. M. 2015. Eugenic nation: Faults and frontiers of better breeding in modern America. Berkeley: University of California Press.

Strickberger, M. W. 1985. Genetics. 3rd ed. Macmillan.

Takezawa, Y., K. Kato, H. Oota, T. Caulfield, A. Fujimoto, S. Honda, N. Kamatani, S. Kawamura, K. Kawashima, R. Kimura, H. Matsumae, A. Saito, P. E. Savage, N. Seguchi, K. Shimizu, S. Terao, Y. Yamaguchi-Kabata, A. Yasukouchi, M. Yoneda, and K. Tokunaga. 2014. Human genetic research, race, ethnicity and the labeling of populations: Recommendations based on an interdisciplinary workshop in Japan. BMC Medical Ethics 15(1).

TallBear, K. 2013. Native American DNA: Tribal belonging and the false promise of genetic science. Minneapolis: University of Minnesota Press.

UN (United Nations). 2017. Ethnocultural characteristics. https://unstats.un.org/unsd/demographic/sconcerns/popchar/popcharmethods.htm (accessed October 19, 2022).

Van Ausdale, D., and J. R. Feagin. 1996. Using racial and ethnic concepts: The critical case of very young children. American Sociological Review 61:779-793.

Venter, J. C., M. D. Adams, E. W. Myers, P. W. Li, R. J. Mural, G. G. Sutton, H. O. Smith, M. Yandell, C. A. Evans, R. A. Holt, J. D. Gocayne, P. Amanatides, R. M. Ballew, D. H. Huson, J. R. Wortman, Q. Zhang, C. D. Kodira, X. H. Zheng, L. Chen, M. Skupski, G. Subramanian, P. D. Thomas, J. Zhang, G. L. Gabor Miklos, C. Nelson, S. Broder, A. G. Clark, J. Nadeau, V. A. McKusick, N. Zinder, A. J. Levine, R. J. Roberts, M. Simon, C. Slayman, M. Hunkapiller, R. Bolanos, A. Delcher, I. Dew, D. Fasulo, M. Flanigan, L. Florea, A. Halpern, S. Hannenhalli, S. Kravitz, S. Levy, C. Mobarry, K. Reinert, K. Remington, J. Abu-Threideh, E. Beasley, K. Biddick, V. Bonazzi, R. Brandon, M. Cargill, I. Chandramouliswaran, R. Charlab, K. Chaturvedi, Z. Deng, V. Di Francesco, P. Dunn, K. Eilbeck, C. Evangelista, A. E. Gabrielian, W. Gan, W. Ge, F. Gong, Z. Gu, P. Guan, T. J. Heiman, M. E. Higgins, R. R. Ji, Z. Ke, K. A. Ketchum, Z. Lai, Y. Lei, Z. Li, J. Li, Y. Liang, X. Lin, F. Lu, G. V. Merkulov, N. Milshina, H. M. Moore, A. K. Naik, V. A. Narayan, B. Neelam, D. Nusskern, D. B. Rusch, S. Salzberg, W. Shao, B. Shue, J. Sun, Z. Wang, A. Wang, X. Wang, J. Wang, M. Wei, R. Wides, C. Xiao, C. Yan, A. Yao, J. Ye, M. Zhan, W. Zhang, H. Zhang, Q. Zhao, L. Zheng, F. Zhong, W. Zhong, S. Zhu, S. Zhao, D. Gilbert, S. Baumhueter, G. Spier, C. Carter, A. Cravchik, T. Woodage, F. Ali, H. An, A. Awe, D. Baldwin, H. Baden, M. Barnstead, I. Barrow, K. Beeson, D. Busam, A. Carver, A. Center, M. L. Cheng, L. Curry, S. Danaher, L. Davenport, R. Desilets, S. Dietz, K. Dodson, L. Doup, S. Ferriera, N. Garg, A. Gluecksmann, B. Hart, J. Haynes, C. Haynes, C. Heiner, S. Hladun, D. Hostin, J. Houck, T. Howland, C. Ibegwam, J. Johnson, F. Kalush, L. Kline, S. Koduru, A. Love, F. Mann, D. May, S. McCawley, T. McIntosh, I. McMullen, M. Moy, L. Moy, B. Murphy, K. Nelson, C. Pfannkoch, E. Pratts, V. Puri, H. Qureshi, M. Reardon, R. Rodriguez, Y. H. Rogers, D. Romblad, B. Ruhfel, R. Scott, C. Sitter, M. Smallwood, E. Stewart, R. Strong, E. Suh, R. Thomas, N. N. Tint, S. Tse, C. Vech, G. Wang, J. Wetter, S. Williams, M. Williams, S. Windsor, E. Winn-Deen, K. Wolfe, J. Zaveri, K. Zaveri, J. F. Abril, R. Guigo, M. J. Campbell, K. V. Sjolander, B. Karlak, A. Kejariwal, H. Mi, B. Lazareva, T. Hatton, A. Narechania, K. Diemer, A. Muruganujan, N. Guo, S. Sato, V. Bafna, S. Istrail, R. Lippert, R. Schwartz, B. Walenz, S. Yooseph, D. Allen, A. Basu, J. Baxendale, L. Blick, M. Caminha, J. Carnes-Stine, P. Caulk, Y. H. Chiang, M. Coyne, C. Dahlke, A. Deslattes Mays, M. Dombroski, M. Donnelly, D. Ely, S. Esparham, C. Fosler, H. Gire, S. Glanowski, K. Glasser, A. Glodek, M. Gorokhov, K. Graham, B. Gropman, M. Harris, J. Heil, S. Henderson, J. Hoover, D. Jennings, C. Jordan, J. Jordan, J. Kasha, L. Kagan, C. Kraft, A. Levitsky, M. Lewis, X. Liu, J. Lopez, D. Ma, W. Majoros, J. McDaniel, S. Murphy, M. Newman, T. Nguyen, N. Nguyen, M. Nodell, S. Pan, J. Peck, M. Peterson, W. Rowe, R. Sanders, J. Scott, M. Simpson, T. Smith, A. Sprague, T. Stockwell, R. Turner, E. Venter, M. Wang, M. Wen, D. Wu, M. Wu, A. Xia, A. Zandieh, and X. Zhu. 2001. The sequence of the human genome. Science 291(5507):1304-1351.

von Dungern, E., and L. Hirschfeld. 1962. Concerning heredity of group specific structures of blood. Transfusion 2(1):70-74.

Vyas, D. A., L. G. Eisenstein, and D. S. Jones. 2020. Hidden in plain sight—Reconsidering the use of race correction in clinical algorithms. New England Journal of Medicine 383(9):874-882.

Wagner, J. K., J. H. Yu, J. O. Ifekwunigwe, T. M. Harrell, M. J. Bamshad, and C. D. Royal. 2017. Anthropologists’ views on race, ancestry, and genetics. American Journal of Physical Anthropology 162(2):318-327.

Waldman, S., D. Backenroth, É. Harney, S. Flohr, N. C. Neff, G. M. Buckley, H. Fridman, A. Akbari, N. Rohland, S. Mallick, I. Olalde, L. Cooper, A. Lomes, J. Lipson, J. Cano Nistal, J. Yu, N. Barzilai, I. Peter, G. Atzmon, H. Ostrer, T. Lencz, Y. E. Maruvka, M. Lämmerhirt, A. Beider, L. V. Rutgers, V. Renson, K. M. Prufer, S. Schiffels, H. Ringbauer, K. Sczech, S. Carmi, and D. Reich. 2022. Genome-wide data from medieval German Jews show that the Ashkenazi founder event pre-dated the 14th century. Cell 185(25):4703-4716.e4716.

Watson, M. S., M. A. Lloyd-Puryear, and R. R. Howell. 2022. The progress and future of US newborn screening. International Journal of Neonatal Screening 8(3):41.

West, K. M., E. Blacksher, and W. Burke. 2017. Genomics, health disparities, and missed opportunities for the nation’s research agenda. JAMA 317(18):1831.

Wilder, C. S. 2013. Ebony and ivy: Race, slavery, and the troubled history of America’s universities. New York: Bloomsbury Press.

Williamson, S. H., M. J. Hubisz, A. G. Clark, B. A. Payseur, C. D. Bustamante, and R. Nielsen. 2007. Localizing recent adaptive evolution in the human genome. PLoS Genetics 3(6):e90.

Wimmer, A. 2015. Race-centrism: A critique and a research agenda. Ethnic and Racial Studies 38(13):2186-2205.

Yudell, M. 2014. Race unmasked: Biology and race in the twentieth century. New York: Columbia University Press.

Yudell, M., D. Roberts, R. DeSalle, S. Tishkoff, and 70 signatories. 2020. NIH must confront the use of race in science. Science 369(6509):1313-1314.

Zhou, W., M. Kanai, K.-H. H. Wu, H. Rasheed, K. Tsuo, J. B. Hirbo, Y. Wang, A. Bhattacharya, H. Zhao, S. Namba, I. Surakka, B. N. Wolford, V. Lo Faro, E. A. Lopera-Maya, K. Läll, M.-J. Favé, J. J. Partanen, S. B. Chapman, J. Karjalainen, M. Kurki, M. Maasha, B. M. Brumpton, S. Chavan, T.-T. Chen, M. Daya, Y. Ding, Y.-C. A. Feng, L. A. Guare, C. R. Gignoux, S. E. Graham, W. E. Hornsby, N. Ingold, S. I. Ismail, R. Johnson, T. Laisk, K. Lin, J. Lv, I. Y. Millwood, S. Moreno-Grau, K. Nam, P. Palta, A. Pandit, M. H. Preuss, C. Saad, S. Setia-Verma, U. Thorsteinsdottir, J. Uzunovic, A. Verma, M. Zawistowski, X. Zhong, N. Afifi, K. M. Al-Dabhani, A. Al Thani, Y. Bradford, A. Campbell, K. Crooks, G. H. de Bock, S. M. Damrauer, N. J. Douville, S. Finer, L. G. Fritsche, E. Fthenou, G. Gonzalez-Arroyo, C. J. Griffiths, Y. Guo, K. A. Hunt, A. Ioannidis, N. M. Jansonius, T. Konuma, M. T. M. Lee, A. Lopez-Pineda, Y. Matsuda, R. E. Marioni, B. Moatamed, M. A. Nava-Aguilar, K. Numakura, S. Patil, N. Rafaels, A. Richmond, A. Rojas-Muñoz, J. A. Shortt, P. Straub, R. Tao, B. Vanderwerff, M. Vernekar, Y. Veturi, K. C. Barnes, M. Boezen, Z. Chen, C.-Y. Chen, J. Cho, G. D. Smith, H. K. Finucane, L. Franke, E. R. Gamazon, A. Ganna, T. R. Gaunt, T. Ge, H. Huang, J. Huffman, N. Katsanis, J. T. Koskela, C. Lajonchere, M. H. Law, L. Li, C. M. Lindgren, R. J. F. Loos, S. MacGregor, K. Matsuda, C. M. Olsen, D. J. Porteous, J. A. Shavit, H. Snieder, T. Takano, R. C. Trembath, J. M. Vonk, D. C. Whiteman, S. J. Wicks, C. Wijmenga, J. Wright, J. Zheng, X. Zhou, P. Awadalla, M. Boehnke, C. D. Bustamante, N. J. Cox, S. Fatumo, D. H. Geschwind, C. Hayward, K. Hveem, E. E. Kenny, S. Lee, Y.-F. Lin, H. Mbarek, R. Mägi, H. C. Martin, S. E. Medland, Y. Okada, A. V. Palotie, B. Pasaniuc, D. J. Rader, M. D. Ritchie, S. Sanna, J. W. Smoller, K. Stefansson, D. A. van Heel, R. G. Walters, S. Zöllner, A. R. Martin, C. J. Willer, M. J. Daly, and B. M. Neale. 2022. Global biobank meta-analysis initiative: Powering genetic discovery across human disease. Cell Genomics 2(10):100192.

Zuberi, T. 2003. Thicker than blood: How racial statistics lie. Minneapolis: University of Minnesota Press.

Using Population Descriptors in Genetics and Genomics Research: A New Framework for an Evolving Field

Genetic and genomic information has become far more accessible, and research using human genetic data has grown exponentially over the past decade. Genetics and genomics research is now being conducted by a wide range of investigators across disciplines, who often use population descriptors inconsistently and/or inappropriately to capture the complex patterns of continuous human genetic variation.

In response to a request from the National Institutes of Health, the National Academies assembled an interdisciplinary committee of expert volunteers to conduct a study to review and assess existing methodologies, benefits, and challenges in using race, ethnicity, ancestry, and other population descriptors in genomics research. The resulting report focuses on understanding the current use of population descriptors in genomics research, examining best practices for researchers, and identifying processes for adopting best practices within the biomedical and scientific communities.
