Using Population
Descriptors in Genetics
and Genomics Research
A New Framework for an Evolving Field
______
Committee on the Use of Race, Ethnicity, and Ancestry
as Population Descriptors in Genomics Research
Board on Health Sciences Policy
Health and Medicine Division
Committee on Population
Division of Behavioral and Social Sciences and Education
Consensus Study Report
NATIONAL ACADEMIES PRESS 500 Fifth Street, NW, Washington, DC 20001
This project has been funded with federal funds under Contract No. HHSN263201800029I (75N98021F00009) between the National Academy of Sciences and the Department of Health and Human Services, National Institutes of Health: All of Us Research Program; National Cancer Institute; National Heart, Lung, and Blood Institute; National Human Genome Research Institute; Eunice Kennedy Shriver National Institute of Child Health and Human Development; National Institute of Dental and Craniofacial Research; National Institute of Diabetes and Digestive and Kidney Diseases; National Institute of Environmental Health Sciences; National Institute of Nursing Research; National Institute on Aging; National Institute on Drug Abuse; National Institute on Minority Health and Health Disparities; NIH Office of Behavioral and Social Sciences Research; and NIH Office of Science Policy. Any opinions, findings, conclusions, or recommendations expressed in this publication do not necessarily reflect the views of any organization or agency that provided support for the project.
International Standard Book Number-13: 978-0-309-70065-8
International Standard Book Number-10: 0-309-70065-5
Digital Object Identifier: https://doi.org/10.17226/26902
This publication is available from the National Academies Press, 500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-3313; http://www.nap.edu.
Copyright 2023 by the National Academy of Sciences. National Academies of Sciences, Engineering, and Medicine and National Academies Press and the graphical logos for each are all trademarks of the National Academy of Sciences. All rights reserved.
Printed in the United States of America.
Suggested citation: National Academies of Sciences, Engineering, and Medicine. 2023. Using population descriptors in genetics and genomics research: A new framework for an evolving field. Washington, DC: The National Academies Press. https://doi.org/10.17226/26902.
The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution to advise the nation on issues related to science and technology. Members are elected by their peers for outstanding contributions to research. Dr. Marcia McNutt is president.
The National Academy of Engineering was established in 1964 under the charter of the National Academy of Sciences to bring the practices of engineering to advising the nation. Members are elected by their peers for extraordinary contributions to engineering. Dr. John L. Anderson is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to advise the nation on medical and health issues. Members are elected by their peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau is president.
The three Academies work together as the National Academies of Sciences, Engineering, and Medicine to provide independent, objective analysis and advice to the nation and conduct other activities to solve complex problems and inform public policy decisions. The National Academies also encourage education and research, recognize outstanding contributions to knowledge, and increase public understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medicine at www.nationalacademies.org.
Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s statement of task by an authoring committee of experts. Reports typically include findings, conclusions, and recommendations based on information gathered by the committee and the committee’s deliberations. Each report has been subjected to a rigorous and independent peer-review process, and it represents the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium, or other event convened by the National Academies. The statements and opinions contained in proceedings are those of the participants and are not endorsed by other participants, the planning committee, or the National Academies.
Rapid Expert Consultations published by the National Academies of Sciences, Engineering, and Medicine are authored by subject-matter experts on narrowly focused topics that can be supported by a body of evidence. The discussions contained in rapid expert consultations are considered those of the authors and do not contain policy recommendations. Rapid expert consultations are reviewed by the institution before release.
For information about other products and activities of the National Academies, please visit www.nationalacademies.org/about/whatwedo.
COMMITTEE ON THE USE OF RACE, ETHNICITY, AND ANCESTRY AS POPULATION DESCRIPTORS IN GENOMICS RESEARCH¹
ARAVINDA CHAKRAVARTI (Cochair), Director, Center for Human Genetics and Genomics; Muriel G. & George W. Singer Professor of Neuroscience and Physiology, Professor of Medicine, New York University Grossman School of Medicine
CHARMAINE ROYAL (Cochair), Robert O. Keohane Professor of African & African American Studies, Biology, Global Health, and Family Medicine & Community Health; Director, Center on Genomics, Race, Identity, Difference and Center for Truth, Racial Healing & Transformation, Duke University
KATRINA ARMSTRONG, Executive Vice President, Health and Biomedical Sciences; Dean of the Vagelos College of Physicians and Surgeons and the Faculties of Health Sciences; Harold and Margaret Hatch Professor of the University; Columbia University
MICHAEL BAMSHAD, Professor and Chief, Division of Genetic Medicine, Department of Pediatrics; Allan and Phyllis Treuer Endowed Chair, Genetics and Development, University of Washington and Seattle Children’s Hospital
LUISA N. BORRELL, Distinguished Professor, Department of Epidemiology & Biostatistics, Graduate School of Public Health & Health Policy, City University of New York
KATRINA CLAW, Assistant Professor, Department of Biomedical Informatics, School of Medicine; Faculty, Colorado Center for Personalized Medicine, University of Colorado Denver–Anschutz Medical Campus
CLARENCE C. GRAVLEE, Associate Professor, Department of Anthropology, University of Florida
MARK D. HAYWARD, Professor of Sociology; Centennial Commission Professor in Liberal Arts; Faculty Research Associate, Population Research Center, University of Texas at Austin
RICK KITTLES, Senior Vice President for Research; Professor of Community Health and Preventive Medicine, Morehouse School of Medicine
SANDRA SOO-JIN LEE, Professor of Medical Humanities & Ethics; Chief of the Division of Ethics, Department of Medical Humanities & Ethics, Vagelos College of Physicians & Surgeons, Columbia University
___________________
¹ See Appendix F, Disclosure of Unavoidable Conflict of Interest.
ANDRÉS MORENO-ESTRADA, Professor, Advanced Genomics Unit, Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional (CINVESTAV), Mexico
ANN MORNING, James Weldon Johnson Professor of Sociology, New York University
JOHN P. NOVEMBRE, Professor, Department of Human Genetics, Department of Ecology & Evolution, University of Chicago
MOLLY PRZEWORSKI, Professor, Department of Biological Sciences, Department of Systems Biology, Columbia University
DOROTHY E. ROBERTS, George A. Weiss University Professor of Law & Sociology; Raymond Pace & Sadie Tanner Mossell Alexander Professor of Civil Rights; Professor of Africana Studies; Director, Penn Program on Race, Science & Society, University of Pennsylvania
SARAH A. TISHKOFF, David and Lyn Silfen University Professor, Departments of Genetics and Biology; Director, Center for Global Genomics & Health Equity, University of Pennsylvania
GENEVIEVE L. WOJCIK, Assistant Professor of Epidemiology, Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health
Study Staff
SARAH H. BEACHY, Study Director
SAMANTHA N. SCHUMM, Associate Program Officer
LEAH CAIRNS, Study Codirector (until October 2022)
KATHRYN ASALONE, Associate Program Officer
MEREDITH HACKMANN, Associate Program Officer
LYDIA TEFERRA, Research Assistant
APARNA CHERAN, Senior Program Assistant (from June 2022)
MICHAEL K. ZIERLER, Science Writer
ANDREW M. POPE, Senior Director, Board on Health Sciences Policy (until July 2022)
CLARE STROUD, Senior Director, Board on Health Sciences Policy (from July 2022)
MALAY K. MAJMUNDAR, Director, Committee on Population
Reviewers
This Consensus Study Report was reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise. The purpose of this independent review is to provide candid and critical comments that will assist the National Academies of Sciences, Engineering, and Medicine in making each published report as sound as possible and to ensure that it meets the institutional standards for quality, objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process.
We thank the following individuals for their review of this report:
WENDY CHUNG, Columbia University
DANA A. GLEI, Georgetown University
EVELYNN M. HAMMONDS, Harvard University
CHANITA HUGHES-HALBERT, University of Southern California
BENJAMIN NEALE, Harvard Medical School
NEIL R. POWE, University of California, San Francisco
ERICA RAMOS, Genome Medical
ALIYA SAPERSTEIN, Stanford University
THE REVEREND ROBERT JEMONDE TAYLOR, Duke Cancer Institute Community Advisory Council
SHARON F. TERRY, Genetic Alliance
HONGYU ZHAO, Yale University
Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations of this report nor did they see the final draft before its release. The review of this report was overseen by SUSAN J. CURRY of the University of Iowa and LINDA C. DEGUTIS of the Yale School of Public Health. They were responsible for making certain that an independent examination of this report was carried out in accordance with the standards of the National Academies and that all review comments were carefully considered. Responsibility for the final content rests entirely with the authoring committee and the National Academies.
Acknowledgments
The study committee and project staff acknowledge that the National Academies of Sciences, Engineering, and Medicine is physically housed on the traditional land of the Nacotchtank (Anacostan) and Piscataway Peoples, past and present. The committee and staff honor with gratitude the land itself and the people who have stewarded it throughout the generations. They honor and respect the enduring relationship that exists between these peoples and nations and this land. The committee and staff thank these peoples for their resilience in protecting this land and aspire to uphold our responsibilities according to their example. The committee and staff also acknowledge the countless number of people who have participated, both willingly and unwillingly, in biomedical research, as well as those who have raised the issues addressed in this study for many years.
The study committee and project staff would like to thank the study sponsor—the 14 institutes, program, and offices of the National Institutes of Health—for their leadership on this issue and for their vision and commitment to developing and supporting this project. The committee and staff express their gratitude to the many experts who shared their diverse perspectives and advice with the committee throughout the process and during the public sessions. The committee is grateful for the staff within the Health and Medicine Division who provided support and guidance for the project, along with their collaborators in the Division of Behavioral and Social Sciences and Education.
Contents
LIST OF BOXES, FIGURES, AND TABLES
SECTION I: PAST AND CURRENT USE OF POPULATION DESCRIPTORS IN GENETICS AND GENOMICS RESEARCH
1 POPULATION DESCRIPTORS IN HUMAN GENETICS RESEARCH: GENESIS, EVOLUTION, AND CHALLENGES
The Study of Human Genetic Variation
What Is a Study Using Genetic Information Trying to Accomplish?
Classification of Genomics Study Types
Features of Human Genome Variation
Population Classification Schemes in Genetics and Genomics Research
Attempts to Address the Use of Race, Ethnicity, and Ancestry in the Genomic Era
Why Is This Study Important? Why Another Study? Why Now?
What Is the Goal of This Report?
2 A MULTIPLICITY OF DESCRIPTORS IN GENETICS AND GENOMICS RESEARCH
A Range of Descent-Associated Population Descriptors
The Importance of Environmental Factors in Genetics and Genomics Research
Synergy Among and Tension Between Guiding Principles
4 REQUISITES FOR SUSTAINED CHANGE
5 GUIDANCE FOR SELECTION AND USE OF POPULATION DESCRIPTORS IN GENOMICS RESEARCH
The Importance of Transparency and Specificity When Selecting and Reporting Population Descriptors
Conclusion and Recommendations
Tools for Selecting and Using Population Descriptors in Genetics and Genomics Research
Considerations for Harmonization of Population Descriptors Across Studies
List of Boxes, Figures, and Tables
BOXES
S-1 Key Terminology and Definitions
1-1 Race, Science, and Society: A Reference List
2-1 Key Terminology and Definitions
5-1 Key Terminology for This Chapter
5-2 Common Data Elements for Researchers to Include as Metadata to Help Harmonize Across Studies
5-3 Concise Language for Genetic Similarity: The Abbreviation + -Like System
6-1 Example Checklist that Funders of Genetics and Genomics Research Can Implement for Researchers
6-2 Example Checklist that Journals Can Implement for Genomics Researchers
FIGURES
2-1 Visualization of genealogical vs. genetic ancestry
2-2 U.S. Census race and ethnicity categories over time (1790–2020)
D-1 Decision tree for the use of population descriptors in genomics research
TABLES
S-1 Recommended Approaches for the Use of Population Descriptors by Genomics Study Type
5-1 Recommended Approaches for the Use of Population Descriptors by Genomics Study Type
Preface
Human genetics studies that assess the contributions of genes to phenotypes can be conducted using either relatives or groups of distantly related individuals (“unrelated” in a colloquial sense). In both instances, geneticists search for a pattern of genetic (sequence or allelic) variation that can distinguish between different forms of a phenotype, say, individuals with sickle cell anemia from those with sickle cell trait, based on the known rules of Mendelian genetic transmission. Although the expected similarity or dissimilarity of closely related individuals largely depends on gene transmission rules, that between more distantly related individuals mostly depends on their remote ancestral histories, such as where, when, and how their common ancestors arose. This information is partially captured by affiliation of an individual to a population; however, how a population should be defined for any specific question in genetics research is less clear. Nevertheless, for any human genetics research, now extended to entire genomes, it is critical to clearly describe who is selected for a study, why, and how. Researchers also need to specify the criteria used to describe participants, including the use of population descriptors. Unfortunately, genetics studies have not named individuals consistently or in a principled manner, often reflexively using race and ethnicity without great thought or justification. Though seldom studied, measures of the environments associated with study individuals and groups are also germane to our understanding of genetic traits and disorders and need to be included.
In recent years, genetic information has become far more accessible. The number of human genetics and genomics studies is rapidly increasing, and many such studies are led by investigators who were not primarily trained in human genetics. While this study focuses mainly on knowledge from human genetics and genomics, we acknowledge that knowledge from many other sources (oral, archaeological, traditional, community, etc.) serves to inform our identities, history, relationships to other humans, and our traits and diseases. It is time for us to reshape how genetics studies are conceptualized, conducted, and interpreted.
This commissioned study describes an effort to clarify the scientific rationales for describing research participants and their group labels. We start with a historical view of how we got to our current state, then proceed to examine how else we could achieve our scientific aims, and follow with our recommendations and suggested implementations to improve genetic and genomic science. Our overarching goal is to motivate researchers to consider when population descriptors are necessary, which ones are appropriate for a specific type of genetics study design, whether multiple descriptors are necessary, and what additional information is needed for genetic dissection of phenotypes. Accordingly, this report is divided into two sections; the first is “Past and Current Use of Population Descriptors,” and the second section is “Recommendations.”
Aravinda Chakravarti and Charmaine Royal,
Cochairs, Committee on the Use of Race, Ethnicity, and
Ancestry as Population Descriptors in Genomics Research
Abbreviations
1000G | 1000 Genomes Project; the international 1000 genomes sequence variation project |
AAA | American Anthropological Association |
AAPA | American Association of Physical Anthropologists |
AFR | African “superpopulation” |
AMA | American Medical Association |
APA | American Psychological Association |
APOE4 | apolipoprotein E gene |
BBJ | BioBank Japan |
BIPMed | Brazilian Initiative on Precision Medicine |
CDC | Centers for Disease Control and Prevention |
CDE | common data element |
CEPH | Centre d'Étude du Polymorphisme Humain; source of the Utah samples with northern and western European ancestry |
CEU | northern European in Utah |
CKB | China Kadoorie Biobank |
CONSORT | Consolidated Standards of Reporting Trials |
COREQ | COnsolidated criteria for REporting Qualitative research |
COVID-19 | coronavirus disease 2019 |
DNA | deoxyribonucleic acid |
EUR | European “superpopulation” |
FAPESP | Research Innovation and Dissemination Centers funded by the São Paulo Research Foundation |
GBR | British in England and Scotland |
GIH | Gujaratis sampled in Houston |
gnomAD | Genome Aggregation Database |
GWAS | genome-wide association study |
HAALSI | Health and Aging in Africa: A Longitudinal Study of an INDEPTH Community in South Africa |
HAAO | 3-hydroxyanthranilate 3,4-dioxygenase |
HapMap | International Haplotype Map Project |
HGP | Human Genome Project |
HIV | human immunodeficiency virus |
HLA | human leukocyte antigen |
HMD | Health and Medicine Division |
INDEPTH | International Network for the Demographic Evaluation of Populations and Their Health |
JAMA | Journal of the American Medical Association |
KBP | Korean Biobank Project |
KYNU | kynureninase |
LD | linkage disequilibrium |
MeSH | Medical Subject Headings |
mRNA | messenger RNA |
MXB | Mexican Biobank |
NAD | nicotinamide adenine dinucleotide |
NHGRI | National Human Genome Research Institute |
NHLBI | National Heart, Lung, and Blood Institute |
NIH | National Institutes of Health |
NIMHD | National Institute on Minority Health and Health Disparities |
NYU | New York University |
OMB | Office of Management and Budget |
PCA | principal component analysis |
PCORI | Patient-Centered Outcomes Research Institute |
PCSK9 | proprotein convertase subtilisin/kexin type 9 |
PEL | Peruvian in Lima, Peru |
PERSIAN | Prospective Epidemiological Research Studies in Iran |
PGS | polygenic score |
PKU | phenylketonuria |
PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses |
PUR | Puerto Rican in Puerto Rico |
QBB | Qatar Biobank |
RNA | ribonucleic acid |
SALL4 | spalt like transcription factor 4 |
TOPMed | Trans-Omics for Precision Medicine |
TSI | Tuscans of Italy |
UCLA | University of California, Los Angeles |
UK | United Kingdom |
UKB | United Kingdom Biobank |
UMAP | uniform manifold approximation and projection |
UN | United Nations |
U.S. | United States |
WHO | World Health Organization |
YRI | Yoruba in Ibadan, Nigeria |