Using Population Descriptors in Genetics and Genomics Research

A New Framework for an Evolving Field

The National Institutes of Health (NIH) asked the National Academies to conduct a study reviewing and assessing existing methodologies, benefits, and challenges in the use of race, ethnicity, and other population descriptors in genomics research. Using a new framework for change, the report describes guiding principles and best practices for research. This page presents key messages from the report, including guiding principles, key terms, and an interactive decision tree for researchers seeking appropriate population descriptors for their studies, along with other highlights from the committee’s work.

Guiding Principles


Everyone who engages in research with genomics data is a decision maker about the use of population descriptors. Because research in the field is dynamic and the report’s specific best practices may not cover every research situation, the report outlines guiding principles to foster ethical and empirical best practices for supporting trustworthy research. The guiding principles are respect, equity & justice, beneficence, validity & reproducibility, and transparency & replicability.

  • Respect for individual and community preferences, norms, and values should inform approaches when determining which population descriptors to use in research.
  • Equity and justice require determining whether and how the selection and use of population descriptors will produce equitable benefit to avoid reinforcing existing inequities or introducing new ones.
  • Beneficence calls on researchers to assess how the selection of population descriptors may generate not only potential good but also potential harm, and requires them to consider the effect of population descriptors on health equity.
  • Validity and reproducibility require judicious evaluation of research objectives and assessment of the appropriateness and purpose of including population descriptors.
  • Transparency and replicability include the obligation to provide a clear rationale for the selection and/or use of population descriptors and to explain decision-making processes in an open and accessible manner to both other researchers and research participants, thus enhancing replicability.

Key Terminology and Definitions

Ancestry: a person’s origin or descent, lineage, “roots,” or heritage, including kinship.

Environment: the complex of physical, social, cultural, chemical, and biotic factors that act upon a person.

Ethnicity: a sociopolitically constructed system for classifying human beings according to claims of shared heritage often based on perceived cultural similarities (e.g., language, religion, beliefs); the system varies globally.

Genetic ancestry: the paths through an individual’s family tree by which they have inherited DNA from specific ancestors. Genetic ancestry can be thought of in terms of lines extending upwards in a family tree from an individual through their genetic ancestors. Shared genetic ancestry arises from having genetic ancestors in common (that is, overlapping lines of ancestry). In practice, shared genetic ancestry is typically inferred by some measure(s) of genetic similarity.

Genetic ancestry group: a set of individuals who share more similar genetic ancestries. In practice, a genetic ancestry group is constituted based on some measure(s) of genetic similarity. Once a set is designated as a genetic ancestry group, its members are often assigned a geographic, ethnic, or other nongenetic label that is common among its members.

Genetic similarity: quantitative measure of the genetic resemblance between individuals that reflects the extent of shared genetic ancestry.

Group label: name given to a population that describes or classifies it according to the dimension along which it was identified. An example is French as the label for a group identified by its members’ possession of French nationality, where nationality is the population descriptor.

Population descriptor: a concept or classification scheme that categorizes people into groups (or “populations”) according to a perceived characteristic or dimension of interest. A few examples are race, ethnicity, and geographic location, although this is a non-exhaustive list.

Race: a sociopolitically constructed system for classifying and ranking human beings according to subjective beliefs about shared ancestry based on perceived innate biological similarities; the system varies globally.

See Appendix B in the report for further comments, definitions, and citations.
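
As one illustration of the "measure(s) of genetic similarity" mentioned in the definitions above, the sketch below computes a simple identity-by-state (IBS) proportion between two individuals. This is only one of many possible measures and is not prescribed by the report. The function name `ibs_similarity` and the genotype coding (0, 1, or 2 copies of an alternate allele per site, with `None` for a missing call) are assumptions made for this example.

```python
from typing import Optional, Sequence


def ibs_similarity(
    a: Sequence[Optional[int]], b: Sequence[Optional[int]]
) -> float:
    """Mean identity-by-state similarity between two genotype vectors.

    Assumed coding (hypothetical, for illustration): each entry is the
    count of alternate alleles at one biallelic site (0, 1, or 2), or
    None when the genotype call is missing.
    """
    if len(a) != len(b):
        raise ValueError("genotype vectors must cover the same sites")

    total = 0.0
    compared = 0
    for x, y in zip(a, b):
        # Skip sites where either individual has no genotype call.
        if x is None or y is None:
            continue
        if x not in (0, 1, 2) or y not in (0, 1, 2):
            raise ValueError("genotypes must be 0, 1, 2, or None")
        # Two shared alleles score 1.0, one scores 0.5, none scores 0.0.
        total += 1.0 - abs(x - y) / 2.0
        compared += 1

    if compared == 0:
        raise ValueError("no sites with genotype calls in both individuals")
    return total / compared


# Example with three comparable sites and one missing call.
similarity = ibs_similarity([0, 1, 2, None], [0, 2, 0, 1])
```

The result is a continuous value between 0 and 1 for each pair of individuals. Any grouping built on such a measure depends on the sites, samples, and thresholds a researcher chooses.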

Are Population Descriptors Needed in Your Study?

This interactive decision tree is meant to aid researchers who use genetics and genomics data in implementing the best practices detailed in the report. It offers guidance and provides specific questions to ask during study design to determine whether population descriptors are needed and, if so, which ones to use.



Key Questions

  • Genetic and genomic information has become far more accessible, and research using human genetic data has grown exponentially over the past decade. The use of genetic information is now widespread across biomedical research. Genetics and genomics research is conducted by a wide range of investigators across disciplines, who often use population descriptors inconsistently and/or inappropriately to capture the complex patterns of continuous human genetic variation. Because there is a misconception that humans can be grouped into discrete, innate categories, researchers can inappropriately rely on descent-associated population descriptors such as race to describe population genetic differences. Therefore, there is a need to create clear guidance about the use of population descriptors as well as an opportunity to implement substantive changes to the ways they are used.

  • There is a long history of prior attempts to address population descriptors, which may create some skepticism about the usefulness of another report aiming to establish best practices for this complex area. However, there are several reasons why this is a particularly opportune and important moment to offer concrete guidance to the research community.

    • The growth in research using human genetic data has been driven in part by major investments in large-scale studies, many of which have genomic sequence data from a more diverse set of populations, raising more questions about how best to represent that diversity in the study data. Clear guidance about the use of population descriptors is therefore urgently needed before the mistakes of the past are baked into this new era of genetics research.
    • More advanced methods of understanding and describing population structure and variation have been developed. These advances have not been accompanied, however, by new approaches to the use of population descriptors in genetics and genomics research that include an implementation and accountability plan.

    The report aims to emphasize that scientists must get the descent-associated concepts right—that is, have a clear understanding of what these descriptors represent and a rigorous rationale for using them—before selecting the appropriate group categories and labels to work with.

  • Since 2020, the U.S. scientific community has become more attentive to the urgency of addressing racism and the lack of diversity in science, and has acknowledged that little progress has been made in making science accessible and relevant to a more diverse citizenry (Yudell et al., 2020). Research universities embarked on efforts to address diversity, equity, and inclusion in their scientific and educational programs. The social construct of race, the role of intersectionality, and the fundamental effect of racism on all aspects of science and medicine have become parts of faculty trainings at many institutions (Dupree and Boykin, 2021; Holdren et al., 2022; Kossek et al., 2022). Journal editors recognized the problems of using racial labels in research studies, with growing calls for eliminating the use of high-risk proxy measures (Flanagin et al., 2021; Nature Human Behavior, 2022). The call to remove race from clinical prediction models spread rapidly because of attention to the danger of false assumptions about innate racial differences and the resulting harms to patients (Vyas et al., 2020). Recognition by the U.S. biomedical research community of the need to address the complex and important issue of population descriptors in genetics research has never been greater.

    In response to the timeliness of the study and the statement of task, the committee developed a framework to explore the use of population descriptors in genetics and genomics research (see Guiding Principles).


Dupree, C. H., and C. M. Boykin. 2021. Racial inequality in academia: Systemic origins, modern challenges, and policy recommendations. Policy Insights from the Behavioral and Brain Sciences 8(1):11-18.

Eisenmann, S., E. Bánffy, P. van Dommelen, K. P. Hofmann, J. Maran, I. Lazaridis, . . . P. W. Stockhammer. 2018. Reconciling material cultures in archaeology with genetic data: The nomenclature of clusters emerging from archaeogenomic analysis. Scientific Reports 8:13003.

Flanagin, A., T. Frey, and S. L. Christiansen. 2021. Updated guidance on the reporting of race and ethnicity in medical and science journals. JAMA 326(7):621.

Holdren, S., Y. Iwai, N. R. Lenze, A. B. Weil, and A. M. Randolph. 2022. A novel narrative medicine approach to DEI training for medical school faculty. Teaching and Learning in Medicine:1-10.

Kossek, E. E., P. M. Buzzanell, B. J. Wright, C. Batz-Barbarich, A. C. Moors, C. Sullivan, . . . A. Nikalje. 2022. Implementing diversity training targeting faculty microaggressions and inclusion: Practical insights and initial findings. The Journal of Applied Behavioral Science:002188632211323.

Nature Human Behavior. 2022. Science must respect the dignity and rights of all humans. Nature Human Behaviour 6(8):1029-1031.

Vyas, D. A., L. G. Eisenstein, and D. S. Jones. 2020. Hidden in plain sight—Reconsidering the use of race correction in clinical algorithms. New England Journal of Medicine 383(9):874-882.

Yudell, M., D. Roberts, R. DeSalle, S. Tishkoff, and 70 signatories. 2020. NIH must confront the use of race in science. Science 369(6509):1313-1314.
