
2

Facial Recognition Technology

Development of facial recognition technology (FRT) began around 1970.1 In the past decade, the pace of development has accelerated with the industrial adoption and adaptation of various neural network–based machine learning techniques. These advances have led to remarkable gains in recognition accuracy and speed.

Specifically, when photographs are acquired cooperatively and under constrained conditions—such as in passport or driver’s license applications or when crossing an international border—the photos are of sufficient quality to support high-confidence, high-accuracy retrieval from databases of such photographs. When the leading 2023 face recognition algorithms search a mugshot database of 12 million identities, fully 99.9 percent of searches return the correct matching entry.2 The only failures result from changes in facial appearance associated with acute facial injury and long-term aging. This result, however, involved photos taken under mostly ideal conditions, in which the photography is formally standardized and the subject cooperates with the photographer. If those conditions do not apply, accuracy falls off sharply. Between these two extremes, accuracy will vary, and any measurement of it must be accompanied by a narrative about how the photos were acquired.

The potential for very high accuracy must be further qualified by considerations of what the FRT is used for, and on whom:

___________________

1 T. Kanade, 1973, “Picture Processing System by Computer Complex and Recognition of Human Faces,” https://repository.kulib.kyoto-u.ac.jp/dspace/bitstream/2433/162079/2/D_Kanade_Takeo.pdf.

2 P.J. Grother, M. Ngan, and K. Hanaoka, 2019, Face Recognition Vendor Test (FRVT)—Part 2: Identification, Washington, DC: Department of Commerce (DOC) and Gaithersburg, MD: National Institute of Standards and Technology (NIST), https://www.nist.gov/system/files/documents/2019/09/11/nistir_8271_20190911.pdf.

  • Many applications require correct rejection of faces that are not in the database—that is, the minimization of false matches. This is critical to avoid identity mismatches that can, in certain applications, have adverse consequences for an individual’s civil liberties.
  • Error rates are not always the same for all queries; they can vary by demographic group or even person to person, but these variations are becoming smaller as FRT models continue to evolve.
  • Identical twins represent an extreme yet realistic example of persons who may cause false matches. Approximately 0.4 percent of births in the United States are identical twins.3
  • Some use cases require a search to produce high-confidence matches—where the face recognition software deems the match to be highly similar—that is, above some minimum pairwise similarity threshold.
  • Some applications include mechanisms to detect subjects trying to impersonate someone else or to conceal their own identity—for example, by wearing makeup or wearing a high-quality silicone mask. These evasion-detection mechanisms do not always work and can contribute to errors.

The frequency of errors always depends on the design and engineering of the system. The consequences of errors depend on how the system is used. This chapter begins by describing the algorithms, image capture hardware, and performance improvements over time. It then turns to pose, illumination, expression, and facial aging challenges; demographic effects; and sources and consequences of errors. It concludes by looking at human examiner roles and capabilities and several salient attributes of commercially deployed FRT systems.

ALGORITHMS

A face recognition algorithm has three parts: a detector, a feature extractor, and a comparator. The detector will find a face in an image; perhaps rotate, center, and resize it; and produce an image suitable for feature extraction. The feature extraction step, known more generically as template generation, performs various elaborate computations on the pixel values, and produces a set of numbers that are known in various communities

___________________

3 P. Gill, M.N. Lende, and J.W. Van Hook, 2023, “Twin Births,” updated February 6, In StatPearls [Internet], Treasure Island, FL: StatPearls Publishing, https://www.ncbi.nlm.nih.gov/books/NBK493200.

FIGURE 2-1 Face recognition pipeline.
NOTE: Many systems will only output a ranked list of candidates with similarity scores above a specified threshold.

as a template, a feature vector, or an embedding; this report uses the term “template.”4 This process is depicted in Figure 2-1.

A template is designed to support the core recognition goal of the comparator, which takes two templates and produces a single number expressing how similar the two faces that produced the templates are. Comparison code is usually quite simple. The result is universally known as a similarity score, often normalized between 0 and 1. If the score is high, this is taken as an indication that the two input faces were from the same person. This interpretation is discussed further in the following text.
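To make the three-part structure concrete, the following minimal sketch lays out the detector, template generator, and comparator as functions with the interfaces just described. It is an illustration only, not any vendor's implementation: the function names, the 512-number template length, and the placeholder internals are assumptions chosen for clarity.

```python
import numpy as np

TEMPLATE_DIM = 512  # assumed template length; real systems vary by developer

def detect_and_align(image: np.ndarray) -> np.ndarray:
    """Detector: find a face in the image; rotate, center, and resize it; and
    return a crop suitable for feature extraction (placeholder logic here)."""
    return image  # placeholder: assume the input is already an aligned face crop

def extract_template(face_crop: np.ndarray) -> np.ndarray:
    """Feature extractor (template generator): reduce the pixels to a vector.
    A deterministic stand-in replaces the trained DCNN used in real systems."""
    seed = abs(hash(face_crop.tobytes())) % (2**32)
    template = np.random.default_rng(seed).standard_normal(TEMPLATE_DIM)
    return template / np.linalg.norm(template)  # unit length simplifies comparison

def compare(template_a: np.ndarray, template_b: np.ndarray) -> float:
    """Comparator: produce a single similarity score, here normalized to [0, 1]."""
    cosine = float(np.dot(template_a, template_b))
    return (cosine + 1.0) / 2.0

face = np.zeros((112, 112, 3), dtype=np.uint8)   # stand-in photograph
t = extract_template(detect_and_align(face))
print(compare(t, t))                             # identical inputs score 1.0
```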

Face recognition can thus be used to confirm identity—to authenticate that a user is who they claim to be. Similarity scores are not the same as personal identification numbers (PINs) and passwords, which authenticate a user only if they are identical to what the user initially specified. Similarity scores are used rather than a binary match/no match because no two photos of a face are identical, owing to even the slightest variations in lighting, facial expression, head position, and camera noise.

The task of FRT is to ignore the “nuisance” variations in face images, such as those shown in Figure 2-2, and produce templates that, when compared, yield high similarity scores for photos of the same person and low scores for photos of different people. This task is the core difficulty in improving FRTs’ accuracy; it is addressed today by training a neural network on many images—highly variable in pose, illumination, expression, aging, and occlusion—of many people, ranging from tens of thousands to tens of millions of individuals. Such images are usually of real individuals, but in recent years there has been considerable interest in synthesizing face images in unlimited quantities using a different class of neural network5 to increase the size of the training data.

___________________

4 The term “faceprint” has been used, but this should be deprecated because its progenitor “fingerprint” applies to an image, not to features derived from it.

5 P. Melzi, C. Rathgeb, R. Tolosana, et al., 2023, “GANDiffFace: Controllable Generation of Synthetic Datasets for Face Recognition with Realistic Variations,” arXiv:abs/2305.19962.

FIGURE 2-2 Examples of variations in the face images of the same person that could alter a similarity score.
NOTE: These variations include full pose variation, a mixture of still images and video frames, and a wide variation in imaging conditions and geographic origin of subjects.
SOURCE: © 2015 IEEE. Reprinted, with permission, from B.F. Klare, 2015, “Pushing the Frontiers of Unconstrained Face Detection and Recognition: IARPA Janus Benchmark A,” 2015 Conference Proceedings on Computer Vision and Pattern Recognition (CVPR) 1931–1939.

History

Although human beings have been using faces to recognize one another since time immemorial,6 work on enabling computers to recognize human faces began in the mid-1960s with Woodrow W. Bledsoe and his colleagues at Panoramic Research. Bledsoe described his face recognition system as a “man-machine” system, because it required human experts to first manually locate facial landmarks on a photograph. The comparison was then performed automatically based on 20 normalized distances derived from these facial landmarks (e.g., width of the mouth, width of the eyes). Bledsoe observed that “[t]his recognition problem is made difficult by the great variability in head rotation and tilt, lighting intensity and angle, facial expression, aging, etc.”7

A method to automatically extract such facial landmarks was first proposed in Takeo Kanade’s 1973 PhD thesis, which can be considered to have presented the first fully automatic FRT system.8 Although the earliest face recognition systems were based on geometric features (distances between pre-defined landmarks), Sirovich and Kirby in 19879 and later Turk and Pentland in 1991 showed that faces could be represented by extracting features from all the pixels in the whole image by a method known as principal component analysis.10 This holistic appearance-based technique generates a

___________________

6 Adapted in part from A.K. Jain, K. Nandakumar, and A. Ross, 2016, “50 Years of Biometric Research: Accomplishments, Challenges, and Opportunities,” Pattern Recognition Letters 79(3.2):80–105, https://doi.org/10.1016/j.patrec.2015.12.013.

7 W.W. Bledsoe, 1966, Man-Machine Facial Recognition: Report on a Large-Scale Experiment, Palo Alto, CA: Panoramic Research, Inc.

8 T. Kanade, 1974, Picture Processing System by Computer Complex and Recognition of Human Faces, https://repository.kulib.kyoto-u.ac.jp/dspace/bitstream/2433/162079/2/D_Kanade_Takeo.pdf.

9 L. Sirovich and M. Kirby, 1987, “Low-Dimensional Procedure for the Characterization of Human Faces,” Journal of the Optical Society of America A 4(3):519, https://doi.org/10.1364/josaa.4.000519.

10 M. Turk and A. Pentland, 1991, “Eigenfaces for Recognition,” Journal of Cognitive Neuroscience 3(1):71–86, https://doi.org/10.1162/jocn.1991.3.1.71.


compact representation of the entire face region in the acquired image. As an example, a 64 × 64 pixel face image (a total of 4,096 pixels) could be represented in terms of merely 100 feature values that are learned using a training set of face images. These features have the property that they could be used to reconstruct the original face image with sufficient fidelity. Two other historical examples of face recognition approaches are the local feature analysis method of Penev and Atick and the Fisherface method of Belhumeur et al.11,12
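The idea can be sketched with an off-the-shelf principal component analysis. The sketch below, which assumes the scikit-learn library and uses random arrays in place of real 64 × 64 face crops, shows the two properties described above: a 4,096-pixel image is summarized by roughly 100 learned coefficients, and those coefficients can be used to reconstruct an approximation of the image.

```python
import numpy as np
from sklearn.decomposition import PCA

# Stand-in training set: 1,000 aligned 64 x 64 grayscale face crops, each
# flattened to a 4,096-element row (random numbers here instead of real faces).
rng = np.random.default_rng(0)
train_faces = rng.random((1000, 64 * 64))

# Learn ~100 holistic features (principal components, i.e., "eigenfaces").
pca = PCA(n_components=100).fit(train_faces)

# A new face is represented by just 100 coefficients instead of 4,096 pixels...
new_face = rng.random((1, 64 * 64))
features = pca.transform(new_face)                 # shape (1, 100)

# ...and an approximation of the original image can be rebuilt from them.
reconstruction = pca.inverse_transform(features)   # shape (1, 4096)
print(features.shape, reconstruction.shape)
```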

Model-based techniques derive a pose-independent representation by building two-dimensional or three-dimensional models of the face. They generally rely on detection of several fiducial points in the face such as the chin, the tip of the nose, the corners of eyes, or the corners of the mouth. The pioneering work in this area was Wiskott et al.’s elastic bunch graph matching approach.13 Another advance, which uses three-dimensional models and both facial texture and shape features, is the morphable model proposed by Blanz and Vetter.14

Appearance-based schemes use raw pixel intensity values and are thus very sensitive to variations in ambient lighting and facial expression. Texture-based methods such as scale-invariant feature transform15 and local binary patterns16 were developed to reduce that sensitivity. These methods make use of more robust representations that characterize image texture using the distribution of local pixel values rather than individual pixel values.

Most face recognition techniques assume that faces can be aligned and properly normalized geometrically and photometrically. Alignment is typically performed using the location of the two eyes in a face. The face detection scheme developed by Viola and Jones17 was a milestone because it enabled faces to be detected in real time even in the presence of background clutter, a situation commonly encountered in surveillance applications. Even though the Viola–Jones detector performs very well in real-time applications, it struggles with illumination changes, non-frontal facial poses, and occlusion, and it has since been superseded.

___________________

11 P.S. Penev and J.J. Atick, 1996, “Local Feature Analysis: A General Statistical Theory for Object Representation,” Network: Computation in Neural Systems 7(3):477–500.

12 P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, 1997, “Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection,” IEEE Transactions on Pattern Analysis and Machine Intelligence 19(7):711–720.

13 L. Wiskott, J.-M. Fellous, N. Krüger, and C. Von Der Malsburg, 2022, “Face Recognition by Elastic Bunch Graph Matching,” Pp. 355–396 in Intelligent Biometric Techniques in Fingerprint and Face Recognition, New York: Routledge.

14 V. Blanz and T. Vetter, 2003, “Face Recognition Based on Fitting a 3D Morphable Model,” IEEE Transactions on Pattern Analysis and Machine Intelligence 25(9):1063–1074.

15 D.G. Lowe, 1999, “Object Recognition from Local Scale-Invariant Features,” Proceedings of the International Conference on Computer Vision 2:1150–1157.

16 T. Ojala, M. Pietikainen, and D. Harwood, 1994, “Performance Evaluation of Texture Measures with Classification Based on Kullback Discrimination of Distributions,” Proceedings of 12th International Conference on Pattern Recognition 1:582–585.

17 P. Viola and M.J. Jones, 2004, “Robust Real-Time Face Detection,” International Journal of Computer Vision 57(2):137–154.


Artificial Intelligence–Based Revolution

Over the past decade, the field of face recognition has advanced significantly, primarily due to breakthroughs in an artificial intelligence technique known as deep convolutional neural networks (DCNNs), which were originally developed for optical character recognition and later applied to diverse computer vision tasks such as automated driving and medical image analysis. These deep learning techniques have provided the most significant advance in face recognition to date.

The application of DCNNs to face recognition was demonstrated to great effect in 2014, when researchers at Facebook trained a network with between 800 and 1,200 photos of each of 4,030 persons to obtain greatly improved accuracy on the open benchmark data sets of the day.18 The performance gains stemmed from increased tolerance of nuisance image properties—that is, invariance to facial appearance variations that are extraneous to the identity of the subject. It remained to be seen whether that class of algorithm could also learn to distinguish between individuals in much larger populations than the 4,030 that Facebook used, a requirement because even before 2014, face recognition algorithms were being applied to populations of tens of millions. Ultimately, Facebook’s approach—leveraging larger numbers of photos from social media—proved revolutionary for the wider biometrics industry: over the next decade, the suppliers of face recognition algorithms largely discarded their prior hand-crafted feature techniques and adopted the new DCNN methods, adapting, modifying, and expanding them as an enormous research community developed the new technologies. Research since 2014 has further evolved the DCNN-based approach.19 A 2019 paper described significant improvements to the design of loss functions for face recognition.20 A well-maintained Git repository21 has contributed to the popularity of this work in the computer vision community and has helped make it the “go to” approach in face recognition and establish it as a new baseline. It has received more than 5,600 citations since its publication.
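The additive angular margin idea from that 2019 paper (ArcFace) can be sketched in a few lines. The following is a simplified NumPy illustration under stated assumptions—no network or back-propagation, toy dimensions, and the commonly cited hyperparameter defaults s = 64 and m = 0.5—showing only how the margin is applied to the true-identity logit before a softmax cross-entropy loss.

```python
import numpy as np

def arcface_logits(embeddings, class_weights, labels, s=64.0, m=0.5):
    """ArcFace-style logits: cosine similarities between L2-normalized
    embeddings and per-identity weight vectors, with an angular margin m
    added only to each sample's true-identity column, then scaled by s."""
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = class_weights / np.linalg.norm(class_weights, axis=0, keepdims=True)
    cos_theta = emb @ w                                   # (batch, identities)
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    one_hot = np.zeros_like(cos_theta)
    one_hot[np.arange(len(labels)), labels] = 1.0
    return s * np.cos(theta + m * one_hot)

def softmax_cross_entropy(logits, labels):
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()

# Toy batch: 4 samples, 128-dimensional embeddings, 10 enrolled identities.
rng = np.random.default_rng(0)
labels = np.array([0, 3, 3, 7])
logits = arcface_logits(rng.standard_normal((4, 128)),
                        rng.standard_normal((128, 10)), labels)
print(round(float(softmax_cross_entropy(logits, labels)), 3))
```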

The deep neural network approach is illustrated in Figure 2-3. Once a face has been detected in its parent image, it will usually be rotated, cropped, and then resized to the size of the input layer of the neural network. Some

___________________

18 Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, 2014, “Deepface: Closing the Gap to Human-Level Performance in Face Verification,” 2014 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1701–1708.

19 Some of this work was supported by the Intelligence Advanced Research Projects Agency through the JANUS program that ran from 2014 to 2020. See Office of the Director of National Intelligence, “JANUS,” https://www.iarpa.gov/research-programs/janus, accessed November 17, 2023.

20 J. Deng, J. Guo, N. Xue, and S. Zafeiriou, 2019, “ArcFace: Additive Angular Margin Loss for Deep Face Recognition,” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 4690–4699, https://doi.org/10.1109/CVPR.2019.00482.

21 J. Guo and J. Deng, 2021, “ArcFace with Parallel Acceleration on Both Features and Centers, Original MXNet Implementation on InsightFace,” GitHub, https://github.com/deepinsight/insightface/tree/master/recognition/arcface_mxnet.

FIGURE 2-3 Images that contain coarse patterns extracted from the input.
SOURCE: © 2014 IEEE. Reprinted, with permission, from Y. Taigman, M. Yang, M. Ranzato, and L. Wolf, 2014, “DeepFace: Closing the Gap to Human-Level Performance in Face Verification,” 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 1701–1708.

developers may perform these steps in a different order. Some developers may also apply various image processing steps—for example, to brighten the image. The input layer of the neural network is quite small—say, 112 × 112 pixels or 256 × 256 pixels—and usually square. This has implications, as discussed later.

The DCNN accepts the input image, usually as a color image with red, green, and blue color channels, and feeds it forward through a many-layered computation. In the first layer, the pixels are weighted and averaged and combined in many ways, the net effect of which is to produce a set of somewhat smaller-size outputs that can be viewed as images that contain coarse patterns extracted from the input (see Figure 2-3).

This output is then passed through a non-linear function, a necessary hallmark of neural computation. The second layer proceeds with a slightly different set of weights and computations, and its output is again transformed non-linearly. The layered computation continues, with each output, when visualized, being a more abstract, less human-interpretable version of the input face image. The feed-forward process culminates in the production of a vector—a set of numbers that constitutes the template. This vector is stored in the biometric template, perhaps along with bookkeeping information such as the date and the version of the DCNN.
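A small sketch of that surrounding bookkeeping may help. The network itself is treated as a black-box callable here; the 112 × 112 input size, the version string, and the helper names are assumptions, not any vendor's format.

```python
from datetime import date
import numpy as np

INPUT_SIZE = 112            # assumed network input size; vendor-specific in practice
MODEL_VERSION = "dcnn-v1"   # hypothetical bookkeeping label

def make_template(aligned_face: np.ndarray, network) -> dict:
    """Feed an aligned RGB face crop through a (hypothetical) trained DCNN and
    package the resulting vector with bookkeeping information."""
    assert aligned_face.shape == (INPUT_SIZE, INPUT_SIZE, 3), "detect/align/resize first"
    x = aligned_face.astype(np.float32) / 255.0        # simple photometric normalization
    vector = np.asarray(network(x), dtype=np.float32)  # feed-forward pass -> vector
    vector /= np.linalg.norm(vector)                   # unit length eases comparison
    return {"vector": vector,
            "created": date.today().isoformat(),
            "model_version": MODEL_VERSION}

# Toy stand-in for a trained network: any callable mapping the crop to a vector.
toy_network = lambda x: x.mean(axis=2).flatten()[:512]
face = np.random.default_rng(0).integers(0, 256, size=(112, 112, 3), dtype=np.uint8)
template = make_template(face, toy_network)
print(len(template["vector"]), template["model_version"])
```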

Templates are generally reversible and thus do not provide the privacy benefits afforded by one-way hashes: they can be inverted, with some difficulty, into something that bears some resemblance to the original face.22 They can also leak other information about an individual, such as sex. As a result, templates must also be protected from disclosure in order to protect individual privacy.

___________________

22 See A. Zhmoginov and M. Sandler, 2016, “Inverting Face Embeddings with Convolutional Neural Networks,” arXiv preprint, arXiv:1606.04189; or G. Mai, K. Cao, P.C. Yuen, and A.K. Jain, 2019, “On the Reconstruction of Face Images from Deep Face Templates,” IEEE Transactions on Pattern Analysis and Machine Intelligence 41(5):1188–1202.


There is considerable variation in template generation speed across today’s algorithms, with accurate algorithms producing templates in 0.1 second to several seconds on a server-class CPU. Faster algorithms can be ported to run on processors embedded in cameras or physical access-control devices. Graphics processing units (GPUs), which are considered essential for training algorithms, are typically not necessary for recognition. When FRT is applied to video feeds, when many images are captured, or when many faces appear in an image, a GPU may be employed to provide real-time recognition.

Resolution

Contemporary face recognition algorithms operate at very low resolution. They typically operate on face photographs that have been cropped and resized so that the head and face fill an image of 112 × 112, 128 × 128, or 256 × 256 pixels. At these sizes, the inputs to the algorithms have resolution too low to show human hair, skin pores, and similar-size detail. Operators of face recognition often cite standards that mandate collection of larger images, but the core algorithms operate at a size determined by their developers. These sizes are much smaller than the images collected by contemporary mobile phones or digital cameras (e.g., 3,000 × 4,000 pixels). They are also much smaller than the images preferred by the community of forensic examiners who review face pairs and testify in court. Human reviewers find value in high-resolution images because they support exculpation: if a specific feature is visible in one photo but not the other, this can be dispositive.

For example, Figure 2-4 shows how scars and moles could enable a reviewer to correctly distinguish between identical twins. Such marks are often not present in younger twins. Also, such fine details are typically not used by automated algorithms because they are often not visible in low-resolution images. These issues argue for the wholesale migration of the industry to high-resolution images, something that is not readily achieved because such images are not available to developers of FRT algorithms in sufficient quantities for training DCNNs.

Template Extraction Model Training

The models in face recognition algorithms convert an image to a template. The models are usually trained in the developer’s research and development laboratories; each developer uses different variants of DCNN and training protocols and has access to different training sets. Furthermore, these models are rarely trained on data derived from the operational environment where the system is ultimately deployed. Therefore, the characteristics of the images in the data set used to train the FRT model may differ from those encountered in an operational setting.

FIGURE 2-4 Highlighted unique identifiers in two portrait photographs of identical twins.
SOURCE: © 2011 IEEE. Reprinted, with permission, from B. Klare, A.A. Paulino, and A.K. Jain, 2011, “Analysis of Facial Features in Identical Twins,” 2011 International Joint Conference on Biometrics (IJCB) 1–8.

Training is key to the performance of the algorithm, and much of the intellectual property resides in the expert curation of data sets, selection of architecture, specification of loss functions, intervention, and selection and tuning of parameters. The training is almost always supervised, a term borrowed from machine learning that means that each training sample (face image) has an identity label associated with it. Thus, during training the DCNN learns to associate face images of the same identity and simultaneously to distinguish between faces of different identities, and it does so with low classification error. It is therefore of commercial value for a developer to possess, or have access to, a large number of face images and their associated identity labels. Such databases should come from a large number (millions) of individuals and have a large number (thousands) of diverse images per individual. The identity labels must have high integrity—the person in the image must be correctly labeled. It is costly to procure such a large collection of labeled photos, an expense that many researchers historically avoided by collecting photos from the Web—the popular Labeled Faces in the Wild23 and MS-Celeb24 databases were assembled in this way—and several such databases have

___________________

23 G.B. Huang, M. Ramesh, T. Berg, and E. Learned-Miller, 2007, Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments, Technical Report 07-49, Amherst: University of Massachusetts.

24 Y. Guo, L. Zhang, Y. Hu, X. He, and J. Gao, 2016, “MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition.” Pp. 87–102 in Computer Vision–ECCV 2016, European Conference on Computer Vision, Lectures Notes in Computer Science, Vol. 9907, Cham, Switzerland: Springer.


since been expunged due to privacy and ethics concerns. Ironically, the Diversity in Faces database,25 which was assembled to support development of equitable face analysis algorithms, was withdrawn as the collection of images from the Web quickly became controversial. The database was not suitable for development of actual recognition algorithms because it did not include identity labels.

Comparison and Similarity Scores

The final step in the face recognition algorithm is to compare two templates. The comparison module is often a simple piece of code that accepts two templates and computes some measure of how similar they are. This is known as one-to-one comparison. The method is usually a trade secret, but it generally treats the templates as vectors in a notional high-dimensional space and measures distance as a Euclidean distance (as the crow flies), a Manhattan distance (walking city blocks), or simply the angle between the vectors. If the distance measure is small (or equivalently the similarity measure is high), then it is likely that the two photos are of the same face (see also the discussion of errors in the section “Accuracy”). By industry convention, such numbers are presented on a similarity scale, where bigger values connote similarity of the faces.
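The following sketch shows the three distance conventions mentioned above, each presented so that larger values connote greater similarity, in line with the industry convention. It is illustrative only; vendors' comparators and score scales are proprietary.

```python
import numpy as np

def similarity(template_a: np.ndarray, template_b: np.ndarray,
               metric: str = "cosine") -> float:
    """One-to-one comparison of two templates; larger return values mean more
    similar faces. Distances are negated so the convention holds for all metrics."""
    if metric == "euclidean":     # "as the crow flies"
        return -float(np.linalg.norm(template_a - template_b))
    if metric == "manhattan":     # "walking city blocks"
        return -float(np.abs(template_a - template_b).sum())
    # Default: cosine of the angle between the two template vectors.
    denom = np.linalg.norm(template_a) * np.linalg.norm(template_b)
    return float(np.dot(template_a, template_b) / denom)

rng = np.random.default_rng(1)
a, b = rng.standard_normal(512), rng.standard_normal(512)
print(similarity(a, b), similarity(a, a))   # a template compared with itself scores highest
```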

Although high similarity scores are often construed to indicate sameness of identity of faces in two photographs, a low score should not be taken as a definitive statement that two faces are from different people. The key factor is photo quality. Consider a comparison of two photos of the same person—a passport-style photo compared with an image of a face captured by a camera whose lens was far from well focused. The second photo has such low resolution or information content that most face recognition algorithms will return a low similarity score, just as they would from a comparison of two high-information-content passport photos of unrelated people. Thus, low scores stem from either a difference in identity or low image quality.

Importantly, similarity scores cannot be interpreted as likelihoods, probabilities, or a “percentage match.” This is because each developer emits scores on its own proprietary interval; it is common to use [0,1] or [0,100], but others use [0,19000], [2,3], and [0.6,0.9]. The distribution of scores within those intervals will also vary by developer: some give continuous normal-like distributions; others arrange to pin non-mate and mate scores to 0 and 1, or 0 and 100, respectively. As such, there is no universal interpretation of when a similarity between two faces is “strong”—that is, high enough to confirm that two photos are of the same person. Nevertheless, such interpretations are sometimes made by system operators, and this can prejudice or bias human review of

___________________

25 M. Merler, N. Ratha, R.S. Feris, and J.R. Smith, 2019, “Diversity in Faces,” arXiv:1901.10436.


images.26 There are no standards governing score values or statistical properties of similarity scores.

One-to-Many Identification

The larger and more demanding uses of face recognition involve search, known as one-to-many identification. Such applications first construct a template from “probe” imagery and then search it against a collection of previously enrolled templates known as a reference database or gallery. This operation is useful because the gallery entries are accompanied by some metadata—for example, a name, a location, or a URL—so that a successful search can yield some knowledge about the person in the search photo. It is very commonly implemented by comparing the probe’s template with each enrollment template, followed by a sort operation that ranks and returns the most similar enrollments.
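A minimal version of that exhaustive compare-and-sort search is sketched below with NumPy, using unit-length templates and cosine similarity over a small synthetic gallery. The gallery size, metadata format, and threshold handling are assumptions for illustration.

```python
from typing import List, Optional, Tuple
import numpy as np

def search(probe: np.ndarray, gallery: np.ndarray, metadata: List[str],
           top_k: int = 20, threshold: Optional[float] = None) -> List[Tuple[str, float]]:
    """Exhaustive one-to-many search: compare the probe template with every
    enrolled template, rank by similarity, and return the best candidates."""
    scores = gallery @ probe                      # one cosine score per enrollment
    order = np.argsort(scores)[::-1][:top_k]      # most similar first
    candidates = [(metadata[i], float(scores[i])) for i in order]
    if threshold is not None:                     # optionally keep strong matches only
        candidates = [(m, s) for m, s in candidates if s >= threshold]
    return candidates

# Synthetic gallery of 10,000 unit-length templates with record identifiers.
rng = np.random.default_rng(2)
gallery = rng.standard_normal((10_000, 512))
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)
metadata = [f"record-{i}" for i in range(len(gallery))]

probe = gallery[1234] + 0.05 * rng.standard_normal(512)   # "new photo" of record 1234
probe /= np.linalg.norm(probe)
print(search(probe, gallery, metadata, top_k=5)[0])        # top candidate and its score
```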

There are also algorithms that use alternatives to a series of one-to-one comparisons with each template in a gallery. Some use fast search algorithms, which afford extremely rapid search but carry the one-time expense of building a data structure such as a tree, graph,27 or dictionary. Others use a prebuilt data structure to provide better demographic stability. These algorithms, which represent a sizable minority of all search algorithms, do not yield the same scores as performing the series of one-to-one comparisons.

Some search algorithms are built to give sublinear search time. This means that if the number of images enrolled into a reference database is increased 100-fold, the search duration may grow only, say, 2-fold. Such systems are characterized by very fast search. One highly accurate algorithm submitted to the National Institute of Standards and Technology’s (NIST’s) Facial Recognition Vendor Test performs a search of a 12-million-entry database in a few tens of milliseconds on a commodity CPU. Such capability, without any loss in search accuracy, is essential to practical applications in which many faces are searched against potentially large databases. The alternative, a linear search algorithm, would require more hardware resources.
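As an illustration of the prebuilt-data-structure approach, the sketch below uses the open-source hnswlib package, which implements the hierarchical navigable small world graphs cited above. It is not a statement about what any FRT vendor actually deploys, and the gallery, dimensions, and parameter values are assumptions.

```python
import numpy as np
import hnswlib  # open-source approximate nearest-neighbor library (HNSW graphs)

DIM, N = 128, 100_000
rng = np.random.default_rng(3)
gallery = rng.standard_normal((N, DIM)).astype(np.float32)
gallery /= np.linalg.norm(gallery, axis=1, keepdims=True)

# One-time expense: build the graph data structure over the enrolled templates.
index = hnswlib.Index(space="cosine", dim=DIM)
index.init_index(max_elements=N, ef_construction=200, M=16)
index.add_items(gallery, np.arange(N))
index.set_ef(64)   # search-time accuracy/speed trade-off

# Each search then visits only a small fraction of the gallery (sublinear time).
probe = (gallery[42] + 0.05 * rng.standard_normal(DIM)).astype(np.float32)
labels, distances = index.knn_query(probe, k=10)
print(labels[0][:3])   # candidate identifiers, most similar first
```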

___________________

26 J.J. Howard, L.R. Rabbitt, and Y.B. Sirotin, 2020, “Human-Algorithm Teaming in Face Recognition: How Algorithm Outcomes Cognitively Bias Human Decision-Making,” PLOS ONE 15(8), https://doi.org/10.1371/journal.pone.0237855.

27 Y.A. Malkov and D.A. Yashunin, 2020, “Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs,” IEEE Transactions on Pattern Analysis and Machine Intelligence 42(4):824–836.


IMAGE ACQUISITION

Practical face recognition systems have also benefited from improvements in camera resolution and resulting image quality.

Cameras

The role of the camera as part of a face recognition system is to provide an image suited to the recognition process. The appearance of such images has been formally standardized since the 1990s—and de facto standardized since faces began to be collected in the criminal justice system more than a century ago and printed on international travel documents after World War II. Today, the standard face appearance is specified by the ISO/IEC 39794-5:2019 standard,28 which defines a placement geometry and frontal viewpoint as illustrated in Figure 2-5 and requires the absence of blur, shadows, occlusion, and areas of under- or overexposure.

The availability of low-cost, compact, and high-resolution cameras that can be embedded in various devices has been a key enabler of real-time and accurate FRT systems.

A key turning point in camera technology was the commercialization of digital cameras in the early 1990s. The frame rate, pixel density, and pixel sensitivity of image sensors have improved significantly. At the same time, image sensors have become smaller and cheaper, and good-quality face images can be captured today using smartphones or wearable devices. Low-cost cameras, such as Microsoft’s Kinect, that can capture three-dimensional images in real time also entered the commercial market.

Cameras in use today range from inexpensive webcams to long-range surveillance cameras—and despite the overall improvements described here produce images that cover a wide range of quality. They are differentiated by several technical factors. First is whether they furnish a single image (stills) or a video stream. Stills are used in many applications, such as capturing a passport photo, while videos are naturally produced in settings where continuous imaging is in use, such as a closed-circuit television (CCTV) security camera. A second technical factor is whether the camera has any built-in capability for detecting faces.

Almost all security cameras, body-worn cameras, and ATM cameras observe and record scenes without specifically detecting and recognizing faces, which typically undermines image quality and face recognition accuracy. On the other hand, mobile phones are often equipped with cameras that will detect a face in a scene and, assuming

___________________

28 International Organization for Standardization (ISO), 2019, “Information Technology—Extensible Biometric Data Interchange Formats—Part 5: Face Image Data,” ISO/IEC 39794-5:2019.

FIGURE 2-5 Example of the standard face appearance.
SOURCE: P. Grother, M. Ngan, and K. Hanaoka, 2019, Face Recognition Vendor Test (FRVT), Part 3: Demographic Effects, NISTIR 8280, Washington, DC: National Institute of Standards and Technology, Department of Commerce, https://nvlpubs.nist.gov/nistpubs/ir/2019/NIST.IR.8280.pdf.

that is the object of interest, focus on and correctly expose that face. Such face-aware capture, although intended for aesthetic reasons, will improve face recognition accuracy essentially as a by-product. Mobile phone camera quality also benefits from high-dynamic-range sensors and the use of computational photography techniques.

Specifying the correct camera for an application is usually not sufficient to ensure accuracy because the environment in which it is used influences the properties of images. For example, if a camera is placed facing a window, subjects’ faces can be underexposed. Similarly, if a building access control system is equipped with a camera expected to operate at night, then supplemental illumination will be necessary. There are many applications that allow for the deployment of face recognition systems in environments that support high accuracy.

Image Quality Assessment

Some systems incorporate quality assessment (QA) software that analyzes a photograph and quantifies whether it is in some sense acceptable. There are several use cases for such a capability—all are intended to improve image quality and thereby the likelihood that downstream recognition will succeed. A primary role for QA software is to detect a poor photograph and immediately prompt the subject or the photographer to take a better one. The software sometimes offers specific feedback on how to correct the problem. Typical problems include blur owing to motion; the subject not facing the camera; part of the face not visible owing to the subject wearing a cap, scarf, sunglasses, or the like; or the subject presenting a non-neutral expression.
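A toy version of such a quality gate is sketched below using OpenCV: it flags blur via the variance of the Laplacian (a common sharpness proxy) and gross exposure problems, and returns prompts of the kind described above. The threshold values and messages are illustrative assumptions; operational QA software applies far richer, standardized measures.

```python
import cv2
import numpy as np

def quality_feedback(image_bgr: np.ndarray,
                     blur_threshold: float = 100.0,
                     dark_threshold: float = 60.0,
                     bright_threshold: float = 200.0) -> list:
    """Tiny quality-assessment sketch: detect blur and exposure problems and
    return human-readable prompts (thresholds are illustrative assumptions)."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    problems = []
    # Low variance of the Laplacian suggests a lack of sharp edges, i.e., blur.
    if cv2.Laplacian(gray, cv2.CV_64F).var() < blur_threshold:
        problems.append("Image appears blurred; hold still and retake the photo.")
    mean_level = float(gray.mean())
    if mean_level < dark_threshold:
        problems.append("Face is underexposed; add frontal illumination.")
    elif mean_level > bright_threshold:
        problems.append("Face is overexposed; reduce direct or backlighting.")
    return problems

# An all-black frame triggers both the blur and the underexposure prompts.
print(quality_feedback(np.zeros((480, 640, 3), dtype=np.uint8)))
```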


Presentation Attack Detectors

In applications of face recognition that confer some benefit to the subject, there may be an incentive for a bad actor to attempt impersonation—that is, to fool the system into affirming a match to a falsely claimed identity. This deception is commonly attempted by presenting a printed photo or tablet display, or by wearing a face mask. Presentation attack detectors (PADs) are intended to thwart such attacks. They consist of software and sometimes hardware intended to generate additional signals for analysis.

In other applications, where a subject is motivated to not be recognized by a system, they may alter their appearance—for example, by wearing a disguise or a mask, or by presenting a photo of someone else. Again, the PAD system is intended to detect the subversive attempt.

POSE, ILLUMINATION, EXPRESSION, AND FACIAL AGING EFFECTS

Technological advancements have progressively tackled challenges caused by variations associated with pose, illumination, and expression. Contemporary algorithms are trained to tolerate such appearance changes, and also to handle changes inherent in facial aging. This is achieved, as mentioned earlier, by DCNNs that extract from photographs only the information that is salient to identity and ignore these so-called nuisance variations.

To demonstrate insensitivity to such extraneous factors, consider the following photo search results. When the image shown in Figure 2-6(A) is placed in a database with mugshots of 12 million other adult individuals, many recent face recognition algorithms correctly return it as the most similar face when searched with any of the photos shown in Figure 2-6(B). Those photos, taken from 2 to 19 years later, exhibit various changes in facial appearance—see the captions—that until the current decade would have mostly proved fatal to recognition retrieval.

The search accuracy described earlier has enabled many commercial and law enforcement applications—for example, detection of duplicate driver’s license photos. Although the populations of six U.S. states exceed the 12 million used here, the technology remains viable in much larger populations, with a retrieval rate (search accuracy) that declines slowly as gallery size increases, as discussed later. This success, analogous to finding a needle in a haystack, has limits. If the quality of the probe photograph is sufficiently degraded, as in Figure 2-6(C), the search will fail. For that image, all but two algorithms used in an NIST test fail to find the true match.

FIGURE 2-6 (A) Original image; (B) examples of changes in facial appearance that modern algorithms can correctly match; and (C) an example of a degraded image for which a search will fail.
SOURCE: P. Grother, with permission.

Two algorithms in the NIST test yielded partial success: One found the match, but judged 15 of the 12 million non-matching photographs to be more similar—that is, returned a rank 16 match. The 15 more-similar candidate identities are false matches—instances where the wrong identity is returned. A second algorithm gave the match at rank 42. These two outcomes show the power of the technology near its limit. The two algorithms can discern enough information from a heavily blurred photo to allow top 50 retrieval in a database of size 12 million.29

These two outcomes show why law enforcement investigators find extraordinary value in FRT: they potentially get a lead that, without FRT, they would not have. The fact that both algorithms that yielded a match did so at a high rank is problematic in that a human reviewer must exonerate the other candidate identities. This point is discussed further in the section “Demographic Disparities.”

High-rank hits (i.e., low similarity values for true matches) were much more common a decade ago even with better-quality search photos because the algorithms then could not discern information in a photograph to support assignment of high scores

___________________

29 National Institute of Standards and Technology, 2020, “Face Recognition Vendor Test (FRVT),” updated November 30, https://www.nist.gov/programs-projects/face-recognition-vendor-test-frvt.


to true matches. Today, a high proportion of searches will return the correct match at rank 1. However, when the query face quality is low, and face aging has occurred (i.e., a large time lapse between the search photo and its true mate in the database), the true matches will have low similarity, comparable to those of false matches. These outcomes can present operational problems, because there is no clear result for the search photo. The impact of such outcomes depends on how the technology is used.

The primary source of false matches is searches in which the person in the search photo has no match in the database. For example, most casino patrons would not be present in the establishment’s compulsive-gambler or card-sharp databases. To suppress false positives (FPs) in such applications, the face recognition system should be configured to return only highly similar candidates. If one is returned, further action is implied: either taking another photo and searching again, or involving a human to review the candidate identity.

This is a difficult task, as discussed later, made more difficult by facial similarity that occurs naturally, particularly among twins and other siblings. As an example, a photograph of one adult sister was placed along with 12 million unrelated photos, and the resulting database was searched with a photo of the other sister. All algorithms tested returned the sister as the most similar match. The similarity score was lower than is typical of a same-person match but still higher than the scores from searches of unrelated individuals. The system could be configured to correctly reject the sister in this instance, but that would not be effective for identical twins, who almost always produce high-scoring false matches.30

The approach of configuring a similarity threshold is unusual in criminal investigations. There, a face recognition search always returns a list of the most similar candidate photos. These are presented to police officials, in order of similarity to the search photo, for review in a bid to determine the identity of a face in an unknown photograph. The system is configured without a threshold, so the algorithm returns candidates whether the subject is in the database or not. By employing a human reviewer to compare photos and make decisions, accuracy becomes dependent on both the algorithm and the human. This has important consequences, as discussed here.

To see why face recognition is used in this way, consider the investigation of the Boston Marathon bombing.31 There, authorities attempted to determine identities of all onlookers. Face recognition was used, and while it did not prove fruitful at that time, the motivation was clear. If one were to repeat two of the searches with present-day algorithms, the investigation might have been different. Repeating the demonstration here, when the Figure 2-7(A) photo of the convicted bomber is placed into a

___________________

30 Ibid.

31 J.C. Klontz and A.K. Jain, 2013, “A Case Study on Unconstrained Facial Recognition Using the Boston Marathon Bombings Suspects,” Technical Report, Michigan State University.

FIGURE 2-7 Database image and example search images of Boston Marathon bomber.
SOURCES: (A) Handout/Getty Images News via Getty Images, https://www.gettyimages.com/detail/news-photo/in-thisimage-released-by-the-federal-bureau-of-news-photo/166984823. (B, C) Federal Bureau of Investigation, 2013, “News Surveillance Video Related to the Boston Bombings,” https://www.fbi.gov/video-repository/newss-surveillance-videorelated-to-boston-bombings.

12-million-individual mugshot database and the database is searched with the photo in Figure 2-7(B), all algorithms tested by NIST correctly return the match—and 10 contemporary algorithms placed the correct image at rank 1. That occurs despite the blur, chin occlusion, and viewpoint change. In 2013, however, face recognition was not successful at identifying the perpetrators even though their photos were in governmental databases.32 Even with a decade of improvements, none of the algorithms in 2023 succeeded at recognition using Figure 2-7(C) as the search photo, owing to the blur, downward viewpoint, and shadow.

This result occurs despite ongoing research efforts focused on recognition of CCTV-captured and other images where neither the photographic environment nor the subject’s viewpoint with respect to the camera is conducive to providing high-quality face images for recognition. When a face image simultaneously contains multiple confounding factors such as variations in facial pose, illumination, expression, occlusion, image resolution, and facial aging, facial recognition may succeed or fail depending on the extent of those problems. In general, recognition performance degrades for unconstrained face images—where image acquisition is uncontrolled and subjects may be uncooperative—and human intervention is required for accurate recognition.

___________________

32 S. Gallagher, 2013, “Why Facial Recognition Tech Failed in the Boston Bombing Manhunt,” Ars Technica, updated May 7, https://arstechnica.com/information-technology/2013/05/why-facial-recognition-tech-failed-in-the-boston-bombing-manhunt.


ACCURACY

Face recognition works by comparing faces appearing in photos and producing measures of similarity. In most applications, a decision must be produced—for example, should the phone unlock, should the door open, or should a person board an aircraft without that person’s identity document being checked? As with other biometric traits such as fingerprints, decisions are made by comparing the similarity score to a threshold. The threshold is set by the system owner, often based on a provider recommendation. The appropriate threshold (and the acceptable error rate) for a particular application depends heavily on the statistics of the images and the relative costs of false negative (FN) and FP matches for the application. In a one-to-one authentication context, the threshold is typically set so that it is unlikely that unauthorized access will be granted.

However, face recognition, as with other kinds of authentication, sometimes fails. The next sections give terms and definitions for the sorts of errors that occur. More formal and extensive definitions and requirements for testing of biometric systems can be found in the ISO/IEC 19795-1:2021 standard.33

Errors in One-to-One Verification Systems

Two types of error are possible. First is a false negative match, in which the face recognition algorithm fails to emit a similarity score above a decision threshold and thereby fails to associate two images of one face. Second is a false positive match, in which the algorithm produces a spuriously high score from images of two different people.
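How these two error rates are measured, and how they trade off against the decision threshold, can be sketched as follows. The score distributions below are synthetic assumptions; the procedure—fix a threshold that yields a target false match rate on non-mated comparisons, then report the false non-match rate on mated comparisons—mirrors the evaluation approach described later in this chapter.

```python
import numpy as np

def fnmr_at_fmr(mated_scores: np.ndarray, nonmated_scores: np.ndarray,
                target_fmr: float = 1e-4):
    """Choose the threshold giving the target false match rate on non-mated
    (different-person) scores, then report the false non-match rate on mated
    (same-person) scores at that threshold. Illustrative sketch only."""
    threshold = float(np.quantile(nonmated_scores, 1.0 - target_fmr))
    fnmr = float(np.mean(mated_scores < threshold))
    return threshold, fnmr

# Synthetic score distributions: mated pairs score higher, with some overlap.
rng = np.random.default_rng(4)
mated = rng.normal(0.80, 0.08, 100_000)
nonmated = rng.normal(0.30, 0.10, 1_000_000)

thr, fnmr = fnmr_at_fmr(mated, nonmated, target_fmr=1e-4)   # FMR of 1 in 10,000
print(f"threshold={thr:.3f}  FNMR={fnmr:.4f}")
```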

A third category of error is possible: failures relating to cameras not collecting a photo (known variously as failure to capture, or failure to acquire) or of the algorithm failing to find or extract usable features from an image (failure to enroll or failure to extract template). Note that template generators can be configured to not produce an output if the input sample was of low quality; otherwise, a template may cause false matches or false non-matches in subsequent recognition. Quality assessment is considered essential to the ethical use of face recognition.

Errors in One-to-Many Identification Systems

One-to-many search systems take a photo of a face and return similar faces from one or more reference databases. For example, a person entering a casino could be searched against a database of known cheats, and against a database for high rollers. Face

___________________

33 ISO, 2021, “Information Technology, Biometric Performance Testing and Reporting—Part 1: Principles and Framework,” ISO/IEC 19795-1:2021.


recognition identification systems are generally configured in two ways—automated identification and investigational use.

Automated Identification

The system returns faces whose similarity scores exceed a numerical threshold. The threshold is specified by the system owner, and the users of the system must have a procedure to handle multiple matches.

With automated identification, FNs occur when the person in the probe image is present in the reference gallery but is not matched—that is, the algorithm finds the search photo to be dissimilar, at the specified threshold, to its reference gallery mate.

FPs occur when a non-mated search—one in which the person in the photograph is not present in the reference gallery—yields any candidates. FPs also occur when a comparison of the search photo with a reference gallery entry of a different person yields a similarity score at or above the threshold.

Investigational Use

The system is configured with a threshold of zero and returns the top K most-similar faces. The value K is usually specified by the system owner’s policy. More rarely, the value might be set by the investigator running the search—for example, to lengthen the candidate list in an investigation of a serious crime—and in a manner consistent with policy set by the system owner. In this configuration, human review is a necessary and integral part of what is then an automated-plus-human system.

FNs occur in mated searches when (1) the search does not include the correct mate in the top K candidates or (2) the search does place the correct mate in the candidate list, but the human reviewer misses it because they judge it to be a non-mate.

An FP can occur in two cases. The first case is a non-mated search where the human reviewer erroneously associates the search photo with one of the K candidate reference images. An FP can also occur for a mated search if the human reviewer misses the correct mate photo and instead associates the search photo with one of the other candidate reference images. Note that the FRT component returns K candidates whether the searched person is in the reference database or not, because the threshold is set to zero. This means that the false positive identification rate (FPIR) of the FRT engine is 100 percent. If, instead, the algorithm were equipped with a somewhat higher threshold, candidate list lengths would often be reduced, thereby offering the human reviewer fewer opportunities to make an FP mistake, but also fewer opportunities to detect weakly matching mates.
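The difference between the two configurations comes down to how the candidate list is cut, as the short sketch below illustrates with synthetic non-mated scores: with a threshold of zero, a top-K list always comes back for the reviewer, whereas a nonzero threshold usually returns nothing when the person is not enrolled. The score distribution and threshold value are assumptions.

```python
import numpy as np

def candidate_list(scores: np.ndarray, k: int = 20, threshold: float = 0.0):
    """Return the top-K candidates whose similarity meets the threshold."""
    order = np.argsort(scores)[::-1][:k]
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]

# Synthetic similarity scores from a non-mated search of a 12,000-entry gallery.
rng = np.random.default_rng(5)
nonmated_search = rng.normal(0.30, 0.10, 12_000)

print(len(candidate_list(nonmated_search, k=20, threshold=0.0)))    # always 20 for review
print(len(candidate_list(nonmated_search, k=20, threshold=0.75)))   # usually 0
```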

The quality, thoroughness, and accuracy of the human review is critical to this process. In operational settings, the reviewer, who may not be an expert, or even trained,


could be working under time pressure or urgency imperatives related to the case. In such circumstances, mistakes will occur. Even without such exigencies, human review may not be reliable, as discussed later. The interaction between machine and human has previously been studied in the related area of latent fingerprint matching,34 where a low-quality sample is compared with an exemplar print retrieved in a biometric search.

Primary Causes: What Typically Causes False Negatives and False Positives?

False Negatives

Face recognition is sensitive to changes in appearance of a subject. Consider the two photographs of musician John Lennon at different times in his life (Figure 2-8).

The primary causes of change in appearance are aging, poor photography, poor presentation, and acute injury. Poor photography reduces image quality, with typical manifestations being underexposure, overexposure, and misfocus. Poor presentation also reduces image quality, typically arising because the subject does not look at the camera, or moves, inducing motion blur. Many other factors can reduce pairwise similarity. These include occlusion (a waved hand or sunglasses, for example); resolution (face is too small or the camera’s optics are poor); noise (owing to low light or weather); and image compression (owing to misconfiguration or low-bit-rate video).

In applications where subjects make cooperative presentations to a camera, FN rates can rise owing to poor usability. This is especially true in systems that are not used regularly—like border control gates—where subjects will not be habituated to the process. In such cases, usability testing is especially valuable. Some systems have good affordance and achieve low FN rates. Such systems usually allow a subject to retry.

False Positives

If a face recognition system erroneously associates photos of different people, an FP occurs. FPs arise primarily from the similar appearance of two faces, which most often stems from biological similarity, such as occurs in relatives and particularly in identical twins. This is discussed further under demographic effects in the section “Demographic Disparities.”

FPs can also occur owing to similarity of artifacts in images, such as similar thick-framed eyeglasses or prominent nostrils. Such effects are idiosyncratic to the algorithm and are generally less common in recent algorithms.

In large-scale one-to-many identification systems, where tens or hundreds of millions of people could be represented in the reference database, there is an elevated

___________________

34 I.E. Dror and J.L. Mnookin, 2010, “The Use of Technology in Human Expert Domains: Challenges and Risks Arising from the Use of Automated Fingerprint Identification Systems in Forensic Science,” Law, Probability and Risk 9(1):47–67, https://doi.org/10.1093/lpr/mgp031.

FIGURE 2-8 Two photos of musician John Lennon at different times in his life illustrate change in appearance.
SOURCES: (Left) E. Koch, National Archives/Anefo, http://hdl.handle.net/10648/aa6be4d4-d0b4-102d-bcf8-003048976d84. (Right) J. Evers, National Archives/Anefo, http://hdl.handle.net/10648/ab63fd72-d0b4-102d-bcf8-003048976d84.

chance of an FP match. In many systems, if the size of the reference database increases, the threshold will need to be increased to maintain a target FP identification rate. Some systems address this automatically.

Accuracy Improvements Over Time

The accuracy of a biometric system is estimated by conducting empirical trials. The result is a measurement of an FP rate and an FN rate. To compare systems, an analyst will configure a decision threshold for each system that yields a particular FP rate—say, 1 in 10,000—and then report the FN rate. Figure 2-9 shows how such a measure has improved since 2017 for algorithms from one industrial developer.
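As a rough sketch of that procedure, the snippet below calibrates a threshold against simulated non-mated scores to hit an FP rate of 1 in 10,000, then reports the FN rate of simulated mated scores at that threshold. The score distributions are invented stand-ins for empirical trial data.

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-ins for trial results: similarity scores from non-mated
# (different-person) and mated (same-person) comparisons.
nonmated = rng.normal(0.30, 0.08, size=1_000_000)
mated = rng.normal(0.75, 0.10, size=50_000)

target_fpr = 1e-4                                    # 1 in 10,000
threshold = np.quantile(nonmated, 1.0 - target_fpr)  # exceeded by 0.01% of non-mates
fnr = float(np.mean(mated < threshold))              # mates missed at that threshold

print(f"threshold = {threshold:.3f}, FN rate = {fnr:.4f}")
```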

Figure 2-9 shows an analog of Moore’s law, with face recognition error rates falling by approximately a factor of 2 each year. This applies to three fixed databases involving cooperative photographs from four operational sources. The FN rates fall because the algorithms are increasingly able to associate poor-quality photos and photos of faces taken up to 18 years apart. Although such gains have been realized by many developers, error rates vary considerably across the industry: some organizations produce algorithms that are much more accurate than others. Importantly, any given operator of face recognition can realize such gains in its operations only by procuring updated algorithms and applying them to its image databases. Another implication is that operators will find pairs of mated images in legacy databases that had previously gone undetected; in a criminal justice investigation, this could produce a new lead.
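Taking the approximate factor-of-2 annual improvement at face value, the trend can be written as FNR(t) ≈ FNR(t0) · 2^−(t − t0). The one-function sketch below simply extrapolates under that assumption; it is illustrative only, not a forecast for any particular algorithm.

```python
def projected_fnr(fnr_start, years_elapsed, halving_time_years=1.0):
    """Extrapolate a false negative rate assuming it halves every
    halving_time_years, mirroring the Moore's-law-like trend."""
    return fnr_start * 0.5 ** (years_elapsed / halving_time_years)

# A hypothetical 4 percent FN rate measured in 2017, projected 6 years ahead:
print(projected_fnr(0.04, years_elapsed=6))   # 0.000625, i.e., about 0.06 percent
```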

FIGURE 2-9 Date of development and evaluation versus false rejection rate for FRT from a single vendor.
SOURCE: National Institute of Standards and Technology, 2023, “Face Recognition Technology Evaluation (FRTE) 1:N Identification,” Department of Commerce, https://pages.nist.gov/frvt/html/frvt1N.html.

Today, state-of-the-art systems are able to recognize images captured under controlled conditions with recognition accuracy high enough to meet many application requirements. They are also able to recognize poorer-quality photographs where the subject does not cooperatively engage the camera, or where the camera optics or imaging environment are poor. This ability has enabled end users to expand their capture envelope to include less-constrained photographs.

It is not possible to give a one-line answer to the question of how good face recognition is. Accuracy is inextricably linked to the properties of the images (both the search photo and database faces) being used. A second factor is the algorithm; accuracy varies widely across the industry. Face recognition algorithms do not yet have the capability to report “search photo does not exist in the database” without downgrading their capability to find true matches.

Accuracy in Large Populations

With an exception detailed below, face recognition search is viable even with databases containing several hundred million faces—where viable means error rates that are sufficiently low for many use cases. First, the FP identification rate must be low—the system should not mismatch too many search photos with database entries—which is achieved by using a high threshold. However, the threshold cannot be raised arbitrarily, because that will cause an elevation in FN identification rates—the system will fail to retrieve (“miss”) matching database entries. There is thereby a trade-off between FN and FP error rates.

As more individuals are enrolled into a database, the possibility of a mismatch increases. To maintain a fixed FPIR, it is necessary either for the algorithm to adapt or for the system owner to raise the threshold. Search remains viable in very large populations because of an aspect of statistics concerned with tails of distributions. To limit the FPIR—the proportion of searches that return a mismatch when they should not—the algorithm must correctly report only low similarity scores. The highest score, known to statisticians as an extreme value, will grow as the number of people in the database grows. However, the highest value grows only slowly with the size of the database. By analogy, one will find taller people in a sample of 10,000 versus 1,000, but not that much taller.
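A small simulation makes the slow growth of the extreme value visible. The sketch below assumes non-mate scores drawn from a single Gaussian distribution (an idealization the next paragraph revisits) and reports how the typical highest score changes as the number of comparisons grows a thousandfold.

```python
import numpy as np

rng = np.random.default_rng(2)
# Non-mate similarity scores drawn from one stable distribution.
for n in (1_000, 10_000, 100_000, 1_000_000):
    trials = [rng.normal(0.30, 0.08, size=n).max() for _ in range(20)]
    print(f"N = {n:>9,}  typical highest non-mate score ≈ {np.mean(trials):.3f}")
```

The highest score rises by only a few hundredths even as N grows by three orders of magnitude, which is why only modest threshold elevation is needed.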

However, there is a problem. The extreme value model implies that FPIR grows slowly, so that thresholds need to be elevated only slightly to maintain FPIR. However, this assumes that the similarity scores are sampled from a single and stable population distribution—that is, that one does not expect outlier scores. In the same way that 500-year floods will occur more frequently when the climate has changed, the actual non-mate distribution will include a well-known population that generates high non-mate scores: twins. Twins are common: 3 percent of newborns in the United States are a twin35 and 0.4 percent are identical twins.36 Twins are becoming increasingly common with later-in-life motherhood and increased use of fertility technologies. The effect on FRT is that if one twin is in the database and the other is searched, an FP will occur (because contemporary FRT algorithms are unable to distinguish them). Such events occur naturally even in small populations—for example, if the entire population of a small town is enrolled. They will occur more frequently in large data sets such as state driver’s license databases.

Figure 2-10 shows that, even with the most accurate contemporary algorithms, low FPIRs cannot be achieved by elevating thresholds because FN rates ascend rapidly to levels that would render the system useless.

___________________

35 Centers for Disease Control and Prevention, 2023, “Births: Final Data for 2021,” National Vital Statistics Reports 72(1), https://www.cdc.gov/nchs/data/nvsr/nvsr72/nvsr72-01.pdf.

36 P. Gill, M.N. Lende, and J.W. Van Hook, “Twin Births,” updated February 6, In StatPearls [Internet], Treasure Island, FL: StatPearls Publishing, https://www.ncbi.nlm.nih.gov/books/NBK493200.

FIGURE 2-10 False positive identification rate of given algorithms.
SOURCE: National Institute of Standards and Technology, 2023, “Face Recognition Vendor Test (FRVT) Part 2: Identification,” NISTIR 8271 Draft Supplement, Department of Commerce, https://github.com/usnistgov/frvt/blob/nist-pages/reports/1N/frvt_1N_report_2023_02_10.pdf.

DEMOGRAPHIC DISPARITIES

All machine learning–based systems, including biometric systems, potentially have performance that varies across demographic groups. (An analogous effect, the cross-race effect—that is, the tendency for individuals to more easily recognize faces that belong to their own racial group—is seen with human observers.) This arises fundamentally because humans vary anatomically: our characteristics differ individually, by sex, by age, by ethnicity, and potentially by other groupings that may not have descriptors associated with them. Some characteristics are categorical (e.g., sex), some are continuous (e.g., height), and some continuous characteristics are treated as categorical (e.g., the young versus the old). It is the responsibility of a biometric system designer to ensure uniform function across all groups—or at least sufficiently close to uniform to be acceptable for a given application—or to state clearly that the system should be augmented for, or not used by, certain groups.

The first study of differential accuracy among different demographic groups was a 2003 report from NIST.37 It found that female subjects were more difficult for algorithms to recognize than male subjects, and that young subjects were more difficult to recognize than older subjects.

Considerable attention has been paid to demographic effects in face recognition since the 2018 “Gender Shades” study of cloud-based algorithms that inspect a face image and return a classification of male or female.38 The study showed that the algorithms tested misclassified the gender of women more often than men, and of subjects with dark skin tones more often than those with light skin tones, with the highest error rates for dark-skin-toned women: up to 35 percent of African females were classified as men. While the work drew attention to demographic performance differences in face recognition, the Gender Shades systems were not face recognition algorithms because they were not designed to support verification or determination of who a person is. Classification algorithms make a direct guess at gender. Recognition algorithms use different mechanisms: they encode identity into templates and, later, compare them. The persistent popular conflation of gender classification and face recognition may stem from the fact that algorithms for the two tasks employ neural networks trained on, respectively, large gender- and identity-labeled sets of photographs, although they are trained toward different objectives.

All face recognition system components potentially have error rates that depend on the demographics of the subjects. For example, a camera might have inadequate

___________________

37 P.J. Phillips, P. Grother, R.J. Michaels, D.M. Blackburn, E. Tabassi, and M. Bone, 2003, Face Recognition Vendor Test 2002: Evaluation Report, NISTIR 6965, https://nvlpubs.nist.gov/nistpubs/Legacy/IR/nistir6965.pdf.

38 J. Buolamwini and T. Gebru, 2018, “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification,” Proceedings of Machine Learning Research: Conference on Fairness, Accountability and Transparency 81:1–15.

field of view to capture tall individuals; a face detector could fail on individuals with no hair and with eyebrows of similar color to their skin; a quality assessment algorithm might reject a passport application photo of an individual whose eyelids are very close to each other;39 or a presentation attack detection algorithm might reject a face because it misclassifies long hair near the face as the edge of a device used to present a replayed image in a spoofing attack. For face recognition itself, both FP and FN error rates can differ. Importantly, the magnitudes, causes, and consequences of these errors differ, so they are discussed separately in the following two subsections. This separation adds specificity over blanket statements in many articles that face recognition does not work for a particular group.

The most thorough evaluation of disparities in face recognition across demographic groups was the 2019 NIST Face Recognition Vendor Test,40 which raised awareness in the academic community and prompted vendors to collect additional training data and improve algorithm accuracy to reduce disparities across demographic groups.

False Positive Variation by Demographic Group

Nature. FPs involve two people: they occur when images of two different people are incorrectly matched, that is, when an algorithm returns a high similarity score for a non-mated pair. This can happen for a variety of reasons, depending on the algorithm. These include natural similarity of identical twins and other close relatives; spurious high scores from very poor-quality photographs, such as those with low resolution or extreme overexposure; and matching within demographic groups that are under-represented in the data sets used to train the algorithm. FPs will also occur when the decision threshold is set to a very low value, as is the case when humans are employed to review the matches.

Affected groups. For most algorithms, FP rates are higher for women than men, for the very young and the very old, and for particular ethnic groups.41,42 For many algorithms, these groups are Africans, African Americans, East Asians, and South Asians. For some algorithms developed in China, the East Asian group gives low FP rates and, instead, the White group gives elevated rates. FPs are highest at the intersection of these groups—for

___________________

39 J. Regan, 2016, “New Zealand Passport Robot Tells Applicant of Asian Descent to Open Eyes,” Reuters, updated December 7, https://www.reuters.com/article/us-newzealand-passport-error/new-zealand-passport-robot-tells-applicant-of-asian-descent-to-open-eyes-idUSKBN13W0RL.

40 P. Grother, M. Ngan, and K. Hanoaka, 2019, Face Recognition Vendor Test (FRVT)—Part 3: Demographic Effects, NISTIR 8280, Washington, DC: Department of Commerce and Gaithersburg, MD: National Institute of Standards and Technology, https://doi.org/10.6028/NIST.IR.8280.

41 G. Pangelinan, K.S. Krishnapriya, V. Albiero, et al., 2023, “Exploring Causes of Demographic Variations in Face Recognition Accuracy,” arXiv:2304.07175.

42 K. Krishnapriya, V. Albiero, K. Vangara, M.C. King, and K.W. Bowyer, 2020, “Issues Related to Face Recognition Accuracy Varying Based on Race and Skin Tone,” IEEE Transactions on Technology and Society 1(1):8–20.

example, for many algorithms elderly Chinese women give the highest false match rates. These effects are not related to poor photography; they occur even in well-controlled, standard-quality images. Also, this is not clearly related to skin tone—high false match rates are observed in both light-skinned East Asian and dark-skinned African populations. Furthermore, algorithms known to be trained on East Asians can give high false match rates on Whites. Last, very young children give high false match rates,43 possibly due to undeveloped features and severe lack of representation in training sets.

Magnitude and prevalence. FPs will be more common in deployments where many non-mated comparisons are performed. This occurs in one-to-many searches of large databases, such as when detecting duplicate identities in benefits systems, and when many non-mated searches are conducted—for example, in public-area surveillance or sports arena entry, where a watchlist alert system is in use. FP rates can vary massively across groups: the rate in some demographic groups can be one, two, or three orders of magnitude higher than in others, depending strongly on the algorithm and the groups being recognized.
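Such disparities are typically reported as ratios of per-group false match rates at a common threshold. The sketch below uses invented group labels and simulated scores, not evaluation data, purely to show the computation.

```python
import numpy as np

rng = np.random.default_rng(3)
threshold = 0.60

# Simulated non-mated similarity scores, tagged by demographic group;
# group B is given a higher mean to mimic a disparity.
scores = {
    "group_A": rng.normal(0.30, 0.08, size=200_000),
    "group_B": rng.normal(0.38, 0.08, size=200_000),
}

fp_rates = {g: float(np.mean(s >= threshold)) for g, s in scores.items()}
for g, rate in fp_rates.items():
    print(f"{g}: false match rate = {rate:.6f}")
print("ratio B/A =", fp_rates["group_B"] / max(fp_rates["group_A"], 1e-12))
```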

Impact. The consequences of FPs vary by application. As an FP involves two people, either or both can be affected. In a one-to-one access control task, an FP could lead to loss of privacy or theft, for example. In a pharmacy, an employee would not be able to refute the assertion that they dispensed drugs to a fraudster. In a benefits-fraud detection setting, an FP might lead to a wrongly delayed or rejected application. In a public area surveillance application, an FP could result in interview and arrest.

Root-cause remediation. There is consensus that remediation of disparities in FP rates is the job of the recognition algorithm developer by, for example, increasing the diversity of the training data or accounting for imbalances in the training data by reweighting under-represented groups.44
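One of the remediations mentioned above, reweighting under-represented groups, can be sketched generically: each training example’s loss is scaled by the inverse frequency of its group so that rare groups are not drowned out. The group labels and weighting scheme below are a hypothetical illustration, not any developer’s actual training pipeline.

```python
from collections import Counter

def group_weights(group_labels):
    """Inverse-frequency weights: examples from under-represented groups
    receive proportionally larger weights."""
    counts = Counter(group_labels)
    n, k = len(group_labels), len(counts)
    return [n / (k * counts[g]) for g in group_labels]

labels = ["A"] * 900 + ["B"] * 100       # group B is under-represented
weights = group_weights(labels)
print(weights[0], weights[-1])           # ~0.56 for group A, 5.0 for group B

# In a training loop, these weights would multiply each example's loss, e.g.:
# total_loss = sum(w * per_example_loss(x, y) for w, (x, y) in zip(weights, batch))
```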

False Negative Variation by Demographic Group

Nature. FNs involve one person: they occur when two photographs of that person do not match, which is a result of low similarity arising from some change in facial appearance. This can occur owing to a change in hairstyle or presence of cosmetics, to aging, or when image quality is degraded—for example, when a photograph does not have fidelity to a subject’s face. This can occur variably across demographic groups. One common circumstance is for a photograph to be underexposed, a problem that occurs

___________________

43 P.J. Grother, M. Ngan, and K. Hanaoka, 2019, Face Recognition Vendor Test (FRVT)—Part 3: Demographic Effects, NISTIR 8280, Washington, DC: Department of Commerce and Gaithersburg, MD: National Institute of Standards and Technology, https://doi.org/10.6028/NIST.IR.8280.

44 M. Bruveris, J. Gietema, P. Mortazavian, and M. Mahadevan, 2020, “Reducing Geographic Performance Differentials for Face Recognition,” Pp. 98–196, IEEE Winter Applications of Computer Vision Workshops (WACVW), https://doi.org/10.1109/wacvw50321.2020.9096930.

more frequently in dark-skinned individuals because pigmented skin reflects less light. Poor photography can lead to overexposure of light skin, but this is less common. Given such images, a face detector can fail such that a test might record failure-to-capture rates that differ by demographic group. If detection succeeds, however, an underexposed face image can have insufficient detail to allow the face recognition algorithm to discern face features or face shape. This will tend to elevate FN rates.45

Affected groups. Although FNs are usually more common in women than men, and sometimes in Africans and African Americans versus Whites, FN rates are uniformly quite low (see the following), and variation across groups is small. Standardized measures of inequity are much smaller than for FPs. An exception is very young children, in whom rapid, growth-related changes in appearance cause FN rates to be much higher than in adults.

Magnitude and prevalence. Notably, with contemporary face recognition algorithms applied to images collected from cooperative subjects, FN rates are below 1 percent, and much lower than the gender misclassification rates measured in Gender Shades—for example, 35 percent. FN rates and demographic differences will generally increase if imaging is less controlled, such as from a webcam installed in a taxi being operated at night.

Impact. The consequences of an FN vary by application. In a mobile-phone authentication context, an FN can be remedied by a retry or entering of a PIN. Without a secondary authentication mechanism, a set of FNs in a time-and-attendance application could be construed as a failure to come to work. In a surveillance application, FNs are to the advantage of the person; in a protest, for example, an individual might wear a protective face mask and sunglasses to hide their features and thereby impede detection or induce an FN. Likewise, an FN would be to the benefit of a soccer hooligan.

The magnitude of demographic variation depends on what measures have been taken to mitigate these issues. For example, some systems use improved lighting to mitigate face detection failures and the loss of facial detail for individuals with dark skin tones. Some systems have attempted to rebalance the composition of the training data to mitigate the effects of under-representation.

Root-cause remediation. This is a photography problem that is difficult to fully remedy without adoption of controlled light, controlled exposure, high-dynamic-range imaging, or active camera control. The value of such approaches will be realized only if higher-precision data transmission standards46 are promulgated in the face recognition

___________________

45 C.M. Cook, J.J. Howard, Y.B. Sirotin, J.L. Tipton, and A.R. Vemury, 2019, “Demographic Effects in Facial Recognition and Their Dependence on Image Acquisition: An Evaluation of Eleven Commercial Systems,” IEEE Transactions on Biometrics, Behavior, and Identity Science 1(1):32–41.

46 ISO, 2022, “Information Technology-JPEG XL Image Coding System—Part 1: Core Coding System,” ISO/IEC 18181-1:2022.

community; these would encode luminance (and color) in more than 8-bit integers, allowing higher-contrast images to be captured.
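A simplified numerical illustration of why bit depth matters for underexposed faces: if a dark facial region occupies only a few percent of the sensor’s full scale, 8-bit quantization leaves just a handful of distinct gray levels, whereas a 12-bit encoding retains far more gradation. The numbers below are illustrative assumptions, not measurements of any standard or camera.

```python
import numpy as np

rng = np.random.default_rng(4)
# Linear sensor signal from an underexposed face region: ~3% of full scale.
signal = rng.uniform(0.00, 0.03, size=100_000)

levels_8bit = np.unique(np.round(signal * (2**8 - 1))).size
levels_12bit = np.unique(np.round(signal * (2**12 - 1))).size

print(f"distinct levels surviving quantization: 8-bit = {levels_8bit}, 12-bit = {levels_12bit}")
```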

The impacts of errors associated with demographic variation depend on the application. For example, in authentication scenarios like access control, where almost all usage is by the legitimate account holder, a high FN rate in a demographic group would directly impact convenience and usability. The same system will be configured to give low FP rates (1 in 10,000 is typical), such that even if some demographic group existed for which the false match rate was much higher (1 in 100, say), it would still be rare for there to be any observable impact. Indeed, some practitioners incorrectly consider FP variations to be entirely irrelevant, arguing that they affect only impostors. However, a high FP match rate can represent a security flaw such that members of an affected demographic group could be harmed. In one-to-many surveillance applications, such as soccer stadium entry, FPs cause adverse outcomes (e.g., eviction), so large demographic variations are hazardous.

FACE RECOGNITION UNDER ATTACK

Face recognition is used to verify identity claims and to identify subjects in a database. In applications that are used to confer some benefit—such as access to a building, country, or account—a bad actor may seek to subvert the intended operation of the system. Depending on the setting, an attacker may want to positively match someone else, or to not match themselves. These are discussed in the next two subsections.

Impersonation

In verification, if an attacker can successfully use a face recognition system to match a victim, then the benefits accrue to the attacker—this could be access to a mobile phone, or entry to a country using someone else’s passport. The standardized term for this is impersonation,47 and it requires the attacker to (1) appropriate a credential (the phone or passport), and (2) arrange for the face recognition to produce a sufficiently high similarity score. This is attempted in the physical domain using a number of techniques, such as wearing a face mask or cosmetics so as to resemble the legitimate enrollee, or simply displaying a photo of that person on paper or on a tablet. Such methods are termed presentation attack instruments, and the activity is a presentation attack. Examples are shown in Figure 2-11. It is also possible to launch attacks in the digital domain by injecting a photo electronically into a system—for example, by tricking the receiving system into thinking that the injected photo came from a real camera.

___________________

47 ISO, 2023, “Biometric Presentation Attack Detection—Part 1: Framework” ISO/IEC 30107-1:2023.

FIGURE 2-11 Legitimate photo of a subject, and two presentation attack instruments.
SOURCES: P. Grother, M. Ngan, and K. Hanaoka, 2019, Face Recognition Vendor Test (FRVT), Part 3: Demographic Effects, NISTIR 8280, Washington, DC: National Institute of Standards and Technology, Department of Commerce, https://nvlpubs.nist.gov/nistpubs/ir/2019/NIST.IR.8280.pdf.

The success of such attacks depends on knowledge, opportunity, skill, and whether countermeasures, if any, are effective. The attacker generally needs to know who they are attacking—to impersonate the owner of a phone, an attacker will need knowledge of their appearance. This is often readily available via casual observation and photography of the victim. For other biometrics such as fingerprint or iris, such information is more difficult to come by.

Impersonation attacks are possible also in face recognition applications using one-to-many search. For example, in a paperless aircraft boarding application, a subject resembling someone on the departure manifest could authenticate and board successfully. An identical twin or an able attacker equipped with a face mask could attempt this. Such systems are single-factor authentication systems relying solely on the biometric match.

By using a presentation attack instrument that resembles a target subject, an impersonator could incriminate that person at a crime scene that they knew was being recorded.

Evasion

Face recognition is often used to check whether a subject has been seen previously. For example, if people are evicted from a casino for cheating, their photos may be retained and enrolled in a face recognition system with the intent that they will be recognized and denied entry should they return. An attacker would anticipate such steps and seek to evade recognition. This may be achieved by avoiding cameras, by not looking at cameras, or, more effectively, by changing one’s appearance so that recognition returns a low similarity score. This can be attempted by wearing a face mask of someone else, by wearing sufficient cosmetics, or by occlusion. For example, in the 2019 protests against legislation in Hong Kong, citizens wore face masks to undermine recognition.

Detection of Attacks

Attack detection is critical in applications where economic or other incentives exist for attackers to impersonate or evade. For example, there are obvious monetary benefits to someone who can execute unemployment benefits fraud by establishing two or more identities. As such, there are successful efforts to detect presentation and injection attacks. These fall into two categories: passive and active. Passive presentation attack detection (PAD) analyzes the received biometric data, which could be a photo or video, and makes a decision. In active attack detection, the software arranges for a change in the appearance of the subject—for example, by issuing an instruction to the subject or by manipulating the illumination of the subject. The key to the success of such countermeasures is randomness: the attacker would need to respond correctly to the “challenge” issued by the PAD system. Both passive and active attack detection schemes can be supplemented with information obtained from other sensors—for example, the vascular structure of a face could be imaged using a long-wave infrared camera sensitive to thermal information.
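The role of randomness in active attack detection can be illustrated with a minimal challenge-response loop. The challenge names, the capture-and-classify callback, and the pass criterion below are hypothetical; a real PAD subsystem would be far more elaborate.

```python
import secrets

CHALLENGES = ["blink", "turn_head_left", "turn_head_right", "smile"]

def active_pad_check(capture_and_classify):
    """capture_and_classify(challenge) is assumed to prompt the subject,
    capture video, and return the set of actions a liveness classifier
    detected in the response."""
    challenge = secrets.choice(CHALLENGES)       # unpredictable to the attacker
    detected = capture_and_classify(challenge)
    return challenge in detected                 # pass only if the prompt was performed

# A replayed video that always shows a static face fails the check:
print(active_pad_check(lambda challenge: {"neutral"}))              # False
# A live subject who follows the prompt passes:
print(active_pad_check(lambda challenge: {challenge, "neutral"}))   # True
```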

If attack detection can be done perfectly, then the biometric system conclusively binds the actual person to the capture event. If it is imperfect, then security and trust are eroded.

HUMAN ROLES AND CAPABILITIES

In applications of face recognition such as access control, where most transactions are mated, accuracy is high enough that matching will usually succeed. In those FN cases where it does not, a secondary resolution process is needed. This could involve a human, as happens after a passport gate rejection in immigration, or with an airline staff member after a failure in automated aircraft boarding. In such cases, the human will compare the face on a presented ID document with that of the identity claimant. This process will itself have some errors: FNs if the reviewer fails to verify a legitimate claimant and FPs if an impostor is verified—for example, when the impostor is trying to circumvent the automated check.

In investigations, face recognition is typically used to present lists of candidate photos to a human reviewer, who compares each candidate with the searched photo to check whether it is a true match. The use of human review is an integral part of the process, used in 100 percent of searches. Moreover, humans are fallible and, as with FRT, human review can result in two types of error. These are FPs (incorrect associations of two people in the photos) and FNs (failure to associate one person in two photos). In criminal investigations, an FN would result in an unidentified suspect, but an FP could lead to an incorrect detention. When humans review long lists of candidate photos, there are typically tens of opportunities for false matches: the human reviewer must correctly reject all of them to avoid an FP. In terms of binomial statistics, even if a reviewer’s false match rate were 1 percent, the chance of falsely accepting any one of 50 candidates would be 1 − (1 − 0.01)^50 ≈ 0.4, or about a 40 percent chance that a mistake will be made.
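That calculation generalizes to any per-candidate false match rate p and candidate list length K as 1 − (1 − p)^K, assuming independent errors; the short sketch below reproduces the numbers used in the text.

```python
def chance_of_any_false_match(p_per_candidate, list_length):
    """Probability that a reviewer falsely accepts at least one non-mated
    candidate, assuming independent per-candidate errors."""
    return 1.0 - (1.0 - p_per_candidate) ** list_length

print(chance_of_any_false_match(0.01, 50))   # ≈ 0.395, about a 40 percent chance
print(chance_of_any_false_match(0.01, 10))   # ≈ 0.096 with a shorter candidate list
```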

Human adjudication of photos has been extensively studied by experimental psychologists. The task is termed “unfamiliar face matching,” as it usually involves review of two juxtaposed photos to determine whether they are of the same person. As such, the task does not require memorization. The first step for a human is to determine if one or both of the photos are unsuitable for comparison; this “no value” determination is sometimes skipped, and a match or no-match decision will be made. Face recognition algorithms faced with the same task can fail to find a face or can electively refuse to process an image by analyzing its quality and suitability for recognition. However, systems are usually configured to accept even poor-quality photos.

It is well documented that a human reviewer’s accuracy is improved when there are no constraints on review duration,48,49 there are multiple images of a person,50 the images are of standardized high quality,51,52 and the reviewer has had adequate sleep.53 Additionally, it is known that accuracy depends on the demographics of the reviewed faces—most importantly, that humans of one race give reduced accuracy when reviewing photographs of another.54 Human false non-match rates are reduced when the expression and head orientation in the two photos are similar and when the time elapsed between photo creation is small.55

Human trials are complicated because human performance varies over timescales comparable to the test duration, as well as over longer periods. One notable aspect

___________________

48 M.C. Fysh and M. Bindemann, 2017, “Effects of Time Pressure and Time Passage on Face-Matching Accuracy,” Royal Society Open Science 4(6).

49 M. Özbek and M. Bindemann, 2011, “Exploring the Time Course of Face Matching: Temporal Constraints Impair Unfamiliar Face Identification Under Temporally Unconstrained Viewing,” Vision Research 51(19):2145–2155.

50 D. White, A.M. Burton, R. Jenkins, and R.I. Kemp, 2014, “Redesigning Photo-ID to Improve Unfamiliar Face Matching Performance,” Journal of Experimental Psychology: Applied 20(2):166.

51 A.M. Burton, D. White, and A. McNeill, 2010, “The Glasgow Face Matching Test,” Behavior Research Methods 42(1):286–291.

52 P.J. Phillips, 2017, “A Cross Benchmark Assessment of a Deep Convolutional Neural Network for Face Recognition,” Pp. 705–710 in 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), Washington, DC, https://doi.org/10.1109/fg.2017.89.

53 L. Beattie, D. Walsh, J. McLaren, S.M. Biello, and D. White, 2016, “Perceptual Impairment in Face Identification with Poor Sleep,” Royal Society Open Science 3(10):160321.

54 C.A. Meissner and J.C. Brigham, 2001, “Thirty Years of Investigating the Own-Race Bias in Memory for Faces: A Meta-Analytic Review,” Psychology, Public Policy, and Law 7(1):3–35.

55 A.M. Megreya, A. Sandford, and A.M. Burton, 2013, “Matching Face Images Taken on the Same Day or Months Apart: The Limitations of Photo ID,” Applied Cognitive Psychology 27(6):700–706.

is that human observers gradually develop a match bias during prolonged testing such that the FN rate declines (i.e., improves) but the false match rate increases.56 Such behavior would be important, for example, over the hours of a border guard’s shift. It would be less important in a criminal investigation featuring ample review time, and limited numbers of image pairs to review.

The cognitive explanation for these experimental observations is still being researched, but the existence and magnitudes of the effects are largely settled. An important topic in cognition research is whether standardized forensic-level training is effective in improving accuracy. One suggested explanation is that such training works by unlearning the innate perceptual mode in which humans process faces holistically.57

So how accurate are humans? In a 2017 test of human capability, reviewers were given three months to review 20 pairs of frontal photographs without being given identity ground truth; there were 12 pairs of the same person, and 8 pairs of different people.58 The reviewers were categorized into five groups by experience, training, and aptitude: forensic examiners (with extensive training, and who testify in court); reviewers (who typically perform initial law enforcement reviews in investigations); super recognizers (who have documented aptitude in tests or during employment); and fingerprint examiners and undergraduate students (as control groups). Despite the extended review duration, only 7 of 57 examiners correctly adjudicated all 20 pairs. The corresponding figure for reviewers was 2 of 30, for super recognizers 3 of 13, for fingerprint examiners 1 of 53, and for students 0 of 31. More tangibly, for the most proficient groups, forensic face examiners and super recognizers, the study estimated an approximately 1 percent probability of assigning a highly confident match decision to an actually non-matching pair. The study did not address image quality. The images used were of fair quality, collected in a cooperative university setting.

OTHER SALIENT ATTRIBUTES OF TODAY’S COMMERCIAL FACIAL RECOGNITION TECHNOLOGY

Today’s commercial FRT systems have several attributes that relate to how they might best be governed. These include

___________________

56 H.M. Alenezi and M. Bindemann, 2013, “The Effect of Feedback on Face-matching Accuracy,” Applied Cognitive Psychology 27(6):735–753.

57 D. White, A. Towler, and R.I. Kemp, 2021, “Understanding Professional Expertise in Unfamiliar Face Matching,” Forensic Face Matching 62–88.

58 P.J. Phillips, A.N. Yates, Y. Hu, et al., 2018, “Face Recognition Accuracy of Forensic Examiners, Superrecognizers, and Face Recognition Algorithms,” Proceedings of the National Academy of Sciences 115(24):6171–6176.

  • Proprietary. Since its inception, the face recognition industry has been built on algorithms that are trade secrets—the details of their architecture, objective functions, and training data are closely held. There are a few open-source algorithms, and although these may seed commercial development, they are not supported and documented to the level of commercial viability.
  • Not commoditized. Commercial FRT algorithms vary greatly in their technical capabilities: in accuracy, in stability across demographic groups and imaging conditions, and in speed, memory, and power consumption. They differ also in software maturity, application programming interface support for programmers, scalability to large populations and search volumes, and portability across computer hardware.
  • Deployed as cloud services as well as on-premises. For many years, face recognition systems were deployed only as software libraries installed on customer-owned computers or cameras. In recent years, with widely deployed fast networks, face recognition systems have been deployed in clouds, in which imagery is uploaded to a remote data center. The two deployment paradigms differ with respect to custody of customer data. In the on-premises approach, faces and associated biographic data are maintained on customer-controlled systems. In the cloud-based approach, the data are uploaded to the cloud provider’s hardware. As such, use of the data by the cloud provider is constrained only by the contractual arrangements between the cloud provider and the customer. Developers of cloud-based face recognition can train on customer data sets if they are not contractually barred from doing so and if the images are accompanied by identity labels.