
Fostering Responsible Computing Research: Foundations and Practices (2022)

Chapter 3: Sources of Ethical Challenges and Societal Concerns for Computing Research

Suggested Citation:"3 Sources of Ethical Challenges and Societal Concerns for Computing Research." National Academies of Sciences, Engineering, and Medicine. 2022. Fostering Responsible Computing Research: Foundations and Practices. Washington, DC: The National Academies Press. doi: 10.17226/26507.


3 Sources of Ethical Challenges and Societal Concerns for Computing Research

This chapter aims to illuminate the underlying causes of ethical and societal challenges for responsible computing research, grounding them in the ethical and sociotechnical concepts and analyses of Chapter 2. An understanding of these roots is essential to identifying practical steps that those who support and conduct this research can take toward addressing those concerns. The chapter’s discussion also makes evident the importance of incorporating into computing research the consideration of ethical values and trade-offs, the methods from the social sciences described in Chapter 2, and the multidisciplinary collaborations needed to realize ethically and societally responsible technologies and to avoid potential negative consequences of novel technologies. The chapter thus lays a basis for appreciating the report’s recommendations.

As noted in Chapter 1, the multi-step translation of research results into deployed algorithms, devices, and systems is effected by researchers, research sponsors, entrepreneurs, investors, and corporate leaders. Actions by participants at any stage of this translation can affect the ethical and societal impacts of any system that emerges from it. The participants in the development and deployment of new technologies, whether individual people or corporations, draw not only on foundational science and engineering research but also on relevant governmental and corporate governance policies and on legal regulations in determining the shape of a technology. As a result, the actions they take depend not only on computing science and engineering but also on those societal-level policies and regulations.
Furthermore, an additional group of actors plays a role in the deployment of systems: the purchasers of those systems. Ethical and societal impacts are determined by the technology choices that individuals, organizations, and governments make and by the ways in which they use the technologies and systems they acquire.

This chapter identifies a variety of situations, conditions, and computing practices that have the potential to raise ethical and societal impact concerns, and it indicates the responsibilities computing researchers have in addressing them. In many cases, the discussions identify challenges that arise because of decisions by those involved in downstream product design, deployment, and acquisition, many of them identified in presentations to the committee (see Appendix B). Some might object that computing researchers have no roles or responsibilities when it comes to downstream product design, implementation, or deployment. Even in these cases, however, computing researchers have obligations. Although they cannot prevent all development and deployment problems, they can minimize the likelihood of misinterpretation or misuse of their research by others by clearly delineating the limitations of the capabilities, and the intended scope of applicability and use, of the methods and artifacts their research yields. Their in-depth knowledge of their research also places them in a unique position to inform the public and advise government on such facets of these methods and artifacts as their intended situations of use, scope of validation, and limitations.

PREPUBLICATION COPY – SUBJECT TO FURTHER EDITORIAL CORRECTION 29

Computing research itself is embedded in a range of social contexts: the university or research organization in which it is being carried out; the organization funding the research; and the society in

which these organizations exist form an ever-evolving and complex human system. These contexts yield a range of societal factors that affect the ethicality and societal impact of the research across the four groups of challenges discussed below. They also influence choices of research problems, the membership and structure of the teams that conduct the research, and the stakeholders engaged during the research process.

The chapter divides ethical and societal impact challenges into four groups: those that arise from (1) features of the societal settings into which new computing technologies are introduced; (2) limitations of human capabilities and the ways they interact with features of computing technologies and the societal contexts in which computing technologies may be used; (3) features of the societal contexts into which computing technologies may be integrated that influence computing system design and deployment choices; and (4) system robustness problems caused by failure to follow best practices in design and implementation. The final section of the chapter highlights the limits of purely computing-technical approaches to meeting the challenges of societally responsible computing technologies. It provides two examples, each of which illustrates the need for policy and regulation to work in tandem with computing technology design and deployment.

3.1 SOCIETAL CONTEXTS

The social ecosystems in which computing technologies participate give rise to challenges rooted in the diversity of human participants in these sociotechnical systems, which yields the possibility of conflicting values and goals, the need to respect human dignity, and complexities in predicting individual behavior; a recognition that although computing technologies can help address societal challenges, they can do so only to a certain extent; the influences of the institutional structures and norms into which computing technologies may be integrated; and the societal-level impact these technologies might have.
The examples given in this section make apparent a range of ethical and sociotechnical factors to which computing researchers should pay attention, including the need for inclusiveness in research of the various stakeholders potentially affected by research outcomes; the importance of elucidating the limitations of research results as well as their contributions and benefits; the need for computing researchers to consider the potential extreme societal-level harms of computing; and the resulting needs to include multidisciplinary expertise in computing research, to reshape computing education and training, and to assist the public’s and government’s understanding of research outcomes. In doing so, the section both reaches back to the foundations provided in Chapter 2, with examples of the general principles that chapter lays out, and forward to the recommendations of Chapter 4.

3.1.1 Reconciling Conflicting Values and Goals of Stakeholders

As noted in Chapter 2, in a pluralistic society, different individuals or groups may have very different values and interests. In any context, some values and interests can justifiably take priority over others. For example, in everyday life, one person’s interest in privacy almost always supersedes another’s interest in idle curiosity. All technologies, including computing technologies, prioritize particular values and interests. In her remarks to the committee, Sarah Brayne observed that, before turning to questions about technology, one must answer normative questions. In criminal justice, is the goal of deploying computing technology to reduce crime, reduce prison populations, better allocate scarce policing resources, or something else?1 Ece Kamar also pointed to a conflict in workplace surveillance technology; such

1 Sarah Brayne, University of Texas, Austin, presentation to the committee on March 4, 2021.

technologies can be used to incentivize only productivity gains or also to collect information useful for coaching employees in ways that foster their professional development.2

Computing researchers make value choices even when their research is not explicitly aimed at an application. A research project focused on faster chips prioritizes computing power; a research project focused on chips that use more environmentally friendly materials prioritizes environmental benefits. Furthermore, value conflicts can be resolved in many different ways, and as subsequent sections discuss, a variety of stakeholders’ interests should play a role in decisions about which values to prioritize in computing research. As Chapter 2 argues, computing research is itself not value-neutral, and computing technologies have many different kinds of stakeholders. Computing researchers and the computing research community prioritize, often implicitly, some values over others; such choices occur not only in designing systems or empirical investigations and data gathering, but also in the choice of research problem area and of the particular aspect of the problem considered. The more diverse the group involved and empowered in these choices—disciplinarily, demographically, and geographically—the more likely the group is to notice mistaken assumptions, recognize biases, and act to correct them.

Last, it is important to recognize that it is not an option to somehow prioritize everyone’s values equally. Computing technologies and research necessarily prioritize certain values over others. As an analogy, consider building a restaurant menu. The placement of items on the menu influences people’s orders, so one could build a menu that makes healthy choices more likely, or more profitable choices more likely, or that advances some other values through its influence on choices.
But one cannot choose to prioritize all values equally (unless they all happen to coincide), because one has to pick some layout for the menu items. Similarly, one cannot conduct value-free or “value-uniform” computing research, and so one cannot avoid deciding which values and interests will, for the purposes of that research, be prioritized.

3.1.2 Preserving Human Dignity

The concept of human dignity is rooted in human intuitions and deeply held values as well as in domestic and international law. Human rights documents such as the Universal Declaration of Human Rights underscore that concepts of dignity often transcend cultural and national differences, even if those differences also affect nuances in how particular aspects of the concept are understood or implemented. Dignity plays an important role in the American legal system, reflected, for example, in legal commitments to due process and to protection against unreasonable searches and seizures. Concerns about human dignity also arise well beyond the legal system, in domains as diverse as civic education, research on ethics, and broad efforts to reform public institutions and safeguard privacy in civil society.

Human dignity is more than merely a subjective feeling. It encompasses a variety of closely related concerns, including the intrinsic ethical values of autonomy, well-being, justice, and legitimate power, as well as such instrumental ethical values as privacy, safety, and security. Computing technology can affect all these elements of human dignity. Computer systems and the data they use can affect labor and the marketplace, shape social activity, and structure the public’s relationships with their government and with each other. Changes in applications, systems, and data have the potential to alter fundamental aspects of dignity, including privacy, freedom and agency, and physical security.
For example, better design can afford better access for users with different usability needs, such as those with vision, hearing, or mobility impairments (see Section 3.4.4). Computing research can affect the downstream characteristics of these applications, systems, and data. To the extent that choices among values and their prioritizations favor one group’s preferences over another’s, they risk sacrificing some people’s human dignity. These issues reflect the importance of computing researchers’ considering the full range of potential stakeholders, and the diversity of values those stakeholders may hold, for technologies that incorporate

2 Ece Kamar, Microsoft Research, presentation to the committee on March 11, 2021.

the results of their research, as well as the need for multidisciplinary expertise in shaping computing research projects that could potentially affect these values.

A range of examples helps illustrate connections between computing research and potential impacts—both positive and negative—on human dignity. On the positive side, adjustments to workplace scheduling software to give more control to workers can significantly improve workers’ quality of life and sense of dignity.3 On the negative side, any strictly rule-based system in the administration of criminal justice will likely be both over- and under-inclusive, and so humans need to be in the loop to ensure that all parties retain their human dignity.4 The double-edged potential of computing research for human dignity is readily seen in possible uses of automation to help handle the crush of civil litigation.5 On the one hand, many people currently lack realistic access to human expertise in situations where civil litigation might be appropriate for them, and so computing technology could help restore their autonomy and dignity. On the other hand, removing the human element of the civil litigation process might erode social stability and legitimacy, thereby harming everyone’s human dignity. Even the process of research can affect the dignity of individuals whose data or behavior are used, as they are often treated as mere data points rather than as humans with inherent dignity. Computing researchers thus need to consider carefully the potential consequences of their research projects—both positive and negative—for human dignity.

3.1.3 Responsibly Predicting and Shaping Individual Behavior

Data-intensive computing methods, including machine learning, enable predictions that have the potential to be more accurate and less shaped by cognitive biases and heuristics, as well as by explicit or implicit discriminatory attitudes.
Systems deploying these methods do not commit the base rate fallacy or adopt the availability heuristic.6 They have no intrinsic racist animus.7 They can detect patterns in massive data sets that are impossible to identify using other methods. With the vast expansion in computing power, the availability of large amounts of data, and the availability of predictive models, such methods seem easy to deploy in new areas and are being used to improve predictions of many sorts for which training data are available. The absence of explicit human discriminatory animus in these systems, however, in no way ensures that those systems will not reproduce, in digital form, structurally biased social phenomena, as critiques of such technologies as facial recognition and predictive policing show.8 Computing research avenues for addressing this challenge include more intentional participation of diverse users in training systems and the development of objectives for model training that counter such biases.

Predictions of large-scale social phenomena are vital contributors to public policy and are central to social scientific research. Advances in computing systems, however, now offer the promise of predicting the behavior not of populations but of individuals. And these acutely individualized predictions have been mobilized in contexts with very high stakes (ones that put at risk many of the ethical values in Section 2.1)—such as decisions over whether to release a defendant pretrial, whether to admit a student to a university, and whether to award or deny an application for credit. Computing researchers working to predict the behavior of individuals must be especially cautious about how their research will be used.

3 Karen Levy, Cornell University, presentation to the committee on March 11, 2021.
4 Andrea Roth, University of California, Berkeley, School of Law, presentation to the committee on March 4, 2021.
5 Ben Barton, University of Tennessee, Knoxville, and Gillian Hadfield, University of Toronto, presentation to the committee on May 11, 2021.
6 Kahneman, D., P. Slovic, and A. Tversky, eds. 1982. Judgment Under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.
7 Kleinberg, J., et al. 2019. “Discrimination in the Age of Algorithms,” Journal of Legal Analysis 10.
8 Noble, S.U. 2018. Algorithms of Oppression. New York: New York University Press.
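As an illustrative aside, the base rate fallacy mentioned above can be made concrete with a short Bayes’ rule computation. The numbers below are hypothetical, chosen only for illustration, and are not drawn from this report:

```python
# Hypothetical illustration of the base rate fallacy: a predictor that flags
# future reoffense with 90% sensitivity and 90% specificity, applied to a
# population in which only 5% of individuals would actually reoffend.

def posterior(prior, sensitivity, specificity):
    """P(event | positive flag) computed via Bayes' rule."""
    true_positives = sensitivity * prior
    false_positives = (1 - specificity) * (1 - prior)
    return true_positives / (true_positives + false_positives)

p = posterior(prior=0.05, sensitivity=0.90, specificity=0.90)
# Despite the "90% accurate" model, roughly two-thirds of flagged
# individuals would NOT reoffend, because the base rate is low.
print(round(p, 2))  # prints 0.32
```

Ignoring the 5 percent base rate (reading the flag as “90 percent certain”) is exactly the fallacy; a model that conditions on the base rate avoids it, which is the narrow sense in which such systems “do not commit the base rate fallacy.”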

At the scale of public policy, the impact of predictive models on the population as a whole can be assessed, with errors straightforwardly compensated for by improvements over other predictive tools. At the scale of decision-making about individuals’ lives, a mistaken prediction cannot be likewise compensated for. This difference is one reason for long-standing practices of due process in the legal system that go far beyond merely applying globally optimal probabilistic tools. At the population level, the performance of a model over time can be validated through established statistical methods. At the individual level, there is no ground truth against which to validate any particular intervention: if a defendant is not released pretrial, it is not possible to determine whether they would have been rearrested had they been released. This fact does not imply that such models should not be developed, but rather that, if they are developed or deployed at all, computing researchers, developers, and deployers should proceed with full knowledge of the significant moral risks associated with predicting individual behavior in uses that make high-stakes decisions about people’s lives; researchers should therefore clearly delineate the limitations of the capabilities and the intended scope of applicability of their methods and systems.

Second, computing systems are also being used to predict human behavior in order to calibrate interventions that are then used to shape that behavior.
This can range from supercharging “nudging” to take account of particular characteristics of users,9 to designing recommender systems aiming to optimize user engagement, or advertising systems focused on increasing click through rate.10 Actors in society have always used any means available to shape others’ behavior; the concern with advances in computing research, combined with our increasing dependence on computing systems, is that the means now available for this purpose are much more effective, more pervasive, and more readily available. Even if computing systems are relatively ineffective at manipulating any particular individual, they are demonstrably effective at a kind of stochastic manipulation, whereby populations are moved by modestly influencing the behavior of their constituents.11 That computing systems may be used to make high stakes, highly sensitive and highly risky predictions about individuals’ lives, and their potential use to calibrate digital interventions intended to shape people's behavior are concerns not only for system development or deployment. If computing researchers become aware that their work could potentially be used to predict human behavior for these purposes, they need to consider the ethical ramifications of such uses and consider the context in which their research is done and the potential implications for society, groups, or individuals, and the values they are trading off. In many cases, it will be vital to build in robust procedural protections for those affected, because there is no ground truth against which to validate any particular intervention.12 Ensuring that their predictive models of human behavior are appropriately auditable as well as interpretable by users is a challenge computing researchers need to address. 
(See Section 3.2.4, “Understanding Behavior of Opaque Systems.”) These issues reflect not only the importance of considering a range of ethical values and trade-offs among them, as well as the full range of potential stakeholders, but also the need for multidisciplinary expertise and for researchers to make clear the intended uses and the limitations of their research outcomes.

9 Yeung, K. 2017. “‘Hypernudge’: Big Data as Regulation by Design.” Information, Communication and Society 20/1:118–136.
10 Graepel, T., et al. 2010. “Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine.” In Proceedings of the 27th International Conference on Machine Learning (ICML-10), J. Fürnkranz and T. Joachims (eds.), Haifa, Israel: Omnipress. Pp. 13–20.
11 Benn, C., and S. Lazar. 2021. “What’s Wrong with Automated Influence.” Canadian Journal of Philosophy, Forthcoming.
12 Citron, D.K. 2008. “Technological Due Process.” Washington University Law Review 85/6:1249–1314.

PREPUBLICATION COPY – SUBJECT TO FURTHER EDITORIAL CORRECTION 33

3.1.4 Proper Roles for Technologies in Addressing Societal Problems

Technologies can help address social problems, but technological approaches cannot solve societal problems on their own, as is evident from the discussion of sociotechnical systems (Section 2.2). There are numerous examples of problems caused by unconstrained technological solutionism—the default tendency to pursue purely technological solutions to societal problems. Ben Green’s book The Smart Enough City13 describes how some of these difficulties14 play out in the setting of urban planning; similar challenges arise in many other settings.15 In designing technological interventions, it is important to distinguish between approaches that address the symptoms of societal problems and approaches that address their root causes. Consider, for example, the rapid growth of legal automation and arbitration software, which aims to address the radical under-provision of court time and affordable legal representation for resolving private disputes in the United States, or the development of legal aid software aimed at simplifying complex legal documents for the large number of individuals who cannot afford a lawyer. Computing may be able to help address these social problems, but to do so it needs to be designed to support people in roles that address those problems. Even so, there are deeper underlying problems beyond the sphere of computing research—the litigiousness of a society, the large volume of small-claims disputes, and difficulties interpreting legal documents—that need to be addressed. Computational approaches rely on abstracting problems to make them formally representable and mathematically tractable. This process of abstraction is both a crucial source of power in the design of solutions to computational problems and a significant source of risk for societal problems, as models can omit details that ultimately prove crucial or unintentionally focus on symptoms rather than underlying causes.
Here too there are numerous examples of models that produce disastrous results because crucial domain specifics were omitted. To take another example cited in Ben Green’s writing, late 19th-century foresters in Germany replaced ancient forests with managed alternatives, mathematically optimized for maximum timber yield per acre. The models on which the optimization was based were irremediably flawed and failed to represent the importance of biodiversity in sustaining forest ecosystems; after two generations the forests died and could not be revived.16 This example is also a reminder that one must monitor impacts and adjust in accordance with such feedback. These examples illustrate that for computing research to support the development of computing technologies that address societal problems, it needs to take into account the social contexts in which the models, methods, or artifacts it produces might be deployed. Addressing societal problems adequately and ethically for all potentially affected populations requires that all potential stakeholder populations be included in design decisions (whether about algorithms or about computing systems). The examples thus also show the importance of research that aims to address societal problems being multidisciplinary from the start.

3.1.5 Aligning with Existing Norms, Structures, and Practices

Social relationships may be mediated by various technologies, as Chapter 2 describes. When computer science researchers are thinking about developing new systems, they should work with social scientists and stakeholders to understand the relevant existing social norms, institutional structures, and practices. Institutional structures and existing norms can be buttressed or challenged by novel computing technologies. New technologies can alter power structures, shift workloads, change compensation

13 Green, B. 2019. The Smart Enough City: Putting Technology in Its Place to Reclaim Our Urban Future, MIT Press.
14 Morozov, E. 2013. To Save Everything, Click Here: The Folly of Technological Solutionism, New York: PublicAffairs.
15 See, e.g., Brayne, S. 2021. Predict and Surveil: Data, Discretion, and the Future of Policing, Oxford University Press.
16 Scott, J.C. 2008. Seeing Like a State. Yale University Press.

incentives, and affect the nature of work. Done well, the integration of new technologies can improve structures and practices; done poorly, it can obstruct them. In many domains, including health care, work and labor, and justice, the introduction of new technologies can affect equity and inclusion. Examples of computing reifying existing power structures are prevalent in discussions of work and labor. Employees might be surveilled while the information gathered is shared only among managers (who are not themselves surveilled). This can give managers knowledge that workers do not have. Karen Levy’s remarks to the committee revealed that data-intensive technology in the workplace can be used to transfer burdens that have generally been borne by managers and owners onto workers. Levy observed that technologies that are ostensibly enacting efficiencies place extra burdens on workers. A bevy of other examples cited by Levy illustrate how social and economic relationships are being reorganized by computing technologies: Amazon used to compensate authors who published on its platform based on how many people downloaded a title, but then switched to compensating authors only for the number of pages read. (In music, an analogous shift has altered the type of cultural products that are produced, moving away from albums and toward individual songs.) Lastly, Levy discussed how computing technology has recently been deployed to monitor the tone of call center workers, and how such monitoring can be overbearing and create an unproductive and inhumane psychological burden on those workers. In these examples, computing technologies shape the social relationship between manager and employee.17 One area where computing technologies have not easily integrated with existing institutional structures and norms is health care.
Perhaps the most familiar example is electronic health records (EHRs), which have had slow uptake and required enormous amounts of funding to deploy widely. In-home monitoring is another, emerging use of computing technology, propelled in part by the health-conscious patients who have been among its early adopters. Examples include digital watches, digital scales, and digital glucometers. As Robert Wachter explained in his remarks to the committee, doctors increasingly can monitor their patients between visits, making a great deal more information available that can be used to enhance treatment. These technologies allow the reach of health care to be extended from hospitals and medical offices to patients’ homes. However, this development raises questions about whether patients and their families will choose this type of health care relationship.18 In her remarks to the committee, Abi Pitts described her work during the COVID-19 pandemic with children in foster care and other precarious situations. The move to telemedicine was challenging because vulnerable patients did not always have access to Internet-connected devices, yet these technologies were integral to the main mode of care delivery. Further complicating the challenge of serving patients were the requirements of electronic health record systems and other medical technologies that minors in need of health care have stable and caring parents or guardians to serve as points of contact and to provide the required consent. The move to telemedicine resurfaced the need for more flexible approaches to adult oversight than the available technology platforms allowed.19 Madeleine Claire Elish’s presentation to the committee on the SepsisWatch project illustrated just how much careful work and planning is involved in successfully integrating a new technology into a hospital setting.
She discussed how a successful deployment of new computing technologies can be contingent on changes to existing social and institutional structures. The effective integration of the tool into hospital workflows was not possible until the expert knowledge of nurses was reflected in new practices and procedures.20 These observations about computing technology in work and health care settings exemplify the importance, for sociotechnical systems design, of including all potential stakeholders from the beginning of computing technology design and of working in multidisciplinary teams. They also illustrate the limitations of

17 Karen Levy, Cornell University, presentation to the committee on March 11, 2021.
18 Robert Wachter, University of California, San Francisco, presentation to the committee on March 16, 2021.
19 Maryann Abiodun (Abi) Pitts, Stanford University School of Medicine / Santa Clara Valley Medical Center, presentation to the committee on May 11, 2021.
20 Madeleine Claire Elish, Google, Inc., presentation to the committee on March 16, 2021.

computing research’s abilities to ensure responsible computing technologies and the resultant need for informed corporate policy and government regulation, to which the expertise of computing researchers can contribute. Recommendation 8 indicates ways in which researchers can contribute to the development of such policies and regulations.

3.1.6 Addressing Environmental Externalities

Computing research has generally not focused on environmental externalities, but in recent years attention has turned to the energy and materials use of all computing systems. To address environmental externalities, computing researchers will need to broaden how they view the impacts of computing technologies and consider the wide range of ethical values and trade-offs these externalities raise. Significant strides have been made to enhance the energy efficiency of data centers and other infrastructure, drawing on computing research on more energy-efficient components, architectures, and power and cooling designs, as well as on the use of renewable energy sources. At the same time, the energy consumption of this infrastructure continues to expand with growing demand for computing and storage. Battery constraints have propelled significant improvements in the power performance of laptops, smart phones, and embedded devices, as well as of specific algorithms, architectures, and system designs. Yet not all computing research takes energy externalities into account, even research involving systems, such as Bitcoin, that are known to be extremely power-hungry. Computers require a wide array of materials, some of which are extracted only at great cost in energy or come from mines and processing facilities that may pollute, provide poor working conditions, employ child labor, or fuel conflict. Commercial technology producers have few incentives to design and sell objects that are made to last, are recyclable, or can easily be repaired.
The result is even more demand for materials, as well as pollution from discarded items. Right-to-repair movements have blossomed from critiques of this mode of commerce, and some computing researchers have suggested new paradigms for technology design that move away from “disposable” technology, enable longer life cycles and repair, and use sustainable materials. Researchers in fields such as HCI have also challenged their colleagues to consider the use of materials and the opportunities for reuse and recycling in processes such as prototyping and design. Computing in a world of limited resources also raises questions about what computing capacity is necessary or socially desirable. Should every user’s click on a random website be saved for decades? Are the benefits of power-hungry proof-of-work blockchain systems such as Bitcoin worth the environmental price? Is it worth making devices harder to repair in order to make them thinner or lighter? Although a source of the unsustainable consumption that drives climate change, computing is also a key tool for understanding and managing that change. Climate analysis relies on enormous data sets and complex computational models, and computing is an important tool for optimizing and managing energy-consuming activities in order to reduce their carbon footprint.

3.1.7 Avoiding Computing-Related Extreme Events

Computing technologies can be associated with scenarios involving risks of extreme events—destructive failures and similar outcomes that are massively costly for society. At present, these events are most likely to arise as a result of misuse (intended or not), failure, or unexpected properties that emerge from large, complex networks. Examples familiar from recent headlines include ransomware attacks that shut down courts or hospitals for days or weeks, computer failures causing the electric grid to fail for an extended period, and flash crashes of stock markets.
An example of a potential extreme risk would be a computing malfunction or cyberattack affecting the command and control of nuclear weapons. These examples are a reminder that computing researchers should not dismiss the possibility of catastrophic harms that may result from the imperfections or emergent and unanticipated properties of

sociotechnical systems. By educating government policy makers and monitoring uses of computing technologies, computing researchers can help society better manage these risks. Although current computing researchers are not responsible for the continued use of obsolete computing technology (e.g., in air traffic control or nuclear weapons command and control), they can contribute to the much-needed development of better risk assessment tools going forward. Given that the risks depend on the whole sociotechnical system (humans and computers), further research and analysis will bring greater precision to the assessment of risks involving computing technologies. There is also some possibility that certain extreme risks will arise not only from the deployment of computing innovations but also from the research process itself. Although in the end the Morris worm21 was not so destructive, its origin as a research project is a reminder that research may inadvertently cause harm, a possibility that the continuous integration of research results into deployed systems makes more likely.

3.1.8 Influences of Social Structures and Computing Pedagogy

This section examines two organizational facets of computing research of which responsible computing research requires computing researchers and the computing research community to be aware.

Advancing Diversity, Equity, and Inclusion

For computing research to be responsible requires taking into account the influence of a variety of organizational and social structures on decisions made in carrying it out. Two such structural influences on computing research raise particularly significant ethical and societal impact issues: the lack of diversity of the computing research community and the lack of inclusion of affected populations in the design, development, and testing of computing research and the artifacts it produces.
Insufficient awareness of and lack of attention to systemic racism and sexism—which include the implicit perpetuation of biases that were once explicit and blatant—have raised and continue to raise ethical and societal impact concerns.22 Computing researchers and the computing research community at large cannot by themselves change these structures. Redress requires action at every level, from society and government to individual researchers, and much collaborative work. The discussion in this section highlights issues of which the computing research community needs to be aware so that it can adjust its processes of research, system design, and deployment to mitigate harms. Diversity has many different dimensions, including race, gender, ethnicity, geography (e.g., the Global South is often excluded from decisions about computing technologies), and cognitive and physical capabilities. Although questions may remain about the extent to which various types of diversity contribute to responsible computing research, diversity is essential to better problem solving and design.23 The negative impacts of a lack of inclusiveness have also become clear now that broad swaths of computing research and its applications increasingly involve the collection and computational analysis of data about people and the use of that data. The biases that have been found in various face recognition and judicial sentencing systems are reminders of the problems that lack of inclusiveness in research design and lack of diversity on research teams can cause.

21 Federal Bureau of Investigation. “Morris Worm.” https://www.fbi.gov/history/famous-cases/morris-worm.
22 See, for example, Benjamin, R. 2019. Race After Technology: Abolitionist Tools for the New Jim Code. Cambridge, UK: Polity; Aspray, W. 2016. Women and Underrepresented Minorities in Computing: A Historical and Social Study. Springer; Abbate, J. 2012. Recoding Gender. MIT Press; and Misa, T., ed. 2010. Gender Codes: Why Women Are Leaving Computing. Wiley.
23 See, for example, Rock, D., and H. Grant. 2016. “Why Diverse Teams Are Smarter.” Harvard Business Review 4, no. 4:2–5; and Woolley, A.W., C.F. Chabris, A. Pentland, N. Hashmi, and T.W. Malone. 2010. “Evidence for a Collective Intelligence Factor in the Performance of Human Groups.” Science 330, no. 6004:686–688.

Computing research remains a field dominated by White men, despite various efforts—including the National Science Foundation’s significant investments in its Broadening Participation in Computing program—to diversify. Small, token representations of currently underrepresented groups do not work. Until people from a broader range of perspectives and backgrounds make up a significant portion of a group, their ideas and perspectives are likely to be frequently overlooked. This state of affairs affects both the pipeline and the challenge of retaining talent. The resulting computing research environment influences the perspectives that are brought to bear on any project, including what are understood to be the “best” or “appropriate” computing research priorities. The related problem of inclusion in relevant aspects of computing research design, implementation, and deployment reflects the fact that increased diversity on a research team is not by itself sufficient. Differences in race and gender are not simple differences, but structurally hierarchical ones that almost always yield differences in power and importance. A lack of attention to structural racism and sexism has led to many current problems of computing technology, problems that often have origins in a similar lack of attention in the computing research underlying those systems. For instance, a health risk assessment model based on hospital admission data, used to predict who would need a hospital bed, was developed using a data set that disproportionately favored people with insurance. This data set bias resulted from overlooking the fact that race plays a large role in who has insurance and can therefore afford to be hospitalized, and thus who will be admitted. As a result of overlooking this structural bias in the data, the model perpetuated an existing bias in its predictions.
In algorithmic screening of job candidates, it has proven extremely difficult to remove bias.24 Similarly, the design and use of predictive algorithms in pretrial release determinations or sentencing assessments must take into account structural factors that affect both the data used to train their models and how those tools are used. For instance, because one is more likely today to be arrested or stopped without cause if Black than if White, feeding such data into models used for future predictions about criminality will replicate racial bias.25 Inclusion, ensuring that working environments foster participation by and advancement of members of underrepresented and historically marginalized groups, is also important in design. For instance, it requires that computing research as well as technology development recognize that social inequities yield differences in access to technology from an array of sources, including lack of connectivity, app interface inaccessibilities (e.g., for those with vision or hearing challenges), and language and dialect variations that are not adequately addressed in language and speech processing technologies. Absent the inclusion of people whose abilities, perspectives, and experiences differ from researchers’ own, results are likely to perpetuate or exacerbate existing inequities. By contrast, efforts to counter structural ableism in widely used computing platforms have led to much improved human-computer interaction for all users.26 Diversity, equity, and inclusion issues arise at all three sociotechnical levels described in Section 2.2. Long-standing cultural prejudices typically underlie lack of diversity and inclusion, and market and scholarly incentives may influence the choices researchers make. Although structural responsibility does not preclude individual responsibility, social structures are often organized in such a way that another person in a similar position would make the same ethically problematic choice.
The social facts that engender organizational problems are typically generated over time, often through uncoordinated acts of many individuals and groups. Some responsibility for these problems thus rests in the social structures of the organizations within which researchers carry out their work. To bring about different outcomes requires organizational commitments to changing research environments.

24 Raghavan, M., and S. Barocas. 2019. “Challenges for Mitigating Bias in Algorithmic Hiring.” The Brookings Institution. https://www.brookings.edu/research/challenges-for-mitigating-bias-in-algorithmic-hiring/.
25 Renee Hutchins, University of the District of Columbia School of Law, presentation to the committee on March 4, 2021.
26 This opportunity was realized several decades ago. See National Research Council. 1997. More Than Screen Deep: Toward Every-Citizen Interfaces to the Nation’s Information Infrastructure. Washington, DC: National Academy Press. https://doi.org/10.17226/5780, p. 41.

Integrating Ethical and Societal Issues into Training

Computing researchers are commonly trained in computer science, computer engineering, or a related discipline. Typical curricula and research training in computer science and engineering still involve little or no exposure to the social and behavioral sciences. The absence of significant exposure to ethical and societal impact issues, and to methods for identifying and addressing them, is one possible explanation for computing researchers’ neglecting them in their research. Furthermore, and not surprisingly, most computer science researchers consider themselves unqualified to teach the ethical and societal implications of their work, and many consider doing so to be outside their sphere of responsibility.27 Some institutions have responded by adding an introductory course in technology and ethics. Such courses may be useful, but as is discussed below, there are both practical and pedagogical disadvantages to relying on a single course that is divorced from the core courses in computer science and engineering. If optional, the students who need it most may not take it; if required, it may be difficult to fit into an already crowded curriculum. A substantial literature on computing ethics has developed since Moor’s 1985 paper28 argued for it as an important emerging area of philosophical scholarship. Although computing ethics is an independent area of research in its own right,29,30 there is an ongoing need for more practically engaged and technically informed philosophical scholarship to help computing researchers better understand the ethical implications of their work. For responsible computing research, there is an urgent need for computing researchers to acknowledge that considering the societal and ethical implications of their work is an essential component of that work. This change of mindset, however, is only a first step.
To identify and address ethical and societal impact issues, computing researchers need their research projects to incorporate relevant methods and tools from a broader range of disciplines. Research organizations need to make structural changes in the ways they organize research efforts, including providing incentives for scholars and researchers in all relevant fields to engage in such efforts and modifying the way researchers are evaluated for promotion and tenure. The ways in which to effect such multidisciplinary efforts depend on career stage. Established researchers cannot be expected to become expert in other fields, but they can benefit from acquiring sufficient knowledge of the approaches of those fields to identify those with whom they should collaborate on a given project. Recommendations 3.2 and 3.3 in Chapter 4 include several possibilities for acquiring competencies in working with scholars in the humanities and social sciences. Students (undergraduate, graduate, and postdoctoral trainees) need broader educations than the heretofore standard ones in computer science and engineering. Neither the need to broaden the computer science curriculum to encompass ethics and consideration of societal impact nor the challenges of doing so are new,31 but the need has become more urgent. The burgeoning interest in ethics and societal impact among students makes this a propitious time for introducing such changes to the curriculum.

27 Ashurst, C., E. Hine, P. Sedille, and A. Carlier. 2021. “AI Ethics Statements—Analysis and Lessons Learnt from NeurIPS Broader Impact Statements.” arXiv preprint arXiv:2111.01705; Hutson, M., and J. Seabrook. 2021. “Who Should Stop Unethical A.I.?” The New Yorker. https://www.newyorker.com/tech/annals-of-technology/who-should-stop-unethical-ai.
28 Moor, J.H. 1985. “What Is Computer Ethics?” Metaphilosophy 16, no. 4:266–275. https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1467-9973.1985.tb00173.x.
29 Stahl, B.C., J. Timmermans, and B.D. Mittelstadt. 2016. “The Ethics of Computing: A Survey of the Computing-Oriented Literature.” ACM Computing Surveys (CSUR) 48, no. 4:1–38. https://dl.acm.org/doi/pdf/10.1145/2871196.
30 Pennock, R.T., and M. O’Rourke. 2017. “Developing a Scientific Virtue-Based Approach to Science Ethics Training.” Science and Engineering Ethics. https://link.springer.com/content/pdf/10.1007/s11948-016-9757-2.pdf.
31 Miller, K. 1988. “Integrating Computer Ethics into the Computer Science Curriculum.” Computer Science Education 1, no. 1:37–52. doi: 10.1080/0899340880010104.

Internships along with course work can help students learn the importance of disciplinary diversity among the people taking part in the computing design process. Computing researchers, working in collaboration with colleagues in the social and behavioral sciences as well as ethics, need to apply the same creativity and discipline to this part of the work as to design, development, and analysis. As Eden Medina said in her remarks to the committee, “Indeed, the kinds of quandaries that we see today often cannot be simplified in terms of right or wrong, which is how students sometimes view ethics. In fact, what we often see are people who are acting with good intentions, or who believe deeply in their work, but they don’t understand the full implications of their work and from one perspective, how could they possibly, especially since so much about computing is framed as new.”32 Furthermore, if it is to have the desired effects, the teaching of the ethical and societal implications of computing research and technology development needs to be integrated into computing courses across the curriculum and not offered only in siloed, “one-off” classes.33 Various projects funded by the Responsible Computer Science Challenge,34 which are multidisciplinary efforts, have pioneered a variety of promising approaches.35 The academic institutions funded by this challenge are both public and private, and they are diverse in size, student populations, and resources. Their approaches and others’ address a number of challenges to incorporating ethics into computer science and engineering curricula, including scaling across the curriculum36 and providing the requisite expertise in relevant ethics and social science disciplines without expecting computer science and engineering faculty to develop it. These projects have also identified a variety of ways of reducing barriers and incentivizing participation.
The embedding of ethics into existing courses has four important features:

1. It directly ties ethical and societal impact thinking to course material, so that students learn that these considerations are part of computing research and technology development work, experience embodying ethical thinking in the various seemingly mundane decisions of their computing practices, and learn these skills in the same integrated way as technical computing skills;37
2. Students who are not computing majors but take computer science and engineering courses, and who may go on to careers for which an understanding of computing, computing research, and their ethical implications is important (e.g., in government or corporate management), also acquire the requisite knowledge and reasoning skills;
3. The integration and distribution throughout the curriculum address failure points identified in stand-alone ethics courses in engineering, and hold greater promise of changing the culture than stand-alone courses; and

32 Eden Medina, Massachusetts Institute of Technology, presentation to the committee on May 25, 2021.
33 Grosz, B., D.G. Grant, K. Vredenburgh, J. Behrends, L. Hu, A. Simmons, and J. Waldo. 2019. “Embedded EthiCS: Integrating Ethics Across CS Education.” Communications of the ACM 62, no. 8:54–61. https://doi.org/10.1145/3330794.
34 “Responsible Computer Science Challenge.” Mozilla. https://foundation.mozilla.org/en/what-we-fund/awards/responsible-computer-science-challenge/.
35 “What Our Tech Ethics Crisis Says About the State of Computer Science Education.” 2021. How We Get to Next. https://www.howwegettonext.com/what-our-tech-ethics-crisis-says-about-the-state-of-computer-science-education/.
36 Shapiro, B.R., E. Lovegall, A. Meng, J. Borenstein, and E. Zegura. 2021. “Using Role-Play to Scale the Integration of Ethics Across the Computer Science Curriculum.” In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, pp. 1034–1040; Cohen, L., H. Precel, H. Triedman, and K. Fisler. 2021. “A New Model for Weaving Responsible Computing into Courses Across the CS Curriculum.” In Proceedings of the 52nd ACM Technical Symposium on Computer Science Education, pp. 858–864.
37 Bezuidenhout, L., and E. Ratti. 2021. “What Does It Mean to Embed Ethics in Data Science? An Integrative Approach Based on Microethics and Virtues.” AI and Society 36, no. 3:939–953.

4. It does not require adding course requirements to computer science and engineering majors, whose requirements are often already overfilled.

3.2 LIMITATIONS OF HUMAN CAPABILITIES

Various characteristics of the physical and social worlds in which computing systems operate can be incompatible with the limits of human capabilities for designing and understanding these systems. This section describes four arenas in which interactions of people with computing systems raise significant ethical and societal impact challenges. The first focuses on the kinds of situations in which computing systems now often operate; the other three focus on aspects of human capabilities for interacting or working with computing systems. The discussion and examples presented in the subsections show that responsible computing research requires computing researchers to be transparent about the intended use situations for the computing methods and artifacts their research produces, the limitations in their power and applicability, the assumptions about people’s capabilities on which their performance rests, and the range of situations in which they have been tested. These responsibilities are reflected in Recommendation 6 (especially 6.5) and Recommendation 7 (most notably 7.2).

3.2.1 Designing for Open Worlds

Computing systems are increasingly situated in the physical and social worlds, with all of their complexities and interactions. Invariably, computing researchers will have limited knowledge of the situations in which a system will operate, because computing systems are now typically used in “open worlds” rather than in closed, well-defined environments with a limited set of (usually trained) users. As a result, there is always the possibility of the system encountering an unexpected situation, or of additional information arising that might affect its behavior, possibly in unintended ways.
Examples of different types of open worlds problems include the following:
• Usage outside the anticipated physical conditions, environments, or performance limits;
• Deployment in systems where unexpected interoperability issues with other applications or systems could arise;
• Use by users beyond those for whom the system was designed, invalidating assumptions about their understanding of, and motivations for, its use (e.g., cookies);
• Usage in a wider set of use cases beyond those for which it was designed;
• Usage in open and/or unregulated environments (as opposed to controlled or regulated environments); and
• Adversarial exploitation.
An example of the first type of problem is the unanticipated conditions encountered by automated systems in automobiles. Reviewing a 2018 incident involving Tesla’s Autopilot technology, the National Transportation Safety Board (NTSB) found that the driver “over-relied” on the Autopilot system—the system, described as partially automated, was used as though it were fully automated. The NTSB recommended that automobile manufacturers “limit the use of automated systems to the conditions for which they were designed and … better monitor drivers to make sure they remain focused on the road and have their hands on the wheel.”38 Similar concerns had been raised in an earlier NTSB report on a 2016 crash between another Tesla and a tractor-semitrailer truck; the report also found
38 Chokshi, N. 2020. “Tesla Autopilot Had Role In ’18 Crash,” New York Times, Feb 25, 2020, p. B4.

that the car’s automated control system “was not designed to, and did not, identify the truck crossing the car’s path.”39
An example of the second type stemmed from the use of Bluetooth, a widely used short-range wireless data communications standard, including in medical devices. A family of vulnerabilities discovered by computer security researchers and known as “SweynTooth” was traced to faulty implementations of the Bluetooth Low Energy protocol, whereby a malformed data packet could cause security breaches and other adverse impacts on the receiving device.40 When such devices were deployed in the real world, the result was deadlocks, crashes, unpredictable behavior, and the potential for security breaches. Researchers attributed many of these flaws to inadequate specification of edge cases, such as the handling of partial packets, and to inadequate testing of the Bluetooth stack in the certification process. As Kevin Fu observed in his remarks to the committee, the Bluetooth protocols in question were used in a wide array of medical devices.41 A large number of hospitals and medical offices had to be notified about the need to apply patches to protect patient safety and device effectiveness. Such “open worlds problems” are relevant to computing researchers whether they are developing systems for real-world deployment or developing methods that others may use for such purposes.

3.2.2 Confronting Cognitive Complexity of Oversight

A frequently suggested approach for handling situations in which computing technologies may err is human oversight, recommended, for example, to compensate for the biases and limitations of algorithms in criminal justice, work and labor, and health care. This recommendation often takes the form of requiring a human “in the loop” (i.e., engaged in decision-making) or “on the loop” (i.e., monitoring the decision-making).
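The in-the-loop/on-the-loop distinction can be sketched in code. The following is a hypothetical illustration only; the function names, the 0.7 risk threshold, and the “flag”/“clear”/“escalate” labels are invented here and are not drawn from any system discussed in this report:

```python
# Hypothetical sketch of the two oversight patterns distinguished in the text:
# "in the loop" (a person approves every decision before it takes effect) vs.
# "on the loop" (the system acts on its own; a person monitors in parallel).
# All names and the 0.7 threshold are invented for illustration.

def automated_decision(case):
    return "flag" if case["risk_score"] > 0.7 else "clear"

def in_the_loop(case, human_review):
    # The human is part of decision-making: nothing happens without approval.
    proposal = automated_decision(case)
    return proposal if human_review(case, proposal) else "escalate"

def on_the_loop(case, monitor):
    # The system decides and acts; the human monitor observes the decision
    # rather than gating it.
    decision = automated_decision(case)
    monitor(case, decision)
    return decision

approve_all = lambda case, proposal: True   # an inattentive reviewer
log_only = lambda case, decision: None      # a passive monitor

case = {"risk_score": 0.9}
print(in_the_loop(case, approve_all), on_the_loop(case, log_only))
# -> flag flag
```

Note that with an inattentive reviewer the two patterns produce identical outcomes, which is one way of seeing why nominal oversight alone provides little protection.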
Unfortunately, the burden such oversight places on the people providing it is enormous, and sometimes impossible to meet. Automated planes provide an interesting and provocative example of how this is so. Captain Chesley Sullenberger correctly noted in a recent interview with Wired that “it requires much more training and experience, not less, to fly highly automated planes.” The article goes on to observe that “[p]ilots must have a mental model of both the aircraft and its primary systems, as well as how the flight automation works.”42 In a presentation to the committee on computing and civil justice, Ben Green noted that “the vast majority of empirical evidence suggests that people are unable to play the types of roles that … human oversight and quality control policies imagine.” He further argued that “rather than protect against the potential harms of algorithmic decision-making in government, human oversight policies provide a false sense of security in adopting algorithms and enable vendors and agencies to shirk accountability for algorithmic harms.”43 Many tasks can be done better by human-computer teams than by a person alone or a system alone.44 The design of systems for such situations must, however, consider human capabilities from the
39 National Transportation Safety Board. 2017. “Collision Between a Car Operating with Automated Vehicle Control Systems and a Tractor-Semitrailer Truck Near Williston, Florida, May 7, 2016,” Accident Report NTSB/HAR-17/02-PB2017-102600, Adopted Sept. 12, 2017. https://www.ntsb.gov/investigations/AccidentReports/Reports/HAR1702.pdf.
40 Garbelini, M.E., C. Wang, S. Chattopadhyay, S. Sumei, and E. Kurniawan. 2020. SweynTooth: Unleashing Mayhem over Bluetooth Low Energy, USENIX Annual Technical Conference, July, pp. 911–925. https://www.usenix.org/conference/atc20/presentation/garbelini.
41 Kevin Fu, U.S. Food and Drug Administration, presentation to the committee on May 11, 2021.
42 Malmquist, S. and R.
Rapoport. 2021. “The Plane Paradox: More Automation Should Mean More Training.” Wired. https://www.wired.com/story/opinion-the-plane-paradox-more-automation-should-mean-more-training/.
43 https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3921216.
44 Grosz, B. 1994. https://aaai.org/Library/President/Grosz.pdf; Kamar, E., S. Hacker, and E. Horvitz. Combining Human and Machine Intelligence in Large-Scale Crowdsourcing. In Proceedings of AAMAS 2012; Wilder, B., E. Horvitz, and E. Kamar. Learning to Complement Humans, IJCAI 2020.

start, and assign to both the person and the system only tasks each is known to be able to handle correctly. Most often, doing so will require interdisciplinary work.

3.2.3 Managing Pro-Automation Bias and Automation Aversion

Automation bias and aversion are predictable features of how humans use computing systems, and the conditions in which they occur have been heavily studied. Automation bias refers to people’s tendency to defer to (automated) computing systems, leading them to disregard potentially countervailing possibilities or evidence or to fail to pursue them.45 It has been identified in many sectors, including aviation46 and medical care,47 as well as in the use of computing systems to support the administration of such government functions as welfare, healthcare, and housing48 and in the criminal justice system.49 Renee Hutchins spoke to this point in the context of predictive assessments in the criminal justice system: “We love easy fixes to really complex problems.”50 Significant ethical and societal impact concerns have arisen with the increased reliance on such technology. For example, tragedy has resulted when too much trust has been placed in a semi-autonomous vehicle dubbed an autopilot, and individuals have suffered harms from misplaced reliance on predictions from AI systems developed on biased data. In the other direction, undertrust by a user can also have deadly impacts, as when people disable sensor systems that provide warnings because of too many false alarms. As data-intensive applications become more fully integrated into people’s day-to-day activities, most users may not initially recognize the risks and profound consequences associated with handing over decisions to an intelligent agent. Automation bias is caused by many different factors.
Some of these involve predictable fault on the part of the humans using the system: they choose the path of least cognitive resistance, deferring to the automated system because doing so is easier than verifying its recommendations; or their role requires oversight of the automated system and their attention wanders, so that they are not properly overseeing it51 (a pattern more accurately described as “automation complacency”52). Others have to do with the particular affordances of the system and do not imply human fault. For example, decision support systems that represent their outputs as being highly precise are more likely to be assumed authoritative than if they
45 Parasuraman, R. and V. Riley. 1997. “Humans and Automation: Use, Misuse, Disuse, Abuse.” https://journals.sagepub.com/doi/10.1518/001872097778543886.
46 Cummings, M.L. 2004. “Human Supervisory Control of Swarming Networks.” Massachusetts Institute of Technology. https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.470.5969&rep=rep1&type=pdf.
47 Lyell, D. and E. Coiera. 2017. Automation bias and verification complexity: A systematic review. Journal of the American Medical Informatics Association Mar 1;24(2):423–431; Wachter, R.M. 2015. “The Digital Doctor: Hope, Hype, and Harm at the Dawn of Medicine’s Computer Age.” University of California. https://www.hqinstitute.org/sites/main/files/file-attachments/the_digital_doctor_wachter.pdf.
48 Citron, D.K. 2008. “Technological Due Process.” Washington University Law Review. Vol. 85, Issue 6. https://openscholarship.wustl.edu/cgi/viewcontent.cgi?article=1166&context=law_lawreview.
49 Freeman, K. 2016. “Algorithmic Injustice: How the Wisconsin Supreme Court Failed to Protect Due Process Rights in State v. Loomis.” North Carolina Journal of Law and Technology. Vol. 18, Issue 5. https://scholarship.law.unc.edu/cgi/viewcontent.cgi?article=1332&context=ncjolt.
50 Renee Hutchins, University of the District of Columbia School of Law, presentation to the committee on March 4, 2021.
51 Bainbridge, L. 1983. “Ironies of Automation.” Analysis, Design, and Evaluation of Man-Machine Systems. Proceedings of the IFAC/IFIP/IFORS/IEA Conference, Baden-Baden, Federal Republic of Germany, 27–29 September 1982. Pages 129–135.
52 Parasuraman, R. and D.H. Manzey. 2010. “Complacency and Bias in Human Use of Automation: An Attentional Integration.” Human Factors: The Journal of the Human Factors and Ergonomics Society.

explicitly represent model uncertainty.53 Moreover, if a decision support system has made a particular recommendation in a high-stakes context, such as a decision over pretrial detention or parole, then the human decision-maker knows that, should they overrule the automated system and should its judgment ultimately be vindicated, they are likely to be held accountable. If they instead defer to the automated system, they can at least share, if not entirely pass on, responsibility. Furthermore, for highly complex decisions, it is understandable that human decision-makers should defer to automated systems, on the grounds that the computer is more likely than they are to have assimilated all of the relevant information. People may also exhibit algorithm or automation aversion: an unwillingness to use reliable algorithms, particularly after seeing the system make a seeming mistake.54 Algorithm aversion is more likely when people cannot exert any control over the algorithm’s functioning, when the decisions concern matters on which they perceive themselves to be (relative) experts, or when they cannot understand the reasons for a system’s decisions or actions. Algorithm aversion can lead to ethically and socially problematic outcomes when useful information or guidance is rejected. At the same time, algorithm aversion typically leads to maintenance of the (potentially problematic) status quo, rather than to the creation of novel challenges as in the case of automation bias.

3.2.4 Understanding Behavior of Opaque Systems

The inadvertent misuse of any computing system is one consequence of people not understanding the limitations of a system, the reasons it chooses certain actions, or the rationale behind its recommendations. Presentations to the committee in the domains of health care, work and labor, and justice revealed many situations in which such problems arose.
In some cases, more robust training of users will suffice, with training materials that are transparent about the intended use of the systems, the advanced methods they embed, and the limitations of those capabilities. There are also some methods for building inherently interpretable models. However, most data-intensive AI applications are essentially opaque, black-box systems, and new system capabilities are needed for users to be able to understand the decisions made by the algorithms and their potential impacts on individuals and society. AI researchers are well aware of these challenges and have begun to explore potential solutions. In particular, research on explainable and interpretable systems is attempting to make it possible for people who do not have access to a predictive model’s internal processes to understand the decisions made by such systems well enough to determine their trustworthiness and their limitations.55 The goal of explainability, as yet largely unmet, is to provide people with the information they need to understand the reasoning behind a system’s decisions. Interpretable machine learning systems can explain their outputs. For instance, one approach taken toward developing such systems is to consider the degree to which a person can consistently predict the outcomes of such a computing system; the higher the interpretability, the easier it is for a user to comprehend why certain decisions or
53 Bhatt, U., J. Antorán, Y. Zhang, Q. Vera Liao, P. Sattigeri, R. Fogliato, G. Gauthier Melançon, R. Krishnan, J. Stanley, O. Tickoo, L. Nachman, R. Chunara, M. Srikumar, A. Weller, and A. Xiang. 2020. Last revised 2021. “Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty.” https://arxiv.org/abs/2011.07586.
54 Burton, J.W., M.-K. Stein, and T. Blegind Jensen. 2020.
“A systematic review of algorithm aversion in augmented decision making.” https://onlinelibrary.wiley.com/doi/abs/10.1002/bdm.2155; Dietvorst, B.J., J.P. Simmons, and C. Massey. 2015. “Algorithm Aversion: People Erroneously Avoid Algorithms After Seeing Them Err.” Journal of Experimental Psychology: General. Vol. 144, Issue 1; and Prahl, A. and L. van Swol. 2017. “Understanding algorithm aversion: When is advice from automation discounted?” Journal of Forecasting. doi: 10.1002/for.2464.
55 Kim, B., and Doshi-Velez, F. 2021. “Machine Learning Techniques for Accountability.” AI Magazine 42(1), 47–52. https://ojs.aaai.org/index.php/aimagazine/article/view/7481; and Doshi-Velez, F. and B. Kim. 2017. “Towards a Rigorous Science of Interpretable Machine Learning.” arXiv:1702.08608 [stat.ML].

predictions have been made.56 Computing research has only begun to address the need for transparency of these systems.

3.3 SOCIETAL CONTEXTS AND DESIGN AND DEPLOYMENT CHOICES

Many of the adverse ethical and societal impacts of computing technology described in the preceding sections result from choices made during the design or deployment of a technology. Some such choices are inherited from the research on which later stages of technology design are based. The first subsection discusses several such choices at the design stage, indicating the responsibilities researchers have toward avoiding such consequences and, again, the importance of multidisciplinary collaborations. It also describes challenges of multidisciplinary work and the responsibilities of research sponsors and research institutions toward enabling these kinds of efforts; Recommendations 2, 3, and 4 include specific steps these organizations should take. The second subsection describes ways that researchers and the research community can help those deploying or adopting technologies based on their research make wiser decisions. Practical steps toward these ends are provided by Recommendation 7.

3.3.1 Ideation and Design Stage

Failure to consider a full range of consequences early in the process of computing research increases the risk of adverse ethical or societal impacts, because researchers have less time to find them and because processes long under way (with a variety of investments in the research protocol already undertaken) are more difficult to reform or stop. In addition, even where researchers do manage to consider the right elements late in the research process and are willing to make necessary changes, the costs—intellectual as well as financial—will tend to be higher where the existing research program needs to be drastically changed or even scrapped.
Scholarship in the field of design (see Section 3.4.4) has developed theory and methods that enable principled consideration of potential consequences and envisioning of alternatives in the design space. This work provides an important foundation for addressing the challenges described in this subsection.

Specifying Intended Functions and Uses of Research and Systems

Computing research is often misdescribed or, even worse, offered with no clear explanation of the appropriate use, function, or domain. Insufficient description is often unintentional, as the researcher herself may not know exactly what application problems the research could assist in addressing. It may even result from good intentions, as the researcher may be trying to avoid biasing others about how the research might be used. Regardless, insufficient descriptions have the potential to lead to adverse behaviors of computing systems that incorporate research outcomes. For example, large language models built to support chatbots, voice assistants, and other language-centric systems are often described as “learning a language” rather than “learning large-scale statistics of word co-occurrence.” These misdescriptions may lead to inappropriate research and deployment uses of these language models. For instance, dynamic employee scheduling software is often described as “empowering employees” but all too frequently is researched, designed, and developed to empower employers.57 A related problem arises with rule-based decision systems that are described as aiming to “translate the law into code,” when
56 These terms are used inconsistently in the literature. This explanation differentiates them to emphasize that interpretability is but one approach to explanation.
57 Min Kyung Lee, University of Texas, presentation to the committee on March 11, 2021.

they actually “translate particular legal texts into rules with similar domain.”58 The difference between these goals can be quite important to a potential defendant. The failure to appropriately specify the intended functions could lead future researchers or deployers to misunderstand the intended functions and uses of the original research, and the contexts and scopes for which it was developed, and to use it in inappropriate ways. Future computing research can be led down dead-end paths, and future computing technology development can result in systems that fail in harmful ways. For instance, describing facial recognition systems with the generic term “computer vision algorithm” implies relative domain- and data-independence. However, the performance of current facial recognition and other perception systems is almost always highly dependent on particular training data sets. The resulting models exhibit the biases in that data (e.g., reduced performance on darker-skinned faces).59 Researchers should not assume that others (including themselves in the future) will be able to determine or reconstruct the problems that the original research was intended to address. They need to ensure that they have, to the best of their ability, provided enough information that others can appropriately use the results of their research.

Designing Training and Benchmark Data

As Meredith Broussard said,60 all data are socially constructed. For data sets to be of scientific value and provide a foundation for subsequent research or deployed systems, they need to be intentionally designed and their sample population understood. A number of well-publicized adverse outcomes resulting from data-intensive systems illustrate the problem of bias and coverage inadequacies in training and benchmark data. There are two major sources of coverage inadequacies.
One is that some cases pose hard, unsolved scientific problems; for example, speech recognition researchers do not yet know how to handle the challenges of accent variation. The second is convenience sampling. In 2018, The Washington Post worked with two research groups to assess the performance of AI voice assistants against the range of accents present in the U.S. population.61 The researchers found that for some people with nonnative accents, the inaccuracy rate was 30 percent higher. In another case, biases were discovered in Gmail’s Smart Compose feature, which offers suggested text for finishing sentences and for replying to emails. In 2018, a research scientist at Google found that in response to typing, “I am meeting an investor next week,” Gmail suggested adding the question, “Do you want to meet him?”—making the assumption that an investor is male.62 Google’s response,63 removing such suggestions, resulted from an inability to fix the structural gender bias reliably. This example also illustrates the importance of extensive testing before release. Biased outcomes often result from sampling inadequacies, in particular when predictive functions are developed using data that happen to be available—data gathered through “convenience sampling”—rather than data from carefully designed empirical work or data curated with attention to the categories and types of data absent from the data set. Such bias and coverage inadequacies in training and benchmark data typically result from using data collected from online sources or preexisting collections
58 Ben Barton, University of Tennessee, Knoxville, presentation to the committee on May 11, 2021.
59 Buolamwini, J., and T. Gebru. “Gender shades: Intersectional accuracy disparities in commercial gender classification.” In Conference on Fairness, Accountability and Transparency, pp. 77–91. PMLR, 2018.
60 Broussard, M. Artificial unintelligence: How computers misunderstand the world.
MIT Press, 2018.
61 Harwell, D. 2018. “The Accent Gap.” The Washington Post. https://www.washingtonpost.com/graphics/2018/business/alexa-does-not-understand-your-accent/.
62 Vincent, J. 2018. “Google Removes Gendered Pronouns from Gmail’s Smart Compose to Avoid AI Bias.” The Verge. https://www.theverge.com/2018/11/27/18114127/google-gmail-smart-compose-ai-gender-bias-prounouns-removed.
63 Caliskan, A., J. Bryson, and A. Narayanan. 2017. Semantics derived automatically from language corpora contain human-like biases. Science 356(6334):183–186. https://doi.org/10.1126/science.aal4230.

of research study data from conveniently available study participants. For instance, speakers of African American English are today less likely to be present in the research settings where much speech data is collected.64 In most cases, these data sets are not the products of purposeful sampling designed to establish the representativeness of the population that generated the data, nor do these data sets typically come with metadata to contextualize the social variables that might matter for understanding the data and being able to apply it appropriately to a specific scientific question. As noted above, for data sets to be of scientific value and provide a foundation for subsequent research or deployed systems, they need to be intentionally designed and their sample population understood. Of course, even carefully designed empirical work can yield data sets that have biases rooted in the biases of the people who participated in the empirical study, particularly if the data set or empirical study in some way assimilates people’s biased modes of thought and practices. An additional coverage problem arises in cases where the data needed are not available in any existing data set. In some cases, that situation results from a population not participating in the activity for which data is being collected or analyzed. This situation arises for minorities in healthcare. Mays and Cochran noted in their remarks to the committee that this problem occurs especially for data that involve intersectional characteristics—that is, when multiple axes of disadvantage or underrepresentation intersect. In other cases, there may be ethical questions related to collecting the data.65 For example, Amy Fairchild discussed in her remarks to the committee the challenges of conducting HIV surveillance in a country that criminalizes homosexuality.
She observed that such decisions have implications for a group’s power to advocate or seek resources and that decisions about how such information is used must be made in consultation with the community members who would be most affected.66 The data may also interact with other system features (e.g., algorithm or objective function choice) to yield biases. For example, a shopping algorithm may be designed to vary its output based on information about shopping behaviors. This information might inadvertently correlate directly to gender or race (even if that specific data label is not fed into the training algorithm), so that the resulting predictive system becomes a biased decision-making vehicle. Furthermore, the complexity of these algorithms can make it almost impossible to validate the systems or understand their results (see the subsection “Validation” below), making it likely that the harms they engender will outweigh their benefits.67 In her remarks to the committee, Sarah Brayne noted that “[the] premise behind predictive policing algorithms and the training data [is] that you can learn about the future from the past. And so, any inequality in the historical data is going to be reflected and projected into the future.”68 Ben Green observed in his remarks to the committee that “given existing racial and other disparities in outcomes such as creditworthiness, crime risk, educational attainment, and so on, even perfectly accurate predictions would reproduce social hierarchies. 
Striving primarily for more accurate predictions of outcomes may enable public policy to naturalize and reproduce inequalities.”69 Illustrating the real harm such systems can cause, Renee Hutchins noted: “recent data comparing black and white stop and arrest rates suggest that you are twice as likely to be arrested if you’re black and five times more likely to be stopped without cause.70 And while stops and arrests may ultimately be shown to be unconstitutional within the criminal justice system, in the
64 Koenecke, A., A. Nam, E. Lake, J. Nudell, M. Quartey, Z. Mengesha, C. Toups, J.R. Rickford, D. Jurafsky, and S. Goel. “Racial disparities in automated speech recognition.” Proceedings of the National Academy of Sciences 117, no. 14 (2020): 7684–7689.
65 Vickie Mays and Susan Cochran, University of California, Los Angeles, presentation to the committee on May 6, 2021.
66 Amy Fairchild, The Ohio State University, presentation to the committee on March 16, 2021.
67 Howard, A. 2019. “Demystifying the Intelligence of AI.” MIT Sloan Management Review.
68 Sarah Brayne, University of Texas, Austin, presentation to the committee on March 4, 2021.
69 Green, B. Escaping the Impossibility of Fairness: From Formal to Substantive Algorithmic Fairness (January 21, 2022). Available at SSRN: https://ssrn.com/abstract=3883649 or http://dx.doi.org/10.2139/ssrn.3883649.
70 Renee Hutchins, University of the District of Columbia School of Law, presentation to the committee on March 4, 2021.

interim, they are fed into data modeling that is used for future predictions about criminality.” The proliferation of sensors throughout society has led to increased numbers of people who have had no police contact being included in law enforcement corpora. Computing researchers are aware of these issues and are making efforts to address them: developing techniques for correcting biased data, developing different learning algorithms, and conducting collaborative research with subject domain experts. For example, computing researchers working with those with legal expertise may be able to help mitigate bias. Jens Ludwig noted that arrests for lower-level offenses are more subject to discretion, and hence to bias, than arrests for more serious offenses, and that the court system and convictions are less prone to bias than arrests.71 “So, we built a tool that focuses on using convictions for relatively more serious offenses, ignoring less serious offenses, and the result is you can see a tool that gives almost identical release recommendations [for Blacks and Whites].”72 It is crucial that efforts addressing bias engage social science and ethics expertise, as they involve applying a variety of nuanced social scientific concepts (as Section 2.2 describes).

Defining Objective Functions

Data-intensive machine learning methods maximize (or minimize) some objective function during training. There are many types of objective functions, including cost and loss functions, which evaluate how well a model fits a given set of data. The selection of the objective function significantly influences what is learned; it reflects the values the designer considers important to the decisions or predictions the system will make. Several presentations to the committee pointed to cases in which the choice of objective function resulted in outcomes that favored one group over another.
71 Jens Ludwig, University of Chicago, presentation to the committee on March 4, 2021.
72 For example, previous studies such as Mitchell and Caudy (2015) use survey data to estimate the probability of arrest for low-level offenses, such as drug charges, conditional on self-reported involvement in such offenses and find large disparities by race in arrest likelihood. In contrast, studies such as Beck and Blumstein (2018, Journal of Quantitative Criminology) find that racial disparities in sentencing outcomes are much smaller in proportional terms conditional on current charge and prior record, especially for the most serious offenses. Mitchell, O. and M.S. Caudy. 2015. “Examining Racial Disparities in Drug Arrests.” Justice Quarterly 32:2, 288–313. DOI: 10.1080/07418825.2012.761721.
For instance, the optimizations of algorithmic management systems may omit factors important to workers: when developing shift work schedules, management may prioritize workplace efficiency and economic value, while attention to worker well-being would prioritize the stability and consistency of a schedule.73 Participatory design approaches can enhance worker well-being.74 When learning ways to identify a good worker, the objective function may focus on things that are easy to count, such as the number of email messages answered or the number of lines of code written, without first establishing whether such measures correlate positively with productivity.75 Ece Kamar noted that AI research and development typically optimizes systems for fully automated work, assessing accuracy as if systems are going to be working alone; teamwork is not part of such optimizations.76 Changing the objective function to a team-centric view
71 Jens Ludwig, University of Chicago, presentation to the committee on March 4, 2021.
72 For example, previous studies such as Mitchell and Caudy (2015) use survey data to estimate the probability of arrest for low-level offenses, such as drug charges, conditional on self-reported involvement in such offenses and find large disparities by race in arrest likelihood. In contrast, studies such as Beck and Blumstein (2018, Journal of Quantitative Criminology) find that racial disparities in sentencing outcomes are much smaller in proportional terms conditional on current charge and prior record, especially for the most serious offenses. O. Mitchell and M.S. Caudy. 2015. “Examining Racial Disparities in Drug Arrests.” Justice Quarterly 32(2):288–313. DOI: 10.1080/07418825.2012.761721.
73 Min Kyung Lee, University of Texas at Austin, presentation to the committee on March 11, 2021.
74 M.K. Lee, I. Nigam, A. Zhang, J. Afriyie, Z. Qin, and S. Gao. 2021. “Participatory Algorithmic Management: Elicitation Methods for Worker Well-Being Models.”
Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society. Association for Computing Machinery, New York, 715–726. https://doi.org/10.1145/3461702.3462628.
75 Karen Levy, Cornell University, presentation to the committee on March 11, 2021.
76 Ece Kamar, Microsoft Research, presentation to the committee on March 11, 2021.

enables the system to prioritize learning about the things that humans are not very good at, and doing so yields performance better than having either the computing system or the human doing tasks alone.77 Extreme risk scenarios can arise from an insufficiently thought-out objective function. For example, systems designed to maximize the defense and protection of military assets may fail to consider how their adaptive behaviors could affect the risk of war, and systems designed to optimize electric grid efficiency can exacerbate cybersecurity risks. The choice of objective function, along with other key components of a learning system such as the learning algorithm and the random seed, is a key element of the design process. For example, ensemble methods, learning systems that combine different kinds of functions, each with its own biases, have long been credited with often performing better than completely homogeneous methods.

Engaging Relevant Stakeholders

The outcomes of computing research may be directly integrated into deployed systems or inform their design. As a result, including the interests, values, and needs of a variety of stakeholders at the earliest stages of computing research becomes important not only for system success but also for alerting fellow researchers and society to potential limitations and concerns. The more obvious stakeholders of computing research are the “end-users” who use artifacts and other research products, but as noted earlier, there are many others. In the case of algorithmic systems supporting pretrial release decisions, defendants, and not just judges and prosecutors, are stakeholders. Defendants are not “users” of those systems in the traditional sense but are stakeholders because their lives will be profoundly affected by how the system behaves—that is, by the algorithmic design.
Even though the research itself (e.g., algorithm design or development of a new human-computer interaction method) may not directly involve all stakeholders, lack of attention to the values and needs of the wider communities affected by the systems the computing research enables may nevertheless have adverse outcomes for them. Most often, it is these neglected or overlooked stakeholders who incur the greatest risks; with some forethought and attention, responsible computing research, and the technologies that follow from it, can reduce the risk that computing technology introduces more harm than social good. The need to engage the full spectrum of stakeholders may be most pronounced when technological solutions are sought for social problems, because “technological solutionism” often involves prioritizing the needs of, or directing resources to, private actors without adequate community involvement or democratic oversight.78,79 The values and interests of people and groups who are not well represented in computing research are at particular risk of being systematically ignored. In the absence of rigorous methodologies and frameworks for identifying the complicated social dynamics (outlined earlier in the report) that shape the problems computing research strives to address, computing research is much less equipped to produce theories, products, or artifacts, not to mention the deployed systems into which that research feeds, that adequately serve those most in need of what computing has to offer. Panel presentations by healthcare experts illuminated the importance of engaging stakeholders by highlighting the striking contrast between the non-involvement of clinical staff in electronic medical record design and their very successful incorporation into the design of an early warning system for sepsis. The key difference in the sepsis research outcomes came from engaging nurses to understand their
77 B. Wilder, E. Horvitz, and E. Kamar.
“Learning to Complement Humans.” International Joint Conference on Artificial Intelligence 2020.
78 Brayne, S. 2021. Predict and Surveil: Data, Discretion, and the Future of Policing [online text], Oxford University Press.
79 Green, B. 2019. The Smart Enough City: Putting Technology in Its Place to Reclaim Our Urban Future, MIT Press.

expert knowledge of handling sepsis and the workflows critical to their executing that expertise. An interdisciplinary team of researchers developed Sepsis Watch by tracking the stakeholders of the current approaches and systems built to monitor patient infections and by understanding who benefits the most from the current workflows. From there, computing researchers, informed by social scientists on the team, came to their designs with a clearer sense of how to distribute the risks and benefits of a system for monitoring patient outcomes as equitably as possible. A more effective and responsible mechanism for sepsis management in a clinical setting resulted from including the individuals who regularly contributed to patient monitoring, as well as a broader group of stakeholders including patients, nursing staff, and hospital administration. Panelists also spoke about the challenges of engaging stakeholders in a way that crystallizes their needs, conveying constraints and possibilities to systems designers.80 Another example of the importance of computing research considering a fuller range of stakeholders in its designs comes from a different area of health care. In the 1990s, Kaiser Permanente began developing a robotic system to assist environmental services workers in cleaning the hospital. Engineers engaged these workers alongside medical providers in the design. The environmental services workers’ knowledge of practical ways to combat infection and bacteria in rooms made the system design better than if the engineers had only talked with hospital clinical staff or administrators.
Presentations to the committee on labor and work also provided examples that illustrate the problems of not engaging stakeholders: fast food workers left out of the design of a food safety system could not use a system that did not account for their everyday workflows;81 groundskeepers had to contend with the noise and disconcerting worker surveillance of a drone system designed to help landscaping efforts, whose designers did not anticipate that the drone’s presence might make people’s work less productive;82 and dynamic scheduling systems negatively impacted the well-being of shift workers because these systems assumed work assignments were the most important factor, while workers needed to balance other life demands such as commutes and childcare.83 Technical computer science and computing research training does not currently provide computing researchers with the knowledge and skills needed to move beyond the instinct to develop new technologies that they imagine would be terrific for themselves or for people with whom they regularly interact. Nor are there incentives for computing researchers to draw on social scientific expertise to identify and engage stakeholders and thereby better map out the social dynamics that could inform a system’s design. For instance, surveillance cameras might make janitors on the night shift feel safer, or might instead make them feel that they are being surveilled.84 Different choices of algorithms may lead to different ways of balancing the trade-off between these two likely effects. To anticipate such eventual outcomes, computing researchers need to think about the ways a new technology (incorporating or based on their research) might be used, by whom, in what contexts, and with what potential impacts. It is unrealistic to expect all computer scientists to develop such expertise, but they should appreciate its importance and learn how to work with those who have it.
The successful development of all the systems discussed by our expert panelists had one thing in common: computing researchers incorporated the insights and subject matter expertise of a range of stakeholders who were not the obvious end-users of their systems. In most cases, these success stories involved computing researchers working with social scientists trained to see the stakeholders in the mix. Stakeholders are sometimes obvious. More often, they are groups or individuals who are harder to see if one is tightly focused on who might buy or use a piece of technology, or worse, if there is an assumption that it does not matter who might use a system.
80 Latanya Sweeney, Harvard University, presentation to the committee on March 16, 2021.
81 Mary Kay Henry, Service Employees International Union, presentation to the committee on April 29, 2021.
82 Mary Kay Henry, Service Employees International Union, presentation to the committee on April 29, 2021.
83 Min Kyung Lee, University of Texas at Austin, presentation to the committee on March 11, 2021.
84 Mary Kay Henry, Service Employees International Union, presentation to the committee on April 29, 2021.

Integrating Computing and Domain Expertise

Computing systems are increasingly essential infrastructure for other disciplines and have impact across much of daily life. For them to work well requires expertise in the domain of application (see also Section 3.1.5, “Aligning with Existing Norms, Structures, and Practices”). It also requires expertise in the social sciences, which is important for bridging and synthesizing across a range of subject matter expertise. For instance, social scientists can provide ways to understand the distribution of risks and benefits and to resolve tensions among stakeholders. Recent work on epidemiological modeling, projects in linguistics and large language models, and contact tracing have all shown the importance of engaging with domain experts. Additional challenges arise when data-driven systems are used for advocacy by multiple parties who may be competing or engaged in an adversarial decision process, such as those used in the legal system. Because computing researchers, as well as researchers and scholars in other disciplines, have limited expertise outside their own fields, designing systems that work well requires a partnership between computing and domain experts. Absent such a partnership, systems typically fail. For example, Robert Wachter pointed out that the “battle” to become the dominant electronic health record (EHR) company was not won by any of the leading companies that originally tried (including IBM, General Electric, Google, and Microsoft), because they lacked sufficient healthcare domain knowledge and focus.85 Instead, the two leading EHR vendors were companies built solely for the purpose of creating and selling EHRs: Epic and Cerner. That domain knowledge proved more crucial than competencies in data analytics, artificial intelligence, data visualization, and consumer-facing cloud tools. The experience of M.D.
Anderson with IBM Watson on cancer therapy recommendations is another notable example.86 There are many notorious examples of poor outcomes when computer science researchers work without regard to the bodies of knowledge in other disciplines,87 demonstrating the importance of interdisciplinary partnerships to responsible computing research. One example of a successful interdisciplinary research partnership is the decryption of the Copiale Cipher through a collaboration between a computer scientist and two linguists.88 The success of this effort led in turn to a long-term research project involving computer scientists, linguists, and historians.89 Interdisciplinary involvement in many areas of computing research requires a collaboration of computing researchers and disciplinary experts as equals. Too often, interdisciplinary research that involves computer scientists devolves into a “consultant” model—either the computer scientists are treated as software developers or the researchers from other disciplines are minimally included. For example, an analysis of projects funded under the National Science Foundation’s Information Technology
85 Robert Wachter, University of California, San Francisco, presentation to the committee on March 16, 2021.
86 Strickland, E. 2019. “How IBM Watson Overpromised and Underdelivered on AI Health Care.” IEEE Spectrum. https://spectrum.ieee.org/how-ibm-watson-overpromised-and-underdelivered-on-ai-health-care.
87 “Bad Research and Practice in Technology Enhanced Learning,” Education Sciences (ISSN 2227-7102), https://www.mdpi.com/journal/education/special_issues/technology_education; Evgeniou, T., D.R. Hardoon, and A. Ovchinnikov. 2020. “What Happens When AI Is Used to Set Grades?” https://hbr.org/2020/08/what-happens-when-ai-is-used-to-set-grades; Coldewey, D. 2020. “Google medical researchers humbled when AI screening tool falls short in real-life testing.” TechCrunch.
https://techcrunch.com/2020/04/27/google-medical-researchers-humbled-when-ai-screening-tool-falls-short-in-real-life-testing/; Heaven, W.D. 2021. “Hundreds of AI tools have been built to catch covid. None of them helped.” MIT Technology Review. https://www.technologyreview.com/2021/07/30/1030329/machine-learning-ai-failed-covid-hospital-diagnosis-pandemic/.
88 Markoff, J. 2011. “How Revolutionary Tools Cracked a 1700s Code.” The New York Times. https://www.nytimes.com/2011/10/25/science/25code.html.
89 Automatic Decryption of Historical Manuscripts: The DECRYPT Project. https://de-crypt.org/.

Research program found that nearly a third of senior personnel on these projects did not publish together.90 Interdisciplinary research projects are, however, more difficult to conduct; collaborators must develop an understanding of the terminologies, concepts, and methods of each discipline, and this takes time.91 As discussed in the subsection “Integrating Ethical and Societal Issues into Training,” earlier in this chapter, a broader education than the current standard in computer science is needed and can start at the undergraduate level. Furthermore, the structure of academia and academic promotion processes continues to inhibit the formation of such partnerships. Interdisciplinary research may be disregarded in tenure and promotion decisions; different fields value different types of research productivity (e.g., conference versus journal publications), and non-core computing research may be considered “soft.” As Madeleine Elish pointed out, “it doesn’t count for tenure to … work in new spaces.”92 Ben Green remarked that “it’s very hard at universities to actually create these types of deeply integrated interdisciplinary environments” and pointed out the need for “funders to create mechanisms for actually doing that.”93 Thus, research organizations and scientific and professional societies need to adapt their structures and evaluation processes so they properly recognize such research.

3.3.2 Deployment

Characteristics of computing system design and the information provided about system capabilities influence decisions made by those deploying new technologies and can affect the societal impact of deployed systems. Although deployment is downstream from computing research, researchers and the computing research community incur responsibilities related to enabling acquirers of new technologies to make wise decisions. Meeting these responsibilities requires taking into account various features of deployment.
This section describes three sources of potential ethical and societal concern: institutional pressures on the procurement of technologies to address societal problems, challenges presented by the complex nature and development of computing systems, and challenges of ensuring appropriate use. It also discusses challenges of disparate access to new technologies and the importance of governance mechanisms and regulation. Recommendations 7 and 8 include practical steps researchers can take to help address these concerns. Recommendation 3.4 indicates steps academic institutions can take in educating students to help.

Acknowledging Institutional Pressures

Some of the factors that drive organizations to deploy new computing technologies can lead to problematic outcomes, either for those organizations or for individuals or groups affected by their actions and decisions. Presentations to the committee relating to the use of computing technology in the public sector revealed three types of challenges for those making acquisition decisions: (1) pressures to improve the efficiency of the organization, (2) pressures to improve the accountability of the organization, and (3) insufficient knowledge in institutions about the technologies they are
90 Cummings, J.N., and S. Kiesler. “Who Collaborates Successfully? Prior Experience Reduces Collaboration Barriers in Distributed Interdisciplinary Research.” https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.352.4322&rep=rep1&type=pdf.
91 Brister, E. 2016. “Disciplinary Capture and Epistemological Obstacles to Interdisciplinary Research: Lessons from Central African Conservation Disputes.” Studies in History and Philosophy of Biological and Biomedical Sciences 56:82–91. https://philarchive.org/archive/BRIDCA-4.
92 Madeleine Claire Elish, Google, Inc., presentation to the committee on March 16, 2021.
93 Ben Green, University of Michigan, presentation to the committee on May 25, 2021.

considering procuring. In each case, there may be opportunities for computing researchers to help such institutions make better decisions. Institutions under pressure to enhance efficiency will sometimes turn to computing technologies even if the case that they will in fact yield greater efficiency has not been made, the groundwork needed to realize those benefits has not been laid, or sufficient attention has not been given to the fact that technological approaches alone cannot solve societal problems (see Section 3.1.4, “Proper Roles for Technologies in Addressing Societal Problems”). Discussing the adoption of predictive analytic policing tools by the Los Angeles Police Department, for example, Sarah Brayne94 observed that the tools appeared to have been adopted not necessarily because there was empirical evidence that their use would actually improve outcomes of interest but rather because the department, like many other government agencies, was facing institutional pressures to adopt data analytics. These pressures arose from an impression that their use would result in more efficient allocation of law enforcement resources as well as improve objectivity and reduce bias in the department’s decision-making. Health care delivery provides another example: a desire to improve the efficiency and quality of U.S. healthcare prompted the federal government to adopt incentives and penalties for hospitals and medical offices to adopt EHRs. As the systems were rolled out, both the medical practitioners who used them and the institutions that deployed them came to understand that merely digitizing health records was far from sufficient to achieve the efficiency and quality goals. Much work would be needed to realize the vision of EHRs and to understand and transform the work, workflows, and relationships associated with delivering medical care in order to take full advantage of those EHRs.
Institutional efforts to improve efficiency through computerization can be simultaneously overly ambitious and not ambitious enough. In his remarks to the committee,95 Ben Green observed that “algorithmic reforms are simultaneously too ambitious and not ambitious enough. On the one hand, algorithmic interventions are remarkably bold: algorithms are expected to solve social problems that couldn’t possibly be solved by algorithms. On the other hand, algorithmic interventions are remarkably timid and display a notable lack of social or political imagination: such efforts rarely take aim at broad policies or structural inequalities, instead opting merely to alter the precise mechanisms by which certain existing decisions are made.”96 Government agencies and other institutions also face pressures to use computing technologies in an attempt to improve accountability. For example, Sarah Brayne described how the Los Angeles Police Department responded to a consent decree by deploying a new data-driven employee risk management system and associated capabilities for data capture and storage, which subsequently raised a set of societal issues because of mission creep.97 (See the subsection “Mission, Function, and Scale Creep” below.) Last, government agencies and other institutions frequently lack the in-house technical expertise to make informed decisions about the design and implementation of computing technologies. This problem is exacerbated when developers of technology are unclear about a technology’s limitations. In his presentation to the committee, Jens Ludwig noted that Broward County, Florida, might routinely procure millions of dollars’ worth of laptop computers and cell phones, drawing on quality information that is widely available to consumers. Ludwig contrasted this with the county’s procurement of the Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) case management and decision support tool used to assess the likelihood of recidivism.
Ludwig concluded that it “would be fair to
94 Sarah Brayne, The University of Texas at Austin, presentation to the committee on March 4, 2021.
95 Ben Green, University of Michigan, presentation to the committee on May 25, 2021.
96 Green, B. 2021. “Algorithmic Imaginaries: The Political Limits of Legal and Computational Reasoning.” Law and Political Economy Blog. https://lpeproject.org/blog/algorithmic-imaginaries-the-political-limits-of-legal-and-computational-reasoning/.
97 Sarah Brayne, University of Texas at Austin, presentation to the committee on March 4, 2021.

wonder to what degree COMPAS was evaluated prior to deployment by Broward County in terms of its accuracy … and fairness in Broward County.”98 At the same time, vendors frequently consider the detailed workings of their systems to be proprietary. In his remarks to the committee, Dan Ho cited as an example a U.S. Customs and Border Protection procurement of biometric systems for use in border entry. The agency’s efforts to identify the cause of failure with an iris scanning system were stymied by the inability to understand proprietary technology. When contractors cannot provide additional detail, according to Ho, the agency’s ability to oversee the program can be undermined. The lack of in-house technical knowledge and vendors’ claims that their technologies are proprietary lead to what economists refer to as a principal-agent problem,99 in which vendors know a great deal more about a system’s performance, limitations, and shortcomings than the acquirer does. This informational asymmetry can have significant effects outside of the acquiring institution. With systems that make consequential decisions, such as where to focus policing or whether to grant bail, there can be significant negative consequences for individuals and for groups in the communities the institutions serve.

Ensuring Appropriate System Characteristics

Continuous integration and continuous deployment. The pace of innovation in computer science is rapid, which is both a blessing and a curse. In the best case, low barriers to implementing new ideas enable real problems to be solved quickly and efficiently. In the worst case, however, the urge to immediately release “the next big thing” leads to a reckless disregard for downside risk, and to a “building for building’s sake” mentality that deprioritizes the fundamental goal of research and development: to generate new insights and new technologies that serve a higher societal purpose.
Compounding the problem is the popular practice of continuous integration and continuous deployment (CI/CD), in which a tech product is expected to have flaws throughout its deployment and to receive a constant stream of tweaks along the way. Kevin Fu observed that testing before deployment is understood in the healthcare setting to be life critical: “even if the software patch is available, it might not be deployed overnight” because it takes time to analyze the impact of a patch on the behavior of the overall system.100 The CI/CD model is seductive because it might be seen as absolving technologists of the burden of forethought; post-deployment problems are seen as inevitable and as an acceptable cost of progress. However, the cost of addressing problems after the fact is often much higher than the cost of addressing them during the initial design of a system. Indeed, it is considered good industrial practice for each system update to go through multiple review steps for quality, safety, privacy, and reliability. Sometimes, however, research teams may release experimental systems with less scrutiny. For example, the Microsoft Tay chatbot was released directly from a research and development (R&D) group
98 COMPAS was purchased by Broward County, Florida, in 2008. Prior to that point, evaluations of COMPAS by Northpointe (the software’s creator) had apparently been limited to other jurisdictions, specifically parole systems in New York and California. In 2009 the Broward County auditor’s office published an evaluation of the Pretrial Service Program in the county (Evaluation of the Pretrial Services Program Administered by the Broward Sheriff’s Office, Report No. 09-07, May 18, 2009, https://www.broward.org/Auditor/Documents/pretrial_final060909.pdf) noting that the COMPAS tool had by then still not been validated in Broward County specifically.
The first validation of COMPAS in Broward County specifically that is publicly available seems to be a 2011 analysis by Florida State University (Florida State University College of Criminology and Criminal Justice, 2011, “Managing Broward County’s Jail Populations: Validation of the COMPAS Risk Assessment,” January 19, https://criminology.fsu.edu/sites/g/files/upcbnu3076/files/2021-03/Broward-COMPAS-Validation.pdf).
99 Jens Ludwig, University of Chicago, presentation to the committee, March 4, 2021; Laffont, J-J., and J. Tirole. 1993. A Theory of Incentives in Procurement and Regulation. MIT Press.
100 Kevin Fu, U.S. Food and Drug Administration, presentation to the committee on May 11, 2021.

to the public, and within 24 hours it had been trained to be sexist and racist.101 Ece Kamar commented that there is an “organizational question for anybody who is deploying these complex computational systems in their workplaces, how to ensure that there are feedback loops.”102 CI/CD reduces the friction of deploying changes to market but risks devaluing substantive (albeit slower-moving) discussions about the unexpected consequences of a change. Put another way, there is a tension between the natural desire of researchers to test their latest idea “in the wild” and institutional processes that rank values such as safety and privacy more highly.

Validation. Why should one have confidence in a computational system? Why should it be entrusted with sensitive data, or with the ability to make decisions with important consequences in the real world? In part, the ability to trust a system derives from characteristics that lend themselves to objective definitions and measurements. One can measure how often a system crashes and how many seconds it needs to handle a request; one can verify that the system has installed up-to-date security patches and protects data with encryption and access controls. However, a system that is secure, fast, and highly available may still produce results that humans (or other systems) should not trust. So how should one define the validity of a system’s results? In other words, how should one evaluate confidence that the results are appropriate reflections of the ultimate goals of the system? These computer-systems questions are related to the instrumental ethical values of trustworthiness, verifiability, assurance, and security described in Section 2.1. Answering the systems validity question is increasingly difficult for the complex systems of the modern era. These systems are typically built without a formal specification that produces rigorous, comprehensive test cases by which concrete implementations can be evaluated.
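The gap between objective metrics and validity can be sketched in a few lines. Everything below is hypothetical (the group names, counts, reference proportions, and thresholds are all invented for illustration): a system passes its objective checks on latency and uptime while a simple validity signal, the divergence of its training data's composition from a reference population, fails.

```python
# Hypothetical sketch: objective metrics can pass while a validity signal fails.
# All numbers, group names, and thresholds here are invented for illustration.

def normalize(counts):
    """Turn raw counts into a discrete probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Objective metrics for the deployed system: both pass their thresholds.
p99_latency_ms = 45      # measured request latency (invented)
uptime = 0.9995          # measured availability (invented)
objective_checks_pass = p99_latency_ms < 100 and uptime > 0.999

# Validity signal: how far the training set's composition diverges from a
# reference population the system is meant to serve.
training_counts = {"group_a": 900, "group_b": 80, "group_c": 20}
reference = {"group_a": 0.60, "group_b": 0.25, "group_c": 0.15}
divergence = total_variation(normalize(training_counts), reference)

# The threshold is a values-based design choice, not a technical given.
validity_check_passes = divergence < 0.05

print(objective_checks_pass)   # True: fast and available
print(validity_check_passes)   # False: training data skewed toward group_a
```

A low divergence alone does not establish validity; as this section emphasizes, deciding which distribution to treat as the reference, and what counts as "close enough," are value judgments rather than measurements.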
For example, the high-level goals of an “operating system” like Linux are sufficiently well understood to enable independent groups of developers to work on different parts of the OS in parallel, with the pieces eventually integrating to work together in a (mostly) cohesive whole. However, an operating system in practice is sufficiently complicated that emergent problems can and do occur, with the appropriate solution often requiring subjective reasoning. For example, what should happen when a change in the OS’s scheduling algorithm, introduced to make certain workloads run faster, has a detrimental effect on others?103 Even though the Linux community has a variety of tools for determining how new kernel updates affect objective performance metrics,104,105 many updates involve trade-offs between different metrics, requiring human reasoning to decide whether the net result is positive. The challenge of defining validity exists in every subdiscipline of computer science, but the rise of machine learning has provided salient examples. For instance, the oft-lamented problem of biased training data is really a problem of validity. A good example is how distributional biases in associating professions with gender in training data lead to not just biased but actually incorrect translations.106 Another illustration comes from the problem of face recognition. An image database of human faces, used to train a facial recognition algorithm, is invalid if the data set lacks demographic diversity. From a narrow mathematical lens, the invalidity can be defined with respect to the statistical divergence of the data set’s images from the richness of faces that exist in real life. However, this statistical notion of invalidity is fundamentally given meaning by a values-based decision that humans must make—the
101 Wolf, M.J., K.W. Miller, and F.S. Grodzinsky. 2017.
“Why We Should Have Seen That Coming: Comments on Microsoft’s Tay “Experiment,” and Wider Implications.” The ORBIT Journal. Volume 1, Issue 2, 2017, Pages 1–12. https://www.sciencedirect.com/science/article/pii/S2515856220300493. 102 Ece Kamar, Microsoft Research, presentation to the committee on March 11, 2021. 103 D. Chiluk. 2019. “Unthrottled: How a Valid Fix Becomes a Regression.” Indeed Engineering Blog. https://engineering.indeedblog.com/blog/2019/12/cpu-throttling-regression-fix/. 104 Chen, T., L.I. Ananiev, and A.V. Tikhonov. 2007. “Keeping Kernel Performance from Regressions.” Proceedings of the Linux Symposium. June 2007, Ottawa, Ontario, Canada. https://www.kernel.org/doc/ols/2007/ols2007v1-pages-93-102.pdf. 105 Intel. 2021. Linux Kernel Performance project. Intel. https://01.org/lkp. 106 Webster, K., and E. Pitler. 2020. “Scalable Cross Lingual Pivots to Model Pronoun Gender for Translation.” arXiv preprint arXiv:2006.08881. PREPUBLICATION COPY – SUBJECT TO FURTHER EDITORIAL CORRECTION 55

statistical divergence is only important because we as humans should prefer to live in a society where facial recognition technology works equally well for all people, regardless of characteristics like age or gender or race. Thus, defining “validity” in computer science requires something beyond mere technical skill; it requires moral imagination. Mitchell Baker of Mozilla drew attention to yet another kind of validation challenge—the difficulty academic researchers face in investigating the ethical and societal implications of today’s Internet-scale computing systems.107 To do so would require that the researchers have access to the large data sets these systems rely on, the models built with that data, and large-scale computational platforms. However, large data sets of great interest to researchers, such as those containing user-generated content and user interactions, are highly concentrated among a handful of companies. Some of these firms have developed mechanisms to share data with some academics. Initiatives such as the National AI Research Resource aim to widen access to computational resources and data so that a broader swath of academia will be able to carry out research in this arena. However, as discussed further at the end of Section 3.5.2, there are manifest tensions among the interests of researchers, the proprietary interests of companies, and the privacy interests of users. Last, validation also requires the courage to admit that our assumptions may be incorrect. For example, the idea that “more data leads to better decision-making” is intuitively appealing. However, algorithmically generated decisions do not automatically lead to better outcomes. As Dan Ho described in his remarks to the committee, “Beginning in the 1990s, criminologists advanced predictive policing as a method to forecast crime hotspots and drive down crime.
Several jurisdictions conducted rigorous evaluations and leading studies showed no benefit in terms of crime reduction.108 We should not underestimate the value of rigorous inquiry; if only limited parties have access to the data to evaluate systems, accountability is not going to be possible.” Ensuring Appropriate System Use Mission, function, and scale creep. Computing technologies are typically developed and deployed to address particular needs or challenges, and a deployed computing technology can have many different missions, functions, or scales at which it is intended to operate, some of which might be unstated or implicit. Because computers are universal machines, part of their power is that computing technologies developed for one purpose might be used for a range of other purposes. All is well and good if the new use is an appropriate one. In general, however, technologies—including computing technologies—that are developed and deployed for one function might be inappropriate, or even harmful, for other functions. In many cases, significant challenges arise when the mission, function, or scale changes over time, particularly if the changes are not explicitly noted (that is, when “creep” of various sorts occurs). For example, when used by small groups, over short time spans, or with limited reach, algorithms for collaborative filtering of news items can help people quickly access information that is more likely to be relevant to their current needs. However, there are concerns that when deployed at global scale those same algorithms can in some cases contribute to the creation of echo chambers, information bubbles, and increased polarization. Presentations to the committee on health care, work and labor, and justice discussed several other examples, including 107 Mitchell Baker, Mozilla Corporation, presentation to the committee on June 24, 2021. 108 Hunt, P., Saunders, J., and Hollywood, J.S. 2014. Evaluation of the Shreveport predictive policing experiment.
Santa Monica, CA: RAND Corporation; and Saunders, J., Hunt, P., and Hollywood, J.S. 2016. Predictions put into practice: A quasi-experimental evaluation of Chicago’s predictive policing pilot. Journal of Experimental Criminology 12:347–371. Ho also indicated that some studies did find effects; see Mohler, G.O., Short, M.B., Malinowski, S., Johnson, M., Tita, G.E., Bertozzi, A.L., and Brantingham, P.J. 2015. Randomized controlled field trials of predictive policing. Journal of the American Statistical Association 110(512):1399–1411.

• Workforce monitoring technologies can be beneficial in terms of identifying potential safety risks, but their mission can easily creep into harmful and invasive surveillance.
• Many EHR systems work well for information recording and transfer, particularly for billing purposes. However, they are now also expected to support the functions of healthcare delivery, including diagnosis and planning. Numerous studies have shown that EHR systems create significant challenges and healthcare failures, in part because they are being used for functions for which they were not appropriately designed.
• As Sarah Brayne described in her remarks to the committee, the Los Angeles Police Department deployed a new data-driven employee risk management system and associated capabilities for data capture and storage as part of a consent decree with the U.S. Department of Justice. Intended to improve accountability, it spurred, according to Brayne, a proliferation of automated decision making throughout the department and the repurposing of data.109
Significant societal impact problems can arise if computing methods are deployed at a larger scale than originally intended. Systems that work at a small scale may not work at a large scale, as Twitter learned the hard way during the 2010 World Cup when the increase in tweets per second led to short periods of unavailability.110 Such experiences point to the conflicts between moving quickly in a competitive environment and following good engineering practice. For computing researchers, two key issues are how to create experimental frameworks that facilitate safe staged deployment and how to teach researchers to accept a slower pace to their impact in service of greater care in avoiding unintended consequences. These kinds of “creep” clearly present significant issues for those deploying computing technology, and better practices around implementation can help guard against problematic creep.
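One common engineering answer to the scale problem is a staged (“canary”) rollout, in which exposure to a change is widened gradually while its effects are monitored. The sketch below (Python; the function and feature names are illustrative, not drawn from any particular system) shows the core mechanism, deterministic bucketing of users into a rollout percentage:

```python
# Minimal sketch of a staged ("canary") rollout gate; all names illustrative.
import hashlib

def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Deterministically assign user_id to a bucket in [0, 100);
    the same user always gets the same answer for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 10000 / 100.0
    return bucket < percent

# Widen exposure in stages, monitoring for harm between steps.
for stage in (1, 10, 50, 100):
    exposed = sum(in_rollout(f"user{i}", "new-ranker", stage)
                  for i in range(10000))
    print(f"{stage:>3}% stage -> {exposed} of 10000 users exposed")
```

Because the bucketing is deterministic, a user admitted at the 1 percent stage remains in the cohort at 10 percent, so observed effects accumulate coherently as exposure widens.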
But the possibility of problematic creep also raises issues for computing researchers. Appropriate development and deployment require an understanding of the capabilities of the underlying computing technology. Moreover, research and deployment often occur in a distributed or decentralized manner, so that no individual or group is involved in every step from research idea to final implementation. Although computing researchers cannot possibly anticipate every possible use of their work, they have an obligation to be clear about the exact functionality and appropriate uses of the products of their research (see also the subsection “Specifying Intended Functions and Uses of Research and Systems” earlier). Doing so will help those using their research results to make better informed decisions about what missions, functions, or scales are appropriate for a computing technology. Strategic behavior by individuals and institutions. Negative societal outcomes from computing systems may also come from a failure to adequately anticipate strategic behavior by end-users or by other computing systems. Strategic behavior can take many different forms. Most commonly, a computing system is designed with the goal of optimizing for some property, which is not directly observable by the system. The system therefore optimizes for some measurable feature. Often that feature diverges from the goal by enough that, when users learn what is being optimized for, they are able to manipulate the system to receive preferred outcomes without actually better exemplifying the underlying property for which the system is intended to optimize. Examples of this abound, including search engine optimization (heaps of links), students gaming the computer grading of papers, and consumers manipulating their FICO scores.111 There are some notable cases where industry practitioners have had success in reducing 109 Sarah Brayne, University of Texas, Austin, presentation to the committee on March 4, 2021. 
110 Reichhold, J., D. Helder, A. Asemanfar, M. Molina, and M. Harris. 2013. “New Tweets per second record, and how!” Twitter. https://blog.twitter.com/engineering/en_us/a/2013/new-tweets-per-second-record-and-how. 111 Hu, L., N. Immorlica, and J. Wortman Vaughan. 2019. “The disparate effects of strategic manipulation.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 259–268; Milli, S., J. Miller,

detrimental strategic behavior including combatting Web spam that attempts to bring questionable content to the top of Web search results. Sometimes strategic behavior is intended to directly thwart the goal of the designers of the computing system. For example, consider the AdNauseam browser extension112 designed to counteract online advertising by randomly clicking on ads in the background, obstructing the attempt to profile the user, while also imposing costs on advertisers who pay per click. This example also illustrates the sort of value conflict tensions (see Section 2.2) that designers of computing systems must consider, here between consumers and content creators reliant on advertising. Or consider how protesters in Hong Kong (and others) developed new strategies for avoiding detection by facial recognition cameras.113 These are cases in which a computing system is designed to extract something from users, and the users resist that extraction. In other cases, strategic behavior is deployed in order to co-opt the computing system, turning its outputs to the advantage of the user. This co-option can be relatively trivial, as with the way users of Microsoft’s Tay taught it swear words and hate speech before Tay was quickly taken down. But this dynamic can also have very significant social and political consequences. For example, content creators have sought to understand the recommender systems that allocate attention online, in order to optimize the visibility of their content.
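The proxy-gaming dynamic described above can be reduced to a toy sketch (Python; all names and numbers are hypothetical): a system screens on a measurable score rather than the underlying property it cares about, and once the cut-off is known, agents near it can shift their score without changing that property.

```python
# Toy sketch of proxy gaming: the system screens on a measurable proxy
# score; agents who learn the threshold can shift the score without
# improving the underlying property. All numbers are illustrative.

THRESHOLD = 600          # published cut-off on the proxy score
GAMING_BUDGET = 40       # how far an agent can shift the score by gaming

def admitted(score: int) -> bool:
    return score >= THRESHOLD

# (true_quality, proxy_score) for a few agents
agents = [(0.9, 610), (0.8, 590), (0.3, 570), (0.2, 480)]

naive = [admitted(s) for _, s in agents]
# Strategic response: anyone within GAMING_BUDGET of the cut-off games up.
strategic = [admitted(s + GAMING_BUDGET
                      if THRESHOLD - GAMING_BUDGET <= s < THRESHOLD else s)
             for _, s in agents]

print(naive)      # [True, False, False, False]
print(strategic)  # [True, True, True, False]
```

The pass rate rises even though no agent’s underlying quality has changed; the proxy, not the goal, is what moved.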
This interplay is described at length in Bucher.114 Some argue that the tendency of social media companies to promote highly engaging, potentially divisive content has been operationalized by extremists in order to advance their economic and political interests, with deleterious effects on democratic public cultures.115 In addition, in recent years computing researchers have had to pay attention not only to strategic behavior by users, but to strategic behavior by competing computing systems, such as generative adversarial networks, whose role is to learn the behavior of a computing system and then confound it. Importantly, this feedback loop between competing systems—one producing increasingly difficult problem instances and one trying to learn from them—has itself led to powerful new methods for generating and classifying image and text data.116 Of course, strategic behavior by those who are subject to a computing system need not always be socially deleterious. If the system is optimizing for some feature that is itself a valuable attribute for people to display, then strategic behavior can actually advance the goals of the system.117 This depends on the system being sufficiently interpretable to those affected by it, so that they can rationally respond to the incentives it creates.118 Similarly, strategic behavior to resist the use of computing systems to exercise social control, or to extract value from users, should generally be welcomed. A.D. Dragan, and M. Hardt. 2019. “The social cost of strategic classification.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 230–239. 112 AdNauseam, https://adnauseam.io. 113 Holmes, A. 2019. “These clothes use outlandish designs to trick facial recognition software into thinking you're not human.” Business Insider, Australia. https://www.businessinsider.com.au/clothes-accessories-that-outsmart-facial-recognition-tech-2019-10. 114 Bucher, T. 2018.
If … Then: Algorithmic Power and Politics, New York: Oxford University Press. 115 See, for example, Munger, K. and J. Phillips. 2020. “Right-Wing YouTube: A Supply and Demand Perspective.” The International Journal of Press/Politics; Metz, C. 2021. “Feeding Hate with Video: A Former Alt-Right YouTuber Explains His Methods.” The New York Times. https://www.nytimes.com/2021/04/15/technology/alt-right-youtube-algorithm.html. 116 Goodfellow, I., J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. 2020. “Generative adversarial networks.” Communications of the ACM 63/11:139–144. 117 Kleinberg, J. and M. Raghavan. 2019. “How Do Classifiers Induce Agents to Invest Effort Strategically?” Proceedings of the 2019 ACM Conference on Economics and Computation (Phoenix, AZ: Association for Computing Machinery), 825–844. 118 Selbst, A.D. and S. Barocas. 2018. “The Intuitive Appeal of Explainable Machines,” Fordham Law Review, 87:1085–1139.

The strategic behavior of users and agents in response to computing systems poses a challenge for responsible computing research, because it requires integrating computing research with a deep understanding of human psychology and social scientific insight into the effects of computing systems when deployed in the real world, often at massive scale. Fulfilling Societal Responsibilities Disparate access to technologies. Computing technologies have become increasingly essential to economic, social, and political life, and have in some cases dramatically improved economic opportunity for individuals and small businesses through, for example, better access to markets and financial services. Such benefits point to the value to society of making technologies more widely accessible. At the same time, the consequences of disparities in access to them have grown more pronounced. The disparities have multiple sources, including the cost of computing hardware, software, and communications and other services. Although the price-performance ratio of smartphones and laptops has improved over time, they are still expensive on an absolute basis. Another source of disparity is availability—broadband Internet service is unavailable in some areas, especially rural ones, and even when service is available, it may offer poor performance or come at high cost. Caps on monthly data use are another constraint. Last, not everyone has the skills and experience needed to make effective use of computing technologies. Several speakers to the committee discussed ways these issues play out in the delivery of healthcare.
As Vickie Mays put it, computing technologies were enormously beneficial during the COVID-19 pandemic, but because those technologies are not distributed in an equitable fashion, their benefits have been unavailable to the very people who need them most.119 For example, those without broadband Internet were at a distinct disadvantage in obtaining information about the COVID-19 vaccine, let alone in making an appointment to receive it. Abi Pitts echoed these concerns, noting that the shift to telemedicine during the pandemic took place without full consideration of the impacts on more vulnerable populations, thereby widening disparities in access to healthcare for those groups.120 For computing researchers, one lesson is that disparities in access remain an issue for any service that needs to reach the entire population. It is also a reminder that computing research aimed at reducing cost, increasing availability, or improving usability can reduce disparities in access to the services needed for well-being and participation in society. Governance principles for new technologies. When the results of computing research are integrated into systems with potential societal impact, some facets of social responsibility need to be addressed through governance mechanisms and by regulatory bodies. In many of these settings there are diverse goals and incentives and varying regulatory and governance policies and structures. There may also be considerable variation in the maturity of these policies and structures. Institutions across technologies and domains, including those responsible for developing or carrying out oversight, can have blind spots, inadequate controls and governance mechanisms, and sometimes limited capacity.
Some particularly distinct and challenging arenas include the following: the commercial sector, which develops computing technologies for highly competitive and fast-moving markets and innovation arenas; public sector services such as health, education, and human services; and national security. A variety of approaches are possible, and a variety of challenges arise. For instance, possibilities for oversight in the commercial sector include internal advisory boards (e.g., Microsoft’s Aether Committee, Office of Responsible AI, and Responsible AI Strategy in Engineering) and risk management activities by 119 Vickie Mays, University of California, Los Angeles, presentation to the committee on May 6, 2021. 120 Maryann Abiodun (Abi) Pitts, Stanford University School of Medicine / Santa Clara Valley Medical Center, presentation to the committee on May 11, 2021.

a company’s board of directors. In the public sector, governance procedures are needed to ensure that public goals and incentives inform technology procurement and deployment. For national security, governance and oversight are particularly difficult because of secrecy needs but also ever more important. The governance challenges in such circumstances are not the direct responsibility of computing researchers, but they do create opportunities for computing researchers to engage with government agencies, private sector institutions, and civil society to help develop or enhance governance principles and frameworks. 3.4 SYSTEM ROBUSTNESS In computing research and development, some ethical dilemmas and problematic societal impacts are difficult to predict. Many of these dilemmas, including some of those described in earlier sections, involve interactions between technology and society that are novel and unprecedented, and so ongoing monitoring and reevaluation (as Chapter 2 argues) and willingness to adapt the technology post-deployment are the only possible ways to adequately handle ethical and societal impact issues. Many other ethical and societal impact problems in computing are not, however, of this nature. Rather, they arise from failures to apply known best practices for ethical design, and failures to devote enough time, thought, and imagination to pondering the ways a technological system might be used or exploited in unanticipated manners. This section describes several of the major computing technical arenas in which failures are of this second nature. Recommendation 7 provides several steps researchers can take to avoid them. 3.4.1 Trustworthy, Secure, and Safe Systems The typical computer scientist’s answer to the question of what makes a program “secure” is to provide a specific set of attacks that should be prevented, or a particular collection of defenses that a system should employ.
For example, data theft is bad, therefore a system must enforce access controls and keep data encrypted by default; running arbitrary code from unknown origins is bad, so a system must run malware scanners to identify and quarantine such code. Such responses reflect the nebulous nature of the concept of “security,” and so applying well-known techniques like encryption enables forward progress toward the goal of a trustworthy system. Conceiving of security in terms of the mere composition of known techniques threatens to miss the forest for the trees because systems differ in their purposes, users, and data. Designing a trustworthy system thus cannot be a checklist-based exercise. Instead, the design effort must center around the following idea: A secure system is one that behaves correctly, despite the active malice or unintentional incompetence of users, administrators, and developers. Frustratingly, this idea, when applied to any particular system, raises more questions than it answers. Crisply defining the correct behavior for a non-toy system is hard. Indeed, generating explicit models for what a program should and should not do is a primary challenge of making a trustworthy system. Furthermore, as a system, its users, and the rest of its operating environment changes, the definition of “correct behavior” may change. There are, nonetheless, a variety of well-known techniques and practices that can be brought to bear to ensure that a program behaves correctly. Many of these approaches have been known for quite a long time. A classic 1975 paper by Saltzer and Schroeder,121 for example, enumerates design principles that remain instructive today. These include open design (which states that systems should be designed in a way that makes them secure even if attackers know everything about the system design except for the cryptographic keys used by the system), 121 Saltzer, J.H., and M.D. Schroeder. 1975. 
The Protection of Information in Computer Systems. Proceedings of the IEEE, vol. 63, no. 9 (September).

and least privilege (which means that each program and user in a system should have the least amount of authority necessary to perform the relevant tasks). Experience shows that security must be considered during the earliest phases of system design. This advice can be difficult for computing researchers to follow, though, because many of the systems that they build are not ostensibly oriented around security goals. However, as computational technology becomes increasingly ubiquitous, reaching deeper into people’s lives and the operations of the public and private sectors, the consequences of incorrect system behavior are multiplying. For example, consider Internet of Things (IoT) systems, which embed sensors, actuators, and other computational elements into homes, factories, office buildings, and cityscapes. IoT systems enable “smart” physical environments that can self-adjust their temperature or respond to other environmental stimuli. IoT systems also allow users to perform remote inspection or administration of locales for which placing a human on-site would be difficult, inconvenient, or expensive. Adding technology to an environment that previously did not embed technology, however, exposes that environment to new risks that must be explicitly considered. For instance, embedded medical devices introduce the risk that a person's health becomes directly vulnerable to attack, and placing IoT technology inside cars introduces new security threats because attackers can break into a car not only by physically breaking a window but also by hacking into the IoT subsystem and getting the car to unlock its own doors. Unfortunately, business interests in bringing IoT technology to market and an absence of government regulation led to a flood of insecure systems.
For example, smart light bulbs controllable via Wi-Fi networks used weak encryption, exposing Wi-Fi passwords to network eavesdroppers.122 The Mirai botnet exploited the fact that many network-controllable video cameras drew their administrator login credentials from a small set (e.g., username “root” and password “admin1234”); using those credentials, Mirai logged into and commandeered hundreds of thousands of devices, using them to generate hundreds of Gbps of denial-of-service traffic.123 In some cases, fixing these IoT security problems was impossible because the devices stored their software in read-only memory. Experience also shows that developing a “threat model,” the formal or semi-formal description of the security problems that are in-scope and out-of-scope for a system to prevent, is likewise important because it drives subsequent design work: if a threat is in-scope according to the threat model, then the design must handle that threat. When technologists craft an explicit threat model, they are forced to think like attackers and reckon with possible vulnerabilities; this reckoning typically helps technologists to better understand their systems, and to remove or mitigate possible vulnerabilities. Experience further shows that a post-deployment security strategy for discovering, prioritizing, and fixing security bugs is critical; modern devices (whether they be medical or otherwise) are extremely complicated, and thus will almost certainly contain security bugs at deployment time. Even if vendors employ best-practice security measures at design time, after devices are released to actual consumers, vendors need to constantly monitor those devices for evidence of unexpected security problems. Researchers and developers must consider potential security threats, even if, historically speaking, attackers have ignored a particular kind of system, or have been unsuccessful in subverting that system.
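A minimal sketch of one corresponding defense, checking a device fleet for the factory-default credentials of the sort Mirai exploited, might look as follows (Python; the device records and credential list are illustrative, not drawn from any real product):

```python
# Sketch of a deployment-time audit for factory-default credentials,
# the weakness Mirai exploited. Fleet data and the credential list are
# purely illustrative.

KNOWN_DEFAULTS = {("root", "admin1234"), ("admin", "admin"), ("root", "12345")}

def flag_default_credentials(devices):
    """Return ids of devices still using a known factory-default login."""
    return [d["id"] for d in devices
            if (d["username"], d["password"]) in KNOWN_DEFAULTS]

fleet = [
    {"id": "cam-01", "username": "root",  "password": "admin1234"},
    {"id": "cam-02", "username": "ops",   "password": "Xk2!vq9#mL"},
    {"id": "bulb-7", "username": "admin", "password": "admin"},
]

print(flag_default_credentials(fleet))  # ['cam-01', 'bulb-7']
```

A check of this kind is only a small piece of a post-deployment security strategy, but it directly encodes one lesson of the Mirai episode: treat well-known default credentials as a standing threat, not a hypothetical one.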
As Kevin Fu stressed in his remarks to the committee on medical device security, attackers are clever, motivated, and relentless, so a lack of prior successful exploits does not imply that no security problems will emerge in the future.124 122 Goodwin, D. 2014. “Crypto weakness in smart LED lightbulbs exposes Wi-Fi passwords.” Ars Technica, https://arstechnica.com/information-technology/2014/07/crypto-weakness-in-smart-led-lightbulbs-exposes-wi-fi-passwords/. 123 Antonakakis, M., T. April, M. Bailey, M. Bernhard, et al. 2017. “Understanding the Mirai Botnet.” Proceedings of USENIX Security. 124 Kevin Fu, Food and Drug Administration, presentation to the committee on May 11, 2021.

Many IoT security follies arose from a failure to apply these and other well-known best practices for security; others reflected technical myopia in IoT’s early days.125 This is an instance in which some researchers attempted to anticipate future problems, but to little avail, because the incentives for industry pointed in a different direction. IoT devices resemble, approximately, traditional network servers that are exposed to potentially malicious clients. Designers and developers of these devices would have done well to apply the decades of wisdom that the technology industry has accumulated about how to protect network servers. This neglect of wisdom from past experiences, and of measures to avoid or circumvent serious ethical and societal impact problems, is true of many other software systems. Research systems often tread new ground; they are speculative and exploratory, making it hard to predict how intentional evildoing or accidental misbehavior could force an unfinished system to behave incorrectly. To protect these systems, researchers require a knowledge of history and prior art, but perhaps more importantly, some imagination. For example, consider machine learning (ML). Broadly speaking, the goal of an ML system is to analyze a piece of input data and output a classification or prediction involving that data point. To generate such analyses, an ML system must first be trained using a training data set. The quality of that data set influences the quality of the learned observations. Specific experiences with social networking applications and comment sections teach that a non-trivial number of users will intentionally submit maliciously designed content. The recent emphasis in the ML community on poisoned data sets126 and adversarially chosen examples127 recognizes this problem. It would, however, have been better had researchers anticipated malicious behavior based on an understanding that the systems they were producing were sociotechnical.
To design trustworthy systems, one must always assume that inputs are untrusted by default. Of course, hindsight is always 20/20. Consider Web technology. Could we fault the developers of the unencrypted, unauthenticated HTTP protocol for not predicting that the protocol would become a foundational technology that would serve as a conduit for emails, financial information, and other sensitive data? Could we blame the inventors of Web cookies for not predicting that cookies, intended to make online shopping carts easier to implement, would eventually be used to support a vast ecosystem of online tracking? Perhaps not—the Web has been catastrophically successful in a way that would boggle the mind of a computer scientist from the early 1990s. However, the modern era is a different one. Now, and for the foreseeable future, technology will be embedded into every aspect of human life. As a result, computer scientists have a solemn responsibility to ponder what would happen if their technologies became catastrophically successful. To do so, computer scientists must think holistically, at a sociotechnical level, about what their programs should and should not do, and then take explicit steps to enforce those expectations. 3.4.2 Software Engineering: Lessons and Limitations In computationally intensive fields, code is a frequent research product, with that code being adopted by other researchers and sometimes making its way into products. Bugs have consequences.128 A range of 125 Early on, IoT CPUs were so low-powered that they could not do “real” cryptography or accept the overhead of standard security measures. Some rushed to deploy devices lacking the power to implement proper security measures. Per Moore's Law, that changed, but too many developers never twigged to the change, instead repeating the shibboleths from a few years earlier. 126 Shafahi, A., W.R. Huang, M. Najibi, O. Suciu, C. Studer, T. Dumitras, and T. Goldstein. 2018. “Poison Frogs!
Targeted Clean-Label Poisoning Attacks on Neural Networks.” arXiv.org, https://arxiv.org/abs/1804.00792. 127 Goodfellow, I., J. Shlens, and C. Szegedy. 2015. “Explaining and Harnessing Adversarial Examples.” arXiv.org, https://arxiv.org/abs/1412.6572. 128 See, e.g., Tay, A. 2020. “Three ways researchers can avoid common programming bugs.” Nature Index. https://www.natureindex.com/news-blog/three-ways-researchers-science-can-avoid-common-programming-bugs-errors; Soergel, D.A.W. 2015. “Rampant software errors may undermine scientific results.” F1000Research, 3, 303. https://doi.org/10.12688/f1000research.5930.2.

software engineering best practices have been developed to help ensure the trustworthiness of a program,129 including the following:
• Design before implementation: Describe what the code should do, plan the architecture of the software artifact, and then fill in the algorithm details.
• Test: Implement tests of the code’s correctness.
• Peer review: Have another developer examine the code and provide feedback.
• Document: Document the code.
Computing researchers, with a focus on exploring new ideas and developing new kinds of systems, typically do not follow these practices. When research code is subsequently used by other researchers, its flaws may adversely affect their research and then further leak into the scientific literature. In one notable case, a bug in a script led to potentially incorrect findings in over one hundred publications.130 Also, given the rapid pace of technological development, research code may easily make its way into products. Software engineering best practices are also limited in the range of software to which they apply. As the nature of software artifacts evolves, best practices development can struggle to adapt to technological advances. For example, standard software engineering best practices are not adequate for machine learning artifacts (models), which require different kinds of testing.131 More generally, software engineering best practices do not encourage or reward the consideration of such downstream impacts of the code as unintended consequences and unforeseen uses and misuses.
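The “test” practice above can be sketched concretely. The example below (Python; the function names and data are illustrative) targets a pitfall of the same general kind as the script bug cited above: a result that silently depends on the order in which inputs happen to arrive, which can vary across operating systems.

```python
# Sketch of a minimal regression test for research code, aimed at a
# classic pitfall: results that depend on input ordering (e.g., the
# order in which files are listed). Names and data are illustrative.

def summarize(values):
    """Illustrative buggy analysis: averages the 'first three' entries,
    implicitly assuming a fixed input ordering."""
    return sum(values[:3]) / 3

def summarize_fixed(values):
    """Order-independent version: average everything."""
    return sum(values) / len(values)

def is_order_independent(fn, values):
    """The result must not change when the same inputs arrive in a
    different (here, reversed) order."""
    return fn(values) == fn(list(reversed(values)))

data = [1.0, 2.0, 3.0, 10.0]
print(is_order_independent(summarize, data))        # False: bug caught
print(is_order_independent(summarize_fixed, data))  # True
```

A test this small would not catch every flaw, but it encodes an explicit expectation about the code’s behavior, which is precisely what ad hoc research scripts usually lack.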
They assume that the purpose, scope, and application of a project are already defined, and these assumptions discourage the type of critical, creative, big-picture thinking necessary for responsible release of computing research artifacts.132

3.4.3 Data Cleaning and Provenance Tracking

Data are a central component of certain areas of computing research (e.g., data science, human-computer interaction, and much of artificial intelligence), but computing researchers are generally not taught the basics of data management and handling. The Internet and the open science movement have made certain types of data very easy to obtain, most notably data generated by and about people who are on the Internet and research data in publications. The ease with which researchers can find and use such "found data" obscures a plethora of concerns critical to responsible computing research, each with potential ethical and societal impact. In particular, computing researchers undertaking a data-intensive research project should ask several questions about any data set before starting to use it. To illustrate the five key questions below, we use as an example OpenWebText2 (a corpus of user submissions to the social media platform Reddit).133

129 Trisovic, A., M.K. Lau, T. Pasquier, and M. Crosas. 2021. "A large-scale study on research code quality and execution." arXiv, https://arxiv.org/abs/2103.12793.
130 Bhandari Neupane, J., R.P. Neupane, Y. Luo, W.Y. Yoshida, R. Sun, and P.G. Williams. 2019. "Characterization of Leptazolines A-D, Polar Oxazolines from the Cyanobacterium Leptolyngbya sp., Reveals a Glitch with the 'Willoughby-Hoye' Scripts for Calculating NMR Chemical Shifts." Org. Lett. 21, 20, 8449–8453. https://pubs.acs.org/doi/10.1021/acs.orglett.9b03216.
131 Paleyes, A., R.-G. Urma, and N.D. Lawrence. 2020. "Challenges in Deploying Machine Learning: A Survey of Case Studies." arXiv, https://arxiv.org/abs/2011.09926.
132 Gogoll, J., N. Zuber, S. Kacianka, T. Greger, A. Pretschner, and J. Nida-Rümelin. 2021. "Ethics in the Software Development Process: From Codes of Conduct to Ethical Deliberation." Philosophy and Technology, https://link.springer.com/article/10.1007/s13347-021-00451-w.
133 See https://openwebtext2.readthedocs.io/en/latest/?badge=latest.

• Is this data fit for purpose? Is it a good fit for the research question being addressed? Because some data is easy to obtain, the temptation may be to use it even when it is not a good fit for the research project. For example, OpenWebText2 would not be appropriate for studying language change over time because it covers only the years after 2005.
• Is this data permissible to use? Data may be available yet still protected. For example, some data from OpenWebText2 may be copyrighted or protected by user agreements. Copyrighted data may still be permissible to use in research under fair use,134 but it may be harder to defend the use of data protected by user agreements.135
• Does this data set comprise an appropriate sample? In the age of "big data," it is easy to assume that any sufficiently large quantity of data is a good sample, but that cannot be known without examining the data. For example, OpenWebText2 may seem like a "good sample," but it is likely to underrepresent content from China and India, two of the world's largest countries by population. That may or may not be acceptable for the research goal.
• Does this data need to be protected? For example, data may contain personally identifiable information. Even if a data set has been cleaned and anonymized, it may be possible to deanonymize it.136
• How should the data be cleaned and normalized (i.e., structured in a standardized fashion)? OpenWebText2 contains not just natural language but also code, links, tables, and so on.

These concerns arise even for data sets curated by other researchers. A new use requires reconsideration,137 even of a data set that has a data sheet (a data sheet may be incomplete, incorrect, or out of date138), and derivatives of a data set may pose additional concerns.139 For example, the GPT series of models from OpenAI, trained on OpenWebText and other data, are not merely condensed versions of the input data.
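Several of the questions above can be partly operationalized as checks run before any analysis. The sketch below is illustrative only; the field names, year threshold, and cleaning rules are hypothetical and would need to match the actual corpus:

```python
import re

def clean_corpus(records, min_year=2006):
    """Apply pre-use checks to a found text corpus; rules here are hypothetical."""
    seen, kept = set(), []
    for rec in records:
        text, year = rec["text"], rec["year"]
        # Fit for purpose: drop records outside the period the corpus covers well.
        if year < min_year:
            continue
        # Normalization: strip markup and URLs, collapse whitespace.
        text = re.sub(r"<[^>]+>", " ", text)
        text = re.sub(r"https?://\S+", " ", text)
        text = re.sub(r"\s+", " ", text).strip()
        # Sampling hygiene: near-verbatim repeats skew any downstream analysis.
        key = text.lower()
        if not text or key in seen:
            continue
        seen.add(key)
        kept.append({"text": text, "year": year})
    return kept
```

Checks like these answer only the mechanical parts of the questions; permissibility, sample appropriateness, and protection still require human judgment.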
They can be used to generate text, for example to help students learning English, or in spam bots. The Copilot model from GitHub (a Microsoft subsidiary), trained on code hosted on GitHub, can reproduce code that is not licensed for reuse.140 Furthermore, owing to variations in data collection, sampling, cleaning, and normalization, it can be very difficult to trace exactly what data is included in any derivative of a data set.

Last, the use of data from humans creates specific challenges. Institutional review board reviews cover only certain types of data-intensive research, and only some facets of the use of data from humans, as discussed at a recent Health and Human Services panel.141 By contrast, responsible computing research

134 Levendowski, A. 2018. "How Copyright Law Can Fix Artificial Intelligence's Implicit Bias Problem." 93 Wash. L. Rev. 579. https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3024938.
135 See https://www.aclu.org/press-releases/aclu-sues-clearview-ai.
136 Narayanan, A., and V. Shmatikov. 2019. Robust de-anonymization of large sparse datasets: A decade later. https://www.cs.princeton.edu/~arvindn/publications/de-anonymization-retrospective.pdf.
137 Birhane, A., and V.U. Prabhu. 2021. "Large image datasets: A pyrrhic win for computer vision?" 2021 IEEE Winter Conference on Applications of Computer Vision (WACV). Pp. 1536–1546.
138 Yang, E., and M.E. Roberts. 2021. Censorship of online encyclopedias: Implications for NLP models. In Proc. of the ACM Conference on Fairness, Accountability, and Transparency (pp. 537–548); Baeza-Yates, R. 2018. Bias on the web. Communications of the ACM 61(6):54–61.
139 Dodge, J., M. Sap, A. Marasović, W. Agnew, G. Ilharco, D. Groeneveld, M. Mitchell, and M. Gardner. 2021. Documenting Large Webtext Corpora: A Case Study on the Colossal Clean Crawled Corpus. https://aclanthology.org/2021.emnlp-main.98.pdf.
140 GitHub. 2021. "Risk Assessment of GitHub Copilot." https://gist.github.com/0xabad1dea/be18e11beb2e12433d93475d72016902.
141 Department of Health and Human Services. Secretary's Advisory Committee on Human Research Protections (SACHRP) July 21–22, 2021, Meeting. https://www.regulations.gov/docket/HHS-OPHS-2021-0015/document.

requires looking at a broader set of issues associated with data-intensive computational modeling.142 This point was emphasized by Eric Horvitz of Microsoft in his presentation to the panel.143

3.4.4 Designing for Responsibility

The usability of a computing technology is a crucial determinant of whether it will function as intended. Its usability, and more specifically its accessibility, determines the variety of users who will be able to interact with it. Research in human-computer interaction (HCI) has developed various methods and design principles for making computing systems usable and useful, among other considerations.144 These HCI methods can guide not only the design of human-computer interaction systems (i.e., "user interfaces"), but also usability evaluation and, often, requirements analysis.

More broadly, the theory and methods of design have yielded various tools for imagining possible designs and the futures they might engender. For responsible computing research, these tools enable the design of computing methods and systems that increase societal good, as well as ones that avoid or mitigate unforeseen consequences and unintended uses of computing research outcomes.145 Research about values in design146 provides methods for value-sensitive design and for design justice. Participatory design approaches emphasize the active involvement of current or potential users of a system in design and decision making. Design has thus become an essential part of human-computer interaction and of considering how people really use computer systems, as well as an essential complement to ethics and the sociotechnical perspective outlined in Chapter 2.

Early usability efforts focused on error rate, efficiency, learnability, memorability, and satisfaction. Basic usability requires that developers observe the cognitive limits of users.

142 Santy, S., A. Rani, and M. Choudhury. 2021. Use of Formal Ethical Reviews in NLP Literature: Historical Trends and Current Practices. arXiv preprint arXiv:2106.01105; Jordan, S.R. 2019. "Designing Artificial Intelligence Review Boards: Creating Risk Metrics for Review of AI." 2019 IEEE International Symposium on Technology and Society (ISTAS), pp. 1–7. doi: 10.1109/ISTAS48451.2019.8937942.
143 Eric Horvitz, Microsoft Research, presentation to the committee on June 10, 2021.
144 Shneiderman, B., and C. Plaisant. 2016. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Pearson Education.
145 Van den Hoven, J. 2013. "Value Sensitive Design and Responsible Innovation." In R. Owen, J. Bessant, and M. Heintz (Eds.), Responsible Innovation (pp. 75–83). John Wiley & Sons, Ltd.; Van den Hoven, J., Lokhorst, G-J, and Van de Poel, I. (2012).
For example, people can remember only roughly 7 ± 2 chunks of information (Miller's Law147), limiting their capability to understand dense information on a display. Another constraint is that people can handle only a small number of notifications; when notifications arrive too quickly, users begin to ignore alert boxes, which can lead to safety issues. These cognitive requirements are most easily found in the guidelines at https://www.usability.gov/,148 which should be followed by any system or application aimed at users.

These basic cognitive limitations are too narrow to incorporate the full range of use, however. Understanding that technical systems are inherently sociotechnical systems, and therefore wrapped up in their social context of use, requires additional considerations:

• Designs need to take into account the different kinds of people that may use the system and the contexts of their use. Different user interfaces may be required by different groups of users, for example. Different user groups—such as inexperienced users, expert users (e.g., airplane pilots),
Engineering and the problem of moral overload. Science and Engineering Ethics, 18(1):143–155. DOI: 10.1007/s11948-011-9277.
146 Friedman, B., and D.G. Hendry. Value Sensitive Design: Shaping Technology with Moral Imagination. MIT Press, 2019.
147 Miller, G.A. 1956. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review 63(2):81.
148 See https://www.usability.gov/.

elderly, and people with disabilities—may have different usability requirements, such as changes to the user interface or the system functionality. Linguistic variation owing to such factors as accent, dialect, age and gender, or code switching is another important factor for any systems that interact through spoken or written language. Accessibility is still a significant issue for those with disabilities,149 such as people who are vision-impaired or hearing-impaired, or those with movement disabilities. System designers should take careful account of the range of accessibility issues. Specific guidance can be found in the Web Content Accessibility Guidelines.150

• It is also vital to understand how a system might potentially be used in context, for example, within the specifics of social or organizational processes or the constraints of specific classes of users. For example, an application to help homeless people remember their medication might also require considering where they might find refrigeration to store their insulin. And notifying family members of an elderly person's fall could be seen either as promoting safety or as invading privacy; developing health applications depends heavily on the specifics of the social context. Study after study has shown that understanding users in their contexts facilitates understanding what system capabilities are required for adoption and effective use by differing users.

• One must consider differences in users' mental models (hence requiring training or assistance) as well as their having potentially very different goals.151 Designers must consider appropriate reward systems or incentive structures in systems that will incorporate groups of users, especially large-scale systems such as social computing or social media systems.
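Some of the accessibility requirements mentioned above are quantitative and checkable in code. For example, the WCAG guidelines define a contrast ratio between text and background colors; a sketch of that formula:

```python
def relative_luminance(r, g, b):
    """WCAG relative luminance of an sRGB color with channels in 0-255."""
    def channel(c):
        c = c / 255.0
        # Linearize the gamma-encoded channel value.
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    return 0.2126 * channel(r) + 0.7152 * channel(g) + 0.0722 * channel(b)

def contrast_ratio(fg, bg):
    """Contrast ratio between two colors; WCAG AA asks >= 4.5:1 for normal text."""
    l1, l2 = relative_luminance(*fg), relative_luminance(*bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # → 21.0, the maximum
```

Automated checks like this cover only a sliver of accessibility; most of the considerations above still require evaluation with real users.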
Differences in mental models, goals, and reward systems can lead to the benefits and pathologies of anonymity, to maladaptive or antagonistic sharing of information, and to support for informal roles in organizations and social groupings. Recent research has developed a stronger understanding of network structures, both social and computational, including the propagation of information cascades (e.g., misinformation)152 and the role of networks in sharing expertise.153

• Recent work in HCI has included a further reconsideration of usability in the sociotechnical context of use. Computing technologies no longer merely replace current work practices or standard operating procedures with digital practices; much of people's everyday lives has already become digital. HCI is now envisioning how to design new computational contexts and how to include all types of users in those contexts.

These requirements for usable systems are not easily addressed. Important methods for uncovering the requirements for research projects and products include task analyses and cognitive walkthroughs.154 Testing usability can include think-aloud evaluations and many other techniques.155 As researchers and practitioners began to consider system use in its social context, additional methods were developed, including

149 Holtzblatt, K., and H. Beyer. 1997. Contextual Design: Defining Customer-Centered Systems. Elsevier.
150 Web Content Accessibility Guidelines (WCAG). https://www.w3.org/WAI/standards-guidelines/wcag/.
151 Orlikowski, W.J. 1995. "Learning from notes: Organizational issues in groupware implementation." In Readings in Human–Computer Interaction, pp. 197–204. Morgan Kaufmann.
152 Easley, D., and J. Kleinberg. 2010. Networks, Crowds, and Markets. Cambridge: Cambridge University Press.
153 Zhang, J., M.S. Ackerman, and L. Adamic. 2007. "Expertise networks in online communities: structure and algorithms." In Proceedings of the 16th International Conference on World Wide Web, pp. 221–230.
154 Shneiderman, B., and C. Plaisant. 2016. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Pearson Education.
155 Shneiderman, B., and C. Plaisant. 2016. Designing the User Interface: Strategies for Effective Human-Computer Interaction. Pearson Education.

contextual inquiry.156 General guidelines for developing usable systems are available,157,158 and software developers would be remiss not to use the methods applicable to their systems.

3.5 LIMITS OF A PURELY COMPUTING-TECHNICAL APPROACH

The sections below discuss two areas of concern—privacy and content moderation—that have arisen with the proliferation of highly networked computing environments and data-dependent AI systems in widespread use. They illustrate the need for close integration of a wide range of disciplinary perspectives pertaining to ethical and societal factors of the type described in previous sections—including social and behavioral science, policy, and governance—and the importance of bringing these perspectives into consideration at the earliest stages of the computing research pipeline. They also show the limits of what can be achieved from a purely technical point of view if this range of perspectives is not invoked until after systems have been deployed. As part of this integration, the computing research community has important roles to play in informing approaches to the key questions in these domains and in developing new methods to assist in addressing them.

3.5.1 Limits to Privacy Protection and Risk Assessments

Computational systems regularly process and store sensitive information; thus, privacy is a central design principle across many areas of computing. However, even defining privacy is difficult, because the term has different meanings in different contexts (see also Section 2.1, "The Value and Scope of Ethics"). For example, viewed through the narrow lens of access control and system administration, privacy might refer to who may control how your data is accessed, who can associate their data with yours, or who can influence the decisions that you make within the system.
However, privacy also speaks to higher-level concepts involving human expression and societal organization; privacy rules help to define the relationships between individuals and institutions (public and private) with differing amounts of power, encouraging or discouraging individuals from engaging in free expression, association, and intellectual engagement without the chilling effects of being monitored or restricted.

All of these concerns long predate the development of computing systems. However, the rise of computing has added new urgency for reasons of automation and scale. Data sets that were public in theory but of limited accessibility in practice are no longer costly or difficult to obtain; for example, information about home sales and political donations is now digitized and straightforward for almost anyone to download and analyze. Although this democratization of access has many positive aspects, it also raises new privacy questions. Furthermore, as various aspects of life increasingly involve an online component, new data sets are being generated and made widely available. These data sets, sometimes held privately by a single company, other times shared with other businesses via often-opaque arrangements, have dramatically increased the amount of data available to analyze. The rise of cheap, commoditized computing-as-a-service has lowered barriers to storing that information and extracting insights from it.

Given all of this, thinking about privacy in computational settings demands a reckoning with several fundamental questions. First, what kinds of privacy approaches are desirable? Second, what kinds of privacy approaches are technologically possible? Third, how can different approaches to privacy protection in different countries or regions best be reconciled or accommodated? These questions are related; pondering them together helps to identify both risks and opportunities.
For example, imagine a database that stores sensitive information about a variety of users. A desirable privacy goal

156 Holtzblatt, K., and H. Beyer. 1997. Contextual design: defining customer-centered systems. Elsevier.
157 Improving the User Experience, https://www.usability.gov.
158 Nielsen, J. Usability engineering. Morgan Kaufmann, 1994.

might be "queries about this database do not reveal whether my particular information resides in the database." Starting from that high-level goal, a variety of computer science research has explored the technical possibility of achieving it, using approaches like differential privacy159 (which itself was motivated by a desire to minimize the information leakage allowed by earlier anonymization techniques).160 As information-hiding techniques become more advanced (i.e., as the scope of what is technologically possible becomes broader), the notion of which privacy approaches are desirable will change.

Of course, no solution is perfect. For example, differential privacy is not effective if the size of the aggregate data set is insufficient to mask the contribution of specific individuals. Differential privacy also may not provide mechanisms to hide a user's IP address from the differentially private database server. Issues like these often arise when considering another fundamental question: what are the privacy implications of how an end-to-end system is designed? For example, once network communication between clients and servers is introduced, the privacy of the network data and of client-specific information like IP addresses and browser fingerprints must be considered. Access control is also important: how does a system determine whether a user is authorized to access a particular piece of sensitive data?

Point solutions for each of those challenges exist. For example, encryption provides data confidentiality, while mixnets161 can hide IP addresses by routing traffic through intermediate servers. However, integrating specific technical solutions or research ideas into a cohesive system is challenging. For example, Tor162 uses both mixnets (to hide client IP addresses) and encryption (to hide the identity and content of visited websites).
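For concreteness, the core idea behind differential privacy mentioned above (adding noise calibrated to how much any one person's record can change a query's answer) can be sketched in a few lines; this is an illustration, not a production implementation:

```python
import math
import random

def private_count(true_count, epsilon):
    """Release a count perturbed with Laplace noise of scale sensitivity/epsilon.

    A counting query changes by at most 1 when any one record is added or
    removed, so its sensitivity is 1; a smaller epsilon means more noise
    and a stronger privacy guarantee.
    """
    sensitivity = 1.0
    scale = sensitivity / epsilon
    # Sample Laplace(0, scale) by inverse transform; u lies in [-0.5, 0.5).
    u = random.random() - 0.5
    noise = -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

Point mechanisms like this protect a single query's answer; they do not by themselves yield whole-system privacy, and side channels of the kind discussed next for Tor are one reason.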
However, a network observer can still confirm that a user has visited given sites by looking at the size and arrival times of network packets, even though those packets are encrypted.163 The existence of such side channels is an example of the practical complications that arise when designing privacy-preserving systems. Tor's designers were aware of side channels involving packet size and packet arrival times. In considering the trade-off between user-perceived performance and expensive techniques for side-channel mitigation—a kind of design-values choice that is common for secure systems—they chose performance. This choice was driven in part by an assumption that no realistic network attacker could possess a sufficiently large number of vantage points to run the necessary correlation analyses. At the time of the writing of this report, however, the number of Tor routers is small enough that a nation-state-level actor could subvert or spy upon a significant fraction of Tor traffic.

Larger societal concerns add significantly to the complexity of these questions. To the extent that privacy can be conceptualized as a set of conditions on appropriate information flows between different parties,164 the circumstances that give rise to privacy considerations in everyday interaction are essentially ubiquitous. And as mentioned earlier, many computational settings where privacy concerns arise—with respect to governments, companies, employers, or families—also involve power differentials and the desire to limit the uses of power. Understanding the real-life relationships between these parties is critical for understanding the desirable technology-mediated privacy relationships between those parties. For

159 Dwork, C., and A. Roth. 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9(3–4).
160 Ohm, P. 2009. "Broken promises of privacy: Responding to the surprising failure of anonymization." UCLA L. Rev. 57:1701; Sweeney, L. 1997. "Weaving Technology and Policy Together to Maintain Confidentiality." The Journal of Law, Medicine, and Ethics. https://journals.sagepub.com/doi/10.1111/j.1748-720X.1997.tb01885.x.
161 Chaum, D. 1981. "Untraceable electronic mail, return addresses, and digital pseudonyms." Communications of the ACM 24(2):84–88.
162 Dingledine, R., N. Mathewson, and P. Syverson. 2004. "Tor: The Second-Generation Onion Router." Proceedings of USENIX Security.
163 Rimmer, V., D. Preuveneers, M. Juarez, T. Van Goethem, and W. Joosen. 2018. Automated Website Fingerprinting through Deep Learning. doi: 10.14722/ndss.2018.23115.
164 Nissenbaum, H. 2011. A contextual approach to privacy online. Daedalus 140(4):32–48.

example, how is private phone data accessed in the context of intimate partner violence?165 To what extent should employers be able to surveil their employees for performance or safety reasons?166 These questions cannot be answered from a purely technical perspective, and significant responsibility rests with companies and governments.

3.5.2 Limits of Content Moderation

The World Wide Web was created to be a platform for communication and expression, and it implements a type of many-to-many interaction that is hard to achieve with other media. Relative to earlier print and broadcast media, it dramatically lowered the barriers to mass dissemination of content; on the Web, the fact that a piece of content has high production values and wide reach does not necessarily mean that the author had access to expensive mechanisms for creating it. As a result, the Web has led to an abundance of content, both in raw volume and in the heterogeneity of the participants contributing to it. The complexity of the resulting content creation ecosystem has led to profound challenges for the designers of on-line platforms.

A first challenge lies in determining what kinds of content should be prohibited from a platform. This brings into play classical questions about the fundamental trade-offs involved in balancing the benefits and costs of unrestricted speech—including the representation of diverse viewpoints, and the creation of a public environment where these different viewpoints can engage and compete with each other—albeit in an environment where these decisions are being made by private operators of on-line platforms. Moreover, the problem of managing on-line content is much broader than the question of what should be restricted; platforms must constantly deal with the questions of where users' attention should be directed and which pieces of content should be amplified.
The central role of user attention in these questions was identified early in the history of computing research; for example, Herb Simon wrote in 1971, "In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients."167 Editors, publishers, and other gatekeepers have always made decisions about how to allocate attention; however, the challenge of doing so has grown with the volume of content now available for recommendation and with the size of the audience affected, and the concentration of on-line attention on a few platforms has led to a corresponding concentration of power in the hands of the people running those platforms. Any system design will lead to a distribution of attention that is focused on certain items, viewpoints, and perspectives at the expense of others; this is inevitable in an environment where the abundance of content dramatically outpaces the available attention.

These interlocking questions of amplification and restriction lead to important problems for the computing research community. Even with agreement on general principles for the allocation of user attention, there is only limited understanding of how ranking algorithms and user interface design create the social feedback loops that drive user attention.168 For content moderation, there may be agreement that platforms are not simply pipes through which information flows; notably, however, unlike with

165 Freed, D., S. Havron, E. Tseng, A. Gallardo, R. Chatterjee, T. Ristenpart, and N. Dell. 2019. "Is my phone hacked?" Analyzing Clinical Computer Security Interventions with Survivors of Intimate Partner Violence. Proceedings of the ACM on Human-Computer Interaction 3, no. CSCW:1–24.
166 Ajunwa, I., K. Crawford, and J. Schultz. 2017. Limitless worker surveillance. Calif. L. Rev. 105.
167 Simon, H.A. 1971. Designing organizations for an information-rich world. In M. Greenberger (Ed.), Computers, Communications, and the Public Interest, pp. 37–72. Johns Hopkins Press. Zeynep Tufekci has made the point that modern censorship takes advantage of this property, by flooding people with information of dubious quality. (Tufekci, Z. 2017. Twitter and Tear Gas. Yale University Press.)
168 Salganik, M.J., P. Sheridan Dodds, and D.J. Watts. 2006. "Experimental study of inequality and unpredictability in an artificial cultural market." Science 311(5762):854–856.

privacy, there is not such agreement on what kinds of information should or should not be allowed to spread. That is, there is no consensus on the values and associated goals that content moderation is to realize.

Furthermore, there are significant computational limitations. Determining whether particular information should not be spread is a context-sensitive issue: in one context a single word or video clip may be fine (e.g., a health care setting in which an anatomical part is mentioned, or a sci-fi movie in which a terrorist attack happens), while in another it would not be (e.g., pornography, or a plan to launch an attack). As a result, even where there might be agreement on values and on general principles about the types of content that should be restricted, the richness of language and visual media strongly limits the power of algorithms to identify particular instances of prohibited content.169 In addition, the people labeling data do not always agree on whether a particular piece of content should be blocked, so it is difficult to get good quality control for training data. Last, languages evolve, and expressions that were once acceptable may become unacceptable.170

One challenge is developing frameworks for considering the types of outcomes to aim for in such environments, and how to achieve them given the strengths and limitations of content moderation and filtering algorithms. Within this broad category of challenges are particular questions concerning social feedback effects in on-line platforms that are the focus of active research sub-communities.
The effect of personalized content filtering on polarization, through the creation of "filter bubbles" and the facilitation of on-line organizing, is one of these central questions;171 another is the management of misinformation and its effects.172 The evolution of both of these topics illustrates the inherent challenges in connecting design choices and immediate user responses to long-range outcomes on the platform. This problem of reasoning about long-range outcomes for platform content from the interaction of users and algorithmic filtering and recommendation is the subject of a growing line of research.173

An additional issue is that it is very difficult for academic researchers to conduct open scientific research on content moderation, because of the large amount of data needed and the high risks to the people whose data would be shared in making such data widely available. Greater progress might be made on these topics through more robust connections between industrial and academic research. There are clear structural challenges in creating these connections, including the proprietary interests of the platforms and the privacy interests of their users. But there is progress in exploring mechanisms for these kinds of collaborations,174 and they may prove crucial for deeper research into design choices for on-line content and its longer-range societal implications.

169 Warner, W., and J. Hirschberg. 2012. "Detecting hate speech on the world wide web." In Proceedings of the Second Workshop on Language in Social Media, pp. 19–26.
170 Many people think the impressive ability of algorithms to detect spam suggests it would also be possible to detect undesirable content. The detection of spam is not, however, done by natural language processing analysis of what is said in a message, but rather by looking at a range of signals (e.g., metadata on the messages, for instance indicating the source), and by determining the ways in which money flows to spammers and shutting them down.
171 Bakshy, E., S. Messing, and L.A. Adamic. 2015. "Exposure to ideologically diverse news and opinion on Facebook." Science 348(6239):1130–1132; Freelon, D., A. Marwick, and D. Kreiss. 2020. "False equivalencies: Online activism from left to right." Science 369(6508):1197–1201; Pariser, E. 2011. The Filter Bubble: How the New Personalized Web Is Changing What We Read and How We Think. Penguin; Sunstein, C. "The daily we: Is the internet really a blessing for democracy?" Boston Review 26(3):4.
172 Wardle, C., and H. Derakhshan. 2017. "Information disorder: Toward an interdisciplinary framework for research and policy making." Council of Europe report 27.
173 Dean, S., S. Rich, and B. Recht. 2020. "Recommendations and user agency: The reachability of collaboratively-filtered information." In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 436–445; Mladenov, M., E. Creager, O. Ben-Porat, K. Swersky, R. Zemel, and C. Boutilier. 2020. "Optimizing long-term social welfare in recommender systems: A constrained matching approach." In International Conference on Machine Learning, pp. 6987–6998. PMLR.
174 King, G., and N. Persily. 2020. "A new model for industry–academic partnerships." PS: Political Science and Politics 53(4):703–709.