
Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data: Proceedings of a Workshop—in Brief (2023)


Suggested Citation:"Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data: Proceedings of a Workshop - in Brief." National Academies of Sciences, Engineering, and Medicine. 2023. Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. doi: 10.17226/27093.

Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data

Proceedings of a Workshop—in Brief


Artificial intelligence (AI), facial recognition, and other advanced computational and statistical techniques are accelerating advancements in the life sciences and many other fields. However, these technologies and the scientific developments they enable also hold the potential for unintended harm and malicious exploitation. To examine these issues and to discuss practices for anticipating and preventing the misuse of advanced data analytics and biological data in a global context, the National Academies of Sciences, Engineering, and Medicine convened two virtual workshops on November 15, 2022, and February 9, 2023.

The workshops engaged scientists from the United States, South Asia, and Southeast Asia through a series of presentations and scenario-based exercises to explore emerging applications and areas of research, their potential benefits, and the ethical issues and security risks that arise when AI applications are used in conjunction with biological data. Participants explored real and perceived vulnerabilities associated with research involving advanced data analytics and biological data, discussed the potential ways AI research and applications could be exploited to cause harm, and identified existing security practices along with additional measures that could further enhance security.

As noted by Nathan Price (Thorne HealthTech and workshop co-chair) in opening remarks, AI developments in life sciences research may help address challenging questions in biology. Promoting responsible innovation, developing best practices, and balancing the benefits and risks of these technologies may be aided by awareness of the technological opportunities and limitations, understanding of the social context of their deployment, and effective cross-disciplinary and cross-border partnerships.

The workshops consisted largely of small-group discussions designed to elicit perspectives and ideas from participants from the United States, South Asia, and Southeast Asia. Five speakers set the stage for these discussions by highlighting examples of emerging applications of AI in biology and biometrics, pointing to key issues and questions raised by these technologies, and identifying some of the practices that can reduce the associated security risks. Following these presentations, workshop participants divided into small groups for guided discussions.


This Proceedings of a Workshop—in Brief provides the rapporteurs’ high-level summary of the workshop series. Comments that are not attributed to individual speakers reflect the issues raised during group discussions, while comments attributed to speakers stem from plenary presentations. This proceedings highlights potential opportunities for action, but these should not be viewed as consensus conclusions or recommendations of the National Academies.

EMERGING APPLICATIONS AND RISKS

Workshop attendees examined emerging applications of AI and other advanced data analytics approaches involving biological data in three main areas: life sciences research, biomedicine, and biometrics. Speakers offered examples of beneficial uses of AI in these contexts and identified some of the associated risks. Participants delved deeper into each area through a series of breakout group discussions based on hypothetical scenarios.

AI for Life Sciences Research

Researchers are applying AI and machine learning (ML) technologies to biological data sets to explore a variety of research questions in the life sciences. Sean Ekins (Collaboration Pharmaceuticals) described how his company’s experiments with generative AI have demonstrated the promise of these approaches for drug discovery while also bringing attention to the risk that they could be used to design new chemical weapons.

Collaboration Pharmaceuticals uses a suite of ML and generative design tools to identify and develop new molecules to serve as therapeutic candidates. The company’s scientists use available data and AI tools to identify or design molecules with biological activity, model their characteristics, and generate predictions, then perform experiments and curate the resulting data for use in subsequent cycles of design and experimentation.

Recently, Ekins and colleagues tested whether the generative software could be exploited to design toxic molecules. Using publicly available databases, and in less than 24 hours, this experiment generated designs for more than 40,000 molecules with potentially toxic effects; a small subset had characteristics similar to VX, an extremely toxic nerve agent.1 The results generated alarm among communities concerned with novel biochemical weapons development and oversight, including scientists and arms control experts.2 While developing chemical weapons was never a goal of this endeavor and no chemicals were ever actually synthesized, the demonstration was a forewarning to the field, especially as more generative AI and AI-driven platforms for molecule design are used with large datasets.3

To prevent the misuse of generative AI for creating chemical threats, Ekins and colleagues shared suggestions for recognizing and disclosing potential dual uses; increasing awareness and training among the research community; incorporating guidance from experts and established frameworks such as the Hague Ethical Guidelines;4 creating regulations to limit access to tools; and implementing practices to control how models are used, such as keeping human(s) in the loop, using federated learning, employing waitlists to restrict access, and using encrypted data.5 He added that these measures are becoming more urgent as the technology advances and becomes ever more accessible.
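Several of the access-control practices Ekins listed can be pictured concretely. The sketch below is a minimal, hypothetical illustration of keeping a human in the loop: generated candidates whose predicted toxicity crosses a threshold are routed to expert review rather than released automatically. All names, scores, and the threshold are invented for illustration.

```python
# Minimal sketch of a "human in the loop" control for generative molecule
# design: candidates whose predicted toxicity exceeds a threshold are held
# for expert review instead of passing straight through the pipeline.
# The molecule names, toxicity scores, and cutoff are hypothetical.

TOXICITY_THRESHOLD = 0.7  # hypothetical cutoff on a 0-1 predicted-toxicity scale

def triage_candidates(candidates):
    """Split (name, predicted_toxicity) pairs into auto-approved and held."""
    approved, held_for_review = [], []
    for name, predicted_toxicity in candidates:
        if predicted_toxicity >= TOXICITY_THRESHOLD:
            held_for_review.append(name)   # a human expert must sign off
        else:
            approved.append(name)
    return approved, held_for_review

candidates = [("mol-001", 0.12), ("mol-002", 0.91), ("mol-003", 0.48)]
approved, held = triage_candidates(candidates)
# approved: ["mol-001", "mol-003"]; held: ["mol-002"]
```

In a real pipeline the predicted-toxicity score would come from a trained model, and the held list would feed a review queue rather than a Python list.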

In response to a question about opportunities for responsible data sharing, Ekins noted that ensuring data integrity is critical and added that creating synthetic data for AI training purposes could provide one avenue to advance AI technologies while avoiding the complexities of consent or data anonymization.

Scenario-based Discussion

Two breakout groups explored the use of AI and ML-based approaches to advance scientific discovery in the life sciences, potential harmful uses, and real and perceived security concerns. To frame their discussions,

__________________

1 Urbina, F., F. Lentzos, C. Invernizzi, and S. Ekins. 2023. AI in drug discovery: A wake-up call. Drug Discovery Today 28(1):103410.

2 Urbina, F., F. Lentzos, C. Invernizzi, and S. Ekins. 2022. Dual use of artificial intelligence-powered drug discovery. Nature Machine Intelligence 4(3):189–191.

3 Urbina, F., F. Lentzos, C. Invernizzi, and S. Ekins. 2022. A teachable moment for dual-use. Nature Machine Intelligence 4(7):607.

4 https://www.opcw.org/hague-ethical-guidelines (accessed April 7, 2023).

5 Urbina, F., F. Lentzos, C. Invernizzi, and S. Ekins. 2023. Preventing AI from creating biochemical threats. Journal of Chemical Information and Modeling 63(3):691–694.


participants focused on the following hypothetical scenario as an initial starting point:

Participants discussed a wide range of promising and beneficial applications for this area of emerging research. In particular, using AI-based approaches to integrate complex biological data sets could significantly advance understanding of human health and disease through improved predictive modeling of biological systems and disease causality. This integration may help advance research in precision medicine, environmental health, accelerated drug discovery, predictive diagnostic imaging, and improved population and public health. In the bioengineering space, current and emerging research areas include enhanced protein design and genetic engineering; drug design and development; and synthetic biology. New developments in AI can advance multi-omics data (e.g., genomics, transcriptomics, proteomics, exposomics) integration and analysis.

Such research could lead to many benefits for people and the environment. For human health, these approaches could enable improved and quicker diagnosis, more accurate prediction, and tailored therapeutics with fewer side effects; early detection and prevention of pandemics; and accelerated vaccine development. These benefits in turn could help people live longer, healthier lives and potentially reduce health care costs. Applying these approaches to study and model ecosystems and environmental dynamics, including through genomic surveillance of the natural world, could improve understanding and better inform approaches to environmental remediation and climate change mitigation.


While the groups noted many benefits of the research, participants also discussed a number of potential opportunities for misuse and exploitation, including future and hypothetical possibilities that are not yet feasible given current developments. The increasing availability of biological data and the leveraging of AI and advanced data analytics pose privacy and security risks for both individuals and systems, with increased vulnerability to cyberattacks. The exploitation of highly sensitive biological data, such as genomic and genetic data, which are at risk of re-identification, could lead to genetic discrimination, population selection through genetic screenings, and misinformation. While not yet possible, future developments in AI combined with complex biological data could enable the design of pathogens and toxins that target populations based on specific genomic information. Other biosecurity risks include AI-assisted design of biochemical weapons (e.g., viruses and other pathogens, biological toxins, chemical agents), as highlighted by Ekins.

Even in the absence of malicious intent, there may be unintended or unanticipated consequences. For example, AI-enabled experiments in synthetic biology could lead to unanticipated repercussions in the environment. AI and ML-based technologies can also be leveraged as a vehicle for fraud or misinformation, such as through poisoned AI training sets or deepfakes.

To begin to address such risks, several participants said that it is important to address ethical and governance questions, such as who owns and makes decisions about data, how resources are allocated, and how cybersecurity is overseen.

AI for Biomedicine

Su-In Lee (University of Washington) reviewed recent developments in the application of AI in biomedical research. A major barrier to leveraging AI in biomedicine, and in particular to using AI to inform clinical decision-making, has been that many AI systems operate as a “black box”: it is unknown how decisions or predictions are derived, making it difficult for clinicians to trust results. Lee highlighted how explainable AI, a framework that allows users to understand and interpret results, can bring the full potential of AI to fruition, from basic biology to bedside care.

Lee suggested that a key benefit of AI is that it can help make sense of large-scale biomedical data and determine which individual features contribute to a particular outcome. Explainable AI can make complex AI modeling more accurate, reliable, interpretable, and trustworthy by disclosing why a certain prediction was made.

Lee offered several examples of how explainable AI is being applied to advance biomedical research and practice. Explainable AI can be used to audit existing AI models, for example, catching flaws that cause errors in COVID-19 detection and in smartphone applications used to screen for skin cancer.6 Researchers have also applied explainable AI to overcome challenges and identify promising leads in the understanding and treatment of Alzheimer’s disease by improving overall understanding of the molecular basis of the disease.7 Finally, researchers have used explainable AI to aid cancer therapy design by combining the features of hundreds of known drugs with an individual patient’s gene expression data to determine the best combination therapy.8,9
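The core idea behind such feature attributions can be illustrated with a simple occlusion sketch: perturb one input feature at a time and measure how much the model's prediction moves. The toy model below is hypothetical and far simpler than the methods used in the cited studies; it only demonstrates the general principle of attributing a prediction to individual input features.

```python
# Toy illustration of feature attribution by occlusion: zero out one feature
# at a time and record how much the model's prediction changes. Larger
# changes indicate features the prediction relies on more heavily.
# The model and inputs are hypothetical stand-ins for a real "black box."

def model(x):
    # A stand-in black box: a fixed linear model over three features.
    weights = [0.1, 2.0, 0.5]
    return sum(w * xi for w, xi in zip(weights, x))

def occlusion_attributions(x):
    baseline = model(x)
    attributions = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = 0.0                    # "remove" feature i
        attributions.append(baseline - model(occluded))
    return attributions

x = [1.0, 1.0, 1.0]
attrs = occlusion_attributions(x)
# For this linear model, attrs recovers each feature's weight: ~[0.1, 2.0, 0.5]
```

Production-grade explainable AI methods handle feature interactions and choose baselines more carefully, but the output has the same shape: one importance score per input feature for a given prediction.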

Scenario-based Discussion

Two groups focused on the potential applications, benefits, harms, and ethical considerations of AI in biomedical research data and healthcare, using the following scenario as a starting point:

__________________

6 DeGrave, A. J., J. Janizek, and S. I. Lee. 2021. AI for radiographic COVID19 detection selects shortcuts over signal. Nature Machine Intelligence 3(7):610–619.

7 Beebe-Wang, N., S. Celik, E. Weinberger, P. Sturmfels, P. L. De Jager, S. Mostafavi, and S. I. Lee. 2021. Unified AI framework to uncover deep interrelationships between gene expression and Alzheimer’s disease neuropathologies. Nature Communications 12(1):5369.

8 Lee, S. I., S. Celik, B. A. Logsdon, S. M. Lundberg, T. J. Martins, V. G. Oehler, E. H. Estey, C. P. Miller, S. Chien, J. Dai, A. Saxena, C. A. Blau, and P. S. Becker. 2018. A machine learning approach to integrate big data for precision medicine in acute myeloid leukemia. Nature Communications 9(1):42.

9 Janizek, J. In press. Nature Biomedical Engineering.


Participants considered many emerging areas of research with promising applications of AI in biomedicine and health. Examples include using AI to analyze clinical diagnostic and biological data to improve prediction and prognostic capabilities for holistically understanding overall disease risk and likely treatment outcomes. In addition, AI-based technologies combined with wearables and digital apps could be used to track and modulate behavior, improving health outcomes, health-status tracking, and treatment adherence. At the population level, digital epidemiology is another emerging field of research that aims to improve spatial modeling of infectious diseases.

Such applications could help to improve population health outcomes and mitigate future health threats; reduce the burden on public health care resources; and optimize supply chains, operations, and resources for high-need areas.

The use of AI in biomedical contexts may also pose security risks and ethical concerns. Data theft or privacy breaches are risks that may lead to exploitation of individual patient and healthcare data. Theoretically, unauthorized access to large volumes of patient data may allow attackers to identify vulnerabilities across a population. AI models are also at risk for being inaccurate, inappropriate, or biased, whether intentionally or unintentionally, and can result in misdiagnosis and errors in medical treatments. Ensuring informed consent is a key ethical issue; since future uses of data are often not fully understood at the time when that data is collected, it can be difficult to ensure truly informed consent for the use of personal genomic and health data.

To address these issues, many participants suggested that researchers could prioritize fairness, bias mitigation, explainability, and integration into “real-life” environments when designing and using AI-based approaches in biomedical contexts. It is also important for a wide range of stakeholders to better understand the technology and its limitations, to pursue legislation and policy solutions that keep pace with the technology, and to implement early-stage safeguards to understand and protect against serious risks.

AI for Biometrics

Arun Ross (Michigan State University) discussed emerging advances and considerations in the automated recognition of individuals based on biometrics: physical, biological, and behavioral traits such as faces, fingerprints, irises, voices, vasculature patterns, and gait. Although biometric recognition technologies are still vulnerable to errors, rapid advances in this area have resulted in a decreasing rate of false negatives in identification. This includes emerging work incorporating convolutional neural networks, a type of deep learning algorithm for analyzing images, into facial recognition systems.10 Ross noted that as facial and biometric technologies grow in accuracy and their applications expand into new areas of people’s daily lives, it is important to consider the opportunities, limitations, and security risks associated with their use. This includes applications in health, such as using facial recognition in combination with genetics to facilitate diagnosis of certain medical conditions.

Ross suggested that in order to use biometrics to determine someone’s identity, it may be necessary to have a correctly labeled reference image or data point. However, even without such a reference point, researchers have shown that a single biometric datum, such as a face, gait, or iris, can still be used to deduce certain attributes like age, gender, ethnicity, and health status, as well as information about the technology used to record the artifact or the environmental conditions where it was recorded.11 Often, enough data

__________________

10 Grother, P., M. Ngan, and K. Hanaoka. 2023. Face Recognition Vendor Test (FRVT) part 2: Identification. National Institute of Standards and Technology Interagency Report 8271.

11 Dantcheva, A., P. Elia, and A. Ross. 2016. What else does your biometric data reveal? A survey on soft biometrics. IEEE Transactions on Information Forensics and Security 11(3):441-467.


can be extracted to develop a detailed profile of an individual, if not their full identity. Researchers have also demonstrated that people can be identified with relative ease using aggregated data from online social networks.12 Such demonstrations raise important ethical questions since people who produce and share images and other data sets are often unaware that doing so can make it easier to automatically glean—and potentially misuse—additional pieces of information from them.

Biometric technologies can be used to identify individuals for beneficial or malicious purposes. Privacy and consent are primary concerns.13 It is often unclear whether a given AI program was trained with proper consent and citation protocols.14

Ross noted a few strategies and security practices to reduce and manage risks. Privacy-enhancing technologies designed to protect individual identities include controllable privacy, in which only traits to which the user has consented can be extracted from data sets; homomorphic encryption, which allows computation to be performed on encrypted data; and semi-adversarial networks that manipulate facial images to preserve biometric utility while preventing extraction of additional attributes.15
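The homomorphic property Ross mentioned can be shown with a toy example. The sketch below implements the textbook Paillier scheme with deliberately tiny, insecure parameters purely to demonstrate that arithmetic on ciphertexts maps to arithmetic on the hidden plaintexts; real deployments use vetted cryptographic libraries and keys of cryptographic size.

```python
# Toy Paillier cryptosystem with tiny, INSECURE parameters, to illustrate
# what "homomorphic" means: multiplying two ciphertexts yields a ciphertext
# of the SUM of the plaintexts, so a server can add encrypted values
# without ever decrypting them. Never hand-roll cryptography in practice.
from math import gcd

p, q = 17, 19                                   # toy primes (real keys use ~1024-bit primes)
n = p * q
n2 = n * n
g = n + 1                                       # standard Paillier generator choice
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)    # lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)     # modular inverse of L(g^lam mod n^2)

def encrypt(m, r):
    assert 0 <= m < n and gcd(r, n) == 1        # r is the random blinding factor
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return ((pow(c, lam, n2) - 1) // n) * mu % n

c1 = encrypt(12, r=3)
c2 = encrypt(30, r=5)
total = decrypt(c1 * c2 % n2)   # homomorphic addition: decrypts to 12 + 30 = 42
```

The same property underlies privacy-preserving biometric matching: similarity scores can be accumulated on encrypted templates, with only the final result decrypted by the key holder.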

When asked about the extent to which AI technologies can be employed by unskilled users, Ross noted that these technologies are indeed becoming more accessible to people without specialized knowledge and training, underscoring the urgent need for education and enforcement mechanisms to support responsible use.

Scenario-based discussion

Two groups focused on the opportunities, benefits, risks, and ethical considerations associated with facial and biometric technologies using the following scenario as an initial point of discussion:

A few participants noted the difference between facial recognition (FR), which they defined as the attempt to recognize an individual, and facial analytics, which is the attempt to derive attributes of an individual, such as age, gender, or health status, from an image. FR can be a highly convenient and seamless way to verify a person’s identity, such as to unlock a smartphone, access an application, or enter a facility. It can also be used to quickly identify people with certain traits, such as medical conditions or injuries.

An area of research with promising applications combines facial recognition technology with biological data, such as genomics, to assist in the diagnosis of certain metabolic syndromes, rare genetic disorders, and facial neuromuscular diseases. FR can also be used to detect early signs of medical conditions such as a stroke by analyzing voice patterns and facial movements. AI-based facial recognition can also be used to support public health surveillance through digital contact tracing and understanding population behavior.

FR technologies are being employed or considered for a variety of applications in emergency response and law enforcement. For example, they could help to identify people in distress, recover missing persons, or locate people suspected of committing crimes. AI-based biometric technologies could even be used for “DNA phenotyping,” in which a person’s DNA is used to reconstruct their facial appearance.

Some participants also discussed the potential to exploit FR in combination with other biometric data. For example, a data breach may allow actors to obtain and link the biological data of a targeted individual or population. Impersonation attacks have a range of consequences, from causing inconveniences, to significant disruptions, to holding data hostage or for

__________________

12 Acquisti, A., R. Gross, and F. Stutzman. 2014. Face recognition and privacy in the age of augmented reality. Journal of Privacy and Confidentiality 6(2):1.

13 Meden, B., P. Rot, P. Terhörst, N. Damer, A. Kuijper, W. J. Scheirer, A. Ross, P. Peer, and V. Štruc. 2021. Privacy-enhancing face biometrics: A comprehensive survey. IEEE Transactions on Information Forensics and Security 16:4147-4183.

14 Murgia, M. 2019. Microsoft quietly deletes largest public face recognition data set. The Financial Times https://www.ft.com/content/7d3e0d6a-87a0-11e9-a028-86cea8523dc2 (accessed March 3, 2023).

15 Mirjalili, V., S. Raschka, A. Namboodiri, and A. Ross. 2018. Semi-adversarial networks: Convolutional autoencoders for imparting privacy to face images. arXiv 1712.00321.


ransom. Exploiting FR in identification applications can lead to unauthorized entry to a secure system or facility, or, conversely, lock people out of systems or facilities to which they should have access. Biased data sets and algorithms may lead to discrimination in various applications in healthcare, surveillance, and law enforcement. Even without a malicious actor, changes in facial appearance, such as from an injury, could lead to false rejections and inadvertently lock people out of systems, with potentially serious consequences in emergency situations.

As these technologies find their way into more and more applications, several participants said it will be important to understand the populations of people for whom facial recognition is more or less accurate, address bias in training data, ensure informed consent, and guard against government overreach. Underlying these concerns is the risk of “function creep”, in which a technology developed for one purpose gradually takes on other uses that go beyond the original intent, potentially raising new issues with regard to consent, ethics, and security.

Given that many of these security risks and ethical issues could be challenging to mitigate once a technology is in use, many participants stressed the importance of addressing them before deployment, in the research and development stage.

SECURITY CONSIDERATIONS AND PRACTICES

Anja Kaspersen (Carnegie Council for Ethics in International Affairs) and Kwok-Yan Lam (Nanyang Technological University) set the stage for a deeper discussion of the security risks at the confluence of AI and biological data and outlined some of the frameworks and mechanisms available to address those risks. Expanding on these talks, participants engaged in a series of guided discussions focusing on security issues and practices.

The National Security Context

Kaspersen discussed the nuances of the national security implications of AI. Like other technologies with dual-use potential, AI creates tensions in the relationship between academic and commercial research and the use of technologies in national security. A unique attribute of AI in this respect is that AI innovation has been driven primarily by the commercial sphere. This creates a situation in which technologies are rapidly advancing outside the traditional oversight structures that have helped guide the responsible development and use of other dual-use technologies.

Kaspersen noted that AI applications involving biological data in particular are growing more ubiquitous in society, making appropriate safeguards important. For instance, popular home DNA test kits can advance genetic research and offer personalized ancestry or health data, but they also open new privacy and security risks with little transparency about those risks. AI in combination with biological data could also aid in the creation of new biological weapons. Kaspersen offered that biological warfare is not a new concept, but techniques like AI and gene editing create points of convergence that provide new pathways for designing pathogens or other biological weapons, including ones that might target particular groups according to their biological traits. AI-enabled bioweapons could be pursued by state actors or by nonstate actors and privately funded research.

Against this backdrop, Kaspersen expressed her hope that the workshop discussions could help to demystify some of the challenges and risks in the application of AI in scientific research and could help to clear a pathway for creating appropriate oversight for responsible and reliable AI. To this end, one option is for industry to collaborate on rules to ensure responsible and reliable use of AI-based biotechnologies. Local governments could be a part of the discussions as well; their perspectives are especially important in the context of biometrics due to the increasing adoption of these technologies for law enforcement purposes. Kaspersen emphasized that open conversations and regulations are needed and can effectively reduce harmful consequences.

Given the seriousness of the potential threats, Kaspersen stressed that people have a collective responsibility to balance incentives, investments, benefits, and risks and to ensure fully informed consent from those who share their biological data. For further exploration of


these issues, she highlighted Filippa Lentzos’ work for additional insights on the public safety and national security risks associated with lethal pathogen research and Elina Noor’s writings on how the challenges of AI, data security, and biosafety are being approached in Southeast Asia.16,17

Security Risks and Approaches

Kwok-Yan Lam discussed security risks involved in the application of AI to biological data and some of the approaches being used to address them in Singapore. ML training involves massive amounts of data, which in biological contexts is often highly sensitive and requires appropriate consent and cyber-protection. Working with this data over a public communications network or interacting with unknown parties can lead to security risks, especially if data privacy and encryption requirements are not prioritized. Data with financial implications, such as health data that could influence health insurance eligibility and rates, is perhaps particularly vulnerable to breaches and misuse.

Lam noted that algorithmic bias poses another security risk by affecting the accuracy and relevance of AI systems. Data and results can be intentionally exploited or manipulated to cause adverse consequences. For example, attackers can deliberately interfere with AI systems by “poisoning” the data to bias the results toward, or away from, a particular outcome. Researchers have demonstrated how gradually introducing incorrect images into an image analysis AI system can ultimately cause the system to identify images incorrectly,18 an approach that could be used to trick facial recognition-based biometric security systems.
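The poisoning idea can be made concrete with a minimal, self-contained sketch (all data invented): injecting mislabeled points into one class of a simple nearest-centroid classifier shifts that class's centroid enough to flip the label assigned to a borderline input.

```python
# Minimal illustration of training-data poisoning: injecting mislabeled
# points into one class shifts that class's centroid, flipping how a
# nearest-centroid classifier labels a borderline test point.
# All data points here are invented for illustration.

def centroid(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def classify(x, centroids):
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: dist2(x, centroids[label]))

clean = {
    "A": [(0, 0), (1, 0), (0, 1)],
    "B": [(5, 5), (6, 5), (5, 6)],
}
test_point = (2.5, 2.5)  # closer to class A's centroid than B's

before = classify(test_point, {k: centroid(v) for k, v in clean.items()})

# Attacker injects points far from class A's true region, labeled "A",
# dragging A's centroid away from the test point.
poisoned = {"A": clean["A"] + [(9, 9)] * 6, "B": clean["B"]}
after = classify(test_point, {k: centroid(v) for k, v in poisoned.items()})
# before == "A", after == "B": the poisoned centroid misleads the classifier
```

Real attacks against deep models are subtler, but the mechanism is the same: corrupted training data silently moves the model's decision boundary.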

Lam highlighted several practices and regulatory approaches that can be employed to counter these risks. He said that strict regulations, strong cybersecurity protocols, and transparency in decision-making are important to protect against data privacy breaches, deliberate data poisoning, and algorithmic or personal biases. Established and emerging approaches in cybersecurity, privacy-enhancing technology, distributed ledgers, misinformation and fraud detection, and trusted and decentralized identities can help to bolster security and reduce risks. In addition, Lam said that explainable AI represents an important improvement over traditional AI tools. While traditional AI techniques can be suitable for providing useful insights and decision support, he said, explainable AI is stronger due to its ability to uncover why a model predicts outcomes and enables better decision-making.

Lam described how Singapore’s “Smart Nation” initiative19 has leveraged various regulatory approaches and technologies with the aim of building a safer digital economy, government, and society. Since there is no single practice that can create total security, he said that it is useful to employ a combination of techniques to minimize risk and maximize trust.

Considerations for AI Algorithm Development

As AI tools are increasingly used to analyze biological, biometric, and multi-omics data, the diversity and volume of data required to train these algorithms create unique challenges. In group discussions, participants examined considerations for training data and data security during AI algorithm development.

The quality and representativeness of training data are important factors. For any AI system, it is important to consider whether there is access to enough training data to be representative, identify sources of possible biases in available data, and ensure access to sufficient metadata. The lack of shared data or benchmarks for large data sets, as well as limited resources and standards throughout data lifecycle management, are important limitations. One challenge is to encourage transparency while at the same time guarding against reidentification of anonymized data. Several participants also underscored the importance of generalizable prediction models and data inclusion/exclusion criteria, approaches to integrate human knowledge, and strategies to ensure truly informed consent.
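One concrete form such a representativeness check could take is a simple audit comparing subgroup proportions in a training set against reference proportions for the population of interest. The group labels, counts, reference proportions, and flagging threshold below are all hypothetical.

```python
import numpy as np

# Hypothetical metadata: ancestry group recorded for each training sample.
groups = np.array(["A"] * 700 + ["B"] * 250 + ["C"] * 50)

# Illustrative reference proportions the data are meant to represent.
reference = {"A": 0.5, "B": 0.3, "C": 0.2}

# Observed share of each group in the training set.
observed = {g: float(np.mean(groups == g)) for g in reference}

# Flag any group represented at less than half its reference proportion.
underrepresented = [g for g in reference if observed[g] < 0.5 * reference[g]]
```

In this invented example, group C makes up 5 percent of the training data against a 20 percent reference share, so it would be flagged for re-weighting or additional data collection before model training.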

__________________

16 Lentzos, F., G. D. Koblentz, and J. Rodgers. 2022. The urgent need for an overhaul of global biorisk management. CTC Sentinel 15(4):23-29.

17 https://asiasociety.org/policy-institute/raising-standards-data-ai-southeast-asia (accessed April 7, 2023).

18 Finlayson, S. G., J. D. Bowers, J. Ito, J. L. Zittrain, A. L. Beam, and I. S. Kohane. 2019. Adversarial attacks on medical machine learning. Science 363:1287-1289. DOI: 10.1126/science.aaw4399.

19 https://www.smartnation.gov.sg/ (accessed April 7, 2023).


Looking forward, some participants suggested that researchers could explore the use of explainable AI to perform data audits and detect heterogeneity, flaws, biases, and confounding data. Researchers could also craft a federated learning approach to encourage data sharing; build central collections of training data, including synthetic data, which avoids consent and anonymization concerns; re-weight or dynamically fix non-diverse data; and adapt existing ethical frameworks to guide responsible AI development.
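The federated learning idea can be sketched in miniature: each site fits a model on its own data and shares only the fitted parameters, never the raw records. The three-client setup, the one-parameter linear model, and the single round of averaging below are deliberate simplifications of real federated protocols such as FedAvg, which iterate local training and aggregation over many rounds.

```python
import numpy as np

rng = np.random.default_rng(1)

# Three hypothetical institutions, each holding private local data drawn
# from the same underlying relationship y = 3*x + noise.
true_w = 3.0
clients = []
for _ in range(3):
    x = rng.uniform(-1, 1, size=100)
    y = true_w * x + rng.normal(scale=0.1, size=100)
    clients.append((x, y))

def local_fit(x, y):
    # Least-squares slope fit on local data; raw records never leave the site.
    return float(np.sum(x * y) / np.sum(x * x))

# The coordinating server aggregates only model parameters.
global_w = float(np.mean([local_fit(x, y) for x, y in clients]))
```

The aggregated slope recovers the shared underlying relationship even though no institution ever transmits its data, which is the property that makes this approach attractive for sensitive biological data sets.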

Considerations for Informing Security Practices

To close the workshop series, participants gathered in small groups for focused discussions of opportunities to improve security in the use of AI with biological data in research and applications. These discussions explored existing mechanisms for securing data and systems, additional opportunities to further enhance security in this space, resources to implement security practices, and considerations for translating these security practices into different institutional contexts.

Biological data and AI-enabled capabilities are already widely available and employed for a variety of uses. Many participants stressed that securely collecting, encrypting, transferring, holding, and interacting with anonymized data involves resources, incentives, infrastructure, and regulations that span sectors and countries. It may also require a deeper understanding, among stakeholders, of the myriad ways in which data use or aggregation might enable or be subject to misuse.

Examples of Current and Needed Cybersecurity and Data Security Practices

Participants discussed multiple data security measures that currently exist, including various data encryption methods, data masking and anonymization tools, and policies and systems that limit access to data and information. However, several participants noted that enhanced cybersecurity measures, encryption systems, and logging systems to track provenance are also necessary as technologies are constantly and rapidly evolving. In addition, developing AI-based tools for the express purpose of detecting and recognizing patterns of possible misuse and exploitation is an area of ongoing research. Novel approaches to training data sets, such as the use of synthetic data instead of real data, are another avenue that may benefit from further exploration as a data security practice.
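As one minimal example of the masking and anonymization tools mentioned, keyed pseudonymization replaces direct identifiers with tokens that only the key holder can regenerate or link back. The record fields and identifiers below are invented, and a production system would add many safeguards (key management, access controls, and protections for the remaining quasi-identifiers).

```python
import hashlib
import hmac
import secrets

# Hypothetical patient records; direct identifiers must not reach analysts.
records = [
    {"patient_id": "P-1001", "age": 54, "diagnosis": "T2D"},
    {"patient_id": "P-1002", "age": 61, "diagnosis": "HTN"},
]

# A secret key held only by the data steward; without it, the pseudonyms
# cannot be regenerated or linked back to the original identifiers.
key = secrets.token_bytes(32)

def pseudonymize(record, key):
    # HMAC-SHA256 yields a stable token per identifier under a given key.
    token = hmac.new(key, record["patient_id"].encode(), hashlib.sha256).hexdigest()[:16]
    masked = dict(record)
    masked["patient_id"] = token
    return masked

masked_records = [pseudonymize(r, key) for r in records]
```

Because the same key always yields the same token, analysts can still join records belonging to one patient across data sets without ever seeing the real identifier.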

Several participants suggested that some of these issues could be addressed through the Fair Information Practice Principles (a framework for privacy policy and information practices)20; enhanced digital literacy; and policies to respond appropriately when security breaches occur. For teams and institutions involved in AI and ML work, some participants proposed establishing practices for supporting data security, “locking” proprietary or confidential data, and creating accountable data stewardship positions. Several participants also noted that diversity in training data is key to avoiding bias and that diversity among research team members is important to support broad awareness of ethical and security risks and to generate innovative solutions.

Many countries already have relevant policies or some level of regulatory oversight regarding health and biological data. Examples include the Health Insurance Portability and Accountability Act (HIPAA) in the United States21, the General Data Protection Regulation (GDPR) in the European Union22, the Personal Data Protection Act (PDPA) in Singapore23, the Personal Data Protection Act (PDPA) in Malaysia24, and the Personal Data Protection (PDP) Law in Indonesia25. However, some participants highlighted the benefits of policies that apply across multinational companies, governments, and academia. Given that technologies will inevitably cross international borders, it is important to determine who creates these policies and how they are implemented and enforced, especially in middle- and low-income countries that have different resources and risk exposures than wealthier countries.

Further enhancement of data security likely involves detailed discussions to balance positive uses with the risks of both intentional and unintentional harms. Both external policies and internal controls are important. It is important that policies for data access, sharing, and storage cover a wide range of security risks, including human rights violations. Some participants posited that controlling allowable use may be easier than controlling access. To this end, it is important to define the scope of use through fully developed and detailed legislation, policies, and/or institutional oversight; account for unanticipated future use scenarios to the extent possible, especially with regard to potential impacts on marginalized or oppressed groups; and keep up with the pace of innovation and commercial development.

Several participants highlighted particular considerations in the context of biometric applications. To prevent, detect, or deter the malicious use of biometric data and to protect against unintended uses enabled by third-party software or outsourcing, many participants suggested that data access could be tightly controlled, and they underscored the importance of including ethics as an integral part of algorithm development and training. In particular, some participants suggested that image data used in biometric technologies could be homomorphically encrypted to prevent identification, masked with noise and other techniques, and attributed to enable provenance tracking and protection of provenance metadata. In addition to strong cybersecurity and network security protections, including intrusion detection, some participants said it is important that encrypted data not be decrypted, even for researchers. Logs to track provenance and enable secure access, guided by the Fair Information Practice Principles, can help to ensure accountability, and AI systems themselves could potentially be employed to flag suspicious activity or misuse.
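The tamper-evident provenance logging idea can be sketched with a hash chain, in which each log entry commits to its predecessor so that any later alteration breaks the chain. The actors, actions, and log structure below are hypothetical; real audit systems would also sign entries and replicate the log.

```python
import hashlib
import json

def append_entry(log, actor, action):
    # Each entry's hash covers its content plus the previous entry's hash.
    prev_hash = log[-1]["hash"] if log else "0" * 64
    entry = {"actor": actor, "action": action, "prev": prev_hash}
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    log.append(entry)
    return log

def verify(log):
    # Recompute every hash; any tampered entry or broken link fails.
    prev = "0" * 64
    for e in log:
        body = {k: v for k, v in e.items() if k != "hash"}
        expected = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if e["prev"] != prev or e["hash"] != expected:
            return False
        prev = e["hash"]
    return True

log = []
append_entry(log, "researcher_a", "read:biometric_batch_7")
append_entry(log, "pipeline_2", "transform:anonymize")
```

Once an entry is written, editing it (or deleting one from the middle) invalidates every subsequent hash, which is what makes such logs useful for accountability over data access and transformations.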

Since the implementation of security protections ultimately depends on people, several participants said that it is vital for researchers working with AI and biological data to be adequately trained and educated to facilitate awareness of relevant security considerations and the dual-use potential of their technologies. Finally, several participants stressed the need to bolster frameworks for ensuring informed consent and accounting for potential downstream-use risks.

Implementing Security Practices in Different Institutional Contexts

Many participants discussed resources to implement, incentivize, and sustain effective security practices. To aid in the adoption of security practices and facilitate responsible innovation, institutions would likely benefit from strong security leadership; clear policies and legal frameworks for supporting AI safety, data sharing, and risk assessment; and adequate support, collaboration, legislation, and funding on the part of governments, some participants said.

Computer scientists, life sciences researchers, and engineers who work on these technologies come from many different nations and backgrounds. Many participants stressed that adequate training on standardized workflows, protocols, and best practices is important to ensure all those who are involved in the research pipeline for AI-enabled systems and applications understand and are equipped to mitigate the potential harms that can result if ethics, security, and privacy are not prioritized throughout the entire design process. Several participants noted that cybersecurity practices and resources differ across countries (e.g., high-, middle-, and lower-income countries) and sectors (e.g., industry, academia, government) and that, with the push toward digitalization, common standards and practices may be needed.

Finally, it is important to acknowledge that industry, academia, and governments have very different roles, threats, security measures, powers, cultures, and spheres of influence, which affects the implementation of security practices worldwide. Nevertheless, several participants suggested that guidance from a well-informed international coalition that performs a careful study of the potential challenges, downstream effects, and solutions could help to encourage broader adoption of effective security practices. Discussions among experts and researchers across disciplines are important to address these gaps in ways that promote and support innovation while also protecting against exploitation. Given the real and hypothetical security risks discussed, several participants mentioned the importance of researchers—especially the next generation—working to address existing issues and forecasting future risks.


DISCLAIMER This Proceedings of a Workshop—in Brief was prepared by Anne Johnson, Lyly Luhachack, Nancy Connell, and Carmen Shaw as a factual summary of what occurred at the workshop. The statements made are those of the rapporteur(s) or individual workshop participants and do not necessarily represent the views of all workshop participants; the planning committee; or the National Academies of Sciences, Engineering, and Medicine.

WORKSHOP ORGANIZING COMMITTEE This workshop was organized by the following experts: Devika Madalli (Co-Chair), Indian Statistical Institute; Nathan Price (Co-Chair), Thorne Health Tech; Elvan Ceyhan, Auburn University; Yoke-Fun Chan, University of Malaya; Nikhil Dave, Arizona State University; Lydia E. Kavraki, Rice University; Georg Langs, University of Vienna; Kwok-Yan Lam, Nanyang Technological University; Suryesh Namdeo, Indian Institute of Science; Vanny Narita, ASEAN Secretariat; Mei Ngan, National Institute of Standards and Technology; Kok Keng Tee, University of Malaya; Kathleen M. Vogel, Arizona State University.

REVIEWERS To ensure that it meets institutional standards for quality and objectivity, this Proceedings of a Workshop—in Brief was reviewed by Devika Madalli, Indian Statistical Institute, and Pragya Chaube, Centre for Policy Research. We also thank staff member Brittany Segundo for reading and providing helpful comments on this manuscript.

SPONSOR This workshop was funded by a grant from the United States Department of State. The opinions, findings, and conclusions stated herein do not necessarily reflect those of the United States Department of State.

For additional information regarding the workshop, visit https://www.nationalacademies.org/our-work/engaging-scientists-to-prevent-harmful-exploitation-of-advanced-data-analytics-and-biological-data-a-workshop-series

SUGGESTED CITATION National Academies of Sciences, Engineering, and Medicine. 2023. Engaging Scientists to Prevent Harmful Exploitation of Advanced Data Analytics and Biological Data: Proceedings of a Workshop—in Brief. Washington, DC: The National Academies Press. https://doi.org/10.17226/27093.

Division on Earth and Life Studies

Copyright 2023 by the National Academy of Sciences. All rights reserved.
