
7

Plenary Session

TOWARD TRUSTWORTHY MACHINE LEARNING

Dawn Song, University of California, Berkeley

Dawn Song, University of California, Berkeley, explained that deep learning has advanced rapidly; for example, AlphaGo, a computer program, defeated the human world champion at Go, and deep learning now powers everyday products. With this exponential growth in artificial intelligence (AI) and deep learning comes an increase in attacks, in both scale and sophistication. These attacks now reach new parts of the security landscape, such as the power grid and the banking system. It is important to consider the presence of attackers when thinking about machine learning, Song said. History shows that attackers always follow in the footsteps of technology development, and sometimes even lead it. The stakes are even higher with AI: as AI controls more systems, attackers will have greater incentives, and as AI becomes more capable, the consequences of misuse by attackers will become more severe, she continued.

Song noted that attackers might either attack AI directly or misuse it. When attackers attack the integrity of a learning system, they prevent it from producing the intended or correct results and instead cause it to produce a targeted outcome of their own design. Attackers can also attack the confidentiality of a learning system in order to learn sensitive information about individuals. Song emphasized that better security in learning systems is needed to address these problems. When attackers misuse AI, they use it to find vulnerabilities in other systems and to target and devise attacks.

To prevent adversarial examples, the integrity of learning systems has to be protected. For example, self-driving cars need to recognize signs correctly in order to make safe decisions. If an attacker manipulates a stop sign with carefully chosen perturbations, thus creating an adversarial example, an image classification system can be fooled into reading it as a speed limit sign. Although most adversarial examples have arisen in the digital world, it is now possible to produce adversarial examples in the physical world, using physical perturbations. These real-world adversarial examples remain effective under different viewing distances, angles, and other conditions. This highlights the need to protect the integrity of a learning system so that it still generates the correct predictions or labels, even when under attack.
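
To make the stop-sign example concrete, the sketch below shows one standard way such a digital perturbation can be computed, the fast gradient sign method from the adversarial-examples literature. It is a generic illustration rather than the specific attack Song presented; `model`, `image` (a batched tensor), and `target_label` are placeholders.

```python
# Minimal sketch of a targeted digital adversarial perturbation via the
# fast gradient sign method (FGSM). Generic textbook attack used only
# for illustration; `model`, `image`, and `target_label` are placeholders.
import torch
import torch.nn.functional as F

def fgsm_targeted(model, image, target_label, epsilon=0.03):
    """Shift `image` slightly so the classifier leans toward `target_label`
    (e.g., 'speed limit' instead of 'stop')."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), target_label)
    loss.backward()
    # Step against the gradient of the target-class loss so the model's
    # confidence in the target class increases while the change stays small.
    adversarial = image - epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()  # keep pixel values valid
```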

FIGURE 7.1 Transferability attack. SOURCE: Dawn Song, University of California, Berkeley, presentation to the workshop, December 12, 2018.

Song noted that adversarial examples are also prevalent in deep learning systems. Most existing work in this field targets image classification tasks in which the target model is known. Song’s team is investigating adversarial examples in generative models, deep reinforcement learning, and visual question answering (VQA)/image-to-code. Her team also studies threat models: mostly white-box attacks, in which the attacker knows the parameters of the neural network, although adversarial attacks can also be effective on black-box models, where the attacker knows nothing about the architecture. Her team discovered that state-of-the-art VQA models suffer from targeted adversarial attacks, with image perturbations that are typically undetectable by humans. Adversarial examples can also fool deep reinforcement learning agents (e.g., game-playing agents in Atari and the MuJoCo environment). Song emphasized that all of these examples indicate that adversarial attacks are prevalent across a variety of domains, tasks, and models.

Song explained that one way to generate adversarial examples is with generative adversarial networks. She noted that most of the examples she had discussed thus far were based on white-box attacks, in which the attackers are assumed to know the details of the learning model (e.g., the architecture and parameters). Even when attackers do not know any details about the model (a black-box attack), their attacks can be very powerful. Song described two types of black-box attacks. In a zero-query attack, which includes the transferability-based attack (see Figure 7.1), the attacker does not have query access to the target model and uses local information to generate a successful attack.
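
A minimal sketch of this zero-query setting follows: the adversarial example is crafted entirely against a local white-box surrogate and then submitted once to the remote model. The surrogate choice and the hypothetical `query_remote_label` interface are illustrative assumptions, not details from the presentation.

```python
# Sketch of a zero-query, transferability-based black-box attack: craft
# the adversarial example against a local white-box surrogate, then
# submit it to the remote model. Surrogate and remote API are placeholders.
import torch
import torch.nn.functional as F
import torchvision.models as models

# Local white-box surrogate; in practice, any model the attacker controls.
surrogate = models.resnet18(weights="IMAGENET1K_V1").eval()

def craft_on_surrogate(image, target_label, epsilon=0.03, steps=10):
    """Iterative targeted attack computed only from the surrogate's gradients."""
    adv = image.clone().detach()
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(surrogate(adv), target_label)
        grad, = torch.autograd.grad(loss, adv)
        adv = (adv - (epsilon / steps) * grad.sign()).clamp(0, 1).detach()
    return adv

# No queries are spent while crafting; the single call below is the attack
# itself (query_remote_label is a hypothetical black-box interface).
# prediction = query_remote_label(craft_on_surrogate(image, target_label))
```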

For the transferability-based attack, the attacker’s objective is to attack a remote model. The attacker has local, white-box access to another learning system and generates adversarial examples against this local model. These adversarial examples can then transfer successfully to the black-box system. To make a transferability attack especially effective, an attacker can use an ensemble targeted black-box attack based on transferability: the attacker assembles an ensemble of different local white-box models (e.g., AlexNet, ResNet, VGGNet) and generates targeted adversarial examples that fool all of them. Adversarial examples generated this way are more likely to transfer to the remote system and produce a successful targeted attack on the remote model. She called this the most effective type of zero-query black-box attack. In a query-based attack, by contrast, the attacker has query access to the target model. In this case, finite-difference gradient estimation and query-reduced gradient estimation can generate attacks with effectiveness similar to white-box attacks: using the query interface, an attacker gains more information and can attack more effectively.
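
The finite-difference idea can be sketched in a few lines: each coordinate of the input is nudged up and down, the change in the probability returned by the model approximates that coordinate of the gradient, and the estimate then drives an ordinary attack step. This is a generic illustration, with `query_prob` standing in for a hypothetical black-box query interface; real query-reduction methods are more sophisticated than the random coordinate subsampling shown here.

```python
# Sketch of query-based gradient estimation with central finite
# differences. The attacker sees no model internals, only the target
# class probability returned by the hypothetical `query_prob` API.
import numpy as np

def estimate_gradient(query_prob, image, delta=1e-3, n_coords=None):
    """Estimate d query_prob / d pixel. Querying every pixel costs
    2 * image.size queries; `n_coords` caps that as a crude stand-in
    for query-reduction techniques."""
    flat = image.ravel()
    grad = np.zeros_like(flat)
    coords = range(flat.size) if n_coords is None else \
        np.random.choice(flat.size, n_coords, replace=False)
    for i in coords:
        bump = np.zeros_like(flat)
        bump[i] = delta
        grad[i] = (query_prob((flat + bump).reshape(image.shape)) -
                   query_prob((flat - bump).reshape(image.shape))) / (2 * delta)
    return grad.reshape(image.shape)

# One attack step, analogous to the white-box case but using the estimate:
# image = np.clip(image + 0.01 * np.sign(estimate_gradient(query_prob, image, n_coords=500)), 0, 1)
```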

Song said that adversarial machine learning is about learning in the presence of adversaries. This can happen at inference time, when an adversarial example fools the learning system into giving the wrong prediction, as in evasion attacks (e.g., evading malware detection or fraud detection). It can also happen at training time, when the attacker poisons the data set so that the learning system learns the wrong model: the attacker selectively shows the learner training data points (even with correct labels) chosen to steer it toward the wrong model. Defending against data poisoning is particularly challenging with crowdsourcing and insider attacks. Overall, it is very difficult to detect when a model has been poisoned.

Adversarial machine learning is particularly important for security-critical systems. Song said that more than 100 defenses have been proposed over the past 18 months, but no sufficient defense exists today. Strong, adaptive attackers can easily evade today’s defenses, and obfuscated gradients give a false sense of security. An ensemble of weak defenses does not, by default, make a strong defense. She explained that image classification is susceptible to adversarial examples by the very nature of the task specification, and she added that human vision is more complex than image classification.

Song’s team proposed a new defense. She said that it is possible to characterize adversarial examples using spatial consistency information in semantic segmentation, where the learning system tries to segment an image into different objects; segmentation systems, too, are easy to fool with adversarial examples. The defense is based on spatial consistency, that is, the consistency of segmentation results for randomly selected patches of an image: this spatial consistency information is distinguishable between benign and adversarial instances. Her team applies mean intersection over union to compare the segmentation results between patches (Xiao et al., 2018). Once spatial and temporal constraints are added, it becomes difficult for attackers to generate adversarial perturbations, so this approach works for detecting adversarial examples.
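
A rough sketch of such a consistency check follows, under the assumption that the defense compares a segmentation model’s predictions on overlapping random patches; `segment` is a placeholder for the segmentation network, and the patch size and decision threshold are illustrative values, not parameters from Xiao et al. (2018).

```python
# Sketch of a spatial-consistency check: segment two randomly chosen
# overlapping patches and compare their predictions on the overlap with
# mean intersection over union (mIoU). Low average mIoU across many
# patch pairs suggests an adversarial input. `segment` is a placeholder.
import numpy as np

def miou(labels_a, labels_b, num_classes):
    """Mean IoU between two label maps of identical shape."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(labels_a == c, labels_b == c).sum()
        union = np.logical_or(labels_a == c, labels_b == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 1.0

def spatial_consistency_score(image, segment, num_classes, patch=256, trials=20):
    """Average mIoU over randomly selected overlapping patch pairs."""
    h, w = image.shape[:2]
    rng = np.random.default_rng()
    scores = []
    for _ in range(trials):
        # Two patches that share a common region.
        y, x = rng.integers(0, h - patch), rng.integers(0, w - patch)
        dy, dx = rng.integers(-patch // 2, patch // 2, size=2)
        y2 = int(np.clip(y + dy, 0, h - patch))
        x2 = int(np.clip(x + dx, 0, w - patch))
        seg1 = segment(image[y:y + patch, x:x + patch])
        seg2 = segment(image[y2:y2 + patch, x2:x2 + patch])
        # Compare predictions on the overlapping region only.
        oy, ox = max(y, y2), max(x, x2)
        ey, ex = min(y, y2) + patch, min(x, x2) + patch
        a = seg1[oy - y:ey - y, ox - x:ex - x]
        b = seg2[oy - y2:ey - y2, ox - x2:ex - x2]
        scores.append(miou(a, b, num_classes))
    return float(np.mean(scores))  # benign images tend to score high

# is_adversarial = spatial_consistency_score(img, segment, 19) < 0.8  # illustrative threshold
```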

Song emphasized that security will be one of the biggest challenges in deploying AI. She said that it is imperative to think about security at the software, learning, and distributed levels. It is challenging to ensure that no software vulnerabilities (e.g., buffer overflows and access control issues) exist; attackers can exploit such vulnerabilities and take control of learning systems. At the learning level, it is crucial to evaluate systems under both normal events and adversarial events. It is important to do both security testing and regression testing.

Another challenge at the learning level is to reason about complex, non-symbolic programs. Although reasoning techniques exist for symbolic programs, there are not yet sufficient tools to reason about non-symbolic programs such as deep learning systems. Additionally, it is important to design new architectures and approaches with stronger generalization and security guarantees. Neural program synthesis is an exciting area of work, but the domain has faced a number of challenges, including the facts that neural programs do not generalize well and that they provide no proof of generalization. Song’s approach to these problems is to introduce recursion and to learn recursive neural programs. Recursion enables provable guarantees about neural programs, including the ability to prove the perfect generalization of a learned recursive program via a verification procedure that explicitly tests all possible base cases and reduction rules. The recursive approach also enables faster learning and better generalization. This work revealed that neural program architecture affects generalization and provability; that recursive, modular neural architectures are easier to reason about, prove correct, and generalize; and that designing new architectures and approaches enabling strong generalization and security properties for broader tasks is a desirable and promising direction to explore.

Another challenge for security at the learning level is the ability to reason about how to compose components. Building large, complex systems requires compositional reasoning (i.e., each component provides an abstraction, and hierarchical, compositional reasoning proves properties of the whole system). A question remains as to how to do abstraction and compositional reasoning for non-symbolic programs.

At the distributed level, where each agent makes local decisions, a question remains about how to make good local decisions that will lead to good global decisions.

In addition to attacking the integrity of a system, attackers can also attack its confidentiality, Song explained. Neural networks have high capacity, and attackers can exploit it to extract secrets in the training data by querying learned models. For example, by simply querying a language model trained on an email data set that contains users’ credit card and Social Security numbers, an attacker could automatically extract the original Social Security and credit card numbers. When training deep learning models, one therefore has to be careful to protect the privacy of the training data set. Song’s team proposed a solution that prevents such memorization by training a differentially private neural network; in this case, the exposure is lessened and the attacker is unable to extract secrets.
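
A minimal sketch of the core mechanism typically used for such differentially private training, per-example gradient clipping plus calibrated Gaussian noise (as in DP-SGD), is shown below. It is a generic illustration rather than Song’s implementation; a production system would rely on a library such as Opacus or TensorFlow Privacy and track the cumulative privacy budget.

```python
# Sketch of one DP-SGD update: clip each example's gradient, sum, add
# Gaussian noise, then apply the averaged step. Generic illustration only.
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y, lr=0.1,
                clip_norm=1.0, noise_multiplier=1.1):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]
    for x, y in zip(batch_x, batch_y):                    # per-example gradients
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = (clip_norm / (norm + 1e-12)).clamp(max=1.0)  # bound each example's influence
        for s, g in zip(summed, grads):
            s.add_(g * scale)
    with torch.no_grad():
        for p, s in zip(params, summed):
            noise = torch.randn_like(s) * noise_multiplier * clip_norm
            p.add_(-(lr / len(batch_x)) * (s + noise))    # noisy, averaged update
```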

Attackers can also misuse AI for large-scale automated, targeted manipulation. Thus, many questions remain about the future of machine learning systems and security: How do we better understand what security means for AI and learning systems? How do we detect when a learning system has been fooled or compromised? How do we build more resilient learning systems with stronger guarantees? How do we build privacy-preserving learning systems? How do we democratize AI? She emphasized that security will be one of the biggest challenges in deploying AI, and it requires a community effort.

A workshop participant said that the perturbations were visibly noticeable in Song’s black-box attack examples and asked whether she thought black-box attacks would become more sophisticated, with perturbations as indistinguishable as those in white-box attacks. Song reiterated that there are two types of black-box attacks: zero query and query based. The zero-query attack will always be much more challenging, and the generated adversarial perturbations are larger in scale than in white-box attacks. With query-based attacks, the question becomes how many queries are allowed; with a larger number of queries, it is possible to generate better adversarial examples with less noticeable perturbations.

A workshop participant asked why there is emphasis on defending against attacks rather than on understanding why the underlying technology is flawed in the first place. Song said that deep learning systems are not learning the right representations. Although deep learning has issues, she continued, it is an approach that is currently working for vision tasks, and the community continues to search for better solutions that address the issues more fundamentally.

In response to a question about her discussion of differential privacy, Song said that her recent work proposed a specific type of measurement to ascertain how much a neural network has remembered. Tom Goldstein, University of Maryland, asked whether people in this field have a moral responsibility to propose a defense whenever proposing an attack. Song said that there are ethical standards about responsible disclosure when proposing attacks and added that proposing an attack without a defense is still progress.
