Suggested Citation:"2 Machine Learning Challenges." National Academy of Sciences. 2018. The Frontiers of Machine Learning: 2017 Raymond and Beverly Sackler U.S.-U.K. Scientific Forum. Washington, DC: The National Academies Press. doi: 10.17226/25021.

2

Machine Learning Challenges

TECHNOLOGICAL CONSIDERATIONS

Different Needs for Different Applications

Much of the excitement generated by machine learning arises from its broad applicability across different subjects or domains. This also presents a challenge for the field: while many approaches can cut across applications, each use case involves different considerations and the construction of different types of systems. The following examples describe how several research areas in biomedicine require customized machine learning techniques. Similar customization is needed in many other fields.

Medical researchers, for example, have begun to use genetics data, medical records, patient registries, activity logs, environmental data, and medical imaging to better understand all aspects of human health. In genomics, changes in gene regulation and gene function caused by mutated segments of DNA allow researchers to identify which regions of the genome influence a trait associated with a particular disease. Statistical methods related to machine learning are being developed to help researchers design medical interventions that could improve the lives of people with particular health conditions.

Mobile health interventions that employ a smart phone application or wearable technology to encourage specific behavior, such as increased exercise, are another potential application area for machine learning techniques. Two possible schemes for generating notifications to patients are “pull” interventions, which rely on individuals’ requests for interventions, and “push” interventions, which employ sensors, self-reports, and computer algorithms to decide when an intervention is needed and which intervention might be most appropriate. The shared goal of both types of notifications is to promote healthy behavior and aid in the development of longer-term treatment policies. This is similar to the action-reward structure of reinforcement learning, with the long-term benefit of improved health serving as the reward for the action of responding to the health notifications.
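The action-reward framing above can be sketched in code. The toy agent below is purely illustrative (the simulated environment, reward values, and epsilon-greedy strategy are assumptions for the example, not a description of any deployed mobile health system): it learns from feedback whether sending a push notification tends to yield a higher "health" reward.

```python
import random

class NotificationAgent:
    """Epsilon-greedy agent that learns whether a 'push' notification
    helps on average (a toy stand-in for mobile health reinforcement
    learning; the environment below is simulated, not real patient data)."""

    def __init__(self, actions=("push", "no_push"), epsilon=0.1):
        self.actions = actions
        self.epsilon = epsilon
        self.value = {a: 0.0 for a in actions}   # estimated reward per action
        self.count = {a: 0 for a in actions}

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(self.actions)   # explore occasionally
        return max(self.actions, key=lambda a: self.value[a])  # exploit

    def update(self, action, reward):
        # Incremental mean of observed rewards (the "improved health" signal)
        self.count[action] += 1
        self.value[action] += (reward - self.value[action]) / self.count[action]

random.seed(0)
agent = NotificationAgent()
# Hypothetical environment: pushing a reminder yields a higher average reward.
for _ in range(2000):
    a = agent.choose()
    reward = random.gauss(1.0, 0.5) if a == "push" else random.gauss(0.2, 0.5)
    agent.update(a, reward)

print(max(agent.value, key=agent.value.get))  # learned preferred action
```

In a real system the reward signal (e.g., measured activity) arrives with delay and noise, which is why the summary frames improved health as a long-term reward rather than immediate feedback.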

In the study of neuroscience, the ability of machine learning to spot patterns and make predictions makes it a key tool in analyzing the vast quantities of neuroimaging data available today. Functional magnetic resonance imaging (fMRI) is used to track blood flow within the brain to deduce functional or physiological properties. When a patient or research subject examined using an fMRI machine is presented with an external stimulus, the responding brain area will have an increase in blood flow, which will be displayed clearly on an fMRI monitor. Classification can then be applied to a set of fMRI images to determine if patterns exist among participants within brain regions of interest. Neuroimaging applications range from diagnostic tools for neurological disorders to enhanced understanding of neural connectivity between brain regions. There are analogous opportunities for machine learning to help in many other areas of medical imaging.
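The classification step described above can be illustrated with a deliberately simplified sketch. The data, region size, and signal strength below are all invented for the example, and real fMRI analysis involves substantial preprocessing that is omitted here; the point is only the mechanic of classifying activation patterns.

```python
import math
import random

rng = random.Random(42)

# Toy stand-in for fMRI classification: each "scan" is a vector of voxel
# activations in a region of interest; stimulus A elevates the first voxels,
# mimicking an increase in blood flow in the responding brain area.
def make_scan(stimulus, n_voxels=50):
    scan = [rng.gauss(0.0, 1.0) for _ in range(n_voxels)]
    if stimulus == "A":
        for i in range(10):
            scan[i] += 2.0   # simulated signal under stimulus A
    return scan

def centroid(scans):
    return [sum(col) / len(scans) for col in zip(*scans)]

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

train_a = [make_scan("A") for _ in range(40)]
train_b = [make_scan("B") for _ in range(40)]
c_a, c_b = centroid(train_a), centroid(train_b)

def classify(scan):
    """Assign the scan to the class with the nearer training centroid."""
    return "A" if distance(scan, c_a) < distance(scan, c_b) else "B"

test_a = [make_scan("A") for _ in range(10)]
accuracy = sum(classify(s) == "A" for s in test_a) / 10
print(accuracy)
```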

While genomics, mobile health sciences, and neuroscience are typically categorized within the health domain, the challenges posed by each are domain-specific, and the machine learning techniques required for each are highly specialized. For example, the classification tools most appropriate for neuroscience may not be as effective in genomics research because of substantial differences in the nature of the data, their structure and properties, and the relationships between data and outcome or decision. A similar trend can be seen in other disciplines, where the machine learning applications developed for one subcategory may not work for a tangentially related area. For many applications, combining domain knowledge with machine learning expertise will be essential to create effective outputs.

Collaboration

The domain-specificity of the challenges at hand makes collaboration in machine learning particularly important. Furthermore, the key components of machine learning—expertise, computing power, data, and algorithms—are not concentrated in any one domain, and industry, academia, and government all play significant roles.

In the case of automated vehicles, for example, industry has resources and data that could help academics conduct better research. Government plays an important role as well through research funding and through its involvement in deploying infrastructure technology, such as traffic light sensors, that will affect the behaviors of automated vehicles.

While technical solutions can help address the broader social and ethical implications of this technology, collaborations also contribute to addressing these challenges. Opportunities in the spirit of collaboration include the following:

  • Adding narratives to technical writing to help professionals in other disciplines better understand machine learning concepts,
  • Committing to producing research that can be understood across disciplines,
  • Raising awareness via public debate about potentially controversial machine learning issues, and
  • Convening debates among diverse stakeholders.

EDUCATIONAL BARRIERS

Curriculum

Companies are aggressively hiring new talent in machine learning. Although course offerings for the next generation of machine learners are expanding, building a stronger skills pipeline remains key to the health of the field:

  • Data literacy is essential for anyone preparing for the future workforce and the challenges associated with this level of innovation. As early as the elementary school level, students could benefit from greater encouragement to develop a science, technology, engineering, and mathematics skill set.
  • In many postsecondary curricula, only traditional mathematics is strongly emphasized, whereas students may benefit from a program that includes mathematics alongside computing and communication skills. Students should also gain an understanding of the ethical challenges associated with the use of data.

Curricular changes will not happen overnight, but conversations among interested stakeholders should begin and plans should be made to fund programs that could improve students’ chances of securing relevant work in the future.

Infrastructure

In addition to curriculum changes, modifications to university course materials and course content can help prepare both machine learning experts and those likely to work with machine learning systems in the future:

  • Technical programs will need to update the skills and knowledge with which they equip students to work in specific fields.
  • For those studying machine learning, an understanding of the methods of social sciences and liberal arts will become important for addressing the legal and societal challenges that the technology presents.
  • For researchers and scholars in academia, journal and tenure guidelines often focus too narrowly on recognition and attribution in only the traditional form of published papers; it is important that innovative and applied work in machine learning is also both rewarded and shared.

SOCIETAL ISSUES

Fairness, Privacy, Consent, and Cybersecurity

Algorithmic decision-making systems may place a strain on some democratic values as machine learning enables new uses of data that challenge existing notions of fairness, privacy, and consent.

The advanced analytical capabilities offered by machine learning pose new challenges to managing privacy: in some applications, machine learning will use data containing sensitive information, while in other cases machine learning might create sensitive insights from seemingly mundane data. New privacy-preserving technologies (e.g., de-identification of data, differential privacy, homomorphic encryption) are being explored that can help lessen the risks of privacy breaches while enhancing the benefits of data sharing to society.5
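One of the privacy-preserving techniques mentioned above, differential privacy, can be illustrated with a minimal sketch. The example below adds Laplace noise to a counting query; the data set, the query, and the choice of epsilon are invented for illustration, and production systems additionally track a privacy budget across many queries.

```python
import math
import random

def dp_count(values, predicate, epsilon, rng):
    """Differentially private count: true count plus Laplace noise.

    A counting query has sensitivity 1, so Laplace noise with scale
    1/epsilon gives epsilon-differential privacy for this single query.
    """
    true_count = sum(1 for v in values if predicate(v))
    # Sample Laplace(0, 1/epsilon) via the inverse CDF
    u = rng.random() - 0.5
    noise = -(1.0 / epsilon) * math.copysign(1.0, u) * math.log(1 - 2 * abs(u))
    return true_count + noise

rng = random.Random(0)
ages = [34, 29, 41, 52, 38, 45, 27, 61, 33, 48]   # hypothetical records
noisy = dp_count(ages, lambda a: a > 40, epsilon=1.0, rng=rng)
print(round(noisy, 2))   # noisy estimate near the true count of 5
```

The released value is useful in aggregate yet never reveals exactly how many individuals satisfy the predicate, which is the trade-off between privacy risk and the societal benefit of data sharing that the report describes.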

New economic models based on data collection also have the potential to raise privacy concerns. Consumers frequently share their personal data when carrying out everyday tasks online—for example, when online shopping—and may not be aware how much data companies can collect about them based on these transactions. Such cases create trade-offs between privacy and convenience or the desire for certain services. Some researchers suggest a relatively straightforward technical fix: an opt-out feature through which a user can direct a site to erase all information collected, so that the site treats the customer as a new user on every visit. Others propose that individuals should be able to control their personal data and “decide for themselves how to weigh the costs and benefits of the collection, use, or disclosure of their information.”6 In this context, machine learning also raises questions about consent and the extent to which individuals are able to meaningfully consent to the use of their data, especially if those data are combined or processed in novel ways.

___________________

5 The Royal Society, 2017, Machine Learning: The Power and Promise of Computers That Learn by Example, https://royalsociety.org/~/media/policy/projects/machine-learning/publications/machine-learning-report.pdf.

6 D.J. Solove, 2013, Introduction: Privacy self-management and the consent dilemma, Harvard Law Review 126(7): 1880-1903.

Legislative structures that seek to ensure fairness in the provision of some products and services—and balance data protection with innovation—already exist, and these current “fairness structures,” set up primarily for financial security, demonstrate points of tension. For example, the U.S. Equal Credit Opportunity Act protects applicants from discrimination (e.g., on the basis of race, color, religion, national origin, sex, marital status, or age) in a credit transaction. Modern credit scoring systems use machine learning to evaluate creditworthiness, so a question arises about which factors a model can and should use to make a decision. The U.S. Fair Credit Reporting Act supports the accuracy, fairness, and privacy of consumer information included in consumer reporting agency files. Individuals are granted access to their credit score, as well as to all of the factors that may have adversely affected it; in this way, the Fair Credit Reporting Act ensures transparency about the factors from which a credit score is derived. The EU General Data Protection Regulation, which is expected to go into effect in May 2018, protects data that expose racial or ethnic origin; political or religious beliefs; and genetic, biometric, and health data.7

This is not the case for all existing regulatory instruments: unlike the previously mentioned regulated scores, the unregulated Alternative Credit Score in the United States can use factors that would normally be prohibited under the Fair Credit Reporting Act because it uses those factors for marketing financial products instead of for determining creditworthiness.8 Unregulated data includes health data not subject to the Health Insurance Portability and Accountability Act of 1996 (HIPAA),9 transactional data, historic data sets, and commercial data sets.

In seeking to ensure that fairness and privacy are maintained in such analyses, researchers can apply statistical approaches to identify people for whom privacy or fair information practice principles may be violated, such as when an individual’s data are represented in long tails (i.e., a distribution with a large number of occurrences far from the central part of the distribution) or as outliers (i.e., data points that differ from most others) and are therefore easy to identify within a data set. Other approaches to managing issues relating to fairness include the following:

  • Fair information practice principles (i.e., best practices),
  • New approaches to remediation for those negatively affected, and
  • Use-based rules for data (i.e., new approaches to auditing outputs).

___________________

7 Information Commissioner’s Office, “Overview of the GDPR,” https://ico.org.uk/for-organisations/data-protection-reform/overview-of-the-gdpr/introduction/, accessed June 15, 2017.

8 Consumer Financial Protection Bureau, “CFPB Explores Impact of Alternative Data on Credit Access for Consumers Who Are Credit Invisible,” last update February 16, 2017, https://www.consumerfinance.gov/about-us/newsroom/cfpb-explores-impact-alternative-data-credit-access-consumers-who-are-credit-invisible/.

9 U.S. Department of Health and Human Services, Health Insurance Portability and Accountability Act of 1996, P.L. 104-191, 104th Congress, https://aspe.hhs.gov/report/health-insurance-portability-and-account-ability-act-1996, accessed June 16, 2017.
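The long-tail and outlier risk described above can be made concrete with a small sketch. The z-score heuristic and the sample values below are invented for illustration; the idea is simply that records sitting far from the bulk of a distribution are easier to single out in a released data set and so may deserve extra protection.

```python
import statistics

def flag_reidentification_risk(records, threshold=2.5):
    """Flag records whose value lies far in the tail of the distribution.

    A simple z-score heuristic, not a full privacy analysis: outliers
    are easier to re-identify within a data set, as the report notes.
    """
    mean = statistics.mean(records)
    sd = statistics.stdev(records)
    return [i for i, v in enumerate(records)
            if sd > 0 and abs(v - mean) / sd > threshold]

incomes = [42, 38, 51, 45, 40, 47, 43, 39, 46, 44, 320]  # one extreme value
print(flag_reidentification_risk(incomes))  # index of the outlier record
```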

Cybersecurity issues also arise in many areas. For example, even though medical data subject to HIPAA are required to be encrypted when stored and protected in other ways, there is still a risk that sensitive information could end up in the wrong hands. Such cyberattacks and data thefts are also a concern for technology developers. All companies need to be vigilant in preparing for or preventing these situations, perhaps by drawing on the expertise of hackers.10

Trust, Transparency, and Interpretability

“Artificial Intelligence and Life in 2030” suggests that “well-deployed artificial intelligence prediction tools have the potential to provide new kinds of transparency about data and inferences, and may be applied to detect, remove, or reduce human bias, rather than reinforcing it.”11

A key barrier to achieving transparency in artificial intelligence tools is the lack of a common language, as varying definitions of “transparency,” “explainability,” and “interpretability” exist across disciplines. Additionally, there are various lenses through which these concepts can be considered. For example, some view “explainability” as a technical issue, while others may view it as a legal or social issue.

Another important aspect to bear in mind when considering transparency is that it is not synonymous with trust: trust can be attained without transparency. Conversely, just because an algorithm is transparent does not mean that it is trusted or trustworthy. A user might trust an algorithm without questioning or understanding how the result was attained—just as travelers trust that it is safe to board airplanes every day, despite not knowing precisely how airplanes function. Individuals are also likely to trust certain types of medical treatment, without knowing how or why they work.

Humans and computer systems alike can be trained to provide explanations, but these explanations may be too complex to grasp without specialist knowledge or may, in some cases, be biased or untrue. For example, when a medical patient enters an examination room with a complicated set of symptoms, a medical professional may offer either a simplified explanation intended to put the patient at ease or one that would require a medical degree to understand fully. The patient may feel relieved to have an explanation, yet it may not be the most accurate or clearest one. Who performs an action or supplies a service can also play a role in the extent to which individuals trust a system.

___________________

10 National Academy of Sciences and the Royal Society, 2015, Cybersecurity Dilemmas: Technology, Policy, and Incentives: Summary of Discussions at the 2014 Raymond and Beverly Sackler U.S.-U.K. Scientific Forum, https://www.nap.edu/catalog/21833/.

11 Panel on the One Hundred Year Study on Artificial Intelligence, “Artificial Intelligence and Life in 2030,” Report of the 2015-2016 Study Panel, Stanford University, Stanford, Calif., September 2016, http://ai100.stanford.edu/2016-report.

The possibility of providing biased results or unfair outcomes—intentionally or unintentionally—presents two relevant questions: (1) To what extent is the level of interpretability influenced by the type of machine learning deployed and how can advances in research address this? (2) What type of interpretability is required in different contexts?

The human desire to provide accounts of how decisions are made prompts another important series of questions: When a computer system makes a mistake, who or what should be held accountable? Because computer systems can keep complete logs of their processing, they are in principle capable of producing much more complete and correct explanations. Should computer systems therefore be held to higher standards of explanation than humans?

An additional key factor is the level of explanation required. An explanation provided to an engineer who will debug software in an automated vehicle would be far more technical in nature than an explanation about a vehicle malfunction provided to a user. Related to this is the purpose for which an explanation is being provided. Explanation of failure cases and biases that will be used to improve the computational model is likely to require a different level of accuracy and detail than the explanations needed for legal purposes.

Another component of transparency is consistency—an understanding that similar cases will be treated similarly. Consistency is critical to enabling the users of a learning system to anticipate the future decisions of the system based on explanations of past decisions. Without consistency, it may be more difficult for users to gain trust in the system.

Ethical Challenges in Specific Applications

Ethical and legal concerns stemming from fairness and bias are closely related to questions of trust and transparency.

One application area in which these questions are most palpable is in the criminal justice system. Scoring systems applied to predict the likelihood of repeat offending are increasingly data-driven, and some make use of machine learning.

These systems offer the hope of reducing bias in criminal sentencing, if the algorithm being used can be designed in a way that supports advanced analysis free of societal assumptions about factors such as race, gender, or socioeconomic status. However, the outputs of these systems reflect the data on which they are trained. If these data embody current societal biases or inequality, then—unless it is explicitly programmed otherwise—a machine learning system will replicate these biases.

There are different approaches to monitoring the biases of such systems, many of which are based on the interpretability or transparency of the system:

  • If the data used to create such models are publicly available, then the results can be tested or reproduced;
  • Rule-based systems or other types of models may be more interpretable; and
  • Fairness adjustments or constraints can be built into models.
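The third option, building fairness adjustments into a model, can be sketched in a simplified form. In the example below, the scores, groups, and the demographic-parity criterion are all assumptions chosen for illustration: it first shows how a single score threshold yields unequal selection rates across groups, then picks per-group thresholds that equalize them.

```python
def selection_rates(scores, groups, threshold):
    """Selection rate per group at a single score threshold."""
    rates = {}
    for g in set(groups):
        members = [s for s, gg in zip(scores, groups) if gg == g]
        rates[g] = sum(s >= threshold for s in members) / len(members)
    return rates

def parity_thresholds(scores, groups, target_rate):
    """Pick a per-group threshold so each group's selection rate meets
    target_rate (a crude demographic-parity adjustment; real systems
    must also weigh accuracy and legal constraints)."""
    thresholds = {}
    for g in set(groups):
        members = sorted((s for s, gg in zip(scores, groups) if gg == g),
                         reverse=True)
        k = max(1, round(target_rate * len(members)))
        thresholds[g] = members[k - 1]
    return thresholds

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4]   # hypothetical scores
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(selection_rates(scores, groups, 0.6))    # unequal rates at one threshold
print(parity_thresholds(scores, groups, 0.5))  # per-group thresholds
```

Whether such an adjustment is appropriate is itself a policy question; the sketch only shows that fairness constraints are something a designer can compute with, not merely aspire to.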

Identifying, controlling, and being transparent about bias in the system may be more manageable than seeking to eliminate it. In this sense, although interpretable models may not be as accurate as alternative approaches, being able to understand how a model functions and why it may be imperfect is likely to be helpful. Questions about how best to negotiate a trade-off among interpretability, accuracy, and fairness are likely to endure.

The use of machine learning systems in recidivism scoring also offers a new lens through which to view questions about the concept of unique treatment versus legal precedent. The law says that people need to be treated as individuals with dignity and thus have a right to defend themselves in an effort to shape the decision being made about them. However, the legal system also relies on precedent, which encourages the law to be applied to new cases in a similar way to its application in previous cases.

Scoring via machine learning systems is also used in medicine. For example, scoring systems exist to predict mortality for patients with certain illnesses and to recommend medical interventions accordingly.12 Such systems may improve patient outcomes, but they still require careful management. Some scores will fall within the guidelines of existing governance mechanisms—for example, the Clinical Frailty Scale,13 which is protected by HIPAA in the United States—meaning that patients will continue to be able to see and object to such scores.

Living Alongside Machine Learning

Beyond the specific challenges arising from the governance of data used in machine learning, the capabilities of current systems, and the benefits to be gained from them, there is a broader suite of questions raised by the increasing pervasiveness of machine learning systems. At a fundamental level, these questions ask how society will change as people live and work with automated systems, as well as with the new forms of human–computer interaction that could follow.

Concerns about automation and the workforce are already present in debates about machine learning, rooted in fears about humans being replaced by or becoming dependent upon machines. In some cases, fears arise about how machine learning could hurt individuals or restrict human experiences by limiting the choices of products, services, and activities available or by reducing interactions between human beings.14

___________________

12 One such example is the use of machine learning to predict outcomes for patients with pneumonia.

13 The Clinical Frailty Scale is a means of assessing the fitness levels of individuals. Scores range from 1 (very fit) to 9 (terminally ill), with higher numbers indicating greater frailty.

Most people are happy to have processes automated that are dirty, dull, or dangerous, but what about other tasks that fall outside these categories? Warfare is an example in which there are continuing ethical debates about the use of algorithms that may make a decision about taking human lives. Results from public dialogues on machine learning in the United Kingdom indicate that there is a general preference to have a “human in the loop” for decisions that affect people in a personal or sensitive way—for example, in healthcare.

There are also broader questions about how the benefits of machine learning can be shared across society and how society can capitalize on the opportunities these technologies present, whether in healthcare, transportation, or education. In seeking to address these questions, there may be lessons from previous technological advances that have transformed the ways people live and work through automation. During the Industrial Revolution, individuals had to adapt to new ways of communicating, traveling, and working. Will humans today be able to adapt similarly to the changes to work and other aspects of life resulting from advances in machine learning? Questions about how people live alongside machine learning will persist, and continuing dialogue will be important in negotiating them.

___________________

14 The Royal Society, 2017, Machine Learning: The Power and Promise of Computers That Learn by Example.
