Fireside Chat—Using Artificial Intelligence to Predict the Occurrence of Sepsis
To conclude Day 1 of the workshop, Madeleine Clare Elish, senior research scientist at Google, and Michele Wucker, co-founder and chief executive officer of Gray Rhino & Company and planning committee member, engaged in a conversation about work Elish did with the Sepsis Watch program prior to joining Google.1 The conversation started with Elish explaining that sepsis, which develops when an individual’s immune system overreacts while fighting an infection, is not rare. By one measure, sepsis is the leading cause of death in U.S. hospitals, and the World Health Organization recognizes sepsis as a profound health problem worldwide. A key feature of sepsis is that it is treatable when caught in time, but it is notoriously difficult to detect and treat quickly. In fact, there is no one test for sepsis or even one definition. “There are risks, rubrics, and protocols, but effectively treating sepsis remains a challenge for most hospitals,” said Elish.
This, she stated, is where machine learning and AI can prove useful. Many groups have developed deep learning models, trained on historical electronic health record data, that predict which patients are at high risk of developing sepsis, but most remain in the research phase. Sepsis Watch, however, has been integrated into the everyday operations of the Duke University emergency department to help improve care for those patients who are at risk of developing sepsis.2 Sepsis Watch, as Elish described, is built around a deep learning model that appears to be working smoothly, and clinicians feel it has improved patient care for sepsis, though the associated clinical trial was just concluding at the time of this workshop.
Wucker cited a line from the Sepsis Watch research publication3 that said, “AI interventions must always be thought of as sociotechnical systems in which social context, relationships, and power dynamics are central, not an afterthought,” and she asked Elish to explain the concept of sociotechnical systems and how that concept played out in Sepsis Watch. Elish, a cultural anthropologist by training, replied that she sees the world as complex social systems in which understanding culture, values, and beliefs is essential to understanding what people are doing and why they are doing it. This view is central to a qualitative social sciences perspective, but it is often overlooked in technological interventions. The point she and other social scientists make in their research is that it is wrong to think of technology as neutral and separate from society. In fact, cautioned Elish, that perspective is dangerous: for a critical intervention to succeed, it must not just exist but actually work and achieve its goal. “It will fundamentally work better,” she observed, “if we take into account how technical and social systems are complexly intertwined.”
1 M.P. Sendak, W. Ratliff, D. Sarro, E. Alderton, J. Futoma, M. Gao, M. Nichols, M. Revoir, et al., 2020, “Real-World Integration of a Sepsis Deep Learning Technology into Routine Clinical Care: Implementation Study,” JMIR Medical Informatics 8(7):e15182, https://doi.org/10.2196/15182.
2 Additional information about Sepsis Watch is available at https://dihi.org/project/sepsiswatch and https://physicians.dukehealth.org/articles/dukes-augmented-intelligence-system-helps-prevent-sepsis-ed.
3 M.C. Elish and A.W. Watkins, 2020, “Repairing Innovation: A Study of Integrating AI in Clinical Care,” https://datasociety.net/wp-content/uploads/2020/09/Repairing-Innovation-DataSociety-20200930-1.pdf.
To Elish, the term sociotechnical is key because it shows how intricately and inextricably interrelated technical and social systems are. In the case of Sepsis Watch, this meant thinking not just about the deep learning model itself and the outputs of the model, but thinking about the larger system in which this model exists. That system, she added, includes the doctors and nurses who will use the model, as well as the power hierarchies, the existing organizational dynamics, and existing systemic issues or biases in health care systems generally.
Wucker shared that one insight in the paper describing Sepsis Watch that impressed her was that predictions do not exist in a vacuum: if people respond to a prediction, the outcome will be different than when they do not. She then asked Elish to talk more about that feedback loop. Elish explained that when a patient arrives in the emergency department, the patient’s electronic health record is run through the Sepsis Watch model. If the model predicts that the individual is at high risk of developing sepsis, that patient’s information is displayed on a patient card in the Sepsis Watch iPad app. The nurse who monitors the app and regularly checks for new patient cards calls the emergency department physician caring for that patient and conveys the elevated risk over the phone. If the physician agrees that the patient requires preemptive treatment for sepsis, the patient is tracked and monitored in the iPad app to ensure that the recommended treatment is completed on time.
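The handoff described here is a human-mediated pipeline rather than an automated alert. A minimal sketch of that control flow follows; all names, signatures, and the `physician_agrees` stand-in for the phone conversation are hypothetical, since the actual Sepsis Watch implementation is not published in this form:

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Hypothetical stand-in for a patient's workflow state."""
    record_id: str
    monitored: bool = False


def sepsis_watch_workflow(patient: Patient, risk: str,
                          physician_agrees: bool) -> list[str]:
    """Return the ordered, human-mediated steps for one patient.

    The model's output never reaches the physician directly;
    a nurse watching the app makes the phone call.
    """
    steps: list[str] = []
    if risk != "high":
        return steps  # no card is raised, no call is made

    steps.append("display patient card in the iPad app")
    steps.append("nurse reviews the chart and phones the ED physician")
    if physician_agrees:
        patient.monitored = True
        steps.append("track recommended treatment to timely completion")
    return steps
```

The point the sketch captures is structural: the branch that changes patient care runs through two people (the nurse's call and the physician's agreement), not through the model alone.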
Elish noted that the treating physician does not directly see the Sepsis Watch model output, nor does the model produce a pop-up notice in the electronic health record. This workflow, she said, may seem byzantine, but it was intentionally designed this way: the clinician leading the project had previously tried sepsis care advisories that included such pop-ups, and nurses routinely ignored them because of alert fatigue. As a result, the project was designed not just to use deep learning to produce better risk predictions, but to pair those predictions with a human-facilitated intervention. To Elish, this is perhaps the most important takeaway from her work on the project: the AI intervention is sophisticated, but it is thoughtfully integrated into the clinical workflow, and it includes a human who is responsible for ensuring that the model output actually affects patient care.
When Wucker encouraged her to talk more about the feedback loop between what the project team learned from the prediction system and how they then tweaked the system to make it work better, Elish responded that a unique feature of this project is that clinicians, not computer science experts, led it. The two clinicians who led the project, in collaboration with a data science innovation center, had been working on improving sepsis care for years and had published papers about the alarm fatigue that doomed their previous interventions. The research team, she noted, was thoughtful about involving the expertise of stakeholders beyond physicians and about how the human and technical components would work together. Nevertheless, her research found a large amount of what she called disruption: breakages of communication and organizational norms that this new tool introduced into the setting. Sepsis Watch created this disruption, she remarked, and it took the expertise of nurses, doing what she called “repair work,” to mend those breakages and make the system work again.
Elish explained that the disruption Sepsis Watch produced was deliberate, in that it disrupted the usual way of doing things in order to refocus how and when patients at risk of sepsis received care. While this disruption was productive, it required addressing the power differential between doctors and nurses and the fact that the two groups worked in separate locations, which meant the nurses did not know the physicians’ rhythms and schedules. Initially, the nurses were often calling at inopportune times, but through some impressive work they developed effective communication strategies and learned the best times to reach the physicians, resolving these issues. This was not something the developers had thought of, she added.
One thing the developers did think of was to design the output to simply categorize risk as high, medium, or low rather than provide an explicit diagnosis that might put off some physicians. In this way, the physicians could take the output into account without feeling they were being told what to do. At the same time, the nurses took it upon themselves to read the patient’s chart before talking to the physician so they could engage in a two-way conversation and act as a partner rather than a threat to the physician’s authority.
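The design choice of surfacing a coarse category rather than a diagnosis can be sketched as a simple mapping. The function name and band boundaries below are illustrative assumptions, not the project's actual thresholds:

```python
def risk_band(score: float) -> str:
    """Map a model probability to a coarse label for display.

    Deliberately returns a category, not a diagnosis, so the
    physician retains interpretive authority. The cutoffs here
    are illustrative; real thresholds would be set clinically.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be a probability in [0, 1]")
    if score >= 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"
```

Collapsing a continuous score into three bands trades precision for acceptability: the display suggests urgency without asserting a clinical conclusion.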
The Sepsis Watch paper notes that ground truth is hard to find regarding sepsis, and Wucker encouraged Elish to talk about why ground truth is important and what to do when it cannot be quantified. Elish replied that the ground truth issue stems at least in part from the lack of a uniform definition of sepsis. The Centers for Disease Control and Prevention, for example, has one definition, and many hospitals have their own, different definitions, which means there is no single ground truth against which to train and judge a machine learning model. The Sepsis Watch team addressed this through thorough, careful, locally contextualized, multi-stakeholder decision-making, choosing the standard that made the most sense to them: the Centers for Medicare & Medicaid Services’ definition. The team then modified that definition according to its own local knowledge and the local care context of the hospital’s patients.
When Christopher Barrett commented that looking for a ground truth in an application such as this is pointless, Elish agreed up to a point. Setting a ground truth was necessary for training and evaluating the model, so the team had to pick something. She agreed, though, that the deeper issue is about different ways to be situationally aware and the role of human decision-making relative to a particular context, a lesson the AI and machine learning world still needs to learn. “It is not the output of this model that is helping doctors treat sepsis better. It is the whole system that is treating and helping improve patient care,” she emphasized.
To end the conversation, Wucker encouraged Elish to talk about the role of power dynamics in acting on predictions. Elish responded that looking at the sepsis program through the idea of repair elevated the human as an important component of the system. The idea of a moral crumple zone, she said, arises when there is a human in the loop but not in an elevated position: the human’s importance is underestimated, and when the situation goes wrong, that person absorbs the blame. This situation would be familiar to those who study complex systems with distributed control, something she has examined in the context of responsibility and liability for driverless cars. There, drivers end up in a moral crumple zone because they are held responsible for the entire system even though control is actually distributed.