3 Opportunities and Challenges for Big Data and Analytics
Pages 7-32


From page 7...
... Michael Edelstein, a research fellow at the Chatham House Centre on Global Health Security, spoke about the potential for digital disease detection, also known as digital epidemiology, to augment traditional "shoe leather" epidemiology and increase the speed at which disease outbreaks are spotted. Catherine Ordun, a deputy project manager for data science and health surveillance at Booz Allen Hamilton, provided an overview of the typical information technology architecture used to compile, organize, and analyze big datasets.
From page 8...
... For Chabot-Couture, the goal of using big data is not just to find something interesting for the sake of discovery but to find something interesting that is actionable at scale. One approach to taking advantage of the variety of data available in big datasets, he said, is to explore the data using data mining and feature engineering techniques and analyze the data using methods outside of the typical operational analyses to look at questions outside of the normal day-to-day use of the data.
From page 9...
... issued a challenge to preempt or combat at their source the first stage of emergence of zoonotic diseases -- those originating in animals -- that pose a significant threat to public and animal health and have the potential to produce pandemic infections. As Mazet put it, "This was a pretty crazy challenge because it was attempting to stop everything we do not know might happen before it happens, and if we are successful, no one will know about it." Nonetheless, Mazet and her colleagues at the One Health Institute, with support from USAID, put together a consortium, now known as PREDICT, of ministries of health, agriculture, and environment, multiple universities, and nongovernmental organizations in 31 countries to tackle this problem using big data as a critical tool.
From page 10...
... Reducing the impact of zoonotic disease on humans. SOURCES: Mazet presentation; Karesh et al., 2012.
From page 11...
... reflects the number of connections to different transmission interfaces and the ecological plasticity of viruses through the use of multiple transmission opportunities. Highly connected and more central interfaces facilitated the transmission of more viruses, providing an epidemiologic picture of circumstances likely to promote future disease emergence, and important targets for disease surveillance and preventive measures.
From page 12...
... This amount is on the same order as the $6.7 billion that the World Bank projected would be saved annually by preventing zoonotic disease outbreaks and much less than the tens of billions of dollars spent containing the most recent Ebola outbreak in West Africa. An added benefit, aside from the eventual cost savings, would be the opportunity for researchers to develop more effective vaccines and countermeasures for families of viruses that would enable the world to be ahead of these outbreaks rather than always catching up.
From page 13...
... TABLE 3-2  PREDICT Virus Detection Results by Viral Family

Viral Family     Novel    Known    Novel    Known    Novel    Known    Novel    Known
                 Bat      Bat      Primate  Primate  Rodent/  Rodent/  Human    Human
                                                     Shrew    Shrew
Adenovirus       53       3        6        4        32       1        1        3
Astrovirus       153      33       19       3        31       1        0        1
Coronavirus      61       30       3        0        6        0        0        2
Dependovirus     0        0        11       0        0        0        0        0
Flavivirus       3        0        0        1        0        0        0        2
Hantavirus       3        1        0        0        0        2        0        1
Herpesvirus      46       0        48       25       43       6        0        5
Orbivirus        1        0        1        0        0        0        0        0
Paramyxovirus    63       7        0        2        11       2        0        3
Polyomavirus     27       1        4        3        8        0        0        1
Arenavirus       0        0        0        0        2        2        0        0
Rhabdovirus      19       0        2        0        7        0        1        0
Seadornavirus    1        0        0        0        0        0        0        0
Bocavirus        0        2        1        3        0        0        0        0
Enterovirus      0        0        5        4        2        0        0        5
Retrovirus       0        0        4        7        0        0        0        1
Alphavirus       0        0        0        1        0        0        0        0
Poxvirus         0        0        0        1        1        0        0        0
Influenza        0        2        0        0        0        1        0        5
Mononegavirales  0        0        1        0        0        0        0        0
Papillomavirus   0        0        1        0        0        0        0        0
Picobirnavirus   0        0        120      0        0        0        0        0
Picornavirus     0        0        4        0        0        0        0        0
Picornavirales   0        0        4        0        0        0        0        0
Phlebovirus      1        0        0        0        0        0        0        0
Rotavirus        0        0        1        0        0        0        0        0
Anellovirus      0        0        0        0        0        0        1        1
Hepadnavirus     0        0        0        0        0        0        0        1

NOTE: Numbers of viruses do not total to 984 as cited in text because viruses have been found in more than one wildlife host taxa.
SOURCES: Re-created from Mazet presentation; PREDICT Consortium, 2014.
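The per-host totals implied by Table 3-2 can be tallied with a short script. The sketch below is illustrative only: the rows are transcribed from the table, while the names `TABLE_3_2`, `HOSTS`, and `tally` are assumptions introduced here and do not come from the PREDICT Consortium.

```python
# Rows transcribed from Table 3-2. After the family name, the eight numbers
# are novel/known counts for bat, primate, rodent/shrew, and human hosts.
TABLE_3_2 = """
Adenovirus 53 3 6 4 32 1 1 3
Astrovirus 153 33 19 3 31 1 0 1
Coronavirus 61 30 3 0 6 0 0 2
Dependovirus 0 0 11 0 0 0 0 0
Flavivirus 3 0 0 1 0 0 0 2
Hantavirus 3 1 0 0 0 2 0 1
Herpesvirus 46 0 48 25 43 6 0 5
Orbivirus 1 0 1 0 0 0 0 0
Paramyxovirus 63 7 0 2 11 2 0 3
Polyomavirus 27 1 4 3 8 0 0 1
Arenavirus 0 0 0 0 2 2 0 0
Rhabdovirus 19 0 2 0 7 0 1 0
Seadornavirus 1 0 0 0 0 0 0 0
Bocavirus 0 2 1 3 0 0 0 0
Enterovirus 0 0 5 4 2 0 0 5
Retrovirus 0 0 4 7 0 0 0 1
Alphavirus 0 0 0 1 0 0 0 0
Poxvirus 0 0 0 1 1 0 0 0
Influenza 0 2 0 0 0 1 0 5
Mononegavirales 0 0 1 0 0 0 0 0
Papillomavirus 0 0 1 0 0 0 0 0
Picobirnavirus 0 0 120 0 0 0 0 0
Picornavirus 0 0 4 0 0 0 0 0
Picornavirales 0 0 4 0 0 0 0 0
Phlebovirus 1 0 0 0 0 0 0 0
Rotavirus 0 0 1 0 0 0 0 0
Anellovirus 0 0 0 0 0 0 1 1
Hepadnavirus 0 0 0 0 0 0 0 1
"""

# Host order matches the column pairs in the table (hypothetical labels).
HOSTS = ["bat", "primate", "rodent_shrew", "human"]


def tally(table_text):
    """Sum novel and known detections per host across all viral families."""
    totals = {host: {"novel": 0, "known": 0} for host in HOSTS}
    for line in table_text.strip().splitlines():
        _family, *counts = line.split()
        values = [int(c) for c in counts]
        for i, host in enumerate(HOSTS):
            totals[host]["novel"] += values[2 * i]
            totals[host]["known"] += values[2 * i + 1]
    return totals


totals = tally(TABLE_3_2)
grand_total = sum(t["novel"] + t["known"] for t in totals.values())
for host, t in totals.items():
    print(f"{host}: novel={t['novel']} known={t['known']}")
print("sum over all cells:", grand_total)
```

As the table note says, the sum over all cells need not equal 984, because a virus found in more than one host taxon is counted once per host column.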
From page 14...
... At the same time as the recent Ebola outbreak in West Africa, the PREDICT team detected the Ebola virus in the Democratic Republic of the Congo, enabling the prime minister's office to restrict the movement of people, which, Mazet said, resulted in that outbreak ending quickly and minimized mortality there to under 100 people. In contrast, this outbreak lasted much longer and resulted in many more deaths in nearby countries that were not part of the PREDICT program.
From page 15...
... Geographically detailed information of this sort is not readily available, Hay said, and it is available at a crude national-scale resolution for only 350 of the 1,400 known infectious diseases. This raises the question of whether it is possible
From page 16...
... The Atlas of Baseline Risk Assessment for Infectious Disease (ABRAID) represents Hay's attempt to automate the mapping of disease risk using available big datasets (Hay et al., 2013b)
From page 17...
... other lower-provenance data sources. Hay explained that crowdsourcing and machine-learning systems trained by experts are used to validate the accumulated occurrence data, which is then fed into an automated mapping system.
From page 18...
... His hope is that such data sources can generate usable occurrence data for the many infectious diseases for which there is little or no information about geographical distribution. As an example of potential applications he hopes to see in the future, Hay concluded his talk by describing his group's most recent work mapping the environmental suitability for the Zika virus (see Figure 3-5)
From page 19...
... Lessons Learned
It is not enough to simply input big datasets into models, Chabot-Couture said.
From page 20...
... FIGURE 3-6  Retrospective predictive modeling of measles outbreaks compared to reported cases for the fourth quarter of 2015. For the model (left)
From page 21...
... Coverage surveys would provide a definitive result, but they are expensive and are only conducted every 5 years or so, Chabot-Couture said. The opportunity for big data in this case is to use disease surveillance data -- reported cases as well as asking patients how many doses of vaccine they received -- as a benchmark and then triangulate toward a more accurate estimate of actual vaccination coverage.
From page 22...
... Looking Forward
When asked by Lonnie King, a professor and dean emeritus of the College of Veterinary Medicine at The Ohio State University, to look a decade ahead and talk about innovative strategies for using big data, Chabot-Couture commented that even in an era where a growing percentage of the world's population uses a mobile phone, the reality is that a great deal of potentially useful data is still being collected on paper and, as Mazet added, lacks any detail on where the data were collected geographically. As data collection moves to mobile devices, every data point will have global positioning system coordinates and time stamps.
From page 23...
... The reason public health surveillance is so important in the field of infectious diseases, and why it is important to realize the promise of big data to augment public health surveillance, Edelstein said, is that infectious diseases have had such a tremendous impact on human populations over the ages. As recently as the 20th century, influenza outbreaks killed as many as 100 million people worldwide, and the 2014 Ebola outbreak killed at least 11,000 people.
From page 24...
... , Edelstein said. Another source of digital disease detection data is participatory surveillance, which enrolls volunteers to regularly report their health status online. Using digital data as a tool for disease detection is not easy, Edelstein said, but it is possible to produce valid public health data on a timely basis.
From page 25...
... Sweden's public health agency is using a digital disease detection tool -- a tool that Edelstein and his colleagues validated using 8 years of retrospective data -- to spot the onset of the norovirus season before hospitals report outbreaks. Another example that Edelstein described, which is unrelated to infectious disease detection, involves the use of the so-called dark Web to track trends in transactions of illegal substances or prescription drugs used for recreational purposes, creating a picture of a public health problem that could not be captured using traditional surveillance techniques.
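The text does not describe the algorithm behind the Swedish tool, so the following is only a generic sketch of how a seasonal onset might be flagged from a weekly digital signal such as search-query counts. The function name `detect_onset`, its parameters, the threshold rule (baseline mean plus `k` standard deviations, sustained for `run_length` weeks), and the sample counts are all assumptions made up for illustration; they are not the method Edelstein and his colleagues validated.

```python
from statistics import mean, stdev


def detect_onset(weekly_counts, baseline_weeks=8, k=2.0, run_length=2):
    """Return the index of the first week of a sustained rise, or None.

    A week counts as elevated when it exceeds the baseline mean plus
    k sample standard deviations; onset is the first week of a run of
    `run_length` consecutive elevated weeks.
    """
    baseline = weekly_counts[:baseline_weeks]
    threshold = mean(baseline) + k * stdev(baseline)
    run = 0
    for week in range(baseline_weeks, len(weekly_counts)):
        if weekly_counts[week] > threshold:
            run += 1
            if run == run_length:
                return week - run_length + 1
        else:
            run = 0
    return None


# Invented weekly query counts, for illustration only.
queries = [10, 12, 11, 9, 10, 13, 11, 12, 12, 14, 25, 31, 40]
onset_week = detect_onset(queries)
print("onset week index:", onset_week)
```

Requiring several consecutive elevated weeks is one simple way to avoid flagging a single noisy spike; a real system would also need validation against retrospective surveillance data, as the text notes was done over 8 years.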
From page 26...
... Confident that these issues will be addressed successfully, Edelstein said he believes that the country-level public health surveillance system will remain at the center of the surveillance system because the mandate to protect populations still rests with governments. However, digital disease detection organizations will become formal partners in this system, feeding actionable data into the surveillance system in a systematic manner.
From page 27...
... FIGURE 3-7  The many possible components of a data architecture. NOTE: The products and brands shown are exemplary and not exhaustive.
From page 28...
... Lessons for Building Analytics Applications
Ordun has managed several projects: one that analyzed 2 years of tweets, approximately 1 million tweets in total, using natural language processing to extract the number of people who were hospitalized or sick from food-borne illnesses; one that served as a digital disease detection dashboard for hypothesis testing and forecasting using multiple data sources, including CDC data, weather or climate data, as well as Census data; and a third that is a more complex geospatial analytics application that can superimpose hundreds of geospatial feeds like terrain, land use, or transportation for situational awareness in complex emergencies. Most recently, she helped lead a project with the Food and Drug Administration to create analytics to provide rapid signal detection of adverse events and medication errors using public mobile app reporting, as well as leading the team that developed the mobile application and the cloud architecture.
From page 29...
... She also recommended against over-engineering an architecture and going for the biggest datasets possible.

COMBATING MICROBIAL RESISTANCE WITH BIG DATA
As an example of how a large, geographically dispersed health care organization is using big data to combat microbial resistance to antibiotics, Lesho described efforts by the DoD to use big data to conduct epidemiologic surveillance of and applied research on multidrug-resistant microorganisms.
From page 30...
... Lesho said that while whole-genome sequencing places a heavy demand on data storage and processing capabilities, it also replaces many conventional genetic fingerprinting tests that can require thousands of different polymerase chain reaction (PCR) runs to characterize all of an organism's resistance and virulence genes and the mobile genetic elements responsible for transmitting resistance mechanisms between organisms.
From page 31...
... Big Data Is Not Always Better Data
As a cautionary tale illustrating the challenge of data veracity, Lesho discussed a study using big data to identify specific strains of the gram-positive organism Staphylococcus aureus that elude identification by automated vancomycin-susceptibility platforms. Vancomycin-resistant Staphylococcus aureus are rare, and vancomycin-susceptible Staphylococcus aureus are common.
From page 32...
... and government-funded laboratories to keep up with technological change. The growth of data is placing huge demands on both storage systems and the supply of information technology professionals with the skills to manage the influx of data from a large health care organization.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.