Currently Skimming:

4 Case Studies in Big Data and Analysis
Pages 33-50

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 33...
... William DuMouchel, the chief statistical scientist at Oracle Health Sciences, then spoke about the use of Bayesian statistics to analyze FDA's database of spontaneous adverse drug events to spot potential adverse drug–drug interactions. Chicago's Chief Data Officer Tom Schenk described the city of Chicago's program to mine its databases in order to identify public health problems, and William So, a policy and program specialist with the Federal Bureau of Investigation's (FBI's)
From page 34...
... These datasets are useful for disease research because, for example, sea surface temperatures affect precipitation, which in turn affects land surface temperatures and vegetation, creating conditions under which different disease vectors emerge and are able to propagate and spread disease. In particular, Anyamba explained, long-term datasets such as these are valuable because they enable the detection of anomalies, which are not important by themselves but whose persistence over time is.
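The idea that persistence matters more than any single anomaly can be sketched in a few lines. This is only an illustration under stated assumptions: the readings, the flat baseline, the 0.5-degree threshold, and the function names are all hypothetical and are not taken from Anyamba's presentation or model.

```python
def anomalies(observed, baseline):
    """Anomaly = observed value minus the long-term mean for the same period."""
    return [o - b for o, b in zip(observed, baseline)]


def longest_persistent_run(values, threshold):
    """Length of the longest run of consecutive values above the threshold."""
    longest = current = 0
    for v in values:
        current = current + 1 if v > threshold else 0
        longest = max(longest, current)
    return longest


# Hypothetical monthly sea surface temperatures (deg C) and a flat
# long-term baseline; real baselines would vary by month and location.
observed = [27.1, 27.9, 28.2, 28.4, 27.0, 28.3]
baseline = [27.0] * len(observed)

anoms = anomalies(observed, baseline)
run = longest_persistent_run(anoms, threshold=0.5)
print(f"longest run of warm anomalies: {run} months")
```

A single warm month would produce a run of 1 and would likely be ignored; only a run lasting several months would be treated as a signal in this sketch.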
From page 35...
... USDA publishes a monthly report based on the model's output on the Agricultural Research Service website. Early warning provided by this model has had a positive impact with regard to putting preventive measures in place, Anyamba said. In 2007, for example, the model gave an early warning some 3 months prior to an outbreak in Kenya and 5 months before Rift Valley Fever appeared in Tanzania.
From page 36...
... SOURCE: Anyamba presentation. ... 2016 the model warned of potential high-risk areas 1 year in advance, and early mitigation activities have resulted in no reports of Rift Valley Fever activity in the epizootic regions.
From page 37...
... In the infectious disease space, the FDA and the Centers for Disease Control and Prevention (CDC) have partnered to create the GenomeTrakr program, which uses big data for tracing foodborne pathogens back to their sources.
From page 38...
... This information is now included in the drug's product label and is used by clinicians to inform optimal treatment regimens. To support this type of data analysis, Borio said, the FDA's Division of Antiviral Products built a new data architecture and infrastructure using the type of process Catherine Ordun, deputy project manager for data science and health surveillance at Booz Allen Hamilton, had described earlier in the workshop. In the area of bacterial resistance, the FDA uses big data to help set what are known as breakpoints, the concentrations of an antibacterial drug at which bacterial species become resistant to that drug.
From page 39...
... Bayesian hierarchical models can also isolate problems arising from a drug taken commonly with many other drugs, as is the case for many patients with diabetes, heart failure, or AIDS. These models are also useful for the early detection of adverse events in more structured data, such as from clinical trials, and from unstructured data such as Web search logs (White et al., 2016)
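As a rough illustration of why a Bayesian treatment helps with sparse spontaneous-report data, the sketch below shrinks an observed-to-expected reporting ratio toward 1 using a single gamma prior. It is a deliberately simplified stand-in, not the hierarchical model DuMouchel described; the prior parameters `alpha` and `beta`, the counts, and the function names are assumptions made for the example.

```python
def expected_count(drug_total, event_total, grand_total):
    """Reports expected for a drug-event pair if drug and event were independent."""
    return drug_total * event_total / grand_total


def shrunk_ratio(observed, expected, alpha=1.0, beta=1.0):
    """Posterior mean of the reporting ratio under a Gamma(alpha, beta) prior.

    With observed ~ Poisson(ratio * expected), the posterior is
    Gamma(alpha + observed, beta + expected), whose mean is returned here.
    """
    return (observed + alpha) / (expected + beta)


# Two hypothetical pairs with the same raw ratio (30) but very
# different amounts of evidence behind it.
sparse = shrunk_ratio(observed=3, expected=0.1)
well_supported = shrunk_ratio(observed=300, expected=10.0)
print(f"sparse pair: {sparse:.2f}, well-supported pair: {well_supported:.2f}")
```

Both pairs have a raw ratio of 30, but the pair backed by only three reports is pulled much closer to 1, which is the behavior that keeps small counts from dominating a signal-detection ranking.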
From page 40...
... The resulting Web application now informs a city manager who dispatches city sanitation employees to bait and look for rodents. As an aside, Schenk noted that tests showed that the city manager, who had been at her job for 20 years, was just as good at predicting where rodent outbreaks would occur, but doing so took her 1 to 2 days per week of planning.
From page 41...
... Schenk noted, too, that citizens have created their own applications using city data, including a program that drivers whose cars have been towed can use to find their car. In another instance, a group of citizen scientists took an existing model that had been developed by the federal government and State of Michigan to predict Escherichia coli levels in Lake Michigan and improved the model by using Chicago's publicly available data.
From page 42...
... A report issued in 2016 (Independent Security Evaluators, 2016) based on a 2-year study of 12 health care facilities, two health care data facilities, two medical device companies, and two Web applications (such as the one operated by Chicago but for information on health and health care)
From page 43...
... This survey generated heat maps for 44 different elements and a variety of minerals. From the perspective of a microbiologist, these heat maps can serve as a surrogate petri dish containing a specific type of culture media that some organisms will grow on and others will not.
From page 44...
... FIGURE 4-3 Elemental distributions for calcium (left) and phosphorus (right)
From page 45...
... For example, Griffin's analysis identified strontium as an element present in soils where anthrax was found, and when an anthrax researcher questioned him about this, he was able to remind the researcher that strontium is critical to anthrax spore formation.
GIS AND VECTOR-BORNE DISEASES
By combining published information on a variety of climate and geographical data with outbreaks of various infectious diseases and known locations of the vectors that transmit the infectious organism, and by using a tool called similarity search, Attaway has been able to generate maps that relate environmental and climate conditions to the likelihood of future outbreaks (see Figure 4-4)
From page 46...
... Sadilek calls this an organic sensor network, and he believes it should be possible to mine the data generated by such a network and derive value from it. Because many of these data have a location component, it could be possible to draw inferences about related events from the data and use them to make predictions.
From page 47...
... Following the Spread of Influenza in a City
The key challenge in using tweets, he explained, is to extract useful information from these public messages using some form of natural language processing. Simple algorithms that look for a word such as sick will not work because "I am sick of work" and "I feel sick" cannot both be interpreted as having something to do with illness.
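The failure of simple keyword matching is easy to reproduce. The sketch below is illustrative only: the function names and the hand-written idiom filter are hypothetical, and the filter is a crude stand-in for the natural language processing the speaker referred to, not a description of his method.

```python
import re


def mentions_sick(text):
    """Naive rule: flag any message containing the word 'sick'."""
    return re.search(r"\bsick\b", text.lower()) is not None


def crude_illness_guess(text):
    """Slightly less naive: drop the idiom 'sick of'. Still far from real NLP."""
    lowered = text.lower()
    return mentions_sick(lowered) and "sick of" not in lowered


tweets = ["I am sick of work", "I feel sick"]

naive_flags = [mentions_sick(t) for t in tweets]
filtered_flags = [crude_illness_guess(t) for t in tweets]

for tweet, naive, filtered in zip(tweets, naive_flags, filtered_flags):
    print(f"{tweet!r}: keyword={naive}, with idiom filter={filtered}")
```

The keyword rule flags both messages, while the idiom filter separates these two. Hand-written exceptions of this kind do not scale to the variety of real messages, which is presumably why a learned language model is needed.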
From page 48...
... Probability of getting sick within 1 day of time t as a function of the number of estimated encounters with sick individuals within three different time windows around time t: 1 hour (blue), 4 hours (red)
From page 49...
... In fact, a 3-month pilot program conducted with Las Vegas, Nevada, that selected inspections based on an analysis of Twitter feeds and compared the results with a traditional method of assigning restaurants for inspection identified 50 percent more problem restaurants and resulted in 70 percent more closures (Sadilek et al., 2016)


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.