Mathematical Foundations for Anticipating Rare Events
To begin his presentation, Alon Orlitsky, professor of electrical and computer engineering at the University of California, San Diego, noted that when two legendary researchers—Martin Helmut, who co-developed public key cryptography, and Vinton Cerf, who designed one of the first Internet protocols—argued in a recent paper1 about the probability of having a nuclear war in the following year, the only thing they could agree on was that the risk was too high. The fact that these two prominent thinkers could not even agree on whether there was a probability for this event, let alone what that probability is, suggests that the problem is unsolvable and that there is no mathematical solution for the probability that there will be a nuclear incident in the following year.
Rather than talk about the probability of nuclear war, Orlitsky said his presentation would focus on topics where it is possible to derive rigorous, mathematically provable results about rare and unseen events. The difference between a nuclear event and a solvable rare event is that the former is a binary event—it either does or does not happen—for which Orlitsky argues prediction is impossible. This is not just theory, he said, but something that is done frequently. Examples of what are known as “large domain problems” include machine translation and transcription, which involve predicting which rare word might come next in a sentence, and predicting where a terrorist attack might occur, given the many possible cities where an attack might occur and the many different terrorist groups that could launch an attack.
The first notable success at solving a large domain problem came during World War II, when I.J. Good and Alan Turing were trying to decipher the German enigma code. This effort involved trying to guess the approximate location of a password taken from a large book. In this case, each word in the book has a very low probability of being chosen, making it a rare event. The two mathematicians developed an estimation formula, known as the Good-Turing formula that even today is used in many applications, including speech recognition. This formula calculates the probability of rare events and unseen events when there are a large number of elements involved, such as predicting what the next word in a sentence will be given the large number of words in the English language and the frequency with which they occur in other sentences. Orlitsky and his students have used the Good-Turing formula to see if they could predict the number of terrorist incidents in different types of cities in the next year given the number of terrorist incidents in the current year. For data, they used the National Consortium for the Study of Terrorism and Responses to Terrorism (START) global terrorism database2 of more than 50,000 events
1 M.E. Hellman and V.G. Cerf, 2021, “An Existential Discussion: What Is the Probability of Nuclear War?,” Bulletin of the Atomic Scientists, March 18, 2021, https://thebulletin.org/2021/03/an-existential-discussion-what-is-the-probability-of-nuclear-war.
2 See the START global terrorism database at https://www.start.umd.edu/gtd, last updated May 2022.
in some 12,000 cities from 1992 to 2010, and they based their predictions for cities where no events or few events occurred in a given year on data from the previous year. The results, said Orlitsky, tracked well with the actual number of terrorist events that occurred in those cities.
Orlitsky then discussed methods for estimating the probabilities of multiple events simultaneously and probabilities over future time periods based on observations in a previous time period. By analogy, after observing the number of words in the first 10 percent of Shakespeare’s Hamlet, one could estimate the number of different words in the remainder of the play. He also described methods for estimating the probabilities of events occurring when the data come from many sources, some of which are likely to be wrong.
David Aldous, a member of the planning committee, commented that the methods Orlitsky discussed treat data as if they do not have any structure. As an example, he noted that the cities where terrorist attacks occur have a structure that would be good to include in calculating those probabilities. Similarly, Barrett noted that because systems are not stable and are constantly evolving, it will be necessary to perform these calculations repeatedly as the underlying estimates change. Orlitsky added that there are instances involving competition where these approaches do not work.