Using Computerized Text Analysis to Assess Threatening Communications and Behavior--Cindy K. Chung and James W. Pennebaker
Pages 3-32


From page 3...
... . In the literature on humans, an impressive number of studies have analyzed threatening behaviors by studying posture, facial expression, tone of voice, and an array of biological changes (Hall et al., 2005)
From page 4...
... Despite the obvious importance of natural language as the delivery system of threats, very few social scientists have been able to devise simple systems to identify or calibrate language-based threats. Only recently, with the advent of computer technology and the availability of large language-based datasets, have scientists been able to start to identify and understand threatening communications and responses to them through the study of words (Cohn et al., 2001; Pennebaker and Chung, 2005, 2008; Smith, 2004, 2008; Smith et al., 2008)
From page 5...
... word count strategies. All are valid approaches to understanding threatening communications and can potentially yield complementary results to both academic and nonacademic investigators.
From page 6...
... Computerized Word Pattern Analysis Rather than exploring text "top down" within the context of previously defined psychological content dimensions, word pattern strategies mathematically detect "bottom up" how words covary across large samples of text (Foltz, 1996; Popping, 2000) or the degree to which words overlap within texts (e.g., Graesser et al., 2004)
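The "bottom up" idea of detecting how words covary across texts can be illustrated with a minimal sketch. It is not the method used in the cited studies (which rely on techniques such as latent semantic analysis over large corpora); it only computes pairwise Pearson correlations between raw word counts across documents. The function names and the idea of passing an explicit vocabulary are assumptions made for illustration.

```python
import re
from math import sqrt

def term_counts(docs, vocab):
    """Return one row of raw counts per document, one column per vocab word."""
    rows = []
    for doc in docs:
        tokens = re.findall(r"[a-z']+", doc.lower())
        rows.append([tokens.count(word) for word in vocab])
    return rows

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)

def word_covariation(docs, vocab):
    """Correlate every pair of vocab words by their counts across documents."""
    if not docs:
        return {}
    columns = list(zip(*term_counts(docs, vocab)))
    return {
        (a, b): pearson(columns[i], columns[j])
        for i, a in enumerate(vocab)
        for j, b in enumerate(vocab)
        if i < j
    }
```

Words that tend to appear in the same documents receive positive correlations, and words that tend to appear in different documents receive negative ones. Real word pattern tools work with far larger samples and usually reduce the word-by-document matrix before comparing words.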
From page 7...
... . LIWC is a computerized word counting tool that searches for approximately 4,000 words and word stems and categorizes them into grammatical (e.g., articles, numbers, pronouns)
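A word counting tool of this kind can be sketched in a few lines. The example below is not LIWC itself: the three-category dictionary is a hypothetical stand-in (the actual LIWC dictionary is much larger), and the convention that a trailing `*` marks a word stem is an assumption of this sketch. It reports each category as a percentage of all words in the text.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary for illustration only.
# An entry ending in "*" is treated as a stem that matches any continuation.
CATEGORIES = {
    "articles": ["a", "an", "the"],
    "pronouns": ["i", "me", "my", "you", "we", "they"],
    "negative_emotion": ["hate*", "hurt*", "angry"],
}

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def matches(token, entry):
    """True if the token equals the entry, or starts with it when the entry is a stem."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry

def category_percentages(text):
    """Return the percentage of tokens that fall in each dictionary category."""
    tokens = tokenize(text)
    counts = Counter()
    for token in tokens:
        for category, entries in CATEGORIES.items():
            if any(matches(token, entry) for entry in entries):
                counts[category] += 1
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {category: 100.0 * counts[category] / total for category in CATEGORIES}
```

For example, `category_percentages("I hate the rain")` counts four tokens and assigns one each to pronouns, negative emotion, and articles, so each category comes out at 25 percent.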
From page 8...
... These studies provide evidence that word use is reflective of thoughts and behaviors that characterize psychological states. Word counts provide meaningful measures for a variety of thoughts and behaviors.
From page 9...
... . In fact, most of the language samples from word count studies come from sources in which natural language is recorded for purposes other than linguistic analysis and therefore have the advantage of being more externally valid than the majority of studies involving implicit measures.
From page 10...
... Also, situational features across multiple threats cannot be cleanly or confidently classified into discrete categories in order to generalize to new threats. Many of these difficulties in research on threatening communications overlap with the difficulties in research on deception, for which empirical and naturalistic research has made considerable progress through the use of computerized text analyses (for a review, see Hancock et al., 2008)
From page 11...
... With language samples of threatening communications, often the threatening message is revisited after the act. However, a threat, by definition, is received before the act of harm, and so the language samples analyzed to investigate threats versus deceptive messages typically come from different time points.
From page 12...
... By the same token, latent threats and nonthreats can variously be interpreted in both benign and threatening ways. Failing to detect a real threat, or falsely perceiving a threat in a true nonthreat, may say as much about the perceiver as about the message itself.
From page 13...
... REVIEW OF EMPIRICAL RESEARCH ON COMMUNICATED INTENT AND ACTUAL BEHAVIORS USING TEXT ANALYSIS To distinguish between real threats and bluffs, or between latent threats and nonthreats, the first step is to assess whether or not a given communication is deceptive. To detect deception, computerized text analysis methods have been applied to natural language samples in both experimental laboratory tests and a limited number of real-world settings.
From page 14...
... Note that these numbers are likely inflated, since estimates of the veracity of statements are dependent on the selection of the statements themselves, as opposed to a broader analysis of all statements made by the Bush administration. It is important to note that the strength of the language model is that it has been applied to a wide variety of natural language samples from low- to high-stakes situations.
From page 15...
... , and their empirically derived categories are based on psychoanalytic theories and clinical observations. The advantage of all word count tools for the analysis of therapeutic text is that word counts tend to be a less biased measure of therapeutic improvements than clinicians' self-reports (Bucci and Maskit, 2007)
From page 16...
... . In one study, both computerized word pattern and word count analyses of public statements made by Osama bin Laden and Ayman al-Zawahiri, from the years 1988 through 2006, were examined (Pennebaker and Chung, 2008)
From page 17...
... In addition, this research would be more suitable for real-time or close to real-time analyses if the judge-based dimensions that are coded at high intercoder reliability rates could reliably be detected using computerized word pattern or word count indices. Bluffs Unlike a real threat, a bluff might contain markers of deception since it is one that is believed by the writer or speaker to be false.
From page 18...
... An example of a latent threat is that of President George W. Bush's use of first-person singular pronouns during his (over 600)
From page 19...
... Analysis of the natural language of these political leaders highlights the ability of computerized word counts to reveal how people are attending and responding to their personal upheavals, relationship changes, and world events. A within-subject text analysis of public speeches over time bypassed the difficulties in traditional self-reports (i.e., personally seeking out these leaders in their top-secret hideouts to ask them to fill out questionnaires with minimal response biases)
From page 20...
... . FUTURE DIRECTIONS The use of text analysis to understand the psychology of threatening communications is just beginning.
From page 21...
... Creation of Shared-Text Databases A pressing practical issue that must be addressed is access to data. First, by increasing access to data from naturally occurring threats from forensic investigations and across laboratories, researchers can start to build a more complete picture of threatening communications and compare text analytic methods in terms of their efficacy for assessing threat features.
From page 22...
... Finally, threat-related experimental laboratory studies must be run and archived. A significant concern of the large-database approach to linking language with threatening communications is that it is ultimately correlational.
From page 23...
... . Once empirical research has reliably identified the linguistic features of mental illness, future research can investigate the degree to which threats are communicated by individuals with various mental illnesses and disorders.
From page 24...
... In the case of computational linguistics, finding out whether, for example, language markers of sex differences are maintained in translations between Arabic and English could aid in investigations of author identification for translated documents. Note that computerized word patterning methods have already been successfully applied to authorship identification and characterization for Arabic and English extremist-group Web forums (Abbasi and Chen, 2005)
From page 25...
... Clearly, more validation work is required to assess the use of the Arabic LIWC dictionaries for cross-language investigations. However, the approach laid out here can help in beginning to see the world through Arabic and English eyes using a simple word counting program for assessing language style.
From page 26...
... Again, returning to the case of deception, the classifiers would be run on documents known to be deceptive or not, additional features that predict deception could be assessed, and then future documents of unknown verity could be assessed using the same classifier for the probability that the new document is deceptive or not. Note that there are several features of SLP that make it a suitable approach to be developed and applied for investigations of threatening communications and actual behavior.
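The train-then-score workflow described here can be sketched with a small multinomial naive Bayes classifier. This is one possible classifier chosen for brevity, not necessarily the approach the chapter has in mind, and the class and method names are hypothetical. It is fit on documents whose labels are known and then returns a probability for each label on a new document.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayesText:
    """Multinomial naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, texts, labels):
        """Learn word frequencies from documents with known labels."""
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts.setdefault(label, Counter()).update(tokens)
            self.vocab.update(tokens)
        return self

    def predict_proba(self, text):
        """Return {label: probability} for a document of unknown label."""
        tokens = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counter in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counter.values()) + len(self.vocab)
            for token in tokens:
                score += math.log((counter[token] + 1) / denom)
            log_scores[label] = score
        # Normalize in log space to avoid underflow on long documents.
        top = max(log_scores.values())
        weights = {label: math.exp(s - top) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}
```

A usage sketch with invented toy sentences (not data from any study): after `model = NaiveBayesText().fit(["they never went there", "i went home and i slept"], ["deceptive", "truthful"])`, calling `model.predict_proba("they never left")` returns a probability for each label. In practice the features would include validated linguistic markers rather than raw words, and the classifier would be evaluated on held-out documents before being applied to new ones.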
From page 27...
... Note, again, that the ultimate contribution of text analyses of threatening communications will come from the degree to which text analysis informs us about the underlying psychology of the actors. CONCLUSION There has been little work so far on computerized text analysis of threatening communications.
From page 28...
... Predicting weight loss in blogs using computerized text analysis. Derived from Ph.D.
From page 29...
... Unpublished. Computer-based text analysis across cultures: Viewing language samples through English and Arabic eyes.
From page 30...
... 2008. Computerized text analysis of al-Qaeda statements.
From page 31...
... 2001. Linguistic Inquiry and Word Count: LIWC 2001.
From page 32...
... 2008. Computergestuetzte quantitative Textanalyse: Aequivalenz und Robustheit der deutschen Version des Linguistic Inquiry and Word Count [Computerized quantitative text analysis: Equivalence and robustness of the German adaptation of Linguistic Inquiry and Word Count]

