7 Speech, Physiology, and Other Interface Components
Pages 231-246

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.


From page 231...
... Computer generation of speech also suffers from problems that remain to be solved. In particular, currently available speech synthesis technology does not provide speech that sounds natural or that can be easily matched to the characteristics of an individual speaker.
From page 232...
... A comprehensive review of speech synthesis appears in Klatt (1987); information on commercial speech synthesis systems is available in the same newsletters and magazines cited above for speech recognition.
Speech Recognition
There are at least three critical factors contributing to the complexity of speech recognition by machine.
From page 233...
... The parameters of the model, the probabilities, are determined from labeled training speech data, presumably containing realizations of all the modeled utterances. Efficient algorithms exist for both the training and recognition tasks.
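As an illustration of the recognition side, here is a minimal sketch of the forward algorithm, one standard way to score an observation sequence against a discrete hidden Markov model. The two toy word models, their names, and all probabilities are hand-set assumptions for illustration; in a real system they would be estimated from labeled training speech, and the observations would be acoustic features rather than the symbols 0 and 1.

```python
def forward_likelihood(obs, start_p, trans_p, emit_p):
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    n_states = len(start_p)
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start_p[s] * emit_p[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[prev] * trans_p[prev][s] for prev in range(n_states))
            * emit_p[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, models):
    """Pick the model name whose HMM assigns the highest likelihood to obs."""
    return max(models, key=lambda name: forward_likelihood(obs, *models[name]))


# Hypothetical two-state, left-to-right models over the symbols {0, 1}.
LEFT_TO_RIGHT = [[0.5, 0.5], [0.0, 1.0]]
TOY_MODELS = {
    # "low_high" expects 0s followed by 1s; "high_low" expects the reverse.
    "low_high": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.9, 0.1], [0.1, 0.9]]),
    "high_low": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.9], [0.9, 0.1]]),
}

if __name__ == "__main__":
    print(recognize([0, 0, 1, 1], TOY_MODELS))
    print(recognize([1, 1, 0, 0], TOY_MODELS))
```

The cost grows linearly with the length of the observation sequence and quadratically with the number of states, which is one reason such scoring is considered efficient.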
From page 234...
... For this purpose, the output of the recognizer is sent to a language understanding component, which analyzes and interprets the recognized word sequence. To allow for possible errors in recognition, the recognition component sends to the language understanding component not only the top-scoring word sequence, but also the N top-scoring word sequences (where N is typically 10-20).
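One simple way a language understanding component might use such an N-best list is to walk the hypotheses in score order and keep the first one it can interpret. The sketch below assumes that strategy; the scores, the sentences, and the `can_interpret` check (a toy rule that the last word must be a known city) are all hypothetical stand-ins, not the method described in the chapter.

```python
KNOWN_CITIES = {"boston", "denver", "dallas"}  # hypothetical domain vocabulary


def can_interpret(words):
    """Toy stand-in for language understanding: the last word must be a known city."""
    tokens = words.split()
    return bool(tokens) and tokens[-1] in KNOWN_CITIES


def pick_interpretable(n_best, accepts):
    """Return the highest-scoring hypothesis that `accepts` approves, else None."""
    for _score, words in sorted(n_best, key=lambda hyp: hyp[0], reverse=True):
        if accepts(words):
            return words
    return None


if __name__ == "__main__":
    # (score, word sequence) pairs; a higher score means a better acoustic match.
    n_best = [
        (-12.1, "show me flights to boss ton"),
        (-12.4, "show me flights to boston"),
        (-13.0, "show me lights to boston"),
    ]
    print(pick_interpretable(n_best, can_interpret))
```

Here the top-scoring hypothesis is rejected and the second one is chosen, which is how a recognition error can be recovered downstream.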
From page 235...
... For example, in a test using the ARPA Wall Street Journal continuous speech recognition corpus, word error rates of 11 percent have been achieved for speaker-independent performance on read speech (Pallett et al., 1994). Although this performance level may not be sufficient for a practical system today, continuing improvements in performance are likely to make such systems of practical use in a few years.
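For readers unfamiliar with the metric, word error rate is conventionally computed as the minimum number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the number of reference words. The following is a minimal sketch of that calculation; the function name and example sentences are illustrative, and the evaluation cited above may have used additional scoring conventions not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because insertions count as errors, the rate can exceed 100 percent when the hypothesis is much longer than the reference.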
From page 236...
... HMMs have proven to be very good for modeling variability in time and feature space and have resulted in tremendous advances in continuous speech recognition. However, some of the assumptions made by HMMs are known not to be strictly true for speech: for example, the conditional independence assumptions, in which the probability of being in a state is dependent only on the previous state, and the output probability at a state is dependent only on that state and not on previous
From page 237...
... One major obstacle to advancement is the lack of a representation of semantics that is general and powerful enough to cover major applications of interest. And even if such a representation were available, there would still be a strong need to develop automatic methods for interpreting word sequences, without having to rely on the currently dominant methods of labor-intensive crafting of detailed linguistic rules.
From page 238...
... For both concatenative and formant synthesis, the generation of natural-sounding utterances requires that rules be developed for controlling the temporal aspects of the speech and changes in fundamental frequency that indicate prominent syllables and that delineate groupings of words. The most successful devices for synthesis of speech from text produce speech with reasonably high intelligibility, although not quite as intelligible as human production of speech, and with some lack of naturalness.
From page 239...
... to help individuals with severe motor disabilities control computers and telerobots, because this topic is included in our discussion of the medicine and health care application domain in Chapter 12. Most practical work on physiological responses has been conducted in the laboratory for purposes of establishing the effects of selected experimental conditions on an individual's emotional state or for designing systems and system tasks that take human capabilities and limitations into consideration.
From page 240...
... Thus, an increase in a physiological response such as heart rate or muscle tension may be interpreted as an increase in sympathetic activity or as a decrease in parasympathetic activity. Physiological responses that are interpreted as sympathetic nervous system activities are often cited as indicators of emotional response.
From page 241...
... There are many other cardiovascular variables that can be assessed (e.g., pulse transit time, forearm blood flow; Smith and Kampine, 1984). Depending on the precise nature of the research question, additional measures may be needed in order to identify the underlying mechanisms that have produced any observed changes in blood pressure or heart rate.
From page 242...
... may be monitored as a way of assessing muscle tension or facial expression. Even changes too fleeting or slight to be observed by a human judge can be detected by placing electrodes near key facial muscles (Hassett, 1978; Cacioppo and Petty, 1983).
From page 243...
... Research on pupil dilation has shown a change in pupil size as mental workload increases: during high levels of workload the pupil dilates, but when the operator becomes overloaded, pupil size is reduced. Pupil dilation can be recorded by a motion picture or video camera; analysis is accomplished by measuring each frame through manual or automated techniques.
From page 244...
... This time delay limits the potential of ERPs for NIE applications requiring real-time, closed-loop control.
Functional and Structural Information
Functional and structural information about the brain can be obtained by a combination of methods, including magnetic stimulation mapping, positron emission tomography, and magnetic resonance imaging.
From page 245...
... Essentially, the studies have shown that subjects were able to roll accurately to the left or right in the simulator by controlling brain resonance signals. Although this research is in its early stages, it does appear to hold some promise for training individuals to use physiological responses as simple control signals (e.g., on-off, left-right, etc.).
From page 246...
... ; Polhemus Fastrak position information is processed by a PC and communicated to one of the Indigos via an RS-232 serial bus; another Indigo serial bus controls the odor delivery subsystem; a third serial connection on the second Indigo drives a MIDI bus to control the radiant heat subsystem; and an Ethernet connection between the two Indigos is used for synchronization purposes and for sharing peripheral information. The demonstration program Pyro makes use of two virtual human arms and hands, Polhemus trackers attached to the user's wrists, a virtual torch, and some virtual flammable spheres.


This material may be derived from roughly machine-read images, and so is provided only to facilitate research.