3 The Auditory Channel
Pages 134-160

From page 134...
... Accordingly, most of the material presented in this section is concerned not so much with auditory interfaces as with other aspects of signal presentation in the auditory channel.

STATUS OF THE RELEVANT HUMAN RESEARCH

There is no topic in the area of auditory perception that is not relevant to the use of the auditory channel in some kind of SE system.
From page 135...
... It appears that the upper limits on information transfer rates for speech and Morse code, two methods of encoding information acoustically for which there exist subjects who have had extensive training in deciphering the code, are roughly 60 bits/s and 20 bits/s, respectively. Unfortunately, we are unaware of any estimates of the information transfer rate for the perception of music.
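
For intuition about where figures like these come from, a back-of-the-envelope estimate is easy to write down. The phoneme inventory size and speaking rate below are assumed round numbers, and because real speech is highly redundant the result is only a loose upper bound:

```python
from math import log2

# Back-of-the-envelope estimate of the information rate of speech.
# Assumed round numbers: ~40 phonemes in English, ~12 phonemes/s in
# fluent speech. Treating phonemes as equiprobable and independent
# overestimates the true rate, since real speech is redundant.
n_phonemes = 40
phonemes_per_second = 12.0

bits_per_phoneme = log2(n_phonemes)            # ~5.3 bits
rate = bits_per_phoneme * phonemes_per_second  # ~64 bits/s

print(f"upper-bound speech rate: {rate:.0f} bits/s")
```
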
From page 136...
... Certainly, none of the individuals we know would be willing to spend an equivalent amount of time attempting to learn an arbitrary nonspeech code developed for purposes of general research or for use in SE systems. Unfortunately, there is no theory yet available that enables one to reliably predict the dependence of information transfer rates on the coding scheme or training procedures employed.
From page 137...
... .

Spatial Perception

The topic of auditory spatial perception is important for three reasons: the perception of the spatial properties of a sound field is an important component of the overall perception of real sound fields; the location of a sound source is a variable that can be used to increase the information transfer rate achieved with an auditory display; and it has been a central research focus in the simulation of acoustic environments for virtual environment (VE)
From page 138...
... . For isolated sound sources in an anechoic environment, the just-noticeable difference (JND)
From page 139...
... Again considering sources in front of the listener, source distance must be large enough that the maximum angular positional error (equal to the angular tracker error plus the error in angular position due to translational tracker error) is smaller than the JND in angular position.
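
A minimal sketch of this distance constraint, using illustrative numbers (the ~1 degree azimuthal JND for frontal sources is a commonly cited figure; the tracker errors are assumptions):

```python
import math

# Constraint from the text: (angular tracker error) +
# (angular error induced by translational tracker error) < angular JND.
# All numeric values below are illustrative assumptions.
jnd_deg = 1.0            # azimuthal JND for a frontal source, ~1 degree
tracker_ang_deg = 0.5    # assumed angular tracker error
tracker_trans_m = 0.005  # assumed translational tracker error, 5 mm

# A translational error e at source distance d shifts the apparent
# direction by about atan(e / d) ~= e / d radians (small angles).
# Solving tracker_ang + degrees(e / d) < jnd for d:
budget_rad = math.radians(jnd_deg - tracker_ang_deg)
d_min = tracker_trans_m / budget_rad

print(f"minimum source distance: {d_min:.2f} m")  # ~0.57 m here
```
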
From page 140...
... suggests that the computational latencies currently found in fairly simple auditory VEs are acceptable for moderate velocities. For example, latencies of 50 ms (associated with positional update rates of 20 Hz as found in the Convolvotron system from Crystal River Engineering)
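
The underlying arithmetic is simple: during a head rotation the rendered source direction lags the true one by roughly angular velocity times latency, so the induced error stays near typical angular JNDs until head motion becomes fast. A sketch with assumed velocities:

```python
# During a head rotation, the rendered source direction lags the true
# direction by roughly (angular velocity) x (end-to-end latency).
# 50 ms matches the latency cited in the text; the velocities are
# assumed examples.
latency_s = 0.050  # 50 ms, i.e., one 20 Hz update period
for head_vel_deg_s in (10.0, 20.0, 50.0, 100.0):
    lag_deg = head_vel_deg_s * latency_s
    print(f"{head_vel_deg_s:5.0f} deg/s -> lag {lag_deg:4.1f} deg")
```
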
From page 141...
... Somehow, the higher centers in the auditory system must decompose the output of each auditory filter in each ear into elements that, after the decomposition, can be recombined into representations of individual sources. In general, understanding auditory scene analysis is important to the design of SE systems because properties of this analysis play an important role in determining the effectiveness of auditory displays.
From page 142...
... While such illusions may seem like mere perceptual curiosities, these effects can have a profound impact on whether information is perceived and organized correctly in a display in which acoustic signals are used to convey meaning about discrete events or ongoing actions in the world and their relationships to one another. For example, if two sounds are assigned to the same perceptual stream, simultaneous acoustic masking may be greater, but temporal comparisons might be easier if relative temporal changes alter the Gestalt (the overall percept) or cause the single stream to perceptually split into two streams.
From page 143...
... In the context of display design, the notion of auditory scene analysis has been most influential in the recent interest in using abstract sounds, environmental sounds, and sonification for information display (Kramer, 1994). The idea is that one can systematically manipulate various features of auditory streams, effectively creating an auditory symbology that operates on a continuum from literal everyday sounds, such as the rattling of bottles being processed in a bottling plant (Gaver et al., 1991)
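
A minimal example of the parameter-mapping end of this continuum: data values mapped to the pitch of successive tones. The frequency range and tone duration are arbitrary assumptions, not prescriptions:

```python
import numpy as np

def sonify(values, sr=44100, tone_dur=0.15, f_lo=220.0, f_hi=880.0):
    """Map each data value to the pitch of a short sine tone.

    A minimal parameter-mapping sonification; the frequency range and
    tone duration are illustrative choices.
    """
    lo, hi = min(values), max(values)
    t = np.arange(int(sr * tone_dur)) / sr
    env = np.hanning(t.size)              # avoid clicks at tone edges
    tones = []
    for v in values:
        frac = (v - lo) / (hi - lo) if hi > lo else 0.5
        f = f_lo * (f_hi / f_lo) ** frac  # log-spaced pitch mapping
        tones.append(env * np.sin(2 * np.pi * f * t))
    return np.concatenate(tones)

audio = sonify([3, 1, 4, 1, 5, 9, 2, 6])  # e.g., write to a WAV file
```
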
From page 144...
... In particular, a number of transformations are being studied that present the listener with magnified perceptual cues that approximate in various ways the cues that would be present if the listener had a much larger head (Durlach and Pang, 1986; Van Veen and Jenison, 1991; Durlach et al., 1993). Also being studied now are transformations that enable the listener to perceive the distance of sound sources much more accurately (see the review by Durlach et al., 1993).
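
One simple way to see how a "larger head" magnifies cues: interaural time differences (ITDs) scale with head radius in Woodworth's spherical-head approximation, so synthesizing ITDs for an enlarged radius stretches the azimuth cue. A sketch (the 3x magnification factor is an assumption for illustration):

```python
import math

def itd_woodworth(azimuth_deg, head_radius_m=0.0875, c=343.0):
    """Woodworth's spherical-head approximation to the ITD, in seconds,
    valid for azimuths between 0 and 90 degrees."""
    th = math.radians(azimuth_deg)
    return (head_radius_m / c) * (th + math.sin(th))

for az in (10, 30, 60, 90):
    normal = itd_woodworth(az)                 # typical head radius
    magnified = itd_woodworth(az, 3 * 0.0875)  # assumed 3x "larger head"
    print(f"{az:3d} deg: {normal*1e6:6.1f} us -> {magnified*1e6:6.1f} us")
```
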
From page 145...
... Although commercial high-fidelity firms often claim substantial imaging ability with loudspeakers, the user is restricted to a single listening position within the room, only azimuthal imaging is achieved (with no compensation for head rotation), and the acoustic characteristics of the listening room cannot be easily manipulated.
From page 146...
... Even including such a protector, it is unlikely that the cost of such a sound delivery system will exceed $1,000. Most of the past work on auditory interfaces for virtual environments has been directed toward the provision of spatial attributes for the sounds.
From page 147...
... Estimates of HRTFs for different source locations are obtained by direct measurements using probe microphones in the listener's ear canals, by roughly the same procedure using mannequins, or by the use of theoretical models (Wightman and Kistler, 1989a,b; Wenzel, 1992; Gierlich and Genuit, 1989). Once HRTFs are obtained, the simulation is achieved by monitoring head position and orientation and providing, on a more or less continuous basis, the appropriate HRTFs for the given source location and head position/orientation.
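
The core signal path of such a simulation is a pair of convolutions with the head-related impulse response (HRIR, the time-domain counterpart of the HRTF) for the current source direction. A minimal sketch, assuming a table of measured HRIRs indexed by direction; the lookup and the toy data below are placeholders, not any particular system's API:

```python
import numpy as np

def nearest_direction(hrir_table, az_deg, el_deg):
    # Placeholder lookup: choose the measured (azimuth, elevation) key
    # closest to the requested direction; real systems interpolate.
    return min(hrir_table,
               key=lambda k: (k[0] - az_deg) ** 2 + (k[1] - el_deg) ** 2)

def render_binaural(mono, hrir_table, az_deg, el_deg):
    """Convolve a mono source with the HRIR pair measured nearest to the
    source direction relative to the head. In a running system this is
    repeated as tracked head position/orientation updates arrive."""
    hrir_l, hrir_r = hrir_table[nearest_direction(hrir_table, az_deg, el_deg)]
    return np.stack([np.convolve(mono, hrir_l), np.convolve(mono, hrir_r)])

# Toy stand-in for a measured HRIR table: crude level differences only.
table = {(az, 0): (np.array([1.0 - az / 180]), np.array([1.0 + az / 180]))
         for az in range(-90, 91, 10)}
out = render_binaural(np.random.randn(1000), table, az_deg=37, el_deg=0)
```
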
From page 148...
... employed simple time-domain processing schemes to "spatialize" input sound sources. The Acoustetron (successor to the Convolvotron)
From page 149...
... As the computational power of real-time systems increases, the use of these detailed models will become feasible for the simulation of realistic environments. The most common approach to modeling the sound field is to generate a spatial map of secondary sound sources (Lehnert and Blauert, 1989).
From page 150...
... Two methods are commonly used to find secondary sound sources: the mirror-image method (Allen and Berkley, 1979; Borish, 1984) and variants of ray tracing (Krokstadt et al., 1968; Lehnert, 1993a)
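
A minimal sketch of the mirror-image idea for a rectangular ("shoebox") room, first-order reflections only: each wall contributes an image source, delayed by distance/c and attenuated by spherical spreading and a wall reflection coefficient (the coefficient and geometry below are assumed values):

```python
import numpy as np

C = 343.0  # speed of sound, m/s

def image_sources_first_order(src, room):
    """First-order image sources for a shoebox room with one corner at
    the origin: reflect the source across each of the six walls."""
    images = []
    for axis in range(3):
        lo = src.copy(); lo[axis] = -src[axis]                   # wall at 0
        hi = src.copy(); hi[axis] = 2 * room[axis] - src[axis]   # far wall
        images += [lo, hi]
    return images

def impulse_response(src, listener, room, beta=0.8, sr=44100, dur=0.05):
    """Direct path plus first-order reflections, each delayed by
    distance/c and attenuated by 1/distance and the reflection
    coefficient beta (an assumed value)."""
    h = np.zeros(int(sr * dur))
    paths = [(src, 1.0)] + [(im, beta)
                            for im in image_sources_first_order(src, room)]
    for pos, gain in paths:
        d = np.linalg.norm(pos - listener)
        n = int(round(d / C * sr))
        if n < h.size:
            h[n] += gain / max(d, 1e-3)
    return h

room = np.array([5.0, 4.0, 3.0])  # assumed room dimensions, m
h = impulse_response(np.array([1.0, 2.0, 1.5]),
                     np.array([3.5, 2.0, 1.5]), room)
```
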
From page 151...
... Efficient algorithms and signal-processing techniques for real-time synthesis of complex sound fields are currently being investigated in a collaborative project between Crystal River Engineering and NASA.

Off-Head, Hear-Through, and Augmented-Reality Displays

Apart from work being conducted with entertainment applications in mind, most of the research and development concerned with auditory displays in the SE area has been focused on stimulation by means of earphones.
From page 152...
... . For SE applications, the main problem with loudspeaker systems, as it is with earphones, is that of achieving the desired spatialization of the sounds (including both the perceived localization of the sound sources and the perceived acoustic properties of the space in which the sources are located)
From page 153...
... How can we build a real-time system that is general enough to produce the quasi-musical sounds usually used for auditory icons, as well as purely environmental sounds, like doors opening or glass breaking? The ideal synthesis device would be able to flexibly generate the entire continuum of nonspeech sounds described above as well as be able to continuously modulate various acoustic parameters associated with these sounds in real time.
From page 154...
... In synthesizing music, the goals are typically not as specific or restricted: they are defined in terms of the composer's subjective criteria. Usually, the goal is to produce an acoustic waveform with particular perceptual qualities: either to simulate some traditional, physical acoustic source or to produce some new, unique sound with appropriate attributes.
From page 155...
... Subtractive synthesis is a term in music synthesis that refers to the shaping of a desired acoustic spectrum by one or more filtering operations and is a precursor to the more modern approaches that have come to be known as physical modeling. Subtractive synthesis is often used when a simple physical model of an instrument can be described by an excitatory source that drives some filter(s)
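
A minimal instance of this source-filter idea: a spectrally rich excitation (an impulse train) shaped by a two-pole resonant filter. All parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.signal import lfilter

sr = 44100
dur = 1.0

# Excitation ("source"): an impulse train at ~110 Hz, spectrally rich.
n = int(sr * dur)
excitation = np.zeros(n)
excitation[:: sr // 110] = 1.0

# Filter: a two-pole resonator at an assumed formant-like center
# frequency; center frequency and bandwidth are illustrative.
f0, bw = 800.0, 100.0
r = np.exp(-np.pi * bw / sr)
theta = 2 * np.pi * f0 / sr
a = [1.0, -2 * r * np.cos(theta), r * r]
y = lfilter([1.0 - r], a, excitation)  # spectrally shaped output
```
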
From page 156...
... As noted above, subtractive synthesis was the first attempt at this type of modeling of aggregate properties. Several more recent physical modeling techniques based on aggregate properties have been developed, including the digital waveguide technique, transfer function techniques, modal synthesis, and mass-spring models to synthesize sounds ranging from musical instruments to the singing voice of the human vocal tract (Borin et al., 1992; Cadoz et al., 1993; Cook, 1993; Djoharian, 1993; Keefe, 1992; Morrison and Adrien, 1993; Smith, 1992; Wawrzynek, 1991; Woodhouse, 1992).
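
The simplest widely known instance of the digital waveguide family is the Karplus-Strong plucked string: a noise-filled delay line whose feedback path applies a gentle low-pass, standing in for losses along the string. A sketch (the pitch and decay settings are assumptions):

```python
import numpy as np

def pluck(freq=220.0, sr=44100, dur=1.0, decay=0.996):
    """Karplus-Strong plucked string: a noise-filled delay line whose
    feedback path averages adjacent samples, a crude low-pass that
    mimics frequency-dependent losses on a real string."""
    period = int(sr / freq)               # delay-line length sets pitch
    line = np.random.uniform(-1, 1, period)
    out = np.empty(int(sr * dur))
    for i in range(out.size):
        out[i] = line[i % period]
        avg = 0.5 * (line[i % period] + line[(i + 1) % period])
        line[i % period] = decay * avg    # write filtered sample back
    return out

note = pluck(220.0)  # e.g., write to a WAV file to listen
```
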
From page 157...
... , and Richards (1988). Synthesis Technology Current devices available for generating nonspeech sounds tend to fall into two general categories: samplers, which digitally store sounds for later real-time playback, and synthesizers, which rely on analytical or algorithmically based sound generation techniques originally developed for imitating musical instruments (see Cook et al., 1991; Scaletti, 1994; and the discussion above)
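
The sampler side of this division is easy to make concrete: its core operation is stored-waveform playback, usually with resampling to shift pitch. A sketch using linear interpolation (the pitch ratio is an assumed example):

```python
import numpy as np

def play_sample(recording, pitch_ratio=1.5):
    """Core of a sampler: replay a stored waveform at a different rate
    via linear interpolation. A ratio of 1.5 raises the pitch by
    roughly a fifth and shortens the sound correspondingly."""
    positions = np.arange(0, recording.size - 1, pitch_ratio)
    idx = positions.astype(int)
    frac = positions - idx
    return (1 - frac) * recording[idx] + frac * recording[idx + 1]

tone = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
shifted = play_sample(tone)  # same waveform, higher pitch
```
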
From page 158...
... Recently, several commercial synthesizer companies have announced new products based on physical modeling techniques. A sound card being developed by Media Vision is based on digital waveguides (Smith, 1992); the Yamaha VL1 keyboard synthesizer is based on an unspecified physical modeling approach; and the Macintosh-based Korg SynthKit allows construction of sounds by interconnecting the modular units of a visual programming language representing hammer-strikes, bows, reeds, etc.
From page 159...
... Another major area in the perceptual domain that requires substantial work falls under the heading of auditory information displays. Current knowledge of how to encode information for the auditory channel in ways that produce relatively high information transfer rates with relatively small amounts of training is still rather meager.
From page 160...
... Subsequently, it will be necessary to develop a user-friendly system for providing speech, music, and environmental sounds all in one integrated package. And eventually, of course, the whole sound generation system will have to be appropriately integrated with the software and hardware system for generating visual and haptic images.

