The National Academies Press

Currently Skimming:

3 Input/Output Technologies: Current Status and Research Needs
Pages 71-120

The Chapter Skim interface presents what we've algorithmically identified as the most significant single chunk of text within every page in the chapter.

From page 71... ... Some involve advances in basic underlying display and interface technologies (higher-resolution visual displays, three-dimensional displays, better voice recognition, better tactile displays, and so on)
From page 72... ... Language Contrasts and Continuum There are two language classes of interest in the design of interfaces: natural languages (e.g., English, Spanish, Japanese) and artificial lan
From page 74... ... Although natural language processing remains a challenging long-range problem in artificial intelligence (as discussed under "Natural Language Processing" below in this chapter), progress continues to be made, and better understanding of the ways in which it makes communication easier may be used to inform the design of more restricted languages.
From page 75... ... Speaking can involve isolated words or continuous speech recognition.
From page 76... ... Comparisons Among Graphical User Interfaces, Natural Language, and Speech The language-expression-device framework can be used to gain perspective on current standard interface types and on the research opportunities and challenges presented by ECIs. For example, it makes clear that natural language processing and speech recognition (and other technologies that may be associated colloquially)
From page 77... ... Similarly, it is possible to improve restricted language interfaces by applying principles from natural language communication. Current graphical user interface/menu/icon systems tightly constrain what one can say, both by starting with a very constrained language and by having a structured way in which one can express things in that language.
From page 78... ... Current direct manipulation interfaces with two-dimensional display and mouse input make use, minimally, of one arm with two fingers and a thumb and one eye, about what is used to control a television remote. It was considered a stroke of genius, of course, to reduce all computer interactions to this simple set as a transition mechanism to enable people to learn to use computers without much training.
From page 79... ... The choice of language (natural, restricted, or direct manipulation) influences but does not dictate the technologies discussed here. The exception is the subsection, "Natural Language Processing," which also encompasses the language layer of the model and discusses how choices along a spectrum from fully natural languages to relatively restricted languages influence the performance of various expression modes, particularly speech input.
From page 80... ... The gains have come from the convergence of several technologies: higher-accuracy continuous speech recognition based on better speech modeling techniques, better recognition search strategies that reduce the time needed for high-accuracy recognition, and increased power of audio-capable, off-the-shelf workstations. As a result of these advances, real-time, speaker-independent, continuous speech recognition, with vo
From page 81... ... Word error rates for speaker-independent continuous speech recognition vary a great deal, depending on the difficulty of the task: from less than 0.3 percent for connected digits, to 3 percent for a 2,500-word travel information task, to 10 percent for articles read from the Wall Street Journal, to 27 percent for transcription of broadcast news programs, to 40 percent for conversational speech over the telephone. Although word error rates in the laboratory can be quite small for some tasks, error rates can increase by a factor of four or more when the same systems are used in the field.
From page 82... ... (Several vendors have been shipping speech recognition capabilities with personal computers, but there is little evidence of wide usage.) Optimism for general use of speech technologies comes from the facts that performance levels are continuing to improve and that many applications do not require large vocabulary sizes.
From page 83... ... As a barometer of how much progress we may need for certain advanced applications, experiments have shown that human speech recognition performance is still at least an order of magnitude better than that of machines. One optimistic note, however, is that commercialization of the technology is proceeding very vigorously and is lagging the corresponding research capabilities by only a few years, so that any advances in the laboratory can be expected to appear on the market with a delay of only a few years.
From page 84... ... For those people, alternate means of verification will be necessary if they are to use systems that rely on voice verification. Alternate Keying/Typing Approaches: Strategies and Accelerators As speech recognition becomes accurate and reliable, it will play a much larger role in future interface systems than it does today.
From page 85... ... However, it is not clear what the best techniques are for combining these input techniques for using keyboard input in connection with speech and other virtual reality and gestural input systems. What is the best way to use a minimalist keyboard with a voice response system either in a key-in/voice-out paradigm or to help handle error correction in voice recognition systems?
From page 86... ... In the current practice, several hundred rules may need to be hand-coded for a new application, even in a limited domain. In the early 1990s, NLP took several new directions, largely at the instigation of a succession of DARPA program managers. First, after years of working in parallel, researchers in speech recognition and NLP were encouraged to construct integrated speech understanding systems, for which the chosen task was to answer spoken queries to databases (e.g., of air travel information)
From page 87... ... The domain specificity of rule-based NLP systems suggests that it would be attractive to be able to automatically train an NLP system, as is done with the hidden Markov models used in speech recognition. Significant effort is being devoted to this direction.
From page 88... ... Also, there is compelling evidence that spoken language systems can have sophisticated models of dialogue and can benefit from them. Gesture Recognition Gesture input can come in many forms from a variety of devices (e.g., mouse, pen, data glove)
From page 89... ... machine recognition of American Sign Language gestures is the equivalent of speech recognition for those of us who can speak. Machine Vision and Passive Input Machine vision is likely to play a number of roles in future interface systems.
From page 90... ... However, advances in artificial intelligence, neural networks, and image processing in combination with large data banks of image information may make it possible in the future to provide verbal interpretation or description for many types of information. A major impetus comes from the desire to make image information
From page 91... ... Visual Displays Visual display progress begins with the screen design (graphics, layouts, icons, metaphors, widget sets, animation, color, fisheye views, overviews, zooming) and other aspects of how information is visualized.
From page 92... ... Less futuristic displays still have a long way to go to enable natural-appearing virtual reality (VR)
From page 93... ... Low-frequency sound can vibrate the user's body to somewhat simulate physical displacement. Speakers and headphones as output devices for synthesized sound match the ears well, unlike the case with visual displays.
From page 94... ... Haptic and Tactile Displays Human touch is achieved by the parallel operation of many sensor systems in the body (Kandel and Schwartz, 1981). The hand alone has 19 bones, 19 joints, and 20 muscles with 22 degrees of freedom and many classes of receptors and nerve endings in the joints, skin, tendons, and muscles.
From page 95... ... Most of this new computing power has supported enriched high-bandwidth user interfaces. Haptics is a sensory/motor interaction modality that is just now being exploited in the quest for seamless interaction with computers.
From page 96... ... These key areas include the following: • Better understanding of the biomechanics of human interaction with haptic displays. For example, stability of the haptic interaction goes beyond the traditional control analysis to include simulated geometry and nonlinear time-varying properties of human biomechanics.
From page 97... ... Research is necessary now to provide the intellectual capital upon which such an industry can be based. Tactile Displays for Low- or No-Vision Environments or Users Tactile displays can help add realism to multisensory virtual reality environments.
From page 98... ... Vibration has been used for adding realism to movies and virtual reality environments and also as a signaling technique for people with hearing impairments. It can be used for alarm clocks or doorbells, but is limited in the information it can present even when different frequencies are used for different signals.
From page 99... ... An even better solution, both for blind people and for virtual reality applications, would be a glove that somehow provided both full tactile sensation over the palm and fingertips and force feedback. Elements of this have been demonstrated, but nothing approaching full tactile sensation or any free-field force feedback.
From page 100... ... Virtual reality involves the integration of multiple input and output technologies into an immersive experience that, ideally, will permit people to interact with systems as naturally as they do with real-world places and objects. Multimodal Interfaces People effortlessly integrate information gathered across modalities during conversational interactions.
From page 101... ... Extremely wide variation in human sensory motor abilities can be accommodated without changing the user interface for people without disabilities. For example, by providing a "touch and hear" feature, a kiosk can be made usable by individuals who cannot read or by those who have low vision.
From page 102... ... By adding interface enhancements such as these, it is possible to create a single public kiosk that looks and operates like any traditional touchscreen kiosk but is also accessible and usable by individuals who cannot read, who have low vision, who are blind, who are hearing impaired, who are deaf, who have physical disabilities, who are paralyzed, or who are deaf and blind. Kiosks with flexible user-configurable interfaces have been distributed in Minnesota (including the Mall of America)
From page 103... ... In this context, input and output devices with more than 2 degrees of freedom are being developed to support true direct manipulation of objects, as opposed to the indirect control provided by two- and three-dimensional widgets, and user interfaces appear to require support for many degrees of freedom, higher-bandwidth input and output, real-time response, continuous response and feedback, probabilistic input, and multiple simultaneous input and output streams from multiple users (Herndon et al., 1994). Note that virtual reality also expands on the challenges posed by speech synthesis to include synthesis of arbitrary sounds, a problem
From page 104... ... Virtual reality technology, deriving from 30 years of government and industry funding, will see its cost plummet as development is amortized over millions of chip sets, allowing it to come into the mainstream. Initially, the software for these new chips will be crafted and optimized by legions of video game programmers driven by teenage mass-market consumption of the best play and graphics attainable.
From page 105... ... Currently, the personal computer clone is the universal input/output adapter because of its open architecture and the availability of cheap mail-order input/output devices, but many personal computers, each doing one filtering task, trying to communicate with one another on serial lines, are not directly adaptable to the ECI set of needs. Both software and hardware need to be provided in a form that allows "plug and play." Custom chip sets will drive the cost down to consumer level; adapting video game input/output devices where possible will help in achieving price/performance improvements similar to those of computing itself.
From page 106... ... Possible solutions are wearable chord keyboards, voice recognition, and gesture recognition. Issues include whether training will be essential, ranging from the effort needed to learn a video game or new word processor to that required to play a musical instrument or to drive a bulldozer.
From page 107... ... (This is probably the least of the problems because the microprocessor industry, having nearly achieved the capability of 1990 vintage Crays in single chips, is now ganging them together by fours and eights into packages.) Gigaflop personal computers are close; teraflop desktop units are clearly on the horizon as massive parallelism becomes understood.
From page 108... ... and the "point and click" Web browser. These are so widely accepted and accessible to all kinds of people that they can already be regarded as "almost" every-citizen user interfaces.
From page 109... ... adaptation to the increasing skill of a user in features such as multiple windows and navigation speed, and adapting to a variety of devices and communication resources that will offer more or less processing power and communications performance. The Network Hierarchy and How It Affects User End-to-End Performance Among the elements of communications infrastructure that affect performance, the access network is one among several network elements (including networking in the local area of the user and networking within the public network)
From page 110... ... A residence will be able to simultaneously operate not only several human-oriented user interfaces in personal computers, heating/cooling and appliance controls, light switches, communicating pocket calendars and watches, and so on, but also user interfaces used by such devices as furnaces, garage doors, and washing machines. The introduction of IPv6 in the next decade will create an extremely large pool of Internet addresses, allowing each human being in the world to own hundreds or thousands of them.
From page 111... ... telephone company services via the twisted-pair subscriber line, cable company services via a coaxial cable (coax) feed, wireless access via higher-powered cellular mobile or lower-powered PCS (personal communications services)
From page 112... ... An Internet service provider offers access service to the Internet and some access facilities such as TCP/IP software, but may not provide the physical pipe into the home. For the moment, the discussion is restricted to access networks that include the physical transmission facilities but returns later to Internet service provider facilities because they have a critical influence on the performance of Web browsers and other Internet-oriented user interfaces.
From page 113... ... If the service, including getting started and customer premise setup, is done well, the popular conception of Internet service as difficult to get started and unreliable after that could change radically, and the Web browser could indeed become a universal user interface.
From page 114... ... Satellite services could augment wired facilities to improve the performance of the user interface. In particular, downloading of large information files to proxy servers in nearby network offices or in the end-user's equipment itself would reduce the delays of access to information in distant servers.
From page 115... ... ATM is already widely deployed in the core network. Research and development on QoS control is already extensive, and further work, on topics such as renegotiation of offered capacity and dynamic user control over QoS, would improve the performance of future user interfaces.
From page 116... ... Although they do not, in general, provide the access transmission facilities, Internet service providers do supply other access facilities that have a large influence on the performance of user interfaces. These include at least the following: • Adequate modem pools and fast log-on for dial-up service; • Direct low-level packet interconnection to the Internet, as well as higher-level services such as e-mail, UseNet servers, domain name servers, and proxy Web servers; • Gateway services between Internet telephony and public network telephony (evolving in the near future to multimedia real-time communications)
From page 117... ... Transportable software also has great potential for "programmable networks" in which communications protocols and services are not fixed but can be changed on user request by sending the appropriate applets to network elements, such as switches, where they execute. This, too, can improve performance where alternative protocols are better matched to application needs, making the user interface more responsive and pleasant to use.
From page 118... ... the DARPA Spoken Language and MUC workshops, and the journals Artificial Intelligence, Computational Linguistics, and Machine Translation.
From page 119... ... Although ADSL could vastly improve the performance of multimedia user interfaces, it should be recognized (and this will hold for the other broadband access mechanisms as well) that contention for capacity on networks upstream, and congestion at servers, may also seriously constrain performance. HDSL, which provides symmetric capacity of 1.5 Mbps and up and usually is designed to work over two twisted pair lines, is not generally associated with residential users but could quickly overtake ADSL if households begin to generate high-capacity traf
From page 120... ... The core network must deploy technologies such as edge switches and access multiplexers that aggregate traffic arriving under various communications protocols, and must closely control QoS parameters for multiswitch routings.
